Larsen, Svend Arild; Mogensen, Line; Dietz, Rune; Baagøe, Hans Jørgen; Andersen, Mogens; Werge, Thomas; Rasmussen, Henrik Berg
2005-12-01
In this study we identified and characterized dopamine receptor D4 (DRD4) exon III tandem repeats in 33 publicly available nucleotide sequences from different mammalian species. We found that the tandem repeat in canids could be described in a novel and simple way, namely as a structure composed of 15- and 12-bp modules. Tandem repeats composed of 18-bp modules were found in sequences from the horse, zebra, onager, donkey, Asiatic bear, polar bear, common raccoon, dolphin, harbor porpoise, and domestic cat. Several of these sequences have been analyzed previously without a tandem repeat being found. In the domestic cow and gray seal we identified tandem repeats composed of 36-bp modules, each consisting of two closely related 18-bp basic units. A tandem repeat consisting of 9-bp modules was identified in sequences from mink and ferret. In the European otter we detected an 18-bp tandem repeat, while a tandem repeat consisting of 27-bp modules was identified in a sequence from European badger. Both of these tandem repeats were composed of 9-bp basic units, which were closely related to the 9-bp repeat modules identified in the mink and ferret. Tandem repeats could not be identified in sequences from rodents. All tandem repeats possessed a high GC content with a strong bias for C. On phylogenetic analysis of the tandem repeats, evolutionarily related species clustered into the same groups. The degree of conservation of the tandem repeats varied significantly between species. The deduced amino acid sequences of most of the tandem repeats exhibited a high propensity for disorder. This was also the case for an amino acid sequence of the human DRD4 exon III tandem repeat, which was included in the study for comparative purposes. We identified proline-containing motifs for SH3 and WW domain binding proteins, potential phosphorylation sites, PDZ domain binding motifs, and FHA domain binding motifs in the amino acid sequences of the tandem repeats. The numbers of potential functional sites varied markedly between species. Our observations provide a platform for future studies of the architecture and evolution of the DRD4 exon III tandem repeat, and they suggest that differences in the structure of this tandem repeat contribute to specialization and generation of diversity in receptor function.
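The statement that the repeats have a high GC content with a bias for C can be made concrete with a small calculation. The following is a minimal Python sketch, not the authors' method: the function name `base_composition` and the example sequence are hypothetical, and "bias for C" is assumed here to mean the share of C among the G and C bases.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return GC content and the share of C among G+C bases.

    Assumption: only unambiguous A/C/G/T bases are counted;
    any other symbol (N, gaps) is ignored.
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = counts["G"] + counts["C"]
    return {
        "gc_content": gc / total,
        # Share of C within the G+C fraction; 0.5 would mean no strand bias.
        "c_share_of_gc": counts["C"] / gc if gc else 0.0,
    }


# Hypothetical 6-bp example: 3 C, 1 G, 1 A, 1 T.
print(base_composition("CCCGAT"))
```

A value of `c_share_of_gc` well above 0.5 on the coding strand would correspond to the kind of C bias described in the abstract.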
Melters, Daniël P; Bradnam, Keith R; Young, Hugh A; Telis, Natalie; May, Michael R; Ruby, J Graham; Sebra, Robert; Peluso, Paul; Eid, John; Rank, David; Garcia, José Fernando; DeRisi, Joseph L; Smith, Timothy; Tobias, Christian; Ross-Ibarra, Jeffrey; Korf, Ian; Chan, Simon W L
2013-01-30
Background Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. Results Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution. Conclusions While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes. PMID:23363705
TRAP: automated classification, quantification and annotation of tandemly repeated sequences.
Sobreira, Tiago José P; Durham, Alan M; Gruber, Arthur
2006-02-01
TRAP, the Tandem Repeats Analysis Program, is a Perl program that provides a unified set of analyses for the selection, classification, quantification and automated annotation of tandemly repeated sequences. TRAP uses the results of the Tandem Repeats Finder program to perform a global analysis of the satellite content of DNA sequences, permitting researchers to easily assess the tandem repeat content for both individual sequences and whole genomes. The results can be generated in convenient formats such as HTML and comma-separated values. TRAP can also be used to automatically generate annotation data in the format of feature table and GFF files.
TRedD—A database for tandem repeats over the edit distance
Sokol, Dina; Atagun, Firat
2010-01-01
A ‘tandem repeat’ in DNA is a sequence of two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats are common in the genomes of both eukaryotic and prokaryotic organisms. They are significant markers for human identity testing, disease diagnosis, sequence homology and population studies. In this article, we describe a new database, TRedD, which contains the tandem repeats found in the human genome. The database is publicly available online, and the software for locating the repeats is also freely available. The definition of tandem repeats used by TRedD is a new and innovative definition based upon the concept of ‘evolutive tandem repeats’. In addition, we have developed a tool, called TandemGraph, to graphically depict the repeats occurring in a sequence. This tool can be coupled with any repeat finding software, and it should greatly facilitate analysis of results. Database URL: http://tandem.sci.brooklyn.cuny.edu/ PMID:20624712
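The definition above (two or more contiguous copies of a pattern) can be illustrated with a short Python sketch. It is deliberately simpler than TRedD: it finds only exact copies, whereas TRedD works with approximate copies under the edit distance. The function name and parameters are hypothetical.

```python
def find_exact_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=2):
    """Return (start, unit, copies) for runs of exact tandem copies.

    Simplification: copies must match exactly; approximate repeats,
    as handled by edit-distance methods, are not detected.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        i = 0
        while i + unit_len * min_copies <= n:
            unit = seq[i:i + unit_len]
            copies = 1
            # Extend the run while the next block equals the unit.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * unit_len  # skip past the reported run
            else:
                i += 1
    return hits


# Hypothetical example: three copies of "CA" starting at position 2.
print(find_exact_tandem_repeats("GGCACACATT", min_unit=2, max_unit=2))
```

Note that a run such as CACACACA is also reported for unit length 4 (two copies of CACA) when `max_unit` allows it; a real tool would merge such redundant hits.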
Small tandemly repeated DNA sequences of higher plants likely originate from a tRNA gene ancestor.
Benslimane, A A; Dron, M; Hartmann, C; Rode, A
1986-01-01
Several monomers (177 bp) of a tandemly arranged repetitive nuclear DNA sequence of Brassica oleracea have been cloned and sequenced. They share up to 95% homology with one another and up to 80% with other satellite DNA sequences of Cruciferae, suggesting a common ancestor. Both strands of these monomers show more than 50% homology with many tRNA genes; the best homologies were obtained with the yeast mitochondrial Lys and His tRNA genes (64% and 60%, respectively). These results suggest that small tandemly repeated DNA sequences of plants may have evolved from a tRNA gene ancestor. These tandem repeats have probably arisen via a process involving reverse transcription of polymerase III RNA intermediates, as is the case for interspersed DNA sequences of mammals. A model is proposed to explain the formation of such small tandemly repeated DNA sequences. PMID:3774553
Typing Clostridium difficile strains based on tandem repeat sequences
2009-01-01
Background Genotyping of epidemic Clostridium difficile strains is necessary to track their emergence and spread. Portability of genotyping data is desirable to facilitate inter-laboratory comparisons and epidemiological studies. Results This report presents results from a systematic screen for variation in repetitive DNA in the genome of C. difficile. We describe two tandem repeat loci, designated 'TR6' and 'TR10', which display extensive sequence variation that may be useful for sequence-based strain typing. Based on an investigation of 154 C. difficile isolates comprising 75 ribotypes, tandem repeat sequencing demonstrated excellent concordance with widely used PCR ribotyping and equal discriminatory power. Moreover, tandem repeat sequences enabled the reconstruction of the isolates' largely clonal population structure and evolutionary history. Conclusion We conclude that sequence analysis of the two repetitive loci introduced here may be highly useful for routine typing of C. difficile. Tandem repeat sequence typing resolves phylogenetic diversity to a level equivalent to PCR ribotypes. DNA sequences may be stored in databases accessible over the internet, obviating the need for the exchange of reference strains. PMID:19133124
Molecular characterization and distribution of a 145-bp tandem repeat family in the genus Populus.
Rajagopal, J; Das, S; Khurana, D K; Srivastava, P S; Lakshmikumaran, M
1999-10-01
This report aims to describe the identification and molecular characterization of a 145-bp tandem repeat family that accounts for nearly 1.5% of the Populus genome. Three members of this repeat family were cloned and sequenced from Populus deltoides and P. ciliata. The dimers of the repeat were sequenced in order to confirm the head-to-tail organization of the repeat. Hybridization-based analysis using the 145-bp tandem repeat as a probe on genomic DNA gave rise to ladder patterns which were identified to be a result of methylation and (or) sequence heterogeneity. Analysis of the methylation pattern of the repeat family using methylation-sensitive isoschizomers revealed variable methylation of the C residues and lack of methylation of the A residues. Sequence comparisons between the monomers revealed a high degree of sequence divergence that ranged between 6% and 11% in P. deltoides and between 4.2% and 8.3% in P. ciliata. This indicated the presence of sub-families within the 145-bp tandem family of repeats. Divergence was mainly due to the accumulation of point mutations and was concentrated in the central region of the repeat. The 145-bp tandem repeat family did not show significant homology to known tandem repeats from plants. A short stretch of 36 bp was found to show homology of 66.7% to a centromeric repeat from Chironomus plumosus. Dot-blot analysis and Southern hybridization data revealed the presence of the repeat family in 13 of the 14 Populus species examined. The absence of the 145-bp repeat from P. euphratica suggested that this species is relatively distant from other members of the genus, which correlates with taxonomic classifications. The widespread occurrence of the tandem family in the genus indicated that this family may be of ancient origin.
Kapila, R; Das, S; Srivastava, P S; Lakshmikumaran, M
1996-08-01
DNA sequences representing a tandemly repeated DNA family of the Sinapis arvensis genome were cloned and characterized. The 700-bp tandem repeat family is represented by two clones, pSA35 and pSA52, which are 697 and 709 bp in length, respectively. Dot matrix analysis of the sequences indicates the presence of repeated elements within each monomeric unit. Sequence analysis of the repetitive region of clones pSA35 and pSA52 shows that there are several copies of a 7-bp repeat element organized in tandem. The consensus sequence of this repeat element is 5'-TTTAGGG-3'. These elements are highly mutated and the difference in length between the two clones is due to different copy numbers of these elements. The repetitive region of clone pSA35 has 26 copies of the element TTTAGGG, whereas clone pSA52 has 28 copies. The repetitive region in both clones is flanked on either side by inverted repeats that may be footprints of a transposition event. Sequence comparison indicates that the element TTTAGGG is identical to telomeric repeats present in Arabidopsis, maize, tomato, and other plants. However, Bal31 digestion kinetics indicates non-telomeric localization of the 700-bp tandem repeats. The clones represent a novel repeat family as (i) they contain telomere-like motifs as subrepeats within each unit; and (ii) they do not hybridize to related crucifers and are species-specific in nature.
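Counting copies of a short subrepeat such as TTTAGGG, when the copies are "highly mutated", requires some tolerance for mismatches. The sketch below is a hedged Python illustration and not the procedure used in the study: the function names, the greedy left-to-right scan, and the one-mismatch threshold are all assumptions, and insertions or deletions are not handled.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_motif_copies(seq: str, motif: str = "TTTAGGG", max_mismatches: int = 1) -> int:
    """Greedily count non-overlapping windows within max_mismatches of motif.

    Assumption: substitutions only; indels inside a copy are not modelled.
    """
    seq = seq.upper()
    m = len(motif)
    copies = 0
    i = 0
    while i + m <= len(seq):
        if hamming(seq[i:i + m], motif) <= max_mismatches:
            copies += 1
            i += m  # jump past the accepted copy
        else:
            i += 1
    return copies


# Hypothetical sequence: one exact copy, one copy with a G->C change,
# a 7-bp spacer, then another exact copy.
example = "TTTAGGG" + "TTTAGGC" + "ACGTACG" + "TTTAGGG"
print(count_motif_copies(example, max_mismatches=1))
print(count_motif_copies(example, max_mismatches=0))
```

Comparing the counts obtained from two clones with the same settings would mirror the kind of copy-number difference reported for pSA35 and pSA52.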
Kuipers, A G J; Kamstra, S A; de Jeu, M J; Visser, R G F
2002-01-01
Highly repetitive DNA sequences were isolated from genomic DNA libraries of Alstroemeria psittacina and A. inodora. Among the repetitive sequences that were isolated, tandem repeats as well as dispersed repeats could be discerned. The tandem repeats belonged to a family of interlinked Sau3A subfragments with sizes varying from 68-127 bp, and constituted a larger HinfI repeat of approximately 400 bp. Southern hybridization showed a similar molecular organization of the tandem repeats in each of the Brazilian Alstroemeria species tested. None of the repeats hybridized with DNA from Chilean Alstroemeria species, which indicates that they are specific for the Brazilian species. In-situ localization studies revealed the tandem repeats to be localized in clusters on the chromosomes of A. inodora and A. psittacina: distal hybridization sites were found on chromosome arms 2PS, 6PL, 7PS, 7PL and 8PL, interstitial sites on chromosome arms 2PL, 3PL, 4PL and 5PL. The applicability of the tandem repeats for cytogenetic analysis of interspecific hybrids and their role in heterochromatin organization are discussed.
[Polymorphic loci and polymorphism analysis of short tandem repeats within XNP gene].
Liu, Qi-Ji; Gong, Yao-Qin; Guo, Chen-Hong; Chen, Bing-Xi; Li, Jiang-Xia; Guo, Yi-Shou
2002-01-01
To select polymorphic short tandem repeat markers within the X-linked nuclear protein (XNP) gene, genomic clones containing the XNP gene were identified by homology analysis with the XNP cDNA. By comparing the cDNA with genomic DNA, non-exonic sequences were identified, and short tandem repeats were selected from the non-exonic sequences using BCM Search Launcher. Polymorphisms of the short tandem repeats in a Chinese population were evaluated by PCR amplification and PAGE. Five short tandem repeats were identified in the XNP gene, two of which were polymorphic. Four and 11 alleles were observed in the Chinese population for XNPSTR1 and XNPSTR4, respectively. Heterozygosities were 47% for XNPSTR1 and 70% for XNPSTR4. XNPSTR1 and XNPSTR4 are located within the 3' end and intron 10, respectively. The two polymorphic short tandem repeats identified within the XNP gene will be useful for linkage analysis and gene diagnosis of the XNP gene.
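Heterozygosity figures such as those quoted above are commonly derived from allele frequencies. The abstract does not say how its values were obtained, so the following Python sketch is only an illustration under the assumption that expected heterozygosity, H = 1 - sum(p_i^2), is meant; the function name and the allele counts are hypothetical and are not data from the study.

```python
def expected_heterozygosity(allele_counts: dict) -> float:
    """Expected heterozygosity H = 1 - sum(p_i^2) from allele counts.

    Assumption: this is the uncorrected estimator; small-sample
    corrections (for example the n/(n-1) factor) are not applied.
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles counted")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts.values())


# Hypothetical allele counts for a four-allele marker (not study data).
counts = {"allele_1": 25, "allele_2": 25, "allele_3": 25, "allele_4": 25}
print(expected_heterozygosity(counts))
```

With equally frequent alleles the value approaches 1 - 1/k for k alleles, which is why a marker with more alleles can, but need not, show higher heterozygosity.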
Tandemly repeated sequences in mtDNA control region of whitefish, Coregonus lavaretus.
Brzuzan, P
2000-06-01
Length variation of the mitochondrial DNA control region was observed with PCR amplification of a sample of 138 whitefish (Coregonus lavaretus). Nucleotide sequences of representative PCR products showed that the variation was due to the presence of an approximately 100-bp motif tandemly repeated two, three, or five times in the region between the conserved sequence block-3 (CSB-3) and the gene for phenylalanine tRNA. This is the first report on the tandem array composed of long repeat units in mitochondrial DNA of salmonids.
Short Tandem Repeat DNA Internet Database
National Institute of Standards and Technology Data Gateway
SRD 130 Short Tandem Repeat DNA Internet Database (Web, free access) Short Tandem Repeat DNA Internet Database is intended to benefit research and application of short tandem repeat DNA markers for human identity testing. Facts and sequence information on each STR system, population data, commonly used multiplex STR systems, PCR primers and conditions, and a review of various technologies for analysis of STR alleles have been included.
Albornos, Lucía; Martín, Ignacio; Iglesias, Rebeca; Jiménez, Teresa; Labrador, Emilia; Dopico, Berta
2012-11-07
Background Many proteins with tandem repeats in their sequence have been described and classified according to the length of the repeats: (I) repeats of short oligopeptides (from 2 to 20 amino acids), including structural cell wall proteins and arabinogalactan proteins; (II) repeats that range in length from 20 to 40 residues, including proteins with a well-established three-dimensional structure often involved in mediating protein-protein interactions; (III) longer repeats in the order of 100 amino acids that constitute structurally and functionally independent units. Here we analyse ShooT specific (ST) proteins, a family of proteins with tandem repeats of unknown function that were first found in Leguminosae, and their possible similarities to other proteins with tandem repeats. Results ST protein sequences were only found in dicotyledonous plants, limited to several plant families, mainly the Fabaceae and the Asteraceae. ST mRNAs accumulate mainly in the roots and under biotic interactions. Most ST proteins have one or several Domain(s) of Unknown Function 2775 (DUF2775). All deduced ST proteins have a signal peptide, indicating that these proteins enter the secretory pathway, and the mature proteins have tandem repeat oligopeptides that share a hexapeptide (E/D)FEPRP followed by 4 partially conserved amino acids, which could determine a putative N-glycosylation signal, and a fully conserved tyrosine. In a phylogenetic tree, the sequences cluster according to taxonomic group. A possible involvement in symbiosis and abiotic stress as well as in plant cell elongation is suggested, although different STs could play different roles in plant development. Conclusions We describe a new family of proteins called ST whose presence is limited to the plant kingdom, specifically to a few families of dicotyledonous plants. They present 20 to 40 amino acid tandem repeat sequences whose characteristics (signal peptide, DUF2775 domain, conserved repeat regions) differ from those of the described group of 20 to 40 amino acid tandem repeat proteins and also from known cell wall proteins with repeat sequences. Several putative roles in plant physiology can be inferred from the characteristics found. PMID:23134664
Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm
Glunčić, Matko; Paar, Vladimir
2013-01-01
The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. Its efficacy is due to the use of a complete K-string ensemble, which enables a new method of direct mapping of a symbolic DNA sequence into the frequency domain, with straightforward identification of repeats as peaks in the GRM diagram. In this way, we obtain a very fast, efficient and highly automated repeat-finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use in order to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of the Period 3 pattern in exons, implementation of the ‘magnifying glass’ effect, identification of a complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. The GRM algorithm is particularly convenient for large repeat units, for highly mutated and/or complex repeats, and for global repeat maps of large genomic sequences (chromosomes and genomes). PMID:22977183
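The idea of reading repeat lengths off as peaks can be illustrated with a much simpler analogue. The Python sketch below is not the GRM algorithm and does not use the GRM software: it only tallies the distances between consecutive occurrences of each k-mer, under the assumption that a tandem repeat of period p produces a peak at distance p. The function name and the choice k=4 are hypothetical.

```python
from collections import Counter


def kmer_distance_spectrum(seq: str, k: int = 4) -> Counter:
    """Histogram of distances between consecutive occurrences of each k-mer.

    Simplified stand-in for a repeat map: a strong peak at distance p
    suggests a repeat unit of length p. Not the published GRM method.
    """
    seq = seq.upper()
    last_seen = {}
    spectrum = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            spectrum[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return spectrum


# Hypothetical array: five exact copies of an 8-bp unit.
spectrum = kmer_distance_spectrum("ACGTTGCA" * 5, k=4)
print(spectrum.most_common(3))
```

Because only consecutive occurrences are compared, higher-order structure (for example HORs built from diverged monomers) would need a richer distance set than this sketch collects.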
USDA-ARS's Scientific Manuscript database
Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres comprise megabase-scale arrays of tandem repeats. The true prevalence of centromere tandem repeats, and whether they exhibit conserved seque...
A TALE-inspired computational screen for proteins that contain approximate tandem repeats.
Perycz, Malgorzata; Krwawicz, Joanna; Bochtler, Matthias
2017-01-01
TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen. PMID:28617832
Sunflower centromeres consist of a centromere-specific LINE and a chromosome-specific tandem repeat.
Nagaki, Kiyotaka; Tanaka, Keisuke; Yamaji, Naoki; Kobayashi, Hisato; Murata, Minoru
2015-01-01
The kinetochore is a protein complex, including kinetochore-specific proteins, that plays a role in chromatid segregation during mitosis and meiosis. The complex associates with centromeric DNA sequences that are usually species-specific. In plant species, tandem repeats including satellite DNA sequences and retrotransposons have been reported as centromeric DNA sequences. In this study on sunflowers, a cDNA encoding centromere-specific histone H3 (CENH3) was isolated from a seedling cDNA pool, and an antibody was raised against a peptide synthesized on the basis of the sequence deduced from the cDNA. The antibody specifically recognized the sunflower CENH3 (HaCENH3) and showed centromeric signals by immunostaining and immunohistochemical staining analysis. The antibody was also applied in chromatin immunoprecipitation (ChIP)-Seq to isolate centromeric DNA sequences, and two different types of repetitive DNA sequences were identified. One was a long interspersed nuclear element (LINE)-like sequence, which showed centromere-specific signals on almost all chromosomes in sunflowers. This is the first report of a centromeric LINE sequence, suggesting possible centromere targeting ability. The other type of repetitive DNA identified was a tandem repeat sequence with a 187-bp unit that was found only on a pair of chromosomes. The HaCENH3 content of the tandem repeats was estimated to be much higher than that of the LINE, which implies centromere evolution from LINE-based centromeres to more stable tandem-repeat-based centromeres. In addition, the epigenetic status of the sunflower centromeres was investigated by immunohistochemical staining and ChIP, and it was found that centromeres were heterochromatic.
Molecular basis of length polymorphism in the human zeta-globin gene complex.
Goodbourn, S E; Higgs, D R; Clegg, J B; Weatherall, D J
1983-01-01
The length polymorphism between the human zeta-globin gene and its pseudogene is caused by an allele-specific variation in the copy number of a tandemly repeating 36-base-pair sequence. This sequence is related to a tandemly repeated 14-base-pair sequence in the 5' flanking region of the human insulin gene, which is known to cause length polymorphism, and to a repetitive sequence in intervening sequence (IVS) 1 of the pseudo-zeta-globin gene. Evidence is presented that the latter is also of variable length, probably because of differences in the copy number of the tandem repeat. The homology between the three length polymorphisms may be an indication of the presence of a more widespread group of related sequences in the human genome, which might be useful for generalized linkage studies. PMID:6308667
Jiang, W; Gupta, D; Gallagher, D; Davis, S; Bhavanandan, V P
2000-04-01
We previously elucidated five distinct protein domains (I-V) for bovine submaxillary mucin, which is encoded by two genes, BSM1 and BSM2. Using Southern blot analysis, genomic cloning and sequencing of the BSM1 gene, we now show that the central domain (V) consists of approximately 55 tandem repeats of 329 amino acids and that domains III-V are encoded by a 58.4-kb exon, the largest exon known for all genes to date. The BSM1 gene was mapped by fluorescence in situ hybridization to the proximal half of chromosome 5 at bands q2.2-q2.3. The amino-acid sequences of six tandem repeats (two full and four partial) were found to have only 92-94% identity. We propose that the variability in the amino-acid sequences of the mucin tandem repeat is important for generating the combinatorial library of saccharides that are necessary for the protective function of mucins. The deduced peptide sequences of the central domain match those determined from the purified bovine submaxillary mucin and also show 68-94% identity to published peptide sequences of ovine submaxillary mucin. This indicates that the core protein of ovine submaxillary mucin is closely related to that of bovine submaxillary mucin and contains similar tandem repeats in the central domain. In contrast, the central domain of porcine submaxillary mucin is reported to consist of 81-amino-acid tandem repeats. However, both bovine submaxillary mucin and porcine submaxillary mucin contain similar N-terminal and C-terminal domains and the corresponding genes are in the conserved linkage regions of the respective genomes.
Two tandemly repeated telomere-associated sequences in Nicotiana plumbaginifolia.
Chen, C M; Wang, C T; Wang, C J; Ho, C H; Kao, Y Y; Chen, C C
1997-12-01
Two tandemly repeated telomere-associated sequences, NP3R and NP4R, have been isolated from Nicotiana plumbaginifolia. The length of a repeating unit for NP3R and NP4R is 165 and 180 nucleotides, respectively. The abundance of NP3R, NP4R and telomeric repeats is, respectively, 8.4 x 10^4, 6 x 10^3 and 1.5 x 10^6 copies per haploid genome of N. plumbaginifolia. Fluorescence in situ hybridization revealed that NP3R is located at the ends and/or in interstitial regions of all 10 chromosomes and NP4R on the terminal regions of three chromosomes in the haploid genome of N. plumbaginifolia. Sequence homology search revealed that not only are NP3R and NP4R homologous to HRS60 and GRS, respectively, two tandem repeats isolated from N. tabacum, but that NP3R and NP4R are also related to each other, suggesting that they originated from a common ancestral sequence. The role of these repeated sequences in chromosome healing is discussed based on the observation that two to three copies of a telomere-similar sequence were present in each repeating unit of NP3R and NP4R.
A novel tandem repeat sequence located on human chromosome 4p: isolation and characterization.
Kogi, M; Fukushige, S; Lefevre, C; Hadano, S; Ikeda, J E
1997-06-01
In an effort to analyze the genomic region of the distal half of human chromosome 4p, to where Huntington disease and other diseases have been mapped, we have isolated the cosmid clone (CRS447) that was likely to contain a region with specific repeat sequences. Clone CRS447 was subjected to detailed analysis, including chromosome mapping, restriction mapping, and DNA sequencing. Chromosome mapping by both a human-CHO hybrid cell panel and FISH revealed that CRS447 was predominantly located in the 4p15.1-15.3 region. CRS447 was shown to consist of tandem repeats of 4.7-kb units present on chromosome 4p. A single EcoRI unit was subcloned (pRS447), and the complete sequence was determined as 4752 nucleotides. When pRS447 was used as a probe, the number of copies of this repeat per haploid genome was estimated to be 50-70. Sequence analysis revealed that it contained two internal CA repeats and one putative ORF. Database search established that this sequence was unreported. However, two homologous STS markers were found in the database. We concluded that CRS447/pRS447 is a novel tandem repeat sequence that is mainly specific to human chromosome 4p.
A Lossy Compression Technique Enabling Duplication-Aware Sequence Alignment
Freschi, Valerio; Bogliolo, Alessandro
2012-01-01
In spite of the recognized importance of tandem duplications in genome evolution, commonly adopted sequence comparison algorithms do not take into account complex mutation events involving more than one residue at the time, since they are not compliant with the underlying assumption of statistical independence of adjacent residues. As a consequence, the presence of tandem repeats in sequences under comparison may impair the biological significance of the resulting alignment. Although solutions have been proposed, repeat-aware sequence alignment is still considered to be an open problem and new efficient and effective methods have been advocated. The present paper describes an alternative lossy compression scheme for genomic sequences which iteratively collapses repeats of increasing length. The resulting approximate representations do not contain tandem duplications, while retaining enough information for making their comparison even more significant than the edit distance between the original sequences. This allows us to exploit traditional alignment algorithms directly on the compressed sequences. Results confirm the validity of the proposed approach for the problem of duplication-aware sequence alignment. PMID:22518086
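The idea of iteratively collapsing repeats of increasing length can be illustrated with a minimal Python sketch. This is not the authors' compression scheme: it assumes exact, error-free repeat copies, and the function names `collapse_once` and `collapse_tandem_repeats` are hypothetical names chosen for illustration.

```python
def collapse_once(seq: str, unit: int) -> str:
    """Replace every run of exact tandem copies of a `unit`-long block by one copy."""
    out = []
    i = 0
    while i < len(seq):
        block = seq[i:i + unit]
        # A tandem repeat starts here if the next block is an exact copy.
        if len(block) == unit and seq[i + unit:i + 2 * unit] == block:
            out.append(block)
            i += unit
            # Skip all further exact copies of the same block.
            while seq[i:i + unit] == block:
                i += unit
        else:
            out.append(seq[i])
            i += 1
    return "".join(out)


def collapse_tandem_repeats(seq: str) -> str:
    """Collapse repeats of increasing unit length (1, 2, 3, ...)."""
    unit = 1
    while unit <= len(seq) // 2:
        seq = collapse_once(seq, unit)
        unit += 1
    return seq


print(collapse_tandem_repeats("ATGATGATGCC"))  # ATGC
print(collapse_tandem_repeats("GACACACT"))     # GACT
```

The collapsed strings can then be passed to any standard alignment routine. The sketch is lossy in the same sense as the abstract describes: the copy number of each repeat is discarded.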
Genetic diversity at variable-number-tandem-repeat (VNTR) loci was examined in the common cattail, Typha latifolia (Typhaceae), using three synthetic DNA probes composed of tandemly repeated "core" sequences (GACA, GATA, and GCAC). The principal objectives of this investigation w...
Rational design of alpha-helical tandem repeat proteins with closed architectures
Doyle, Lindsey; Hallinan, Jazmine; Bolduc, Jill; Parmeggiani, Fabio; Baker, David; Stoddard, Barry L.; Bradley, Philip
2015-01-01
Tandem repeat proteins, which are formed by repetition of modular units of protein sequence and structure, play important biological roles as macromolecular binding and scaffolding domains, enzymes, and building blocks for the assembly of fibrous materials [1,2]. The modular nature of repeat proteins enables the rapid construction and diversification of extended binding surfaces by duplication and recombination of simple building blocks [3,4]. The overall architecture of tandem repeat protein structures – which is dictated by the internal geometry and local packing of the repeat building blocks – is highly diverse, ranging from extended, super-helical folds that bind peptide, DNA, and RNA partners [5–9], to closed and compact conformations with internal cavities suitable for small molecule binding and catalysis [10]. Here we report the development and validation of computational methods for de novo design of tandem repeat protein architectures driven purely by geometric criteria defining the inter-repeat geometry, without reference to the sequences and structures of existing repeat protein families. We have applied these methods to design a series of closed alpha-solenoid [11] repeat structures (alpha-toroids) in which the inter-repeat packing geometry is constrained so as to juxtapose the N- and C-termini; several of these designed structures have been validated by X-ray crystallography. Unlike previous approaches to tandem repeat protein engineering [12–20], our design procedure does not rely on template sequence or structural information taken from natural repeat proteins and hence can produce structures unlike those seen in nature. As an example, we have successfully designed and validated closed alpha-solenoid repeats with a left-handed helical architecture that – to our knowledge – is not yet present in the protein structure database [21]. PMID:26675735
Liu, Qian; Xu, Xue-Nian; Zhou, Yan; Cheng, Na; Dong, Yu-Ting; Zheng, Hua-Jun; Zhu, Yong-Qiang; Zhu, Yong-Qiang
2013-08-01
To find and clone new antigen genes from the lambda-ZAP cDNA expression library of adult Clonorchis sinensis, and determine the immunological characteristics of the recombinant proteins. The cDNA expression library of adult C. sinensis was screened with pooled sera of clonorchiasis patients. The sequences of the positive phage clones were compared with sequences in the EST database, and the full-length sequence of the gene (Cs22 gene) was obtained by RT-PCR. cDNA fragments containing 2 and 3 copies of the tandem repeat sequence were generated by jumping PCR. The sequence encoding the mature peptide or the tandem repeat sequence was cloned into the prokaryotic expression vector pET28a (+) and then transformed into E. coli Rosetta DE3 cells for expression. The recombinant proteins (rCs22-2r, rCs22-3r, rCs22M-2r, and rCs22M-3r) were purified by His-bind-resin (Ni-NTA) affinity chromatography. The immunogenicity of rCs22-2r and rCs22-3r was assessed by ELISA. To evaluate the immunological diagnostic value of rCs22-2r and rCs22-3r, serum samples from 35 clonorchiasis patients, 31 healthy individuals, 15 schistosomiasis patients, 15 paragonimiasis westermani patients and 13 cysticercosis patients were examined by ELISA. To locate antigenic determinants, the pooled sera of clonorchiasis patients and healthy persons were analyzed for specific antibodies by ELISA with the recombinant proteins rCs22M-2r and rCs22M-3r, which contain the tandem repeat sequences. The full-length sequence of the Cs22 antigen gene of C. sinensis was obtained. It contained 13 tandem repeats of the sequence EQQDGDEEGMGGDGGRGKEKGKVEGEDGAGEQKEQA. Bioinformatics analysis indicated that the protein (Cs22) belongs to the GPI-anchored protein family. The recombinant proteins rCs22-2r and rCs22-3r showed a certain level of immunogenicity.
In ELISA using purified PrCs22-2r and PrCs22-3r as coating antigens, the positive rate was 45.7% (16/35) for sera of clonorchiasis patients with both proteins, and 3.2% (1/31) for sera of healthy persons. There was no cross reaction with sera of schistosomiasis and cysticercosis patients. Cross reaction with sera of paragonimiasis westermani patients occurred in 1 of 15 cases. The recombinant proteins rCs22M-2r and rCs22M-3r, which contained only tandem repeats, were specifically recognized by pooled sera of clonorchiasis patients. The Cs22 antigen gene of Clonorchis sinensis was obtained, and the recombinant proteins have some diagnostic value. The antigenic determinant is located in the tandem repeat sequences.
[Analysis on genetic polymorphism of 5 STR loci selected from X chromosome].
Liu, Qi-ji; Gong, Yao-qin; Zhang, Xi-yu; Gao, Gui-min; Li, Jiang-xia; Guo, Yi-shou
2005-02-01
To select short tandem repeats (STRs) from the X chromosome. An STR is a universal genetic marker that is highly polymorphic and stably inherited in the human genome. It is a specific DNA segment whose core repeat unit is 2-6 base pairs long. It is an ideal DNA marker for linkage analysis and gene mapping. In this study, 8 short tandem repeats were selected from two genomic clones on the X chromosome by using BCM Search Launcher. Primers amplifying the STR loci were designed with Primer 3.0 according to the unique sequences flanking the STRs. Polymorphisms of the short tandem repeats in a Chinese population were evaluated by PCR amplification and PAGE. Five of these STRs were polymorphic. A chi-square test indicated that the distribution of genotypes agreed with Hardy-Weinberg equilibrium (P>0.05). Five polymorphic short tandem repeats have been identified on chromosome X and will be useful for linkage analysis and gene mapping.
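A chi-square goodness-of-fit test against Hardy-Weinberg proportions can be sketched in Python as below. The abstract does not say how the test was computed, so this is only an assumed, textbook-style version for a multi-allelic locus with diploid genotypes (for an X-linked STR this would typically mean female samples). The function name `hwe_chi_square` and the genotype counts are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotype_counts):
    """Return (chi-square statistic, degrees of freedom).

    genotype_counts maps (allele_a, allele_b) -> observed number of individuals.
    """
    observed = Counter()
    allele_counts = Counter()
    for (a, b), n in genotype_counts.items():
        observed[tuple(sorted((a, b)))] += n
        allele_counts[a] += n
        allele_counts[b] += n
    total = sum(observed.values())
    # Each individual contributes two alleles.
    freqs = {a: c / (2 * total) for a, c in allele_counts.items()}

    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected frequency: p^2 for homozygotes, 2pq for heterozygotes.
        prob = freqs[a] * freqs[b] * (1 if a == b else 2)
        expected = prob * total
        chi2 += (observed[(a, b)] - expected) ** 2 / expected
    k = len(freqs)
    return chi2, k * (k - 1) // 2


# Made-up counts for alleles with 10 and 12 repeat units.
stat, df = hwe_chi_square({(10, 10): 25, (10, 12): 50, (12, 12): 25})
print(stat, df)
```

The statistic is compared with a chi-square distribution with the returned degrees of freedom. With many rare alleles the expected counts become small, and an exact test is usually preferred.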
Witonski, D.; Stefanova, R.; Ranganathan, A.; Schutze, G. E.; Eisenach, K. D.; Cave, M. D.
2006-01-01
The genome of Salmonella enterica subsp. enterica serovar Typhimurium strain LT2 was analyzed for direct repeats, and 54 sequences containing variable-number tandem repeat loci were identified. Ten primer pairs that anneal upstream and downstream of each selected locus were designed and used to amplify PCR targets in isolates of S. enterica serovars Typhimurium and Newport. Four of the 10 loci did not show polymorphism in the length of products. Six loci were selected for analysis. Isolates of S. enterica serovars Typhimurium and Newport that were related to specific outbreaks and showed identical pulsed-field gel electrophoresis patterns were indistinguishable by the length of the six variable-number tandem repeats. Isolates that differed in their pulsed-field gel electrophoresis patterns showed polymorphism in variable-number tandem repeat profiles. Length of the products was confirmed by DNA sequence analysis. Only 2 of the 10 loci contained exact integers of the direct repeat. Eight loci contained partial copies. The partial copies were maintained at the ends of the variable-number tandem repeat loci in all isolates. In spite of having partial copies that were maintained in all isolates, the number of direct repeats at a locus was polymorphic. Six variable-number tandem repeat loci were useful in distinguishing isolates of S. enterica serovars Typhimurium and Newport that had different pulsed-field gel electrophoresis patterns and in identifying outbreak-associated cases that shared a common pulsed-field gel pattern. PMID:16943354
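The relationship between PCR product length and repeat copy number, including a partial copy at the end of a locus, can be sketched as follows. The product, flank, and repeat unit lengths are made-up illustration values rather than data from the study, and `repeat_copies` is a hypothetical helper.

```python
def repeat_copies(product_bp: int, flank_bp: int, unit_bp: int):
    """Return (whole repeat copies, leftover bp) for a VNTR amplicon.

    product_bp: length of the PCR product
    flank_bp:   combined length of both flanking regions, primers included
    unit_bp:    length of one direct repeat
    """
    repeat_region = product_bp - flank_bp
    if repeat_region < 0 or unit_bp <= 0:
        raise ValueError("inconsistent lengths")
    # A non-zero remainder corresponds to a partial copy of the repeat.
    return divmod(repeat_region, unit_bp)


print(repeat_copies(250, 100, 33))  # (4, 18): four whole copies plus an 18-bp partial copy
print(repeat_copies(232, 100, 33))  # (4, 0): an exact integer number of copies
```

If the partial copy is constant across isolates, as the abstract reports, differences in product length still translate directly into differences in whole-copy number.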
Pavelitz, T; Rusché, L; Matera, A G; Scharf, J M; Weiner, A M
1995-01-01
In primates, the tandemly repeated genes encoding U2 small nuclear RNA evolve concertedly, i.e. the sequence of the U2 repeat unit is essentially homogeneous within each species but differs somewhat between species. Using chromosome painting and the NGFR gene as an outside marker, we show that the U2 tandem array (RNU2) has remained at the same chromosomal locus (equivalent to human 17q21) through multiple speciation events over > 35 million years leading to the Old World monkey and hominoid lineages. The data suggest that the U2 tandem repeat, once established in the primate lineage, contained sequence elements favoring perpetuation and concerted evolution of the array in situ, despite a pericentric inversion in chimpanzee, a reciprocal translocation in gorilla and a paracentric inversion in orang utan. Comparison of the 11 kb U2 repeat unit found in baboon and other Old World monkeys with the 6 kb U2 repeat unit in humans and other hominids revealed that an ancestral U2 repeat unit was expanded by insertion of a 5 kb retrovirus bearing 1 kb long terminal repeats (LTRs). Subsequent excision of the provirus by homologous recombination between the LTRs generated a 6 kb U2 repeat unit containing a solo LTR. Remarkably, both junctions between the human U2 tandem array and flanking chromosomal DNA at 17q21 fall within the solo LTR sequence, suggesting a role for the LTR in the origin or maintenance of the primate U2 array. PMID:7828589
Identification of presumed ancestral DNA sequences of phaseolin in Phaseolus vulgaris.
Kami, J; Velásquez, V B; Debouck, D G; Gepts, P
1995-01-01
Common bean (Phaseolus vulgaris) consists of two major geographic gene pools, one distributed in Mexico, Central America, and Colombia and the other in the southern Andes (southern Peru, Bolivia, and Argentina). Amplification and sequencing of members of the multigene family coding for phaseolin, the major seed storage protein of the common bean, provide evidence for accumulation of tandem direct repeats in both introns and exons during evolution of the multigene family in this species. The presumed ancestral phaseolin sequences, without tandem repeats, were found in recently discovered but nearly extinct wild common bean populations of Ecuador and northern Peru that are intermediate between the two major gene pools of the species based on geographical and molecular arguments. Our results illustrate the usefulness of tandem direct repeats in establishing the polarity of DNA sequence divergence and therefore in proposing phylogenies. PMID:7862642
Ba, Hengxing; Wu, Lang; Liu, Zongyue; Li, Chunyi
2016-01-01
Tandem repeat units are only detected in the left domain of the mitochondrial DNA control region in sika deer. Previous studies showed that Japanese sika deer have more tandem repeat units than its cousins from the Asian continent and Taiwan, which often have only three repeat units. To determine the origin and evolution of these additional repeat units in Japanese sika deer, we obtained the sequence of repeat units from an expanded dataset of the control region from all sika deer lineages. The functional constraint is inferred to act on the first repeat unit because this repeat has the least sequence divergence in comparison to the other units. Based on slipped-strand mispairing mechanisms, the illegitimate elongation model could account for the addition or deletion of these additional repeat units in the Japanese sika deer population. We also report that these additional repeat units could be occurring in the internal positions of tandem repeat regions, possibly via coupling with a homogenization mechanism within and among these lineages. Moreover, the increased number of repeat units in the Japanese sika deer population could reflect a balance between mutation and selection, as well as genetic drift.
Kim, Min Jee; Im, Hyun Hwak; Lee, Kwang Youll; Han, Yeon Soo; Kim, Iksoo
2014-06-01
The complete nucleotide sequence of the mitochondrial genome of the white-spotted flower chafer, Protaetia brevitarsis (Coleoptera: Scarabaeidae), was determined. The 20,319-bp long circular genome is the longest among completely sequenced Coleoptera. As is typical in animals, the P. brevitarsis genome consisted of two ribosomal RNAs, 22 transfer RNAs, 13 protein-coding genes and one A + T-rich region. Although the size of the coding genes was typical, the non-coding A + T-rich region was 5654 bp, which is the longest in insects. The extraordinary length of this region was due to 28 117-bp tandem repeats and seven 82-bp tandem repeats. These repeat sequences were encompassed by three non-repeat sequences constituting 1804 bp.
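The lengths quoted for the A + T-rich region can be cross-checked with simple arithmetic. The short Python check below assumes that the repeat figures are read as 28 copies of a 117-bp unit plus seven copies of an 82-bp unit; that reading is an interpretation, chosen because it reproduces the stated 5654-bp total exactly.

```python
# (copy number, unit length in bp); this reading of the abstract is an assumption.
repeat_blocks = [(28, 117), (7, 82)]
non_repeat_bp = 1804        # three non-repeat sequences, as stated
at_rich_region_bp = 5654    # total length of the A + T-rich region, as stated

repeat_bp = sum(copies * unit for copies, unit in repeat_blocks)
total = repeat_bp + non_repeat_bp

print(repeat_bp)                    # 3850
print(total)                        # 5654
print(total == at_rich_region_bp)   # True
```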
Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M
2016-10-19
The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome is comprised of 16,345 bp, and contains the expected 37 genes and control region. The majority of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeats region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), which is the largest number of tandem repeats among all the known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that the pattern of nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang Yankai; Yan Rong; He Yi
2006-07-14
The β-subunit of human chorionic gonadotropin (β-hCG) is secreted by many kinds of tumors and it has been used as an ideal target antigen to develop vaccines against tumors. In view of the low immunogenicity of this self-peptide, we designed a method based on the isocaudamer technique to repeat tandemly the 10-residue sequence X of β-hCG (109-118); 10 tandemly repeated copies of the 10-residue sequence combined with the β-hCG C-terminal 37 peptides were then fused to mycobacterial heat-shock protein 65 to construct a fusion protein, HSP65-X10-βhCGCTP37, as an immunogen. In this study, we examined the effect of the tandem repeats of this 10-residue sequence in eliciting an immune response by comparing the immunogenicity and anti-tumor effects of the two immunogens, HSP65-X10-βhCGCTP37 and HSP65-βhCGCTP37 (without the 10 tandem repeats). Immunization of mice with the fusion protein HSP65-X10-βhCGCTP37 elicited much higher levels of specific anti-β-hCG antibodies and more effectively inhibited the growth of Lewis lung carcinoma (LLC) in vivo than immunization with HSP65-βhCGCTP37, which suggests that HSP65-X10-βhCGCTP37 may be an effective protein vaccine for the treatment of β-hCG-dependent tumors and that multiple tandem repeats of a certain epitope are an efficient way to overcome the low immunogenicity of self-peptide antigens.
Tek, Ahmet L; Kashihara, Kazunari; Murata, Minoru; Nagaki, Kiyotaka
2011-11-01
The centromere plays an essential role for proper chromosome segregation during cell division and usually harbors long arrays of tandem repeated satellite DNA sequences. Although this function is conserved among eukaryotes, the sequences of centromeric DNA repeats are variable. Most of our understanding of functional centromeres, which are defined by localization of a centromere-specific histone H3 (CENH3) protein, comes from model organisms. The components of the functional centromere in legumes are poorly known. The genus Astragalus is a member of the legumes and bears the largest numbers of species among angiosperms. Therefore, we studied the components of centromeres in Astragalus sinicus. We identified the CenH3 homolog of A. sinicus, AsCenH3 that is the most compact in size among higher eukaryotes. A CENH3-based assay revealed the functional centromeric DNA sequences from A. sinicus, called CentAs. The CentAs repeat is localized in A. sinicus centromeres, and comprises an AT-rich tandem repeat with a monomer size of 20 nucleotides.
Hou, Wan-ru; Chen, Yu; Wu, Xia; Hu, Jin-chu; Peng, Zheng-song; Yang, Jung; Tang, Zong-xiang; Zhou, Cai-Quan; Li, Yu-ming; Yang, Shi-kui; Du, Yu-jie; Kong, Ling-lu; Ren, Zheng-long; Zhang, Huai-yu; Shuai, Su-rong
2007-01-01
We obtained the complete mitochondrial genome of U. thibetanus mupinensis by sequencing PCR fragments amplified with 18 primers that we designed. The results indicate that the mtDNA is 16 868 bp in size and encodes 13 protein genes, 22 tRNA genes, and 2 rRNA genes, with an overall H-strand base composition of 31.2% A, 25.4% C, 15.5% G and 27.9% T. The control region (CR), located between tRNA-Pro and tRNA-Phe, is 1422 bp in size, accounts for 8.43% of the whole genome, has a GC content of 51.9%, and contains one 6-bp tandem repeat and two 10-bp tandem repeats identified with the Tandem Repeats Finder. The U. thibetanus mupinensis mitochondrial genome shares high similarity with those of three other Ursidae: U. americanus (91.46%), U. arctos (89.25%) and U. maritimus (87.66%). PMID:17205108
Unrelated sequences at the 5' end of mouse LINE-1 repeated elements define two distinct subfamilies.
Wincker, P; Jubier-Maurin, V; Roizès, G
1987-01-01
Some full length members of the mouse long interspersed repeated DNA family L1Md have been shown to be associated at their 5' end with a variable number of tandem repetitions, the A repeats, that have been suggested to be transcription controlling elements. We report that the other type of repeat, named F, found at the 5' end of a few L1 elements is also an integral part of full length L1 copies. Sequencing shows that the F repeats are GC rich, and organized in tandem. The L1 copies associated with either A or F repeats can be correlated with two different subsets of L1 sequences distinguished by a series of variant nucleotides specific to each and by unassociated but frequent restriction sites. These findings suggest that sequence replacement has occurred at least once in 5' of L1Md, and is related to the generation of specific subfamilies. PMID:3684566
Plant chromosomes from end to end: telomeres, heterochromatin and centromeres.
Lamb, Jonathan C; Yu, Weichang; Han, Fangpu; Birchler, James A
2007-04-01
Recent evidence indicates that heterochromatin in plants is composed of heterogeneous sequences, which are usually composed of transposable elements or tandem repeat arrays. These arrays are associated with chromatin modifications that produce a closed configuration that limits transcription. Centromere sequences in plants are usually composed of tandem repeat arrays that are homogenized across the genome. Analysis of such arrays in closely related taxa suggests a rapid turnover of the repeat unit that is typical of a particular species. In addition, two lines of evidence for an epigenetic component of centromere specification have been reported, namely an example of a neocentromere formed over sequences without the typical repeat array and examples of centromere inactivation. Although the telomere repeat unit is quite prevalent in the plant kingdom, unusual repeats have been found in some families. Recently, it was demonstrated that the introduction of telomere sequences into plants cells causes truncation of the chromosomes, and that this technique can be used to produce artificial chromosome platforms.
Variable Number Of Tandem Repeats (VNTR) and its application in bacterial epidemiology.
Ramazanzadeh, Rashid; McNerney, Ruth
2007-08-15
Molecular epidemiology is the use of molecular techniques to study the distribution of bacteria in human populations. Molecular epidemiologists have recently benefited from several techniques for typing bacterial strains, such as Variable Number Tandem Repeat (VNTR) typing. VNTR typing is a genotyping tool that provides data in a simple, numeric format based on the number of repetitive sequences. VNTRs were first identified in M. tuberculosis as Mycobacterial Interspersed Repeat Units (MIRUs). VNTRs have now also been reported in Bacillus anthracis, Legionella pneumophila, Pseudomonas aeruginosa, Salmonella enterica and Escherichia coli O157.
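The "simple and numeric format" of VNTR typing can be illustrated with a small Python sketch in which each isolate is a mapping from locus name to repeat copy number, and two isolates are compared by counting the loci at which they differ. The locus names, copy numbers, and the helper `loci_differing` are hypothetical and serve only as an example.

```python
def loci_differing(profile_a: dict, profile_b: dict) -> int:
    """Count loci whose repeat copy number differs between two VNTR profiles."""
    if profile_a.keys() != profile_b.keys():
        raise ValueError("profiles must be typed at the same loci")
    return sum(1 for locus in profile_a if profile_a[locus] != profile_b[locus])


# Made-up profiles: locus name -> number of repeat copies.
isolate_1 = {"locus_A": 3, "locus_B": 5, "locus_C": 2, "locus_D": 7}
isolate_2 = {"locus_A": 3, "locus_B": 5, "locus_C": 2, "locus_D": 7}
isolate_3 = {"locus_A": 3, "locus_B": 4, "locus_C": 2, "locus_D": 9}

print(loci_differing(isolate_1, isolate_2))  # 0: indistinguishable profiles
print(loci_differing(isolate_1, isolate_3))  # 2: the profiles differ at two loci
```

Because a profile is just a short list of integers, it is easy to store, exchange between laboratories, and compare across typing runs.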
Quinn, J S; Guglich, E; Seutin, G; Lau, R; Marsolais, J; Parna, L; Boag, P T; White, B N
1992-02-01
The first tandemly repeated sequence examined in a passerine bird, a 431-bp PstI fragment named pMAT1, has been cloned from the genome of the brown-headed cowbird (Molothrus ater). The sequence represents about 5-10% of the genome (about 4 x 10(5) copies) and yields prominent ethidium bromide stained bands when genomic DNA cut with a variety of restriction enzymes is electrophoresed in agarose gels. A particularly striking ladder of fragments is apparent when the DNA is cut with HinfI, indicative of a tandem arrangement of the monomer. The cloned PstI monomer has been sequenced, revealing no internal repeated structure. There are sequences that hybridize with pMAT1 found in related nine-primaried oscines but not in more distantly related oscines, suboscines, or nonpasserine species. Little sequence similarity to tandemly repeated PstI cut sequences from the merlin (Falco columbarius), sarus crane (Grus antigone), or Puerto Rican parrot (Amazona vittata) or to HinfI digested sequence from the Toulouse goose (Anser anser) was detected. The isolated sequence was used as a probe to examine DNA samples of eight members of the tribe Icterini. This examination revealed phylogenetically informative characters. The repeat contains cutting sites from a number of restriction enzymes, which, if sufficiently polymorphic, would provide new phylogenetic characters. Sequences like these, conserved within a species, but variable between closely related species, may be very useful for phylogenetic studies of closely related taxa.
Stabilization of perfect and imperfect tandem repeats by single-strand DNA exonucleases
Feschenko, Vladimir V.; Rajman, Luis A.; Lovett, Susan T.
2003-01-01
Rearrangements between tandemly repeated DNA sequences are a common source of genetic instability. Such rearrangements underlie several human genetic diseases. In many organisms, the mismatch-repair (MMR) system functions to stabilize repeats when the repeat unit is short or when sequence imperfections are present between the repeats. We show here that the action of single-stranded DNA (ssDNA) exonucleases plays an additional, important role in stabilizing tandem repeats, independent of their role in MMR. For perfect repeats of ≈100 bp in Escherichia coli that are not susceptible to MMR, exonuclease (Exo)-I, ExoX, and RecJ exonuclease redundantly inhibit deletion. Our data suggest that >90% of potential deletion events are avoided by the combined action of these three exonucleases. Imperfect tandem repeats, less prone to rearrangements, are stabilized by both the MMR-pathway and ssDNA-specific exonucleases. For 100-bp repeats containing four mispairs, ExoI alone aborts most deletion events, even in the presence of a functional MMR system. By genetic analysis, we show that the inhibitory effect of ssDNA exonucleases on deletion formation is independent of the MutS and UvrD proteins. Exonuclease degradation of DNA displaced during the deletion process may abort slipped misalignment. Exonuclease action is therefore a significant force in genetic stabilization of many forms of repetitive DNA. PMID:12538867
Pan, W J; Blackburn, E H
1995-01-01
The rRNA genes in the somatic macronucleus of Tetrahymena thermophila are normally on 21 kb linear palindromic molecules (rDNA). We examined the effect on rRNA gene dosage of transforming T. thermophila macronuclei with plasmid constructs containing a pair of tandemly repeated rDNA replication origin regions unlinked to the rRNA gene. A significant proportion of the plasmid sequences were maintained as high copy circular molecules, eventually consisting solely of tandem arrays of origin regions. As reported previously for cells transformed by a construct in which the same tandem rDNA origins were linked to the rRNA gene [Yu, G.-L. and Blackburn, E. H. (1990) Mol. Cell. Biol., 10, 2070-2080], origin sequences recombined to form linear molecules bearing several tandem repeats of the origin region, as well as rRNA genes. The total number of rDNA origin sequences eventually exceeded rRNA gene copies by approximately 20- to 40-fold and the number of circular replicons carrying only rDNA origin sequences exceeded rRNA gene copies by 2- to 3-fold. However, the rRNA gene dosage was unchanged. Hence, simply monitoring the total number of rDNA origin regions is not sufficient to regulate rRNA gene copy number. PMID:7784211
Accurate typing of short tandem repeats from genome-wide sequencing data and its applications.
Fungtammasan, Arkarachai; Ananda, Guruprasad; Hile, Suzanne E; Su, Marcia Shu-Wei; Sun, Chen; Harris, Robert; Medvedev, Paul; Eckert, Kristin; Makova, Kateryna D
2015-05-01
Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution. © 2015 Fungtammasan et al.; Published by Cold Spring Harbor Laboratory Press.
Rasmussen, Ulla; Svenning, Mette M.
1998-01-01
The presence of repeated DNA (short tandemly repeated repetitive [STRR] and long tandemly repeated repetitive [LTRR]) sequences in the genome of cyanobacteria was used to generate a fingerprint method for symbiotic and free-living isolates. Primers corresponding to the STRR and LTRR sequences were used in the PCR, resulting in a method which generates specific fingerprints for individual isolates. The method was useful both with purified DNA and with intact cyanobacterial filaments or cells as templates for the PCR. Twenty-three Nostoc isolates from a total of 35 were symbiotic isolates from the angiosperm Gunnera species, including isolates from the same Gunnera species as well as from different species. The results show a genetic similarity among isolates from different Gunnera species as well as a genetic heterogeneity among isolates from the same Gunnera species. Isolates which have been postulated to be closely related or identical revealed similar results by the PCR method, indicating that the technique is useful for clustering of even closely related strains. The method was also applied to nonheterocystous cyanobacteria, from which a fingerprint pattern was obtained. PMID:16349487
Sequence repeats and protein structure
NASA Astrophysics Data System (ADS)
Hoang, Trinh X.; Trovato, Antonio; Seno, Flavio; Banavar, Jayanth R.; Maritan, Amos
2012-11-01
Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.
Genome-wide analysis of tandem repeats in plants and green algae
Zhixin Zhao; Cheng Guo; Sreeskandarajan Sutharzan; Pei Li; Craig Echt; Jie Zhang; Chun Liang
2014-01-01
Tandem repeats (TRs) extensively exist in the genomes of prokaryotes and eukaryotes. Based on the sequenced genomes and gene annotations of 31 plant and algal species in Phytozome version 8.0 (http://www.phytozome.net/), we examined TRs in a genome-wide scale, characterized their distributions and motif features, and explored their putative biological functions. Among...
Trofimova, Irina; Krasikova, Alla
2016-12-01
Tandemly organized highly repetitive DNA sequences are crucial structural and functional elements of eukaryotic genomes. Despite extensive evidence, satellite DNA remains an enigmatic part of the eukaryotic genome, with the biological role and significance of tandem repeat transcripts remaining rather obscure. Data on tandem repeat transcription in amphibian and avian model organisms are fragmentary despite their genomes being thoroughly characterized. This review systematically covers historical and modern data on transcription of amphibian and avian satellite DNA in somatic cells and during meiosis, when chromosomes acquire the special lampbrush form. We highlight how transcription of tandemly repetitive DNA sequences is organized in the interphase nucleus and on lampbrush chromosomes. We offer LTR-activation hypotheses of widespread satellite DNA transcription initiation during oogenesis. Recent explanations are provided for the significance of high-yield production of non-coding RNA derived from tandemly organized highly repetitive DNA. In many cases the data on the transcription of satellite DNA can be extrapolated from lampbrush chromosomes to interphase chromosomes. Lampbrush chromosomes, combined with novel technical approaches such as superresolution imaging, chromosome microdissection followed by high-throughput sequencing, and dynamic observation in life-like conditions, provide excellent opportunities for investigating the mechanisms of satellite DNA transcription.
Krasikova, Alla
2016-01-01
Tandemly organized highly repetitive DNA sequences are crucial structural and functional elements of eukaryotic genomes. Despite extensive evidence, satellite DNA remains an enigmatic part of the eukaryotic genome, with the biological role and significance of tandem repeat transcripts remaining rather obscure. Data on tandem repeat transcription in amphibian and avian model organisms are fragmentary despite their genomes being thoroughly characterized. This review systematically covers historical and modern data on transcription of amphibian and avian satellite DNA in somatic cells and during meiosis, when chromosomes acquire the special lampbrush form. We highlight how transcription of tandemly repetitive DNA sequences is organized in the interphase nucleus and on lampbrush chromosomes. We offer LTR-activation hypotheses of widespread satellite DNA transcription initiation during oogenesis. Recent explanations are provided for the significance of high-yield production of non-coding RNA derived from tandemly organized highly repetitive DNA. In many cases the data on the transcription of satellite DNA can be extrapolated from lampbrush chromosomes to interphase chromosomes. Lampbrush chromosomes, combined with novel technical approaches such as superresolution imaging, chromosome microdissection followed by high-throughput sequencing, and dynamic observation in life-like conditions, provide excellent opportunities for investigating the mechanisms of satellite DNA transcription. PMID:27763817
Genetic characterization of the UCS and Kex1 loci of Pneumocystis jirovecii.
Esteves, F; Tavares, A; Costa, M C; Gaspar, J; Antunes, F; Matos, O
2009-02-01
Nucleotide variation in the Pneumocystis jirovecii upstream conserved sequence (UCS) and kexin-like serine protease (Kex1) loci was studied in pulmonary specimens from Portuguese HIV-positive patients. DNA was extracted and used for specific molecular sequence analysis. The number of UCS tandem repeats detected in 13 successfully sequenced isolates ranged from three (9 isolates, 69%) to four (4 isolates, 31%). A novel tandem repeat pattern and two novel polymorphisms were detected in the UCS region. For the Kex1 gene, the wild-type (24 isolates, 86%) was the most frequent sequence detected among the 28 sequenced isolates. Nevertheless, a nonsynonymous (1 isolate, 3%) and three synonymous (3 isolates, 11%) polymorphisms were detected and are described here for the first time.
2010-01-01
Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. 
These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840
Molecular tandem repeat strategy for elucidating mechanical properties of high-strength proteins
Jung, Huihun; Pena-Francesch, Abdon; Saadat, Alham; Sebastian, Aswathy; Kim, Dong Hwan; Hamilton, Reginald F.; Albert, Istvan; Allen, Benjamin D.; Demirel, Melik C.
2016-01-01
Many globular and structural proteins have repetitions in their sequences or structures. However, a clear relationship between these repeats and their contribution to the mechanical properties remains elusive. We propose a new approach for the design and production of synthetic polypeptides that comprise one or more tandem copies of a single unit with distinct amorphous and ordered regions. Our designed sequences are based on a structural protein produced in squid suction cups that has a segmented copolymer structure with amorphous and crystalline domains. We produced segmented polypeptides with varying repeat number, while keeping the lengths and compositions of the amorphous and crystalline regions fixed. We showed that mechanical properties of these synthetic proteins could be tuned by modulating their molecular weights. Specifically, the toughness and extensibility of synthetic polypeptides increase as a function of the number of tandem repeats. This result suggests that the repetitions in native squid proteins could have a genetic advantage for increased toughness and flexibility. PMID:27222581
Do, Hoang Dang Khoa; Kim, Joo-Hwan
2017-01-01
Chloroplast genomes (cpDNA) are highly valuable resources for evolutionary studies of angiosperms, since they are highly conserved, are small in size, and play critical roles in plants. Slipped-strand mispairing (SSM) was assumed to be a mechanism for generating repeat units in cpDNA. However, research on the employment of different small repeated sequences through SSM events, which may induce the accumulation of distinct types of repeats within the same region in cpDNA, has not been documented. Here, we sequenced two chloroplast genomes from the endemic species Heloniopsis tubiflora (Korea) and Xerophyllum tenax (USA) to cover the gap between molecular data and explore "hot spots" for genomic events in Melanthiaceae. Comparative analysis of 23 complete cpDNA sequences revealed that there were different stages of deletion in the rps16 region across the Melanthiaceae. Based on the partial or complete loss of the rps16 gene in cpDNA, we report for the first time potential molecular markers for recognizing two sections (Veratrum and Fuscoveratrum) of Veratrum. Melanthiaceae exhibits a significant change in the junction between large single copy and inverted repeat regions, ranging from trnH_GUG to a part of rps3. Our results show an accumulation of tandem repeats in the rpl23-ycf2 regions of cpDNAs. On further observation of this region, small conserved sequences exist and flank tandem repeats across most of the examined taxa of Liliales. Therefore, we propose three scenarios in which different small repeated sequences were used during SSM events to generate new, distinct types of repeats. Occasionally, prior to the SSM process, a point mutation event and double-strand break repair occurred and induced the formation of initial repeat units, which are indispensable in the SSM process. SSM is likely to have occurred more frequently for short repeats than for long repeat sequences in tribe Parideae (Melanthiaceae, Liliales). 
Collectively, these findings add new evidence of dynamic results from SSM in chloroplast genomes which can be useful for further evolutionary studies in angiosperms. Additionally, genomics events in cpDNA are potential resources for mining molecular markers in Liliales.
USDA-ARS's Scientific Manuscript database
Whole genome tandem repeat polymorphisms were evaluated between two closely related Xylella fastidiosa strains, M23 and Temecula1, both of which cause almond leaf scorch disease (ALSD) and grape Pierce’s disease (PD) in California. Strain M23 was isolated from almond and the genome was sequenced in this stu...
Fernández-Tajes, Juan; Méndez, Josefina
2009-12-01
For a study of 5S ribosomal genes (rDNA) in the razor clam Ensis macha, the 5S rDNA region was amplified and sequenced. Two variants, so-called type I or short repeat (approximately 430 bp) and type II or long repeat (approximately 735 bp), appeared to be the main components of the 5S rDNA of this species. Their spacers differed markedly, both in length and nucleotide composition. The organization of the two variants was investigated by amplifying the genomic DNA with primers based on the sequence of the type I and type II spacers. PCR amplification products with primers EMLbF and EMSbR showed that the long and short repeats are associated within the same tandem array, suggesting an intermixed arrangement of both spacers. Nevertheless, amplifications carried out with inverse primers EMSinvF/R and EMLinvF/R revealed that some short and long repeats are contiguous in the same tandem array. This is the first report of the coexistence of two variable spacers in the same tandem array in bivalve mollusks.
Li, Jia; Gao, Lei; Chen, Shanshan; Tao, Ke; Su, Yingjuan; Wang, Ting
2016-02-11
Sciadopitys verticillata, the only member of the family Sciadopityaceae, is an evergreen conifer and an economically valuable tree used in construction. Acquisition of the S. verticillata chloroplast (cp) genome will be useful for understanding the evolutionary mechanisms of conifers and the phylogenetic relationships among gymnosperms. In this study, we report for the first time the complete chloroplast genome of S. verticillata. The total genome is 138,284 bp in length and consists of 118 unique genes. The S. verticillata cp genome has lost one copy of the canonical inverted repeats and shows a distinctive genomic structure compared with other cupressophytes. Fifty-three simple sequence repeat loci and 18 forward tandem repeats were identified in the S. verticillata cp genome. Based on the rearrangement of the cupressophyte cp genome, we propose one mechanism for the formation of inverted repeats: a tandem repeat occurred first, then rearrangement divided the tandem repeat into inverted repeats located in different regions. Phylogenetic estimates inferred from 59-gene sequences and from cpDNA organization both showed that S. verticillata is sister to the clade consisting of Cupressaceae, Taxaceae, and Cephalotaxaceae. Moreover, the accD gene was found to be lost from the S. verticillata cp genome, and a nuclear copy was identified in two transcriptome datasets.
Length and sequence variability in mitochondrial control region of the milkfish, Chanos chanos.
Ravago, Rachel G; Monje, Virginia D; Juinio-Meñez, Marie Antonette
2002-01-01
Extensive length variability was observed in the mitochondrial control region of the milkfish, Chanos chanos. The nucleotide sequence of the control region and flanking regions was determined. Length variability and heteroplasmy was due to the presence of varying numbers of a 41-bp tandemly repeated sequence and a 48-bp insertion/deletion (indel). The structure and organization of the milkfish control region is similar to that of other teleost fish and vertebrates. However, extensive variation in the copy number of tandem repeats (4-20 copies) and the presence of a relatively large (48-bp) indel, are apparently uncommon in teleost fish control region sequences reported to date. High sequence variability of control region peripheral domains indicates the potential utility of selected regions as markers for population-level studies.
Stability of Tandem Repeats in the Drosophila Melanogaster HSR-Omega Nuclear RNA
Hogan, N. C.; Slot, F.; Traverse, K. L.; Garbe, J. C.; Bendena, W. G.; Pardue, M. L.
1995-01-01
The Drosophila melanogaster Hsr-omega locus produces a nuclear RNA containing >5 kb of tandem repeat sequences. These repeats are unique to Hsr-omega and show concerted evolution similar to that seen with classical satellite DNAs. In D. melanogaster the monomer is ~280 bp. Sequences of 19 1/2 monomers differ by 8 +/- 5% (mean +/- SD), when all pairwise comparisons are considered. Differences are single nucleotide substitutions and 1-3 nucleotide deletions/insertions. Changes appear to be randomly distributed over the repeat unit. Outer repeats do not show the decrease in monomer homogeneity that might be expected if homogeneity is maintained by recombination. However, just outside the last complete repeat at each end, there are a few fragments of sequence similar to the monomer. The sequences in these flanking regions are not those predicted for sequences decaying in the absence of recombination. Instead, the fragmentation of the sequence homology suggests that flanking regions have undergone more severe disruptions, possibly during an insertion or amplification event. Hsr-omega alleles differing in the number of repeats are detected and appear to be stable over a few thousand generations; however, both increases and decreases in repeat numbers have been observed. The new alleles appear to be as stable as their predecessors. No alleles of less than ~5 kb nor more than ~16 kb of repeats were seen in any stocks examined. The evidence that there is a limit on the minimum number of repeats is consistent with the suggestion that these repeats are important in the function of the unusual Hsr-omega nuclear RNA. PMID:7540581
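The "mean +/- SD over all pairwise comparisons" statistic quoted above can be illustrated with a minimal Python sketch. It assumes the monomers have already been aligned to equal length, counts a gap character as a difference, and uses the sample standard deviation; the function names and the toy sequences are hypothetical and are not Hsr-omega data or the authors' procedure.

```python
from itertools import combinations
from statistics import mean, stdev


def percent_difference(a, b):
    """Percent of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * mismatches / len(a)


def pairwise_divergence(monomers):
    """Return (mean, sample SD) of percent difference over all monomer pairs."""
    diffs = [percent_difference(a, b) for a, b in combinations(monomers, 2)]
    return mean(diffs), stdev(diffs)


# Toy, made-up aligned monomers ('-' marks a gap); illustration only.
toy_monomers = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC", "ACG-ACGTAC"]
mean_diff, sd_diff = pairwise_divergence(toy_monomers)
print(f"{mean_diff:.1f} +/- {sd_diff:.1f} %")
```

With n monomers there are n(n-1)/2 comparisons, so the pairs are not statistically independent; the SD here only summarizes the spread of the pairwise values.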
High Quality Maize Centromere 10 Sequence Reveals Evidence of Frequent Recombination Events
Wolfgruber, Thomas K.; Nakashima, Megan M.; Schneider, Kevin L.; Sharma, Anupma; Xie, Zidian; Albert, Patrice S.; Xu, Ronghui; Bilinski, Paul; Dawe, R. Kelly; Ross-Ibarra, Jeffrey; Birchler, James A.; Presting, Gernot G.
2016-01-01
The ancestral centromeres of maize contain long stretches of the tandemly arranged CentC repeat. The abundance of tandem DNA repeats and centromeric retrotransposons (CR) has presented a significant challenge to completely assembling centromeres using traditional sequencing methods. Here, we report a nearly complete assembly of the 1.85 Mb maize centromere 10 from inbred B73 using PacBio technology and BACs from the reference genome project. The error rates estimated from overlapping BAC sequences are 7 × 10⁻⁶ and 5 × 10⁻⁵ for mismatches and indels, respectively. The number of gaps in the region covered by the reassembly was reduced from 140 in the reference genome to three. Three expressed genes are located between 92 and 477 kb from the inferred ancestral CentC cluster, which lies within the region of highest centromeric repeat density. The improved assembly increased the count of full-length CR from 5 to 55 and revealed a 22.7 kb segmental duplication that occurred approximately 121,000 years ago. Our analysis provides evidence of frequent recombination events in the form of partial retrotransposons, deletions within retrotransposons, chimeric retrotransposons, segmental duplications including higher order CentC repeats, a deleted CentC monomer, centromere-proximal inversions, and insertion of mitochondrial sequences. Double-strand DNA break (DSB) repair is the most plausible mechanism for these events and may be the major driver of centromere repeat evolution and diversity. In many cases examined here, DSB repair appears to be mediated by microhomology, suggesting that tandem repeats may have evolved to efficiently repair frequent DSBs in centromeres. PMID:27047500
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jackson, P.J.; Walthers, E.A.; Richmond, K.L.
1997-04-01
PCR analysis of 198 Bacillus anthracis isolates revealed a variable region of DNA sequence differing in length among the isolates. Five polymorphisms differed by the presence of two to six copies of the 12-bp tandem repeat 5'-CAATATCAACAA-3'. This variable-number tandem repeat (VNTR) region is located within a larger sequence containing one complete open reading frame that encodes a putative 30-kDa protein. Length variation did not change the reading frame of the encoded protein and only changed the copy number of a 4-amino-acid sequence (QYQQ) from 2 to 6. The structure of the VNTR region suggests that these multiple repeats are generated by recombination or polymerase slippage. Protein structures predicted from the reverse-translated DNA sequence suggest that any structural changes in the encoded protein are confined to the region encoded by the VNTR sequence. Copy number differences in the VNTR region were used to define five different B. anthracis alleles. Characterization of 198 isolates revealed allele frequencies of 6.1, 17.7, 59.6, 5.6, and 11.1% sequentially from shorter to longer alleles. The high degree of polymorphism in the VNTR region provides a criterion for assigning isolates to five allelic categories. There is a correlation between categories and geographic distribution. Such molecular markers can be used to monitor the epidemiology of anthrax outbreaks in domestic and native herbivore populations. 22 refs., 4 figs., 3 tabs.
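The statement that copy-number variation leaves the reading frame intact follows from the repeat length (12 bp, a multiple of 3). A minimal Python sketch of that check is given below; it assumes the repeat is read in the frame that starts at its first base, and the codon table holds only the two standard-code codons present in this repeat (CAA = Gln, TAT = Tyr). The helper names are illustrative, not from the report.

```python
# Only the codons that occur in the repeat (standard genetic code).
CODON_TABLE = {"CAA": "Q", "TAT": "Y"}
REPEAT = "CAATATCAACAA"  # 12-bp VNTR unit quoted in the abstract


def translate(dna, table=CODON_TABLE):
    """Translate an in-frame DNA string codon by codon."""
    if len(dna) % 3 != 0:
        raise ValueError("length is not a multiple of 3; frame would shift")
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna), 3))


def vntr_alleles(min_copies=2, max_copies=6):
    """Map copy number -> (tract length in bp, encoded peptide)."""
    alleles = {}
    for copies in range(min_copies, max_copies + 1):
        tract = REPEAT * copies
        alleles[copies] = (len(tract), translate(tract))
    return alleles


for copies, (length_bp, peptide) in vntr_alleles().items():
    print(copies, length_bp, peptide)
```

Each added copy lengthens the tract by 12 bp and the peptide by four residues, so alleles differ in length without shifting the downstream frame.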
Novel variable number of tandem repeats of gibbon MAOA gene and its evolutionary significance.
Choi, Yuri; Jung, Yi-Deun; Ayarpadikannan, Selvam; Koga, Akihiko; Imai, Hiroo; Hirai, Hirohisa; Roos, Christian; Kim, Heui-Soo
2014-08-01
Variable number of tandem repeats (VNTRs) are scattered throughout the primate genome, and genetic variation of these VNTRs have been accumulated during primate radiation. Here, we analyzed VNTRs upstream of the monoamine oxidase A (MAOA) gene in 11 different gibbon species. An abundance of truncated VNTR sequences and copy number differences were observed compared to those of human VNTR sequences. To better understand the biological role of these VNTRs, a luciferase activity assay was conducted and results indicated that selected VNTR sequences of the MAOA gene from human and three different gibbon species (Hylobates klossii, Hylobates lar, and Nomascus concolor) showed silencing ability. Together, these data could be useful for understanding the evolutionary history and functional significance of MAOA VNTR sequences in gibbon species.
STRBase: a short tandem repeat DNA database for the human identity testing community
Ruitberg, Christian M.; Reeder, Dennis J.; Butler, John M.
2001-01-01
The National Institute of Standards and Technology (NIST) has compiled and maintained a Short Tandem Repeat DNA Internet Database (http://www.cstl.nist.gov/biotech/strbase/) since 1997 commonly referred to as STRBase. This database is an information resource for the forensic DNA typing community with details on commonly used short tandem repeat (STR) DNA markers. STRBase consolidates and organizes the abundant literature on this subject to facilitate on-going efforts in DNA typing. Observed alleles and annotated sequence for each STR locus are described along with a review of STR analysis technologies. Additionally, commercially available STR multiplex kits are described, published polymerase chain reaction (PCR) primer sequences are reported, and validation studies conducted by a number of forensic laboratories are listed. To supplement the technical information, addresses for scientists and hyperlinks to organizations working in this area are available, along with the comprehensive reference list of over 1300 publications on STRs used for DNA typing purposes. PMID:11125125
Hirata, Satoshi; Kojima, Kaname; Misawa, Kazuharu; Gervais, Olivier; Kawai, Yosuke; Nagasaki, Masao
2018-05-01
Forensic DNA typing is widely used to identify missing persons and plays a central role in forensic profiling. DNA typing usually uses capillary electrophoresis fragment analysis of PCR amplification products to detect the length of short tandem repeat (STR) markers. Here, we analyzed whole genome data from 1,070 Japanese individuals generated using massively parallel short-read sequencing of 162 paired-end bases. We analyzed 843,473 STR loci with two to six basepair repeat units and cataloged highly polymorphic STR loci in the Japanese population. To evaluate the performance of the cataloged STR loci, we compared 23 STR loci, widely used in forensic DNA typing, with capillary electrophoresis based STR genotyping results in the Japanese population. Seventeen loci had high correlations and high call rates. The other six loci had low call rates or low correlations due to the limitations of short-read sequencing technology, the bioinformatics tool used, or the complexity of repeat patterns. From these analyses, we also selected 218 STR loci with four basepair repeat units and 53 loci with five basepair repeat units that are suitable for both short-read sequencing and PCR based technologies, and which are candidates for actual forensic DNA typing in the Japanese population.
Waye, J S; Willard, H F
1986-09-01
The centromeric regions of all human chromosomes are characterized by distinct subsets of a diverse tandemly repeated DNA family, alpha satellite. On human chromosome 17, the predominant form of alpha satellite is a 2.7-kilobase-pair higher-order repeat unit consisting of 16 alphoid monomers. We present the complete nucleotide sequence of the 16-monomer repeat, which is present in 500 to 1,000 copies per chromosome 17, as well as that of a less abundant 15-monomer repeat, also from chromosome 17. These repeat units were approximately 98% identical in sequence, differing by the exclusion of precisely 1 monomer from the 15-monomer repeat. Homologous unequal crossing-over is suggested as a probable mechanism by which the different repeat lengths on chromosome 17 were generated, and the putative site of such a recombination event is identified. The monomer organization of the chromosome 17 higher-order repeat unit is based, in part, on tandemly repeated pentamers. A similar pentameric suborganization has been previously demonstrated for alpha satellite of the human X chromosome. Despite the organizational similarities, substantial sequence divergence distinguishes these subsets. Hybridization experiments indicate that the chromosome 17 and X subsets are more similar to each other than to the subsets found on several other human chromosomes. We suggest that the chromosome 17 and X alpha satellite subsets may be related components of a larger alphoid subfamily which have evolved from a common ancestral repeat into the contemporary chromosome-specific subsets.
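The sizes quoted in this abstract imply a few approximate figures that are easy to check. The short Python sketch below works them out from the rounded values given (2.7 kb, 16 monomers, 500 to 1,000 copies); the results are therefore rough estimates derived here, not measurements reported by the authors.

```python
HOR_BP = 2700          # higher-order repeat unit, ~2.7 kb (rounded in the abstract)
MONOMERS_PER_HOR = 16  # alphoid monomers per higher-order repeat
COPIES = (500, 1000)   # reported copy-number range per chromosome 17

# Average monomer length implied by the rounded figures.
monomer_bp = HOR_BP / MONOMERS_PER_HOR

# Total alpha satellite span implied by the copy-number range.
array_span_bp = tuple(n * HOR_BP for n in COPIES)

# Approximate length of the 15-monomer variant (one monomer excluded).
short_hor_bp = HOR_BP - monomer_bp

print(f"monomer ~{monomer_bp:.0f} bp")
print(f"array ~{array_span_bp[0] / 1e6:.2f} to {array_span_bp[1] / 1e6:.2f} Mb")
print(f"15-monomer repeat ~{short_hor_bp:.0f} bp")
```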
Robinett, C C; O'Connor, A; Dunaway, M
1997-01-01
We have identified a novel activity for the region of the intergenic spacer of the Xenopus laevis rRNA genes that contains the 35- and 100-bp repeats. We devised a new assay for this region by constructing DNA plasmids containing a tandem repeat of rRNA reporter genes that were separated by the 35- and 100-bp repeat region and a rRNA gene enhancer. When the 35- and 100-bp repeat region is present in its normal position and orientation at the 3' end of the rRNA reporter genes, the enhancer activates the adjacent downstream promoter but not the upstream rRNA promoter on the same plasmid. Because this element can restrict the range of an enhancer's activity in the context of tandem genes, we have named it the repeat organizer (RO). The ability to restrict enhancer action is a feature of insulator elements, but unlike previously described insulator elements the RO does not block enhancer action in a simple enhancer-blocking assay. Instead, the activity of the RO requires that it be in its normal position and orientation with respect to the other sequence elements of the rRNA genes. The enhancer-binding transcription factor xUBF also binds to the repetitive sequences of the RO in vitro, but these sequences do not activate transcription in vivo. We propose that the RO is a specialized insulator element that organizes the tandem array of rRNA genes into single-gene expression units by promoting activation of a promoter by its proximal enhancers. PMID:9111359
Optimization of sequence alignment for simple sequence repeat regions.
Jighly, Abdulqader; Hamwieh, Aladdin; Ogbonnaya, Francis C
2011-07-20
Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, consisting of tandem copies of specific sequences no longer than six bases, that are distributed throughout the genome. SSRs have been used as molecular markers because they are easy to detect and serve a range of applications, including genetic diversity studies, genome mapping, and marker-assisted selection. They are also highly mutable because of DNA polymerase slippage during DNA replication. This mutation mechanism raises the insertion/deletion (INDEL) mutation frequency above that of other types of molecular markers such as single nucleotide polymorphisms (SNPs). SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without treating microsatellite regions as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units, which result in false evolutionary relationships. To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using a PERL script with a Tk graphical interface. This program aligns sequences after first determining the repeated units and the positions of the last SSR nucleotides. This results in a shifting process according to the inserted repeated unit type. When the phylogenetic relations were studied before and after applying the new algorithm, many differences in the trees were obtained as SSR length and complexity increased. However, less distance between different lineages was observed after applying the new algorithm. The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different lineages. It reduces overlapping during SSR alignment, which results in a more realistic phylogenetic relationship.
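The first step described above, locating each repeated unit and the position of the last SSR nucleotide, can be sketched in Python with a regular expression. This is only an illustration of perfect-repeat detection under the "no longer than six bases" definition; it is not the authors' PERL/Tk program, and the function name, the `min_copies` threshold, and the toy sequence are assumptions.

```python
import re


def find_ssrs(seq, max_unit=6, min_copies=3):
    """Return (start, end, unit, copies) for perfect tandem repeats.

    start is 0-based; end is the index just past the last SSR nucleotide.
    The lazy quantifier makes the regex try the shortest unit first, so
    'ATATAT' is reported with unit 'AT' rather than a longer multiple.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), m.end(), unit, copies))
    return hits


# Toy sequence: a (CA)5 tract and an (AGC)3 tract separated by unique bases.
toy = "GGCACACACACATTAGCAGCAGCAGTT"
for start, end, unit, copies in find_ssrs(toy):
    print(start, end, unit, copies)
```

Imperfect repeats (with substitutions inside the tract) are not found by this pattern; handling them would need an approximate-matching approach.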
Tandem Repeat Proteins Inspired By Squid Ring Teeth
NASA Astrophysics Data System (ADS)
Pena-Francesch, Abdon
Proteins are large biomolecules consisting of long chains of amino acids that hierarchically assemble into complex structures, and provide a variety of building blocks for biological materials. The repetition of structural building blocks is a natural evolutionary strategy for increasing the complexity and stability of protein structures. However, the relationship between amino acid sequence, structure, and material properties of protein systems remains unclear due to the lack of control over the protein sequence and the intricacies of the assembly process. In order to investigate the repetition of protein building blocks, a recently discovered protein from squids is examined as an ideal protein system. Squid ring teeth are predatory appendages located inside the suction cups that provide a strong grasp of prey, and are solely composed of a group of proteins with tandem repetition of building blocks. The objective of this thesis is the understanding of sequence, structure and property relationship in repetitive protein materials inspired in squid ring teeth for the first time. Specifically, this work focuses on squid-inspired structural proteins with tandem repeat units in their sequence (i.e., repetition of alternating building blocks) that are physically cross-linked via beta-sheet structures. The research work presented here tests the hypothesis that, in these systems, increasing the number of building blocks in the polypeptide chain decreases the protein network defects and improves the material properties. Hence, the sequence, nanostructure, and properties (thermal, mechanical, and conducting) of tandem repeat squid-inspired protein materials are examined. Spectroscopic structural analysis, advanced materials characterization, and entropic elasticity theory are combined to elucidate the structure and material properties of these repetitive proteins. 
This approach is applied not only to native squid proteins but also to squid-inspired synthetic polypeptides that allow for a fine control of the sequence and network morphology. The results provided in this work establish a clear dependence between the repetitive building blocks, the network morphology, and the properties of squid-inspired repetitive protein materials. Increasing the number of tandem repeat units in SRT-inspired proteins led to more effective protein networks with superior properties. Through increasing tandem repetition and optimization of network morphology, highly efficient protein materials capable of withstanding deformations up to 400% of their original length, with MPa-GPa modulus, high energy absorption (50 MJ m⁻³), peak proton conductivity of 3.7 mS cm⁻¹ (at pH 7, highest reported to date for biological materials), and peak thermal conductivity of 1.4 W m⁻¹ K⁻¹ (which exceeds that of most polymer materials) were developed. These findings introduce new design rules in the engineering of proteins based on tandem repetition and morphology control, and provide a novel framework for tailoring and optimizing the properties of protein-based materials.
Medium-sized tandem repeats represent an abundant component of the Drosophila virilis genome.
Abdurashitov, Murat A; Gonchar, Danila A; Chernukhin, Valery A; Tomilov, Victor N; Tomilova, Julia E; Schostak, Natalia G; Zatsepina, Olga G; Zelentsova, Elena S; Evgen'ev, Michael B; Degtyarev, Sergey K H
2013-11-09
Previously, we developed a simple method for carrying out a restriction enzyme analysis of eukaryotic DNA in silico, based on the known DNA sequences of the genomes. This method allows the user to calculate the lengths of all DNA fragments that are formed after a whole genome is digested at the theoretical recognition sites of a given restriction enzyme. A comparison of the observed peaks in distribution diagrams with the results from DNA cleavage using several restriction enzymes performed in vitro has shown good correspondence between the theoretical and experimental data in several cases. Here, we applied this combined approach to the annotated genome of Drosophila virilis, which is extremely rich in various repeats. The approach revealed three abundant medium-sized tandem repeats within the D. virilis genome. While the 225 bp repeats had been found previously in intergenic non-transcribed spacers between ribosomal genes of D. virilis, two other families, comprising 154 bp and 172 bp repeats, had not been described. A Tandem Repeats Finder search demonstrated that the 154 bp and 172 bp units are organized in multiple clusters in the genome of D. virilis. Characteristically, only the 154 bp repeats derived from a Helitron transposon are transcribed. Using in silico digestion in combination with conventional restriction analysis and sequencing of repeated DNA fragments enabled us to isolate and characterize three highly abundant families of medium-sized repeats present in the D. virilis genome. These repeats comprise a significant portion of the genome and may have important roles in genome function and structural integrity. We have thus demonstrated an approach that makes it possible to investigate in detail the gross arrangement and expression of medium-sized repeats based on sequencing data, even in the case of incompletely assembled and/or annotated genomes.
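The in silico digestion idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' software: the function names `digest` and `length_histogram`, the toy sequence, and the choice of EcoRI (recognition site GAATTC, cut after the first G) are assumptions made for illustration only. The point it shows is that a tandem array whose unit carries one recognition site produces a peak in the fragment-length distribution at the unit length.

```python
import re
from collections import Counter


def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the 0-based position inside the recognition site
    at which the strand is cut (1 for EcoRI: G^AATTC).
    """
    sequence = sequence.upper()
    # A lookahead pattern also finds overlapping occurrences of the site.
    pattern = f"(?={re.escape(site)})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def length_histogram(fragments: list[int]) -> Counter:
    """Count how many fragments share each length."""
    return Counter(fragments)


# Hypothetical toy sequence: five copies of a 10-bp unit with one EcoRI site.
unit = "GAATTCAGCT"
toy = "TTTT" + unit * 5 + "CCCC"
fragments = digest(toy, "GAATTC", 1)
histogram = length_histogram(fragments)
print(fragments)                 # fragment lengths in order along the sequence
print(histogram.most_common(1))  # the most frequent length and its count
```

In this toy case the most frequent fragment length equals the repeat unit length, which is the kind of peak the abstract compares against in vitro digests.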
Efficient production of artificially designed gelatins with a Bacillus brevis system.
Kajino, T; Takahashi, H; Hirai, M; Yamada, Y
2000-01-01
Artificially designed gelatins comprising tandemly repeated 30-amino-acid peptide units derived from human alphaI collagen were successfully produced with a Bacillus brevis system. The DNA encoding the peptide unit was synthesized by taking into consideration the codon usage of the host cells, but no clones having a tandemly repeated gene were obtained through the above-mentioned strategy. Minirepeat genes could be selected in vivo from a mixture of every possible sequence encoding an artificial gelatin by randomly ligating the mixed sequence unit and transforming it into Escherichia coli. Larger repeat genes constructed by connecting minirepeat genes obtained by in vivo selection were also stable in the expression host cells. Gelatins derived from the eight-unit and six-unit repeat genes were extracellularly produced at the level of 0.5 g/liter and easily purified by ammonium sulfate fractionation and anion-exchange chromatography. The purified artificial gelatins had the predicted N-terminal sequences and amino acid compositions and a sol-gel property similar to that of the native gelatin. These results suggest that the selection of a repeat unit sequence stable in an expression host is a shortcut for the efficient production of repetitive proteins and that it can conveniently be achieved by the in vivo selection method. This study revealed the possible industrial application of artificially designed repetitive proteins.
Tandem Repeats in Proteins: Prediction Algorithms and Biological Role.
Pellegrini, Marco
2015-01-01
Tandem repetitions in protein sequence and structure are a fascinating subject of research that has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR either from protein sequence or structure data is challenging due to the inherently low (biological) signal-to-noise ratio that is a key feature of this problem. Just as early in silico analytic tools were key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on the investigations of the biological role of PTR.
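To make the detection problem concrete, the following Python sketch finds exact tandem repeats in a sequence string. It is a deliberately naive illustration and not one of the prediction algorithms covered by the survey: the function name `find_tandem_repeats`, its parameters, and the toy peptide are all assumptions. Real repeats are usually degenerate (substitutions, insertions, deletions), which is what makes the problem hard and why exact matching like this misses most biological cases.

```python
def find_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for exact tandem repeats, left to right.

    At each position the unit length giving the longest repeated span is
    kept, and scanning resumes after that span. Only perfect copies match.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit of this length
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            span = copies * k
            if copies >= min_copies and (best is None or span > best[2]):
                best = (unit, copies, span)
        if best:
            hits.append((i, best[0], best[1]))
            i += best[2]
        else:
            i += 1
    return hits


# Hypothetical toy peptide with four copies of a collagen-like GPP unit.
peptide = "MK" + "GPP" * 4 + "LE"
repeats = find_tandem_repeats(peptide)
print(repeats)
```

A single substitution inside one copy would split or hide the hit here, which illustrates why the tools discussed in the survey rely on statistical or alignment-based scoring instead of exact matching.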
Combined deficiency of MSH2 and Sμ region abolishes class switch recombination.
Leduc, Claire; Haddad, Dania; Laviolette-Malirat, Nathalie; Nguyen Huu, Ngoc-Sa; Khamlichi, Ahmed Amine
2010-10-01
Class switch recombination (CSR) is mediated by G-rich tandem repeated sequences termed switch regions. Transcription of switch regions generates single-stranded R loops that provide substrates for activation-induced cytidine deaminase. Mice deficient in MSH2 have a mild defect in CSR and analysis of their switch junctions has led to a model in which MSH2 is more critical for switch recombination events outside than within the tandem repeats. It is also known that deletion of the whole Sμ region severely impairs but does not abrogate CSR despite the lack of detectable R loops. Here, we demonstrate that deficiency of both MSH2 and the Sμ region completely abolishes CSR and that the abrogation occurs at the genomic level. This finding further supports the crucial role of MSH2 outside the tandem repeats. It also indicates that during CSR, MSH2 has access to activation-induced cytidine deaminase targets in R-loop-deficient Iμ-Cμ sequences rarely used in CSR, suggesting an MSH2-dependent DNA processing activity at the Iμ exon that may decrease with transcription elongation across the Sμ region.
Histone and ribosomal RNA repetitive gene clusters of the boll weevil are linked in a tandem array.
Roehrdanz, R; Heilmann, L; Senechal, P; Sears, S; Evenson, P
2010-08-01
Histones are the major protein component of chromatin structure. The histone family is made up of a quintet of proteins, four core histones (H2A, H2B, H3 & H4) and the linker histones (H1). Spacers are found between the coding regions. Among insects this quintet of genes is usually clustered and the clusters are tandemly repeated. Ribosomal DNA contains a cluster of the rRNA sequences 18S, 5.8S and 28S. The rRNA genes are separated by the spacers ITS1, ITS2 and IGS. This cluster is also tandemly repeated. We found that the ribosomal RNA repeat unit of at least two species of Anthonomine weevils, Anthonomus grandis and Anthonomus texanus (Coleoptera: Curculionidae), is interspersed with a block containing the histone gene quintet. The histone genes are situated between the rRNA 18S and 28S genes in what is known as the intergenic spacer region (IGS). The complete reiterated Anthonomus grandis histone-ribosomal sequence is 16,248 bp.
Cho, Kwang-Soo; Yun, Bong-Kyoung; Yoon, Young-Ho; Hong, Su-Young; Mekapogu, Manjulatha; Kim, Kyung-Hee; Yang, Tae-Jin
2015-01-01
We report the chloroplast (cp) genome sequence of tartary buckwheat (Fagopyrum tataricum) obtained by next-generation sequencing technology and compare it with the previously reported common buckwheat (F. esculentum ssp. ancestrale) cp genome. The cp genome of F. tataricum has a total sequence length of 159,272 bp, which is 327 bp shorter than the common buckwheat cp genome. The cp gene content, order, and orientation are similar to those of common buckwheat, but with some structural variation at tandem and palindromic repeat frequencies and junction areas. A total of seven InDels (around 100 bp) were found within the intergenic sequences and the ycf1 gene. The copy number of the 21-bp tandem repeat differed between F. tataricum (four repeats) and F. esculentum (one repeat), and the InDel of the ycf1 gene was 63 bp long. The coding sequences are highly conserved at the nucleotide and amino acid levels, with about 98% homology, and four genes (rpoC2, ycf3, accD, and clpP) have high synonymous (Ks) values. PCR-based InDel markers were applied to diverse genetic resources of F. tataricum and F. esculentum, and the amplicon sizes were identical to those expected in silico. Therefore, these InDel markers are informative biomarkers to practically distinguish raw or processed buckwheat products derived from F. tataricum and F. esculentum. PMID:25966355
Human minisatellite alleles detectable only after PCR amplification.
Armour, J A; Crosier, M; Jeffreys, A J
1992-01-01
We present evidence that a proportion of alleles at two human minisatellite loci is undetected by standard Southern blot hybridization. In each case the missing allele(s) can be identified after PCR amplification and correspond to tandem arrays too short to detect by hybridization. At one locus, there is only one undetected allele (population frequency 0.3), which contains just three repeat units. At the second locus, there are at least five undetected alleles (total population frequency 0.9) containing 60-120 repeats; they are not detected because these tandem repeats give very poor signals when used as a probe in standard Southern blot hybridization, and also cross-hybridize with other sequences in the genome. Under these circumstances only signals from the longest tandemly repeated alleles are detectable above the nonspecific background. The structures of these loci have been compared in human and primate DNA, and at one locus the short human allele containing three repeat units is shown to be an intermediate state in the expansion of a monomeric precursor allele in primates to high copy number in the longer human arrays. We discuss the implications of such loci for studies of human populations, minisatellite isolation by cloning, and the evolution of highly variable tandem arrays.
Machado, Filipe Brum; Machado, Fabricio Brum; Faria, Milena Amendro; Lovatel, Viviane Lamim; Alves da Silva, Antonio Francisco; Radic, Claudia Pamela; De Brasi, Carlos Daniel; Rios, Álvaro Fabricio Lopes; de Sousa Lopes, Susana Marina Chuva; da Silveira, Leonardo Serafim; Ruiz-Miranda, Carlos Ramon; Ramos, Ester Silveira; Medina-Acosta, Enrique
2014-01-01
X-chromosome inactivation (XCI) is the epigenetic transcriptional silencing of an X-chromosome during the early stages of embryonic development in female eutherian mammals. XCI assures monoallelic expression in each cell and compensation for dosage-sensitive X-linked genes between females (XX) and males (XY). DNA methylation at the carbon-5 position of the cytosine pyrimidine ring in the context of a CpG dinucleotide sequence (5meCpG) in promoter regions is a key epigenetic marker for transcriptional gene silencing. Using computational analysis, we revealed an extragenic tandem GAAA repeat 230-bp from the landmark CpG island of the human X-linked retinitis pigmentosa 2 RP2 promoter whose 5meCpG status correlates with XCI. We used this RP2 onshore tandem GAAA repeat to develop an allele-specific 5meCpG-based PCR assay that is highly concordant with the human androgen receptor (AR) exonic tandem CAG repeat-based standard HUMARA assay in discriminating active (Xa) from inactive (Xi) X-chromosomes. The RP2 onshore tandem GAAA repeat contains neutral features that are lacking in the AR disease-linked tandem CAG repeat, is highly polymorphic (heterozygosity rates approximately 0.8) and shows minimal variation in the Xa/Xi ratio. The combined informativeness of RP2/AR is approximately 0.97, and this assay excels at determining the 5meCpG status of alleles at the Xp (RP2) and Xq (AR) chromosome arms in a single reaction. These findings are relevant and directly translatable to nonhuman primate models of XCI in which the AR CAG-repeat is monomorphic. We conducted the RP2 onshore tandem GAAA repeat assay in the naturally occurring chimeric New World monkey marmoset (Callitrichidae) and found it to be informative. 
The RP2 onshore tandem GAAA repeat will facilitate studies on the variable phenotypic expression of dominant and recessive X-linked diseases, epigenetic changes in twins, the physiology of aging hematopoiesis, the pathogenesis of age-related hematopoietic malignancies and the clonality of cancers in human and nonhuman primates. PMID:25078280
Laukkanen-Ninios, R.; Ortiz Martínez, P.; Siitonen, A.; Fredriksson-Ahomaa, M.; Korkeala, H.
2013-01-01
Sporadic and epidemiologically linked Yersinia enterocolitica strains (n = 379) isolated from fecal samples from human patients, tonsil or fecal samples from pigs collected at slaughterhouses, and pork samples collected at meat stores were genotyped using multiple-locus variable-number tandem-repeat analysis (MLVA) with six loci, i.e., V2A, V4, V5, V6, V7, and V9. In total, 312 different MLVA types were found. Similar types were detected (i) in fecal samples collected from human patients over 2 to 3 consecutive years, (ii) in samples from humans and pigs, and (iii) in samples from pigs that originated from the same farms. Among porcine strains, we found farm-specific MLVA profiles. Variations in the numbers of tandem repeats from one to four for variable-number tandem-repeat (VNTR) loci V2A, V5, V6, and V7 were observed within a farm. MLVA was applicable for serotypes O:3, O:5,27, and O:9 and appeared to be a highly discriminating tool for distinguishing sporadic and outbreak-related strains. With long-term use, interpretation of the results became more challenging due to variations in more-discriminating loci, as was observed for strains originating from pig farms. Additionally, we encountered unexpectedly short V2A VNTR fragments and sequenced them. According to the sequencing results, updated guidelines for interpreting V2A VNTR results were prepared. PMID:23637293
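As a rough illustration of how MLVA types are compared, the Python sketch below treats each strain's profile as a tuple of repeat counts at the six loci named in the abstract. The strain names and repeat counts are hypothetical toy values, not data from the study, and the helper names `count_types` and `locus_differences` are assumptions.

```python
# Locus order follows the abstract; each profile stores one repeat count per locus.
LOCI = ("V2A", "V4", "V5", "V6", "V7", "V9")


def count_types(profiles: dict[str, tuple[int, ...]]) -> int:
    """Number of distinct MLVA types among the given strains."""
    return len(set(profiles.values()))


def locus_differences(p: tuple[int, ...], q: tuple[int, ...]) -> dict[str, int]:
    """Loci at which two profiles differ, with the absolute repeat-count gap."""
    return {locus: abs(a - b) for locus, a, b in zip(LOCI, p, q) if a != b}


# Hypothetical toy profiles (repeat counts are invented for illustration).
toy_profiles = {
    "strain_1": (5, 8, 12, 6, 9, 3),
    "strain_2": (5, 8, 12, 6, 9, 3),   # identical type to strain_1
    "strain_3": (7, 8, 12, 6, 10, 3),  # differs at V2A and V7
}
n_types = count_types(toy_profiles)
diff = locus_differences(toy_profiles["strain_1"], toy_profiles["strain_3"])
print(n_types, diff)
```

Profiles that differ by only a few repeats at the more variable loci, as in the third toy strain, are the cases the abstract flags as harder to interpret over long-term use.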
Diversity and evolution of centromere repeats in the maize genome.
Bilinski, Paul; Distor, Kevin; Gutierrez-Lopez, Jose; Mendoza, Gabriela Mendoza; Shi, Jinghua; Dawe, R Kelly; Ross-Ibarra, Jeffrey
2015-03-01
Centromere repeats are found in most eukaryotes and play a critical role in kinetochore formation. Though centromere repeats exhibit considerable diversity both within and among species, little is understood about the mechanisms that drive centromere repeat evolution. Here, we use maize as a model to investigate how a complex history involving polyploidy, fractionation, and recent domestication has impacted the diversity of the maize centromeric repeat CentC. We first validate the existence of long tandem arrays of repeats in maize and other taxa in the genus Zea. Although we find considerable sequence diversity among CentC copies genome-wide, genetic similarity among repeats is highest within these arrays, suggesting that tandem duplications are the primary mechanism for the generation of new copies. Nonetheless, clustering analyses identify similar sequences among distant repeats, and simulations suggest that this pattern may be due to homoplasious mutation. Although the two ancestral subgenomes of maize have contributed nearly equal numbers of centromeres, our analysis shows that the majority of all CentC repeats derive from one of the parental genomes, with an even stronger bias when examining the largest assembled contiguous clusters. Finally, by comparing maize with its wild progenitor teosinte, we find that the abundance of CentC likely decreased after domestication, while the pericentromeric repeat Cent4 has drastically increased.
Gorkhali, Neena Amatya; Jiang, Lin; Shrestha, Bhola Shankar; He, Xiao-Hong; Junzhao, Qian; Han, Jian-Lin; Ma, Yue-Hui
2016-07-01
Heteroplasmy due to length polymorphism of tandem repeats in mtDNA within individuals has rarely been studied in domestic animals. In the present study, we identified intra-individual length variation in the mtDNA control region of Nepalese sheep by molecular cloning and sequencing. We observed one to four tandem repeats of a 75-bp nucleotide sequence in the mtDNA control region in 45% of the Nepalese sheep sampled, in contrast to the Chinese sheep, indicating that the heteroplasmy is specific to Nepalese sheep. The high rate of heteroplasmy in Nepalese sheep could result from mtDNA mutation and independent segregation at the intra-individual level or from strand slippage and mispairing during replication.
Mitochondrial genome of the tomato clownfish Amphiprion frenatus (Pomacentridae, Amphiprioninae).
Ye, Le; Hu, Jing; Wu, Kaichang; Wang, Yu; Li, Jianlong
2016-01-01
The complete mitochondrial (mt) genome of the tomato clownfish Amphiprion frenatus was obtained in this study. The circular mtDNA molecule was 16,774 bp in size and the overall nucleotide composition of the H-strand was 29.72% A, 25.81% T, 15.38% G and 29.09% C, with an A + T bias. The complete mitogenome encoded 13 protein-coding genes, 2 rRNAs, 22 tRNAs and a control region (D-loop), with the gene arrangement and translation direction basically identical to those of other typical vertebrate mitogenomes. The D-loop included a termination-associated sequence (TAS), a central conserved domain (CCD) and a conserved sequence block (CSB), and contained six complete contiguous tandem repeat units and one imperfect tandem repeat unit.
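The A + T bias can be checked directly from the reported percentages, and the same composition can be computed from any sequence. The Python sketch below does both; the helper name `base_composition` is an assumption, and the percentages are the ones quoted in the abstract.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Percentage of A, C, G and T in a sequence, ignoring other symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# H-strand percentages as reported in the abstract.
reported = {"A": 29.72, "T": 25.81, "G": 15.38, "C": 29.09}
at_content = round(reported["A"] + reported["T"], 2)
gc_content = round(reported["G"] + reported["C"], 2)
print(at_content, gc_content)
```

With the reported values, A + T comes to 55.53% against 44.47% for G + C, which is the bias the abstract refers to.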
APE1 incision activity at abasic sites in tandem repeat sequences.
Li, Mengxia; Völker, Jens; Breslauer, Kenneth J; Wilson, David M
2014-05-29
Repetitive DNA sequences, such as those present in microsatellites and minisatellites, telomeres, and trinucleotide repeats (linked to fragile X syndrome, Huntington disease, etc.), account for nearly 30% of the human genome. These domains exhibit enhanced susceptibility to oxidative attack to yield base modifications, strand breaks, and abasic sites; have a propensity to adopt non-canonical DNA forms modulated by the positions of the lesions; and, when not properly processed, can contribute to genome instability that underlies aging and disease development. Knowledge on the repair efficiencies of DNA damage within such repetitive sequences is therefore crucial for understanding the impact of such domains on genomic integrity. In the present study, using strategically designed oligonucleotide substrates, we determined the ability of human apurinic/apyrimidinic endonuclease 1 (APE1) to cleave at apurinic/apyrimidinic (AP) sites in a collection of tandem DNA repeat landscapes involving telomeric and CAG/CTG repeat sequences. Our studies reveal the differential influence of domain sequence, conformation, and AP site location/relative positioning on the efficiency of APE1 binding and strand incision. Intriguingly, our data demonstrate that APE1 endonuclease efficiency correlates with the thermodynamic stability of the DNA substrate. We discuss how these results have both predictive and mechanistic consequences for understanding the success and failure of repair protein activity associated with such oxidatively sensitive, conformationally plastic/dynamic repetitive DNA domains. Published by Elsevier Ltd.
Lim, K Yoong; Kovarik, Ales; Matyasek, Roman; Chase, Mark W; Knapp, Sandra; McCarthy, Elizabeth; Clarkson, James J; Leitch, Andrew R
2006-12-01
Combining phylogenetic reconstructions of species relationships with comparative genomic approaches is a powerful way to decipher evolutionary events associated with genome divergence. Here, we reconstruct the history of karyotype and tandem repeat evolution in species of diploid Nicotiana section Alatae. By analysis of plastid DNA, we resolved two clades with high bootstrap support, one containing N. alata, N. langsdorffii, N. forgetiana and N. bonariensis (called the n = 9 group) and another containing N. plumbaginifolia and N. longiflora (called the n = 10 group). Despite little plastid DNA sequence divergence, we observed, via fluorescent in situ hybridization, substantial chromosomal repatterning, including altered chromosome numbers, structure and distribution of repeats. Effort was focussed on 35S and 5S nuclear ribosomal DNA (rDNA) and the HRS60 satellite family of tandem repeats comprising the elements HRS60, NP3R and NP4R. We compared divergence of these repeats in diploids and polyploids of Nicotiana. There are dramatic shifts in the distribution of the satellite repeats and complete replacement of intergenic spacers (IGSs) of 35S rDNA associated with divergence of the species in section Alatae. We suggest that sequence homogenization has replaced HRS60 family repeats at sub-telomeric regions, but that this process may not occur, or occurs more slowly, when the repeats are found at intercalary locations. Sequence homogenization acts more rapidly (at least two orders of magnitude) on 35S rDNA than 5S rDNA and sub-telomeric satellite sequences. This rapid rate of divergence is analogous to that found in polyploid species, and is therefore, in plants, not only associated with polyploidy.
Rouhiainen, L; Sivonen, K; Buikema, W J; Haselkorn, R
1995-01-01
Cyanobacteria produce toxins that kill animals. The two main classes of cyanobacterial toxins are cyclic peptides that cause liver damage and alkaloids that block nerve transmission. Many toxin-producing strains from Finnish lakes were brought into axenic culture, and their toxins were characterized. Restriction fragment length polymorphism analysis, probing with a short tandemly repeated DNA sequence found at many locations in the chromosome of Anabaena sp. strain PCC 7120, distinguishes hepatotoxic Anabaena isolates from neurotoxin-producing strains and from Nostoc spp. PMID:7592362
RepeatsDB-lite: a web server for unit annotation of tandem repeat proteins.
Hirsh, Layla; Paladin, Lisanna; Piovesan, Damiano; Tosatto, Silvio C E
2018-05-09
RepeatsDB-lite (http://protein.bio.unipd.it/repeatsdb-lite) is a web server for the prediction of repetitive structural elements and units in tandem repeat (TR) proteins. TRs are a widespread but poorly annotated class of non-globular proteins carrying heterogeneous functions. RepeatsDB-lite extends the prediction to all TR types and strongly improves the performance both in terms of computational time and accuracy over previous methods, with precision above 95% for solenoid structures. The algorithm exploits an improved TR unit library derived from the RepeatsDB database to perform an iterative structural search and assignment. The web interface provides tools for analyzing the evolutionary relationships between units and for manually refining the prediction by changing unit positions and protein classification. An all-against-all structure-based sequence similarity matrix is calculated and visualized in real-time for every user edit. Reviewed predictions can be submitted to RepeatsDB for review and inclusion.
Vioque, A
1997-01-01
The RNase P RNA gene (rnpB) from 10 cyanobacteria has been characterized. These new RNAs, together with the previously available ones, provide a comprehensive data set of RNase P RNA from diverse cyanobacterial lineages. All heterocystous cyanobacteria, but none of the non-heterocystous strains analyzed, contain short tandemly repeated repetitive (STRR) sequences that increase the length of helix P12. Site-directed mutagenesis experiments indicate that the STRR sequences are not required for catalytic activity in vitro. STRR sequences seem to have recently and independently invaded the RNase P RNA genes in heterocyst-forming cyanobacteria because closely related strains contain unrelated STRR sequences. Most cyanobacteria RNase P RNAs lack the sequence GGU in the loop connecting helices P15 and P16 that has been established to interact with the 3'-end CCA in precursor tRNA substrates in other bacteria. This character is shared with plastid RNase P RNA. Helix P6 is longer than usual in most cyanobacteria as well as in plastid RNase P RNA. PMID:9254706
Deucher, Anne; Chiang, Tsoyu; Schrijver, Iris
2010-01-01
Typing of STR (short tandem repeat) alleles is used in a variety of applications in clinical molecular pathology, including evaluations for maternal cell contamination. Using a commercially available STR typing assay for maternal cell contamination performed in conjunction with prenatal diagnostic testing, we were posed with apparent nonmaternity when the two fetal samples did not demonstrate the expected maternal allele at one locus. By designing primers external to the region amplified by the primers from the commercial assay and by performing direct sequencing of the resulting amplicon, we were able to determine that a guanine to adenine sequence variation led to primer mismatch and allele dropout. This explained the apparent null allele shared between the maternal and fetal samples. Therefore, although rare, allele dropout must be considered whenever unexplained homozygosity at an STR locus is observed. PMID:20203001
Tran, Trung D; Cao, Hieu X; Jovtchev, Gabriele; Neumann, Pavel; Novák, Petr; Fojtová, Miloslava; Vu, Giang T H; Macas, Jiří; Fajkus, Jiří; Schubert, Ingo; Fuchs, Joerg
2015-12-01
Linear chromosomes of eukaryotic organisms invariably possess centromeres and telomeres to ensure proper chromosome segregation during nuclear divisions and to protect the chromosome ends from deterioration and fusion, respectively. While centromeric sequences may differ between species, with arrays of tandemly repeated sequences and retrotransposons being the most abundant sequence types in plant centromeres, telomeric sequences are usually highly conserved among plants and other organisms. The genome size of the carnivorous genus Genlisea (Lentibulariaceae) is highly variable. Here we study evolutionary sequence plasticity of these chromosomal domains at an intrageneric level. We show that Genlisea nigrocaulis (1C = 86 Mbp; 2n = 40) and G. hispidula (1C = 1550 Mbp; 2n = 40) differ as to their DNA composition at centromeres and telomeres. G. nigrocaulis and its close relative G. pygmaea revealed mainly 161 bp tandem repeats, while G. hispidula and its close relative G. subglabra displayed a combination of four retroelements at centromeric positions. G. nigrocaulis and G. pygmaea chromosome ends are characterized by the Arabidopsis-type telomeric repeats (TTTAGGG); G. hispidula and G. subglabra instead revealed two intermingled sequence variants (TTCAGG and TTTCAGG). These differences in centromeric and, surprisingly, also in telomeric DNA sequences, uncovered between groups with on average a > 9-fold genome size difference, emphasize the fast genome evolution within this genus. Such intrageneric evolutionary alteration of telomeric repeats with cytosine in the guanine-rich strand, not yet known for plants, might impact the epigenetic telomere chromatin modification. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.
Sequence Effect on the Formation of DNA Minidumbbells.
Liu, Yuan; Lam, Sik Lok
2017-11-16
The DNA minidumbbell (MDB) is a recently identified non-B structure. The reported MDBs contain two TTTA, CCTG, or CTTG type II loops. At present, the knowledge and understanding of the sequence criteria for MDB formation are still limited. In this study, we performed a systematic high-resolution nuclear magnetic resonance (NMR) and native gel study to investigate the effect of sequence variations in tandem repeats on the formation of MDBs. Our NMR results reveal the importance of hydrogen bonds, base-base stacking, and hydrophobic interactions from each of the participating residues. We conclude that in the MDBs formed by tandem repeats, C-G loop-closing base pairs are more stabilizing than T-A loop-closing base pairs, and thymine residues in both the second and third loop positions are more stabilizing than cytosine residues. The results from this study enrich our knowledge on the sequence criteria for the formation of MDBs, paving a path for better exploring their potential roles in biological systems and DNA nanotechnology.
Guizard, Sébastien; Piégu, Benoît; Arensburger, Peter; Guillou, Florian; Bigot, Yves
2016-08-19
The program RepeatMasker and the database Repbase-ISB are part of the most widely used strategy for annotating repeats in animal genomes. They have been used to show that avian genomes have a lower repeat content (8-12%) than the sequenced genomes of many vertebrate species (30-55%). However, the efficiency of such library-based strategies depends on the quality and completeness of the sequences in the database that is used. An alternative to these library-based methods is to identify repeats de novo. These alternative methods have existed for at least a decade and may be more powerful than the library-based methods. We have used an annotation strategy involving several complementary de novo tools to determine the repeat content of the model genome galGal4 (1.04 Gbp), including identifying simple sequence repeats (SSRs), tandem repeats and transposable elements (TEs). We annotated over one Gbp of the galGal4 genome and showed that it is composed of approximately 19% SSRs and TEs. Furthermore, we estimate that the actual genome of the red jungle fowl contains about 31-35% repeats. We find that library-based methods tend to overestimate TE diversity. These results have a major impact on the current understanding of repeat distributions throughout chromosomes in the red jungle fowl. Our results are a proof of concept of the reliability of using de novo tools to annotate repeats in large animal genomes. They have also revealed issues that will need to be resolved in order to develop gold-standard methodologies for annotating repeats in eukaryote genomes.
Exploring the repeat protein universe through computational protein design
Brunette, TJ; Parmeggiani, Fabio; Huang, Po-Ssu; ...
2015-12-16
A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain. Repeat proteins composed of multiple tandem copies of a modular structure unit are widespread in nature and have critical roles in molecular recognition, signalling, and other essential biological processes. Naturally occurring repeat proteins have been re-engineered for molecular recognition and modular scaffolding applications. In this paper, we use computational protein design to investigate the space of folded structures that can be generated by tandem repeating a simple helix–loop–helix–loop structural motif. Eighty-three designs with sequences unrelated to known repeat proteins were experimentally characterized. Of these, 53 are monomeric and stable at 95 °C, and 43 have solution X-ray scattering spectra consistent with the design models. Crystal structures of 15 designs spanning a broad range of curvatures are in close agreement with the design models with root mean square deviations ranging from 0.7 to 2.5 Å. Finally, our results show that existing repeat proteins occupy only a small fraction of the possible repeat protein sequence and structure space and that it is possible to design novel repeat proteins with precisely specified geometries, opening up a wide array of new possibilities for biomolecular engineering.
Chicken microsatellite markers isolated from libraries enriched for simple tandem repeats.
Gibbs, M; Dawson, D A; McCamley, C; Wardle, A F; Armour, J A; Burke, T
1997-12-01
The total number of microsatellite loci is considered to be at least 10-fold lower in avian species than in mammalian species. Therefore, efficient large-scale cloning of chicken microsatellites, as required for the construction of a high-resolution linkage map, is facilitated by the construction of libraries using an enrichment strategy. In this study, a plasmid library enriched for tandem repeats was constructed from chicken genomic DNA by hybridization selection. Using this technique the proportion of recombinant clones that cross-hybridized to probes containing simple tandem repeats was raised to 16%, compared with < 0.1% in a non-enriched library. Primers were designed from 121 different sequences. Polymerase chain reaction (PCR) analysis of two chicken reference pedigrees enabled 72 loci to be localized within the collaborative chicken genetic map, and at least 30 of the remaining loci have been shown to be informative in these or other crosses.
Richard, François D; Kajava, Andrey V
2014-06-01
The dramatic growth of sequencing data evokes an urgent need to improve bioinformatics tools for large-scale proteome analysis. Over the last two decades, the foremost efforts of computer scientists were devoted to proteins with aperiodic sequences having globular 3D structures. However, a large portion of proteins contain periodic sequences representing arrays of repeats that are directly adjacent to each other (so called tandem repeats or TRs). These proteins frequently fold into elongated fibrous structures carrying different fundamental functions. Algorithms specific to the analysis of these regions are urgently required since the conventional approaches developed for globular domains have had limited success when applied to the TR regions. The protein TRs are frequently not perfect, containing a number of mutations, and some of them cannot be easily identified. To detect such "hidden" repeats several algorithms have been developed. However, the most sensitive among them are time-consuming and, therefore, inappropriate for large scale proteome analysis. To speed up the TR detection we developed a rapid filter that is based on the comparison of composition and order of short strings in the adjacent sequence motifs. Tests show that our filter discards up to 22.5% of proteins which are known to be without TRs while keeping almost all (99.2%) TR-containing sequences. Thus, we are able to decrease the size of the initial sequence dataset enriching it with TR-containing proteins which allows a faster subsequent TR detection by other methods. The program is available upon request. Copyright © 2014 Elsevier Inc. All rights reserved.
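The filtering idea in the abstract above (comparing the short strings found in adjacent sequence motifs) can be illustrated with a small Python sketch. This is not the authors' program, which the abstract says is available only on request. The window length, the use of 2-residue strings, the Jaccard score and the 0.8 threshold are all assumptions chosen for illustration, and the sketch compares composition only, not the order of the short strings.

```python
def kmers(segment: str, k: int) -> set:
    """Return the set of all length-k substrings of a segment."""
    return {segment[i:i + k] for i in range(len(segment) - k + 1)}


def adjacent_similarity(sequence: str, window: int, k: int = 2) -> float:
    """Best Jaccard similarity of k-mer sets between two directly adjacent windows.

    Hypothetical scoring scheme: a value near 1.0 suggests that some pair of
    neighbouring windows has nearly the same short-string composition.
    """
    best = 0.0
    for start in range(len(sequence) - 2 * window + 1):
        left = kmers(sequence[start:start + window], k)
        right = kmers(sequence[start + window:start + 2 * window], k)
        union = left | right
        if union:
            best = max(best, len(left & right) / len(union))
    return best


def passes_filter(sequence: str, window: int = 6, k: int = 2,
                  threshold: float = 0.8) -> bool:
    """Keep a sequence for slower, more sensitive repeat detection (assumed threshold)."""
    return adjacent_similarity(sequence, window, k) >= threshold


# Toy sequences (hypothetical): a perfect 6-residue repeat and an aperiodic string.
repetitive = "GPTGSA" * 3
aperiodic = "ACDEFGHIKLMNPQRSTVWY"
print(passes_filter(repetitive), passes_filter(aperiodic))
```

A real filter would scan several window lengths and tolerate mutations within the repeats; the fixed window here is only meant to keep the example short.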
Stress-induced rearrangement of Fusarium retrotransposon sequences.
Anaya, N; Roncero, M I
1996-11-27
Rearrangement of the Fusarium oxysporum retrotransposon skippy was induced by growth in the presence of potassium chlorate. Three fungal strains, one sensitive to chlorate (Co60) and two resistant to chlorate and deficient for nitrate reductase (Co65 and Co94), were studied by Southern analysis of their genomic DNA. Polymorphism was detected in their hybridization banding pattern, relative to the wild type grown in the absence of chlorate, using various enzymes with or without restriction sites within the retrotransposon. Results were consistent with the assumption that three different events had occurred in strain Co60: genomic amplification of skippy yielding tandem arrays of the element, generation of new skippy sequences, and deletion of skippy sequences. Amplification of Co60 genomic DNA using the polymerase chain reaction and divergent primers derived from the retrotransposon generated a new band, corresponding to one long terminal repeat plus flanking sequences, that was not present in the wild-type strain. Molecular analysis of nitrate reductase-deficient mutants showed that generation and deletion of skippy sequences, but not genomic amplification in tandem repeats, had occurred in their genomes.
The evolution of filamin – A protein domain repeat perspective
Light, Sara; Sagit, Rauan; Ithychanda, Sujay S.; Qin, Jun; Elofsson, Arne
2013-01-01
Particularly in higher eukaryotes, some protein domains are found in tandem repeats, performing broad functions often related to cellular organization. For instance, the eukaryotic protein filamin interacts with many proteins and is crucial for the cytoskeleton. The functional properties of long repeat domains are governed by the specific properties of each individual domain as well as by the repeat copy number. To provide better understanding of the evolutionary and functional history of repeating domains, we investigated the mode of evolution of the filamin domain in some detail. Among the domains that are common in long repeat proteins, sushi and spectrin domains evolve primarily through cassette tandem duplications while scavenger and immunoglobulin repeats appear to evolve through clustered tandem duplications. Additionally, immunoglobulin and filamin repeats exhibit a unique pattern where every other domain shows high sequence similarity. This pattern may be the result of tandem duplications, serve to avert aggregation between adjacent domains or it is the result of functional constraints. In filamin, our studies confirm the presence of interspersed integrin binding domains in vertebrates, while invertebrates exhibit more varied patterns, including more clustered integrin binding domains. The most notable case is leech filamin, which contains a 20 repeat expansion and exhibits unique dimerization topology. Clearly, invertebrate filamins are varied and contain examples of similar adjacent integrin-binding domains. Given that invertebrate integrin shows more similarity to the weaker filamin binder, integrin β3, it is possible that the distance between integrin-binding domains is not as crucial for invertebrate filamins as for vertebrates. PMID:22414427
The Cotton Centromere Contains a Ty3-gypsy-like LTR Retroelement
Luo, Song; Mach, Jennifer; Abramson, Bradley; Ramirez, Rolando; Schurr, Robert; Barone, Pierluigi; Copenhaver, Gregory; Folkerts, Otto
2012-01-01
The centromere is a repeat-rich structure essential for chromosome segregation; with the long-term aim of understanding centromere structure and function, we set out to identify cotton centromere sequences. To isolate centromere-associated sequences from cotton (Gossypium hirsutum), we surveyed tandem and dispersed repetitive DNA in the genus. Centromere-associated elements in other plants include tandem repeats and, in some cases, centromere-specific retroelements. Examination of cotton genomic survey sequences for tandem repeats yielded sequences that did not localize to the centromere. However, among the repetitive sequences we also identified a gypsy-like LTR retrotransposon (Centromere Retroelement Gossypium, CRG) that localizes to the centromere region of all chromosomes in domestic upland cotton, Gossypium hirsutum, the major commercially grown cotton. The location of the functional centromere was confirmed by immunostaining with antiserum to the centromere-specific histone CENH3, which co-localizes with CRG hybridization on metaphase mitotic chromosomes. G. hirsutum is an allotetraploid composed of A and D genomes and CRG is also present in the centromere regions of other AD cotton species. Furthermore, FISH and genomic dot blot hybridization revealed that CRG is found in D-genome diploid cotton species, but not in A-genome diploid species, indicating that this retroelement may have invaded the A-genome centromeres during allopolyploid formation and amplified during evolutionary history. CRG is also found in other diploid Gossypium species, including B and E2 genome species, but not in the C, E1, F, and G genome species tested. Isolation of this centromere-specific retrotransposon from Gossypium provides a probe for further understanding of centromere structure, and a tool for future engineering of centromere mini-chromosomes in this important crop species. PMID:22536361
A naturally occurring, noncanonical GTP aptamer made of simple tandem repeats
Curtis, Edward A; Liu, David R
2014-01-01
Recently, we used in vitro selection to identify a new class of naturally occurring GTP aptamer called the G motif. Here we report the discovery and characterization of a second class of naturally occurring GTP aptamer, the “CA motif.” The primary sequence of this aptamer is unusual in that it consists entirely of tandem repeats of CA-rich motifs as short as three nucleotides. Several active variants of the CA motif aptamer lack the ability to form consecutive Watson-Crick base pairs in any register, while others consist of repeats containing only cytidine and adenosine residues, indicating that noncanonical interactions play important roles in its structure. The circular dichroism spectrum of the CA motif aptamer is distinct from that of A-form RNA and other major classes of nucleic acid structures. Bioinformatic searches indicate that the CA motif is absent from most archaeal and bacterial genomes, but occurs in at least 70 percent of approximately 400 eukaryotic genomes examined. These searches also uncovered several phylogenetically conserved examples of the CA motif in rodent (mouse and rat) genomes. Together, these results reveal the existence of a second class of naturally occurring GTP aptamer whose sequence requirements, like that of the G motif, are not consistent with those of a canonical secondary structure. They also indicate a new and unexpected potential biochemical activity of certain naturally occurring tandem repeats. PMID:24824832
The Peculiar Landscape of Repetitive Sequences in the Olive (Olea europaea L.) Genome
Barghini, Elena; Natali, Lucia; Cossu, Rosa Maria; Giordani, Tommaso; Pindo, Massimo; Cattonaro, Federica; Scalabrin, Simone; Velasco, Riccardo; Morgante, Michele; Cavallini, Andrea
2014-01-01
Analyzing genome structure in different species provides insight into the evolution of plant genome size. Olive (Olea europaea L.) has a medium-sized haploid genome of 1.4 Gb, whose structure is largely uncharacterized, despite the growing importance of this tree as an oil crop. Next-generation sequencing technologies and different computational procedures have been used to study the composition of the olive genome and its repetitive fraction. A total of 2.03 and 2.3 genome equivalents of Illumina and 454 reads from genomic DNA, respectively, were assembled following different procedures, which produced more than 200,000 differently redundant contigs, with mean length higher than 1,000 nt. Mapping Illumina reads onto the assembled sequences was used to estimate their redundancy. The genome data set was subdivided into highly and medium redundant and nonredundant contigs. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (∼31% of the whole genome), consisting of six main families of different length, two of which were first discovered in these experiments. The other large redundant class in the olive genome is represented by transposable elements (especially long terminal repeat-retrotransposons). On the whole, the results of our analyses show the peculiar landscape of the olive genome, related to the massive amplification of tandem repeats, more than that reported for any other sequenced plant genome. PMID:24671744
Complex structure of knob DNA on maize chromosome 9. Retrotransposon invasion into heterochromatin.
Ananiev, E V; Phillips, R L; Rines, H W
1998-01-01
The recovery of maize (Zea mays L.) chromosome addition lines of oat (Avena sativa L.) from oat × maize crosses enables us to analyze the structure and composition of specific regions, such as knobs, of individual maize chromosomes. A DNA hybridization blot panel of eight individual maize chromosome addition lines revealed that 180-bp repeats found in knobs are present in each of these maize chromosomes, but the copy number varies from approximately 100 to 25,000. Cosmid clones with knob DNA segments were isolated from a genomic library of an oat-maize chromosome 9 addition line with the help of the 180-bp knob-associated repeated DNA sequence used as a probe. Cloned knob DNA segments revealed a complex organization in which blocks of tandemly arranged 180-bp repeating units are interrupted by insertions of other repeated DNA sequences, mostly represented by individual full size copies of retrotransposable elements. There is an obvious preference for the integration of retrotransposable elements into certain sites (hot spots) of the 180-bp repeat. Sequence microheterogeneity including point mutations and duplications was found in copies of 180-bp repeats. The 180-bp repeats within an array all had the same polarity. Restriction maps constructed for 23 cloned knob DNA fragments revealed the positions of polymorphic sites and sites of integration of insertion elements. Discovery of the interspersion of retrotransposable elements among blocks of tandem repeats in maize and some other organisms suggests that this pattern may be basic to heterochromatin organization for eukaryotes. PMID:9691055
Bhatia, S; Singh Negi, M; Lakshmikumaran, M
1996-11-01
EcoRI restriction of the B. nigra rDNA recombinants, isolated from a lambda genomic library, showed that the 3.9-kb fragment corresponded to the Intergenic Spacer (IGS), which was sequenced and found to be 3,928 bp in size. Sequence and dot-matrix analyses showed that the organization of the B. nigra rDNA IGS was typical of most rDNA spacers, consisting of a central repetitive region and flanking unique sequences on either side. The repetitive region was composed of two repeat families: RF 'A' and RF 'B.' The B. nigra RF 'A' consisted of a tandem array of three full-length copies of a 106-bp sequence element. RF 'B' was composed of 66 tandemly repeated elements. Each 'B' element was only 21 bp in size and this is the smallest repeat unit identified in plant rDNA to date. The putative transcription initiation site (TIS) was identified as nucleotide position 3,110. Based on the sequence analysis it was suggested that the present organization of the repeat families was generated by successive cycles of deletions and amplifications and was being maintained by homogenization processes such as gene conversion and crossing-over. A detailed comparison of the rDNA IGS sequences of the three diploid Brassica species (namely, B. nigra, B. campestris, and B. oleracea) was carried out. First, comparisons revealed that B. campestris and B. oleracea were close to each other, as the repeat families in both showed high sequence homology. Second, the repeat elements in both species were organized in an interspersed manner. Third, a 52-bp sequence, present just downstream of the repeats in B. campestris, was found to be identical to the B. oleracea repeats, thereby suggesting a common progenitor. On the other hand, in B. nigra no interspersion pattern of organization of repeats was observed. Further, the B. nigra RF 'A' was identified as distinct from the repeat families of B. campestris and B. oleracea.
Based on this analysis, it was suggested that during speciation B. campestris and B. oleracea evolved in one lineage whereas B. nigra diverged into a separate lineage. The comparative analysis of the IGS helped in identifying not only conserved ancestral sequence motifs of possible functional significance such as promoters and enhancers, but also sequences which showed variation between the three diploid species and were therefore identified as species-specific sequences.
Microsatellite diversity of isolates of the parasitic nematode Haemonchus contortus.
Otsen, M; Plas, M E; Lenstra, J A; Roos, M H; Hoekstra, R
2000-09-01
The alarming development of anthelmintic resistance in important gastrointestinal nematode parasites of man and livestock is caused by selection for specific genotypes. In order to provide genetic tools to study the nematode populations and the consequences of anthelmintic treatment, we isolated and sequenced 59 microsatellites of the sheep and goat parasite Haemonchus contortus. These microsatellites typically consist of 2-10 tandem CA/GT repeats that are interrupted by sequences of 1-10 bp. A predominant cause of the imperfect structure of the microsatellites appeared to be mutations of G/C bp in the tandem repeat. About 44% of the microsatellites were associated with the HcREP1 direct repeat, and it was demonstrated that a generic HcREP1 primer could be used to amplify HcREP1-associated microsatellites. Thirty microsatellites could be typed by polymerase chain reaction (PCR), of which 27 were polymorphic. A number of these markers were used to detect genetic contamination of an experimental inbred population. The microsatellites may also contribute to the genetic mapping of drug resistance genes.
Ponting, C P; Mott, R; Bork, P; Copley, R R
2001-12-01
Sequence database searching methods, such as BLAST, are invaluable for predicting molecular function on the basis of sequence similarities among single regions of proteins. Searches of whole databases, however, are not optimized to detect multiple homologous regions within a single polypeptide. Here we have used the prospero algorithm to perform self-comparisons of all predicted Drosophila melanogaster gene products. Predicted repeats, and their homologs from all species, were analyzed further to detect hitherto unappreciated evolutionary relationships. Results included the identification of novel tandem repeats in the human X-linked retinitis pigmentosa type-2 gene product, repeated segments in cystinosin, associated with a defect in cystine transport, and 'nested' homologous domains in dysferlin, whose gene is mutated in limb girdle muscular dystrophy. Novel signaling domain families were found that may regulate the microtubule-based cytoskeleton and ubiquitin-mediated proteolysis, respectively. Two families of glycosyl hydrolases were shown to contain internal repetitions that hint at their evolution via a piecemeal, modular approach. In addition, three examples of fruit fly genes were detected with tandem exons that appear to have arisen via internal duplication. These findings demonstrate how completely sequenced genomes can be exploited to further understand the relationships between molecular structure, function, and evolution.
Parson, Walther; Ballard, David; Budowle, Bruce; Butler, John M; Gettings, Katherine B; Gill, Peter; Gusmão, Leonor; Hares, Douglas R; Irwin, Jodi A; King, Jonathan L; Knijff, Peter de; Morling, Niels; Prinz, Mechthild; Schneider, Peter M; Neste, Christophe Van; Willuweit, Sascha; Phillips, Christopher
2016-05-01
The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data that provide a precise description of the repeat allele structure of a STR marker and variants that may reside in the flanking areas of the repeat region. When a STR contains a complex arrangement of repeat motifs, the level of genetic polymorphism revealed by the sequence data can increase substantially. As repeat structures can be complex and include substitutions, insertions, deletions, variable tandem repeat arrangements of multiple nucleotide motifs, and flanking region SNPs, established capillary electrophoresis (CE) allele descriptions must be supplemented by a new system of STR allele nomenclature, which retains backward compatibility with the CE data that currently populate national DNA databases and that will continue to be produced for the coming years. Thus, there is a pressing need to produce a standardized framework for describing complex sequences that enables comparison with currently used repeat allele nomenclature derived from conventional CE systems. It is important to discern three levels of information in hierarchical order: (i) the sequence, (ii) the alignment, and (iii) the nomenclature of STR sequence data. We propose a sequence (text) string format as the minimal requirement of data storage that laboratories should follow when adopting MPS of STRs. We further discuss the variant annotation and sequence comparison framework necessary to maintain compatibility among established and future data. This system must be easy to use and interpret by the DNA specialist, based on a universally accessible genome assembly, and in place before the uptake of MPS by the general forensic community starts to generate sequence data on a large scale.
While the established nomenclature for CE-based STR analysis will remain unchanged in the future, the nomenclature of sequence-based STR genotypes will need to follow updated rules and be generated by expert systems that translate MPS sequences to match CE conventions in order to guarantee compatibility between the different generations of STR data. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
USDA-ARS's Scientific Manuscript database
Mycosphaerella fijiensis, the causal agent of banana leaf streak disease (commonly known as black Sigatoka), is the most devastating pathogen attacking bananas (Musa spp). Recently the whole genome sequence of M. fijiensis became available. This sequence was screened for the presence of Variable Num...
Production of monoclonal antibody, PR81, recognizing the tandem repeat region of MUC1 mucin.
Paknejad, M; Rasaee, M J; Tehrani, F Karami; Kashanian, S; Mohagheghi, M A; Omidfar, K; Bazl, M Rajabi
2003-06-01
A monoclonal antibody (MAb) was generated by immunizing BALB/c mice with homogenized breast cancerous tissues. This antibody (PR81) was found to be of IgG1 class and subclass, containing kappa light chain. PR81 reacted with either the membrane extracts of several breast cancerous tissues or the cell surface of some MUC1 positive cell lines (MCF-7, BT-20 and T-47D) tested by enzyme immunoassay and for MCF-7 by immunofluorescence method. PR81 also reacted with two synthetic 27 and 16-amino acid peptides, TSA-P1-24 and A-P1-15, respectively, which included the core tandem repeat sequence of MUC1. However, this antibody did not react with a synthetic 14 amino acid peptide that has no similarity with tandem repeat found in MUC1. The generated antibody had good and similar affinities (2.19 × 10^8 M^-1) toward TSA-P1-24 and A-P1-15, which are mainly shared in the hydrophilic sequence of PDTRPAP. Through Western blot analysis of homogenized breast tissues, PR81 recognized only a major band of 250 kDa. This band is stronger in malignant tissue than benign and normal tissues.
Analysis of an "off-ladder" allele at the Penta D short tandem repeat locus.
Yang, Y L; Wang, J G; Wang, D X; Zhang, W Y; Liu, X J; Cao, J; Yang, S L
2015-11-25
Kinship testing of a father and his son from Guangxi, China, the location of the Zhuang minority people, was performed using the PowerPlex® 18D System with a short tandem repeat typing kit. The results indicated that both the father and his son had an off-ladder allele at the Penta D locus, with a genetic size larger than that of the maximal standard allelic ladder. To further identify this locus, monogenic amplification, gene cloning, and genetic sequencing were performed. Sequencing analysis demonstrated that the fragment size of the Penta D-OL locus was 469 bp and the core sequence was [AAAGA]21, also called Penta D-21. The rare Penta D-21 allele was found to be distributed among the Zhuang population from the Guangxi Zhuang Autonomous Region of China; therefore, this study improved the range of DNA data available for this locus and enhanced our ability for individual identification of gene loci.
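The bracketed description [AAAGA]21 in the abstract above means 21 consecutive copies of the AAAGA motif. A minimal Python sketch of how such a repeat count could be read from a sequenced allele follows. The flanking sequences and function names are hypothetical, and the 469 bp amplicon from the study is not reproduced here.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


def bracket_notation(motif: str, count: int) -> str:
    """Format a repeat block in the bracketed style used in the abstract."""
    return f"[{motif}]{count}"


# Hypothetical flanks around a 21-copy AAAGA core (illustration only).
flank_5 = "GATTC"
flank_3 = "CCTGA"
allele = flank_5 + "AAAGA" * 21 + flank_3

count = longest_tandem_run(allele, "AAAGA")
print(bracket_notation("AAAGA", count))
```

Real alleles can contain interrupted or compound repeats, which a single-motif count like this does not describe.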
An annotated genetic map of loblolly pine based on microsatellite and cDNA markers
USDA-ARS's Scientific Manuscript database
Previous loblolly pine (Pinus taeda L.) genetic linkage maps have been based on a variety of DNA polymorphisms, such as AFLPs, RAPDs, RFLPs, and ESTPs, but only a few SSRs (simple sequence repeats), also known as simple tandem repeats or microsatellites, have been mapped in P. taeda. The objective o...
Pourcel, Christine; Minandri, Fabrizia; Hauck, Yolande; D'Arezzo, Silvia; Imperi, Francesco; Vergnaud, Gilles; Visca, Paolo
2011-01-01
Acinetobacter baumannii is an important opportunistic pathogen responsible for nosocomial outbreaks, mostly occurring in intensive care units. Due to the multiplicity of infection sources, reliable molecular fingerprinting techniques are needed to establish epidemiological correlations among A. baumannii isolates. Multiple-locus variable-number tandem-repeat analysis (MLVA) has proven to be a fast, reliable, and cost-effective typing method for several bacterial species. In this study, an MLVA assay compatible with simple PCR- and agarose gel-based electrophoresis steps as well as with high-throughput automated methods was developed for A. baumannii typing. Preliminarily, 10 potential polymorphic variable-number tandem repeats (VNTRs) were identified upon bioinformatic screening of six annotated genome sequences of A. baumannii. A collection of 7 reference strains plus 18 well-characterized isolates, including unique types and representatives of the three international A. baumannii lineages, was then evaluated in a two-center study aimed at validating the MLVA assay and comparing it with other genotyping assays, namely, macrorestriction analysis with pulsed-field gel electrophoresis (PFGE) and PCR-based sequence group (SG) profiling. The results showed that MLVA can discriminate between isolates with identical PFGE types and SG profiles. A panel of eight VNTR markers was selected, all showing the ability to be amplified and good amounts of polymorphism in the majority of strains. Independently generated MLVA profiles, composed of an ordered string of allele numbers corresponding to the number of repeats at each VNTR locus, were concordant between centers. Typeability, reproducibility, stability, discriminatory power, and epidemiological concordance were excellent. A database containing information and MLVA profiles for several A. baumannii strains is available from http://mlva.u-psud.fr/. PMID:21147956
Techaruvichit, Punnida; Vesaratchavest, Mongkol; Keeratipibul, Suwimon; Kuda, Takashi; Kimura, Bon
2015-01-01
Campylobacter jejuni is a common cause of the frequently reported food-borne diseases in developed and developing nations. This study describes the development of multiple-locus variable-number tandem-repeat (VNTR) analysis (MLVA) using capillary electrophoresis as a novel typing method for microbial source tracking and epidemiological investigation of C. jejuni. Among 36 tandem repeat loci detected by the Tandem Repeat Finder program, 7 VNTR loci were selected and used for characterizing 60 isolates recovered from chicken meat samples from retail shops, samples from chicken meat processing factory, and stool samples. The discrimination ability of MLVA was compared with that of multilocus sequence typing (MLST). MLVA (diversity index of 0.97 with 31 MLVA types) provided slightly higher discrimination than MLST (diversity index of 0.95 with 25 MLST types). The overall concordance between MLVA and MLST was estimated at 63% by adjusted Rand coefficient. MLVA predicted MLST type better than MLST predicted MLVA type, as reflected by Wallace coefficient (Wallace coefficient for MLVA to MLST versus MLST to MLVA, 86% versus 51%). MLVA is a useful tool and can be used for effective monitoring of C. jejuni and investigation of epidemics caused by C. jejuni. PMID:26025899
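The two statistics quoted in this abstract can be sketched in a few lines. The code assumes the diversity index is Simpson's index in its usual Hunter-Gaston form and that the Wallace coefficient is the fraction of isolate pairs sharing a type under method A that also share a type under method B; the abstract does not spell out either formula, and the isolate labels below are invented.

```python
from collections import Counter
from itertools import combinations


def simpson_diversity(type_labels):
    """Hunter-Gaston form of Simpson's index: 1 - sum(n_j(n_j-1)) / (N(N-1))."""
    counts = Counter(type_labels).values()
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two isolates")
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


def wallace(types_a, types_b):
    """Of the isolate pairs that share a type under A, the share that also match under B."""
    same_a = same_both = 0
    for i, j in combinations(range(len(types_a)), 2):
        if types_a[i] == types_a[j]:
            same_a += 1
            if types_b[i] == types_b[j]:
                same_both += 1
    if same_a == 0:
        raise ValueError("no pair of isolates shares a type under method A")
    return same_both / same_a


# Invented typing results for five isolates (not data from the study).
mlva = ["m1", "m1", "m2", "m3", "m3"]
mlst = ["s1", "s1", "s1", "s2", "s3"]
print(simpson_diversity(mlva), wallace(mlva, mlst), wallace(mlst, mlva))
```

The Wallace coefficient is directional, which is why the abstract reports two different values for MLVA to MLST and for MLST to MLVA.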
ACCA phosphopeptide recognition by the BRCT repeats of BRCA1.
Ray, Hind; Moreau, Karen; Dizin, Eva; Callebaut, Isabelle; Venezia, Nicole Dalla
2006-06-16
The tumour suppressor gene BRCA1 encodes a 220 kDa protein that participates in multiple cellular processes. The BRCA1 protein contains a tandem of two BRCT repeats at its carboxy-terminal region. The majority of disease-associated BRCA1 mutations affect this region, which gives the BRCT repeats a central role in the BRCA1 tumour suppressor function. The BRCT repeats have been shown to mediate phospho-dependent protein-protein interactions. They recognize phosphorylated peptides using a recognition groove that spans both BRCT repeats. We previously identified an interaction between the tandem of BRCA1 BRCT repeats and ACCA, which was disrupted by germ line BRCA1 mutations that affect the BRCT repeats. We recently showed that BRCA1 modulates ACCA activity through its phospho-dependent binding to ACCA. To delineate the region of ACCA that is crucial for the regulation of its activity by BRCA1, we searched for potential phosphorylation sites in the ACCA sequence that might be recognized by the BRCA1 BRCT repeats. Using sequence analysis and structure modelling, we proposed the Ser1263 residue as the most favourable candidate among six residues for recognition by the BRCA1 BRCT repeats. Using experimental approaches, such as GST pull-down assay with Bosc cells, we clearly showed that phosphorylation of only Ser1263 was essential for the interaction of ACCA with the BRCT repeats. We finally demonstrated by immunoprecipitation of ACCA in cells that the whole BRCA1 protein interacts with ACCA when phosphorylated on Ser1263.
Isolation and characterization of microsatellite markers in Fraser fir (Abies fraseri)
S.A. Josserand; K.M. Potter; G. Johnson; J.A. Bowen; J. Frampton; C.D. Nelson
2006-01-01
We describe the isolation and characterization of 14 microsatellite loci from Fraser fir (Abies fraseri). These markers originated from cloned inserts enriched for DNA sequences containing tandem di- and tri-nucleotide repeats. In total, 36 clones were selected, sequenced and evaluated. Polymerase chain reaction (PCR) primers for 14 of these...
A unique chromatin complex occupies young α-satellite arrays of human centromeres
Henikoff, Jorja G.; Thakur, Jitendra; Kasinathan, Sivakanthan; Henikoff, Steven
2015-01-01
The intractability of homogeneous α-satellite arrays has impeded understanding of human centromeres. Artificial centromeres are produced from higher-order repeats (HORs) present at centromere edges, although the exact sequences and chromatin conformations of centromere cores remain unknown. We use high-resolution chromatin immunoprecipitation (ChIP) of centromere components followed by clustering of sequence data as an unbiased approach to identify functional centromere sequences. We find that specific dimeric α-satellite units shared by multiple individuals dominate functional human centromeres. We identify two recently homogenized α-satellite dimers that are occupied by precisely positioned CENP-A (cenH3) nucleosomes with two ~100–base pair (bp) DNA wraps in tandem separated by a CENP-B/CENP-C–containing linker, whereas pericentromeric HORs show diffuse positioning. Precise positioning is largely maintained, whereas abundance decreases exponentially with divergence, which suggests that young α-satellite dimers with paired ~100-bp particles mediate evolution of functional human centromeres. Our unbiased strategy for identifying functional centromeric sequences should be generally applicable to tandem repeat arrays that dominate the centromeres of most eukaryotes. PMID:25927077
Shao, Chengchen; Zhang, Yaqi; Zhou, Yueqin; Zhu, Wei; Xu, Hongmei; Liu, Zhiping; Tang, Qiqun; Shen, Yiwen; Xie, Jianhui
2015-01-01
Aim To systematically select and evaluate short tandem repeats (STRs) on chromosome 14 and obtain new STR loci as expanded genotyping markers for forensic application. Methods STRs on chromosome 14 were filtered from the Tandem Repeats Database and further selected based on their positions on the chromosome, repeat patterns of the core sequences, sequence homology of the flanking regions, and suitability of flanking regions for primer design. The STR locus with the highest heterozygosity and polymorphism information content (PIC) was selected for further analysis of genetic polymorphism, forensic parameters, and the core sequence. Results Among 26 STR loci selected as candidates, D14S739 had the highest heterozygosity (0.8691) and PIC (0.8432), and showed no deviation from the Hardy-Weinberg equilibrium. Fourteen alleles were observed, ranging in size from 21 to 34 tetranucleotide units in the core region of (GATA)9-18 (GACA)7-12 GACG (GACA)2 GATA. Paternity testing showed no mutations. Conclusion D14S739 is a highly informative STR locus and could be a suitable genetic marker for forensic applications in the Han Chinese population. PMID:26526885
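For readers unfamiliar with the two summary statistics named here, the sketch below computes expected heterozygosity and PIC from allele frequencies. It assumes the standard definitions (expected heterozygosity as one minus the sum of squared frequencies, and PIC as defined by Botstein and colleagues); the abstract does not state which estimators were used, and the frequencies below are invented rather than taken from D14S739.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2), for allele frequencies that sum to 1."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - cross


# Invented allele frequencies for a four-allele locus (illustration only).
freqs = [0.4, 0.3, 0.2, 0.1]
print(expected_heterozygosity(freqs), pic(freqs))
```

PIC is never larger than the expected heterozygosity, which is consistent with the ordering of the two values reported for D14S739.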
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacobson, D.P.; Schmeling, P.; Sommer, S.S.
Alternating purine and pyrimidine repeats (RY(i)) are an abundant source of polymorphism. The subset with long tandem repeats of GT or AC (GT(i)) have been studied extensively, but cryptic RY(i) (i.e., no single tandem repeat predominates) have received little attention. The factor IX gene has a polymorphic cryptic RY(i) of 142-216 bp. Previously, there were four known polymorphic alleles, of the form AB, A[sub 2]B, A[sub 2]B[sub 2], and A[sub 3]B[sub 2], where A = (GT)(AC)[sub 3](AT)[sub 3](GT)(AT)[sub 4] and B = A with an additional 3' AT dinucleotide. To further characterize this locus, the authors examined more than 1,700 additional human chromosomes and determined the sequences of the homologous sites in orangutans and chimpanzees. The novel alleles found in humans expand the repertoire of A/B alleles to A[sub 0-4]B[sub 1] and A[sub 1-3]B[sub 2]. The A[sub n]B[sub 2] series are abundant in Caucasians but are absent in blacks and Asians. Conversely, the A[sub 0]B[sub 1] allele is common in blacks but is not found in more than 1,700 Caucasian chromosomes. The data are compatible with a model in which recombination is more frequent than polymerase slippage at this locus. In orangutans, the RY(i) is present, but the sequence is markedly different. An A/B-type of pattern was discerned in which B differs from A by an additional six (AT) dinucleotides at the 3' end. In chimpanzees, the size of the RY(i) locus was greatly expanded, and the sequence showed a novel pattern of hypervariability in which there are many tandem repeats of the form (GT)[sub n](AC)[sub o](AT)[sub p](GT)[sub q](AT)[sub s], where n, o, p, q, and s are different integers. The sequences of the factor IX intron 1 cryptic RY(i) in three primates provide perspective on the range of possible patterns of polymorphism. Analysis of the patterns suggests how the RY(i) can be conserved during evolution, while the precise sequence varies. 25 refs., 5 figs., 3 tabs.
Molecular Strain Typing of Mycobacterium tuberculosis: a Review of Frequently Used Methods
2016-01-01
Tuberculosis, caused by the bacterium Mycobacterium tuberculosis, remains one of the most serious global health problems. Molecular typing of M. tuberculosis has been used for various epidemiologic purposes as well as for clinical management. Currently, many techniques are available to type M. tuberculosis. Choosing the most appropriate technique in accordance with the existing laboratory conditions and the specific features of the geographic region is important. Insertion sequence IS6110-based restriction fragment length polymorphism (RFLP) analysis is considered the gold standard for the molecular epidemiologic investigations of tuberculosis. However, other polymerase chain reaction-based methods such as spacer oligonucleotide typing (spoligotyping), which detects 43 spacer sequence-interspersing direct repeats (DRs) in the genomic DR region; mycobacterial interspersed repetitive units–variable number tandem repeats, (MIRU-VNTR), which determines the number and size of tandem repetitive DNA sequences; repetitive-sequence-based PCR (rep-PCR), which provides high-throughput genotypic fingerprinting of multiple Mycobacterium species; and the recently developed genome-based whole genome sequencing methods demonstrate similar discriminatory power and greater convenience. This review focuses on techniques frequently used for the molecular typing of M. tuberculosis and discusses their general aspects and applications. PMID:27709842
Molecular Strain Typing of Mycobacterium tuberculosis: a Review of Frequently Used Methods.
Ei, Phyu Win; Aung, Wah Wah; Lee, Jong Seok; Choi, Go Eun; Chang, Chulhun L
2016-11-01
Tuberculosis, caused by the bacterium Mycobacterium tuberculosis, remains one of the most serious global health problems. Molecular typing of M. tuberculosis has been used for various epidemiologic purposes as well as for clinical management. Currently, many techniques are available to type M. tuberculosis. Choosing the most appropriate technique in accordance with the existing laboratory conditions and the specific features of the geographic region is important. Insertion sequence IS6110-based restriction fragment length polymorphism (RFLP) analysis is considered the gold standard for the molecular epidemiologic investigations of tuberculosis. However, other polymerase chain reaction-based methods such as spacer oligonucleotide typing (spoligotyping), which detects 43 spacer sequence-interspersing direct repeats (DRs) in the genomic DR region; mycobacterial interspersed repetitive units-variable number tandem repeats, (MIRU-VNTR), which determines the number and size of tandem repetitive DNA sequences; repetitive-sequence-based PCR (rep-PCR), which provides high-throughput genotypic fingerprinting of multiple Mycobacterium species; and the recently developed genome-based whole genome sequencing methods demonstrate similar discriminatory power and greater convenience. This review focuses on techniques frequently used for the molecular typing of M. tuberculosis and discusses their general aspects and applications.
Guo, Xianwu; Castillo-Ramírez, Santiago; González, Víctor; Bustos, Patricia; Luís Fernández-Vázquez, José; Santamaría, Rosa Isela; Arellano, Jesús; Cevallos, Miguel A; Dávila, Guillermo
2007-01-01
Background Fabaceae (legumes) is one of the largest families of flowering plants, and some members are important crops. In contrast to what we know about their great diversity or economic importance, our knowledge at the genomic level of chloroplast genomes (cpDNAs or plastomes) for these crops is limited. Results We sequenced the complete genome of the common bean (Phaseolus vulgaris cv. Negro Jamapa) chloroplast. The plastome of P. vulgaris is a 150,285 bp circular molecule. It has gene content similar to that of other legume plastomes, but contains two pseudogenes, rpl33 and rps16. A distinct inversion occurred at the junction points of trnH-GUG/rpl14 and rps19/rps8, as in adzuki bean [1]. These two pseudogenes and the inversion were confirmed in 10 varieties representing the two domestication centers of the bean. Genomic comparative analysis indicated that inversions generally occur in legume plastomes and the magnitude and localization of insertions/deletions (indels) also vary. The analysis of repeat sequences demonstrated that patterns and sequences of tandem repeats had an important impact on sequence diversification between legume plastomes and tandem repeats did not belong to dispersed repeats. Interestingly, P. vulgaris plastome had higher evolutionary rates of change on both genomic and gene levels than G. max, which could be the consequence of pressure from both mutation and natural selection. Conclusion Legume chloroplast genomes are widely diversified in gene content, gene order, indel structure, abundance and localization of repetitive sequences, intracellular sequence exchange and evolutionary rates. The P. vulgaris plastome is a rapidly evolving genome. PMID:17623083
Interpreting short tandem repeat variations in humans using mutational constraint
Gymrek, Melissa; Willems, Thomas; Reich, David; Erlich, Yaniv
2017-01-01
Identifying regions of the genome that are depleted of mutations can reveal potentially deleterious variants. Short tandem repeats (STRs), also known as microsatellites, are among the largest contributors of de novo mutations in humans. However, per-locus studies of STR mutations have been limited to highly ascertained panels of several dozen loci. Here, we harnessed bioinformatics tools and a novel analytical framework to estimate mutation parameters for each STR in the human genome by correlating STR genotypes with local sequence heterozygosity. We applied our method to obtain robust estimates of the impact of local sequence features on mutation parameters and used this to create a framework for measuring constraint at STRs by comparing observed vs. expected mutation rates. Constraint scores identified known pathogenic variants with early onset effects. Our metric will provide a valuable tool for prioritizing pathogenic STRs in medical genetics studies. PMID:28892063
Single-Stranded Condensation Stochastically Blocks G-Quadruplex Assembly in Human Telomeric RNA.
Gutiérrez, Irene; Garavís, Miguel; de Lorenzo, Sara; Villasante, Alfredo; González, Carlos; Arias-Gonzalez, J Ricardo
2018-05-17
TERRA is an RNA molecule transcribed from human subtelomeric regions toward chromosome ends and potentially involved in regulation of heterochromatin stability, semiconservative replication, and telomerase inhibition, among others. TERRA contains tandem repeats of the sequence GGGUUA, with a strong tendency to fold into a four-stranded arrangement known as a parallel G-quadruplex. Here, we demonstrate by using single-molecule force spectroscopy that this potential is limited by the inherent capacity of RNA to self-associate randomly and further condense into entropically more favorable structures. Using optical tweezers, we stretched, one molecule at a time, RNA constructs with more than four and fewer than eight hexanucleotide repeats, thus unable to form several G-quadruplexes in tandem, flanked by non-G-rich overhangs of random sequence. We found that condensed RNA stochastically blocks G-quadruplex folding pathways with a near 20% probability, a behavior that is not found in analogous DNA molecules.
Complete mitochondrial genome of the larch hawk moth, Sphinx morio (Lepidoptera: Sphingidae).
Kim, Min Jee; Choi, Sei-Woong; Kim, Iksoo
2013-12-01
The larch hawk moth, Sphinx morio, belongs to the lepidopteran family Sphingidae, which has long been studied as a family of model insects in diverse fields. In this study, we describe the complete mitochondrial genome (mitogenome) sequences of the species in terms of general genomic features and characteristic short repetitive sequences found in the A + T-rich region. The 15,299-bp-long genome consisted of a typical set of genes (13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes) and one major non-coding A + T-rich region, with the typical arrangement found in Lepidoptera. The 316-bp-long A + T-rich region located between srRNA and tRNA(Met) harbored the conserved sequence blocks that are typically found in lepidopteran insects. Additionally, the A + T-rich region of S. morio contained three characteristic repeat sequences that are rarely found in Lepidoptera: two identical 12-bp repeats, three identical 5-bp-long tandem repeats, and six nearly identical 5-6 bp long repeat sequences.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zweifel,M.; Leahy, D.; Barrick, D.
Deltex is a cytosolic effector of Notch signaling thought to bind through its N-terminal domain to the Notch receptor. Here we report the structure of the Drosophila Deltex N-terminal domain, which contains two tandem WWE sequence repeats. The WWE repeats, which adopt a novel fold, are related by an approximate two-fold axis of rotation. Although the WWE repeats are structurally distinct, they interact extensively and form a deep cleft at their junction that appears well suited for ligand binding. The two repeats are thermodynamically coupled; this coupling is mediated in part by a conserved segment that is immediately C-terminal to the second WWE domain. We demonstrate that although the Deltex WWE tandem is monomeric in solution, it forms a heterodimer with the ankyrin domain of the Notch receptor. These results provide structural and functional insight into how Deltex modulates Notch signaling, and how WWE modules recognize targets for ubiquitination.
Expanded complexity of unstable repeat diseases
Polak, Urszula; McIvor, Elizabeth; Dent, Sharon Y.R.; Wells, Robert D.; Napierala, Marek
2015-01-01
Unstable Repeat Diseases (URDs) share a common mutational phenomenon of changes in the copy number of short, tandemly repeated DNA sequences. More than 20 human neurological diseases are caused by instability, predominantly expansion, of microsatellite sequences. Changes in the repeat size initiate a cascade of pathological processes, frequently characteristic of a unique disease or a small subgroup of the URDs. Understanding of both the mechanism of repeat instability and molecular consequences of the repeat expansions is critical to developing successful therapies for these diseases. Recent technological breakthroughs in whole genome, transcriptome and proteome analyses will almost certainly lead to new discoveries regarding the mechanisms of repeat instability, the pathogenesis of URDs, and will facilitate development of novel therapeutic approaches. The aim of this review is to give a general overview of unstable repeats diseases, highlight the complexities of these diseases, and feature the emerging discoveries in the field. PMID:23233240
Divergence, differential methylation and interspersion of melon satellite DNA sequences.
Shmookler Reis, R; Timmis, J N; Ingle, J
1981-01-01
Melon (Cucumis melo) satellite DNA consists of two components, Q and S, each with a buoyant density in CsCl of 1.707 g/ml, but differing by 9 degrees C in "melting" temperature. These physical properties appear to be in contradiction, since both depend on G + C content. In order to resolve this anomaly, base compositions were directly determined for isolated fractions. The low-"melting" component S contains 41.8% G + C, with 6% of C present as 5-methylcytosine, whereas Q DNA contains 54% G + C, with 41% of C methylated. Analyses of restriction site loss agreed well with the direct determinations of methylation and divergence, and indicated some clustering of methylated sites in Q DNA. Analysis of restricted main-band DNA by hybridization with RNA complementary to Q satellite DNA ("Southern transfer") showed satellite Q tandem arrays interspersed in DNA of main-band density. Sequence divergence and extent of methylation did not appear to depend on whether a repeat array was present as satellite or interspersed in main-band DNA. Hybridization in situ indicated considerable heterogeneity in the genomic proportion of the Q-DNA sequences in melon fruit nuclei, implying over- and under-representation consistent with extensive unequal recombination in satellite Q tandem arrays. The cucumber, Cucumis sativus, contains less than 8% as much Q-homologous DNA per genome as the melon, suggesting rapid evolutionary gain or loss of these tandem repeat sequences. PMID:6172117
Nowacka-Woszuk, J; Switonski, M
2010-02-01
Numerous mutations of the human androgen receptor (AR) gene cause an intersexual phenotype, called the androgen insensitivity syndrome. The intersexual phenotype is also quite often diagnosed in dogs. The aim of this study was to conduct a comparative analysis of the entire coding sequence (eight exons) of the AR gene in healthy and four intersex dogs, as well as in three other canids (the red fox, arctic fox and Chinese raccoon dog). The coding sequence of the studied species appeared to be conserved (similarity above 97%) and polymorphism was found in exon 1 only. Altogether, 2 SNPs were identified in healthy dogs, 14 in red foxes, 16 in arctic foxes and 6 were found in Chinese raccoon dogs, respectively. Moreover, a variable number of tandem repeats (CAG and CAA), encoding an array of glutamines, was also observed in this exon. The CAA codon numbers were invariable within species, but the CAG repeats were polymorphic. The highest number of the CAG and CAA repeats was found in dogs (from 40 to 42) and the observed variability was similar in intersex and healthy dogs. In the other canids the variability fell within the following ranges: 29-37 (red fox), 37-39 (arctic fox) and 29-32 (Chinese raccoon dog). In addition, a polymorphic microsatellite marker in intron 2 was found in the dog, red fox and Chinese raccoon dog. It was concluded that the polymorphism level of the AR gene in the dog was lower than in the other canids and none of the detected polymorphisms, including variability of the CAG tandem repeats, could be related with the intersexual phenotype of the studied dogs.
Larracuente, Amanda M
2014-11-25
Satellite DNA can make up a substantial fraction of eukaryotic genomes and has roles in genome structure and chromosome segregation. The rapid evolution of satellite DNA can contribute to genomic instability and genetic incompatibilities between species. Despite its ubiquity and its contribution to genome evolution, we currently know little about the dynamics of satellite DNA evolution. The Responder (Rsp) satellite DNA family is found in the pericentric heterochromatin of chromosome 2 of Drosophila melanogaster. Rsp is well-known for being the target of Segregation Distorter (SD), an autosomal meiotic drive system in D. melanogaster. I present an evolutionary genetic analysis of the Rsp family of repeats in D. melanogaster and its closely-related species in the melanogaster group (D. simulans, D. sechellia, D. mauritiana, D. erecta, and D. yakuba) using a combination of available BAC sequences, whole genome shotgun Sanger reads, Illumina short read deep sequencing, and fluorescence in situ hybridization. I show that Rsp repeats have euchromatic locations throughout the D. melanogaster genome, that Rsp arrays show evidence for concerted evolution, and that Rsp repeats exist outside of D. melanogaster, in the melanogaster group. The repeats in these species are considerably diverged at the sequence level compared to D. melanogaster, and have a strikingly different genomic distribution, even between closely-related sister taxa. The genomic organization of the Rsp repeat in the D. melanogaster genome is complex: it consists of large blocks of tandem repeats in the heterochromatin and small blocks of tandem repeats in the euchromatin. My discovery of heterochromatic Rsp-like sequences outside of D. melanogaster suggests that SD evolved after its target satellite and that the evolution of the Rsp satellite family is highly dynamic over a short evolutionary time scale (<240,000 years).
Katoh, Hiroshi; Subandiyah, Siti; Tomimura, Kenta; Okuda, Mitsuru; Su, Hong-Ji; Iwanami, Toru
2011-01-01
Four highly polymorphic simple sequence repeat (SSR) loci were selected and used to differentiate 84 Japanese isolates of “Candidatus Liberibacter asiaticus.” The Nei's measure of genetic diversity values for these four SSRs ranged from 0.60 to 0.86. The four SSR loci were also highly polymorphic in four isolates from Taiwan and 12 isolates from Indonesia. PMID:21239554
Okimoto, R; Chamberlin, H M; Macfarlane, J L; Wolstenholme, D R
1991-01-01
Within a 7 kb segment of the mtDNA molecule of the root knot nematode, Meloidogyne javanica, that lacks standard mitochondrial genes, are three sets of strictly tandemly arranged, direct repeat sequences: approximately 36 copies of a 102 ntp sequence that contains a TaqI site; 11 copies of a 63 ntp sequence, and 5 copies of an 8 ntp sequence. The 7 kb repeat-containing segment is bounded by putative tRNAasp and tRNAf-met genes and the arrangement of sequences within this segment is: the tRNAasp gene; a unique 1,528 ntp segment that contains two highly stable hairpin-forming sequences; the 102 ntp repeat set; the 8 ntp repeat set; a unique 1,068 ntp segment; the 63 ntp repeat set; and the tRNAf-met gene. The nucleotide sequences of the 102 ntp copies and the 63 ntp copies have been conserved among the species examined. Data from Southern hybridization experiments indicate that 102 ntp and 63 ntp repeats occur in the mtDNAs of three, two and two races of M.incognita, M.hapla and M.arenaria, respectively. Nucleotide sequences of the M.incognita Race-3 102 ntp repeat were found to be either identical or highly similar to those of the M.javanica 102 ntp repeat. Differences in migration distance and number of 102 ntp repeat-containing bands seen in Southern hybridization autoradiographs of restriction-digested mtDNAs of M.javanica and the different host races of M.incognita, M.hapla and M.arenaria are sufficient to distinguish the different host races of each species. PMID:2027769
A Simple and Efficient Method for Assembling TALE Protein Based on Plasmid Library
Xu, Huarong; Xin, Ying; Zhang, Tingting; Ma, Lixia; Wang, Xin; Chen, Zhilong; Zhang, Zhiying
2013-01-01
The DNA-binding domain of the transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied in a variety of species for genome engineering. However, assembling long TALE tandem repeats remains a big challenge precluding wide use of this technology. Although several new methodologies for efficiently assembling TALE repeats have been recently reported, all of them require either sophisticated facilities or skilled technicians to carry them out. Here, we describe a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on a TALE repeat tetramer library. A tetramer library consisting of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed for amplification of these tetramers. PCR products were assembled in a one-step digestion/ligation reaction. Twelve TALE constructs, including 4 TALEN pairs targeted to mouse Gt(ROSA)26Sor gene and mouse Mstn gene sequences as well as 4 TALE-TF constructs targeted to mouse Oct4, c-Myc, Klf4 and Sox2 gene promoter sequences, were generated by using our method. The construction routines took 3 days and parallel constructions were available. The rate of positive clones during colony PCR verification was 64% on average. Sequencing results suggested that all TALE constructs were assembled with a high success rate. This is a rapid and cost-efficient method using the most common enzymes and facilities with a high success rate. PMID:23840477
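The figure of 256 tetramers follows from four possible bases at each of four positions (4^4 = 256). The sketch below enumerates them and splits a target into consecutive tetramers; the helper names, the toy target, and the requirement that the target length be a multiple of four are assumptions made for illustration and do not describe the authors' actual assembly protocol.

```python
from itertools import product

BASES = "ACGT"


def all_kmers(k: int) -> list[str]:
    """Enumerate every DNA word of length k (4**k of them)."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def split_target(target: str, k: int = 4) -> list[str]:
    """Cut a target into consecutive k-mers; a hypothetical helper for illustration."""
    if len(target) % k != 0:
        raise ValueError("target length must be a multiple of k in this toy example")
    return [target[i:i + k] for i in range(0, len(target), k)]


library = set(all_kmers(4))
toy_target = "ACGTTGCAAAGG"  # invented 12-bp target, not a sequence from the study
chunks = split_target(toy_target)
print(len(library), chunks, all(c in library for c in chunks))
```

Because the library is exhaustive, every tetramer-sized chunk of any target is guaranteed to be present in it.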
A simple and efficient method for assembling TALE protein based on plasmid library.
Zhang, Zhiqiang; Li, Duo; Xu, Huarong; Xin, Ying; Zhang, Tingting; Ma, Lixia; Wang, Xin; Chen, Zhilong; Zhang, Zhiying
2013-01-01
The DNA-binding domain of the transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied in a variety of species for genome engineering. However, assembling long TALE tandem repeats remains a big challenge precluding wide use of this technology. Although several new methodologies for efficiently assembling TALE repeats have been recently reported, all of them require either sophisticated facilities or skilled technicians to carry them out. Here, we describe a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on a TALE repeat tetramer library. A tetramer library consisting of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed for amplification of these tetramers. PCR products were assembled in a one-step digestion/ligation reaction. Twelve TALE constructs, including 4 TALEN pairs targeted to mouse Gt(ROSA)26Sor gene and mouse Mstn gene sequences as well as 4 TALE-TF constructs targeted to mouse Oct4, c-Myc, Klf4 and Sox2 gene promoter sequences, were generated by using our method. The construction routines took 3 days and parallel constructions were available. The rate of positive clones during colony PCR verification was 64% on average. Sequencing results suggested that all TALE constructs were assembled with a high success rate. This is a rapid and cost-efficient method using the most common enzymes and facilities with a high success rate.
Laser mass spectrometry for DNA fingerprinting for forensic applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, C.H.; Tang, K.; Taranenko, N.I.
The application of DNA fingerprinting has become very broad in forensic analysis, patient identification, diagnostic medicine, and wildlife poaching, since every individual's DNA structure is identical within all tissues of their body. DNA fingerprinting was initiated by the use of restriction fragment length polymorphisms (RFLP). In 1987, Nakamura et al. found that a variable number of tandem repeats (VNTR) often occurred in the alleles. The probability of different individuals having the same number of tandem repeats in several different alleles is very low. Thus, the identification of VNTR from genomic DNA became a very reliable method for identification of individuals. DNA fingerprinting is a reliable tool for forensic analysis. In DNA fingerprinting, knowledge of the sequence of tandem repeats and restriction endonuclease sites can provide the basis for identification. The major steps for conventional DNA fingerprinting include (1) specimen processing, (2) amplification of selected DNA segments by PCR, and (3) gel electrophoresis to do the final DNA analysis. In this work we propose to use laser desorption mass spectrometry for fast DNA fingerprinting. The process and advantages are discussed.
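The claim that a match across several VNTR loci is very unlikely follows from multiplying per-locus match frequencies. The sketch below shows that arithmetic under the assumption that the loci are independent. The frequencies are hypothetical and are not taken from the report.

```python
import math

def random_match_probability(per_locus_match_freqs):
    """Multiply per-locus match frequencies (product rule).

    Assumes the loci are statistically independent, which is a
    simplifying assumption for this illustration.
    """
    return math.prod(per_locus_match_freqs)

# Hypothetical per-locus probabilities that two unrelated people
# share the same VNTR genotype; not values from the report.
hypothetical_freqs = [0.10, 0.05, 0.20, 0.08]
rmp = random_match_probability(hypothetical_freqs)
print(f"Random match probability: {rmp:.2e}")
```

Each additional independent locus multiplies the probability by a factor below one, so the combined value falls quickly as loci are added.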
An annotated genetic map of loblolly pine based on microsatellite and cDNA markers
Craig S. Echt; Surya Saha; Konstantin V. Krutovsky; Kokulapalan Wimalanathan; John E. Erpelding; Chun Liang; C Dana Nelson
2011-01-01
Previous loblolly pine (Pinus taeda L.) genetic linkage maps have been based on a variety of DNA polymorphisms, such as AFLPs, RAPDs, RFLPs, and ESTPs, but only a few SSRs (simple sequence repeats), also known as simple tandem repeats or microsatellites, have been mapped in P. taeda. The objective of this study was to integrate a large set of SSR markers from a variety...
[Standard algorithm of molecular typing of Yersinia pestis strains].
Eroshenko, G A; Odinokov, G N; Kukleva, L M; Pavlova, A I; Krasnov, Ia M; Shavina, N Iu; Guseva, N P; Vinogradova, N A; Kutyrev, V V
2012-01-01
The aims were to develop a standard algorithm for molecular typing of Yersinia pestis that establishes the subspecies, biovar and focus membership of a studied isolate, and to determine the characteristic genotypes of strains of the plague agent of the main and non-main subspecies from various natural plague foci of the Russian Federation and the near abroad. Genotyping of 192 natural Y. pestis strains of the main and non-main subspecies was performed using PCR methods, multilocus sequencing and multilocus variable-number tandem repeat analysis (MLVA). A standard algorithm for molecular typing of the plague agent was developed that differentiates Y. pestis in several stages: by main and non-main subspecies, by biovar of the main subspecies, by specific subspecies, and by natural focus and geographic territory. The algorithm is based on 3 typing methods (PCR, multilocus sequence typing, and MLVA) using standard DNA targets: life support (housekeeping) genes (terC, ilvN, inv, glpD, napA, rhaS and araC) and 7 variable tandem repeat loci (ms01, ms04, ms06, ms07, ms46, ms62, ms70). The effectiveness of the developed algorithm was demonstrated on a large number of natural Y. pestis strains. Characteristic sequence types of Y. pestis strains of various subspecies and biovars, as well as MLVA7 genotypes of strains from natural plague foci of the Russian Federation and the near abroad, were established. Application of the developed algorithm will increase the effectiveness of epidemiologic monitoring of the plague agent and of the analysis of plague epidemics and outbreaks, including establishing the source of origin of a strain and the routes of introduction of the infection.
Featured Article: Nuclear export of opioid growth factor receptor is CRM1 dependent.
Kren, Nancy P; Zagon, Ian S; McLaughlin, Patricia J
2016-02-01
Opioid growth factor receptor (OGFr) facilitates growth inhibition in the presence of its specific ligand opioid growth factor (OGF), chemically termed [Met(5)]-enkephalin. The function of the OGF-OGFr axis requires the receptor to translocate to the nucleus. However, the mechanism of nuclear export of OGFr is unknown. In this study, endogenous OGFr, as well as exogenously expressed OGFr-EGFP, demonstrated significant nuclear accumulation in response to leptomycin B (LMB), an inhibitor of CRM1-dependent nuclear export, suggesting that OGFr is exported in a CRM1-dependent manner. One consensus sequence for a nuclear export signal (NES) was identified. Mutation of the associated leucines, L217, L220, L223 and L225, to alanine resulted in decreased nuclear accumulation. NES-EGFP responded to LMB, indicating that this sequence is capable of functioning as an export signal in isolation. To determine why the sequence functions differently in isolation than as a full-length protein, the localization of subNES was evaluated in the presence and absence of MG132, a potent inhibitor of proteasomal degradation. MG132 had no effect on subNES localization. The tandem repeats located at the C-terminus of OGFr were examined for their role in nuclear trafficking. Six of seven tandem repeats were removed to form deltaTR. DeltaTR localized exclusively to the nucleus, indicating that the tandem repeats may contribute to the localization of the receptor. Similar to the loss of cellular proliferation activity (i.e. inhibition) recorded with subNES, deltaTR also demonstrated a significant loss of inhibitory activity, indicating that the repeats may be integral to receptor function. These experiments reveal that OGFr contains one functional NES, comprising L217, L220, L223 and L225, and can be exported from the nucleus in a CRM1-dependent manner. © 2015 by the Society for Experimental Biology and Medicine.
Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A
2011-01-01
PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are tandem repeats of nucleotide motifs 1-6 bp in size and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin the phase variation and antigenic variation seen in some bacteria. Although SSR-mediated phase variation and antigenic variation have been well studied in some bacteria, many other prokaryotic species have yet to be investigated for SSR-mediated adaptive and other evolutionary advantages. As part of our ongoing studies on SSR polymorphism in prokaryotes, we compared the genome sequences of the various strains and isolates available for 85 different species of prokaryotes, extracted a number of SSRs showing length variation, and created a relational database called PSSRdb. This database gives useful information such as the location of PSSRs in genomes, length variation across genomes, the regions harboring PSSRs, etc. The information provided in this database is very useful for further research and analysis of SSRs in prokaryotes.
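The definition of SSRs used above (tandem repeats of 1-6 bp motifs) can be illustrated with a minimal regular-expression sketch that finds perfect repeats in a DNA string. The function name, the threshold of three copies, and the test sequences are assumptions made for illustration; this is not the pipeline used to build PSSRdb.

```python
import re

def find_ssrs(seq, min_copies=3, max_motif=6):
    """Return (start, motif, copies) for perfect tandem repeats.

    The motif is matched lazily, so the shortest repeating unit
    (1 to max_motif bp) is reported. Imperfect repeats are ignored.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        copies = len(m.group(0)) // len(motif)
        hits.append((m.start(), motif, copies))
    return hits

# Hypothetical sequences for illustration only.
print(find_ssrs("TTGACGACGACTT"))
print(find_ssrs("ACACACAC"))
```

Running the same function on the orthologous region of two strains and comparing the copy counts is one simple way to flag a length-polymorphic SSR.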
Poulin, L.; Grygiel, P.; Magne, M.; Rodriguez-R, L. M.; Forero Serna, N.; Zhao, S.; El Rafii, M.; Dao, S.; Tekete, C.; Wonni, I.; Koita, O.; Pruvost, O.; Verdier, V.; Vernière, C.
2014-01-01
Multilocus variable-number tandem-repeat analysis (MLVA) is efficient for routine typing and for investigating the genetic structures of natural microbial populations. Two distinct pathovars of Xanthomonas oryzae can cause significant crop losses in tropical and temperate rice-growing countries. Bacterial leaf streak is caused by X. oryzae pv. oryzicola, and bacterial leaf blight is caused by X. oryzae pv. oryzae. For the latter, two genetic lineages have been described in the literature. We developed a universal MLVA typing tool both for the identification of the three X. oryzae genetic lineages and for epidemiological analyses. Sixteen candidate variable-number tandem-repeat (VNTR) loci were selected according to their presence and polymorphism in 10 draft or complete genome sequences of the three X. oryzae lineages and by VNTR sequencing of a subset of loci of interest in 20 strains per lineage. The MLVA-16 scheme was then applied to 338 strains of X. oryzae representing different pathovars and geographical locations. Linkage disequilibrium between MLVA loci was calculated by index association on different scales, and the 16 loci showed linear Mantel correlation with MLSA data on 56 X. oryzae strains, suggesting that they provide a good phylogenetic signal. Furthermore, analyses of sets of strains for different lineages indicated the possibility of using the scheme for deeper epidemiological investigation on small spatial scales. PMID:25398857
Structural features of the rice chromosome 4 centromere.
Zhang, Yu; Huang, Yuchen; Zhang, Lei; Li, Ying; Lu, Tingting; Lu, Yiqi; Feng, Qi; Zhao, Qiang; Cheng, Zhukuan; Xue, Yongbiao; Wing, Rod A; Han, Bin
2004-01-01
A complete sequence of a chromosome centromere is necessary for fully understanding centromere function. We reported the sequence structures of the first complete rice chromosome centromere through sequencing a large insert bacterial artificial chromosome clone-based contig, which covered the rice chromosome 4 centromere. Complete sequencing of the 124-kb rice chromosome 4 centromere revealed that it consisted of 18 tracts of 379 tandemly arrayed repeats known as CentO and a total of 19 centromeric retroelements (CRs) but no unique sequences were detected. Four tracts, composed of 65 CentO repeats, were located in the opposite orientation, and 18 CentO tracts were flanked by 19 retroelements. The CRs were classified into four types, and the type I retroelements appeared to be more specific to rice centromeres. The preferential insert of the CRs among CentO repeats indicated that the centromere-specific retroelements may contribute to centromere expansion during evolution. The presence of three intact retrotransposons in the centromere suggests that they may be responsible for functional centromere initiation through a transcription-mediated mechanism.
De novo generation of plant centromeres at tandem repeats.
Teo, Chee How; Lermontova, Inna; Houben, Andreas; Mette, Michael Florian; Schubert, Ingo
2013-06-01
Artificial minichromosomes are highly desirable tools for basic research, breeding, and biotechnology purposes. We present an option to generate plant artificial minichromosomes via de novo engineering of plant centromeres in Arabidopsis thaliana by targeting kinetochore proteins to tandem repeat arrays at non-centromeric positions. We employed the bacterial lactose repressor/lactose operator system to guide derivatives of the centromeric histone H3 variant cenH3 to LacO operator sequences. Tethering of cenH3 to non-centromeric loci led to de novo assembly of kinetochore proteins and to dicentric carrier chromosomes which potentially form anaphase bridges. This approach will be further developed and may contribute to generating minichromosomes from preselected genomic regions, potentially even in a diploid background.
Zhou, Lijuan; Powell, Charles A.; Hoffman, Michele T.; Li, Wenbin; Fan, Guocheng; Liu, Bo; Lin, Hong; Duan, Yongping
2011-01-01
“Candidatus Liberibacter asiaticus” is a psyllid-transmitted, phloem-limited alphaproteobacterium and the most prevalent species of “Ca. Liberibacter” associated with a devastating worldwide citrus disease known as huanglongbing (HLB). Two related and hypervariable genes (hyvI and hyvII) were identified in the prophage regions of the Psy62 “Ca. Liberibacter asiaticus” genome. Sequence analyses of the hyvI and hyvII genes in 35 “Ca. Liberibacter asiaticus” DNA isolates collected globally revealed that the hyvI gene contains up to 12 nearly identical tandem repeats (NITRs, 132 bp) and 4 partial repeats, while hyvII contains up to 2 NITRs and 4 partial repeats and shares homology with hyvI. Frequent deletions or insertions of these repeats within the hyvI and hyvII genes were observed, none of which disrupted the open reading frames. Sequence conservation within the individual repeats but an extensive variation in repeat numbers, rearrangement, and the sequences flanking the repeat region indicate the diversity and plasticity of “Ca. Liberibacter asiaticus” bacterial populations in the world. These differences were found not only in samples of distinct geographical origins but also in samples from a single origin and even from a single “Ca. Liberibacter asiaticus”-infected sample. This is the first evidence of different “Ca. Liberibacter asiaticus” populations coexisting in a single HLB-affected sample. The Florida “Ca. Liberibacter asiaticus” isolates contain both hyvI and hyvII, while all other global “Ca. Liberibacter asiaticus” isolates contain either one or the other. Interclade assignments of the putative HyvI and HyvII proteins from Florida isolates with other global isolates in phylogenetic trees imply multiple “Ca. Liberibacter asiaticus” populations in the world and a multisource introduction of the “Ca. Liberibacter asiaticus” bacterium into Florida. PMID:21784907
Molecular architecture of classical cytological landmarks: Centromeres and telomeres
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meyne, J.
1994-11-01
Both the human telomere repeat and the pericentromeric repeat sequence (GGAAT)n were isolated based on evolutionary conservation. Their isolation was based on the premise that chromosomal features as structurally and functionally important as telomeres and centromeres should be highly conserved. Both sequences were isolated by high stringency screening of a human repetitive DNA library with rodent repetitive DNA. The pHuR library (plasmid Human Repeat) used for this project was enriched for repetitive DNA by using a modification of the standard DNA library preparation method. Usually DNA for a library is cut with restriction enzymes, packaged, infected, and the library is screened. A problem with this approach is that many tandem repeats don't have any (or many) common restriction sites. Therefore, many of the repeat sequences will not be represented in the library because they are not restricted to a viable length for the vector used. To prepare the pHuR library, human DNA was mechanically sheared to a small size. These relatively short DNA fragments were denatured and then renatured to C₀t 50. Theoretically only repetitive DNA sequences should renature under C₀t 50 conditions. The single-stranded regions were digested using S1 nuclease, leaving the double-stranded, renatured repeat sequences.
Holmes, A; Perry, N; Willshaw, G; Hanson, M; Allison, L
2015-01-01
Multi-locus variable number tandem repeat analysis (MLVA) is used in clinical and reference laboratories for subtyping verocytotoxin-producing Escherichia coli O157 (VTEC O157). However, as yet there is no common allelic or profile nomenclature to enable laboratories to easily compare data. In this study, we carried out an inter-laboratory comparison of an eight-loci MLVA scheme using a set of 67 isolates of VTEC O157. We found all but two isolates were identical in profile in the two laboratories, and repeat units were homogeneous in size but some were incomplete. A subset of the isolates (n = 17) were sequenced to determine the actual copy number of representative alleles, thereby enabling alleles to be named according to international consensus guidelines. This work has enabled us to realize the potential of MLVA as a portable, highly discriminatory and convenient subtyping method.
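How an MLVA allele is derived from a fragment size, and how two laboratories' profiles are compared, can be sketched as follows. The calculation assumes the common convention that copy number equals amplicon length minus flanking length, divided by repeat unit length. All locus names and sizes are hypothetical, and the study's own naming rules are not reproduced.

```python
def repeat_copies(amplicon_bp, flank_bp, unit_bp):
    """Return (whole_copies, leftover_bp) for one VNTR locus.

    A non-zero leftover suggests an incomplete repeat unit.
    """
    if amplicon_bp < flank_bp:
        raise ValueError("amplicon shorter than flanking sequence")
    return divmod(amplicon_bp - flank_bp, unit_bp)

def differing_loci(profile_a, profile_b):
    """List the loci at which two MLVA profiles disagree."""
    return [locus for locus in profile_a
            if profile_a[locus] != profile_b.get(locus)]

# Hypothetical fragment sizes (bp) for two loci, illustration only.
lab1 = {
    "locus1": repeat_copies(200, 140, 6)[0],
    "locus2": repeat_copies(263, 200, 7)[0],
}
lab2 = {"locus1": 10, "locus2": 8}

print(lab1, differing_loci(lab1, lab2))
print(repeat_copies(203, 140, 6))  # leftover bp flag a partial repeat
```

Reporting the leftover base pairs separately keeps incomplete repeats visible instead of hiding them in a rounded copy number.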
TRStalker: an efficient heuristic for finding fuzzy tandem repeats.
Pellegrini, Marco; Renda, M Elena; Vecchio, Alessio
2010-06-15
Genomes in higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that are originated via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While for TRs with a low or medium level of divergence the current methods are rather effective, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. The detection of fuzzy TRs is propaedeutic to enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events. We have developed an algorithm (christened TRStalker) with the aim of detecting efficiently TRs that are hard to detect because of their inherent fuzziness, due to high levels of base substitutions, insertions and deletions. To attain this goal, we developed heuristics to solve a Steiner version of the problem for which the fuzziness is measured with respect to a motif string not necessarily present in the input string. This problem is akin to the 'generalized median string' that is known to be an NP-hard problem. Experiments with both synthetic and biological sequences demonstrate that our method performs better than current state of the art for fuzzy TRs and that the fuzzy TRs of the type we detect are indeed present in important biological sequences. TRStalker will be integrated in the web-based TRs Discovery Service (TReaDS) at bioalgo.iit.cnr.it. Supplementary data are available at Bioinformatics online.
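The idea of measuring fuzziness against a motif that need not occur in the input can be shown with a toy example. The sketch below is not the TRStalker algorithm. It assumes a known period and substitutions only (no insertions or deletions), builds a column-wise majority consensus, and sums the Hamming distances of the copies to it. All names and the example sequence are hypothetical.

```python
from collections import Counter

def split_copies(seq, period):
    """Cut seq into consecutive, non-overlapping copies of length period."""
    return [seq[i:i + period]
            for i in range(0, len(seq) - period + 1, period)]

def consensus(copies):
    """Column-wise majority base across equally long copies."""
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*copies))

def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def fuzziness(seq, period):
    """Return (consensus motif, total Hamming distance of copies to it)."""
    copies = split_copies(seq, period)
    motif = consensus(copies)
    return motif, sum(hamming(c, motif) for c in copies)

# Hypothetical fuzzy repeat: three diverged copies of a 4-bp unit.
motif, total = fuzziness("ACGATCGTAGGT", 4)
print(motif, total)
```

In this example the consensus differs from every observed copy, which is the sense in which the reference motif is not necessarily present in the input string.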
Novel variants of the 5S rRNA genes in Eruca sativa.
Singh, K; Bhatia, S; Lakshmikumaran, M
1994-02-01
The 5S ribosomal RNA (rRNA) genes of Eruca sativa were cloned and characterized. They are organized into clusters of tandemly repeated units. Each repeat unit consists of a 119-bp coding region followed by a noncoding spacer region that separates it from the coding region of the next repeat unit. Our study reports novel gene variants of the 5S rRNA genes in plants. Two families of the 5S rDNA, the 0.5-kb size family and the 1-kb size family, coexist in the E. sativa genome. The 0.5-kb size family consists of the 5S rRNA genes (S4) that have coding regions similar to those of other reported plant 5S rDNA sequences, whereas the 1-kb size family consists of the 5S rRNA gene variants (S1) that exist as 1-kb BamHI tandem repeats. S1 is made up of two variant units (V1 and V2) of 5S rDNA where the BamHI site between the two units is mutated. Sequence heterogeneity among S4, V1, and V2 units exists throughout the sequence and is not limited to the noncoding spacer region only. The coding regions of V1 and V2 show approximately 20% dissimilarity to the coding regions of S4 and other reported plant 5S rDNA sequences. Such a large variation in the coding regions of the 5S rDNA units within the same plant species has been observed for the first time. Restriction site variation is observed between the two size classes of 5S rDNA in E. sativa.(ABSTRACT TRUNCATED AT 250 WORDS)
Vandersmissen, Liesbeth; De Buck, Emmy; Saels, Veerle; Coil, David A; Anné, Jozef
2010-05-01
Legionella pneumophila is a Gram-negative, facultative intracellular pathogen and the causative agent of Legionnaires' disease, a severe pneumonia in humans. Analysis of the Legionella sequenced genomes revealed a gene with a variable number of tandem repeats (VNTRs), whose number varies between strains. We examined the strain distribution of this gene among a collection of 108 clinical, environmental and hot spring serotype I strains. Twelve variants were identified, but no correlation was observed between the number of repeat units and clinical and environmental strains. The encoded protein contains the C-terminal consensus motif of outer membrane proteins and has a large region of collagen-like repeats that is encoded by the VNTR region. We have therefore annotated this protein Lcl for Legionella collagen-like protein. Lcl was shown to contribute to the adherence and invasion of host cells and it was demonstrated that the number of repeat units present in lcl had an influence on these adhesion characteristics.
Simultaneous Differentiation and Typing of Entamoeba histolytica and Entamoeba dispar
Zaki, Mehreen; Meelu, Parool; Sun, Wei; Clark, C. Graham
2002-01-01
Sequences corresponding to some of the polymorphic loci previously reported from Entamoeba histolytica have been detected in Entamoeba dispar. Comparison of nucleotide sequences of two loci between E. dispar strain SAW760 and E. histolytica strain HM-1:IMSS revealed significant differences in both repeat and flanking regions. The tandem repeat units varied not only in sequence but also in number and arrangement between the two species at both the loci. Using the sequences obtained, primer pairs aimed at amplifying species-specific products were designed and tested on a variety of E. histolytica and E. dispar samples. Amplification results were in complete agreement with the original species classification in all cases, and the PCR products displayed discernible size and pattern variations among the isolates. PMID:11923344
Herdewyn, Sarah; Zhao, Hui; Moisse, Matthieu; Race, Valérie; Matthijs, Gert; Reumers, Joke; Kusters, Benno; Schelhaas, Helenius J; van den Berg, Leonard H; Goris, An; Robberecht, Wim; Lambrechts, Diether; Van Damme, Philip
2012-06-01
Motor neuron degeneration in amyotrophic lateral sclerosis (ALS) has a familial cause in 10% of patients. Despite significant advances in the genetics of the disease, many families remain unexplained. We performed whole-genome sequencing in five family members from a pedigree with autosomal-dominant classical ALS. A family-based elimination approach was used to identify novel coding variants segregating with the disease. This list of variants was effectively shortened by genotyping these variants in 2 additional unaffected family members and 1500 unrelated population-specific controls. A novel rare coding variant in SPAG8 on chromosome 9p13.3 segregated with the disease and was not observed in controls. Mutations in SPAG8 were not encountered in 34 other unexplained ALS pedigrees, including 1 with linkage to chromosome 9p13.2-23.3. The shared haplotype containing the SPAG8 variant in this small pedigree was 22.7 Mb and overlapped with the core 9p21 linkage locus for ALS and frontotemporal dementia. Based on differences in coverage depth of known variable tandem repeat regions between affected and non-affected family members, the shared haplotype was found to contain an expanded hexanucleotide (GGGGCC)(n) repeat in C9orf72 in the affected members. Our results demonstrate that rare coding variants identified by whole-genome sequencing can tag a shared haplotype containing a non-coding pathogenic mutation and that changes in coverage depth can be used to reveal tandem repeat expansions. It also confirms (GGGGCC)n repeat expansions in C9orf72 as a cause of familial ALS.
Schmidt, Mirko H H; Broll, Rainer; Bruch, Hans-Peter; Duchrow, Michael
2002-11-01
The Ki-67 antigen, pKi-67, is one of the most commonly used markers of proliferating cells. The protein can only be detected in dividing cells (G(1)-, S-, G(2)-, and M-phase) but not in quiescent cells (G(0)). The standard antibody to detect pKi-67 is MIB-1, which detects the so-called 'Ki-67 motif' FKELF in 9 of the protein's 16 tandem repeats. To investigate the function of these repeats we expressed three of them in an inducible gene expression system in HeLa cells. Surprisingly, addition of a nuclear localization sequence led to a complete absence of signal in the nuclei of MIB-1-stained cells. At the same time, antibodies directed against different epitopes of pKi-67 did not fail to detect the protein. We conclude that the overexpression of the 'Ki-67 motif', which is present in the repeats, can lead to inability of MIB-1 to detect its antigen, as demonstrated in adenocarcinoma tissue samples. Therefore, in order to prevent the underestimation of Ki-67 proliferation indices in MIB-1-labeled preparations, additional antibodies (for example, MIB-21) should be used. Additionally, we could show in a mammalian two-hybrid assay that recombinant pKi-67 repeats are capable of self-associating with endogenous pKi-67. Speculating that the tandem repeats are intimately involved in its protein-protein interactions, this offers new insights into how access to these repeats is regulated by pKi-67 itself.
Centromeres: long intergenic spaces with adaptive features.
Kanizay, Lisa; Dawe, R Kelly
2009-08-01
Centromeres are composed of inner kinetochore proteins, which are largely conserved across species, and repetitive DNA, which shows comparatively little sequence conservation. Due to this fundamental paradox the formation and maintenance of centromeres remains largely a mystery. However, it has become increasingly clear that a long-standing balance between epigenetic and genetic control governs the interactions of centromeric DNA and inner kinetochore proteins. The comparison of classical neocentromeres in plants, which are entirely genetic in their mode of operation, and clinical neocentromeres, which are sequence-independent, illustrates the conflict between genetics and epigenetics in regions that control their own transmission to progeny. Tandem repeat arrays present in centromeres may have an origin in meiotic drive or other selfish patterns of evolution, as is the case for the CENP-B box and CENP-B protein in human. In grasses retrotransposons have invaded centromeres to the point of complete domination, consequently breaking genetic regulation at these centromeres. The accumulation of tandem repeats and transposons causes centromeres to expand in size, effectively pushing genes to the sides and opening the centromere to ever fewer constraints on the DNA sequence. On genetic maps centromeres appear as long intergenic spaces that evolve rapidly and apparently without regard to host fitness.
Nagaki, Kiyotaka; Shibata, Fukashi; Kanatani, Asaka; Kashihara, Kazunari; Murata, Minoru
2012-04-01
The centromere is a multi-functional complex comprising centromeric DNA and a number of proteins. To isolate unidentified centromeric DNA sequences, centromere-specific histone H3 variants (CENH3) and chromatin immunoprecipitation (ChIP) have been utilized in some plant species. However, anti-CENH3 antibody for ChIP must be raised in each species because of its species specificity. Production of the antibodies is time-consuming and costly, and it is not easy to produce ChIP-grade antibodies. In this study, we applied a HaloTag7-based chromatin affinity purification system to isolate centromeric DNA sequences in tobacco. This system required no specific antibody, and made it possible to apply a highly stringent wash to remove contaminated DNA. As a result, we succeeded in isolating five tandem repetitive DNA sequences in addition to the centromeric retrotransposons that were previously identified by ChIP. Three of the tandem repeats were centromere-specific sequences located on different chromosomes. These results confirm the validity of the HaloTag7-based chromatin affinity purification system as an alternative method to ChIP for isolating unknown centromeric DNA sequences. The discovery of more than two chromosome-specific centromeric DNA sequences indicates the mosaic structure of tobacco centromeres. © Springer-Verlag 2011
Length Variation in Mitochondrial DNA of the Minnow Cyprinella Spiloptera
Broughton, R. E.; Dowling, T. E.
1994-01-01
Length differences in animal mitochondrial DNA (mtDNA) are common, frequently due to variation in copy number of direct tandem duplications. While such duplications appear to form without great difficulty in some taxonomic groups, they appear to be relatively short-lived, as typical duplication products are geographically restricted within species and infrequently shared among species. To better understand such length variation, we have studied a tandem and direct duplication of approximately 260 bp in the control region of the cyprinid fish, Cyprinella spiloptera. Restriction site analysis of 38 individuals was used to characterize population structure and the distribution of variation in repeat copy number. This revealed two length variants, including individuals with two or three copies of the repeat, and little geographic structure among populations. No standard length (single copy) genomes were found and heteroplasmy, a common feature of length variation in other taxa, was absent. Nucleotide sequence of tandem duplications and flanking regions localized duplication junctions in the phenylalanine tRNA and near the origin of replication. The locations of these junctions and the stability of folded repeat copies support the hypothesized importance of secondary structures in models of duplication formation. PMID:8001785
Pichia stipitis genomics, transcriptomics, and gene clusters
Thomas W. Jeffries; Jennifer R. Headman Van Vleet
2009-01-01
Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...
de Lange, Orlando; Wolf, Christina; Dietze, Jörn; Elsaesser, Janett; Morbitzer, Robert; Lahaye, Thomas
2014-01-01
The tandem repeats of transcription activator like effectors (TALEs) mediate sequence-specific DNA binding using a simple code. Naturally, TALEs are injected by Xanthomonas bacteria into plant cells to manipulate the host transcriptome. In the laboratory TALE DNA binding domains are reprogrammed and used to target a fused functional domain to a genomic locus of choice. Research into the natural diversity of TALE-like proteins may provide resources for the further improvement of current TALE technology. Here we describe TALE-like proteins from the endosymbiotic bacterium Burkholderia rhizoxinica, termed Bat proteins. Bat repeat domains mediate sequence-specific DNA binding with the same code as TALEs, despite less than 40% sequence identity. We show that Bat proteins can be adapted for use as transcription factors and nucleases and that sequence preferences can be reprogrammed. Unlike TALEs, the core repeats of each Bat protein are highly polymorphic. This feature allowed us to explore alternative strategies for the design of custom Bat repeat arrays, providing novel insights into the functional relevance of non-RVD residues. The Bat proteins offer fertile grounds for research into the creation of improved programmable DNA-binding proteins and comparative insights into TALE-like evolution. PMID:24792163
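The "simple code" mentioned above is usually described as a lookup from each repeat's repeat-variable diresidue (RVD) to one DNA base. The sketch below uses the four widely reported canonical RVD-base associations (NI for A, HD for C, NG for T, NN for G). These are not stated in the abstract, real specificities are less strict than a one-to-one table, and the function names are hypothetical.

```python
# Canonical RVD-to-base associations as commonly reported for TALEs.
# Simplification: NN is also reported to tolerate A in practice.
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}
BASE_TO_RVD = {base: rvd for rvd, base in RVD_TO_BASE.items()}

def predicted_target(rvds):
    """Translate an ordered list of RVDs into the predicted DNA target."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvds)

def rvds_for_target(target):
    """Choose one RVD per base for a desired target sequence."""
    return [BASE_TO_RVD[base] for base in target.upper()]

# Hypothetical four-repeat array, illustration only.
print(predicted_target(["NI", "HD", "NG", "NN"]))
print(rvds_for_target("GATC"))
```

Reprogramming a repeat array then amounts to choosing the RVD order that spells out the desired target sequence.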
Isolation of human simple repeat loci by hybridization selection.
Armour, J A; Neumann, R; Gobert, S; Jeffreys, A J
1994-04-01
We have isolated short tandem repeat arrays from the human genome, using a rapid method involving filter hybridization to enrich for tri- or tetranucleotide tandem repeats. About 30% of clones from the enriched library cross-hybridize with probes containing trimeric or tetrameric tandem arrays, facilitating the rapid isolation of large numbers of clones. In an initial analysis of 54 clones, 46 different tandem arrays were identified. Analysis of these tandem repeat loci by PCR showed that 24 were polymorphic in length; substantially higher levels of polymorphism were displayed by the tetrameric repeat loci isolated than by the trimeric repeats. Primary mapping of these loci by linkage analysis showed that they derive from 17 chromosomes, including the X chromosome. We anticipate the use of this strategy for the efficient isolation of tandem repeats from other sources of genomic DNA, including DNA from flow-sorted chromosomes, and from other species.
Li, Zheng; Wang, Shu; Gui, Xiao-Ling; Chang, Xiao-Bei; Gong, Zhen-Hui
2013-01-01
Mature pepper (Capsicum sp.) fruits come in a variety of colors, including red, orange, yellow, brown, and white. To better understand the genetic and regulatory relationships between the yellow fruit phenotype and the capsanthin-capsorubin synthase gene (Ccs), we examined 156 Capsicum varieties, most of which were collected from Northwest Chinese landraces. A new ccs variant was identified in the yellow fruit cultivar CK7. Cluster analysis revealed that CK7, which belongs to the C. annuum species, has low genetic similarity to other yellow C. annuum varieties. In the coding sequence of this ccs allele, we detected a premature stop codon derived from a C to G change, as well as a downstream frame-shift caused by a 1-bp nucleotide deletion. In addition, the expression of the gene was detected in mature CK7 fruit. Furthermore, the promoter sequences of Ccs from some pepper varieties were examined, and we detected a 176-bp tandem repeat sequence in the promoter region. In all C. annuum varieties examined in this study, the repeat number was three, compared with four in two C. chinense accessions. The sequence similarity ranged from 84.8% to 97.7% among the four types of repeats, and some putative cis-elements were also found in every repeat. This suggests that the transcriptional regulation of Ccs expression is complex. Based on the analysis of the novel C. annuum mutation reported here, along with the studies of three mutation types in yellow C. annuum and C. chinense accessions, we suggest that the mechanism leading to the production of yellow fruit may not be as complex as that leading to orange fruit production. PMID:23637942
CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats.
Grissa, Ibtissem; Vergnaud, Gilles; Pourcel, Christine
2007-07-01
Clustered regularly interspaced short palindromic repeats (CRISPRs) constitute a particular family of tandem repeats found in a wide range of prokaryotic genomes (half of eubacteria and almost all archaea). They consist of a succession of highly conserved regions (DR) varying in size from 23 to 47 bp, separated by similarly sized unique sequences (spacer) of usually viral origin. A CRISPR cluster is flanked on one side by an AT-rich sequence called the leader and assumed to be a transcriptional promoter. Recent studies suggest that this structure represents a putative RNA-interference-based immune system. Here we describe CRISPRFinder, a web service offering tools to (i) detect CRISPRs including the shortest ones (one or two motifs); (ii) define DRs and extract spacers; (iii) get the flanking sequences to determine the leader; (iv) blast spacers against Genbank database and (v) check if the DR is found elsewhere in prokaryotic sequenced genomes. CRISPRFinder is freely accessible at http://crispr.u-psud.fr/Server/CRISPRfinder.php.
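Step (ii) above, defining DRs and extracting spacers, can be illustrated with a small sketch. This is not CRISPRFinder's algorithm: it assumes the direct repeat (DR) is already known and occurs as exact copies, whereas real arrays may carry degenerate repeats. The function name and the toy sequences are hypothetical.

```python
def extract_spacers(sequence, direct_repeat):
    """Return the spacers lying between consecutive exact copies of a known DR."""
    # Collect the start of every non-overlapping DR occurrence.
    positions = []
    start = sequence.find(direct_repeat)
    while start != -1:
        positions.append(start)
        start = sequence.find(direct_repeat, start + len(direct_repeat))
    # A spacer is the stretch between the end of one DR and the start of the next.
    spacers = []
    for left, right in zip(positions, positions[1:]):
        spacers.append(sequence[left + len(direct_repeat):right])
    return spacers
```

A cluster with a single DR copy yields no spacers, and flanking sequence (such as the leader) is ignored because only inter-repeat segments are returned.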
Chobanu, D; Rudykh, I A; Riabinina, N L; Grechko, V V; Kramerov, D A; Darevskiĭ, I S
2002-01-01
The genetic relatedness of several bisexual and four unisexual lizards of the "Lacerta saxicola complex" was studied using monomer sequences of the complex-specific CLsat tandem repeats and anonymous RAPD markers. Genomes of parthenospecies were shown to include different satellite monomers. The structure of each such monomer is specific for a certain pair of bisexual species. This fact might be interpreted in favor of co-dominant inheritance of these markers during hybridogenesis of bisexual species. This idea is supported by the results obtained with RAPD markers; i.e., unisexual species genomes include only the loci characteristic of certain bisexual species. At the same time, in neither case do the parthenospecies possess specific, autoapomorphic loci that are absent from the bisexual species.
Hemalatha, G. R.; Rao, D. Satyanarayana; Guruprasad, L.
2007-01-01
We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region comprising less than 55-amino-acid residues that occur more than once in the protein sequence and sometimes present in tandem. A “domain” corresponds to a conserved region with greater than 55-amino-acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure. PMID:17538688
Molecular Dynamics Simulations of DNA-Free and DNA-Bound TAL Effectors
Wan, Hua; Hu, Jian-ping; Li, Kang-shun; Tian, Xu-hong; Chang, Shan
2013-01-01
TAL (transcriptional activator-like) effectors (TALEs) are DNA-binding proteins, containing a modular central domain that recognizes specific DNA sequences. Recently, the crystallographic studies of TALEs revealed the structure of DNA-recognition domain. In this article, molecular dynamics (MD) simulations are employed to study two crystal structures of an 11.5-repeat TALE, in the presence and absence of DNA, respectively. The simulated results indicate that the specific binding of RVDs (repeat-variable diresidues) with DNA leads to the markedly reduced fluctuations of tandem repeats, especially at the two ends. In the DNA-bound TALE system, the base-specific interaction is formed mainly by the residue at position 13 within a TAL repeat. Tandem repeats with weak RVDs are unfavorable for the TALE-DNA binding. These observations are consistent with experimental studies. By using principal component analysis (PCA), the dominant motions are open-close movements between the two ends of the superhelical structure in both DNA-free and DNA-bound TALE systems. The open-close movements are found to be critical for the recognition and binding of TALE-DNA based on the analysis of free energy landscape (FEL). The conformational analysis of DNA indicates that the 5′ end of DNA target sequence has more remarkable structural deformability than the other sites. Meanwhile, the conformational change of DNA is likely associated with the specific interaction of TALE-DNA. We further suggest that the arrangement of N-terminal repeats with strong RVDs may help in the design of efficient TALEs. This study provides some new insights into the understanding of the TALE-DNA recognition mechanism. PMID:24130757
An improved genome assembly uncovers prolific tandem repeats in Atlantic cod.
Tørresen, Ole K; Star, Bastiaan; Jentoft, Sissel; Reinar, William B; Grove, Harald; Miller, Jason R; Walenz, Brian P; Knight, James; Ekholm, Jenny M; Peluso, Paul; Edvardsen, Rolf B; Tooming-Klunderud, Ave; Skage, Morten; Lien, Sigbjørn; Jakobsen, Kjetill S; Nederbragt, Alexander J
2017-01-18
The first Atlantic cod (Gadus morhua) genome assembly, published in 2011, was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enable the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.
Ohshima, Chihiro; Takahashi, Hajime; Iwakawa, Ai; Kuda, Takashi; Kimura, Bon
2017-07-17
Listeria monocytogenes, which causes the foodborne illness listeriosis, infects humans and animals. Widely distributed in the environment, this bacterium is known to contaminate food products after being transmitted to factories via raw materials. To minimize the contamination of products by food pathogens, it is critical to identify and eliminate factory entry routes and pathways for the causative bacteria. High resolution melting analysis (HRMA) is a method that takes advantage of differences in DNA sequences and PCR product lengths that are reflected by the dissociation temperature. Through our research, we have developed a multiple locus variable-number tandem repeat analysis (MLVA) using HRMA as a simple and rapid method to differentiate L. monocytogenes isolates. To evaluate the method, we compared the ability of MLVA-HRMA, MLVA using capillary electrophoresis, and multilocus sequence typing (MLST) to discriminate between strains. The MLVA-HRMA method displayed greater discriminatory ability than MLST and MLVA using capillary electrophoresis, suggesting that the variation in the number of repeat units, along with mutations within the DNA sequence, was accurately reflected by the melting curve of HRMA. Rather than relying on DNA sequence analysis or high-resolution electrophoresis, the MLVA-HRMA method follows the same process as PCR up to the analysis step, suggesting a combination of speed and simplicity. Results of the MLVA-HRMA method can be shared between different laboratories. There are high expectations that this method will be adopted for regular inspections at food processing facilities in the near future. Copyright © 2017. Published by Elsevier B.V.
Plant centromere organization: a dynamic structure with conserved functions.
Ma, Jianxin; Wing, Rod A; Bennetzen, Jeffrey L; Jackson, Scott A
2007-03-01
Although the structural features of centromeres from most multicellular eukaryotes remain to be characterized, recent analyses of the complete sequences of two centromeric regions of rice, together with data from Arabidopsis thaliana and maize, have illuminated the considerable size variation and sequence divergence of plant centromeres. Despite the severe suppression of meiotic chromosomal exchange in centromeric and pericentromeric regions of rice, the centromere core shows high rates of unequal homologous recombination in the absence of chromosomal exchange, resulting in frequent and extensive DNA rearrangement. Not only is the sequence of centromeric tandem and non-tandem repeats highly variable but also the copy number, spacing, order and orientation, providing ample natural variation as the basis for selection of superior centromere performance. This review article focuses on the structural and evolutionary dynamics of plant centromere organization and the potential molecular mechanisms responsible for the rapid changes of centromeric components.
Repeat-containing protein effectors of plant-associated organisms
Mesarich, Carl H.; Bowen, Joanna K.; Hamiaux, Cyril; Templeton, Matthew D.
2015-01-01
Many plant-associated organisms, including microbes, nematodes, and insects, deliver effector proteins into the apoplast, vascular tissue, or cell cytoplasm of their prospective hosts. These effectors function to promote colonization, typically by altering host physiology or by modulating host immune responses. The same effectors however, can also trigger host immunity in the presence of cognate host immune receptor proteins, and thus prevent colonization. To circumvent effector-triggered immunity, or to further enhance host colonization, plant-associated organisms often rely on adaptive effector evolution. In recent years, it has become increasingly apparent that several effectors of plant-associated organisms are repeat-containing proteins (RCPs) that carry tandem or non-tandem arrays of an amino acid sequence or structural motif. In this review, we highlight the diverse roles that these repeat domains play in RCP effector function. We also draw attention to the potential role of these repeat domains in adaptive evolution with regards to RCP effector function and the evasion of effector-triggered immunity. The aim of this review is to increase the profile of RCP effectors from plant-associated organisms. PMID:26557126
Multilocus Variable-Number Tandem Repeat Typing of Mycobacterium ulcerans
Ablordey, Anthony; Swings, Jean; Hubans, Christine; Chemlal, Karim; Locht, Camille; Portaels, Françoise; Supply, Philip
2005-01-01
The apparent genetic homogeneity of Mycobacterium ulcerans contributes to the poorly understood epidemiology of M. ulcerans infection. Here, we report the identification of variable number tandem repeat (VNTR) sequences as novel polymorphic elements in the genome of this species. A total of 19 potential VNTR loci identified in the closely related M. marinum genome sequence were screened in a collection of 23 M. ulcerans isolates, one Mycobacterium species referred to here as an intermediate species, and five M. marinum strains. Nine of the 19 loci were polymorphic in the three species (including the intermediate species) and revealed eight M. ulcerans and five M. marinum genotypes. The results from the VNTR analysis corroborated the genetic relationships of M. ulcerans isolates from various geographical origins, as defined by independent molecular markers. Although these results further highlight the extremely high clonal homogeneity within certain geographic regions, we report for the first time the discrimination of the two South American strains from Surinam and French Guyana. These findings support the potential of a VNTR-based genotyping method for strain discrimination within M. ulcerans and M. marinum. PMID:15814964
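In VNTR-based typing of this kind, each isolate is reduced to a profile of repeat copy numbers, one per locus, and isolates sharing a profile are assigned the same genotype. The sketch below illustrates that bookkeeping only; the function names, isolate names and repeat counts are hypothetical and are not data from the study.

```python
from collections import defaultdict

def group_by_genotype(profiles):
    """Group isolate names by their tuple of repeat counts across VNTR loci."""
    groups = defaultdict(list)
    for isolate, repeat_counts in profiles.items():
        groups[tuple(repeat_counts)].append(isolate)
    return dict(groups)

def loci_differing(profile_a, profile_b):
    """Count loci at which two profiles carry different repeat numbers."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)

# Hypothetical repeat counts at three loci for three made-up isolates.
example_profiles = {
    "isolate_A": [2, 1, 3],
    "isolate_B": [2, 1, 3],
    "isolate_C": [2, 2, 3],
}
genotypes = group_by_genotype(example_profiles)
```

Here `genotypes` contains two genotypes, and `loci_differing` shows that the third isolate differs from the others at a single locus.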
Mohammadi, Mohammad; Rasaee, Mohammad Javad; Rajabibazl, Masoumeh; Paknejad, Malihe; Zare, Mehrak; Mohammadzadeh, Sara
2007-08-01
PR81 is an anti-MUC1 monoclonal antibody (MAb) which was generated against human MUC1 mucin and which reacted with breast cancerous tissue, MUC1-positive cell lines (MCF-7, BT-20, and T-47D), and a synthetic peptide that includes the tandem repeat sequence of MUC1. Here we characterized the binding properties of PR81 against the tandem repeat of MUC1 by two different epitope mapping techniques, namely, PEPSCAN and phage display. Epitope mapping of PR81 MAb by PEPSCAN revealed a minimal consensus binding sequence, PDTRP, which is found on MUC1 peptide as the most important epitope. Using the phage display peptide library, we identified the motif PD(T/S/G)RP as an epitope and the motif AVGLSPDGSRGV as a mimotope recognized by PR81. Results of these two methods showed that the two residues, arginine and aspartic acid, have important roles in antibody binding and that threonine can be substituted by either glycine or serine. These results may be of importance in tailor-making antigens used in immunoassay.
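The motif PD(T/S/G)RP reported above maps directly onto a regular expression, which gives a quick way to scan peptide sequences for candidate linear epitope sites. This is an illustrative sketch only; the function name is an assumption, and a sequence match says nothing about actual binding affinity.

```python
import re

# PD(T/S/G)RP: Pro-Asp, then Thr, Ser or Gly, then Arg-Pro.
EPITOPE_MOTIF = re.compile(r"PD[TSG]RP")

def find_epitope_sites(peptide):
    """Return (start, matched_sequence) for each motif occurrence in a peptide."""
    return [(m.start(), m.group(0)) for m in EPITOPE_MOTIF.finditer(peptide.upper())]
```

Note that the mimotope AVGLSPDGSRGV does not contain the linear motif, which is consistent with a mimotope mimicking an epitope without reproducing its exact sequence.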
Phylogenomics of nonavian reptiles and the structure of the ancestral amniote genome
Shedlock, Andrew M.; Botka, Christopher W.; Zhao, Shaying; Shetty, Jyoti; Zhang, Tingting; Liu, Jun S.; Deschavanne, Patrick J.; Edwards, Scott V.
2007-01-01
We report results of a megabase-scale phylogenomic analysis of the Reptilia, the sister group of mammals. Large-scale end-sequence scanning of genomic clones of a turtle, alligator, and lizard reveals diverse, mammal-like landscapes of retroelements and simple sequence repeats (SSRs) not found in the chicken. Several global genomic traits, including distinctive phylogenetic lineages of CR1-like long interspersed elements (LINEs) and a paucity of A-T rich SSRs, characterize turtles and archosaur genomes, whereas higher frequencies of tandem repeats and a lower global GC content reveal mammal-like features in Anolis. Nonavian reptile genomes also possess a high frequency of diverse and novel 50-bp unit tandem duplications not found in chicken or mammals. The frequency distributions of ≈65,000 8-mer oligonucleotides suggest that rates of DNA-word frequency change are an order of magnitude slower in reptiles than in mammals. These results suggest a diverse array of interspersed and SSRs in the common ancestor of amniotes and a genomic conservatism and gradual loss of retroelements in reptiles that culminated in the minimalist chicken genome. PMID:17307883
Oshima, Masao; Kikuchi, Rie; Imamura, Jun; Handa, Hirokazu
2010-01-01
CMS (cytoplasmic male sterile) rapeseed is produced by asymmetrical somatic cell fusion between the Brassica napus cv. Westar and the Raphanus sativus Kosena CMS line (Kosena radish). The CMS rapeseed contains a CMS gene, orf125, which is derived from Kosena radish. Our sequence analyses revealed that the orf125 region in CMS rapeseed originated from recombination between the orf125/orfB region and the nad1C/ccmFN1 region by way of a 63 bp repeat. A precise sequence comparison among the related sequences in CMS rapeseed, Kosena radish and normal rapeseed showed that the orf125 region in CMS rapeseed consisted of the Kosena orf125/orfB region and the rapeseed nad1C/ccmFN1 region, even though Kosena radish had both the orf125/orfB region and the nad1C/ccmFN1 region in its mitochondrial genome. We also identified three tandem repeat sequences in the regions surrounding orf125, including a 63 bp repeat, which were involved in several recombination events. Interestingly, differences in the recombination activity for each repeat sequence were observed, even though these sequences were located adjacent to each other in the mitochondrial genome. We report results indicating that recombination events within the mitochondrial genomes are regulated at the level of specific repeat sequences depending on the cellular environment.
Okazaki, Satoshi; Schirripa, Marta; Loupakis, Fotios; Cao, Shu; Zhang, Wu; Yang, Dongyun; Ning, Yan; Berger, Martin D; Miyamoto, Yuji; Suenaga, Mitsukuni; Iqubal, Syma; Barzi, Afsaneh; Cremolini, Chiara; Falcone, Alfredo; Battaglin, Francesca; Salvatore, Lisa; Borelli, Beatrice; Helentjaris, Timothy G; Lenz, Heinz-Josef
2017-11-15
The hypermethylated in cancer 1/sirtuin 1 (HIC1/SIRT1) axis plays an important role in regulating the nucleotide excision repair pathway, which is the main oxaliplatin-induced damage-repair system. On the basis of prior evidence that the variable number of tandem repeat (VNTR) sequence located near the promoter lesion of HIC1 is associated with HIC1 gene expression, the authors tested the hypothesis that this VNTR is associated with clinical outcome in patients with metastatic colorectal cancer who receive oxaliplatin-based chemotherapy. Four independent cohorts were tested. Patients who received oxaliplatin-based chemotherapy served as the training cohort (n = 218), and those who received treatment without oxaliplatin served as the control cohort (n = 215). Two cohorts of patients who received oxaliplatin-based chemotherapy were used for validation studies (n = 176 and n = 73). The VNTR sequence near HIC1 was analyzed by polymerase chain reaction analysis and gel electrophoresis and was tested for associations with the response rate, progression-free survival, and overall survival. In the training cohort, patients who harbored at least 5 tandem repeats (TRs) in both alleles had a significantly shorter PFS compared with those who had fewer than 4 TRs in at least 1 allele (9.5 vs 11.6 months; hazard ratio, 1.93; P = .012), and these findings remained statistically significant after multivariate analysis (hazard ratio, 2.00; 95% confidence interval, 1.13-3.54; P = .018). This preliminary association was confirmed in the validation cohort, and patients who had at least 5 TRs in both alleles had a worse PFS compared with the other cohort (7.9 vs 9.8 months; hazard ratio, 1.85; P = .044). The current findings suggest that the VNTR sequence near HIC1 could be a predictive marker for oxaliplatin-based chemotherapy in patients with metastatic colorectal cancer. Cancer 2017;123:4506-14. © 2017 American Cancer Society.
Crystal structure of tandem type III fibronectin domains from Drosophila neuroglian at 2.0 A.
Huber, A H; Wang, Y M; Bieber, A J; Bjorkman, P J
1994-04-01
We report the crystal structure of two adjacent fibronectin type III repeats from the Drosophila neural cell adhesion molecule neuroglian. Each domain consists of two antiparallel beta sheets and is folded topologically identically to single fibronectin type III domains from the extracellular matrix proteins tenascin and fibronectin. Beta bulges and left-handed polyproline II helices disrupt the regular beta sheet structure of both neuroglian domains. The hydrophobic interdomain interface includes a metal-binding site, presumably involved in stabilizing the relative orientation between domains and predicted by sequence comparison to be present in the vertebrate homolog molecule L1. The neuroglian domains are related by a near perfect 2-fold screw axis along the longest molecular dimension. Using this relationship, a model for arrays of tandem fibronectin type III repeats in neuroglian and other molecules is proposed.
Han, Yonghua; Wang, Guixiang; Liu, Zhao; Liu, Jinhua; Yue, Wei; Song, Rentao; Zhang, Xueyong; Jin, Weiwei
2010-02-01
Knowledge about the composition and structure of centromeres is critical for understanding how centromeres perform their functional roles. Here, we report the sequences of one centromere-associated bacterial artificial chromosome clone from a Coix lacryma-jobi library. Two Ty3/gypsy-class retrotransposons, centromeric retrotransposon of C. lacryma-jobi (CRC) and peri-centromeric retrotransposon of C. lacryma-jobi, and a (peri)centromere-specific tandem repeat with a unit length of 153 bp were identified. The CRC is highly homologous to centromere-specific retrotransposons reported in grass species. An 80-bp DNA region in the 153-bp satellite repeat was found to be conserved in centromeric satellite repeats from maize, rice, and pearl millet. Fluorescence in situ hybridization showed that the three repetitive sequences were located in (peri-)centromeric regions of both C. lacryma-jobi and Coix aquatica. However, the 153-bp satellite repeat was only detected on 20 out of the 30 chromosomes in C. aquatica. Immunostaining with an antibody against rice CENH3 indicates that the 153-bp satellite repeat and CRC might both be major components of functional centromeres, but not all the 153-bp satellite repeats or CRC sequences are associated with CENH3. The evolution of centromeric repeats of C. lacryma-jobi during polyploidization is discussed.
The Evolution of Dark Matter in the Mitogenome of Seed Beetles
Sayadi, Ahmed; Immonen, Elina; Tellgren-Roth, Christian
2017-01-01
Animal mitogenomes are generally thought of as being economic and optimized for rapid replication and transcription. We use long-read sequencing technology to assemble the remarkable mitogenomes of four species of seed beetles. These are the largest circular mitogenomes ever assembled in insects, ranging from 24,496 to 26,613 bp in total length, and are exceptional in that some 40% consists of non-coding DNA. The size expansion is due to two very long intergenic spacers (LIGSs), rich in tandem repeats. The two LIGSs are present in all species but vary greatly in length (114–10,408 bp), show very low sequence similarity, divergent tandem repeat motifs, a very high AT content and concerted length evolution. The LIGSs have been retained for at least some 45 my but must have undergone repeated reductions and expansions, despite strong purifying selection on protein coding mtDNA genes. The LIGSs are located in two intergenic sites where a few recent studies of insects have also reported shorter LIGSs (>200 bp). These sites may represent spaces that tolerate neutral repeat array expansions or, alternatively, the LIGSs may function to allow a more economic translational machinery. Mitochondrial respiration in adult seed beetles is based almost exclusively on fatty acids, which reduces the need for building complex I of the oxidative phosphorylation pathway (NADH dehydrogenase). One possibility is thus that the LIGSs may allow depressed transcription of NAD genes. RNA sequencing showed that LIGSs are partly transcribed and transcriptional profiling suggested that all seven mtDNA NAD genes indeed show low levels of transcription and co-regulation of transcription across sexes and tissues. PMID:29048527
de Lange, Orlando; Wolf, Christina; Dietze, Jörn; Elsaesser, Janett; Morbitzer, Robert; Lahaye, Thomas
2014-06-01
The tandem repeats of transcription activator like effectors (TALEs) mediate sequence-specific DNA binding using a simple code. Naturally, TALEs are injected by Xanthomonas bacteria into plant cells to manipulate the host transcriptome. In the laboratory TALE DNA binding domains are reprogrammed and used to target a fused functional domain to a genomic locus of choice. Research into the natural diversity of TALE-like proteins may provide resources for the further improvement of current TALE technology. Here we describe TALE-like proteins from the endosymbiotic bacterium Burkholderia rhizoxinica, termed Bat proteins. Bat repeat domains mediate sequence-specific DNA binding with the same code as TALEs, despite less than 40% sequence identity. We show that Bat proteins can be adapted for use as transcription factors and nucleases and that sequence preferences can be reprogrammed. Unlike TALEs, the core repeats of each Bat protein are highly polymorphic. This feature allowed us to explore alternative strategies for the design of custom Bat repeat arrays, providing novel insights into the functional relevance of non-RVD residues. The Bat proteins offer fertile grounds for research into the creation of improved programmable DNA-binding proteins and comparative insights into TALE-like evolution. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Phillips, Anastasia; Sotomayor, Cristina; Wang, Qinning; Holmes, Nadine; Furlong, Catriona; Ward, Kate; Howard, Peter; Octavia, Sophie; Lan, Ruiting; Sintchenko, Vitali
2016-09-15
Salmonella Typhimurium (STM) is an important cause of foodborne outbreaks worldwide. Subtyping of STM remains critical to outbreak investigation, yet current techniques (e.g. multilocus variable number tandem repeat analysis, MLVA) may provide insufficient discrimination. Whole genome sequencing (WGS) offers potentially greater discriminatory power to support infectious disease surveillance. We performed WGS on 62 STM isolates of a single, endemic MLVA type associated with two epidemiologically independent, food-borne outbreaks along with sporadic cases in New South Wales, Australia, during 2014. Genomes of case and environmental isolates were sequenced using HiSeq (Illumina) and the genetic distance between them was assessed by single nucleotide polymorphism (SNP) analysis. SNP analysis was compared to the epidemiological context. The WGS analysis supported epidemiological evidence and genomes of within-outbreak isolates were nearly identical. Sporadic cases differed from outbreak cases by a small number of SNPs, although their close relationship to outbreak cases may represent an unidentified common food source that may warrant further public health follow up. Previously unrecognised mini-clusters were detected. WGS of STM can discriminate foodborne community outbreaks within a single endemic MLVA clone. Our findings support the translation of WGS into public health laboratory surveillance of salmonellosis.
Chromosome ends: different sequences may provide conserved functions.
Louis, Edward J; Vershinin, Alexander V
2005-07-01
The structures of specific chromosome regions, centromeres and telomeres, present a number of puzzles. As functions performed by these regions are ubiquitous and essential, their DNA, proteins and chromatin structure are expected to be conserved. Recent studies of centromeric DNA from human, Drosophila and plant species have demonstrated that a hidden universal centromere-specific sequence is highly unlikely. The DNA of telomeres is more conserved consisting of a tandemly repeated 6-8 bp Arabidopsis-like sequence in a majority of organisms as diverse as protozoan, fungi, mammals and plants. However, there are alternatives to short DNA repeats at the ends of chromosomes and for telomere elongation by telomerase. Here we focus on the similarities and diversity that exist among the structural elements, DNA sequences and proteins, that make up terminal domains (telomeres and subtelomeres), and how organisms use these in different ways to fulfil the functions of end-replication and end-protection. Copyright (c) 2005 Wiley Periodicals, Inc.
The sequence and de novo assembly of the giant panda genome
Li, Ruiqiang; Fan, Wei; Tian, Geng; Zhu, Hongmei; He, Lin; Cai, Jing; Huang, Quanfei; Cai, Qingle; Li, Bo; Bai, Yinqi; Zhang, Zhihe; Zhang, Yaping; Wang, Wen; Li, Jun; Wei, Fuwen; Li, Heng; Jian, Min; Li, Jianwen; Zhang, Zhaolei; Nielsen, Rasmus; Li, Dawei; Gu, Wanjun; Yang, Zhentao; Xuan, Zhaoling; Ryder, Oliver A.; Leung, Frederick Chi-Ching; Zhou, Yan; Cao, Jianjun; Sun, Xiao; Fu, Yonggui; Fang, Xiaodong; Guo, Xiaosen; Wang, Bo; Hou, Rong; Shen, Fujun; Mu, Bo; Ni, Peixiang; Lin, Runmao; Qian, Wubin; Wang, Guodong; Yu, Chang; Nie, Wenhui; Wang, Jinhuan; Wu, Zhigang; Liang, Huiqing; Min, Jiumeng; Wu, Qi; Cheng, Shifeng; Ruan, Jue; Wang, Mingwei; Shi, Zhongbin; Wen, Ming; Liu, Binghang; Ren, Xiaoli; Zheng, Huisong; Dong, Dong; Cook, Kathleen; Shan, Gao; Zhang, Hao; Kosiol, Carolin; Xie, Xueying; Lu, Zuhong; Zheng, Hancheng; Li, Yingrui; Steiner, Cynthia C.; Lam, Tommy Tsan-Yuk; Lin, Siyuan; Zhang, Qinghui; Li, Guoqing; Tian, Jing; Gong, Timing; Liu, Hongde; Zhang, Dejin; Fang, Lin; Ye, Chen; Zhang, Juanbin; Hu, Wenbo; Xu, Anlong; Ren, Yuanyuan; Zhang, Guojie; Bruford, Michael W.; Li, Qibin; Ma, Lijia; Guo, Yiran; An, Na; Hu, Yujie; Zheng, Yang; Shi, Yongyong; Li, Zhiqiang; Liu, Qing; Chen, Yanling; Zhao, Jing; Qu, Ning; Zhao, Shancen; Tian, Feng; Wang, Xiaoling; Wang, Haiyin; Xu, Lizhi; Liu, Xiao; Vinar, Tomas; Wang, Yajun; Lam, Tak-Wah; Yiu, Siu-Ming; Liu, Shiping; Zhang, Hemin; Li, Desheng; Huang, Yan; Wang, Xia; Yang, Guohua; Jiang, Zhi; Wang, Junyi; Qin, Nan; Li, Li; Li, Jingxiang; Bolund, Lars; Kristiansen, Karsten; Wong, Gane Ka-Shu; Olson, Maynard; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun
2013-01-01
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes. PMID:20010809
Schnare, Murray N.; Collings, James C.; Spencer, David F.; Gray, Michael W.
2000-01-01
In Crithidia fasciculata, the ribosomal RNA (rRNA) gene repeats range in size from ∼11 to 12 kb. This length heterogeneity is localized to a region of the intergenic spacer (IGS) that contains tandemly repeated copies of a 19mer sequence. The IGS also contains four copies of an ∼55 nt repeat that has an internal inverted repeat and is also present in the IGS of Leishmania species. We have mapped the C.fasciculata transcription initiation site as well as two other reverse transcriptase stop sites that may be analogous to the A0 and A′ pre-rRNA processing sites within the 5′ external transcribed spacer (ETS) of other eukaryotes. Features that could influence processing at these sites include two stretches of conserved primary sequence and three secondary structure elements present in the 5′ ETS. We also characterized the C.fasciculata U3 snoRNA, which has the potential for base-pairing with pre-rRNA sequences. Finally, we demonstrate that biosynthesis of large subunit rRNA in both C.fasciculata and Trypanosoma brucei involves 3′-terminal addition of three A residues that are not present in the corresponding DNA sequences. PMID:10982863
Kawano, Mitsuoki; Oshima, Taku; Kasai, Hiroaki; Mori, Hirotada
2002-07-01
Genome sequence analyses of Escherichia coli K-12 revealed four copies of long repetitive elements. These sequences are designated as long direct repeat (LDR) sequences. Three of the repeats (LDR-A, -B, -C), each approximately 500 bp in length, are located as tandem repeats at 27.4 min on the genetic map. Another copy (LDR-D), 450 bp in length and nearly identical to LDR-A, -B and -C, is located at 79.7 min, a position that is directly opposite the position of LDR-A, -B and -C. In this study, we demonstrate that LDR-D encodes a 35-amino-acid peptide, LdrD, the overexpression of which causes rapid cell killing and nucleoid condensation of the host cell. Northern blot and primer extension analysis showed constitutive transcription of a stable mRNA (approximately 370 nucleotides) encoding LdrD and an unstable cis-encoded antisense RNA (approximately 60 nucleotides), which functions as a trans-acting regulator of ldrD translation. We propose that LDR encodes a toxin-antitoxin module. LDR-homologous sequences are not present on any known plasmids but are conserved in Salmonella and other enterobacterial species.
Resurgence of Pertussis and Emergence of the Ptxp3 Toxin Promoter Allele in South Italy.
Loconsole, Daniela; De Robertis, Anna Lisa; Morea, Anna; Metallo, Angela; Lopalco, Pier Luigi; Chironna, Maria
2018-05-01
Despite universal immunization programs, pertussis remains a major public health concern. This study aimed to describe the pertussis epidemiology in the Puglia region in 2006-2015 and to identify recent polymorphisms in Bordetella pertussis virulence-associated genes. The pertussis cases in 2006-2015 were identified from the National Hospital Discharge Database and the Information System of Infectious Diseases. Samples of pertussis cases in 2014-2016 that were confirmed by the Regional Reference Laboratory were subjected to ptxA, ptxP and prn gene sequencing and, in 10 cases, multiple-locus variable-number tandem repeat analysis. In Puglia in 2006-2015, the pertussis incidence rose from an average of 1.39/100,000 inhabitants in 2006-2013 to 2.56-2.54/100,000 in 2014-2015. In infants <1 year of age, the incidence rose from an average of 60.4/100,000 infants in 2006-2013 to 149.9/100,000 in 2015. Of the 661 cases recorded in 2006-2015, 80.3% required hospitalization; of these, 45.4% were <1 year of age. Of the 80 sequenced samples, the allelic profile ptxA1-ptxP3-prn2 was detected in 74. This variant was detected in both vaccinated and unvaccinated people. Six Bordetella pertussis samples were prn deficient. The cases subjected to multiple-locus variable-number tandem repeat analysis exhibited type 27. The pertussis incidence in Puglia has risen. The hypervirulent strain was also found in vaccinated people. This suggests bacterial adaptation to the vaccine and raises questions about acellular vaccine effectiveness. Prevention of infant pertussis cases is best achieved by immunizing the pregnant mother. Surveillance and systematic laboratory confirmation of pertussis should be enhanced in Italy.
Production of monoclonal antibodies recognising the peptide core of MUC2 intestinal mucin.
Durrant, L G; Jacobs, E; Price, M R
1994-01-01
A peptide based on the tandem repeat sequence of MUC2 mucin was used to produce a series of monoclonal antibodies (MAb). The fine specificity of these antibodies and their implications for MUC2 expression are presented. Three of the MAbs, 996/1, 996/7 and 995/25, were specific to the MUC2p and failed to bind to peptides based on the MUC1,3,4 tandem repeat sequences whereas three others, 994/152, 994/91 and 996/36, cross reacted with the MUC2p and the MUC3 tandem repeat peptide but not the MUC1 and MUC4 peptides. An antigen, affinity purified from a colorectal tumour on one of the MUC2p-specific MAbs, 996/1, was shown to be a high molecular weight polydisperse, mucin-like antigen. Two of the MAbs, 996/1 and 994/152, recognised MUC2 in tissue sections, although the fine specificity varied between the two MAbs, with 994/152 strongly staining gastric, ileum and kidney epithelia, and MAb 996/1 intensely staining colon, liver and prostate tissues. These antibodies also stained a colorectal cell line, and MAb 994/152 also stained a gastric and an ovarian cell line. Six of the MAbs were used to stain colorectal tumour and adjacent 'normal' colonic mucosa sections. All six stained normal mucosa, but only two of the MAbs, 996/1 and 994/91, stained tumour tissue. The staining probably reflects exposure of cryptic epitopes due to varying levels of glycosylation in different tissues. These anti-MUC2p MAbs may help in determining the normal role of MUC2 mucin and how it is subverted in malignancy.
Clonal origins of Vibrio cholerae O1 El Tor strains, Papua New Guinea, 2009-2011.
Horwood, Paul F; Collins, Deirdre; Jonduo, Marinjho H; Rosewell, Alexander; Dutta, Samir R; Dagina, Rosheila; Ropa, Berry; Siba, Peter M; Greenhill, Andrew R
2011-11-01
We used multilocus sequence typing and variable number tandem repeat analysis to determine the clonal origins of Vibrio cholerae O1 El Tor strains from an outbreak of cholera that began in 2009 in Papua New Guinea. The epidemic is ongoing, and transmission risk is elevated within the Pacific region.
Nagahashi, S; Endoh, H; Suzuki, Y; Okada, N
1991-11-20
A previous report from this laboratory showed that in vitro transcription of total genomic DNA of the newt Cynops pyrrhogaster resulted in a discrete sized 8 S RNA, which represented highly repetitive and transcribable sequences with a glutamic acid tRNA-like structure in the newt genome. We isolated four independent clones from a newt genomic library and determined the complete sequences of three 2000 to 2400 base-pair PstI fragments spanning the 8 S RNA gene. The glutamic acid tRNA-related segment in the 8 S RNA gene contains the CCA sequence expected as the 3' terminus of a tRNA molecule. Further, the 11 nucleotides located 13 nucleotides upstream from one of the two transcription initiation sites of the 8 S RNA were found to be repeated in the region upstream from the termination site, suggesting that the original unit, which is shorter than the 8 S RNA, was retrotransposed via cDNA intermediates from the PolIII transcript. In the upstream region of the 8 S RNA gene, a 360 nucleotide unit containing the glutamic acid tRNA-related segment was found to be duplicated (clones NE1 and NE10) or triplicated (clone NE3). Except for the difference in the number of the 360 nucleotide unit, the three sequences of the 2000 to 2400 base-pair PstI fragment were essentially the same with only a few mutations and minor deletions. Inverse polymerase chain reaction and sequence determination of the products, together with a Southern hybridization experiment, demonstrated that the family consists of a tandemly repeated unit of 3300, 3700 or 4100 base-pairs. Thus during evolution, this family in the newt was created by retroposition via cDNA intermediates, followed by duplication or triplication of the 360 nucleotide unit and multiplication of the 3300 to 4100 base-pair region at the DNA level.
Slack, Andrew T; Dohnt, Michael F; Symonds, Meegan L; Smythe, Lee D
2005-01-01
Background: Leptospirosis is a zoonotic disease caused by bacteria of the genus Leptospira. Leptospira interrogans is the most common genomospecies implicated in the disease. Epidemiological investigations are needed to distinguish outbreak situations or to trace reservoirs of the organisms. Current methodologies used for typing Leptospira have significant drawbacks. The development of an easy to perform yet high resolution method is needed for this organism. Methods: In this study we have searched the available genomic sequence of L. interrogans serovar Copenhageni strain Fiocruz L1-130 for the presence of tandem repeats [1]. These repeats were evaluated against reference strains for diversity. Six loci were selected to create a Multiple Locus Variable Number of Tandem Repeats (VNTR) Analysis (MLVA) to explore the genetic diversity within L. interrogans serovar Australis clinical isolates from Far North Queensland. Results: The 39 reference strains used for the development of the method displayed 39 distinct patterns. Diversity indexes for the loci varied between 0.80 and 0.93, and the number of repeat units at each locus ranged from less than one to 52 repeats. When the MLVA was applied to serovar Australis isolates, three large clusters were distinguishable, each comprising various hosts including Rattus species, humans and canines. Conclusion: The MLVA described in this report was easy to perform and analyse and was reproducible. The loci selected had high diversity, allowing discrimination between serovars and also between strains within a serovar. This method provides a starting point on which improvements to the method and comparisons to other techniques can be made. PMID:15987533
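A per-locus diversity index of the kind quoted above can be illustrated with a short sketch. The abstract does not state which index was used; the Hunter-Gaston (Simpson's) index of diversity is assumed here because it is common in MLVA work, and the function name `hunter_gaston` and the allele lists are hypothetical, not data from the study.

```python
from collections import Counter


def hunter_gaston(alleles):
    """Hunter-Gaston diversity index for one locus.

    `alleles` holds one repeat-copy number (or any hashable allele
    label) per strain. The index is the probability that two strains
    drawn without replacement carry different alleles.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two strains")
    counts = Counter(alleles)
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_pairs / (n * (n - 1))


# Hypothetical copy numbers at one locus for four strains.
example = [1, 1, 2, 2]
print(round(hunter_gaston(example), 3))
```

When every strain has a distinct pattern, as with the 39 reference strains giving 39 patterns, this index equals 1 for the combined profile.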
Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.
Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi
2017-07-01
PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.
Gupta, Rashmi; Mirdha, Bijay Ranjan; Guleria, Randeep; Kumar, Lalit; Luthra, Kalpana; Agarwal, Sanjay Kumar; Sreenivas, Vishnubhatla
2013-01-01
Pneumocystis jirovecii is an opportunistic pathogen that causes severe pneumonia in immunocompromised patients. To study the genetic diversity of P. jirovecii in India the upstream conserved sequence (UCS) region of Pneumocystis genome was amplified, sequenced and genotyped from a set of respiratory specimens obtained from 50 patients with a positive result for nested mitochondrial large subunit ribosomal RNA (mtLSU rRNA) PCR during the years 2005-2008. Of these 50 cases, 45 showed a positive PCR for UCS region. Variations in the tandem repeats in UCS region were characterized by sequencing all the positive cases. Of the 45 cases, one case showed five repeats, 11 cases showed four repeats, 29 cases showed three repeats and four cases showed two repeats. By running amplified DNA from all these cases on a high-resolution gel, mixed infection was observed in 12 cases (26.7%, 12/45). Forty three of 45 cases included in this study had previously been typed at mtLSU rRNA and internal transcribed spacer (ITS) region by our group. In the present study, the genotypes at those two regions were combined with UCS repeat patterns to construct allelic profiles of 43 cases. A total of 36 allelic profiles were observed in 43 isolates indicating high genetic variability. A statistically significant association was observed between mtLSU rRNA genotype 1, ITS type Ea and UCS repeat pattern 4. Copyright © 2012 Elsevier B.V. All rights reserved.
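The counts reported above can be checked with a few lines of arithmetic. This is only a consistency sketch using the numbers given in the abstract; the variable names are illustrative.

```python
# Number of cases observed for each UCS tandem-repeat count
# (repeat count -> cases), as reported in the abstract.
cases_by_repeat_count = {5: 1, 4: 11, 3: 29, 2: 4}

# The four categories should account for every UCS-positive case.
total_cases = sum(cases_by_repeat_count.values())

# Share of UCS-positive cases showing mixed infection on the gel.
mixed_cases = 12
mixed_percent = round(100 * mixed_cases / total_cases, 1)

print(total_cases, mixed_percent)
```

The categories sum to the 45 UCS-positive cases, and 12 of 45 rounds to the 26.7% quoted for mixed infections.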
Evolution and selection of Rhg1, a copy-number variant nematode-resistance locus
Lee, Tong Geon; Kumar, Indrajit; Diers, Brian W; Hudson, Matthew E
2015-01-01
The soybean cyst nematode (SCN) resistance locus Rhg1 is a tandem repeat of a 31.2 kb unit of the soybean genome. Each 31.2-kb unit contains four genes. One allele of Rhg1, Rhg1-b, is responsible for protecting most US soybean production from SCN. Whole-genome sequencing was performed, and PCR assays were developed to investigate allelic variation in sequence and copy number of the Rhg1 locus across a population of soybean germplasm accessions. Four distinct sequences of the 31.2-kb repeat unit were identified, and some Rhg1 alleles carry up to three different types of repeat unit. The total number of copies of the repeat varies from 1 to 10 per haploid genome. Both copy number and sequence of the repeat correlate with the resistance phenotype, and the Rhg1 locus shows strong signatures of selection. Significant linkage disequilibrium in the genome outside the boundaries of the repeat allowed the Rhg1 genotype to be inferred using high-density single nucleotide polymorphism genotyping of 15 996 accessions. Over 860 germplasm accessions were found likely to possess Rhg1 alleles. The regions surrounding the repeat show indications of non-neutral evolution and high genetic variability in populations from different geographic locations, but without evidence of fixation of the resistant genotype. A compelling explanation of these results is that balancing selection is in operation at Rhg1. PMID:25735447
Szymanski, Maciej; Karlowski, Wojciech M
2016-01-01
In eukaryotes, ribosomal 5S rRNAs are products of multigene families organized within clusters of tandemly repeated units. Accumulation of genomic data obtained from a variety of organisms demonstrated that the potential 5S rRNA coding sequences show a large number of variants, often incompatible with folding into a correct secondary structure. Here, we present results of an analysis of a large set of short RNA sequences generated by the next generation sequencing techniques, to address the problem of heterogeneity of the 5S rRNA transcripts in Arabidopsis and identification of potentially functional rRNA-derived fragments.
DeFranco, D; Yamamoto, K R
1986-01-01
The expression of genes fused downstream of the Moloney murine sarcoma virus (MoMSV) long terminal repeat is stimulated by glucocorticoids. We mapped the glucocorticoid response element that conferred this hormonal regulation and found that it is a hormone-dependent transcriptional enhancer, designated Sg; it resides within DNA fragments that also carry a previously described enhancer element (B. Levinson, G. Khoury, G. Vande Woude, and P. Gruss, Nature [London] 295:568-572, 1982), here termed Sa, whose activity is independent of the hormone. Nuclease footprinting revealed that purified glucocorticoid receptor bound at multiple discrete sites within and at the borders of the tandemly repeated sequence motif that defines Sa. The Sa and Sg activities stimulated the apparent efficiency of cognate or heterologous promoter utilization, individually providing modest enhancement and in concert yielding higher levels of activity. A deletion mutant lacking most of the tandem repeat but retaining a single receptor footprint sequence lost Sa activity but still conferred Sg activity. The two enhancer components could also be distinguished physiologically: both were operative within cultured rat fibroblasts, but only Sg activity was detectable in rat exocrine pancreas cells. Therefore, the sequence determinants of Sa and Sg activity may be interdigitated, and when both components are active, the receptor and a putative Sa factor can apparently bind and act simultaneously. We concluded that MoMSV enhancer activity is effected by at least two distinct binding factors, suggesting that combinatorial regulation of promoter function can be mediated even from a single genetic element. Images PMID:3023887
Aktas, Munir; Özübek, Sezayi
2018-04-01
Anaplasma ovis is a widely distributed tick-borne rickettsial pathogen of sheep, goats, and wild ruminants. The aims of this study were to assess the prevalence and associations of Anaplasma ovis in sheep and goats, as well as its genetic diversity based on analysis of the msp1α gene. A total of 416 DNA samples from sheep (n = 236) and goats (n = 180) from four provinces in southeastern Turkey were analyzed by PCR. The overall A. ovis prevalence was 18% (CI 14.4-22.1). The infection rates of A. ovis varied from 15.9% to 21.8% in sampled provinces, and they were not significantly different. There was no difference between Anaplasma ovis infection in sheep (20.3%, CI 15.4-26.0) and goats (15.0%, CI 10.1-21.1) or in infection rate of animals <1 year (21.8%, CI 14.9-30.1) compared to >1 year (16.4%, CI 12.4-21.2). A significant association between A. ovis infection and the presence of Rhipicephalus bursa and Rhipicephalus turanicus was observed (P < 0.05). Prevalence of A. ovis-positive animals was higher in animals showing co-infection with Babesia and Theileria compared to those not co-infected (P < 0.05). The Msp1a amino acid repeats were identified and used for the characterization of A. ovis strains. Forty partial msp1a gene sequences containing the repeated sequences of A. ovis were obtained, and 14 previously undescribed tandem repeats with 33 to 43 amino acids were found. Thirteen A. ovis genotypes were identified based on the structure of Msp1a tandem repeats. The majority of A. ovis isolates exhibited one Msp1a tandem repeat, with a maximum of three. This study revealed that Msp1a could be used as a marker for genotyping A. ovis and that A. ovis shows high genetic diversity in small ruminants in Turkey. Copyright © 2018 Elsevier B.V. All rights reserved.
Zhang, Yanan; Song, Tao; Pan, Tao; Sun, Xiaonan; Sun, Zhonglou; Qian, Lifu; Zhang, Baowei
2016-07-01
The complete sequence of the mitochondrial genome was determined for Asio flammeus, a geographically widespread species. The length of the complete mitochondrial genome was 18,966 bp, containing 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes (PCGs), and 1 non-coding region (D-loop). All the genes were distributed on the H-strand, except for the ND6 subunit gene and eight tRNA genes, which were encoded on the L-strand. The D-loop of A. flammeus contained many tandem repeats of varying lengths and repeat numbers. The molecular phylogeny placed A. flammeus as the sister group to A. capensis and supported the monophyly of Asio.
MSDB: A Comprehensive Database of Simple Sequence Repeats
Avvaru, Akshay Kumar; Saxena, Saketh; Mishra, Rakesh Kumar
2017-01-01
Microsatellites, also known as Simple Sequence Repeats (SSRs), are short tandem repeats of 1–6 nt motifs present in all genomes, particularly eukaryotes. Besides their usefulness as genome markers, SSRs have been shown to perform important regulatory functions, and variations in their length at coding regions are linked to several disorders in humans. Microsatellites show a taxon-specific enrichment in eukaryotic genomes, and some may be functional. MSDB (Microsatellite Database) is a collection of >650 million SSRs from 6,893 species including Bacteria, Archaea, Fungi, Plants, and Animals. This database is by far the most exhaustive resource to access and analyze SSR data of multiple species. In addition to exploring data in a customizable tabular format, users can view and compare the data of multiple species simultaneously using our interactive plotting system. MSDB is developed using the Django framework and MySQL. It is freely available at http://tdb.ccmb.res.in/msdb. PMID:28854643
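To make the definition of an SSR concrete, the following is a minimal sketch of a perfect-repeat scan using a regular expression. It is not the method used to build MSDB, which the abstract does not describe; the function name `find_ssrs`, the default threshold of three copies and the test sequence are all assumptions made for illustration.

```python
import re


def find_ssrs(seq, min_motif=1, max_motif=6, min_repeats=3):
    """Find perfect tandem repeats of short motifs in a DNA string.

    Returns a list of (start, motif, copies) tuples with 0-based
    start positions. The lazy quantifier makes the scan prefer the
    shortest motif that satisfies the copy threshold at a position.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Hypothetical sequence containing a (TA)4 dinucleotide repeat.
print(find_ssrs("GGTATATATACC"))
```

A production scan would normally use motif-length-specific copy thresholds and tolerate imperfect repeats; this sketch reports perfect repeats only.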
Clonal Origins of Vibrio cholerae O1 El Tor Strains, Papua New Guinea, 2009–2011
Collins, Deirdre; Jonduo, Marinjho H.; Rosewell, Alexander; Dutta, Samir R.; Dagina, Rosheila; Ropa, Berry; Siba, Peter M.; Greenhill, Andrew R.
2011-01-01
We used multilocus sequence typing and variable number tandem repeat analysis to determine the clonal origins of Vibrio cholerae O1 El Tor strains from an outbreak of cholera that began in 2009 in Papua New Guinea. The epidemic is ongoing, and transmission risk is elevated within the Pacific region. PMID:22099099
He, Xiaoyuan; Wang, Liqin; Wang, Shuishu
2016-04-15
The transcriptional regulator PhoP is an essential virulence factor in Mycobacterium tuberculosis, and it presents a target for the development of new anti-tuberculosis drugs and attenuated tuberculosis vaccine strains. PhoP binds to DNA as a highly cooperative dimer by recognizing direct repeats of 7-bp motifs with a 4-bp spacer. To elucidate the PhoP-DNA binding mechanism, we determined the crystal structure of the PhoP-DNA complex. The structure revealed a tandem PhoP dimer that bound to the direct repeat. The surprising tandem arrangement of the receiver domains allowed the four domains of the PhoP dimer to form a compact structure, accounting for the strict requirement of a 4-bp spacer and the highly cooperative binding of the dimer. The PhoP-DNA interactions exclusively involved the effector domain. The sequence-recognition helix made contact with the bases of the 7-bp motif in the major groove, and the wing interacted with the adjacent minor groove. The structure provides a starting point for the elucidation of the mechanism by which PhoP regulates the virulence of M. tuberculosis and guides the design of screening platforms for PhoP inhibitors.
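The site architecture described above, two copies of a 7-bp motif in direct orientation separated by a 4-bp spacer, can be expressed as a simple pattern search. The sketch below looks only for exact direct repeats; real binding sites are typically degenerate, and the function name `find_direct_repeats` and the example sequence are hypothetical rather than taken from the study.

```python
import re


def find_direct_repeats(seq, motif_len=7, spacer_len=4):
    """Find exact direct repeats: motif, spacer, then the same motif.

    Returns (start, motif, spacer) tuples with 0-based start
    positions. Only non-overlapping matches are reported.
    """
    pattern = re.compile(
        r"([ACGT]{%d})([ACGT]{%d})\1" % (motif_len, spacer_len)
    )
    return [
        (m.start(), m.group(1), m.group(2))
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical sequence: an arbitrary 7-bp motif repeated after a
# 4-bp spacer, with two flanking bases on each side.
site = "TT" + "TCACAGC" + "AAAA" + "TCACAGC" + "GG"
print(find_direct_repeats(site))
```

Searching for degenerate sites would usually rely on a position weight matrix rather than an exact back-reference.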
Syed, Mudasir Ahmad; Bhat, Farooz Ahmad; Balkhi, Masood-ul Hassan; Bhat, Bilal Ahmad
2016-01-01
Schizothoracine fish, commonly called snow trouts, inhabit the entire network of snow- and spring-fed cool waters of Kashmir, India. Of more than 10 species reported earlier, only five have been found; these include Schizothorax niger, Schizothorax esocinus, Schizothorax plagiostomus, Schizothorax curvifrons and Schizothorax labiatus. The reported relationships among these species are contradictory. To understand the evolutionary relationships of these species, we examined the sequence information of the mitochondrial D-loop of 25 individuals representing the five species. Sequence alignment showed that the D-loop region is highly variable, and length variation was observed in the dinucleotide (TA)n microsatellite between and within species. Interestingly, in all these species the (TA)n microsatellite is not associated with longer tandem repeats at the 3' end of the mitochondrial control region and does not show heteroplasmy. Our analysis also indicates the presence of four conserved sequence blocks (CSBs), CSB-D, CSB-1, CSB-II and CSB-III, four termination-associated sequence (TAS) motifs and a 15-bp pyrimidine block within the mitochondrial control region, which are highly conserved within the genus Schizothorax when compared with other species. The phylogenetic analyses carried out by maximum likelihood (ML), neighbor joining (NJ) and Bayesian inference (BI) generated almost identical results. The resultant BI tree showed a close genetic relationship among all five species and supports two distinct groupings of S. esocinus. Besides the species relation, the presence of length variation in tandem repeats is attributed to differences in the predicted stability of secondary structures. The role of CSBs and TASs, reported so far as main regulatory signals, would explain the conservation of these elements in evolution.
Gorgé, Olivier; Lopez, Stéphanie; Hilaire, Valérie; Lisanti, Olivier; Ramisse, Vincent; Vergnaud, Gilles
2008-01-01
The Shigella genus has historically been separated into four species, based on biochemical assays. The classification within each species relies on serotyping. Recently, genome sequencing and DNA assays, in particular the multilocus sequence typing (MLST) approach, greatly improved the current knowledge of the origin and phylogenetic evolution of Shigella spp. The Shigella and Escherichia genera are now considered to belong to a unique genomospecies. Multilocus variable-number tandem-repeat (VNTR) analysis (MLVA) provides valuable polymorphic markers for genotyping and performing phylogenetic analyses of highly homogeneous bacterial pathogens. Here, we assess the capability of MLVA for Shigella typing. Thirty-two potentially polymorphic VNTRs were selected by analyzing in silico five Shigella genomic sequences and subsequently evaluated. Eventually, a panel of 15 VNTRs was selected (i.e., MLVA15 analysis). MLVA15 analysis of 78 strains or genome sequences of Shigella spp. and 11 strains or genome sequences of Escherichia coli distinguished 83 genotypes. Shigella population cluster analysis gave consistent results compared to MLST. MLVA15 analysis showed capabilities for E. coli typing, providing classification among pathogenic and nonpathogenic E. coli strains included in the study. The resulting data can be queried on our genotyping webpage (http://mlva.u-psud.fr). The MLVA15 assay is rapid, highly discriminatory, and reproducible for Shigella and Escherichia strains, suggesting that it could significantly contribute to epidemiological trace-back analysis of Shigella infections and pathogenic Escherichia outbreaks. Typing was performed on strains obtained mostly from collections. Further studies should include strains of much more diverse origins, including all pathogenic E. coli types. PMID:18216214
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, D.; Weiner, A.M.
1995-12-10
The RNU2 locus encoding human U2 small nuclear RNA (snRNA) is organized as a nearly perfect tandem array containing 5 to 22 copies of a 5.8-kb repeat unit. Just downstream of the U2 snRNA gene in each 5.8-kb repeat unit lies a large (CT)n·(GA)n dinucleotide repeat (n ≈ 70). This form of genomic organization, in which one repeat is embedded within another, provides an unusual opportunity to study the balance of forces maintaining the homogeneity of both kinds of repeats. Using a combination of field inversion gel electrophoresis and polymerase chain reaction, we have been able to study the CT microsatellites within individual U2 tandem arrays. We find that the CT microsatellites within an RNU2 allele exhibit significant length polymorphism, despite the remarkable homogeneity of the surrounding U2 repeat units. Length polymorphism is due primarily to loss or gain of CT dinucleotide repeats, but other types of deletions, insertions, and substitutions are also frequent. Polymorphism is greatly reduced in regions where pure (CT)n tracts are interrupted by occasional G residues, suggesting that irregularities stabilize both the length and the sequence of the dinucleotide repeat. We further show that the RNU2 loci of other catarrhine primates (gorilla, chimpanzee, orangutan, and baboon) contain orthologous CT microsatellites; these also exhibit length polymorphism, but are highly divergent from each other. Thus, although the CT microsatellite is evolving far more rapidly than the rest of the U2 repeat unit, it has persisted through multiple speciation events spanning >35 Myr. The persistence of the CT microsatellite, despite polymorphism and rapid evolution, suggests that it might play a functional role in concerted evolution of the RNU2 loci, perhaps as an initiation site for recombination and/or gene conversion. 70 refs., 5 figs.
Evidence of birth-and-death evolution of 5S rRNA gene in Channa species (Teleostei, Perciformes).
Barman, Anindya Sundar; Singh, Mamta; Singh, Rajeev Kumar; Lal, Kuldeep Kumar
2016-12-01
In higher eukaryotes, the minor rDNA family codes for 5S rRNA; it is arranged in tandem arrays and comprises a highly conserved 120-bp coding sequence with a variable non-transcribed spacer (NTS). The 5S rDNA repeats were initially considered to have evolved by concerted evolution. However, some recent reports, including those on teleost fishes, suggest that evolution of the 5S rDNA repeat does not fit the concerted evolution model and may instead be explained by a birth-and-death model. To study the mode of evolution of 5S rDNA repeats in perciform fishes, the nucleotide sequence and molecular organization of 5S rDNA in five species of the genus Channa were analyzed in the present study. Molecular analyses revealed several variants of 5S rDNA repeats (four types of NTS), and networks created by a neighbor-net algorithm for each type of sequence (I, II, III and IV) did not show clear clustering in a species-specific manner. The stable secondary structure was predicted, and upstream and downstream conserved regulatory elements were characterized. Sequence analyses also showed the presence of two putative pseudogenes in Channa marulius. The present study supports the view that 5S rDNA repeats in the genus Channa evolved by a birth-and-death process.
Submegabase Clusters of Unstable Tandem Repeats Unique to the Tla Region of Mouse T Haplotypes
Uehara, H.; Ebersole, T.; Bennett, D.; Artzt, K.
1990-01-01
We describe here the identification and genomic organization of mouse t haplotype-specific elements (TSEs) 7.8 and 5.8 kb in length. The TSEs exist as submegabase-long clusters of tandem repeats localized in the Tla region of the major histocompatibility complex of all t haplotype chromosomes examined. In contrast, no such clusters were detected among 12 inbred strains of Mus musculus and other Mus species; thus, clusters of TSEs represent the first absolutely qualitative difference between t haplotypes and wild-type chromosomes. Pulsed field gel electrophoresis shows that the number of clusters, and the number of repeats in each cluster are extremely variable. Dramatic quantitative differences of TSEs uniquely distinguish every independent t haplotype from any other. The complete nucleotide sequence of one 7.8-kb TSE reveals significant homology to the ETn (a major transcript in the early embryo of the mouse), and some homologies to intracisternal A-particles and the mammary tumor virus env gene. Apart from the diagnostic relevance to t haplotypes, evolutionary and functional significances are discussed with respect to chromosome structure and genetic recombination. PMID:2076812
Structural analysis of two length variants of the rDNA intergenic spacer from Eruca sativa.
Lakshmikumaran, M; Negi, M S
1994-03-01
Restriction enzyme analysis of the rRNA genes of Eruca sativa indicated the presence of many length variants within a single plant and also between different cultivars which is unusual for most crucifers studied so far. Two length variants of the rDNA intergenic spacer (IGS) from a single individual E. sativa (cv. Itsa) plant were cloned and characterized. The complete nucleotide sequences of both the variants (3 kb and 4 kb) were determined. The intergenic spacer contains three families of tandemly repeated DNA sequences denoted as A, B and C. However, the long (4 kb) variant shows the presence of an additional repeat, denoted as D, which is a duplication of a 224 bp sequence just upstream of the putative transcription initiation site. Repeat units belonging to the three different families (A, B and C) were in the size range of 22 to 30 bp. Such short repeat elements are present in the IGS of most of the crucifers analysed so far. Sequence analysis of the variants (3 kb and 4 kb) revealed that the length heterogeneity of the spacer is located at three different regions and is due to the varying copy numbers of repeat units belonging to families A and B. Length variation of the spacer is also due to the presence of a large duplication (D repeats) in the 4 kb variant which is absent in the 3 kb variant. The putative transcription initiation site was identified by comparisons with the rDNA sequences from other plant species.
Rešková, Z; Koreňová, J; Kuchta, T
2014-04-01
A total of 256 isolates of Staphylococcus aureus were isolated from 98 samples (34 swabs and 64 food samples) obtained from small or medium meat- and cheese-processing plants in Slovakia. The strains were genotypically characterized by multiple locus variable number of tandem repeats analysis (MLVA), involving multiplex polymerase chain reaction (PCR) with subsequent separation of the amplified DNA fragments by an automated flow-through gel electrophoresis. With the panel of isolates, MLVA produced 31 profile types, which was a sufficient discrimination to facilitate the description of spatial and temporal aspects of contamination. Further data on MLVA discrimination were obtained by typing a subpanel of strains by multiple locus sequence typing (MLST). MLVA coupled to automated electrophoresis proved to be an effective, comparatively fast and inexpensive method for tracing S. aureus contamination of food-processing factories. Subspecies genotyping of microbial contaminants in food-processing factories may facilitate identification of spatial and temporal aspects of the contamination. This may help to properly manage the process hygiene. With S. aureus, multiple locus variable number of tandem repeats analysis (MLVA) proved to be an effective method for the purpose, being sufficiently discriminative, yet comparatively fast and inexpensive. The application of automated flow-through gel electrophoresis to separation of DNA fragments produced by multiplex PCR helped to improve the accuracy and speed of the method. © 2013 The Society for Applied Microbiology.
Sindhupriya, M; Saravanan, P; Otta, S K; Amarnath, C Bala; Arulraj, R; Bhuvaneswari, T; Praveena, P Ezhil; Jithendran, K P; Ponniah, A G
2014-08-21
White spot syndrome virus (WSSV) replicates rapidly, can be extremely pathogenic and is a common cause of mass mortality in cultured shrimp. Variable number tandem repeat (VNTR) sequences present in the open reading frame (ORF)94, ORF125 and ORF75 regions of the WSSV genome have been used widely as genetic markers in epidemiological studies. However, reports that VNTRs might evolve rapidly following even a single transmission through penaeid shrimp or other crustacean hosts have created confusion as to how VNTR data should be interpreted. To examine VNTR stability again, 2 WSSV strains (PmTN4RU and LvAP11RU) with differing ORF94 tandem repeat numbers and slight differences in apparent virulence were passaged sequentially 6 times through black tiger shrimp Penaeus monodon, Indian white shrimp Fenneropenaeus indicus or Pacific white leg shrimp Litopenaeus vannamei. PCR analyses to genotype the ORF94, ORF125 and ORF75 VNTRs did not identify any differences from either of the 2 parental WSSV strains after multiple passages through any of the shrimp species. These data were confirmed by sequence analysis and indicate that the stability of the genome regions containing these VNTRs is quite high, at least for the WSSV strains, hosts and number of passages examined, and that the VNTR sequences thus represent useful genetic markers for studying WSSV epidemiology.
Xu, Shengyong; Song, Na; Lu, Zhichuang; Wang, Jun; Cai, Shanshan; Gao, Tianxiang
2014-06-01
Scaly hair-fin anchovy (Setipinna tenuifilis) is a small, pelagic, economically important species that is widely distributed in Chinese coastal waters. However, S. tenuifilis resources have declined due to overfishing. For better fishery management, it is necessary to understand the biogeographic pattern of S. tenuifilis. Genetic analyses were carried out to detect population genetic variation. A total of 153 individuals from 7 locations (Dongying, Yantai, Qingdao, Nantong, Wenzhou, Xiamen and Beibu Bay) were sequenced at the 5' end of the mtDNA control region. A 39-bp tandemly repeated sequence was found at the 5' end of the segment, and polymorphism in this tandem repeat was detected among the 7 populations. Both mismatch distribution analysis and neutrality tests showed that S. tenuifilis had experienced a recent population expansion. The topologies of the neighbor-joining tree and the Bayesian evolutionary tree showed no significant genealogical branches or clusters of samples corresponding to sampling locality. Hierarchical analysis of molecular variance and conventional pairwise population Fst values at the group hierarchical level implied that there might be genetic divergence between the southern group (populations WZ, XM and BB) and the northern group (populations DY, YT, QD and NT). We concluded that there might be three different fishery management groups of S. tenuifilis and that the late Pleistocene glacial event might have had a crucial effect on the present-day demography of S. tenuifilis in this region.
Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats.
Fungtammasan, Arkarachai; Tomaszkiewicz, Marta; Campos-Sánchez, Rebeca; Eckert, Kristin A; DeGiorgio, Michael; Makova, Kateryna D
2016-10-01
Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA-DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Kouprina, Natalay; Noskov, Vladimir N.; Waterfall, Joshua J.; Walker, Robert L.; Meltzer, Paul S.; Topol, Eric J.; Larionov, Vladimir
2018-01-01
Tandem segmental duplications (SDs) greater than 10 kb are widespread in complex genomes. They provide material for gene divergence and evolutionary adaptation, while formation of specific de novo SDs is a hallmark of cancer and some human diseases. Most SDs map to distinct genomic regions termed ‘duplication blocks’. SD organization within these blocks is often poorly characterized as they are mosaics of ancestral duplicons juxtaposed with younger duplicons arising from more recent duplication events. Structural and functional analysis of SDs is further hampered as long repetitive DNA structures are underrepresented in existing BAC and YAC libraries. We applied Transformation-Associated Recombination (TAR) cloning, a versatile technique for large DNA manipulation, to selectively isolate the coronary artery disease (CAD) interval sequence within the 9p21.3 chromosome locus from a patient with coronary artery disease and normal individuals. Four tandem head-to-tail duplicons, each ∼50 kb long, were recovered in the patient but not in normal individuals. Sequence analysis revealed that the repeats differed from each other by 10-15 SNPs and from the human genome sequence (version hg19) by 82 SNPs. SNP polymorphism within the junctions between repeats allowed two junction types to be distinguished, Type 1 and Type 2, which were found at a 2:1 ratio. The junction sequences contained an Alu element, a sequence previously shown to play a role in duplication. Knowledge of structural variation in the CAD interval from more patients could help link this locus to cardiovascular disease susceptibility, and may be relevant to other cases of regional amplification, including cancer. PMID:29632643
Williams, R R; Hassan-Walker, A F; Lavender, F L; Morgan, M; Faik, P; Ragoussis, J
2001-05-16
Minisatellites are tandemly repeated DNA sequences found throughout the genomes of all eukaryotes. They are regions often prone to instability and hence hypervariability; thus repeat unit sequence is generally not conserved beyond closely related species. We have studied the minisatellite located in intron 9 of the human glucose phosphate isomerase (GPI) gene (also known as neuroleukin, autocrine motility factor, maturation and differentiation factor) and have found, by Zoo blotting coupled with PCR amplification and DNA sequencing, that similar repeat units are present in seven other species of mammal. There is also evidence for the presence of the minisatellite in chicken. The repeat unit does not appear to be present at any other locus in these genomes. Minisatellite DNA has been reported to be involved in recombination activity, control of gene expression of nearby gene(s) (both transcriptional and translational), whilst others form protein coding regions. The high level of conservation exhibited by the GPI minisatellite, coupled with the unique location, strongly suggests a functional role. Our results from transient and stable transfections using luciferase reporter constructs have shown that the GPI minisatellite region can act to increase transcription from the SV40 promoter, CMV promoter and the human GPI promoter.
Genetic diversity of Babesia bovis in virulent and attenuated strains.
Mazuz, M L; Molad, T; Fish, L; Leibovitz, B; Wolkomirsky, R; Fleiderovitz, L; Shkap, V
2012-03-01
The aim of this study was to compare the genetic diversity of the single copy Bv80 gene sequences of Babesia bovis in populations of attenuated and virulent parasites. PCR/ RT-PCR followed by cloning and sequence analyses of 4 attenuated and 4 virulent strains were performed. Multiple fragments in the range of 420 to 744 bp were amplified by PCR or RT-PCR. Cloning of the PCR fragments and sequence analyses revealed the presence of mixed subpopulations in either virulent or attenuated parasites with a total of 19 variants with 12 different sequences that differed in number and type of tandem repeats. High levels of intra- and inter-strain diversity of the Bv80 gene, with the presence of mixed populations of parasites were found in both the virulent field isolates and the attenuated vaccine strains. In addition, during the attenuation process, sequence analyses showed changes in the pattern of the parasite subpopulations. Despite high polymorphism found by sequence analyses, the patterns observed and the number of repeats, order, or motifs found could not discriminate between virulent field isolates and attenuated vaccine strains of the parasite.
Plucienniczak, A; Schroeder, E; Zettlmeissl, G; Streeck, R E
1985-01-01
The nucleotide sequence of a 7.6 kb vaccinia DNA segment from a genomic region conserved among different orthopoxviruses has been determined. This segment contains a tight cluster of 12 partly overlapping open reading frames, most of which can be correlated with previously identified early and late proteins and mRNAs. Regulatory signals used by vaccinia virus have been studied. Presumptive promoter regions are rich in A and T and carry the consensus sequences TATA and AATAA spaced at 20-24 base pairs. Tandem repeats of a CTATTC consensus sequence are proposed to be involved in the termination of early transcription. PMID:2987815
Length and sequence heterogeneity in 5S rDNA of Populus deltoides.
Negi, Madan S; Rajagopal, Jyothi; Chauhan, Neeti; Cronn, Richard; Lakshmikumaran, Malathi
2002-12-01
The 5S rRNA genes and their associated non-transcribed spacer (NTS) regions are present as repeat units arranged in tandem arrays in plant genomes. Length heterogeneity in 5S rDNA repeats was previously identified in Populus deltoides and was also observed in the present study. Primers were designed to amplify the 5S rDNA NTS variants from the P. deltoides genome. The PCR-amplified products from the two accessions of P. deltoides (G3 and G48) suggested the presence of length heterogeneity of 5S rDNA units within and among accessions, and the size of the spacers ranged from 385 to 434 bp. Sequence analysis of the NTS revealed two distinct classes of 5S rDNA within both accessions: class 1, which contained GAA trinucleotide microsatellite repeats, and class 2, which lacked the repeats. The class 1 spacer shows length variation owing to the microsatellite, with two clones exhibiting 10 GAA repeat units and one clone exhibiting 16 such repeat units. However, distance analysis shows that class 1 spacer sequences are highly similar inter se, yielding nucleotide diversity (pi) estimates that are less than 15% of those obtained for class 2 spacers (pi = 0.0183 vs. 0.1433, respectively). The presence of a microsatellite in the NTS region leading to variation in spacer length is reported and discussed for the first time in P. deltoides.
Characterization of a highly polymorphic region 5′ to JH in the human immunoglobulin heavy chain
Silva, Alcino J.; Johnson, John P.; White, Raymond L.
1987-01-01
A cloned DNA segment 1.25 kilobases (kb) upstream from the joining segments of the human heavy chain immunoglobulin gene revealed extensive polymorphic variation at this locus, and the polymorphic pattern was stably transmitted to the next generation. Genomic restriction analysis showed that the polymorphism was caused by insertions/deletions within an MspI/BamHI fragment. Sequencing of one allele, 848 base pairs (bp) long, revealed eleven 50-base-pair tandem repeats. A second allele, 648 bp long, was cloned from a human genomic cosmid library, sequenced, and found to contain four fewer repeats than the first allele. A survey of 186 chromosomes from unrelated individuals of primarily northern European descent revealed at least six alleles. Images PMID:2884636
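The allele lengths reported in this abstract are arithmetically consistent with its repeat counts, as the short Python sketch below shows. It assumes that every repeat is exactly 50 bp and that both alleles share the same non-repeat flanking length; the abstract does not state either point, and the function names are hypothetical.

```python
REPEAT_BP = 50  # repeat unit length reported in the abstract

def flank_length(allele_bp, n_repeats, repeat_bp=REPEAT_BP):
    """Non-repeat sequence left after subtracting the repeat array."""
    return allele_bp - n_repeats * repeat_bp

def repeats_from_length(allele_bp, flank_bp, repeat_bp=REPEAT_BP):
    """Infer the repeat count of an allele from its total length."""
    units, remainder = divmod(allele_bp - flank_bp, repeat_bp)
    if remainder:
        raise ValueError("length is not a whole number of repeat units")
    return units

# First allele: 848 bp containing eleven repeats.
flank = flank_length(848, 11)
print(flank)                            # 298

# Second allele: 648 bp, reported to have four fewer repeats.
print(repeats_from_length(648, flank))  # 7
```

Under these assumptions the 200-bp difference between the two alleles corresponds to exactly four 50-bp units, matching the reported difference of four repeats.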
Tochio, Naoya; Umehara, Kohei; Uewaki, Jun-ichi; Flechsig, Holger; Kondo, Masaharu; Dewa, Takehisa; Sakuma, Tetsushi; Yamamoto, Takashi; Saitoh, Takashi; Togashi, Yuichi; Tate, Shin-ichi
2016-01-01
Transcription activator-like effector (TALE) nuclease (TALEN) is widely used as a tool in genome editing. The DNA binding part of TALEN consists of a tandem array of TAL-repeats that form a right-handed superhelix. Each TAL-repeat recognises a specific base by the repeat variable diresidue (RVD) at positions 12 and 13. TALEN comprising the TAL-repeats with periodic mutations to residues at positions 4 and 32 (non-RVD sites) in each repeat (VT-TALE) exhibits increased efficacy in genome editing compared with a counterpart without the mutations (CT-TALE). The molecular basis for the elevated efficacy is unknown. In this report, comparison of the physicochemical properties between CT- and VT-TALEs revealed that VT-TALE has a larger amplitude motion along the superhelical axis (superhelical motion) compared with CT-TALE. The greater superhelical motion in VT-TALE enabled more TAL-repeats to engage in the target sequence recognition compared with CT-TALE. The extended sequence recognition by the TAL-repeats improves site specificity by limiting the spatial distribution of FokI domains to facilitate their dimerization at the desired site. Molecular dynamics simulations revealed that the non-RVD mutations alter inter-repeat hydrogen bonding to amplify the superhelical motion of VT-TALE. The TALEN activity is associated with the inter-repeat hydrogen bonding among the TAL repeats. PMID:27883072
The complete mitochondrial genome sequence of the Tibetan red fox (Vulpes vulpes montana).
Zhang, Jin; Zhang, Honghai; Zhao, Chao; Chen, Lei; Sha, Weilai; Liu, Guangshuai
2015-01-01
In this study, the complete mitochondrial genome of the Tibetan red fox (Vulpes vulpes montana) was sequenced for the first time using blood samples obtained from a wild female red fox captured in Lhasa, Tibet, China. The Qinghai-Tibet Plateau is the highest plateau in the world, with an average elevation above 3500 m. Sequence analysis showed that the genome contains the 12S rRNA gene, the 16S rRNA gene, 22 tRNA genes, 13 protein-coding genes and 1 control region (CR). The variable tandem repeats in the CR are the main cause of mitochondrial genome length variability among canids.
NASA Astrophysics Data System (ADS)
Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.
1984-08-01
A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence: proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS β-lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.
Fabre, Michel; Koeck, Jean-Louis; Le Flèche, Philippe; Simon, Fabrice; Hervé, Vincent; Vergnaud, Gilles; Pourcel, Christine
2004-01-01
We have analyzed, using complementary molecular methods, the diversity of 43 strains of “Mycobacterium canettii” originating from the Republic of Djibouti, on the Horn of Africa, from 1998 to 2003. Genotyping by multiple-locus variable-number tandem repeat analysis shows that all the strains belong to a single but very distant group when compared to strains of the Mycobacterium tuberculosis complex (MTBC). Thirty-one strains cluster into one large group with little variability and five strains form another group, whereas the other seven are more diverged. In total, 14 genotypes are observed. The DR locus analysis reveals additional variability, some strains being devoid of a direct repeat locus and others having unique spacers. The hsp65 gene polymorphism was investigated by restriction enzyme analysis and sequencing of PCR amplicons. Four new single nucleotide polymorphisms were discovered. One strain was characterized by three nucleotide changes in 441 bp, creating new restriction enzyme polymorphisms. As no sequence variability was found for hsp65 in the whole MTBC, and as a single point mutation separates M. tuberculosis from the closest “M. canettii” strains, this diversity within “M. canettii” subspecies strongly suggests that it is the most probable source species of the MTBC rather than just another branch of the MTBC. PMID:15243089
Zhao, Guangyu; Li, Hu; Zhao, Ping; Cai, Wanzhi
2015-01-01
In this study, we sequenced four new mitochondrial genomes and presented comparative mitogenomic analyses of five species in the genus Peirates (Hemiptera: Reduviidae). Mitochondrial genomes of these five assassin bugs had a typical set of 37 genes and retained the ancestral gene arrangement of insects. The A+T content, AT- and GC-skews were similar to the common base composition biases of insect mtDNA. Genome size ranged from 15,702 bp to 16,314 bp, and most of the size variation was due to the length and copy number of the repeat unit in the putative control region. All of the control region sequences included large tandem repeats present in two or more copies. Our results revealed similarity in the mitochondrial genomes of P. atromaculatus, P. fulvescens and P. turpis, as well as highly conserved genomic-level characteristics of these three species, e.g., the same start and stop codons of protein-coding genes, conserved secondary structure of tRNAs, identical location and length of non-coding and overlapping regions, and conservation of structural elements and the tandem repeat unit in the control region. Phylogenetic analyses also supported a close relationship between P. atromaculatus, P. fulvescens and P. turpis, which might be recently diverged species. The present study indicates that the mitochondrial genome has important implications for phylogenetics, population genetics and speciation in the genus Peirates. PMID:25689825
Characterization of genetic sequence variation of 58 STR loci in four major population groups.
Novroski, Nicole M M; King, Jonathan L; Churchill, Jennifer D; Seah, Lay Hong; Budowle, Bruce
2016-11-01
Massively parallel sequencing (MPS) can identify sequence variation within short tandem repeat (STR) alleles as well as their nominal allele lengths that traditionally have been obtained by capillary electrophoresis. Using the MiSeq FGx Forensic Genomics System (Illumina), STRait Razor, and in-house excel workbooks, genetic variation was characterized within STR repeat and flanking regions of 27 autosomal, 7 X-chromosome and 24 Y-chromosome STR markers in 777 unrelated individuals from four population groups. Seven hundred and forty six autosomal, 227 X-chromosome, and 324 Y-chromosome STR alleles were identified by sequence compared with 357 autosomal, 107 X-chromosome, and 189 Y-chromosome STR alleles that were identified by length. Within the observed sequence variation, 227 autosomal, 156 X-chromosome, and 112 Y-chromosome novel alleles were identified and described. One hundred and seventy six autosomal, 123 X-chromosome, and 93 Y-chromosome sequence variants resided within STR repeat regions, and 86 autosomal, 39 X-chromosome, and 20 Y-chromosome variants were located in STR flanking regions. Three markers, D18S51, DXS10135, and DYS385a-b had 1, 4, and 1 alleles, respectively, which contained both a novel repeat region variant and a flanking sequence variant in the same nucleotide sequence. There were 50 markers that demonstrated a relative increase in diversity with the variant sequence alleles compared with those of traditional nominal length alleles. These population data illustrate the genetic variation that exists in the commonly used STR markers in the selected population samples and provide allele frequencies for statistical calculations related to STR profiling with MPS data. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
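The difference between alleles identified by length and alleles identified by sequence can be illustrated with a minimal Python sketch. The three repeat-region sequences below are invented for illustration and are not alleles from the study; the function name `count_alleles` is likewise hypothetical and is not part of STRait Razor or any other tool named in the abstract.

```python
def count_alleles(sequences):
    """Return (n_length_alleles, n_sequence_alleles) for one locus."""
    by_length = {len(s) for s in sequences}   # what sizing by length can see
    by_sequence = set(sequences)              # what sequencing can see
    return len(by_length), len(by_sequence)

# Invented repeat regions: two share a length but differ at one base.
observed = [
    "TCTA" * 10,
    "TCTA" * 9 + "TCTG",
    "TCTA" * 11,
]

n_length, n_sequence = count_alleles(observed)
print(n_length, n_sequence)  # 2 3
```

Two of the toy sequences are indistinguishable by length alone, so sequence-based typing resolves three alleles where length-based typing resolves two.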
Hirosawa, I; Aritomi, K; Hoshida, H; Kashiwagi, S; Nishizawa, Y; Akada, R
2004-07-01
The commercial application of genetically modified industrial microorganisms has been problematic due to public concerns. We constructed a "self-cloning" sake yeast strain that overexpresses the ATF1 gene encoding alcohol acetyltransferase, to improve the flavor profile of Japanese sake. A constitutive yeast overexpression promoter, TDH3p, derived from the glyceraldehyde-3-phosphate dehydrogenase gene from sake yeast was fused to ATF1; and the 5' upstream non-coding sequence of ATF1 was further fused to TDH3p-ATF1. The fragment was placed on a binary vector, pGG119, containing a drug-resistance marker for transformation and a counter-selection marker for excision of unwanted DNA. The plasmid was integrated into the ATF1 locus of a sake yeast strain. This integration constructed tandem repeats of ATF1 and TDH3p-ATF1 sequences, between which the plasmid was inserted. Loss of the plasmid, which occurs through homologous recombination between either the TDH3p downstream ATF1 repeats or the TDH3p upstream repeat sequences, was selected by growing transformants on counter-selective medium. Recombination between the downstream repeats led to reversion to a wild type strain, but that between the upstream repeats resulted in a strain that possessed TDH3p-ATF1 without the extraneous DNA sequences. The self-cloning TDH3p-ATF1 yeast strain produced a higher amount of isoamyl acetate. This is the first expression-controlled self-cloning industrial yeast.
Roychoudhury, Pavitra; Makhsous, Negar; Hanson, Derek; Chase, Jill; Krueger, Gerhard; Xie, Hong; Huang, Meei-Li; Saunders, Lindsay; Ablashi, Dharam; Koelle, David M.; Cook, Linda; Jerome, Keith R.
2018-01-01
ABSTRACT Quantitative PCR is a diagnostic pillar for clinical virology testing, and reference materials are necessary for accurate, comparable quantitation between clinical laboratories. Accurate quantitation of human herpesvirus 6A/B (HHV-6A/B) is important for detection of viral reactivation and inherited chromosomally integrated HHV-6A/B in immunocompromised patients. Reference materials in clinical virology commonly consist of laboratory-adapted viral strains that may be affected by the culture process. We performed next-generation sequencing to make relative copy number measurements at single nucleotide resolution of eight candidate HHV-6A and seven HHV-6B reference strains and DNA materials from the HHV-6 Foundation and Advanced Biotechnologies Inc. Eleven of 17 (65%) HHV-6A/B candidate reference materials showed multiple copies of the origin of replication upstream of the U41 gene by next-generation sequencing. These large tandem repeats arose independently in culture-adapted HHV-6A and HHV-6B strains, measuring 1,254 bp and 983 bp, respectively. The average copy number measured was between 5 and 10 times the number of copies of the rest of the genome. We also report the first interspecies recombinant HHV-6A/B strain with a HHV-6A backbone and a >5.5-kb region from HHV-6B, from U41 to U43, that covered the origin tandem repeat. Specific HHV-6A reference strains demonstrated duplication of regions at U1/U2, U87, and U89, as well as deletion in the U12-to-U24 region and the U94/U95 genes. HHV-6A/B strains derived from cord blood mononuclear cells from different laboratories on different continents with fewer passages revealed no copy number differences throughout the viral genome. These data indicate that large origin tandem duplications are an adaptation of both HHV-6A and HHV-6B in culture and show interspecies recombination is possible within the Betaherpesvirinae. 
IMPORTANCE Anything in science that needs to be quantitated requires a standard unit of measurement. This includes viruses, for which quantitation increasingly determines definitions of pathology and guidelines for treatment. However, the act of making standard or reference material in virology can alter its very accuracy through genomic duplications, insertions, and rearrangements. We used deep sequencing to examine candidate reference strains for HHV-6, a ubiquitous human virus that can reactivate in the immunocompromised population and is integrated into the human genome in every cell of the body for 1% of people worldwide. We found large tandem repeats in the origin of replication for both HHV-6A and HHV-6B that are selected for in culture. We also found the first interspecies recombinant between HHV-6A and HHV-6B, a phenomenon that is well known in alphaherpesviruses but to date has not been seen in betaherpesviruses. These data critically inform HHV-6A/B biology and the standard selection process. PMID:29491155
Groves, M R; Hanlon, N; Turowski, P; Hemmings, B A; Barford, D
1999-01-08
The PR65/A subunit of protein phosphatase 2A serves as a scaffolding molecule to coordinate the assembly of the catalytic subunit and a variable regulatory B subunit, generating functionally diverse heterotrimers. Mutations of the β isoform of PR65 are associated with lung and colon tumors. The crystal structure of the PR65/Aα subunit, at 2.3 Å resolution, reveals the conformation of its 15 tandemly repeated HEAT sequences, degenerate motifs of approximately 39 amino acids present in a variety of proteins, including huntingtin and importin β. Individual motifs are composed of a pair of antiparallel α helices that assemble in a mainly linear, repetitive fashion to form an elongated molecule characterized by a double layer of α helices. Left-handed rotations at three interrepeat interfaces generate a novel left-hand superhelical conformation. The protein interaction interface is formed from the intrarepeat turns that are aligned to form a continuous ridge.
The B chromosomes in Brachycome.
Leach, C R; Houben, A; Timmis, J N
2004-01-01
This review presents a historical account of studies of B chromosomes in the genus Brachycome Cass. (synonym: Brachyscome) from the earliest cytological investigations carried out in the late 1960s through to the most recent molecular analyses. Molecular analyses provide insights into the origin and evolution of the B chromosomes (Bs) of Brachycome dichromosomatica, a species which has Bs of two different sizes. The larger Bs are somatically stable whereas the smaller, or micro, Bs are somatically unstable. Both B types contain clusters of ribosomal RNA genes that have been shown unequivocally to be inactive in the case of the larger Bs. The large Bs carry a family of tandem repeat sequences (Bd49) that are located mainly at the centromere. Multiple copies of sequences related to this repeat are present on the A chromosomes (As) of related species, whereas only a few copies exist in the A chromosomes of B. dichromosomatica. The micro Bs share DNA sequences with the As and the larger Bs, and they also have B-specific repeats (Bdm29 and Bdm54). In some cases repeat sequences on the micro Bs have been shown to occur as clusters on the A chromosomes in a proportion of individuals within a population. It is clear that none of these B types originated by simple excision of segments from the A chromosomes. Copyright 2004 S. Karger AG, Basel
Gao, Tianxiang; Wan, Zhenzhen; Song, Na; Zhang, Xiumei; Han, Zhiqiang
2014-12-01
A number of evolutionary mechanisms have been suggested for generating significant genetic structuring among marine fish populations in Northwestern Pacific. We used mtDNA control region to assess the factors in shaping the genetic structure of Japanese grenadier anchovy, Coilia nasus, an anadromous and estuarine coastal species, in Northwestern Pacific. Sixty seven individuals from four locations in Northwestern Pacific were sequenced for mitochondrial control region, detecting 61 haplotypes. The length of amplified control region varied from 677 to 754 bp. This length variability was due to the presence of varying numbers of a 38-bp tandemly repeated sequence. Two distinct lineages were detected, which might have diverged during Pleistocene low sea levels. There were strong differences in the geographical distribution of the two lineages. Analyses of molecular variance and the population statistic ΦST revealed significant genetic structure between China and Ariake Bay populations. Based on the frequency distribution of tandem repeat units, significant genetic differentiation was also detected between China and Ariake Bay populations. Isolation by distance seems to be the main factor driving present genetic structuring of C. nasus populations, indicating coastal dispersal pattern in this coastal species. Such an evolutionary process agrees well with some of the biological features characterizing this species.
Dugas, Diana V; Hernandez, David; Koenen, Erik J M; Schwarz, Erika; Straub, Shannon; Hughes, Colin E; Jansen, Robert K; Nageswara-Rao, Madhugiri; Staats, Martijn; Trujillo, Joshua T; Hajrah, Nahid H; Alharbi, Njud S; Al-Malki, Abdulrahman L; Sabir, Jamal S M; Bailey, C Donovan
2015-11-23
The Leguminosae has emerged as a model for studying angiosperm plastome evolution because of its striking diversity of structural rearrangements and sequence variation. However, most of what is known about legume plastomes comes from few genera representing a subset of lineages in subfamily Papilionoideae. We investigate plastome evolution in subfamily Mimosoideae based on two newly sequenced plastomes (Inga and Leucaena) and two recently published plastomes (Acacia and Prosopis), and discuss the results in the context of other legume and rosid plastid genomes. Mimosoid plastomes have a typical angiosperm gene content and general organization as well as a generally slow rate of protein coding gene evolution, but they are the largest known among legumes. The increased length results from tandem repeat expansions and an unusual 13 kb IR-SSC boundary shift in Acacia and Inga. Mimosoid plastomes harbor additional interesting features, including loss of clpP intron1 in Inga, accelerated rates of evolution in clpP for Acacia and Inga, and dN/dS ratios consistent with neutral and positive selection for several genes. These new plastomes and results provide important resources for legume comparative genomics, plant breeding, and plastid genetic engineering, while shedding further light on the complexity of plastome evolution in legumes and angiosperms.
MSDB: A Comprehensive Database of Simple Sequence Repeats.
Avvaru, Akshay Kumar; Saxena, Saketh; Sowpati, Divya Tej; Mishra, Rakesh Kumar
2017-06-01
Microsatellites, also known as Simple Sequence Repeats (SSRs), are short tandem repeats of 1-6 nt motifs present in all genomes, particularly eukaryotes. Besides their usefulness as genome markers, SSRs have been shown to perform important regulatory functions, and variations in their length at coding regions are linked to several disorders in humans. Microsatellites show a taxon-specific enrichment in eukaryotic genomes, and some may be functional. MSDB (Microsatellite Database) is a collection of >650 million SSRs from 6,893 species including Bacteria, Archaea, Fungi, Plants, and Animals. This database is by far the most exhaustive resource to access and analyze SSR data of multiple species. In addition to exploring data in a customizable tabular format, users can view and compare the data of multiple species simultaneously using our interactive plotting system. MSDB is developed using the Django framework and MySQL. It is freely available at http://tdb.ccmb.res.in/msdb. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
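The definition above (perfect tandem repeats of 1-6 nt motifs) can be illustrated with a minimal sketch. This is not how MSDB itself was built; the function name `find_ssrs`, the `min_repeats` threshold, and the example sequence are assumptions chosen for illustration only.

```python
import re

def find_ssrs(seq, min_repeats=3, max_motif=6):
    """Return (start, motif, copies) for perfect tandem repeats.

    Hypothetical helper: scans for motifs of 1..max_motif nt that are
    repeated at least min_repeats times in a row.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        # A k-nt motif followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (e.g. "AA" is already reported as the mononucleotide "A").
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Illustrative sequence: four copies of TAACC flanked by GG.
print(find_ssrs("GGTAACCTAACCTAACCTAACCGG"))
```

Real SSR miners typically apply motif-specific length thresholds and may tolerate imperfect repeats; this sketch reports perfect repeats only.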
Identification of Streptococcus mitis321A vaccine antigens based on reverse vaccinology
Zhang, Qiao; Lin, Kexiong; Wang, Changzheng; Xu, Zhi; Yang, Li; Ma, Qianli
2018-01-01
Streptococcus mitis (S. mitis) may transform into highly pathogenic bacteria. The aim of the present study was to identify potential antigen targets for designing an effective vaccine against the pathogenic S. mitis321A. The genome of S. mitis321A was sequenced using an Illumina Hiseq2000 instrument. Subsequently, Glimmer 3.02 and Tandem Repeat Finder (TRF) 4.04 were used to predict genes and tandem repeats, respectively, with DNA sequence function analysis using the Basic Local Alignment Search Tool (BLAST) in the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Cluster of Orthologous Groups of proteins (COG) databases. Putative gene antigen candidates were screened with BLAST ahead of phylogenetic tree analysis. The DNA sequence assembly size was 2,110,680 bp with 40.12% GC, 6 scaffolds and 9 contigs. Consequently, 1,944 genes were predicted, and 119 TRF, 56 microsatellite DNA, 10 minisatellite DNA and 154 transposons were acquired. The predicted genes were associated with various pathways and functions concerning membrane transport and energy metabolism. Multiple putative genes encoding surface proteins, secreted proteins and virulence factors, as well as essential genes were determined. The majority of essential genes belonged to a phylogenetic lineage, while 321AGL000129 and 321AGL000299 were on the same branch. The current study provided useful information regarding the biological function of the S. mitis321A genome and recommends putative antigen candidates for developing a potent vaccine against S. mitis. PMID:29620181
Analysis of short tandem repeat polymorphisms using infrared fluorescence with M13 tailed primers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oetting, W.S.; Wiesner, G.; Laken, S.
Short tandem repeat polymorphisms (STRPs) are becoming increasingly important as markers for linkage analysis due to their large numbers in the human genome and their high degree of polymorphism. Fluorescence-based detection of the STRP pattern using the LI-COR model 4000S automated DNA sequencer eliminates the need for radioactivity and produces a digitized image that can be used for the analysis of the polymorphisms. In an effort to reduce the cost of STRP analysis, we have synthesized primers with a 19 bp extension complementary to the sequence of the M13 primer on the 5' end of one of the two primers used in the amplification of the STRP instead of using primers with direct conjugation of the infrared fluorescent dye. Up to 5 primer pairs can be multiplexed together with the M13 primer-dye conjugate as the sole primer conjugated to the fluorescent dye. Comparisons between primers that have been directly conjugated to the fluor with those having the M13 sequence extension show no difference in the ability to determine the STRP pattern. At present, the entire Weber 4A set of STRP markers is available with the M13 5' extension. We are currently using this technique for linkage analysis of familial breast cancer and asthma. Combining STRP analysis with fluorescence detection will allow this technique to be fully automated for allele scoring and linkage analysis.
Ricci, U; Sani, I; Guarducci, S; Biondi, C; Pelagatti, S; Lazzerini, V; Brusaferri, A; Lapini, M; Andreucci, E; Giunti, L; Giovannucci Uzielli, M L
2000-11-01
We used an infrared (IR) automated fluorescence monolaser sequencer for the analysis of 13 autosomal short tandem repeat (STR) systems (TPOX, D3S1358, FGA, CSF1PO, D5S818, D7S820, D8S1179, TH01, vWA, D13S317, D16S539, D18S51, D21S11) and the X-Y homologous gene amelogenin system. These two systems represent the core of the combined DNA index systems (CODIS). Four independent multiplex reactions, based on the polymerase chain reaction (PCR) technique and on the direct labeling of the forward primer of every primer pair, with a new molecule (IRDye800), were set up, permitting the exact characterization of the alleles by comparison with ladders of specific sequenced alleles. This is the first report of the whole analysis of the STRs of the CODIS core using an IR automated DNA sequencer. The protocol was used to solve paternity/maternity tests and for population studies. The electrophoretic system also proved useful for the correct typing of those loci differing in size by only 2 bp. A sensitivity study demonstrated that the test can detect an average of 10 pg of undegraded human DNA. We also performed a preliminary study analyzing some forensic samples and mixed stains, which suggested the usefulness of using this analytical system for human identification as well as for forensic purposes.
Ca2+-stabilized adhesin helps an Antarctic bacterium reach out and bind ice.
Vance, Tyler D R; Olijve, Luuk L C; Campbell, Robert L; Voets, Ilja K; Davies, Peter L; Guo, Shuaiqi
2014-07-04
The large size of a 1.5-MDa ice-binding adhesin [MpAFP (Marinomonas primoryensis antifreeze protein)] from an Antarctic Gram-negative bacterium, M. primoryensis, is mainly due to its highly repetitive RII (Region II). MpAFP_RII contains roughly 120 tandem copies of an identical 104-residue repeat. We have previously determined that a single RII repeat folds as a Ca2+-dependent immunoglobulin-like domain. Here, we solved the crystal structure of RII tetra-tandemer (four tandem RII repeats) to a resolution of 1.8 Å. The RII tetra-tandemer reveals an extended (~190-Å × ~25-Å), rod-like structure with four RII-repeats aligned in series with each other. The inter-repeat regions of the RII tetra-tandemer are strengthened by Ca2+ bound to acidic residues. SAXS (small-angle X-ray scattering) profiles indicate the RII tetra-tandemer is significantly rigidified upon Ca2+ binding, and that the protein's solution structure is in excellent agreement with its crystal structure. We hypothesize that >600 Ca2+ help rigidify the chain of ~120 104-residue repeats to form a ~0.6 μm rod-like structure in order to project the ice-binding domain of MpAFP away from the bacterial cell surface. The proposed extender role of RII can help the strictly aerobic, motile bacterium bind ice in the upper reaches of the Antarctic lake where oxygen and nutrients are most abundant. Ca2+-induced rigidity of tandem Ig-like repeats in large adhesins might be a general mechanism used by bacteria to bind to their substrates and help colonize specific niches.
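The ~0.6 μm rod length quoted above follows from a simple extrapolation of the crystal structure, sketched below. The function name is hypothetical, and the calculation assumes that all ~120 repeats stack linearly with the same per-repeat rise as in the four-repeat crystal, which the abstract proposes but does not measure directly.

```python
def estimate_rod_length_um(tetramer_length_angstrom=190.0,
                           repeats_in_crystal=4,
                           total_repeats=120):
    """Extrapolate the full RII rod length from the tetra-tandemer.

    Assumes a constant rise per repeat (an idealization).
    """
    rise_per_repeat = tetramer_length_angstrom / repeats_in_crystal  # Å
    total_angstrom = rise_per_repeat * total_repeats
    return total_angstrom / 1e4  # 1 μm = 10,000 Å

length_um = estimate_rod_length_um()
# Lower bound on Ca2+ ions per repeat implied by ">600 Ca2+" over ~120 repeats.
ca_per_repeat = 600 / 120
print(f"{length_um:.2f} um, at least {ca_per_repeat:.0f} Ca2+ per repeat")
```

With the abstract's figures this gives 47.5 Å per repeat and about 0.57 μm overall, consistent with the rounded ~0.6 μm.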
Thermodynamic characterization of tandem mismatches found in naturally occurring RNA
Christiansen, Martha E.; Znosko, Brent M.
2009-01-01
Although all sequence symmetric tandem mismatches and some sequence asymmetric tandem mismatches have been thermodynamically characterized and a model has been proposed to predict the stability of previously unmeasured sequence asymmetric tandem mismatches [Christiansen,M.E. and Znosko,B.M. (2008) Biochemistry, 47, 4329–4336], experimental thermodynamic data for frequently occurring tandem mismatches is lacking. Since experimental data is preferred over a predictive model, the thermodynamic parameters for 25 frequently occurring tandem mismatches were determined. These new experimental values, on average, are 1.0 kcal/mol different from the values predicted for these mismatches using the previous model. The data for the sequence asymmetric tandem mismatches reported here were then combined with the data for 72 sequence asymmetric tandem mismatches that were published previously, and the parameters used to predict the thermodynamics of previously unmeasured sequence asymmetric tandem mismatches were updated. The average absolute difference between the measured values and the values predicted using these updated parameters is 0.5 kcal/mol. This updated model improves the prediction for tandem mismatches that were predicted rather poorly by the previous model. This new experimental data and updated predictive model allow for more accurate calculations of the free energy of RNA duplexes containing tandem mismatches, and, furthermore, should allow for improved prediction of secondary structure from sequence. PMID:19509311
Two synthetic tandem repetitive DNA probes were used to compare genetic variation at variable-number-tandem-repeat (VNTR) loci among Rubus idaeus L. var. strigosus (Michx.) Maxim. (Rosaceae) individuals sampled at eight sites contaminated by pollutants (N = 39) and eight adjacent...
Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis
NASA Astrophysics Data System (ADS)
Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.
1998-03-01
Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS for the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis such as cystic fibrosis caused by base deletion and point mutation have also been demonstrated. Experimental details will be presented in the meeting.
Investigation of microsatellite instability in Turkish breast cancer patients.
Demokan, Semra; Muslumanoglu, Mahmut; Yazici, H; Igci, Abdullah; Dalay, Nejat
2002-01-01
Multiple somatic and inherited genetic changes that lead to loss of growth control may contribute to the development of breast cancer. Microsatellites are tandem repeats of simple sequences that occur abundantly and at random throughout most eucaryotic genomes. Microsatellite instability (MI), characterized by the presence of random contractions or expansions in the length of simple sequence repeats or microsatellites, is observed in a variety of tumors. The aim of this study was to compare tumor DNA fingerprints with constitutional DNA fingerprints to investigate changes specific to breast cancer and evaluate its correlation with clinical characteristics. Tumor and normal tissue samples of 38 patients with breast cancer were investigated by comparing PCR-amplified microsatellite sequences D2S443 and D21S1436. Microsatellite instability at D21S1436 and D2S443 was found in 5 (13%) and 7 (18%) patients, respectively. Two patients displayed instability at both marker loci. No association was found between MI and age, family history, lymph node involvement and other clinical parameters.
Problem-solving test: Southwestern blotting.
Szeberényi, József
2014-01-01
Terms to be familiar with before you start to solve the test: Southern blotting, Western blotting, restriction endonucleases, agarose gel electrophoresis, nitrocellulose filter, molecular hybridization, polyacrylamide gel electrophoresis, proto-oncogene, c-abl, Src-homology domains, tyrosine protein kinase, nuclear localization signal, cDNA, deletion mutants, expression plasmid, transfection, RNA polymerase II, promoter, Shine-Dalgarno sequence, polyadenylation element, affinity chromatography, Northern blotting, immunoprecipitation, sodium dodecylsulfate, autoradiography, tandem repeats. Copyright © 2014 The International Union of Biochemistry and Molecular Biology.
Breast Mucin Tumor-Specific Epitopes for Cancer Immunotherapy
1998-09-01
reactivity with tumor-specific monoclonal antibodies show that antigenicity is maximized with the 40 amino acid MUC1-mtr2. By contrast, the MUC1-mtr3...associated mucins (7). The presence of tumor-specific epitopes is evidenced by the development of many monoclonal antibodies (mAb) that recognize...P1-P5 in the tandem repeat sequence (7). This epitope was identified by competition of antibody binding to tumor- specific mucin by synthetic
Alcivar-Warren, Acacia; Meehan-Meola, Dawn; Wang, Yongping; Guo, Ximing; Zhou, Linghua; Xiang, Jianhai; Moss, Shaun; Arce, Steve; Warren, William; Xu, Zhenkang; Bell, Kireina
2006-01-01
To develop genetic and physical maps for shrimp, accurate information on the actual number of chromosomes and a large number of genetic markers is needed. Previous reports have shown two different chromosome numbers for the Pacific whiteleg shrimp, Penaeus vannamei, the most important penaeid shrimp species cultured in the Western hemisphere. Preliminary results obtained by direct sequencing of clones from a Sau3A-digested genomic library of P. vannamei ovary identified a large number of (TAACC/GGTTA)-containing SSRs. The objectives of this study were to (1) examine the frequency of (TAACC)n repeats in 662 P. vannamei genomic clones that were directly sequenced, and perform homology searches of these clones, (2) confirm the number of chromosomes in testis of P. vannamei, and (3) localize the TAACC repeats in P. vannamei chromosome spreads using fluorescence in situ hybridization (FISH). Results for objective 1 showed that 395 out of the 662 clones sequenced contained single or multiple SSRs with three or more repeat motifs, 199 of which contained variable tandem repeats of the pentanucleotide (TAACC/GGTTA)n, with 3 to 14 copies per sequence. The frequency of (TAACC)n repeats in P. vannamei is 4.68 kb for SSRs with five or more repeat motifs. Sequence comparisons using the BLASTN nonredundant and expressed sequence tag (EST) databases indicated that most of the TAACC-containing clones were similar to either the core pentanucleotide repeat in PVPENTREP locus (GenBank accession no. X82619) or portions of 28S rRNA. Transposable elements (transposase for Tn1000 and reverse transcriptase family members), hypothetical or unnamed protein products, and genes of known function such as 18S and 28S rRNAs, heat shock protein 70, and thrombospondin were identified in non-TAACC-containing clones. For objective 2, the meiotic chromosome number of P. vannamei was confirmed as N = 44. 
For objective 3, four FISH probes (P1 to P4) containing different numbers of TAACC repeats produced positive signals on telomeres of P. vannamei chromosomes. A few chromosomes had positive signals interstitially. Probe signal strength and chromosome coverage differed in the general order of P1>P2>P3>P4, which correlated with the length of TAACC repeats within the probes: 83, 66, 35, and 30 bp, respectively, suggesting that the TAACC repeats, and not the flanking sequences, produced the TAACC signals at chromosome ends and TAACC is likely the telomere sequence for P. vannamei.
Liu, San-Xu; Hou, Wei; Zhang, Xue-Yan; Peng, Chang-Jun; Yue, Bi-Song; Fan, Zhen-Xin; Li, Jing
2018-07-18
The Tibetan macaque, which is endemic to China, is currently listed as a Near Endangered primate species by the International Union for Conservation of Nature (IUCN). Short tandem repeats (STRs) refer to repetitive elements of genome sequence that range in length from 1-6 bp. They are found in many organisms and are widely applied in population genetic studies. To clarify the distribution characteristics of genome-wide STRs and understand their variation among Tibetan macaques, we conducted a genome-wide survey of STRs with next-generation sequencing of five macaque samples. A total of 1 077 790 perfect STRs were mined from our assembly, with an N50 of 4 966 bp. Mono-nucleotide repeats were the most abundant, followed by tetra- and di-nucleotide repeats. Analysis of GC content and repeats showed consistent results with other macaques. Furthermore, using STR analysis software (lobSTR), we found that the proportion of base pair deletions in the STRs was greater than that of insertions in the five Tibetan macaque individuals (P<0.05, t-test). We also found a greater number of homozygous STRs than heterozygous STRs (P<0.05, t-test), with the Emei and Jianyang Tibetan macaques showing more heterozygous loci than Huangshan Tibetan macaques. The proportion of insertions and mean variation of alleles in the Emei and Jianyang individuals were slightly higher than those in the Huangshan individuals, thus revealing differences in STR allele size between the two populations. The polymorphic STR loci identified based on the reference genome showed good amplification efficiency and could be used to study population genetics in Tibetan macaques. The neighbor-joining tree classified the five macaques into two different branches according to their geographical origin, indicating high genetic differentiation between the Huangshan and Sichuan populations. 
We elucidated the distribution characteristics of STRs in the Tibetan macaque genome and provided an effective method for screening polymorphic STRs. Our results also lay a foundation for future genetic variation studies of macaques.
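The assembly statistic N50 quoted above can be computed as in the following minimal sketch. The function name and the toy contig lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a list of contig/scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length

# Toy example: total = 20, and the two longest contigs (6 + 5) reach half.
print(n50([2, 3, 4, 5, 6]))
```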
Wakasa, Yuhya; Zhao, Hui; Hirose, Sakiko; Yamauchi, Daiki; Yamada, Yuko; Yang, Lijun; Ohinata, Kousaku; Yoshikawa, Masaaki; Takaiwa, Fumio
2011-09-01
Novokinin (Arg-Pro-Leu-Lys-Pro-Trp, RPLKPW) is a new potent antihypertensive peptide based on the sequence of ovokinin (2-7) derived from ovalbumin. We previously generated transgenic rice seeds in which eight novokinin were fused to storage protein glutelins (GluA2 and GluC) for expression. Oral administration of these seeds to spontaneously hypertensive rats (SHRs) reduced systolic blood pressures at a dose of 1 g seed/kg of SHR. Here, 10- or 18-tandem repeats of novokinin with an endoplasmic reticulum (ER) retention signal (Lys-Asp-Glu-Leu, KDEL) at the C terminus were directly expressed in rice under the control of the glutelin promoter containing its signal peptide. Only small amounts of the 18-repeat novokinin accumulated, and it was unexpectedly deposited in the nucleolus. This abnormal intracellular localization was explained by an endogenous signal for nuclear localization. The GFP reporter protein fused to this sequence targeted to nuclei by a transient assay using onion epidermal cells. Transgenic seed expressing the 18-repeat novokinin exhibited significantly higher antihypertensive activity after a single oral dose to SHR even at one-quarter the amount (0.25 g/kg) of the transgenic rice seed expressing the fusion construct; though, its novokinin content was much lower (1/5). Furthermore, in a long-term administration for 5 weeks, even a smaller dose (0.0625 g/kg) of transgenic seeds could confer antihypertensive activity. This high antihypertensive activity may be attributed to differences in digestibility of expressed products by gastrointestinal enzymes and the unique intracellular localization. These results indicate that accumulation of novokinin as a tandemly repeated structure in transgenic rice is more effective than as a fusion-type structure. © 2010 The Authors. Plant Biotechnology Journal © 2010 Society for Experimental Biology and Blackwell Publishing Ltd.
Koch, K S; Gleiberman, A S; Aoki, T; Leffert, H L; Feren, A; Jones, A L; Fodor, E J
1995-01-01
An unusual S1-nuclease sensitive microsatellite (STMS) has been found in the single copy, rat polymeric immunoglobulin receptor gene (PIGR) terminal exon. In Fisher rats, elements within or beyond the STMS are expressed variably in the 3' untranslated regions (3'UTRs) of two 'Groups' of PIGR-encoded hepatic mRNAs (pIg-R) during liver regeneration. STMS elements include neighboring constant regions (a 60-bp d[GA]-rich tract with a chi-like octamer, followed by 15 tandem d[GGA] repeats) that merge directly with 36 or 39 tandem d[GAA] repeats (Fisher or Wistar strains, respectively) interrupted by d[AA] between their 5th-6th repeat units. The Wistar STMS is flanked upstream by two regions of nearly contiguous d[CA] or d[CT] repeats in the 3' end of intron 8; and downstream, by a 283 bp 'unit' containing several inversions at its 5' end, and two polyadenylation signals at its 3' end. The 283 nt unit is expressed in Group 1 pIg-R mRNAs; but it is absent in the Group 2 family so that their GAA repeats merge with their poly A tails. In contrast to genomic sequence, GGA triplet repeats are amplified (n ≥ 24-26), whereas GAA triplet repeats are truncated variably (n ≤ 9-37) and expressed uninterruptedly in both mRNA Groups. These results suggest that 3' end processing of the rat PIGR gene may involve misalignment, slippage and premature termination of RNA polymerase II. The function of this unusual processing and possible roles of chi-like octamers in quiescent or extrahepatic tissues are discussed. PMID:7739889
Effect of repeat copy number on variable-number tandem repeat mutations in Escherichia coli O157:H7.
Vogler, Amy J; Keys, Christine; Nemoto, Yoshimi; Colman, Rebecca E; Jay, Zack; Keim, Paul
2006-06-01
Variable-number tandem repeat (VNTR) loci have shown a remarkable ability to discriminate among isolates of the recently emerged clonal pathogen Escherichia coli O157:H7, making them a very useful molecular epidemiological tool. However, little is known about the rates at which these sequences mutate, the factors that affect mutation rates, or the mechanisms by which mutations occur at these loci. Here, we measure mutation rates for 28 VNTR loci and investigate the effects of repeat copy number and mismatch repair on mutation rate using in vitro-generated populations for 10 E. coli O157:H7 strains. We find single-locus rates as high as 7.0 × 10⁻⁴ mutations/generation and a combined 28-locus rate of 6.4 × 10⁻⁴ mutations/generation. We observed single- and multirepeat mutations that were consistent with a slipped-strand mispairing mutation model, as well as a smaller number of large repeat copy number mutations that were consistent with recombination-mediated events. Repeat copy number within an array was strongly correlated with mutation rate both at the most mutable locus, O157-10 (r² = 0.565, P = 0.0196), and across all mutating loci. The combined locus model was significant whether locus O157-10 was included (r² = 0.833, P < 0.0001) or excluded (r² = 0.452, P < 0.0001) from the analysis. Deficient mismatch repair did not affect mutation rate at any of the 28 VNTRs with repeat unit sizes of >5 bp, although a poly(G) homomeric tract was destabilized in the mutS strain. Finally, we describe a general model for VNTR mutations that encompasses insertions and deletions, single- and multiple-repeat mutations, and their relative frequencies based upon our empirical mutation rate data.
Begum, Rabeya; Zakrzewski, Falk; Menzel, Gerhard; Weber, Beatrice; Alam, Sheikh Shamimul; Schmidt, Thomas
2013-07-01
The cultivated jute species Corchorus olitorius and Corchorus capsularis are important fibre crops. The analysis of repetitive DNA sequences, comprising a major part of plant genomes, has not been carried out in jute but is useful to investigate the long-range organization of chromosomes. The aim of this study was the identification of repetitive DNA sequences to facilitate comparative molecular and cytogenetic studies of two jute cultivars and to develop a fluorescent in situ hybridization (FISH) karyotype for chromosome identification. A plasmid library was generated from C. olitorius and C. capsularis with genomic restriction fragments of 100-500 bp, which was complemented by targeted cloning of satellite DNA by PCR. The diversity of the repetitive DNA families was analysed comparatively. The genomic abundance and chromosomal localization of different repeat classes were investigated by Southern analysis and FISH, respectively. The cytosine methylation of satellite arrays was studied by immunolabelling. Major satellite repeats and retrotransposons have been identified from C. olitorius and C. capsularis. The satellite family CoSat I forms two undermethylated species-specific subfamilies, while the long terminal repeat (LTR) retrotransposons CoRetro I and CoRetro II show similarity to the Metaviridae of plant retroelements. FISH karyotypes were developed by multicolour FISH using these repetitive DNA sequences in combination with 5S and 18S-5·8S-25S rRNA genes, which enable unequivocal chromosome discrimination in both jute species. The analysis of the structure and diversity of the repeated DNA is crucial for genome sequence annotation. The reference karyotypes will be useful for breeding of jute and provide the basis for karyotyping homeologous chromosomes of wild jute species to reveal the genetic and evolutionary relationship between cultivated and wild Corchorus species.
Taylor, J S; Breden, F
2000-01-01
The standard slipped-strand mispairing (SSM) model for the formation of variable number tandem repeats (VNTRs) proposes that a few tandem repeats, produced by chance mutations, provide the "raw material" for VNTR expansion. However, this model is unlikely to explain the formation of VNTRs with long motifs (e.g., minisatellites), because the likelihood of a tandem repeat forming by chance decreases rapidly as the length of the repeat motif increases. Phylogenetic reconstruction of the birth of a mitochondrial (mt) DNA minisatellite in guppies suggests that VNTRs with long motifs can form as a consequence of SSM at noncontiguous repeats. VNTRs formed in this manner have motifs longer than the noncontiguous repeat originally formed by chance and are flanked by one unit of the original, noncontiguous repeat. SSM at noncontiguous repeats can therefore explain the birth of VNTRs with long motifs and the "imperfect" or "short direct" repeats frequently observed adjacent to both mtDNA and nuclear VNTRs. PMID:10880490
Krawczyk, Paweł; Kucharczyk, Tomasz; Kowalski, Dariusz M; Powrózek, Tomasz; Ramlau, Rodryg; Kalinka-Warzocha, Ewa; Winiarczyk, Kinga; Knetki-Wróblewska, Magdalena; Wojas-Krawczyk, Kamila; Kałakucka, Katarzyna; Dyszkiewicz, Wojciech; Krzakowski, Maciej; Milanowski, Janusz
2014-12-01
We present a retrospective analysis of up to five polymorphisms in the TS, MTHFR and ERCC1 genes as molecular predictive markers for homogeneous Caucasian, non-squamous NSCLC patients treated with pemetrexed and platinum front-line chemotherapy. The following polymorphisms in DNA isolated from 115 patients were analyzed: a variable number of 28-bp tandem repeats in the 5'-UTR region of the TS gene; a single nucleotide polymorphism (SNP) within the second tandem repeat of the TS gene (G>C); a 6-bp deletion in the 3'-UTR region of TS (1494del6); the 677C>T SNP in MTHFR; and the 19007C>T SNP in ERCC1. The results of the molecular examinations were correlated with disease control rate, progression-free survival (PFS) and overall survival. The polymorphic tandem repeat sequence (2R, 3R) in the enhancer region of the TS gene and the G>C SNP within the second repeat of the 3R allele seem to be important for the effectiveness of platinum and pemetrexed in first-line chemotherapy. A non-significant shortening of PFS was observed in 3R/3R homozygotes compared with the 2R/2R and 2R/3R genotypes, while PFS was significantly shorter in patients carrying both the 3R allele and the G nucleotide. The combined analysis of the TS VNTR and the MTHFR 677C>T SNP revealed shortening of PFS in patients carrying both the 3R allele in TS and two C alleles in MTHFR. The strongest factors that increased the risk of progression were poor PS, weight loss, anemia and the simultaneous presence of the 3R allele and the G nucleotide in the second repeat of the 3R allele in TS. Moreover, lack of second-line chemotherapy, weight loss, poor performance status and the above-mentioned TS genotype increased the risk of early mortality. The examined polymorphisms should be considered as molecular predictive factors for pemetrexed- and platinum-based front-line chemotherapy in non-squamous NSCLC patients.
U'Ren, Jana M; Schupp, James M; Pearson, Talima; Hornstra, Heidie; Friedman, Christine L Clark; Smith, Kimothy L; Daugherty, Rebecca R Leadem; Rhoton, Shane D; Leadem, Ben; Georgia, Shalamar; Cardon, Michelle; Huynh, Lynn Y; DeShazer, David; Harvey, Steven P; Robison, Richard; Gal, Daniel; Mayo, Mark J; Wagner, David; Currie, Bart J; Keim, Paul
2007-03-30
The facultative, intracellular bacterium Burkholderia pseudomallei is the causative agent of melioidosis, a serious infectious disease of humans and animals. We identified and categorized tandem repeat arrays and their distribution throughout the genome of B. pseudomallei strain K96243 in order to develop a genetic typing method for B. pseudomallei. We then screened 104 of the potentially polymorphic loci across a diverse panel of 31 isolates including B. pseudomallei, B. mallei and B. thailandensis in order to identify loci with varying degrees of polymorphism. A subset of these tandem repeat arrays was subsequently developed into a multiple-locus VNTR analysis to examine 66 B. pseudomallei and 21 B. mallei isolates from around the world, as well as 95 lineages from a serial transfer experiment encompassing ~18,000 generations. B. pseudomallei contains a preponderance of tandem repeat loci throughout its genome, many of which are duplicated elsewhere in the genome. The majority of these loci are composed of repeat motif lengths of 6 to 9 bp with 4 to 10 repeat units and are predominantly located in intergenic regions of the genome. Across geographically diverse B. pseudomallei and B. mallei isolates, the 32 VNTR loci displayed between 7 and 28 alleles, with Nei's diversity values ranging from 0.47 to 0.94. Mutation rates for these loci are comparable (>10^-5 per locus per generation) to that of the most diverse tandemly repeated regions found in other less diverse bacteria. The frequency, location and duplicate nature of tandemly repeated regions within the B. pseudomallei genome indicate that these tandem repeat regions may play a role in generating and maintaining adaptive genomic variation. Multiple-locus VNTR analysis revealed extensive diversity within the global isolate set containing B. pseudomallei and B. mallei, and it detected genotypic differences within clonal lineages of both species that were identical using previous typing methods. 
Given the health threat to humans and livestock and the potential for B. pseudomallei to be released intentionally, MLVA could prove to be an important tool for fine-scale epidemiological or forensic tracking of this increasingly important environmental pathogen.
NASA Technical Reports Server (NTRS)
Marsh, T. L.; Reich, C. I.; Whitelock, R. B.; Olsen, G. J.; Woese, C. R. (Principal Investigator)
1994-01-01
The first step in transcription initiation in eukaryotes is mediated by the TATA-binding protein, a subunit of the transcription factor IID complex. We have cloned and sequenced the gene for a presumptive homolog of this eukaryotic protein from Thermococcus celer, a member of the Archaea (formerly archaebacteria). The protein encoded by the archaeal gene is a tandem repeat of a conserved domain, corresponding to the repeated domain in its eukaryotic counterparts. Molecular phylogenetic analyses of the two halves of the repeat are consistent with the duplication occurring before the divergence of the archaeal and eukaryotic domains. In conjunction with previous observations of similarity in RNA polymerase subunit composition and sequences and the finding of a transcription factor IIB-like sequence in Pyrococcus woesei (a relative of T. celer), it appears that major features of the eukaryotic transcription apparatus were well-established before the origin of eukaryotic cellular organization. The divergence between the two halves of the archaeal protein is less than that between the halves of the individual eukaryotic sequences, indicating that the average rate of sequence change in the archaeal protein has been less than in its eukaryotic counterparts. To the extent that this lower rate applies to the genome as a whole, a clearer picture of the early genes (and gene families) that gave rise to present-day genomes is more apt to emerge from the study of sequences from the Archaea than from the corresponding sequences from eukaryotes.
Centromere reference models for human chromosomes X and Y satellite arrays
Miga, Karen H.; Newton, Yulia; Jain, Miten; Altemose, Nicolas; Willard, Huntington F.; Kent, W. James
2014-01-01
The human genome sequence remains incomplete, with multimegabase-sized gaps representing the endogenous centromeres and other heterochromatic regions. Available sequence-based studies within these sites in the genome have demonstrated a role in centromere function and chromosome pairing, necessary to ensure proper chromosome segregation during cell division. A common genomic feature of these regions is the enrichment of long arrays of near-identical tandem repeats, known as satellite DNAs, which offer a limited number of variant sites to differentiate individual repeat copies across millions of bases. This substantial sequence homogeneity challenges available assembly strategies and, as a result, centromeric regions are omitted from ongoing genomic studies. To address this problem, we utilize monomer sequence and ordering information obtained from whole-genome shotgun reads to model two haploid human satellite arrays on chromosomes X and Y, resulting in an initial characterization of 3.83 Mb of centromeric DNA within an individual genome. To further expand the utility of each centromeric reference sequence model, we evaluate sites within the arrays for short-read mappability and chromosome specificity. Because satellite DNAs evolve in a concerted manner, we use these centromeric assemblies to assess the extent of sequence variation among 366 individuals from distinct human populations. We thus identify two satellite array variants in both X and Y centromeres, as determined by array length and sequence composition. This study provides an initial sequence characterization of a regional centromere and establishes a foundation to extend genomic characterization to these sites as well as to other repeat-rich regions within complex genomes. PMID:24501022
Genetic markers, genotyping methods & next generation sequencing in Mycobacterium tuberculosis
Desikan, Srinidhi; Narayanan, Sujatha
2015-01-01
Molecular epidemiology (ME) is one of the main areas in tuberculosis research and is widely used to study the transmission, epidemics and outbreaks of tubercle bacilli. It exploits the presence of various polymorphisms in the genome of the bacteria that can be used as genetic markers. Many DNA typing methods apply these genetic markers to differentiate various strains and to study the evolutionary relationships between them. The three widely used genotyping tools to differentiate Mycobacterium tuberculosis strains are IS6110 restriction fragment length polymorphism (RFLP), spacer oligotyping (Spoligotyping), and mycobacterial interspersed repeat units - variable number of tandem repeats (MIRU-VNTR). A new prospect for ME was introduced with the development of whole genome sequencing (WGS) and next generation sequencing (NGS) methods, in which the entire genome is sequenced; this not only helps in pointing out minute differences between sequences but also saves time and cost. NGS is also found to be useful in identifying single nucleotide polymorphisms (SNPs), in comparative genomics and in studying various aspects of transmission dynamics. These techniques enable the identification of mycobacterial strains and also facilitate the study of their phylogenetic and evolutionary traits. PMID:26205019
The Repeat Sequences and Elevated Substitution Rates of the Chloroplast accD Gene in Cupressophytes
Li, Jia; Su, Yingjuan; Wang, Ting
2018-01-01
The plastid accD gene encodes a subunit of the acetyl-CoA carboxylase (ACCase) enzyme. The accD gene has been reported to be expanded in length in Cryptomeria japonica, Taiwania cryptomerioides, Cephalotaxus, Taxus chinensis, and Podocarpus lambertii, and the main reason for this phenomenon was the existence of tandemly repeated sequences. However, it is still unknown whether the accD gene length in other cupressophytes has expanded. Here, in order to investigate how widespread this phenomenon is, 18 accD sequences and their surrounding regions from cupressophytes were sequenced and analyzed. Together with 39 GenBank sequences, our taxon sampling covered all the extant gymnosperm orders. The repetitive elements and substitution rates of accD among 57 gymnosperm species were analyzed. The results show: (1) Reading frame length of the accD gene in 18 cupressophyte species has also expanded. (2) Many repetitive elements were identified in the accD gene of cupressophyte lineages. (3) The synonymous and non-synonymous substitution rates of accD were accelerated in cupressophytes. (4) accD was located at rearrangement endpoints. These results suggest that repetitive elements may mediate chloroplast genome rearrangement and accelerate the substitution rates. PMID:29731764
Zhang, Q; Yang, Y Q; Zhang, Z Y; Li, L; Yan, W Y; Jiang, W J; Xin, A G; Lei, C X; Zheng, Z X
2002-01-01
In this study, the sequences of capsid protein VP1 regions of YNAs1.1 and YNAs1.2 isolates of foot-and-mouth disease virus (FMDV) were analyzed and a peptide containing amino acids (aa) 133-158 of VP1 and aa 20-34 of VP4 of FMDV type Asia 1 was assumed to contain B and T cell epitopes, because it is hypervariable and includes a cell attachment site RGD located in the G-H loop. The DNA fragments encoding aa 133-158 of VP1 and aa 20-34 of VP4 of FMDV type Asia 1 were chemically synthesized and ligated into a tandem repeat of (aa 133-158)-(aa 20-34)-(aa 133-158). In order to enhance its immunogenicity, the tandem repeat was inserted downstream of the beta-galactosidase gene in the expression vector pWR590. This insertion yielded a recombinant expression vector pAS1 encoding the fusion protein. The latter reacted with sera from FMDV type Asia 1-infected animals in vitro and elicited high levels of neutralizing antibodies in guinea pigs. The T cell proliferation in immunized animals increased following stimulation with the fusion protein. It is reported for the first time that a recombinant fusion protein vaccine was produced using B and T cell epitopes of FMDV type Asia 1 and that this fusion protein was immunogenic. The fusion protein reported here can serve as a candidate of fusion epitopes for design of a vaccine against FMDV type Asia 1.
The complete mitochondrial genome of the Giant Manta ray, Manta birostris.
Hinojosa-Alvarez, Silvia; Díaz-Jaimes, Pindaro; Marcet-Houben, Marina; Gabaldón, Toni
2015-01-01
The complete mitochondrial genome of the giant manta ray (Manta birostris) consists of 18,075 bp and is rich in A + T and low in G. Gene organization and length are similar to those of other ray species. It comprises 13 protein-coding genes, 2 rRNA genes, 23 tRNA genes and 1 non-coding sequence, and the control region. We identified an AT tandem repeat region, similar to that reported in Mobula japanica.
Use of Variable-Number Tandem Repeats To Examine Genetic Diversity of Neisseria meningitidis
Yazdankhah, Siamak P.; Lindstedt, Bjørn-Arne; Caugant, Dominique A.
2005-01-01
Repetitive DNA motifs with potential variable-number tandem repeats (VNTR) were identified in the genome of Neisseria meningitidis and used to develop a typing method. A total of 146 meningococcal isolates recovered from carriers and patients were studied. These included 82 of the 107 N. meningitidis isolates previously used in the development of multilocus sequence typing (MLST), 45 isolates recovered from different counties in Norway in connection with local outbreaks, and 19 serogroup W135 isolates of sequence type 11 (ST-11), which were recovered in several parts of the world. The latter group comprised isolates related to the Hajj outbreak of 2000 and isolates recovered from outbreaks in Burkina Faso in 2001 and 2002. All isolates had been characterized previously by MLST or multilocus enzyme electrophoresis (MLEE). VNTR analysis showed that meningococcal isolates with similar MLST or MLEE types recovered from epidemiologically linked cases in a defined geographical area often presented similar VNTR patterns while isolates of the same MLST or MLEE types without an obvious epidemiological link showed variable VNTR patterns. Thus, VNTR analysis may be used for fine typing of meningococcal isolates after MLST or MLEE typing. The method might be especially valuable for differentiating among ST-11 strains, as shown by the VNTR analyses of serogroup W135 ST-11 meningococcal isolates recovered since the mid-1990s. PMID:15814988
García, Katherine; Gavilán, Ronnie G.; Höfle, Manfred G.; Martínez-Urtaza, Jaime; Espejo, Romilio T.
2012-01-01
The emergence of the pandemic strain Vibrio parahaemolyticus O3:K6 in 1996 caused a large increase of diarrhea outbreaks related to seafood consumption in Southeast Asia, and later worldwide. Isolates of this strain constitute a clonal complex, and their effectual differentiation is possible by comparison of their variable number tandem repeats (VNTRs). The differentiation of the isolates by the differences in VNTRs will allow inferring the population dynamics and microevolution of this strain, but this requires knowing the rate and mechanism of VNTR variation. Our study of mutants obtained after serial cultivation of clones showed that mutation rates of the six VNTRs examined are on the order of 10^-4 mutant per generation and that difference increases by stepwise addition of single mutations. The single stepwise mutation (SSM) was deduced because mutants with 1, 2, 3, or more repeat unit deletions or insertions follow a geometric distribution. Plausible phylogenetic trees are obtained when, according to SSM, the genetic distance between clusters with different numbers of repeats is assessed by the absolute differences in repeats. Using this approach, mutants originated from different isolates of pandemic V. parahaemolyticus after serial cultivation are clustered with their parental isolates. Additionally, isolates of pandemic V. parahaemolyticus from Southeast Asia, Tokyo, and northern and southern Chile are clustered according to their geographical origin. The deepest split in these four populations is observed between the Tokyo and southern Chile populations. We conclude that proper phylogenetic relations and successful tracing of pandemic V. parahaemolyticus require measuring the differences between isolates by the absolute number of repeats in the VNTRs considered. PMID:22292049
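The distance measure described in the abstract above can be illustrated with a minimal sketch. It assumes each isolate is represented by a list of repeat counts, one per VNTR locus, and sums the absolute differences across loci; the function name and the repeat counts are hypothetical and are not taken from the study.

```python
from typing import Sequence


def vntr_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute repeat-count differences across loci.

    Under a single stepwise mutation model, a difference of k repeats
    at one locus is treated as k mutational steps rather than one.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical repeat counts at six VNTR loci (illustrative only).
parent = [10, 7, 12, 5, 9, 14]
mutant = [10, 8, 12, 5, 9, 12]

print(vntr_distance(parent, mutant))  # 3 (one step at locus 2, two at locus 6)
```

A purely categorical comparison would score these two profiles as differing at two loci; the stepwise distance instead weights the two-repeat change at the last locus as two steps.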
The mini-exon genes of three Phytomonas isolates that differ in plant tissue tropism.
Sturm, N R; Fernandes, O; Campbell, D A
1995-08-01
The tandem mini-exon gene repeat is an ideal diagnostic target for trypanosomatids because it includes sequences that are conserved absolutely coupled with regions of extreme variability. We have exploited these features and the polymerase chain reaction to differentiate Phytomonas strains isolated from phloem, fruit or latex of various host plants. While the transcribed regions are nearly identical, the intergenic sequences are variable in size and content (130-332 base pairs). The mini-exon genes of these phytomonads can therefore be distinguished from each other and from the corresponding genes in insect trypanosomes, with which they are oft confused.
Landrian, Ivette; McFarland, Karen N; Liu, Jilin; Mulligan, Connie J; Rasmussen, Astrid; Ashizawa, Tetsuo
2017-01-01
Spinocerebellar ataxia type 10 (SCA10), an autosomal dominant cerebellar ataxia disorder, is caused by a non-coding ATTCT microsatellite repeat expansion in the ataxin 10 gene. In a subset of SCA10 families, the 5'-end of the repeat expansion contains a complex sequence of penta- and heptanucleotide interruption motifs which is followed by a pure tract of tandem ATCCT repeats of unknown length at its 3'-end. Intriguingly, expansions that carry these interruption motifs correlate with an epileptic seizure phenotype and are unstable despite the theory that interruptions are expected to stabilize expanded repeats. To examine the apparent contradiction of unstable, interruption-positive SCA10 expansion alleles and to determine whether the instability originates outside of the interrupted region, we sequenced approximately 1 kb of the 5'-end of SCA10 expansions using the ATCCT-PCR product in individuals across multiple generations from four SCA10 families. We found that the greatest instability within this region occurred in paternal transmissions of the allele in stretches of pure ATTCT motifs while the intervening interrupted sequences were stable. Overall, the ATCCT interruption changes by only one to three repeat units and therefore cannot account for the instability across the length of the disease allele. We conclude that the AT-rich interruptions locally stabilize the SCA10 expansion at the 5'-end but do not completely abolish instability across the entire span of the expansion. In addition, analysis of the interruption alleles across these families supports a parsimonious single origin of the mutation with a shared distant ancestor.
Lindstedt, Bjørn-Arne; Heir, Even; Gjernes, Elisabet; Vardund, Traute; Kapperud, Georg
2003-01-01
Background: The ability to react early to possible outbreaks of Escherichia coli O157:H7 and to trace possible sources relies on the availability of highly discriminatory and reliable techniques. The development of methods that are fast and have the potential for complete automation is needed for this important pathogen. Methods: In all, 73 isolates of shiga-toxin producing E. coli O157 (STEC) were used in this study. The two available fully sequenced STEC genomes were scanned for tandemly repeated stretches of DNA, which were evaluated as polymorphic markers for isolate identification. Results: The 73 E. coli isolates displayed 47 distinct patterns and the MLVA assay was capable of high discrimination between the E. coli O157 strains. The assay was fast and all the steps can be automated. Conclusion: The findings demonstrate a novel, highly discriminatory molecular typing method for the important pathogen E. coli O157 that is fast, robust and offers many advantages compared to current methods. PMID:14664722
The evolution and function of protein tandem repeats in plants.
Schaper, Elke; Anisimova, Maria
2015-04-01
Sequence tandem repeats (TRs) are abundant in proteomes across all domains of life. For plants, little is known about their distribution or contribution to protein function. We exhaustively annotated TRs and studied the evolution of TR unit variations for all Ensembl plants. Using phylogenetic patterns of TR units, we detected conserved TRs with unit number and order preserved during evolution, and those TRs that have diverged via recent TR unit gains/losses. We correlated the mode of evolution of TRs to protein function. TR number was strongly correlated with proteome size, with about one-half of all TRs recognized as common protein domains. The majority of TRs have been highly conserved over long evolutionary distances, some since the separation of red algae and green plants c. 1.6 billion yr ago. Conversely, recurrent recent TR unit mutations were rare. Our results suggest that the first TRs by far predate the first plants, and that TR appearance is an ongoing process with similar rates across the plant kingdom. Interestingly, the few detected highly mutable TRs might provide a source of variation for rapid adaptation. In particular, such TRs are enriched in leucine-rich repeats (LRRs) commonly found in R genes, where TR unit gain/loss may facilitate resistance to emerging pathogens. © 2014 The Authors. New Phytologist © 2014 New Phytologist Trust.
Comparative and functional characterization of intragenic tandem repeats in 10 Aspergillus genomes.
Gibbons, John G; Rokas, Antonis
2009-03-01
Intragenic tandem repeats (ITRs) are consecutive repeats of three or more nucleotides found in coding regions. ITRs are the underlying cause of several human genetic diseases and have been associated with phenotypic variation, including pathogenesis, in several clades of the tree of life. We have examined the evolution and functional role of ITRs in 10 genomes spanning the fungal genus Aspergillus, a clade of relevance to medicine, agriculture, and industry. We identified several hundred ITRs in each of the species examined. ITR content varied extensively between species, with an average 79% of ITRs unique to a given species. For the fraction of conserved ITR regions, sequence comparisons within species and between close relatives revealed that they were highly variable. ITR-containing proteins were evolutionarily less conserved, compositionally distinct, and overrepresented for domains associated with cell-surface localization and function relative to the rest of the proteome. Furthermore, ITRs were preferentially found in proteins involved in transcription, cellular communication, and cell-type differentiation but were underrepresented in proteins involved in metabolism and energy. Importantly, although ITRs were evolutionarily labile, their functional associations appeared to be remarkably conserved across eukaryotes. Fungal ITRs likely participate in a variety of developmental processes and cell-surface-associated functions, suggesting that their contribution to fungal lifestyle and evolution may be more general than previously assumed.
Identification of common, unique and polymorphic microsatellites among 73 cyanobacterial genomes.
Kabra, Ritika; Kapil, Aditi; Attarwala, Kherunnisa; Rai, Piyush Kant; Shanker, Asheesh
2016-04-01
Microsatellites, also known as Simple Sequence Repeats, are short tandem repeats of 1-6 nucleotides. These repeats are found in coding as well as non-coding regions of both prokaryotic and eukaryotic genomes and play a significant role in the study of gene regulation, genetic mapping, DNA fingerprinting and evolutionary studies. The availability of 73 complete genome sequences of cyanobacteria enabled us to mine and statistically analyze microsatellites in these genomes. The cyanobacterial microsatellites identified through bioinformatics analysis were stored in a user-friendly database named CyanoSat, which is an efficient data representation and query system designed using ASP.net. The information in CyanoSat comprises perfect, imperfect and compound microsatellites found in coding, non-coding and coding-non-coding regions. Moreover, it contains PCR primers with 200-nucleotide-long flanking regions. The mined cyanobacterial microsatellites can be freely accessed at www.compubio.in/CyanoSat/home.aspx. In addition, 82 polymorphic, 13,866 unique and 2390 common microsatellites were also detected. These microsatellites will be useful in strain identification and genetic diversity studies of cyanobacteria.
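As an illustration of the definition given in the abstract above (tandem repeats of 1-6 nucleotide motifs), the following minimal sketch scans a DNA string for perfect microsatellites with a regular expression. It is not the pipeline used to build CyanoSat; the function name and the thresholds (at least 3 copies, at least 8 bp in total) are assumptions chosen for the example, and imperfect or compound microsatellites are not handled.

```python
import re


def find_perfect_ssrs(seq, min_copies=3, min_length=8):
    """Return (start, motif, copies) for perfect tandem repeats of 1-6 nt motifs.

    `start` is a 0-based offset. The motif group is non-greedy, so the
    shortest repeating unit is reported (e.g. "AT", not "ATAT").
    """
    # A motif of 1-6 bases followed by at least (min_copies - 1) further copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        total = len(m.group(0))
        if total >= min_length:
            hits.append((m.start(), motif, total // len(motif)))
    return hits


# Toy sequence (illustrative only): an (AT)5 tract and an (AGC)4 tract.
toy = "GGCATATATATATCCGTTAGCAGCAGCAGCAGTT"
print(find_perfect_ssrs(toy))  # [(3, 'AT', 5), (18, 'AGC', 4)]
```

Thresholds used in practice usually differ by motif length (for example, more copies required for mononucleotide runs than for hexanucleotide repeats), so the two parameters here would need tuning for a real survey.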
The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.
Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin
2013-01-01
Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
Jaeckisch, Nina; Yang, Ines; Wohlrab, Sylke; Glöckner, Gernot; Kroymann, Juergen; Vogel, Heiko; Cembella, Allan; John, Uwe
2011-01-01
Many dinoflagellate species are notorious for the toxins they produce and ecological and human health consequences associated with harmful algal blooms (HABs). Dinoflagellates are particularly refractory to genomic analysis due to the enormous genome size, lack of knowledge about their DNA composition and structure, and peculiarities of gene regulation, such as spliced leader (SL) trans-splicing and mRNA transposition mechanisms. Alexandrium ostenfeldii is known to produce macrocyclic imine toxins, described as spirolides. We characterized the genome of A. ostenfeldii using a combination of transcriptomic data and random genomic clones for comparison with other dinoflagellates, particularly Alexandrium species. Examination of SL sequences revealed similar features as in other dinoflagellates, including Alexandrium species. SL sequences in decay indicate frequent retro-transposition of mRNA species. This probably contributes to overall genome complexity by generating additional gene copies. Sequencing of several thousand fosmid and bacterial artificial chromosome (BAC) ends yielded a wealth of simple repeats and tandemly repeated longer sequence stretches which we estimated to comprise more than half of the whole genome. Surprisingly, the repeats comprise a very limited set of 79–97 bp sequences; in part the genome is thus a relatively uniform sequence space interrupted by coding sequences. Our genomic sequence survey (GSS) represents the largest genomic data set of a dinoflagellate to date. Alexandrium ostenfeldii is a typical dinoflagellate with respect to its transcriptome and mRNA transposition but demonstrates Alexandrium-like stop codon usage. The large portion of repetitive sequences and the organization within the genome is in agreement with several other studies on dinoflagellates using different approaches. 
It remains to be determined whether this unusual composition is directly correlated to the exceptional genome organization of dinoflagellates with a low amount of histones and histone-like proteins. PMID:22164224
De novo protein sequencing by combining top-down and bottom-up tandem mass spectra.
Liu, Xiaowen; Dekker, Lennard J M; Wu, Si; Vanduijn, Martijn M; Luider, Theo M; Tolić, Nikola; Kou, Qiang; Dvorkin, Mikhail; Alexandrova, Sonya; Vyatkina, Kira; Paša-Tolić, Ljiljana; Pevzner, Pavel A
2014-07-03
There are two approaches for de novo protein sequencing: Edman degradation and mass spectrometry (MS). Existing MS-based methods characterize a novel protein by assembling tandem mass spectra of overlapping peptides generated from multiple proteolytic digestions of the protein. Because each tandem mass spectrum covers only a short peptide of the target protein, the key to high coverage protein sequencing is to find spectral pairs from overlapping peptides in order to assemble tandem mass spectra to long ones. However, overlapping regions of peptides may be too short to be confidently identified. High-resolution mass spectrometers have become accessible to many laboratories. These mass spectrometers are capable of analyzing molecules of large mass values, boosting the development of top-down MS. Top-down tandem mass spectra cover whole proteins. However, top-down tandem mass spectra, even combined, rarely provide full ion fragmentation coverage of a protein. We propose an algorithm, TBNovo, for de novo protein sequencing by combining top-down and bottom-up MS. In TBNovo, a top-down tandem mass spectrum is utilized as a scaffold, and bottom-up tandem mass spectra are aligned to the scaffold to increase sequence coverage. Experiments on data sets of two proteins showed that TBNovo achieved high sequence coverage and high sequence accuracy.
Cavanagh, Jorunn Pauline; Klingenberg, Claus; Hanssen, Anne-Merethe; Fredheim, Elizabeth Aarag; Francois, Patrice; Schrenzel, Jacques; Flægstad, Trond; Sollid, Johanna Ericson
2012-06-01
The notoriously multi-resistant Staphylococcus haemolyticus is an emerging pathogen causing serious infections in immunocompromised patients. Defining the population structure is important to detect outbreaks and spread of antimicrobial resistant clones. Currently, the standard typing technique is pulsed-field gel electrophoresis (PFGE). In this study we describe novel molecular typing schemes for S. haemolyticus using multi locus sequence typing (MLST) and multi locus variable number of tandem repeats (VNTR) analysis. Seven housekeeping genes (MLST) and five VNTR loci (MLVF) were selected for the novel typing schemes. A panel of 45 human and veterinary S. haemolyticus isolates was investigated. The collection had diverse PFGE patterns (38 PFGE types) and was sampled over a 20-year period from eight countries. MLST resolved 17 sequence types (Simpson's index of diversity [SID]=0.877) and MLVF resolved 14 repeat types (SID=0.831). We found low sequence diversity. Phylogenetic analysis clustered the isolates in three (MLST) and one (MLVF) clonal complexes, respectively. Taken together, neither the MLST nor the MLVF scheme was suitable to resolve the population structure of this S. haemolyticus collection. Future MLVF and MLST schemes will benefit from addition of more variable core genome sequences identified by comparing different fully sequenced S. haemolyticus genomes. Copyright © 2012 Elsevier B.V. All rights reserved.
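The discriminatory power reported above as Simpson's index of diversity can be sketched as follows. The abstract does not state which formulation was used; this example assumes the unbiased form commonly applied to typing schemes, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), where n_j is the number of isolates of type j and N the total number of isolates. The type labels and counts are hypothetical.

```python
from collections import Counter


def simpson_diversity(types):
    """Probability that two isolates drawn without replacement differ in type."""
    counts = Counter(types)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two isolates are required")
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical typing result for 10 isolates (illustrative only).
isolates = ["ST1"] * 4 + ["ST2"] * 3 + ["ST3"] * 2 + ["ST4"]
print(round(simpson_diversity(isolates), 3))  # 0.778
```

A value near 1 means almost every pair of isolates is separated by the scheme, while a value near 0 means most isolates share one type.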
Kang, Sang-Ho; Lee, Jeong-Hoon; Lee, Hyun Oh; Ahn, Byoung Ohg; Won, So Youn; Sohn, Seong-Han; Kim, Jung Sun
2017-10-06
Glycyrrhiza uralensis and G. glabra, members of the Fabaceae, are medicinally important species that are native to Asia and Europe. Extracts from these plants are widely used as natural sweeteners because of their much greater sweetness than sucrose. In this study, the three complete chloroplast genomes and five 45S nuclear ribosomal (nr)DNA sequences of these two licorice species and an interspecific hybrid are presented. The chloroplast genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp, respectively. The three chloroplast genomes harbored 110 annotated genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The 45S nrDNA sequences were either 5,947 or 5,948 bp in length. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. The complete 45S nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA. We identified simple sequence repeat and tandem repeat sequences. We also developed four reliable markers for analysis of Glycyrrhiza diversity authentication.
Tandem-repeat protein domains across the tree of life.
Jernigan, Kristin K; Bordenstein, Seth R
2015-01-01
Tandem-repeat protein domains, composed of repeated units of conserved stretches of 20-40 amino acids, are required for a wide array of biological functions. Despite their diverse and fundamental functions, there has been no comprehensive assessment of their taxonomic distribution, incidence, and associations with organismal lifestyle and phylogeny. In this study, we assess for the first time the abundance of armadillo (ARM) and tetratricopeptide (TPR) repeat domains across all three domains in the tree of life and compare the results to our previous analysis on ankyrin (ANK) repeat domains in this journal. All eukaryotes and a majority of the bacterial and archaeal genomes analyzed have a minimum of one TPR and ARM repeat. In eukaryotes, the fraction of ARM-containing proteins is approximately double that of TPR and ANK-containing proteins, whereas bacteria and archaea are enriched in TPR-containing proteins relative to ARM- and ANK-containing proteins. We show in bacteria that phylogenetic history, rather than lifestyle or pathogenicity, is a predictor of TPR repeat domain abundance, while neither phylogenetic history nor lifestyle predicts ARM repeat domain abundance. Surprisingly, pathogenic bacteria were not enriched in TPR-containing proteins, which have been associated within virulence factors in certain species. Taken together, this comparative analysis provides a newly appreciated view of the prevalence and diversity of multiple types of tandem-repeat protein domains across the tree of life. A central finding of this analysis is that tandem repeat domain-containing proteins are prevalent not just in eukaryotes, but also in bacterial and archaeal species.
CRISPRcompar: a website to compare clustered regularly interspaced short palindromic repeats.
Grissa, Ibtissem; Vergnaud, Gilles; Pourcel, Christine
2008-07-01
Clustered regularly interspaced short palindromic repeat (CRISPR) elements are a particular family of tandem repeats present in prokaryotic genomes, in almost all archaea and in about half of bacteria, and which participate in a mechanism of acquired resistance against phages. They consist of a succession of direct repeats (DR) of 24-47 bp separated by similar-sized unique sequences (spacers). In the large majority of cases, the direct repeats are highly conserved, while the number and nature of the spacers are often quite diverse, even among strains of the same species. Furthermore, the acquisition of new units (DR + spacer) was shown to happen almost exclusively on one side of the locus. Therefore, the CRISPR presents an interesting genetic marker for comparative and evolutionary analysis of closely related bacterial strains. CRISPRcompar is a web service created to assist biologists in the CRISPR typing process. Two tools facilitate the in silico investigation: CRISPRcomparison and CRISPRtionary. This website is freely accessible at http://crispr.u-psud.fr/CRISPRcompar/.
Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools.
Cer, Regina Z; Donohue, Duncan E; Mudunuri, Uma S; Temiz, Nuri A; Loss, Michael A; Starner, Nathan J; Halusa, Goran N; Volfovsky, Natalia; Yi, Ming; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M
2013-01-01
The non-B DB, available at http://nonb.abcc.ncifcrf.gov, catalogs predicted non-B DNA-forming sequence motifs, including Z-DNA, G-quadruplex, A-phased repeats, inverted repeats, mirror repeats, direct repeats and their corresponding subsets: cruciforms, triplexes and slipped structures, in several genomes. Version 2.0 of the database revises and re-implements the motif discovery algorithms to better align with accepted definitions and thresholds for motifs, expands the non-B DNA-forming motifs coverage by including short tandem repeats and adds key visualization tools to compare motif locations relative to other genomic annotations. Non-B DB v2.0 extends the ability for comparative genomics by including re-annotation of the five organisms reported in non-B DB v1.0, human, chimpanzee, dog, macaque and mouse, and adds seven additional organisms: orangutan, rat, cow, pig, horse, platypus and Arabidopsis thaliana. Additionally, the non-B DB v2.0 provides an overall improved graphical user interface and faster query performance.
Turner, Peter C; Yomano, Lorraine P; Jarboe, Laura R; York, Sean W; Baggett, Christy L; Moritz, Brélan E; Zentz, Emily B; Shanmugam, K T; Ingram, Lonnie O
2012-04-01
Escherichia coli KO11 (ATCC 55124) was engineered in 1990 to produce ethanol by chromosomal insertion of the Zymomonas mobilis pdc and adhB genes into E. coli W (ATCC 9637). KO11FL, our current laboratory version of KO11, and its parent E. coli W were sequenced, and contigs assembled into genomic sequences using optical NcoI restriction maps as templates. E. coli W contained plasmids pRK1 (102.5 kb) and pRK2 (5.4 kb), but KO11FL only contained pRK2. KO11FL optical maps made with AflII and with BamHI showed a tandem repeat region, consisting of at least 20 copies of a 10-kb unit. The repeat region was located at the insertion site for the pdc, adhB, and chloramphenicol-resistance genes. Sequence coverage of these genes was about 25-fold higher than average, consistent with amplification of the foreign genes that were inserted as circularized DNA. Selection for higher levels of chloramphenicol resistance originally produced strains with higher pdc and adhB expression, and hence improved fermentation performance, by increasing the gene copy number. Sequence data for an earlier version of KO11, ATCC 55124, indicated that multiple copies of pdc adhB were present. Comparison of the W and KO11FL genomes showed large inversions and deletions in KO11FL, mostly enabled by IS10, which is absent from W but present at 30 sites in KO11FL. The early KO11 strain ATCC 55124 had no rearrangements, contained only one IS10, and lacked most accumulated single nucleotide polymorphisms (SNPs) present in KO11FL. Despite rearrangements and SNPs in KO11FL, fermentation performance was equal to that of ATCC 55124.
Chen, Jinhui; Hao, Zhaodong; Xu, Haibin; Yang, Liming; Liu, Guangxin; Sheng, Yu; Zheng, Chen; Zheng, Weiwei; Cheng, Tielong; Shi, Jisen
2015-01-01
Metasequoia glyptostroboides Hu et Cheng is the only species in the genus Metasequoia Miki ex Hu et Cheng, which belongs to the Cupressaceae family. There were around 10 species in the Metasequoia genus, which were widely spread across the Northern Hemisphere during the Cretaceous of the Mesozoic and in the Cenozoic. M. glyptostroboides is the only remaining representative of this genus. Here, we report the complete chloroplast (cp) genome sequence and the cp genomic features of M. glyptostroboides. The M. glyptostroboides cp genome is 131,887 bp in length, with a total of 117 genes comprising 82 protein-coding genes, 31 tRNA genes and four rRNA genes. In this genome, 11 forward repeats, nine palindromic repeats, and 15 tandem repeats were detected. A total of 188 perfect microsatellites were detected through simple sequence repeat (SSR) analysis and these were distributed unevenly within the cp genome. Comparison of the cp genome structure and gene order to those of several other land plants indicated that a copy of the inverted repeat (IR) region, which was found to be IR region A (IRA), was lost in the M. glyptostroboides cp genome. The five most divergent and five most conserved genes were determined and further phylogenetic analysis was performed among plant species, especially for related species in conifers. Finally, phylogenetic analysis demonstrated that M. glyptostroboides is a sister species to Cryptomeria japonica (L. F.) D. Don and to Taiwania cryptomerioides Hayata. The complete cp genome sequence information of M. glyptostroboides will be of great help for further investigations of this endemic relict woody plant and for in-depth understanding of the evolutionary history of the coniferous cp genomes, especially for the position of M. glyptostroboides in plant systematics and evolution. PMID:26136762
Yokoyama, Eiji; Hashimoto, Ruiko; Etoh, Yoshiki; Ichihara, Sachiko; Horikawa, Kazumi; Uchimura, Masako
2011-01-01
The distribution of insertion sequence (IS) 629 among strains of enterohemorrhagic Escherichia coli serovar O157 (O157) was investigated and compared with the strain lineages defined by lineage specific polymorphism assay-6 (LSPA-6) to demonstrate the effectiveness of IS629 analysis for population genetics analysis. Using pulsed-field gel electrophoresis and variable-number tandem repeat typing, 140 strains producing both VT1 and VT2 and 98 strains producing only VT2 were selected from a total of 592 strains isolated from patients and asymptomatic carriers in Chiba Prefecture, Japan, during 2003-2008. By LSPA-6 analysis, six strains had atypical amplicon sizes in their Z5935 loci and five strains had atypical amplicon sizes in their arp-iclR intergenic regions. Sequence analyses of PCR amplified DNAs showed that five of the six loci used for LSPA-6 analysis had tandem repeats and the allele changes were due to changes in the number of tandem repeats. Subculturing and long-term incubation was found to have no detectable effect on the lineages defined by LSPA-6 analysis, demonstrating the robustness of LSPA-6 analysis. Minimum spanning tree analysis reconstruction revealed that strains in lineage I, I/II, and II clustered on separate branches, indicating that the distribution of IS629 was biased among O157 strains in different lineages. Strains with LSPA-6 codes 231111, 211113, and 211114 had atypical amplicon sizes and were clustered in lineage I/II branch, and strains with LSPA-6 codes 212114, 221123, 221223, 222123, 222224, 242123, 252123, and 242222 had atypical amplicon sizes and clustered in lineage II branches. Linkage disequilibrium was observed in strains in every lineage when the standardized index of association was calculated using IS629 distribution data. Therefore, the distribution analysis of IS629 may be effective for population genetics analysis of O157 due to the biased IS629 distribution among strains in the three O157 lineages. 
Giovannetti, Elisa; Ugrasena, Dewa G; Supriyadi, Eddy; Vroling, Laura; Azzarello, Antonino; de Lange, Desiree; Peters, Godefridus J; Veerman, Anjo J P; Cloos, Jacqueline
2008-01-01
Genetic variations in the polymorphic tandem repeat sequence of the enhancer region of the thymidylate synthase promoter (TSER), as well as in methylenetetrahydrofolate reductase (MTHFR) C677T polymorphism, influence methotrexate sensitivity. We studied these polymorphisms in children with acute lymphoblastic leukaemia (ALL) and in subjects without malignancy in Indonesia and Holland. The frequencies of TT and CT genotypes were two-fold higher in Dutch children. The TSER 3R/3R repeat was three-fold more frequent in the Indonesian children, while the 2R/2R repeat was only 1% compared to 21% in the Dutch children. No differences of these polymorphisms were found between ALL cells and normal blood cells, indicating an ethnic rather than leukemic origin. These results may have implications for treatment of Indonesian children with ALL.
Jiang, W; Woitach, J T; Gupta, D; Bhavanandan, V P
1998-10-20
Secreted epithelial mucins are extremely large and heterogeneous glycoproteins. We report the 5 kilobase DNA sequence of a second gene, BSM2, which encodes bovine submaxillary mucin. The determined nucleotide and deduced amino acid sequences of BSM2 are 95.2% and 92.2% identical, respectively, to those of the previously described BSM1 gene isolated from the same cow. Further, the five predicted protein domains of the two genes are 100%, 94%, 93%, 77%, and 88% identical. Based on the above results, we propose that expression of multiple homologous core proteins from a single animal is a factor in generating diversity of saccharides in mucins and in providing resistance of the molecules to proteolysis. In addition, this work raises several important issues in mucin cloning such as assembling sequences from seemingly overlapping clones and deducing consensus sequences for nearly identical tandem repeats.
Alibakhshi, Reza; Moradi, Keivan; Biglari, Mostafa; Shafieenia, Samaneh
2018-05-01
Phenylketonuria (PKU) is one of the most common known inherited metabolic diseases. The present study aimed to investigate the status of molecular defects in the phenylalanine hydroxylase (PAH) gene in western Iranian PKU patients (predominantly from Kermanshah, Hamadan, and Lorestan provinces) during 2014-2016. Additionally, the results were compared with similar studies in Iran. Nucleotide sequence analysis of all 13 exons and their flanking intronic regions of the PAH gene was performed in 18 western Iranian PKU patients. Moreover, a variable number of tandem repeat (VNTR) located in the PAH gene was studied. The results revealed a mutational spectrum encompassing 11 distinct mutations distributed along the PAH gene sequence on 34 of the 36 mutant alleles (diagnostic efficiency of 94.4%). Also, four PAH VNTR alleles (with repeats of 3, 7, 8 and 9) were detected. The three most frequent mutations were IVS9+5G>A, IVS7-5T>C, and p.P281L with frequencies of 27.8%, 11%, and 11%, respectively. The results showed that there is not only a consanguineous relation, but also a difference in PAH characters of mutations between Kermanshah and the other two parts of western Iran (Hamadan and Lorestan). Also, it seems that the spectrum of mutations in western Iran is relatively distinct from other parts of the country, suggesting that this region might be a special PAH gene distribution region. Moreover, our findings can be useful in identifying genotype-phenotype relationships in patients, and can support future confirmatory diagnostic testing, prognosis, and prediction of disease severity in PKU patients.
Marini, Emanuela; Palmieri, Claudio; Magi, Gloria; Facinelli, Bruna
2015-07-09
Integrative conjugative elements (ICEs) are mobile genetic elements that reside in the chromosome but retain the ability to undergo excision and to transfer by conjugation. Genes involved in drug resistance, virulence, or niche adaptation are often found among backbone genes as cargo DNA. We recently characterized in Streptococcus suis an ICE (ICESsu32457) carrying resistance genes [tet(O/W/32/O), tet(40), erm(B), aphA, and aadE] in the 15K unstable genetic element, which is flanked by two ∼1.3-kb direct repeats. Remarkably, ∼1.3-kb sequences are conserved in ICESa2603 of Streptococcus agalactiae 2603V/R, which carries heavy metal resistance genes cadC/cadA and mer. In matings between S. suis 32457 (donor) and S. agalactiae 2603V/R (recipient), transconjugants were obtained. PCR experiments, PFGE, and sequence analysis of transconjugants demonstrated a tandem array between ICESsu32457 and ICESa2603. Matings between tandem array-containing S. agalactiae 2603V/R (donor) and Streptococcus pyogenes RF12 (recipient) yielded a single transconjugant containing a hybrid ICE, here named ICESa2603/ICESsu32457. The hybrid formed by recombination of the left ∼1.3-kb sequence of ICESsu32457 and the ∼1.3-kb sequence of ICESa2603. Interestingly, the hybrid ICE was transferable between S. pyogenes strains, thus demonstrating that it behaves as a conventional ICE. These findings suggest that both tandem arrays and hybrid ICEs may contribute to the evolution of antibiotic resistance in streptococci, creating novel mobile elements capable of disseminating new combinations of antibiotic resistance genes.
Schwaiger, F W; Weyers, E; Epplen, C; Brün, J; Ruff, G; Crawford, A; Epplen, J T
1993-09-01
Twenty-one different caprine and 13 ovine MHC-DRB exon 2 sequences were determined including part of the adjacent introns containing simple repetitive (gt)n(ga)m elements. The positions for highly polymorphic DRB amino acids vary slightly among ungulates and other mammals. From man and mouse to ungulates the basic (gt)n(ga)m structure is fixed in evolution for 7 × 10^7 years whereas ample variations exist in the tandem (gt)n and (ga)m dinucleotides and especially their "degenerated" derivatives. Phylogenetic trees for the alpha-helices and beta-pleated sheets of the ungulate DRB sequences suggest different evolutionary histories. In hoofed animals as well as in humans DRB beta-sheet encoding sequences and adjacent intronic repeats can be assembled into virtually identical groups suggesting coevolution of noncoding as well as coding DNA. In contrast alpha-helices and C-terminal parts of the first DRB domain evolve distinctly. In the absence of a defined mechanism causing specific, site-directed mutations, double-recombination or gene-conversion-like events would readily explain this fact. The role of the intronic simple (gt)n(ga)m repeat is discussed with respect to these genetic exchange mechanisms during evolution.
Ribeyre, Cyril; Lopes, Judith; Boulé, Jean-Baptiste; Piazza, Aurèle; Guédin, Aurore; Zakian, Virginia A; Mergny, Jean-Louis; Nicolas, Alain
2009-05-01
In budding yeast, the Pif1 DNA helicase is involved in the maintenance of both nuclear and mitochondrial genomes, but its role in these processes is still poorly understood. Here, we provide evidence for a new Pif1 function by demonstrating that its absence promotes genetic instability of alleles of the G-rich human minisatellite CEB1 inserted in the Saccharomyces cerevisiae genome, but not of other tandem repeats. Inactivation of other DNA helicases, including Sgs1, had no effect on CEB1 stability. In vitro, we show that CEB1 repeats formed stable G-quadruplex (G4) secondary structures and the Pif1 protein unwinds these structures more efficiently than regular B-DNA. Finally, synthetic CEB1 arrays in which we mutated the potential G4-forming sequences were no longer destabilized in pif1Δ cells. Hence, we conclude that CEB1 instability in pif1Δ cells depends on the potential to form G-quadruplex structures, suggesting that Pif1 could play a role in the metabolism of G4-forming sequences.
Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism.
Gur-Arie, R; Cohen, C J; Eitan, Y; Shelef, L; Hallerman, E M; Kashi, Y
2000-01-01
Computer-based genome-wide screening of the DNA sequence of Escherichia coli strain K12 revealed tens of thousands of tandem simple sequence repeat (SSR) tracts, with motifs ranging from 1 to 6 nucleotides. SSRs were well distributed throughout the genome. Mononucleotide SSRs were over-represented in noncoding regions and under-represented in open reading frames (ORFs). Nucleotide composition of mono- and dinucleotide SSRs, both in ORFs and in noncoding regions, differed from that of the genomic region in which they occurred, with 93% of all mononucleotide SSRs proving to be of A or T. Computer-based analysis of the fine position of every SSR locus in the noncoding portion of the genome relative to downstream ORFs showed SSRs located in areas that could affect gene regulation. DNA sequences at 14 arbitrarily chosen SSR tracts were compared among E. coli strains. Polymorphisms of SSR copy number were observed at four of seven mononucleotide SSR tracts screened, with all polymorphisms occurring in noncoding regions. SSR polymorphism could prove important as a genome-wide source of variation, both for practical applications (including rapid detection, strain identification, and detection of loci affecting key phenotypes) and for evolutionary adaptation of microbes.
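A genome-wide SSR screen of the kind described above can be approximated with a back-referencing regular expression. The following is a minimal sketch, not the authors' program: the function name and the `min_copies` threshold are assumptions chosen for illustration, since the abstract gives only the motif range (1 to 6 nucleotides) and not the tract-length criteria.

```python
import re


def find_ssrs(sequence, max_motif=6, min_copies=3):
    """Return (start, motif, copies) for each perfect tandem SSR tract.

    The motif group is lazy, so the shortest repeating unit is reported
    (e.g. "A" x 6 rather than "AA" x 3). Thresholds are illustrative.
    """
    sequence = sequence.upper()
    # A motif of 1..max_motif bases followed by at least (min_copies - 1) exact copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1))
    tracts = []
    for match in pattern.finditer(sequence):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        tracts.append((match.start(), motif, copies))
    return tracts
```

Applied to the toy sequence `"GGC" + "AT" * 4 + "CC" + "A" * 7 + "GT"`, the function reports one dinucleotide tract (AT, four copies, starting at position 3) and one mononucleotide tract (A, seven copies, starting at position 13). Only perfect, non-overlapping repeats are found; interrupted tracts would need a more tolerant method.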
Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome.
Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O; Alawad, Abdullah O; Al-Sadi, Abdullah M; Hu, Songnian; Yu, Jun
2016-01-01
Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower as compared to that of the nuclear genome and neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provides the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants.
Pedersen, Niels; Liu, Hongwei; Millon, Lee; Greer, Kimberly
2011-01-01
A significantly increased risk for a number of autoimmune and infectious diseases in purebred and mixed-breed dogs has been associated with certain alleles or allele combinations of the dog leukocyte antigen (DLA) class II complex containing the DRB1, DQA1, and DQB1 genes. The exact level of risk depends on the specific disease, the alleles in question, and whether alleles exist in a homozygous or heterozygous state. The gold standard for identifying high-risk alleles and their zygosity has involved direct sequencing of the exon 2 regions of each of the 3 genes. However, sequencing and identification of specific alleles at each of the 3 loci are relatively expensive and sequencing techniques are not ideal for additional parentage or identity determination. However, it is often possible to get the same information from sequencing only 1 gene given the small number of possible alleles at each locus in purebred dogs, extensive homozygosity, and tendency for disease-causing alleles at each of the 3 loci to be strongly linked to each other into haplotypes. Therefore, genetic testing in purebred dogs with immune diseases can be often simplified by sequencing alleles at 1 rather than 3 loci. Further simplification of genetic tests for canine immune diseases can be achieved by the use of alternative genetic markers in the DLA class II region that are also strongly linked with the disease genotype. These markers consist of either simple tandem repeats or single nucleotide polymorphisms that are also in strong linkage with specific DLA class II genotypes and/or haplotypes. The current study uses necrotizing meningoencephalitis of Pug dogs as a paradigm to assess simple alternative genetic tests for disease risk. 
It was possible to attain identical necrotizing meningoencephalitis risk assessments to 3-locus DLA class II sequencing by sequencing only the DQB1 gene, using 3 DLA class II-linked simple tandem repeat markers, or with a small single nucleotide polymorphism array designed to identify breed-specific DQB1 alleles.
Han, Limin; Chen, Chen; Wang, Zhezhi
2018-01-01
Epipremnum aureum is an important foliage plant in the Araceae family. In this study, we have sequenced the complete chloroplast genome of E. aureum by using Illumina Hiseq sequencing platforms. This genome is a double-stranded circular DNA sequence of 164,831 bp that contains 35.8% GC. The two inverted repeats (IRa and IRb; 26,606 bp) are spaced by a small single-copy region (22,868 bp) and a large single-copy region (88,751 bp). The chloroplast genome has 131 (113 unique) functional genes, including 86 (79 unique) protein-coding genes, 37 (30 unique) tRNA genes, and eight (four unique) rRNA genes. Tandem repeats comprise the majority of the 43 long repetitive sequences. In addition, 111 simple sequence repeats are present, with mononucleotides being the most common type and di- and tetranucleotides being infrequent events. Positive selection pressure on rps12 in the E. aureum chloroplast has been demonstrated via synonymous and nonsynonymous substitution rates and selection pressure sites analyses. Ycf15 and infA are pseudogenes in this species. We constructed a Maximum Likelihood phylogenetic tree based on the complete chloroplast genomes of 38 species from 13 families. Those results strongly indicated that E. aureum is positioned as the sister of Colocasia esculenta within the Araceae family. This work may provide information for further study of the molecular phylogenetic relationships within Araceae, as well as molecular markers and breeding novel varieties by chloroplast genetic-transformation of E. aureum in particular. PMID:29529038
Linehan, Erin K.; Schrader, Carol E.; Stavnezer, Janet
2015-01-01
Activation-induced cytidine deaminase (AID) is required for initiation of Ig class switch recombination (CSR) and somatic hypermutation (SHM) of antibody genes during immune responses. AID has also been shown to induce chromosomal translocations, mutations, and DNA double-strand breaks (DSBs) involving non-Ig genes in activated B cells. To determine what makes a DNA site a target for AID-induced DSBs, we identify off-target DSBs induced by AID by performing chromatin immunoprecipitation (ChIP) for Nbs1, a protein that binds DSBs, followed by deep sequencing (ChIP-Seq). We detect and characterize hundreds of off-target AID-dependent DSBs. Two types of tandem repeats are highly enriched within the Nbs1-binding sites: long CA repeats, which can form Z-DNA, and tandem pentamers containing the AID target hotspot WGCW. These tandem repeats are not nearly as enriched at AID-independent DSBs, which we also identified. Msh2, a component of the mismatch repair pathway and important for genome stability, increases off-target DSBs, similar to its effect on Ig switch region DSBs, which are required intermediates during CSR. Most of the off-target DSBs are two-ended, consistent with generation during G1 phase, similar to DSBs in Ig switch regions. However, a minority are one-ended, presumably due to conversion of single-strand breaks to DSBs during replication. One-ended DSBs are repaired by processes involving homologous recombination, including break-induced replication repair, which can lead to genome instability. Off-target DSBs, especially those present during S phase, can lead to chromosomal translocations, deletions and gene amplifications, resulting in the high frequency of B cell lymphomas derived from cells that express or have expressed AID. PMID:26263206
Kalmykova, Alla I.; Shevelyov, Yury Y.; Dobritsa, Anna A.; Gvozdev, Vladimir A.
1997-01-01
The acquisition of autosomal fertility genes has been proposed to be an important process in human Y chromosome evolution. For example, the Y-linked fertility factor DAZ (Deleted in Azoospermia) appears to have arisen after the transposition and tandem amplification of the autosomal DAZH gene. The Drosophila melanogaster Y chromosome contains tandemly repeated Su(Ste) units that are thought to affect male fertility as suppressors of the homologous X-linked Stellate repeats. Here we report the detection of a testis-expressed autosomal gene, SSL [Su(Ste)-like], that appears to be an ancestor of the Y-linked Su(Ste) units. SSL encodes a casein kinase 2 (CK2) β-subunit-like protein. Its putative ORF shares extensive (45%) homology with the genuine β-subunit of CK2 and retains the conserved C-terminal and Glu/Asp-rich domains that are essential for CK2 holoenzyme regulation. SSL maps within region 60D1–2 of D. melanogaster and D. simulans polytene chromosomes. We present evidence that SSL was derived from the genuine βCK2 gene by reverse transcription. This event resulted in the loss of the first three introns in the coding region of the SSL ancestor gene. Evolutionary analysis indicates that SSL has evolved under selective pressure at the translational level. Its sequence, especially in the 3′ region, is much closer to the Y-linked Su(Ste) tandem repeats than to the βCK2 gene. These results suggest that the acquisition of testis-specific autosomal genes may be important for the evolution of Drosophila as well as human Y chromosomes. PMID:9177211
Tandem repeats analysis for the high resolution phylogenetic analysis of Yersinia pestis
Pourcel, C; André-Mazeaud, F; Neubauer, H; Ramisse, F; Vergnaud, G
2004-01-01
Background: Yersinia pestis, the agent of plague, is a young and highly monomorphic species. Three biovars, each one thought to be associated with the last three Y. pestis pandemics, have been defined based on biochemical assays. More recently, DNA based assays, including DNA sequencing, IS typing, DNA arrays, have significantly improved current knowledge on the origin and phylogenetic evolution of Y. pestis. However, these methods suffer either from a lack of resolution or from the difficulty of comparing data. Variable number of tandem repeats (VNTRs) provide valuable polymorphic markers for genotyping and performing phylogenetic analyses in a growing number of pathogens and have given promising results for Y. pestis as well. Results: In this study we have genotyped 180 Y. pestis isolates by multiple locus VNTR analysis (MLVA) using 25 markers. Sixty-one different genotypes were observed. The three biovars were distributed into three main branches, with some exceptions. In particular, the Medievalis phenotype is clearly heterogeneous, resulting from different mutation events in the napA gene. Antiqua strains from Asia appear to hold a central position compared to Antiqua strains from Africa. A subset of 7 markers is proposed for the quick comparison of a new strain with the collection typed here. This can be easily achieved using a Web-based facility, specifically set up for running such identifications. Conclusion: Tandem-repeat typing may prove to be a powerful complement to the existing phylogenetic tools for Y. pestis. Typing can be achieved quickly at a low cost in terms of consumables, technical expertise and equipment. The resulting data can be easily compared between different laboratories. The number and selection of markers will eventually depend upon the type and aim of investigations. PMID:15186506
Tandem-repeat protein domains across the tree of life
Jernigan, Kristin K.
2015-01-01
Tandem-repeat protein domains, composed of repeated units of conserved stretches of 20–40 amino acids, are required for a wide array of biological functions. Despite their diverse and fundamental functions, there has been no comprehensive assessment of their taxonomic distribution, incidence, and associations with organismal lifestyle and phylogeny. In this study, we assess for the first time the abundance of armadillo (ARM) and tetratricopeptide (TPR) repeat domains across all three domains in the tree of life and compare the results to our previous analysis on ankyrin (ANK) repeat domains in this journal. All eukaryotes and a majority of the bacterial and archaeal genomes analyzed have a minimum of one TPR and ARM repeat. In eukaryotes, the fraction of ARM-containing proteins is approximately double that of TPR and ANK-containing proteins, whereas bacteria and archaea are enriched in TPR-containing proteins relative to ARM- and ANK-containing proteins. We show in bacteria that phylogenetic history, rather than lifestyle or pathogenicity, is a predictor of TPR repeat domain abundance, while neither phylogenetic history nor lifestyle predicts ARM repeat domain abundance. Surprisingly, pathogenic bacteria were not enriched in TPR-containing proteins, which have been associated with virulence factors in certain species. Taken together, this comparative analysis provides a newly appreciated view of the prevalence and diversity of multiple types of tandem-repeat protein domains across the tree of life. A central finding of this analysis is that tandem repeat domain-containing proteins are prevalent not just in eukaryotes, but also in bacterial and archaeal species. PMID:25653910
Complexity: an internet resource for analysis of DNA sequence complexity
Orlov, Y. L.; Potapov, V. N.
2004-01-01
The search for DNA regions with low complexity is one of the pivotal tasks of modern structural analysis of complete genomes. The low complexity may be preconditioned by strong inequality in nucleotide content (biased composition), by tandem or dispersed repeats or by palindrome-hairpin structures, as well as by a combination of all these factors. Several numerical measures of textual complexity, including combinatorial and linguistic ones, together with complexity estimation using a modified Lempel–Ziv algorithm, have been implemented in a software tool called ‘Complexity’ (http://wwwmgs.bionet.nsc.ru/mgs/programs/low_complexity/). The software enables a user to search for low-complexity regions in long sequences, e.g. complete bacterial genomes or eukaryotic chromosomes. In addition, it estimates the complexity of groups of aligned sequences. PMID:15215465
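As a rough illustration of complexity estimation by Lempel–Ziv parsing, the sketch below counts the phrases in a classic Lempel–Ziv (1976) style decomposition and flags windows with few phrases. It is not the modified algorithm implemented in the 'Complexity' tool; the function names, window size, step and threshold are all assumptions chosen for the example.

```python
from typing import List, Tuple


def lz76_phrase_count(seq: str) -> int:
    """Number of phrases in a Lempel-Ziv (1976) style parsing of seq.

    Each phrase is the shortest substring starting at the current position
    that has not occurred earlier (copies may overlap the current position).
    Fewer phrases means a more repetitive, lower-complexity sequence.
    """
    i, phrases, n = 0, 0, len(seq)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text.
        while i + length <= n and seq[i:i + length] in seq[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def low_complexity_windows(seq: str, window: int = 20, step: int = 10,
                           max_phrases: int = 6) -> List[Tuple[int, int]]:
    """Return (start, phrase_count) for windows at or below the threshold."""
    if len(seq) < window:
        return []
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        count = lz76_phrase_count(seq[start:start + window])
        if count <= max_phrases:
            hits.append((start, count))
    return hits


hits = low_complexity_windows("A" * 30, window=20, step=10)
```

The raw phrase count depends on window length, so comparing windows of different sizes would need some normalization, which is left out here.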
Evolutionary origins of a novel host plant detoxification gene in butterflies.
Fischer, Hanna M; Wheat, Christopher W; Heckel, David G; Vogel, Heiko
2008-05-01
Chemical interactions between plants and their insect herbivores provide an excellent opportunity to study the evolution of species interactions on a molecular level. Here, we investigate the molecular evolutionary events that gave rise to a novel detoxifying enzyme (nitrile-specifier protein [NSP]) in the butterfly family Pieridae, previously identified as a coevolutionary key innovation. By generating and sequencing expressed sequence tags, genomic libraries, and screening databases we found NSP to be a member of an insect-specific gene family, which we characterized and named the NSP-like gene family. Members consist of variable tandem repeats, are gut expressed, and are found across Insecta evolving in a dynamic, ongoing birth-death process. In the Lepidoptera, multiple copies of single-domain major allergen genes are present and originate via tandem duplications. Multiple domain genes are found solely within the brassicaceous-feeding Pieridae butterflies, one of them being NSP and another called major allergen (MA). Analyses suggest that NSP and its paralog MA have a unique single-domain evolutionary origin, being formed by intragenic domain duplication followed by tandem whole-gene duplication. Duplicates subsequently experienced a period of relaxed constraint followed by an increase in constraint, perhaps after neofunctionalization. NSP and its ortholog MA are still experiencing high rates of change, reflecting a dynamic evolution consistent with the known role of NSP in plant-insect interactions. Our results provide direct evidence to the hypothesis that gene duplication is one of the driving forces for speciation and adaptation, showing that both within- and whole-gene tandem duplications are a powerful force underlying evolutionary adaptation.
Joy, Nisha; Asha, Srinivasan; Mallika, Vijayan; Soniya, Eppurathu Vasudevan
2013-01-01
Next generation sequencing has an advantage on transformational development of species with limited available sequence data as it helps to decode the genome and transcriptome. We carried out the de novo sequencing using Illumina HiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L.), an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to lack of sufficient sequence information and cytological complexity of its genome. The 55 million raw reads obtained, when assembled using the Trinity program, generated 223,386 contigs and 128,157 unigenes. Reports suggest that the repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs) are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hair-pin precursors of miRNAs bearing Simple Sequence Repeats (SSRs). We used the array of transcripts generated, for the in silico prediction and detection of '43 pre-miRNA candidates bearing different types of SSR motifs'. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of the transcriptome. About 0.033% of the transcriptome constituted 'pre-miRNA candidates bearing SSRs'. The abundance, type and distribution of SSR motifs studied across the hair-pin miRNA precursors, showed a significant bias in the position of SSRs towards the downstream of predicted 'pre-miRNA candidates'. The catalogue of transcripts identified, together with the demonstration of reliable existence of SSRs in the miRNA precursors, permits future opportunities for understanding the genetic mechanism of black pepper and likely functions of 'tandem repeats' in miRNAs.
Molecular Structure and Transformation of the Glucose Dehydrogenase Gene in Drosophila Melanogaster
Whetten, R.; Organ, E.; Krasney, P.; Cox-Foster, D.; Cavener, D.
1988-01-01
We have precisely mapped and sequenced the three 5' exons of the Drosophila melanogaster Gld gene and have identified the start sites for transcription and translation. The first exon is composed of 335 nucleotides and does not contain any putative translation start codons. The second exon is separated from the first exon by 8 kb and contains the Gld translation start codon. The inferred amino acid sequence of the amino terminus contains two unusual features: three tandem repeats of serine-alanine, and a relatively high density of cysteine residues. P element-mediated transformation experiments demonstrated that a 17.5-kb genomic fragment contains the functional and regulatory components of the Gld gene. PMID:3143620
The complete sequence of the mitochondrial genome of Arctic fox (Alopex lagopus).
Yan, Shou-Qing; Guo, Peng-Cheng; Yue, Yuan; Li, Wan-Hong; Bai, Chun-Yan; Li, Yu-Mei; Sun, Jin-Hai; Zhao, Zhi-Hui
2016-11-01
In the present study, the complete mitochondrial genome sequence of Arctic fox (Alopex lagopus) was determined for the first time. It has a total length of 16,656 bp, and contains 13 protein-coding genes, 22 tRNA genes, 2 ribosomal RNA genes and 1 control region. The nucleotide composition is 31.3% for A, 26.2% for C, 14.8% for G and 27.7% for T, respectively. The D-loop region located between tRNA Pro and tRNA Phe contains a (ACACGTACACGCAT)18 tandem repeat array. The data will be useful for the investigation of the genetic structure and diversity in the natural and farmed population of Arctic foxes.
Isolation, propagation, genome analysis and epidemiology of HKU1 betacoronaviruses
Shrivastava, Susmita; Berglund, Andrew; Qian, Zhaohui; Góes, Luiz Gustavo Bentim; Halpin, Rebecca A.; Fedorova, Nadia; Ransier, Amy; Weston, Philip A.; Durigon, Edison Luiz; Jerez, José Antonio; Robinson, Christine C.; Town, Christopher D.; Holmes, Kathryn V.
2014-01-01
From 1 January 2009 to 31 May 2013, 15 287 respiratory specimens submitted to the Clinical Virology Laboratory at the Children’s Hospital Colorado were tested for human coronavirus RNA by reverse transcription-PCR. Human coronaviruses HKU1, OC43, 229E and NL63 co-circulated during each of the respiratory seasons but with significant year-to-year variability, and cumulatively accounted for 7.4–15.6 % of all samples tested during the months of peak activity. A total of 79 (0.5 % prevalence) specimens were positive for human betacoronavirus HKU1 RNA. Genotypes HKU1 A and B were both isolated from clinical specimens and propagated on primary human tracheal–bronchial epithelial cells cultured at the air–liquid interface and were neutralized in vitro by human intravenous immunoglobulin and by polyclonal rabbit antibodies to the spike glycoprotein of HKU1. Phylogenetic analysis of the deduced amino acid sequences of seven full-length genomes of Colorado HKU1 viruses and the spike glycoproteins from four additional HKU1 viruses from Colorado and three from Brazil demonstrated remarkable conservation of these sequences with genotypes circulating in Hong Kong and France. Within genotype A, all but one of the Colorado HKU1 sequences formed a unique subclade defined by three amino acid substitutions (W197F, F613Y and S752F) in the spike glycoprotein and exhibited a unique signature in the acidic tandem repeat in the N-terminal region of the nsp3 subdomain. Elucidating the function of and mechanisms responsible for the formation of these varying tandem repeats will increase our understanding of the replication process and pathogenicity of HKU1 and potentially of other coronaviruses. PMID:24394697
Arent, Z; Frizzell, C; Gilmore, C; Allen, A; Ellis, W A
2016-07-15
Strains of Leptospira interrogans belonging to two very closely related serovars - Bratislava and Muenchen - have been associated with disease in domestic animals, in particular pigs, but also in horses and dogs. Similar strains have also been recovered from various wildlife species. Their epidemiology is poorly understood. Two hundred and forty seven such isolates, from UK domestic animal and wildlife species, were examined by restriction endonuclease analysis in an attempt to elucidate their epidemiology. A representative sub-sample of 65 of these isolates was further examined by multiple-locus variable-number tandem repeat analysis and 22 by secY sequencing. Ten restriction pattern types were identified. The majority of isolates fell into one of three restriction endonuclease analysis pattern types designated B2a, B2b and M2a. B2a was ubiquitous and was isolated from 10 species and represented the majority of the horse and all dog isolates. B2b was very different, being isolated only from pigs, indicating that this type was maintained by pigs. The pattern M2a was reported for the majority of isolates from pigs but also was common in small rodent isolates. Five restriction pattern types were found only in wildlife suggesting that they are unlikely to pose a disease threat to domestic animals. Multiple-locus variable-number tandem repeat analysis identified six clusters. The REA types B2a and B2b were all found in one MLVA cluster while the majority of the M2a strains examined occurred in another cluster. The secY sequencing detected only one sequence type, clustered with other serovars of Leptospira interrogans. Copyright © 2016 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oetting, W.S.; Lee, H.K.; Flanders, D.J.
The use of short tandem repeat polymorphisms (STRPs) as marker loci for linkage analysis is becoming increasingly important due to their large numbers in the human genome and their high degree of polymorphism. Fluorescence-based detection of the STRP pattern with an automated DNA sequencer has improved the efficiency of this technique by eliminating the need for radioactivity and producing a digitized autoradiogram-like image that can be used for computer analysis. In an effort to simplify the procedure and to reduce the cost of fluorescence STRP analysis, we have developed a technique known as multiplexing STRPs with tailed primers (MSTP) using primers that have a 19-bp extension, identical to the sequence of an M13 sequencing primer, on the 5′ end of the forward primer in conjunction with multiplexing several primer pairs in a single polymerase chain reaction (PCR) amplification. The banding pattern is detected with the addition of the M13 primer-dye conjugate as the sole primer conjugated to the fluorescent dye, eliminating the need for direct conjugation of the infrared fluorescent dye to the STRP primers. The use of MSTP for linkage analysis greatly reduces the number of PCR reactions. Up to five primer pairs can be multiplexed together in the same reaction. At present, a set of 148 STRP markers spaced at an average genetic distance of 28 cM throughout the autosomal genome can be analyzed in 37 sets of multiplexed amplification reactions. We have automated the analysis of these patterns for linkage using software that both detects the STRP banding pattern and determines their sizes. This information can then be exported in a user-defined format from a database manager for linkage analysis. 15 refs., 2 figs., 4 tabs.
Epidemiological and Genomic Landscape of Azole Resistance Mechanisms in Aspergillus Fungi
Hagiwara, Daisuke; Watanabe, Akira; Kamei, Katsuhiko; Goldman, Gustavo H.
2016-01-01
Invasive aspergillosis is a life-threatening mycosis caused by the pathogenic fungus Aspergillus. The predominant causal species is Aspergillus fumigatus, and azole drugs are the treatment of choice. Azole drugs approved for clinical use include itraconazole, voriconazole, posaconazole, and the recently added isavuconazole. However, epidemiological research has indicated that the prevalence of azole-resistant A. fumigatus isolates has increased significantly over the last decade. What is worse is that azole-resistant strains are likely to have emerged not only in response to long-term drug treatment but also because of exposure to azole fungicides in the environment. Resistance mechanisms include amino acid substitutions in the target Cyp51A protein, tandem repeat sequence insertions at the cyp51A promoter, and overexpression of the ABC transporter Cdr1B. Environmental azole-resistant strains harboring the association of a tandem repeat sequence and punctual mutation of the Cyp51A gene (TR34/L98H and TR46/Y121F/T289A) have become widely disseminated across the world within a short time period. The epidemiological data also suggests that the number of Aspergillus spp. other than A. fumigatus isolated has risen. Some non-fumigatus species intrinsically show low susceptibility to azole drugs, imposing the need for accurate identification, and drug susceptibility testing in most clinical cases. Currently, our knowledge of azole resistance mechanisms in non-fumigatus Aspergillus species such as A. flavus, A. niger, A. tubingensis, A. terreus, A. fischeri, A. lentulus, A. udagawae, and A. calidoustus is limited. In this review, we present recent advances in our understanding of azole resistance mechanisms particularly in A. fumigatus. We then provide an overview of the genome sequences of non-fumigatus species, focusing on the proteins related to azole resistance mechanisms. PMID:27708619
Filipino DNA variation at 12 X-chromosome short tandem repeat markers.
Salvador, Jazelyn M; Apaga, Dame Loveliness T; Delfin, Frederick C; Calacal, Gayvelline C; Dennis, Sheila Estacio; De Ungria, Maria Corazon A
2018-06-08
Demands for solving complex kinship scenarios where only distant relatives are available for testing have risen in the past years. In these instances, other genetic markers such as X-chromosome short tandem repeat (X-STR) markers are employed to supplement autosomal and Y-chromosomal STR DNA typing. However, prior to use, the degree of STR polymorphism in the population requires evaluation through generation of an allele or haplotype frequency population database. This population database is also used for statistical evaluation of DNA typing results. Here, we report X-STR data from 143 unrelated Filipino male individuals who were genotyped via conventional polymerase chain reaction-capillary electrophoresis (PCR-CE) using the 12 X-STR loci included in the Investigator® Argus X-12 kit (Qiagen) and via massively parallel sequencing (MPS) of seven X-STR loci included in the ForenSeq™ DNA Signature Prep kit of the MiSeq® FGx™ Forensic Genomics System (Illumina). Allele calls between PCR-CE and MPS systems were consistent (100% concordance) across seven overlapping X-STRs. Allele and haplotype frequencies and other parameters of forensic interest were calculated based on length (PCR-CE, 12 X-STRs) and sequence (MPS, seven X-STRs) variations observed in the population. Results of our study indicate that the 12 X-STRs in the PCR-CE system are highly informative for the Filipino population. MPS of seven X-STR loci identified 73 X-STR alleles compared with 55 X-STR alleles that were identified solely by length via PCR-CE. Of the 73 sequence-based alleles observed, six alleles have not been reported in the literature. The population data presented here may serve as a reference Philippine frequency database of X-STRs for forensic casework applications. Copyright © 2018 Elsevier B.V. All rights reserved.
Tandem Repeated Irritation Test (TRIT) Studies and Clinical Relevance: Post 2006.
Reddy, Rasika; Maibach, Howard
2018-06-11
Single or multiple applications of irritants can lead to occupational contact dermatitis, and most commonly irritant contact dermatitis (ICD). Tandem irritation, the sequential application of two irritants to a target skin area, has been studied using the Tandem Repeated Irritation Test (TRIT) to provide a more accurate representation of skin irritation. Here we present an update to Kartono's review on tandem irritation studies since 2006 [1]. We surveyed the literature available on PubMed, Embase, Google Scholar, and the UCSF Dermatology library databases since 2006. The studies included discuss the tandem effects of common chemical irritants, organic solvents, occlusion as well as clinical relevance - and enlarge our ability to discern whether multiple chemical exposures are more or less likely to enhance irritation.
The role of DNA repair in herpesvirus pathogenesis.
Brown, Jay C
2014-10-01
In cells latently infected with a herpesvirus, the viral DNA is present in the cell nucleus, but it is not extensively replicated or transcribed. In this suppressed state the virus DNA is vulnerable to mutagenic events that affect the host cell and have the potential to destroy the virus' genetic integrity. Despite the potential for genetic damage, however, herpesvirus sequences are well conserved after reactivation from latency. To account for this apparent paradox, I have tested the idea that host cell-encoded mechanisms of DNA repair are able to control genetic damage to latent herpesviruses. Studies were focused on homologous recombination-dependent DNA repair (HR). Methods of DNA sequence analysis were employed to scan herpesvirus genomes for DNA features able to activate HR. Analyses were carried out with a total of 39 herpesvirus DNA sequences, a group that included viruses from the alpha-, beta- and gamma-subfamilies. The results showed that all 39 genome sequences were enriched in two or more of the eight recombination-initiating features examined. The results were interpreted to indicate that HR can stabilize latent herpesvirus genomes. The results also showed, unexpectedly, that repair-initiating DNA features differed in alpha- compared to gamma-herpesviruses. Whereas inverted and tandem repeats predominated in alpha-herpesviruses, gamma-herpesviruses were enriched in short, GC-rich initiation sequences such as CCCAG and depleted in repeats. In alpha-herpesviruses, repair-initiating repeat sequences were found to be concentrated in a specific region (the S segment) of the genome while repair-initiating short sequences were distributed more uniformly in gamma-herpesviruses. The results suggest that repair pathways are activated differently in alpha- compared to gamma-herpesviruses. Copyright © 2014. Published by Elsevier Inc.
A new family of dispersed repeats from Brassica nigra: characterization and localization.
Kapila, R; Negi, M S; This, P; Delseny, M; Srivastava, P S; Lakshmikumaran, M
1996-11-01
The 459-bp HindIII (pBN-4) and the 1732-bp EcoRI (pBNE8) fragments from the Brassica nigra genome were cloned and shown to be members of a dispersed repeat family. Of the three major diploid Brassica species, the repeat pBN-4 was found to be highly specific for the B. nigra genome. The family also hybridized to Sinapis arvensis showing that B. nigra had a closer relationship with the S. arvensis genome than with B. oleracea or B. campestris. The clone pBNE8 showed homology to a number of tRNA species indicating that this family of repeats may have originated from a tRNA sequence. The species-specific 459-bp repeat pBN-4 was localized on the B. nigra chromosomes using monosomic addition lines. In addition to the localization of pBN-4, the chromosomal distribution of two other species-specific repeats, pBN34 and pBNBH35 (reported earlier), was studied. The dispersed repeats pBN-4 and pBNBH35 were found to be present on all of the chromosomes, whereas the tandem repeat pBN34 was localized on two chromosomes.
Begum, Rabeya; Zakrzewski, Falk; Menzel, Gerhard; Weber, Beatrice; Alam, Sheikh Shamimul; Schmidt, Thomas
2013-01-01
Background and Aims The cultivated jute species Corchorus olitorius and Corchorus capsularis are important fibre crops. The analysis of repetitive DNA sequences, comprising a major part of plant genomes, has not been carried out in jute but is useful to investigate the long-range organization of chromosomes. The aim of this study was the identification of repetitive DNA sequences to facilitate comparative molecular and cytogenetic studies of two jute cultivars and to develop a fluorescent in situ hybridization (FISH) karyotype for chromosome identification. Methods A plasmid library was generated from C. olitorius and C. capsularis with genomic restriction fragments of 100–500 bp, which was complemented by targeted cloning of satellite DNA by PCR. The diversity of the repetitive DNA families was analysed comparatively. The genomic abundance and chromosomal localization of different repeat classes were investigated by Southern analysis and FISH, respectively. The cytosine methylation of satellite arrays was studied by immunolabelling. Key Results Major satellite repeats and retrotransposons have been identified from C. olitorius and C. capsularis. The satellite family CoSat I forms two undermethylated species-specific subfamilies, while the long terminal repeat (LTR) retrotransposons CoRetro I and CoRetro II show similarity to the Metaviridae of plant retroelements. FISH karyotypes were developed by multicolour FISH using these repetitive DNA sequences in combination with 5S and 18S–5·8S–25S rRNA genes which enable the unequivocal chromosome discrimination in both jute species. Conclusions The analysis of the structure and diversity of the repeated DNA is crucial for genome sequence annotation. The reference karyotypes will be useful for breeding of jute and provide the basis for karyotyping homeologous chromosomes of wild jute species to reveal the genetic and evolutionary relationship between cultivated and wild Corchorus species. PMID:23666888
de Knegt, Leonardo V; Pires, Sara M; Löfström, Charlotta; Sørensen, Gitte; Pedersen, Karl; Torpdahl, Mia; Nielsen, Eva M; Hald, Tine
2016-03-01
Salmonella is an important cause of bacterial foodborne infections in Denmark. To identify the main animal-food sources of human salmonellosis, risk managers have relied on a routine application of a microbial subtyping-based source attribution model since 1995. In 2013, multiple locus variable number tandem repeat analysis (MLVA) substituted phage typing as the subtyping method for surveillance of S. Enteritidis and S. Typhimurium isolated from animals, food, and humans in Denmark. The purpose of this study was to develop a modeling approach applying a combination of serovars, MLVA types, and antibiotic resistance profiles for the Salmonella source attribution, and assess the utility of the results for the food safety decisionmakers. Full and simplified MLVA schemes from surveillance data were tested, and model fit and consistency of results were assessed using statistical measures. We conclude that loci schemes STTR5/STTR10/STTR3 for S. Typhimurium and SE9/SE5/SE2/SE1/SE3 for S. Enteritidis can be used in microbial subtyping-based source attribution models. Based on the results, we discuss that an adjustment of the discriminatory level of the subtyping method applied often will be required to fit the purpose of the study and the available data. The issues discussed are also considered highly relevant when applying, e.g., extended multi-locus sequence typing or next-generation sequencing techniques. © 2015 Society for Risk Analysis.
Chow, Chi-Nga; Zheng, Han-Qin; Wu, Nai-Yun; Chien, Chia-Hung; Huang, Hsien-Da; Lee, Tzong-Yi; Chiang-Hsieh, Yi-Fan; Hou, Ping-Fu; Yang, Tien-Yi; Chang, Wen-Chi
2016-01-04
Transcription factors (TFs) are sequence-specific DNA-binding proteins acting as critical regulators of gene expression. The Plant Promoter Analysis Navigator (PlantPAN; http://PlantPAN2.itps.ncku.edu.tw) provides an informative resource for detecting transcription factor binding sites (TFBSs), corresponding TFs, and other important regulatory elements (CpG islands and tandem repeats) in a promoter or a set of plant promoters. Additionally, TFBSs, CpG islands, and tandem repeats in the conserved regions between similar gene promoters are also identified. The current PlantPAN release (version 2.0) contains 16 960 TFs and 1143 TF binding site matrices among 76 plant species. In addition to updating of the annotation information, adding experimentally verified TF matrices, and making improvements in the visualization of transcriptional regulatory networks, several new features and functions are incorporated. These features include: (i) comprehensive curation of TF information (response conditions, target genes, and sequence logos of binding motifs, etc.), (ii) co-expression profiles of TFs and their target genes under various conditions, (iii) protein-protein interactions among TFs and their co-factors, (iv) TF-target networks, and (v) downstream promoter elements. Furthermore, a dynamic transcriptional regulatory network under various conditions is provided in PlantPAN 2.0. The PlantPAN 2.0 is a systematic platform for plant promoter analysis and reconstructing transcriptional regulatory networks. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Effect of Repeat Copy Number on Variable-Number Tandem Repeat Mutations in Escherichia coli O157:H7
Vogler, Amy J.; Keys, Christine; Nemoto, Yoshimi; Colman, Rebecca E.; Jay, Zack; Keim, Paul
2006-01-01
Variable-number tandem repeat (VNTR) loci have shown a remarkable ability to discriminate among isolates of the recently emerged clonal pathogen Escherichia coli O157:H7, making them a very useful molecular epidemiological tool. However, little is known about the rates at which these sequences mutate, the factors that affect mutation rates, or the mechanisms by which mutations occur at these loci. Here, we measure mutation rates for 28 VNTR loci and investigate the effects of repeat copy number and mismatch repair on mutation rate using in vitro-generated populations for 10 E. coli O157:H7 strains. We find single-locus rates as high as 7.0 × 10⁻⁴ mutations/generation and a combined 28-locus rate of 6.4 × 10⁻⁴ mutations/generation. We observed single- and multirepeat mutations that were consistent with a slipped-strand mispairing mutation model, as well as a smaller number of large repeat copy number mutations that were consistent with recombination-mediated events. Repeat copy number within an array was strongly correlated with mutation rate both at the most mutable locus, O157-10 (r² = 0.565, P = 0.0196), and across all mutating loci. The combined locus model was significant whether locus O157-10 was included (r² = 0.833, P < 0.0001) or excluded (r² = 0.452, P < 0.0001) from the analysis. Deficient mismatch repair did not affect mutation rate at any of the 28 VNTRs with repeat unit sizes of >5 bp, although a poly(G) homomeric tract was destabilized in the mutS strain. Finally, we describe a general model for VNTR mutations that encompasses insertions and deletions, single- and multiple-repeat mutations, and their relative frequencies based upon our empirical mutation rate data. PMID:16740932
Cloning and expression of chitinases of Entamoebae.
de la Vega, H; Specht, C A; Semino, C E; Robbins, P W; Eichinger, D; Caplivski, D; Ghosh, S; Samuelson, J
1997-04-01
Entamoeba histolytica (Eh) and Entamoeba dispar (Ed) are protozoan parasites that infect hundreds of millions of persons. In the colonic lumen, amebae form chitin-walled cysts, the infectious stage of the parasite. Entamoeba invadens (Ei), which infects reptiles and is a model for amebic encystation, produces chitin synthase and chitinase during encystation. Ei cyst formation is blocked by the chitinase-inhibitor allosamidin. Here molecular cloning techniques were used to identify homologous genes of Eh, Ed, and Ei that encode chitinases (EC 3.2.1.14). The Eh gene (Eh cht1) predicts a 507-amino acid (aa) enzyme, which has 93 and 74% positional identities with Ed and Ei chitinases, respectively. The Entamoeba chitinases have signal sequences, followed by acidic and hydrophilic sequences composed of multiple tandemly arranged 7-aa repeats (Eh and Ed) or repeats varying in length (Ei). The aa compositions of the chitinase repeats are similar to those of the repeats of the Eh and Ed Ser-rich proteins. The COOH-terminus of each chitinase has a catalytic domain, which resembles those of Brugia malayi (33% positional identity) and Manduca sexta (29%). Recombinant entamoeba chitinases are precipitated by chitin and show chitinase activity with chitooligosaccharide substrates. Consistent with previous biochemical data, chitinase mRNAs are absent in Ei trophozoites and accumulate to maximal levels in Ei encysting for 48 h.
Conservation of the Human Integrin-Type Beta-Propeller Domain in Bacteria
Chouhan, Bhanupratap; Denesyuk, Alexander; Heino, Jyrki; Johnson, Mark S.; Denessiouk, Konstantin
2011-01-01
Integrins are heterodimeric cell-surface receptors with key functions in cell-cell and cell-matrix adhesion. Integrin α and β subunits are present throughout the metazoans, but it is unclear whether the subunits predate the origin of multicellular organisms. Several component domains have been detected in bacteria, one of which, a specific 7-bladed β-propeller domain, is a unique feature of the integrin α subunits. Here, we describe a structure-derived motif, which incorporates key features of each blade from the X-ray structures of human αIIbβ3 and αVβ3, includes elements of the FG-GAP/Cage and Ca2+-binding motifs, and is specific only for the metazoan integrin domains. Separately, we searched for the metazoan integrin type β-propeller domains among all available sequences from bacteria and unicellular eukaryotic organisms, requiring that each candidate incorporate seven repeats, corresponding to the seven blades of the β-propeller domain, and that the newly found structure-derived motif be present in every repeat. As a result, among 47 available genomes of unicellular eukaryotes we could not find a single instance of seven repeats with the motif. Several sequences contained three repeats, a predicted transmembrane segment, and a short cytoplasmic motif associated with some integrins, but otherwise differ from the metazoan integrin α subunits. Among the available bacterial sequences, we found five examples containing seven sequential metazoan integrin-specific motifs within the seven repeats. The motifs differ in having one Ca2+-binding site per repeat, whereas metazoan integrins have three or four sites. The bacterial sequences are more conserved in terms of motif conservation and loop length, suggesting that the structure is more regular and compact than the example structures from human integrins. Although the bacterial examples are not full-length integrins, the full-length metazoan-type 7-bladed β-propeller domains are present, and sometimes two tandem copies are found. PMID:22022374
Kodaira, Mieko; Izumi, Shizue; Takahashi, Norio; Nakamura, Nori
2004-10-01
Human minisatellites consist of tandem arrays of short repeat sequences, and some are highly polymorphic in numbers of repeats among individuals. Since these loci mutate much more frequently than coding sequences, they make attractive markers for screening populations for genetic effects of mutagenic agents. Here we report the results of our analysis of mutations at eight hypervariable minisatellite loci in the offspring (61 from exposed families in 60 of which only one parent was exposed, and 58 from unexposed parents) of atomic bomb survivors with mean doses of >1 Sv. We found 44 mutations in paternal alleles and eight mutations in maternal alleles with no indication that the high doses of acutely applied radiation had caused significant genetic effects. Our finding contrasts with those of some other studies in which much lower radiation doses, applied chronically, caused significantly increased mutation rates. Possible reasons for this discrepancy are discussed.
Sun, Yongjiang; Chan, Roy Kum Wah; Tan, Suat Hoon
2004-01-01
In this study, the intratypic variability of a tandem repeat locus within the DNA polymerase (pol) gene of human herpes simplex virus type 2 (HSV2) was uncovered. The locus contained variable numbers of tandem dodecanucleotide (5'-GAC GAG GAC GGG-3') repetitive units. Our results showed that approximately 95% of analyzed HSV2 clinical isolates and the current GenBank HSV2 strains contained two copies of the repetitive unit. From genital herpes specimens, three new HSV2 strains, which contained 1, 3, and 4 copies of the repetitive unit, respectively, were identified. This variable number of tandem repeat (VNTR) locus is absent in HSV1, and thus it also contributes to the intertypic variability of HSV1 and HSV2. The intratypic variability of the locus may be useful for HSV2 strain genotyping, and this application is discussed.
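As a rough illustration of what "copies of the repetitive unit" means for this locus, the sketch below counts the longest run of back-to-back copies of the dodecanucleotide in a sequence. The function name and the flanking sequences are hypothetical and made up for the example; only the 12-bp unit itself comes from the abstract, and this is not the authors' genotyping procedure.

```python
def max_tandem_copies(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    sequence = sequence.upper()
    unit = unit.upper()
    best = 0
    start = sequence.find(unit)
    while start != -1:
        # Walk forward one unit at a time while the run continues.
        copies = 0
        pos = start
        while sequence.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
        start = sequence.find(unit, start + 1)
    return best


UNIT = "GACGAGGACGGG"  # dodecanucleotide from the abstract, spaces removed

# Hypothetical alleles: invented flanks around 1 to 4 copies of the unit.
toy_alleles = {n: "ATGCCT" + UNIT * n + "TTCAGA" for n in (1, 2, 3, 4)}
copy_counts = {n: max_tandem_copies(seq, UNIT) for n, seq in toy_alleles.items()}
print(copy_counts)
```

On real isolates the repeat would more likely be sized by PCR fragment length or read from sequence alignments; exact string matching as above would miscount any allele carrying a point mutation inside a unit.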
Conservation of human chromosome 13 polymorphic microsatellite (CA)n repeats in chimpanzees
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deka, R.; Shriver, M.D.; Yu, L.M.
Tandemly repeated (dC-dA)n · (dG-dT)n sequences occur abundantly and are found in most eukaryotic genomes. To investigate the level of conservation of these repeat sequences in nonhuman primates, the authors have analyzed seven human chromosome 13 dinucleotide (CA)n repeat loci in chimpanzees by DNA amplification using primers designed for analysis of human loci. Comparable levels of polymorphism at these loci in the two species, revealed by the number of alleles, heterozygosity, and allele sizes, suggest that the (CA)n repeat arrays and their genomic locations are highly conserved. Even though the proportion of shared alleles between the two species varies enormously and the modal alleles are not the same, allelic lengths at each locus in the chimpanzees are detected within the bounds of the allele size range observed in humans. A similar observation has been noted in a limited number of gorillas and orangutans. Using a new measure of genetic distance that takes into account the size of alleles, they have compared the genetic distance between humans and chimpanzees. The genetic distance between these two species was found to be ninefold smaller than expected assuming there is no selection or mutational bias toward retention of (CA)n repeat arrays. These findings suggest a functional significance for these microsatellite loci. 34 refs., 1 fig., 2 tabs.
2017-01-01
Target search as performed by DNA-binding proteins is a complex process, in which multiple factors contribute to both thermodynamic discrimination of the target sequence from overwhelmingly abundant off-target sites and kinetic acceleration of dynamic sequence interrogation. TRF1, the protein that binds to telomeric tandem repeats, faces an intriguing variant of the search problem where target sites are clustered within short fragments of chromosomal DNA. In this study, we use extensive (>0.5 ms in total) MD simulations to study the dynamical aspects of sequence-specific binding of TRF1 at both telomeric and non-cognate DNA. For the first time, we describe the spontaneous formation of a sequence-specific native protein–DNA complex in atomistic detail, and study the mechanism by which proteins avoid off-target binding while retaining high affinity for target sites. Our calculated free energy landscapes reproduce the thermodynamics of sequence-specific binding, while statistical approaches allow for a comprehensive description of intermediate stages of complex formation. PMID:28633355
Jiang, Haojun; Xie, Yifan; Li, Xuchao; Ge, Huijuan; Deng, Yongqiang; Mu, Haofang; Feng, Xiaoli; Yin, Lu; Du, Zhou; Chen, Fang; He, Nongyue
2016-01-01
Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) have already been used to perform noninvasive prenatal paternity testing from maternal plasma DNA. The technologies most frequently used were PCR followed by capillary electrophoresis and SNP typing arrays, respectively. Here, we developed a noninvasive prenatal paternity test (NIPAT) based on SNP typing with maternal plasma DNA sequencing. We evaluated the influencing factors (minor allele frequency (MAF), the total number of SNPs, fetal fraction and effective sequencing depth) and designed three different selective SNP panels in order to verify the performance in clinical cases. Combining targeted deep sequencing of selected SNPs with an informative bioinformatics pipeline, we calculated the combined paternity index (CPI) of 17 cases to determine paternity. Sequencing-based NIPAT results fully agreed with the invasive prenatal paternity test using an STR multiplex system. Our study showed that maternal plasma DNA sequencing-based technology is feasible and accurate in determining paternity, and it may provide an alternative for forensic applications in the future.
Variable-number tandem repeats as molecular markers for biotypes of Pasteuria ramosa in Daphnia spp.
Mouton, Laurence; Nong, Guang; Preston, James F; Ebert, Dieter
2007-06-01
Variable-number tandem repeats (VNTRs) have been identified in populations of Pasteuria ramosa, a castrating endobacterium of Daphnia species. The allelic polymorphisms at 14 loci in laboratory and geographically diverse soil samples showed that VNTRs may serve as biomarkers for the genetic characterization of P. ramosa isolates.
Krsticevic, Flavia J.; Schrago, Carlos G.; Carvalho, A. Bernardo
2015-01-01
The autosomal gene Mst77F of Drosophila melanogaster is essential for male fertility. In 2010, Krsticevic et al. (Genetics 184: 295−307) found 18 Y-linked copies of Mst77F (“Mst77Y”), which collectively account for 20% of the functional Mst77F-like mRNA. The Mst77Y genes were severely misassembled in the then-available genome assembly and were identified by cloning and sequencing polymerase chain reaction products. The genomic structure of the Mst77Y region and the possible existence of additional copies remained unknown. The recent publication of two long-read assemblies of D. melanogaster prompted us to reinvestigate this challenging region of the Y chromosome. We found that the Illumina Synthetic Long Reads assembly failed in the Mst77Y region, most likely because of its tandem duplication structure. The PacBio MHAP assembly of the Mst77Y region seems to be very accurate, as revealed by comparisons with the previously found Mst77Y genes, a bacterial artificial chromosome sequence, and Illumina reads of the same strain. We found that the Mst77Y region spans 96 kb and originated from a 3.4-kb transposition from chromosome 3L to the Y chromosome, followed by tandem duplications inside the Y chromosome and invasion of transposable elements, which account for 48% of its length. Twelve of the 18 Mst77Y genes found in 2010 were confirmed in the PacBio assembly, the remaining six being polymerase chain reaction−induced artifacts. There are several identical copies of some Mst77Y genes, coincidentally bringing the total copy number to 18. Besides providing a detailed picture of the Mst77Y region, our results highlight the utility of PacBio technology in assembling difficult genomic regions such as tandemly repeated genes. PMID:25858959
Epitope mapping of the variable repetitive region with the MB antigen of Ureaplasma urealyticum.
Zheng, X; Lau, K; Frazier, M; Cassell, G H; Watson, H L
1996-01-01
One of the major surface structures of Ureaplasma urealyticum recognized by antibodies of patients during infection is the MB antigen. Previously, we showed by Western blot (immunoblot) analysis that any one of the anti-MB monoclonal antibodies (MAbs) 3B1.5, 5B1.1, and 10C6.6 could block the binding of patient antibodies to MB. Subsequent DNA sequencing revealed that a unique six-amino-acid direct tandem repeat region composed the carboxy two-thirds of this antigen. In the present study, using antibody-reactive peptide scanning of this repeat region, we demonstrated that the amino acids defining the epitopes for MAbs 3B1.5, 5B1.1, and 10C6.6 are EQP, GK, and KEQPA, respectively. Peptide scanning analysis of an infected patient's serum antibody response showed that the dominant epitope was defined by the sequence PAGK. Mapping of these continuous epitopes revealed overlap between all MAb and patient polyclonal antibody binding sites, thus explaining the ability of a single MAb to apparently block all polyclonal antibody binding sites. We also show that a single amino acid difference in the sequence of the repeats of serovars 3 and 14 accounts for the lack of reactivity with serovar 14 of two of the serovar 3-specific MAbs. Finally, the data demonstrate the need to obtain the sequences of the mba genes of all serovars before an effective serovar-specific antibody detection method can be developed. PMID:8914774
Chen, Xiaochen; Li, Qiushi; Li, Ying; Qian, Jun; Han, Jianping
2015-01-01
The chloroplast genome (cp genome) of Aconitum barbatum var. puberulum was sequenced using the third-generation sequencing platform based on the single-molecule real-time (SMRT) sequencing approach. To our knowledge, this is the first reported complete cp genome of Aconitum, and we anticipate that it will have great value for phylogenetic studies of the Ranunculaceae family. In total, 23,498 CCS reads and 20,685,462 base pairs were generated, the mean read length was 880 bp, and the longest read was 2,261 bp. Genome coverage of 100% was achieved with a mean coverage of 132× and no gaps. The accuracy of the assembled genome is 99.973%; the assembly was validated using Sanger sequencing of six selected genes from the cp genome. The complete cp genome of A. barbatum var. puberulum is 156,749 bp in length, including a large single-copy region of 87,630 bp and a small single-copy region of 16,941 bp separated by two inverted repeats of 26,089 bp. The cp genome contains 130 genes, including 84 protein-coding genes, 34 tRNA genes and eight rRNA genes. Four forward, five inverted and eight tandem repeats were identified. According to the SSR analysis, the longest poly structure is a 20-T repeat. Our results presented in this paper will facilitate the phylogenetic studies and molecular authentication on Aconitum. PMID:25705213
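The figures reported in this abstract can be cross-checked with simple arithmetic. The sketch below assumes the usual quadripartite layout (one large single-copy region, one small single-copy region, and two copies of the inverted repeat) and, for the coverage estimate, assumes that every sequenced base maps to the chloroplast genome. The variable names are illustrative; the numbers are taken from the abstract.

```python
# Values reported in the abstract (bp or read counts).
LSC = 87_630             # large single-copy region
SSC = 16_941             # small single-copy region
IR = 26_089              # one inverted repeat; the genome carries two copies
REPORTED_LENGTH = 156_749
CCS_READS = 23_498
TOTAL_BASES = 20_685_462

# Quadripartite structure: LSC + SSC + 2 x IR should equal the genome length.
genome_length = LSC + SSC + 2 * IR

# Mean read length and mean coverage implied by the sequencing totals.
# Assumption: all sequenced bases contribute to the cp genome coverage.
mean_read_length = TOTAL_BASES / CCS_READS
mean_coverage = TOTAL_BASES / REPORTED_LENGTH

print(genome_length, round(mean_read_length), round(mean_coverage))
# prints: 156749 880 132
```

The region lengths sum exactly to the reported 156,749 bp, and the implied mean read length (about 880 bp) and coverage (about 132×) agree with the values stated in the abstract.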
Marzo, Mar; Liu, Danxu; Ruiz, Alfredo; Chalmers, Ronald
2013-08-01
Galileo is a DNA transposon responsible for the generation of several chromosomal inversions in Drosophila. In contrast to other members of the P-element superfamily, it has unusually long terminal inverted-repeats (TIRs) that resemble those of Foldback elements. To investigate the function of the long TIRs we derived consensus and ancestral sequences for the Galileo transposase in three species of Drosophilids. Following gene synthesis, we expressed and purified their constituent THAP domains and tested their binding activity towards the respective Galileo TIRs. DNase I footprinting located the most proximal DNA binding site about 70 bp from the transposon end. Using this sequence we identified further binding sites in the tandem repeats that are found within the long TIRs. This suggests that the synaptic complex between Galileo ends may be a complicated structure containing higher-order multimers of the transposase. We also attempted to reconstitute Galileo transposition in Drosophila embryos but no events were detected. Thus, although the limited numbers of Galileo copies in each genome were sufficient to provide functional consensus sequences for the THAP domains, they do not specify a fully active transposase. Since the THAP recognition sequence is short, and will occur many times in a large genome, it seems likely that the multiple binding sites within the long, internally repetitive, TIRs of Galileo and other Foldback-like elements may provide the transposase with its binding specificity. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
Killgore, George; Thompson, Angela; Johnson, Stuart; Brazier, Jon; Kuijper, Ed; Pepin, Jacques; Frost, Eric H; Savelkoul, Paul; Nicholson, Brad; van den Berg, Renate J; Kato, Haru; Sambol, Susan P; Zukowski, Walter; Woods, Christopher; Limbago, Brandi; Gerding, Dale N; McDonald, L Clifford
2008-02-01
Using 42 isolates contributed by laboratories in Canada, The Netherlands, the United Kingdom, and the United States, we compared the results of analyses done with seven Clostridium difficile typing techniques: multilocus variable-number tandem-repeat analysis (MLVA), amplified fragment length polymorphism (AFLP), surface layer protein A gene sequence typing (slpAST), PCR-ribotyping, restriction endonuclease analysis (REA), multilocus sequence typing (MLST), and pulsed-field gel electrophoresis (PFGE). We assessed the discriminating ability and typeability of each technique as well as the agreement among techniques in grouping isolates by allele profile A (AP-A) through AP-F, which are defined by toxinotype, the presence of the binary toxin gene, and deletion in the tcdC gene. We found that all isolates were typeable by all techniques and that discrimination index scores for the techniques tested ranged from 0.964 to 0.631 in the following order: MLVA, REA, PFGE, slpAST, PCR-ribotyping, MLST, and AFLP. All the techniques were able to distinguish the current epidemic strain of C. difficile (BI/027/NAP1) from other strains. All of the techniques showed multiple types for AP-A (toxinotype 0, binary toxin negative, and no tcdC gene deletion). REA, slpAST, MLST, and PCR-ribotyping all included AP-B (toxinotype III, binary toxin positive, and an 18-bp deletion in tcdC) in a single group that excluded other APs. PFGE, AFLP, and MLVA grouped two, one, and two different non-AP-B isolates, respectively, with their AP-B isolates. All techniques appear to be capable of detecting outbreak strains, but only REA and MLVA showed sufficient discrimination to distinguish strains from different outbreaks.
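The abstract does not state how its discrimination index was computed. A commonly used definition in bacterial typing is the Hunter-Gaston index (Simpson's index of diversity): the probability that two isolates drawn at random without replacement fall into different types. The sketch below assumes that definition; the function name and the toy type labels are hypothetical and are not data from the study.

```python
from collections import Counter


def discrimination_index(type_labels):
    """Hunter-Gaston index: D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)).

    `type_labels` holds one type assignment per isolate; n_j is the number
    of isolates of type j and N is the total number of isolates.
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels).values()
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Toy data: four isolates split evenly into two types.
toy_index = discrimination_index(["A", "A", "B", "B"])
print(round(toy_index, 3))  # prints: 0.667

# A method that gives every one of 42 isolates its own type scores 1.0.
perfect_index = discrimination_index(list(range(42)))
print(perfect_index)  # prints: 1.0
```

Under this definition, a technique scoring 0.964 would separate a random pair of isolates far more often than one scoring 0.631, which matches the ranking from MLVA down to AFLP described above.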
A Dual Origin of the Xist Gene from a Protein-Coding Gene and a Set of Transposable Elements
Elisaphenko, Eugeny A.; Kolesnikov, Nikolay N.; Shevchenko, Alexander I.; Rogozin, Igor B.; Nesterova, Tatyana B.; Brockdorff, Neil; Zakian, Suren M.
2008-01-01
X-chromosome inactivation, which occurs in female eutherian mammals, is controlled by a complex X-linked locus termed the X-inactivation center (XIC). Previously it was proposed that genes of the XIC evolved, at least in part, as a result of pseudogenization of protein-coding genes. In this study we show that the key XIC gene Xist, which displays fragmentary homology to a protein-coding gene Lnx3, emerged de novo in early eutherians by integration of mobile elements which gave rise to simple tandem repeats. The Xist gene promoter region and four out of ten exons found in eutherians retain homology to exons of the Lnx3 gene. The remaining six Xist exons, including those with simple tandem repeats detectable in their structure, have similarity to different transposable elements. Integration of mobile elements into Xist accompanies the overall evolution of the gene and presumably continues in contemporary eutherian species. Additionally, we showed that the combination of remnants of protein-coding sequences and mobile elements is not unique to the Xist gene and is found in other XIC genes producing non-coding nuclear RNA. PMID:18575625
Clapp, Jannine; Mitchell, Laura M.; Bolland, Daniel J.; Fantes, Judy; Corcoran, Anne E.; Scotting, Paul J.; Armour, John A. L.; Hewitt, Jane E.
2007-01-01
Facioscapulohumeral muscular dystrophy (FSHD) is caused by deletions within the polymorphic DNA tandem array D4Z4. Each D4Z4 repeat unit has an open reading frame (ORF), termed “DUX4,” containing two homeobox sequences. Because there has been no evidence of a transcript from the array, these deletions are thought to cause FSHD by a position effect on other genes. Here, we identify D4Z4 homologues in the genomes of rodents, Afrotheria (superorder of elephants and related species), and other species and show that the DUX4 ORF is conserved. Phylogenetic analysis suggests that primate and Afrotherian D4Z4 arrays are orthologous and originated from a retrotransposed copy of an intron-containing DUX gene, DUXC. Reverse-transcriptase polymerase chain reaction and RNA fluorescence and tissue in situ hybridization data indicate transcription of the mouse array. Together with the conservation of the DUX4 ORF for >100 million years, this strongly supports a coding function for D4Z4 and necessitates re-examination of current models of the FSHD disease mechanism. PMID:17668377
Lindstedt, Bjørn-Arne; Tham, Wilhelm; Danielsson-Tham, Marie-Louise; Vardund, Traute; Helmersson, Seved; Kapperud, Georg
2008-02-01
The multiple-locus variable-number tandem-repeats analysis (MLVA) method for genotyping has proven to be a fast and reliable typing tool in several bacterial species. In our laboratory, MLVA is the routine typing method for Salmonella enterica subsp. enterica serovar Typhimurium and Escherichia coli O157. The gram-positive bacterium Listeria monocytogenes, while not isolated as frequently as S. Typhimurium and E. coli, causes severe illness with an overall mortality rate of 30%. Thus, it is important that any outbreak of this pathogen is detected early and that the source can be traced quickly. In view of this, we have used the information provided by two fully sequenced L. monocytogenes strains to develop an MLVA assay coupled with high-resolution capillary electrophoresis and compared it to pulsed-field gel electrophoresis (PFGE) in two sets of isolates, one Norwegian (79 isolates) and one Swedish (61 isolates). The MLVA assay could resolve all of the L. monocytogenes serotypes tested, and was slightly more discriminatory than PFGE for the Norwegian isolates (28 MLVA profiles and 24 PFGE profiles), while the opposite held for the Swedish isolates (42 MLVA profiles and 43 PFGE profiles).
The solution structure of the pentatricopeptide repeat protein PPR10 upon binding atpH RNA
Gully, Benjamin S.; Cowieson, Nathan; Stanley, Will A.; Shearston, Kate; Small, Ian D.; Barkan, Alice; Bond, Charles S.
2015-01-01
The pentatricopeptide repeat (PPR) protein family is a large family of RNA-binding proteins that is characterized by tandem arrays of a degenerate 35-amino-acid motif which form an α-solenoid structure. PPR proteins influence the editing, splicing, translation and stability of specific RNAs in mitochondria and chloroplasts. Zea mays PPR10 is amongst the best studied PPR proteins, where sequence-specific binding to two RNA transcripts, atpH and psaJ, has been demonstrated to follow a recognition code where the identity of two amino acids per repeat determines the base-specificity. A recently solved ZmPPR10:psaJ complex crystal structure suggested a homodimeric complex with considerably fewer sequence-specific protein–RNA contacts than inferred previously. Here we describe the solution structure of the ZmPPR10:atpH complex using size-exclusion chromatography-coupled synchrotron small-angle X-ray scattering (SEC-SY-SAXS). Our results support prior evidence that PPR10 binds RNA as a monomer, and that it does so in a manner that is commensurate with a canonical and predictable RNA-binding mode across much of the RNA–protein interface. PMID:25609698
Huszar, Tunde I; Jobling, Mark A; Wetton, Jon H
2018-04-12
Short tandem repeats on the male-specific region of the Y chromosome (Y-STRs) are permanently linked as haplotypes, and therefore Y-STR sequence diversity can be considered within the robust framework of a phylogeny of haplogroups defined by single nucleotide polymorphisms (SNPs). Here we use massively parallel sequencing (MPS) to analyse the 23 Y-STRs in Promega's prototype PowerSeq™ Auto/Mito/Y System kit (containing the markers of the PowerPlex® Y23 [PPY23] System) in a set of 100 diverse Y chromosomes whose phylogenetic relationships are known from previous megabase-scale resequencing. Including allele duplications and alleles resulting from likely somatic mutation, we characterised 2311 alleles, demonstrating 99.83% concordance with capillary electrophoresis (CE) data on the same sample set. The set contains 267 distinct sequence-based alleles (an increase of 58% compared to the 169 detectable by CE), including 60 novel Y-STR variants phased with their flanking sequences which have not been reported previously to our knowledge. Variation includes 46 distinct alleles containing non-reference variants of SNPs/indels in both repeat and flanking regions, and 145 distinct alleles containing repeat pattern variants (RPV). For DYS385a,b, DYS481 and DYS390 we observed repeat count variation in short flanking segments previously considered invariable, and suggest new MPS-based structural designations based on these. We considered the observed variation in the context of the Y phylogeny: several specific haplogroup associations were observed for SNPs and indels, reflecting the low mutation rates of such variant types; however, RPVs showed less phylogenetic coherence and more recurrence, reflecting their relatively high mutation rates. 
In conclusion, our study reveals considerable additional diversity at the Y-STRs of the PPY23 set via MPS analysis, demonstrates high concordance with CE data, facilitates nomenclature standardisation, and places Y-STR sequence variants in their phylogenetic context. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
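The reported gain in allele resolution can be reproduced from the two counts given in the abstract. This is a small arithmetic check with illustrative variable names, not part of the authors' analysis.

```python
CE_ALLELES = 169    # distinct alleles detectable by capillary electrophoresis
MPS_ALLELES = 267   # distinct sequence-based alleles found by MPS

extra_alleles = MPS_ALLELES - CE_ALLELES
percent_increase = extra_alleles / CE_ALLELES * 100

print(extra_alleles, round(percent_increase))
# prints: 98 58
```

Sequencing therefore distinguishes 98 alleles that length-based typing would merge, which is the 58% increase quoted above.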
Beauparlant, Marc A; Drouin, Guy
2014-02-01
Analyses of the 5S rRNA genes found in the spliced-leader (SL) gene repeat units of numerous trypanosome species suggest that such linkages were not inherited from a common ancestor, but were the result of independent 5S rRNA gene insertions. In trypanosomes, 5S rRNA genes are found either in the tandemly repeated units coding for SL genes or in independent tandemly repeated units. Given that trypanosome species where 5S rRNA genes are within the tandemly repeated units coding for SL genes are phylogenetically related, one might hypothesize that this arrangement is the result of an ancestral insertion of 5S rRNA genes into the tandemly repeated SL gene family of trypanosomes. Here, we use the types of 5S rRNA genes found associated with SL genes, the flanking regions of the inserted 5S rRNA genes and the position of these insertions to show that most of the 5S rRNA genes found within SL gene repeat units of trypanosome species were not acquired from a common ancestor but are the results of independent insertions. These multiple 5S rRNA genes insertion events in trypanosomes are likely the result of frequent founder events in different hosts and/or geographical locations in species having short generation times.
Detection and Characteristics of Rifampicin-Resistant Isolates of Mycobacterium tuberculosis.
Cherednichenko, A G; Dymova, M A; Solodilova, O A; Petrenko, T I; Prozorov, A I; Filipenko, M L
2016-03-01
Genotyping and analysis of the drug resistance of 59 isolates of M. tuberculosis obtained from patients living in Altai Territory were performed using a BACTEC MGIT 960 fluorometric system together with VNTR (variable number tandem repeat) typing, PCR-RFLP analysis, and sequence analysis. The occurrence frequency was highest for isolates of the Beijing family (n=30, 50.8%). Analysis of the mutation spectrum in the rpoB gene associated with rifampicin resistance revealed the major mutation (codon 531 of the rpoB gene) in 93% of samples, which allows rapid test systems to be used.
Doddapaneni, Harshavardhan; Yao, Jiqiang; Lin, Hong; Walker, M Andrew; Civerolo, Edwin L
2006-01-01
Background The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76.2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10⁻² per base pair of DNA and the average INDEL frequency was 2.06 × 10⁻² per base pair of DNA. On an average, 60.33% of the SNPs were synonymous type while 39.67% were non-synonymous type. The mutation frequency, primarily in the form of external INDELs was the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The number of genes unique to each strain were 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A sub-set of the strain specific genes showed significant differences in terms of their codon usage and GC composition from the native genes suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage related functions.
Conclusion INDELs and strain specific genes have been identified as the main source of variations among strains, with individual strains showing different rates of genome evolution. Based on these genome comparisons, it appears that the Pierce's disease strain Temecula-1 genome represents the ancestral genome of the X. fastidiosa. Results of this analysis are publicly available in the form of a web database. PMID:16948851
TANDEM: matching proteins with tandem mass spectra.
Craig, Robertson; Beavis, Ronald C
2004-06-12
Tandem mass spectra obtained from fragmenting peptide ions contain some peptide sequence specific information, but often there is not enough information to sequence the original peptide completely. Several proprietary software applications have been developed to attempt to match the spectra with a list of protein sequences that may contain the sequence of the peptide. The application TANDEM was written to provide the proteomics research community with a set of components that can be used to test new methods and algorithms for performing this type of sequence-to-data matching. The source code and binaries for this software are available at http://www.proteome.ca/opensource.html, for Windows, Linux and Macintosh OSX. The source code is made available under the Artistic License, from the authors.
Dias, Guilherme B.; Svartman, Marta; Delprat, Alejandra; Ruiz, Alfredo; Kuhn, Gustavo C.S.
2014-01-01
Transposable elements (TEs) and satellite DNAs (satDNAs) are abundant components of most eukaryotic genomes studied so far and their impact on evolution has been the focus of several studies. A number of studies linked TEs with satDNAs, but the nature of their evolutionary relationships remains unclear. During in silico analyses of the Drosophila virilis assembled genome, we found a novel DNA transposon we named Tetris based on its modular structure and diversity of rearranged forms. We aimed to characterize Tetris and investigate its role in generating satDNAs. Data mining and sequence analysis showed that Tetris is apparently nonautonomous, with a structure similar to foldback elements, and present in D. virilis and D. americana. Herein, we show that Tetris shares the final portions of its terminal inverted repeats (TIRs) with DAIBAM, a previously described miniature inverted transposable element implicated in the generation of chromosome inversions. Both elements are likely to be mobilized by the same autonomous TE. Tetris TIRs contain approximately 220-bp internal tandem repeats that we have named TIR-220. We also found TIR-220 repeats making up longer (kb-size) satDNA-like arrays. Using bioinformatic, phylogenetic and cytogenomic tools, we demonstrated that Tetris has contributed to shaping the genomes of D. virilis and D. americana, providing internal tandem repeats that served as building blocks for the amplification of satDNA arrays. The β-heterochromatic genomic environment seemed to have favored such amplification. Our results imply for the first time a role for foldback elements in generating satDNAs. PMID:24858539
Versatile communication strategies among tandem WW domain repeats
Dodson, Emma Joy; Fishbain-Yoskovitz, Vered; Rotem-Bamberger, Shahar
2015-01-01
Interactions mediated by short linear motifs in proteins play major roles in regulation of cellular homeostasis since their transient nature allows for easy modulation. We are still far from a full understanding and appreciation of the complex regulation patterns that can be, and are, achieved by this type of interaction. The fact that many linear-motif-binding domains occur in tandem repeats in proteins indicates that their mutual communication is used extensively to obtain complex integration of information toward regulatory decisions. This review is an attempt to overview, and classify, different ways by which two and more tandem repeats cooperate in binding to their targets, in the well-characterized family of WW domains and their corresponding polyproline ligands. PMID:25710931
Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana
2016-07-01
The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome
Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O.; Alawad, Abdullah O.; Al-Sadi, Abdullah M.; Hu, Songnian; Yu, Jun
2016-01-01
Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants. PMID:27736909
Chan, Simon R W L; Blackburn, Elizabeth H
2004-01-01
Telomeres are the protective DNA-protein complexes found at the ends of eukaryotic chromosomes. Telomeric DNA consists of tandem repeats of a simple, often G-rich, sequence specified by the action of telomerase, and complete replication of telomeric DNA requires telomerase. Telomerase is a specialized cellular ribonucleoprotein reverse transcriptase. By copying a short template sequence within its intrinsic RNA moiety, telomerase synthesizes the telomeric DNA strand running 5' to 3' towards the distal end of the chromosome, thus extending it. Fusion of a telomere, either with another telomere or with a broken DNA end, generally constitutes a catastrophic event for genomic stability. Telomerase acts to prevent such fusions. The molecular consequences of telomere failure, and the molecular contributors to telomere function, with an emphasis on telomerase, are discussed here. PMID:15065663
Preparation of Small RNAs Using Rolling Circle Transcription and Site-Specific RNA Disconnection.
Wang, Xingyu; Li, Can; Gao, Xiaomeng; Wang, Jing; Liang, Xingguo
2015-01-13
A facile and robust RNA preparation protocol was developed by combining rolling circle transcription (RCT) with RNA cleavage by RNase H. Circular DNA with a complementary sequence was used as the template for promoter-free transcription. With the aid of a 2'-O-methylated DNA, the RCT-generated tandem repeats of the desired RNA sequence were disconnected at the exact end-to-end position to harvest the desired RNA oligomers. Compared with the template DNA, more than 4 × 10^3 times the amount of small RNA products were obtained when modest cleavage was carried out during transcription. Large amounts of RNA oligomers could easily be obtained by simply increasing the reaction volume.
SINE sequences detect DNA fingerprints in salmonid fishes.
Spruell, P; Thorgaard, G H
1996-04-01
DNA probes homologous to two previously described salmonid short interspersed nuclear elements (SINEs) detected DNA fingerprint patterns in 14 species of salmonid fishes. The probes showed more homology to some species than to others and little homology to three nonsalmonid fishes. The DNA fingerprint patterns derived from the SINE probes are individual-specific and inherited in a Mendelian manner. Probes derived from different regions of the same SINE detect only partially overlapping banding patterns, reflecting a more complex SINE structure than has been previously reported. Like the human Alu sequence, the SINEs found in salmonids could provide useful genetic markers and primer sites for PCR-based techniques. These elements may be more desirable for some applications than traditional DNA fingerprinting probes that detect tandemly repeated arrays.
Klippel, Stefan; Wieczorek, Marek; Schümann, Michael; Krause, Eberhard; Marg, Berenice; Seidel, Thorsten; Meyer, Tim; Knapp, Ernst-Walter; Freund, Christian
2011-11-04
The high abundance of repetitive but nonidentical proline-rich sequences in spliceosomal proteins raises the question of how these known interaction motifs recruit their interacting protein domains. Whereas complex formation of these adaptors with individual motifs has been studied in great detail, little is known about the binding mode of domains arranged in tandem repeats and long proline-rich sequences including multiple motifs. Here we studied the interaction of the two adjacent WW domains of spliceosomal protein FBP21 with several ligands of different lengths and composition to elucidate the hallmarks of multivalent binding for this class of recognition domains. First, we show that many of the proteins that define the cellular proteome interacting with FBP21-WW1-WW2 contain multiple proline-rich motifs. Among these is the newly identified binding partner SF3B4. Fluorescence resonance energy transfer (FRET) analysis reveals that the tandem WW domains of FBP21 interact with splicing factor 3B4 (SF3B4) in nuclear speckles, where splicing takes place. Isothermal titration calorimetry and NMR show that the tandem arrangement of WW domains and the multivalency of the proline-rich ligands both contribute to affinity enhancement. However, ligand exchange remains fast compared with the NMR time scale. Surprisingly, an N-terminal spin label attached to a bivalent ligand induces NMR line broadening of signals corresponding to both WW domains of the FBP21-WW1-WW2 protein. This suggests that distinct orientations of the ligand contribute to a delocalized and semispecific binding mode that should facilitate search processes within the spliceosome.
Klippel, Stefan; Wieczorek, Marek; Schümann, Michael; Krause, Eberhard; Marg, Berenice; Seidel, Thorsten; Meyer, Tim; Knapp, Ernst-Walter; Freund, Christian
2011-01-01
The high abundance of repetitive but nonidentical proline-rich sequences in spliceosomal proteins raises the question of how these known interaction motifs recruit their interacting protein domains. Whereas complex formation of these adaptors with individual motifs has been studied in great detail, little is known about the binding mode of domains arranged in tandem repeats and long proline-rich sequences including multiple motifs. Here we studied the interaction of the two adjacent WW domains of spliceosomal protein FBP21 with several ligands of different lengths and composition to elucidate the hallmarks of multivalent binding for this class of recognition domains. First, we show that many of the proteins that define the cellular proteome interacting with FBP21-WW1-WW2 contain multiple proline-rich motifs. Among these is the newly identified binding partner SF3B4. Fluorescence resonance energy transfer (FRET) analysis reveals that the tandem WW domains of FBP21 interact with splicing factor 3B4 (SF3B4) in nuclear speckles, where splicing takes place. Isothermal titration calorimetry and NMR show that the tandem arrangement of WW domains and the multivalency of the proline-rich ligands both contribute to affinity enhancement. However, ligand exchange remains fast compared with the NMR time scale. Surprisingly, an N-terminal spin label attached to a bivalent ligand induces NMR line broadening of signals corresponding to both WW domains of the FBP21-WW1-WW2 protein. This suggests that distinct orientations of the ligand contribute to a delocalized and semispecific binding mode that should facilitate search processes within the spliceosome. PMID:21917930
Krüger, Jacqueline; Schleinitz, Dorit
2017-01-01
Microsatellites are polymorphic DNA loci comprising repeated sequence motifs of two to five base pairs which are dispersed throughout the genome. Genotyping of microsatellites is a widely accepted tool for diagnostic and research purposes such as forensic investigations and parentage testing, but also in clinics (e.g. monitoring of bone marrow transplantation), as well as for the agriculture and food industries. The co-amplification of several short tandem repeat (STR) systems in a multiplex reaction with simultaneous detection helps to obtain more information from a DNA sample where its availability may be limited. Here, we introduce and describe this commonly used genotyping technique, providing an overview on available resources on STRs, multiplex design, and analysis.
Simple Sequence Repeats in Escherichia coli: Abundance, Distribution, Composition, and Polymorphism
Gur-Arie, Riva; Cohen, Cyril J.; Eitan, Yuval; Shelef, Leora; Hallerman, Eric M.; Kashi, Yechezkel
2000-01-01
Computer-based genome-wide screening of the DNA sequence of Escherichia coli strain K12 revealed tens of thousands of tandem simple sequence repeat (SSR) tracts, with motifs ranging from 1 to 6 nucleotides. SSRs were well distributed throughout the genome. Mononucleotide SSRs were over-represented in noncoding regions and under-represented in open reading frames (ORFs). Nucleotide composition of mono- and dinucleotide SSRs, both in ORFs and in noncoding regions, differed from that of the genomic region in which they occurred, with 93% of all mononucleotide SSRs proving to be of A or T. Computer-based analysis of the fine position of every SSR locus in the noncoding portion of the genome relative to downstream ORFs showed SSRs located in areas that could affect gene regulation. DNA sequences at 14 arbitrarily chosen SSR tracts were compared among E. coli strains. Polymorphisms of SSR copy number were observed at four of seven mononucleotide SSR tracts screened, with all polymorphisms occurring in noncoding regions. SSR polymorphism could prove important as a genome-wide source of variation, both for practical applications (including rapid detection, strain identification, and detection of loci affecting key phenotypes) and for evolutionary adaptation of microbes. [The sequence data described in this paper have been submitted to the GenBank data library under accession numbers AF209020–209030 and AF209508–209518.] PMID:10645951
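The kind of screen described above, which scans a sequence for tandem tracts with motifs of 1 to 6 nucleotides, can be illustrated with a short regular-expression sketch. This is not the authors' software. The function name `find_ssrs` and the thresholds (at least 6 copies for mononucleotides, at least 3 for longer motifs) are assumptions chosen for illustration, since the abstract does not state the cut-offs used.

```python
import re

def find_ssrs(seq, max_motif=6, min_copies=3, min_mono_len=6):
    """Return sorted (start, motif, copies) tuples for tandem SSR tracts.

    Thresholds are illustrative assumptions, not values from the study.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        min_rep = min_mono_len if k == 1 else min_copies
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"); those tracts are reported at the shorter k.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

example = "GGCAAAAAAAGTCATATATATGC"
ssr_hits = find_ssrs(example)
```

Because `finditer` reports non-overlapping matches, a tract hidden behind a skipped degenerate match could be missed; a production screen would need a more careful scan.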
Just, Rebecca S; Irwin, Jodi A
2018-05-01
Some of the expected advantages of next generation sequencing (NGS) for short tandem repeat (STR) typing include enhanced mixture detection and genotype resolution via sequence variation among non-homologous alleles of the same length. However, at the same time that NGS methods for forensic DNA typing have advanced in recent years, many caseworking laboratories have implemented or are transitioning to probabilistic genotyping to assist the interpretation of complex autosomal STR typing results. Current probabilistic software programs are designed for length-based data, and were not intended to accommodate sequence strings as the product input. Yet to leverage the benefits of NGS for enhanced genotyping and mixture deconvolution, the sequence variation among same-length products must be utilized in some form. Here, we propose use of the longest uninterrupted stretch (LUS) in allele designations as a simple method to represent sequence variation within the STR repeat regions and facilitate, in the near term, probabilistic interpretation of NGS-based typing results. An examination of published population data indicated that a reference LUS region is straightforward to define for most autosomal STR loci, and that using repeat unit plus LUS length as the allele designator can represent greater than 80% of the alleles detected by sequencing. A proof of concept study performed using a freely available probabilistic software program demonstrated that the LUS length can be used in allele designations when a program does not require alleles to be integers, and that utilizing sequence information improves interpretation of both single-source and mixed contributor STR typing results as compared to using repeat unit information alone. 
The LUS concept for allele designation maintains the repeat-based allele nomenclature that will permit backward compatibility to extant STR databases, and the LUS lengths themselves will be concordant regardless of the NGS assay or analysis tools employed. Further, these biologically based, easy-to-derive designations uphold clear relationships between parent alleles and their stutter products, enabling analysis in fully continuous probabilistic programs that model stutter while avoiding the algorithmic complexities that come with string-based searches. Though using repeat unit plus LUS length as the allele designator does not capture variation that occurs outside of the core repeat regions, this straightforward approach would permit the large majority of known STR sequence variation to be used for mixture deconvolution and, in turn, result in more informative mixture statistics in the near term. Ultimately, the method could bridge the gap from current length-based probabilistic systems to facilitate broader adoption of NGS by forensic DNA testing laboratories. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
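As a rough illustration of the idea, the sketch below derives an LUS length as the largest number of consecutive copies of a chosen motif in a repeat-region sequence and pairs it with the length-based repeat count. The function names, the `units_LUS` string format, and the example sequence are hypothetical. The abstract does not specify how the reference LUS region or motif is chosen per locus, so here the motif is simply passed in by the caller.

```python
import re

def lus_length(repeat_region, motif):
    """Largest number of consecutive, uninterrupted copies of `motif`."""
    runs = re.findall(r"(?:%s)+" % re.escape(motif.upper()), repeat_region.upper())
    return max((len(run) // len(motif) for run in runs), default=0)

def lus_designation(repeat_units, repeat_region, motif):
    """Combine the length-based allele with its LUS (hypothetical 'units_LUS' format)."""
    return f"{repeat_units}_{lus_length(repeat_region, motif)}"

# Hypothetical allele: 3 x TCTA, one interrupting TCTG, then 5 x TCTA (9 units in total).
region = "TCTA" * 3 + "TCTG" + "TCTA" * 5
label = lus_designation(9, region, "TCTA")
```

Two alleles with the same repeat count but different interruption positions would receive different labels under this scheme, which is the property the abstract relies on for separating same-length products.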
Kuhn, G C S; Teo, C H; Schwarzacher, T; Heslop-Harrison, J S
2009-05-01
Satellite DNA (satDNA) is a major component of genomes but relatively little is known about the fine-scale organization of unrelated satDNAs residing at the same chromosome location, and the sequence structure and dynamics of satDNA junctions. We studied the organization and sequence junctions of two nonhomologous satDNAs, pBuM and DBC-150, in three species from the neotropical Drosophila buzzatii cluster (repleta group). In situ hybridization to microchromosomes, interphase nuclei and extended DNA fibers showed frequent interspersion of the two satellites in D. gouveai, D. antonietae and, to a lesser extent, D. seriema. We isolated by PCR six pBuM x DBC-150 junctions: four are exclusive to D. gouveai and two are exclusive to D. antonietae. The six junction breakpoints occur at different positions within monomers, suggesting independent origin. Four junctions showed abrupt transitions between the two satellites, whereas two junctions showed a distinct 10 bp tandem duplication before the junction. Unlike pBuM, DBC-150 junction repeats are more variable than randomly cloned monomers and showed diagnostic features in common with a 3-monomer higher-order repeat seen in the sister species D. serido. The high levels of interspersion between pBuM and DBC-150 repeats suggest extensive rearrangements between the two satellites, maybe favored by specific features of the microchromosomes. Our interpretation is that the junctions evolved by multiple events of illegitimate recombination between nonhomologous satDNA repeats, with subsequent rounds of unequal crossing-over expanding the copy number of some of the junctions.
Coherent Somatic Mutation in Autoimmune Disease
Ross, Kenneth Andrew
2014-01-01
Background: Many aspects of autoimmune disease are not well understood, including the specificities of autoimmune targets, and patterns of co-morbidity and cross-heritability across diseases. Prior work has provided evidence that somatic mutation caused by gene conversion and deletion at segmentally duplicated loci is relevant to several diseases. Simple tandem repeat (STR) sequence is highly mutable, both somatically and in the germ-line, and somatic STR mutations are observed under inflammation. Results: Protein-coding genes spanning STRs having markers of mutability, including germ-line variability, high total length, repeat count and/or repeat similarity, are evaluated in the context of autoimmunity. For the initiation of autoimmune disease, antigens whose autoantibodies are the first observed in a disease, termed primary autoantigens, are informative. Three primary autoantigens, thyroid peroxidase (TPO), phogrin (PTPRN2) and filaggrin (FLG), include STRs that are among the eleven longest STRs spanned by protein-coding genes. This association of primary autoantigens with long STR sequence is highly significant (). Long STRs occur within twenty genes that are associated with sixteen common autoimmune diseases and atherosclerosis. The repeat within the TTC34 gene is an outlier in terms of length and a link with systemic lupus erythematosus is proposed. Conclusions: The results support the hypothesis that many autoimmune diseases are triggered by immune responses to proteins whose DNA sequence mutates somatically in a coherent, consistent fashion. Other autoimmune diseases may be caused by coherent somatic mutations in immune cells. The coherent somatic mutation hypothesis has the potential to be a comprehensive explanation for the initiation of many autoimmune diseases. PMID:24988487
ERIC Educational Resources Information Center
McNamara-Schroeder, Kathleen; Olonan, Cheryl; Chu, Simon; Montoya, Maria C.; Alviri, Mahta; Ginty, Shannon; Love, John J.
2006-01-01
We have devised and implemented a DNA fingerprinting module for an upper division undergraduate laboratory based on the amplification and analysis of three of the 13 short tandem repeat loci that are required by the Federal Bureau of Investigation Combined DNA Index System (FBI CODIS) data base. Students first collect human epithelial (cheek)…
Goedbloed, Miriam; Vermeulen, Mark; Fang, Rixun N; Lembring, Maria; Wollstein, Andreas; Ballantyne, Kaye; Lao, Oscar; Brauer, Silke; Krüger, Carmen; Roewer, Lutz; Lessig, Rüdiger; Ploski, Rafal; Dobosz, Tadeusz; Henke, Lotte; Henke, Jürgen; Furtado, Manohar R; Kayser, Manfred
2009-11-01
The Y-chromosomal short tandem repeat (Y-STR) polymorphisms included in the AmpFlSTR Yfiler polymerase chain reaction amplification kit have become widely used for forensic and evolutionary applications where reliable knowledge of mutation properties is necessary for correct data interpretation. Therefore, we investigated the 17 Yfiler Y-STRs in 1,730-1,764 DNA-confirmed father-son pairs per locus and found 84 sequence-confirmed mutations among the 29,792 meiotic transfers covered. Of the 84 mutations, 83 (98.8%) were single-repeat changes and one (1.2%) was a double-repeat change (ratio, 1:0.01); 43 (51.2%) were repeat gains and 41 (48.8%) were repeat losses (ratio, 1:0.95). Medians from Bayesian estimation of locus-specific mutation rates ranged from 0.0003 for DYS448 to 0.0074 for DYS458, with a median rate across all 17 Y-STRs of 0.0025. The mean age (at the time of the son's birth) of fathers with mutations, 34.40 (+/-11.63) years, was higher than that of fathers without mutations, 30.32 (+/-10.22) years, a difference that is highly statistically significant (p < 0.001). Poisson-based modeling revealed that the Y-STR mutation rate increased with the father's age at a statistically significant level (alpha = 0.0294, 2.5% quantile = 0.0001). Combining our data with those previously published, for a total of 135,212 meiotic events and 331 mutations, we conclude for the Yfiler Y-STRs that (1) none had a mutation rate of >1%, 12 had mutation rates of >0.1% and four of <0.1%, (2) single-repeat changes were strongly favored over multiple-repeat ones for all loci but 1 and (3) considerable variation existed among loci in the ratio of repeat gains versus losses. Our finding of three Y-STR mutations in one father-son pair (and two pairs with two mutations each) has consequences for determining the threshold of allelic differences to conclude exclusion constellations in future applications of Y-STRs in paternity testing and pedigree analyses.
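The proportions and ratios quoted in this abstract follow directly from the reported counts, as the short check below shows. The crude per-transfer rates it computes are simple quotients of mutations over meiotic transfers. They are not the Bayesian locus-specific medians reported by the authors, so they need not equal the stated median of 0.0025. All variable names are illustrative.

```python
def crude_rate(mutations, transfers):
    """Mutations per meiotic transfer (simple quotient, no Bayesian modeling)."""
    return mutations / transfers

# Counts taken from the abstract.
study_rate = crude_rate(84, 29_792)      # this study, all 17 loci pooled
pooled_rate = crude_rate(331, 135_212)   # this study plus previously published data

single_share = 83 / 84      # single-repeat changes
double_share = 1 / 84       # double-repeat changes
gain_share = 43 / 84        # repeat gains
loss_share = 41 / 84        # repeat losses
double_per_single = 1 / 83  # the "1:0.01" ratio
loss_per_gain = 41 / 43     # the "1:0.95" ratio

print(f"crude rate, this study: {study_rate:.4f}")
print(f"crude rate, pooled:     {pooled_rate:.4f}")
print(f"single {single_share:.1%}, double {double_share:.1%}")
print(f"gains {gain_share:.1%}, losses {loss_share:.1%}")
```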
Tsunematsu, Noriko; Goto, Mieko; Saiki, Yumiko; Baba, Michiko; Udagawa, Tadashi; Kazumi, Yuko
2008-09-01
Bacilli isolated from a patient suspected of mixed infection with Mycobacterium avium and Mycobacterium intracellulare were analyzed. The genotypes of M. avium in the sedimented fractions of treated sputum and in colonies isolated from Ogawa medium were compared by variable number of tandem repeats (VNTR) analysis. The patient was a 57-year-old woman. Mycobacterial species in colonies cultured in 2004 and 2006, and in the treated sputum from 2006, were determined by DNA sequencing of the 16S rRNA gene. The genotype of the mycobacteria was also analyzed by VNTR. [Results] (1) The colony isolated from Ogawa medium in 2004 was monoclonal M. avium. (2) By VNTR analyses of specimens in 2006, multiple acid-fast bacteria were found in the sputum sediment and in isolated bacteria from Ogawa medium. (3) By 16S rRNA gene sequence analysis, M. avium and M. intracellulare were found in the colonies isolated from the sputum sediment and the Ogawa medium in 2006. (4) The same VNTR patterns were obtained for M. avium in 2004 and 2006 when single colonies were analyzed. (5) M. avium was not detected in the showerhead or culvert of the bathroom in the patient's house. From the VNTR analyses, it was considered that the mixed infection with M. avium and M. intracellulare had arisen during treatment in this case. Therefore, in cases of suspected mixed infection, VNTR analysis would be a useful genotyping method for M. avium complex infection.
Karami, Nahid; Helldal, Lisa; Welinder-Olsson, Christina; Ahrén, Christina; Moore, Edward R B
2013-01-01
Extended-spectrum β-lactamase producing Escherichia coli (ESBL-E. coli) were isolated from infants hospitalized in a neonatal, post-surgery ward during a four-month-long nosocomial outbreak and six-month follow-up period. A multi-locus variable number tandem repeat analysis (MLVA), using 10 loci (GECM-10), for 'generic' (i.e., non-STEC) E. coli was applied for sub-species-level (i.e., sub-typing) delineation and characterization of the bacterial isolates. Ten distinct GECM-10 types were detected among 50 isolates, correlating with the types defined by pulsed-field gel electrophoresis (PFGE), which is recognized to be the 'gold-standard' method for clinical epidemiological analyses. Multi-locus sequence typing (MLST), multiplex PCR genotyping of bla CTX-M, bla TEM, bla OXA and bla SHV genes and antibiotic resistance profiling, as well as a PCR assay specific for detecting isolates of the pandemic O25b-ST131 strain, further characterized the outbreak isolates. Two clusters of isolates with distinct GECM-10 types (G06-04 and G07-02), corresponding to two major PFGE types and the MLST-based sequence types (STs) 131 and 1444, respectively, were confirmed to be responsible for the outbreak. The application of GECM-10 sub-typing provided reliable, rapid and cost-effective epidemiological characterizations of the ESBL-producing isolates from a nosocomial outbreak that correlated with and may be used to replace the laborious PFGE protocol for analyzing generic E. coli.
Ravi, Maruthachalam; Kwong, Pak N; Menorca, Ron M G; Valencia, Joel T; Ramahi, Joseph S; Stewart, Jodi L; Tran, Robert K; Sundaresan, Venkatesan; Comai, Luca; Chan, Simon W-L
2010-10-01
Centromeres control chromosome inheritance in eukaryotes, yet their DNA structure and primary sequence are hypervariable. Most animals and plants have megabases of tandem repeats at their centromeres, unlike yeast with unique centromere sequences. Centromere function requires the centromere-specific histone CENH3 (CENP-A in human), which replaces histone H3 in centromeric nucleosomes. CENH3 evolves rapidly, particularly in its N-terminal tail domain. A portion of the CENH3 histone-fold domain, the CENP-A targeting domain (CATD), has been previously shown to confer kinetochore localization and centromere function when swapped into human H3. Furthermore, CENP-A in human cells can be functionally replaced by CENH3 from distantly related organisms including Saccharomyces cerevisiae. We have used cenh3-1 (a null mutant in Arabidopsis thaliana) to replace endogenous CENH3 with GFP-tagged variants. An H3.3 tail domain-CENH3 histone-fold domain chimera rescued viability of cenh3-1, but CENH3 variants lacking a tail domain were nonfunctional. In contrast to human results, H3 containing the A. thaliana CATD cannot complement cenh3-1. GFP-CENH3 from the sister species A. arenosa functionally replaces A. thaliana CENH3. GFP-CENH3 from the close relative Brassica rapa was targeted to centromeres, but did not complement cenh3-1, indicating that kinetochore localization and centromere function can be uncoupled. We conclude that CENH3 function in A. thaliana, an organism with large tandem repeat centromeres, has stringent requirements for functional complementation in mitosis.
FA-SAT Is an Old Satellite DNA Frozen in Several Bilateria Genomes
Chaves, Raquel; Ferreira, Daniela; Mendes-da-Silva, Ana; Meles, Susana; Adega, Filomena
2017-01-01
In recent years, a growing body of evidence has recognized the tandem repeat sequences, and specifically satellite DNA, as a functional class of sequences in the genomic “dark matter.” Using an original, complementary, and thus an eclectic experimental design, we show that the cat archetypal satellite DNA sequence, FA-SAT, is “frozen” conservatively in several Bilateria genomes. We found different genomic FA-SAT architectures, and the interspersion pattern was conserved. In Carnivora genomes, the FA-SAT-related sequences are also amplified, with the predominance of a specific FA-SAT variant, at the heterochromatic regions. We inspected the cat genome project to locate FA-SAT array flanking regions and revealed an intensive intermingling with transposable elements. Our results also show that FA-SAT-related sequences are transcribed and that the most abundant FA-SAT variant is not always the most transcribed. We thus conclude that the DNA sequences of FA-SAT and their transcripts are “frozen” in these genomes. Future work is needed to disclose any putative function that these sequences may play in these genomes. PMID:29608678
Megabase sequencing of human genome by ordered-shotgun-sequencing (OSS) strategy
NASA Astrophysics Data System (ADS)
Chen, Ellson Y.
1997-05-01
So far we have used the OSS strategy to sequence over 2 megabases of DNA in large-insert clones from regions of human X chromosomes with different characteristic levels of GC content. The method starts by randomly fragmenting a BAC, YAC or PAC to 8-12 kb pieces and subcloning those into lambda phage. Insert-ends of these clones are sequenced and overlapped to create a partial map. Complete sequencing is then done on a minimal tiling path of selected subclones, recursively focusing on those at the edges of contigs to facilitate mergers of clones across the entire target. To reduce manual labor, PCR processes have been adapted to prepare sequencing templates throughout the entire operation. The streamlined process can thus lend itself to further automation. The OSS approach is suitable for large-scale genomic sequencing, providing considerable flexibility in the choice of subclones or regions for more or less intensive sequencing. For example, subclones containing contaminating host cell DNA or cloning vector can be recognized and ignored with minimal sequencing effort; regions overlapping a neighboring clone already sequenced need not be redone; and segments containing tandem repeats or long repetitive sequences can be spotted early on and targeted for additional attention.
Kim, Min Jee; Hong, Eui Jeong; Kim, Iksoo
2016-01-01
We sequenced the complete mitochondrial (mt) genome of Camponotus atrox (Hymenoptera: Formicidae), which is only distributed in Korea. The genome was 16 540 bp in size and contained typical sets of genes (13 protein-coding genes, 22 tRNAs, and 2 rRNAs). The C. atrox A+T-rich region, at 1402 bp, was the longest of all sequenced ant genomes and was composed of an identical tandem repeat consisting of six 100-bp copies and one 96-bp copy. A total of 315 bp of intergenic spacer sequence was spread over 23 regions. An alignment of the spacer sequences in ants was largely feasible among congeneric species, and there was substantial sequence divergence, indicating their potential use as molecular markers for congeneric species. The A/T contents at the first and second codon positions of protein-coding genes (PCGs) were similar for ant species, including C. atrox (73.9% vs. 72.3%, on average). With increased taxon sampling among hymenopteran superfamilies, differences in the divergence rates (i.e., the non-synonymous substitution rates) between the suborders Symphyta and Apocrita were detected, consistent with previous results. The C. atrox mt genome had a unique gene arrangement, trnI-trnM-trnQ, at the A+T-rich region and ND2 junction (underline indicates inverted gene). This may have originated from a tandem duplication of trnM-trnI, resulting in trnM-trnI-trnM-trnI-trnQ, and the subsequent loss of the first trnM and second trnI, resulting in trnI-trnM-trnQ.
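The rearrangement scenario proposed at the end of this abstract (tandem duplication of trnM-trnI followed by loss of the first trnM and the second trnI) can be traced step by step. The sketch below only models gene order; gene orientation, including the inverted gene mentioned in the abstract, is not represented. The function name and the index-based interface are illustrative assumptions.

```python
def duplicate_then_lose(order, start, end, lost_positions):
    """Tandemly duplicate order[start:end], then delete the copies at `lost_positions`.

    `lost_positions` are 0-based indices into the duplicated arrangement.
    Returns (intermediate, final) gene orders.
    """
    block = order[start:end]
    duplicated = order[:end] + block + order[end:]
    final = [gene for i, gene in enumerate(duplicated) if i not in lost_positions]
    return duplicated, final

ancestral = ["trnM", "trnI", "trnQ"]
# Duplicate the trnM-trnI block, then lose the first trnM (index 0)
# and the second trnI (index 3), as proposed in the abstract.
intermediate, derived = duplicate_then_lose(ancestral, 0, 2, {0, 3})
print("-".join(intermediate))
print("-".join(derived))
```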
Cloning and Molecular Characterization of an Immunogenic LigA Protein of Leptospira interrogans
Palaniappan, Raghavan U. M.; Chang, Yung-Fu; Jusuf, S. S. D.; Artiushin, S.; Timoney, John F.; McDonough, Sean P.; Barr, Steve C.; Divers, Thomas J.; Simpson, Kenneth W.; McDonough, Patrick L.; Mohammed, Hussni O.
2002-01-01
A clone expressing a novel immunoreactive leptospiral immunoglobulin-like protein A of 130 kDa (LigA) from Leptospira interrogans serovar pomona type kennewicki was isolated by screening a genomic DNA library with serum from a mare that had recently aborted due to leptospiral infection. LigA is encoded by an open reading frame of 3,675 bp, and the deduced amino acid sequence consists of a series of 90-amino-acid tandem repeats. A search of the NCBI database found that homology of the LigA repeat region was limited to an immunoglobulin-like domain of the bacterial intimin binding protein of Escherichia coli, the cell adhesion domain of Clostridium acetobutylicum, and the invasin of Yersinia pestis. Secondary structure prediction analysis indicates that LigA consists mostly of beta sheets with a few alpha-helical regions. No LigA was detectable by immunoblot analysis of lysates of the leptospires grown in vitro at 30°C or when cultures were shifted to 37°C. Strikingly, immunohistochemistry on kidney from leptospira-infected hamsters demonstrated LigA expression. These findings suggest that LigA is specifically induced only in vivo. Sera from horses, which aborted as a result of natural Leptospira infection, strongly recognize LigA. LigA is the first leptospiral protein described to have 12 tandem repeats and is also the first to be expressed only during infection. Thus, LigA may have value in serodiagnosis or as a protective immunogen in novel vaccines. PMID:12379666
Keys, C; Kemper, S; Keim, P
2005-01-01
The Escherichia coli genome was evaluated for variable number tandem repeat (VNTR) loci in order to provide a subtyping tool with greater discrimination and more efficient capacity. Twenty-nine putative VNTR loci were identified from the E. coli genomic sequence. Their variability was validated by characterizing the number of repeats at each locus in a set of 56 E. coli O157:H7/HN and O55:H7 isolates. An optimized multiplex assay system was developed to facilitate high-capacity analysis. Locus diversity values ranged from 0.23 to 0.95, while the number of alleles ranged from two to 29. These multiple-locus VNTR analysis (MLVA) data were used to describe genetic relationships among the isolates and were compared with PFGE (pulsed-field gel electrophoresis) data from a subset of the same strains. Genetic similarity values were highly correlated between the two approaches, though MLVA was capable of discriminating among closely related isolates when PFGE similarity values were equal to 1.0. Highly variable VNTR loci exist in the E. coli O157:H7 genome and are excellent estimators of genetic relationships, in particular for closely related isolates. Escherichia coli O157:H7 MLVA offers a complementary analysis to the more traditional PFGE approach. Application of MLVA to an outbreak cluster could generate superior molecular epidemiology and result in a more effective public health response.
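The abstract does not say which estimator produced the locus diversity values. As a minimal sketch, assuming the commonly used Nei's diversity index (1 minus the sum of squared allele frequencies), the calculation for one VNTR locus could look like the following. The function name and the example repeat counts are hypothetical and are not data from the study.

```python
from collections import Counter

def locus_diversity(repeat_counts):
    """Nei's diversity, 1 - sum(p_i^2), for the alleles observed at one locus.

    repeat_counts holds one repeat number per isolate (hypothetical input format).
    """
    n = len(repeat_counts)
    if n == 0:
        raise ValueError("need at least one isolate")
    # Allele frequency = share of isolates carrying each repeat number.
    freqs = [count / n for count in Counter(repeat_counts).values()]
    return 1.0 - sum(p * p for p in freqs)

# Made-up repeat numbers for eight isolates at a single locus.
example = [4, 4, 5, 7, 7, 7, 9, 4]
print(locus_diversity(example))
```

A locus where every isolate carries the same allele scores 0, and the value approaches 1 as alleles become more numerous and more evenly distributed.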
Zhang, Qing-Xia; Yang, Meng; Pan, Ya-Jiao; Zhao, Jing; Qu, Bao-Wang; Cheng, Feng; Yang, Ya-Ran; Jiao, Zhang-Ping; Liu, Li; Yan, Jiang-Wei
2018-05-17
Massively parallel sequencing (MPS) has been used in forensic genetics in recent years owing to several advantages, e.g. MPS can provide precise descriptions of the repeat allele structure and variation in the repeat-flanking regions, increasing the discriminating power among loci and individuals. However, it cannot be fully utilized unless sufficient population data are available for all loci. Thus, there is a pressing need to perform population studies providing a basis for the introduction of MPS into forensic practice. Here, we constructed a multiplex PCR system with fusion primers for one-directional PCR for MPS of 15 commonly used forensic autosomal STRs and amelogenin. Samples from 554 unrelated Chinese Northern Han individuals were typed using this MPS assay. In total, 313 alleles obtained by MPS for all 15 STRs were observed, and the corresponding allele frequencies ranged between 0.0009 and 0.5162. Of all 15 loci, the number of alleles identified for 12 loci increased compared to capillary electrophoresis approaches, and for the following six loci more than double the number of alleles was found: D2S1338, D5S818, D21S11, D13S317, vWA, and D3S1358. Forensic parameters were calculated based on length and sequence-based alleles. D21S11 showed the highest heterozygosity (0.8791), discrimination power (0.9865), and paternity exclusion probability in trios (0.7529). The cumulative match probability for MPS was approximately 2.3157 × 10^-20. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
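As a rough illustration of how per-locus values combine into a cumulative match probability, the sketch below assumes independent loci and the standard relationship that a locus's match probability equals 1 minus its discrimination power. The function name is hypothetical, and apart from the D21S11 value of 0.9865 quoted in the abstract, the example numbers are placeholders rather than results from the study.

```python
import math

def cumulative_match_probability(discrimination_powers):
    """Multiply per-locus match probabilities (1 - DP), assuming independent loci."""
    return math.prod(1.0 - dp for dp in discrimination_powers)

# 0.9865 is the D21S11 value from the abstract; the other two are made-up placeholders.
example_dps = [0.9865, 0.95, 0.90]
print(cumulative_match_probability(example_dps))
```

Because each factor is well below 1, the product shrinks quickly as loci are added, which is why a 15-locus panel can reach values on the order of 10^-20.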
Shen, Kang-Ning; Yen, Ta-Chi; Chen, Ching-Hung; Ye, Jeng-Jia; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome of the cryptic "lineage B" big-fin reef squid, Sepioteuthis lessoniana (Cephalopoda: Loliginidae), was sequenced by a next-generation sequencing method. The assembled mitogenome consists of 16,694 bp and includes 13 protein-coding genes, 25 transfer RNA genes, and 2 ribosomal RNA genes. The overall base composition of "lineage B" S. lessoniana is 36.7% A, 18.9% C, 34.5% T, and 9.8% G, and the sequence shows 90% identity to "lineage C" S. lessoniana. It also exhibits a high T + A content (71.2%) and two non-coding regions with TA tandem repeats. The complete mitogenome of the cryptic "lineage B" S. lessoniana provides essential DNA molecular data for further phylogeographic and evolutionary analysis of the big-fin reef squid species complex.
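Base composition and T + A content of the kind reported above are simple per-base percentages. A minimal sketch of that arithmetic follows; the function names are hypothetical, ambiguous bases such as N are ignored by assumption, and the short test sequence is invented.

```python
from collections import Counter

def base_composition(seq):
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}

def at_content(seq):
    """A + T percentage, i.e. the 'T + A content' quoted in mitogenome reports."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]

# Invented toy sequence, not mitogenome data.
print(base_composition("ATTAATCGTA"))
print(at_content("ATTAATCGTA"))
```

The reported figures are consistent with this definition: 36.7% A plus 34.5% T gives the stated 71.2% T + A content.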
Hsiao, Chung-Der; Shen, Kang-Ning; Ching, Tzu-Yun; Wang, Ya-Hsien; Ye, Jeng-Jia; Tsai, Shiou-Yi; Wu, Shan-Chun; Chen, Ching-Hung; Wang, Chia-Hui
2016-07-01
In this study, the complete mitogenome of the cryptic "lineage A" big-fin reef squid, Sepioteuthis lessoniana (Cephalopoda: Loliginidae), was sequenced by a next-generation sequencing method. The assembled mitogenome consists of 16,605 bp and includes 13 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. The overall base composition of "lineage A" S. lessoniana is 37.5% A, 17.4% C, 9.1% G, and 35.9% T, and the sequence shows 87% identity to "lineage C" S. lessoniana. It is also notable for its high T + A content (73.4%) and two non-coding regions with TA tandem repeats. The complete mitogenome of the cryptic "lineage A" S. lessoniana provides essential DNA molecular data for further phylogeographic and evolutionary analysis of the big-fin reef squid species complex.
Manges, Amee R; Tellis, Patricia A; Vincent, Caroline; Lifeso, Kimberley; Geneau, Geneviève; Reid-Smith, Richard J; Boerlin, Patrick
2009-11-01
Discriminatory genotyping methods for the analysis of Escherichia coli other than O157:H7 are necessary for public health-related activities. A new multi-locus variable number tandem repeat analysis protocol is presented; this method achieves an index of discrimination of 99.5% and is reproducible and valid when tested on a collection of 836 diverse E. coli.
Voelker, T A; Staswick, P; Chrispeels, M J
1986-12-01
Phytohemagglutinin (PHA), the seed lectin of the common bean, Phaseolus vulgaris, is encoded by two highly homologous, tandemly linked genes, dlec1 and dlec2, which are coordinately expressed at high levels in developing cotyledons. Their respective transcripts translate into closely related polypeptides, PHA-E and PHA-L, constituents of the tetrameric lectin which accumulates at high levels in developing seeds. In the bean cultivar Pinto UI111, PHA-E is not detectable, and PHA-L accumulates at very reduced levels. To investigate the cause of the Pinto phenotype, we cloned and sequenced the two PHA genes of Pinto, called Pdlec1 and Pdlec2, and determined the abundance of their respective mRNAs in developing cotyledons. Both genes are more than 90% homologous to the normal PHA genes found in other cultivars. Pdlec1 carries a 1-bp frameshift mutation close to the 5' end of its coding sequence. Only very truncated polypeptides could be made from its mRNA. The gene Pdlec2 encodes a polypeptide, which resembles PHA-L and its predicted amino acid sequence agrees with the available Pinto PHA amino acid sequence data. Analysis of the mRNA of developing cotyledons revealed that the Pdlec1 message is reduced 600-fold, and Pdlec2 mRNA is reduced 20-fold with respect to mRNA levels in normal cultivars. A comparison of the sequences which are upstream from the coding sequence shows that Pdlec2 has a 100-bp deletion compared to the other genes (dlec1, dlec2 and Pdlec1). This deletion which contains a large tandem repeat may be responsible for the low level of expression of Pdlec2. The very low expression of Pdlec1 is as yet unexplained.
The profile of repeat-associated histone lysine methylation states in the mouse epigenome
Martens, Joost H A; O'Sullivan, Roderick J; Braunschweig, Ulrich; Opravil, Susanne; Radolf, Martin; Steinlein, Peter; Jenuwein, Thomas
2005-01-01
Histone lysine methylation has been shown to index silenced chromatin regions at, for example, pericentric heterochromatin or of the inactive X chromosome. Here, we examined the distribution of repressive histone lysine methylation states over the entire family of DNA repeats in the mouse genome. Using chromatin immunoprecipitation in a cluster analysis representing repetitive elements, our data demonstrate the selective enrichment of distinct H3-K9, H3-K27 and H4-K20 methylation marks across tandem repeats (e.g. major and minor satellites), DNA transposons, retrotransposons, long interspersed nucleotide elements and short interspersed nucleotide elements. Tandem repeats, but not the other repetitive elements, give rise to double-stranded (ds) RNAs that are further elevated in embryonic stem (ES) cells lacking the H3-K9-specific Suv39h histone methyltransferases. Importantly, although H3-K9 tri- and H4-K20 trimethylation appear stable at the satellite repeats, many of the other repeat-associated repressive marks vary in chromatin of differentiated ES cells or of embryonic trophoblasts and fibroblasts. Our data define a profile of repressive histone lysine methylation states for the repetitive complement of four distinct mouse epigenomes and suggest tandem repeats and dsRNA as primary triggers for more stable chromatin imprints. PMID:15678104
Kai, M; Nakata, N; Matsuoka, M; Sekizuka, T; Kuroda, M; Makino, M
2013-10-01
Genome analysis of Mycobacterium leprae strain Kyoto-2 in this study revealed characteristic nucleotide substitutions in gene ML0411, compared to the reference genome M. leprae strain TN. The ML0411 gene of Kyoto-2 had six SNPs compared to that of TN. All SNPs in ML0411 were non-synonymous mutations that result in amino acid replacements. In addition, a seventh SNP was found 41 bp upstream of the start codon in the regulatory region. The seven SNP sites in the ML0411 region were investigated by sequencing in 36 M. leprae isolates from the Leprosy Research Center in Japan. The SNP pattern in 14 of the 36 isolates showed similarity to that of Kyoto-2. Determination of the standard SNP types within the 36 stocked isolates revealed that almost all of the Japanese strains belonged to SNP type III, with nucleotide substitutions at positions 14676, 164275, and 2935685 of the M. leprae TN genome. The geographical distribution pattern of east Asian M. leprae isolates by discrimination of ML0411 SNPs was investigated and, interestingly, turned out to be similar to that of tandem repeat numbers of GACATC in the rpoT gene (3 copies or 4 copies), which has been established as a tool for M. leprae genotyping. All seven Korean M. leprae isolates examined in this study, as well as those derived from Honshu Island of Japan, showed 4 copies of the 6-base tandem repeat plus the ML0411 SNPs observed in M. leprae Kyoto-2. These are termed the Northeast Asian (NA) strain of M. leprae. On the other hand, many of the isolates derived from the Okinawa Islands of Japan and from the Philippines showed 3 copies of the 6-base tandem repeat in addition to the M. leprae TN ML0411 type of SNPs. These results demonstrate the existence of M. leprae strains in the Northeast Asian region having characteristic SNP patterns. Copyright © 2013 Elsevier B.V. All rights reserved.
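Copy-number typing of a short tandem repeat such as GACATC amounts to finding the longest uninterrupted run of the unit. The sketch below is one way to do that with a regular expression; it is not the method used in the study, the function name is hypothetical, and the test sequences are invented rather than taken from rpoT.

```python
import re

def max_tandem_copies(seq, unit):
    """Return the largest number of back-to-back copies of `unit` found in `seq`."""
    # (?:unit)+ matches one or more consecutive, uninterrupted copies.
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    runs = (m.group(0) for m in pattern.finditer(seq.upper()))
    return max((len(run) // len(unit) for run in runs), default=0)

# Invented flanking sequence around 3 and 4 copies of the 6-base unit.
three_copy = "AAGT" + "GACATC" * 3 + "TTGC"
four_copy = "AAGT" + "GACATC" * 4 + "TTGC"
print(max_tandem_copies(three_copy, "GACATC"))
print(max_tandem_copies(four_copy, "GACATC"))
```

A real genotyping workflow would more likely size PCR products or inspect aligned reads; the regex only illustrates what "3 copies or 4 copies" means at the sequence level.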
Hyytiä-Trees, Eija; Smole, Sandra C; Fields, Patricia A; Swaminathan, Bala; Ribot, Efrain M
2006-01-01
Most bacterial genomes contain tandem duplications of short DNA sequences, termed "variable-number tandem repeats" (VNTR). A subtyping method targeting these repeats, multiple-locus VNTR analysis (MLVA), has emerged as a powerful tool for characterization of clonal organisms such as Shiga toxin-producing Escherichia coli O157 (STEC O157). We modified and optimized a recently published MLVA scheme targeting 29 polymorphic VNTR regions of STEC O157 to render it suitable for routine use by public health laboratories that participate in PulseNet, the national and international molecular subtyping network for foodborne disease surveillance. Nine VNTR loci were included in the final protocol. They were amplified in three PCR reactions, after which the PCR products were sized using capillary electrophoresis. Two hundred geographically diverse, sporadic and outbreak-related STEC O157 isolates were characterized by MLVA and the results were compared with data obtained by pulsed-field gel electrophoresis (PFGE) using XbaI macrorestriction of genomic DNA. A total of 139 unique XbaI PFGE patterns and 162 MLVA types were identified. A subset of 100 isolates characterized by both XbaI and BlnI macrorestriction had 62 unique PFGE and MLVA types. Although the clustering of isolates by the two subtyping systems was generally in agreement, some discrepancies were observed. Importantly, MLVA was able to discriminate among some epidemiologically unrelated isolates which were indistinguishable by PFGE. However, among strains from three of the eight outbreaks included in the study, two single locus MLVA variants and one double locus variant were detected among epidemiologically implicated isolates that were indistinguishable by PFGE. Conversely, in three other outbreaks, isolates that were indistinguishable by MLVA displayed multiple PFGE types. An additional more extensive multi-laboratory validation of the MLVA protocol is in progress in order to address critical issues such as establishing epidemiologically relevant interpretation guidelines for the MLVA data.
Matsuyama, T; Fukuda, Y; Sakai, T; Tanimoto, N; Nakanishi, M; Nakamura, Y; Takano, T; Nakayasu, C
2017-08-01
Bacterial haemolytic jaundice caused by Ichthyobacterium seriolicida has been responsible for mortality in farmed yellowtail, Seriola quinqueradiata, in western Japan since the 1980s. In this study, polymorphic analysis of I. seriolicida was performed using three molecular methods: amplified fragment length polymorphism (AFLP) analysis, multilocus sequence typing (MLST) and multiple-locus variable-number tandem repeat analysis (MLVA). Twenty-eight isolates were analysed using AFLP, while 31 isolates were examined by MLST and MLVA. No polymorphisms were identified by AFLP analysis using EcoRI and MseI, or by MLST of internal fragments of eight housekeeping genes. However, MLVA revealed variation in repeat numbers of three elements, allowing separation of the isolates into 16 sequence types. The unweighted pair group method using arithmetic averages cluster analysis of the MLVA data identified four major clusters, and all isolates belonged to clonal complexes. It is likely that I. seriolicida populations share a common ancestor, which may be a recently introduced strain. © 2016 John Wiley & Sons Ltd.
PUF Proteins: Cellular Functions and Potential Applications.
Kiani, Seyed Jalal; Taheri, Tahereh; Rafati, Sima; Samimi-Rad, Katayoun
2017-01-01
RNA-binding proteins play critical roles in the regulation of gene expression. Among several families of RNA-binding proteins, PUF (Pumilio and FBF) proteins have been the subject of extensive investigations, as they can bind RNA in a sequence-specific manner and they are evolutionarily conserved among a wide range of organisms. The outstanding feature of these proteins is a highly conserved RNA-binding domain, known as the Pumilio-homology domain (PUM-HD), that mostly consists of eight tandem repeats. Each repeat recognizes an RNA base with a simple three-letter code that can be programmed in order to change the sequence specificity of the protein. Using this tailored architecture, researchers have been able to change the specificity of the PUM-HD and target desired transcripts in the cell, even in subcellular compartments. The potential applications of this versatile tool in molecular cell biology seem unbounded, and the use of these factors in pharmaceutics might be an interesting field of study in the near future. Copyright © Bentham Science Publishers.
Molecular Architecture of Full-length TRF1 Favors Its Interaction with DNA.
Boskovic, Jasminka; Martinez-Gago, Jaime; Mendez-Pertuz, Marinela; Buscato, Alberto; Martinez-Torrecuadrada, Jorge Luis; Blasco, Maria A
2016-10-07
Telomeres are specific DNA-protein structures found at both ends of eukaryotic chromosomes that protect the genome from degradation and from being recognized as double-stranded breaks. In vertebrates, telomeres are composed of tandem repeats of the TTAGGG sequence that are bound by a six-subunit complex called shelterin. Molecular mechanisms of telomere functions remain unknown in large part due to lack of structural data on shelterins, shelterin complex, and its interaction with the telomeric DNA repeats. TRF1 is one of the best studied shelterin components; however, the molecular architecture of the full-length protein remains unknown. We have used single-particle electron microscopy to elucidate the structure of TRF1 and its interaction with telomeric DNA sequence. Our results demonstrate that full-length TRF1 presents a molecular architecture that assists its interaction with telomeric DNA and at the same time makes TRFH domains accessible to other TRF1 binding partners. Furthermore, our studies suggest hypothetical models of how other proteins such as TIN2 and tankyrase contribute to regulating TRF1 function. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Kolanko, C J; Pyle, M D; Nath, J; Prasanna, P G; Loats, H; Blakely, W F
2000-03-01
We report a low-cost and efficient method for synthesizing a human pancentromeric DNA probe by the polymerase chain reaction (PCR) and an optimized protocol for in situ detection using color pigment immunostaining. The DNA template used in the PCR was a 2.4 kb insert containing human alphoid repeated sequences of pancentromeric DNA subcloned into pUC9 (Miller et al. 1988) and the primers hybridized to internal sequences of the 172 bp consensus tandem repeat associated with human centromeres. PCR was performed in the presence of biotin-11-dUTP, and the product was used for in situ hybridization to detect the pancentromeric region of human chromosomes in metaphase spreads. Detection of pancentromeric probe was achieved by immunoenzymatic color pigment painting to yield a permanent image detected at high resolution by bright field microscopy. The ability to synthesize the centromeric probe rapidly and to detect it with color pigment immunostaining will lead to enhanced identification and eventually to automation of various chromosome aberration assays.
Molecular Architecture of Full-length TRF1 Favors Its Interaction with DNA*
Boskovic, Jasminka; Martinez-Gago, Jaime; Mendez-Pertuz, Marinela; Buscato, Alberto; Martinez-Torrecuadrada, Jorge Luis; Blasco, Maria A.
2016-01-01
Telomeres are specific DNA-protein structures found at both ends of eukaryotic chromosomes that protect the genome from degradation and from being recognized as double-stranded breaks. In vertebrates, telomeres are composed of tandem repeats of the TTAGGG sequence that are bound by a six-subunit complex called shelterin. Molecular mechanisms of telomere functions remain unknown in large part due to lack of structural data on shelterins, shelterin complex, and its interaction with the telomeric DNA repeats. TRF1 is one of the best studied shelterin components; however, the molecular architecture of the full-length protein remains unknown. We have used single-particle electron microscopy to elucidate the structure of TRF1 and its interaction with telomeric DNA sequence. Our results demonstrate that full-length TRF1 presents a molecular architecture that assists its interaction with telomeric DNA and at the same time makes TRFH domains accessible to other TRF1 binding partners. Furthermore, our studies suggest hypothetical models of how other proteins such as TIN2 and tankyrase contribute to regulating TRF1 function. PMID:27563064
Kobayashi, Tetsuro; Kutsuna, Satoshi; Hayakawa, Kayoko; Kato, Yasuyuki; Ohmagari, Norio; Uryu, Hideko; Yamada, Ritsuko; Kashiwa, Naoyuki; Nei, Takahito; Ehara, Akihito; Takei, Reiko; Mori, Nobuaki; Yamada, Yasuhiro; Hayasaka, Tomomi; Kagawa, Narito; Sugawara, Momoko; Suzaki, Ai; Takahashi, Yuno; Nishiyama, Hiroyuki; Morita, Masatomo; Izumiya, Hidemasa; Ohnishi, Makoto
2016-01-01
For the first time in 16 years, a food-borne outbreak of typhoid fever due to Salmonella enterica serotype Typhi was reported in Japan. Seven patients consumed food in an Indian buffet at a restaurant in the center of Tokyo, while one was a Nepali chef in the restaurant, an asymptomatic carrier and the implicated source of this outbreak. The multiple-locus variable-number tandem repeat analysis showed 100% consistency in the genomic sequence for five of the eight cases. PMID:26621565
Hassanin, Alexandre
2016-01-01
Here I report the complete mitochondrial genome of the African palm civet (Nandinia binotata), as sequenced from overlapping PCR products. The genome is 17,103 bp in length and contains the 37 genes found in a typical mammalian genome: 13 protein-coding genes, 22 transfer RNA genes and 2 ribosomal RNA genes. The control region of N. binotata includes both RS2 and RS3 tandem repeats. The overall base composition on the L-strand is A: 33.6%, C: 27.3%, G: 13.0%, and T: 26.1%.
Schouls, Leo M.; van der Heide, Han G. J.; Vauterin, Luc; Vauterin, Paul; Mooi, Frits R.
2004-01-01
Bordetella pertussis, the causative agent of whooping cough, has remained endemic in The Netherlands despite extensive nationwide vaccination since 1953. In the 1990s, several epidemic periods have resulted in many cases of pertussis. We have proposed that strain variation has played a major role in the upsurges of this disease in The Netherlands. Therefore, molecular characterization of strains is important in identifying the causes of pertussis epidemiology. For this reason, we have developed a multiple-locus variable-number tandem repeat analysis (MLVA) typing system for B. pertussis. By combining the MLVA profile with the allelic profile based on multiple-antigen sequence typing, we were able to further differentiate strains. The relationships between the various genotypes were visualized by constructing a minimum spanning tree. MLVA of Dutch strains of B. pertussis revealed that the genotypes of the strains isolated in the prevaccination period were diverse and clearly distinct from the strains isolated in the 1990s. Furthermore, there was a decrease in diversity in the strains from the late 1990s, with a remarkable clonal expansion that coincided with the epidemic periods. Using this genotyping, we have been able to show that B. pertussis is much more dynamic than expected. PMID:15292152
Piniewska, Danuta; Sanak, Marek; Wojtas, Marta; Polanska, Nina
2017-05-01
Advances in forensic identification using molecular genetics are helpful in resolving some historical mysteries. The aim of this study was to confirm the authenticity of shrunken-head artifacts exhibited by two Polish museums. Shrunken heads, known as tsantsas, were headhunting trophies of South American Indians (Jivaroan). A special preparation preserved their hair and facial appearance. However, it was quite common to offer counterfeit shrunken heads of sloths or monkeys to collectors of curiosities. We took small skin specimens from four shrunken heads in museum collections in Warsaw and Krakow, Poland. Following genomic DNA isolation, highly polymorphic short tandem repeats were genotyped using a commercial chemistry and a DNA sequencing analyzer. Haplogroups of the human Y chromosome were identified. We obtained an informative genetic profile of genomic short tandem repeats from all the samples of shrunken heads. Moreover, amplification of amelogenin loci allowed for sex determination. All four studied shrunken heads were of human origin. In two of them, a shared Y-chromosome haplogroup Q, characteristic of Indigenous Americans, was detected. Another artifact was counterfeit, because Y-chromosome haplogroup I2, characteristic of Southeastern European origin, was found. Commercial genetic methods of identification can be applied successfully in studies on the origin and authenticity of some unusual collection items.
Structure of the circumsporozoite protein gene in 18 strains of Plasmodium falciparum.
Weber, J L; Hockmeyer, W T
1985-06-01
Using the cloned circumsporozoite (CS) protein gene of a Brazilian strain of Plasmodium falciparum as probe, we have analyzed the structure of the CS protein gene from 17 other Asian, African, Central and South American parasite strains by nucleic acid hybridization. Each strain appears to have one CS protein gene which hybridizes readily to the Brazilian strain probe. The 5' and 3' thirds of the genes are invariant in size in all 18 strains whereas the central third containing the 12 base pair tandem repeats varies in size over a range of about 100 base pairs. Several differences were found in the locations of Sau3A sites in the genes. The Sau3A sites are significant because each of the minority Asn-Val-Asp-Pro repeats in the cloned gene has a Sau3A site. DNA melting of hybrids revealed a high degree of homology between the sequences of the cloned gene and genes from an Asian strain and an African strain. A 14 base oligodeoxynucleotide with a sequence from the central repeat region hybridized to all strains tested. We conclude that the CS protein gene is highly conserved among strains of P. falciparum and that malaria vaccine development with the CS protein is unlikely to be complicated by strain variation.
Lei, Wanjun; Ni, Dapeng; Wang, Yujun; Shao, Junjie; Wang, Xincun; Yang, Dan; Wang, Jinsheng; Chen, Haimei; Liu, Chang
2016-02-22
Astragalus membranaceus is an important medicinal plant in Asia. Several of its varieties have been used interchangeably as raw materials for commercial production. High-resolution genetic markers are urgently needed to distinguish these varieties. Here, we sequenced and analyzed the chloroplast genome of A. membranaceus (Fisch.) Bunge var. mongholicus (Bunge) P.K. Hsiao using next-generation DNA sequencing technology. The genome was assembled using Abyss and then subjected to gene prediction using CPGAVAS and repeat analysis using MISA, Tandem Repeats Finder, and REPuter. Finally, the genome was subjected to phylogenetic and comparative genomic analyses. The complete genome is 123,582 bp long, containing only one copy of the inverted repeat. Gene prediction revealed 110 genes encoding 76 proteins, 30 tRNAs, and four rRNAs. Five intra-specific hypermutation loci were identified, three of which are heteroplasmic. Furthermore, three gene losses and two large inversions were identified. Comparative genomic analyses demonstrated the dynamic nature of the Papilionoideae chloroplast genomes, which showed occurrence of numerous hypermutation loci, frequent gene losses, and fragment inversions. Results obtained herein elucidate the complex evolutionary history of chloroplast genomes and have laid the foundation for the identification of genetic markers to distinguish A. membranaceus varieties.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Warburton, P.E.; Gosden, J.; Lawson, D.
1996-04-15
Alpha satellite DNA is a tandemly repeated DNA family found at the centromeres of all primate chromosomes examined. The fundamental repeat units of alpha satellite DNA are diverged 169- to 172-bp monomers, often found to be organized in chromosome-specific higher-order repeat units. The chromosomes of human (Homo sapiens (HSA)), chimpanzee (Pan troglodytes (PTR) and Pan paniscus), and gorilla (Gorilla gorilla) share a remarkable similarity and synteny. It is of interest to ask if alpha satellite arrays at centromeres of homologous chromosomes between these species are closely related (evolving in an orthologous manner) or if the evolutionary processes that homogenize and spread these arrays within and between chromosomes result in nonorthologous evolution of arrays. By using PCR primers specific for human chromosome 17-specific alpha satellite DNA, we have amplified, cloned, and characterized a chromosome-specific subset from the PTR chimpanzee genome. Hybridization both on Southern blots and in situ as well as sequence analysis show that this subset is most closely related, as expected, to sequences on HSA 17. However, in situ hybridization reveals that this subset is not found on the homologous chromosome in chimpanzee (PTR 19), but instead on PTR 12, which is homologous to HSA 2p. 40 refs., 3 figs.
Joy, Nisha; Asha, Srinivasan; Mallika, Vijayan; Soniya, Eppurathu Vasudevan
2013-01-01
Next generation sequencing has an advantage in the transformational development of species with limited available sequence data, as it helps to decode the genome and transcriptome. We carried out de novo sequencing using Illumina HiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L.), an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to lack of sufficient sequence information and the cytological complexity of its genome. The 55 million raw reads obtained, when assembled using the Trinity program, generated 223,386 contigs and 128,157 unigenes. Reports suggest that repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs) are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hair-pin precursors of miRNAs bearing Simple Sequence Repeats (SSRs). We used the array of transcripts generated for the in silico prediction and detection of ‘43 pre-miRNA candidates bearing different types of SSR motifs’. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of the transcriptome. About 0.033% of the transcriptome constituted ‘pre-miRNA candidates bearing SSRs’. The abundance, type and distribution of SSR motifs studied across the hair-pin miRNA precursors showed a significant bias in the position of SSRs towards the downstream region of the predicted ‘pre-miRNA candidates’. The catalogue of transcripts identified, together with the demonstration of the reliable existence of SSRs in the miRNA precursors, opens future opportunities for understanding the genetic mechanisms of black pepper and the likely functions of ‘tandem repeats’ in miRNAs. PMID:23469176
Report for the NGFA-5 project.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jaing, C; Jackson, P; Thissen, J
The objective of this project is to provide DHS a comprehensive evaluation of the current genomic technologies including genotyping, TaqMan PCR, multiple locus variable tandem repeat analysis (MLVA), microarray and high-throughput DNA sequencing in the analysis of biothreat agents from complex environmental samples. To effectively compare the sensitivity and specificity of the different genomic technologies, we used SNP TaqMan PCR, MLVA, microarray and high-throughput Illumina and 454 sequencing to test various strains from B. anthracis, B. thuringiensis, BioWatch aerosol filter extracts or soil samples that were spiked with B. anthracis, and samples that were previously collected during DHS and EPA environmental release exercises that were known to contain B. thuringiensis spores. The results of all the samples against the various assays are discussed in this report.
[Transcription activator-like effectors (TALEs) based genome engineering].
Zhao, Mei-Wei; Duan, Cheng-Li; Liu, Jiang
2013-10-01
Systematic reverse-engineering of functional genome architecture requires precise modifications of gene sequences and transcription levels. The development and application of transcription activator-like effectors (TALEs) has created a wealth of genome engineering possibilities. TALEs are a class of naturally occurring DNA-binding proteins found in plant-pathogenic Xanthomonas species. The DNA-binding domain of each TALE typically consists of tandem 34-amino acid repeat modules that can be rearranged according to a simple cipher to target new DNA sequences. Customized TALEs can be used for a wide variety of genome engineering applications, including transcriptional modulation and genome editing. Such "genome engineering" has now been established in human cells and a number of model organisms, thus opening the door to better understanding gene function in model organisms, improving traits in crop plants and treating human genetic disorders.
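The abstract does not spell the cipher out. The sketch below assumes the widely reported one-repeat-to-one-base mapping, in which the two variable residues of each repeat (the repeat-variable diresidue, RVD) set the base preference: NI for A, HD for C, NG for T and NN for G. This is a simplification (NN is known to tolerate A as well), and the function names are hypothetical.

```python
# Commonly cited RVD-to-base preferences; NN can also bind A, so "G" is a simplification.
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G"}

def target_for_rvds(rvds):
    """Translate an ordered list of RVDs into the DNA sequence they would target."""
    try:
        return "".join(RVD_TO_BASE[rvd] for rvd in rvds)
    except KeyError as exc:
        raise ValueError(f"unknown RVD: {exc.args[0]}") from None

def rvds_for_target(dna):
    """Choose one RVD per base for a desired target sequence (A/C/G/T only)."""
    base_to_rvd = {base: rvd for rvd, base in RVD_TO_BASE.items()}
    return [base_to_rvd[base] for base in dna.upper()]

print(target_for_rvds(["NI", "HD", "NG", "NN"]))
print(rvds_for_target("TAG"))
```

Designing a custom TALE then reduces to ordering repeat modules so that their RVDs spell out the desired target, which is what makes the tandem repeat architecture programmable.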
Dias, Guilherme B; Svartman, Marta; Delprat, Alejandra; Ruiz, Alfredo; Kuhn, Gustavo C S
2014-05-24
Transposable elements (TEs) and satellite DNAs (satDNAs) are abundant components of most eukaryotic genomes studied so far and their impact on evolution has been the focus of several studies. A number of studies linked TEs with satDNAs, but the nature of their evolutionary relationships remains unclear. During in silico analyses of the Drosophila virilis assembled genome, we found a novel DNA transposon we named Tetris based on its modular structure and diversity of rearranged forms. We aimed to characterize Tetris and investigate its role in generating satDNAs. Data mining and sequence analysis showed that Tetris is apparently nonautonomous, with a structure similar to foldback elements, and present in D. virilis and D. americana. Herein, we show that Tetris shares the final portions of its terminal inverted repeats (TIRs) with DAIBAM, a previously described miniature inverted transposable element implicated in the generation of chromosome inversions. Both elements are likely to be mobilized by the same autonomous TE. Tetris TIRs contain approximately 220-bp internal tandem repeats that we have named TIR-220. We also found TIR-220 repeats making up longer (kb-size) satDNA-like arrays. Using bioinformatic, phylogenetic and cytogenomic tools, we demonstrated that Tetris has contributed to shaping the genomes of D. virilis and D. americana, providing internal tandem repeats that served as building blocks for the amplification of satDNA arrays. The β-heterochromatic genomic environment seemed to have favored such amplification. Our results imply for the first time a role for foldback elements in generating satDNAs. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Entropic fluctuations in DNA sequences
NASA Astrophysics Data System (ADS)
Thanos, Dimitrios; Li, Wentian; Provata, Astero
2018-03-01
The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that despite this reduction of information, LSE allows the extraction of meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.
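As a rough illustration of the block-entropy idea in the abstract above, the following sketch computes a Shannon entropy per block. It assumes non-overlapping blocks and single-nucleotide frequencies; the abstract does not specify either choice, so the authors' exact definition of LSE may differ. The function names are hypothetical.

```python
import math
from collections import Counter


def block_entropy(block: str) -> float:
    """Shannon entropy (in bits) of the symbol frequencies within one block."""
    counts = Counter(block)
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def local_shannon_entropy(seq: str, block_size: int) -> list[float]:
    """Entropy of each consecutive, non-overlapping block of length block_size.

    A trailing partial block is ignored (an assumption of this sketch).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    last_start = len(seq) - block_size
    return [
        block_entropy(seq[i:i + block_size])
        for i in range(0, last_start + 1, block_size)
    ]


if __name__ == "__main__":
    # A mixed block followed by a homopolymer block.
    print(local_shannon_entropy("ACGTAAAA", block_size=4))
```

With a four-letter alphabet the per-block value ranges from 0 bits (a single repeated base) to 2 bits (all four bases equally frequent), so long runs of near-constant values along a chromosome are the kind of low-fluctuation signal the abstract associates with large tandem-repeat regions.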
Sloot, Rosa; Borgdorff, Martien W.; de Beer, Jessica L.; van Ingen, Jakko; Supply, Philip
2013-01-01
The population structure of 3,776 Mycobacterium tuberculosis isolates was determined using variable-number tandem-repeat (VNTR) typing. The degree of clonality was so high that a more relaxed definition of clustering cannot be applied. Among recent immigrants with non-Euro-American isolates, transmission is overestimated if based on identical VNTR patterns. PMID:23658260
Zhu, Luchang; Olsen, Randall J; Horstmann, Nicola; Shelburne, Samuel A; Fan, Jia; Hu, Ye; Musser, James M
2016-07-01
Variable-number tandem-repeat (VNTR) polymorphisms are ubiquitous in bacteria. However, only a small fraction of them has been functionally studied. Here, we report an intergenic VNTR polymorphism that confers an altered level of toxin production and increased virulence in Streptococcus pyogenes. The nature of the polymorphism is a one-unit deletion in a three-tandem-repeat locus upstream of the rocA gene encoding a sensor kinase. S. pyogenes strains with this type of polymorphism cause human infection and produce significantly larger amounts of the secreted cytotoxins S. pyogenes NADase (SPN) and streptolysin O (SLO). Using isogenic mutant strains, we demonstrate that deleting one or more units of the tandem repeats abolished RocA production, reduced CovR phosphorylation, derepressed multiple CovR-regulated virulence factors (such as SPN and SLO), and increased virulence in a mouse model of necrotizing fasciitis. The phenotypic effect of the VNTR polymorphism was nearly the same as that of inactivating the rocA gene. In summary, we identified and characterized an intergenic VNTR polymorphism in S. pyogenes that affects toxin production and virulence. These new findings enhance understanding of rocA biology and the function of VNTR polymorphisms in S. pyogenes. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Avvaru, Akshay Kumar; Sowpati, Divya Tej; Mishra, Rakesh Kumar
2018-03-15
Microsatellites or Simple Sequence Repeats (SSRs) are short tandem repeats of DNA motifs present in all genomes. They have long been used for a variety of purposes in the areas of population genetics, genotyping, marker-assisted selection and forensics. Numerous studies have highlighted their functional roles in genome organization and gene regulation. Though several tools are currently available to identify SSRs from genomic sequences, they have significant limitations. We present a novel algorithm called PERF for extremely fast and comprehensive identification of microsatellites from DNA sequences of any size. PERF is several fold faster than existing algorithms and uses up to 5-fold less memory. It provides a clean and flexible command-line interface to change the default settings, and produces output in an easily-parseable tab-separated format. In addition, PERF generates an interactive and stand-alone HTML report with charts and tables for easy downstream analysis. PERF is implemented in the Python programming language. It is freely available on PyPI under the package name perf_ssr, and can be installed directly using pip or easy_install. The documentation of PERF is available at https://github.com/rkmlab/perf. The source code of PERF is deposited in GitHub at https://github.com/rkmlab/perf under an MIT license. Contact: tej@ccmb.res.in. Supplementary data are available at Bioinformatics online.
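To make the notion of a perfect microsatellite concrete, here is a minimal regex-based sketch. It is not PERF's algorithm and does not use the perf_ssr package; the function name `find_ssrs` and the default thresholds (motifs of 1 to 6 bp, minimum repeat length 12 bp) are assumptions chosen for illustration.

```python
import re


def find_ssrs(seq: str, min_motif: int = 1, max_motif: int = 6,
              min_length: int = 12) -> list[tuple[int, int, str, int]]:
    """Return (start, end, motif, copies) for perfect tandem repeats.

    Coordinates are 0-based and end-exclusive. Motif phase is reported as found
    and is not canonicalised (e.g. "GA" and "AG" are not merged).
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-mer followed by one or more exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ACAC" = 2 x "AC"),
            # because the shorter motif length already reports that region.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            length = m.end() - m.start()
            if length >= min_length:
                hits.append((m.start(), m.end(), motif, length // k))
    return sorted(hits)


if __name__ == "__main__":
    example = "TTT" + "AC" * 8 + "GGG"
    for hit in find_ssrs(example):
        print(hit)
```

This approach rescans the sequence once per motif length and is meant only to show what is being searched for; it makes no claim about speed or memory use.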
Hayes, Michael L; Giang, Karolyn; Mulligan, R Michael
2012-05-14
Pentatricopeptide repeat (PPR) proteins are required for numerous RNA processing events in plant organelles including C-to-U editing, splicing, stabilization, and cleavage. Fifteen PPR proteins are known to be required for RNA editing at 21 sites in Arabidopsis chloroplasts, and belong to the PLS class of PPR proteins. In this study, we investigate the co-evolution of four PPR genes (CRR4, CRR21, CLB19, and OTP82) and their six editing targets in Brassicaceae species. PPR genes are composed of approximately 10 to 20 tandem repeats and each repeat has two α-helical regions, helix A and helix B, that are separated by short coil regions. Each repeat and structural feature was examined to determine the selective pressures on these regions. All of the PPR genes examined are under strong negative selection. Multiple independent losses of editing site targets are observed for both CRR21 and OTP82. In several species lacking the known editing target for CRR21, PPR genes are truncated near the 17th PPR repeat. The coding sequences of the truncated CRR21 genes are maintained under strong negative selection; however, the 3' UTR sequences beyond the truncation site have substantially diverged. Phylogenetic analyses of four PPR genes show that sequences corresponding to helix A are high compared to helix B sequences. Differential evolutionary selection of helix A versus helix B is observed in both plant and mammalian PPR genes. PPR genes and their cognate editing sites are mutually constrained in evolution. Editing sites are frequently lost by replacement of an edited C with a genomic T. After the loss of an editing site, the PPR genes are observed with three outcomes: first, few changes are detected in some cases; second, the PPR gene is present as a pseudogene; and third, the PPR gene is present but truncated in the C-terminal region. 
The retention of truncated forms of CRR21 that are maintained under strong negative selection even in the absence of an editing site target suggests that unrecognized function(s) might exist for this PPR protein. PPR gene sequences that encode helix A are under strong selection, and could be involved in RNA substrate recognition.
Sequences in the intergenic spacer influence RNA Pol I transcription from the human rRNA promoter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, W.M.; Sylvester, J.E.
1994-09-01
In most eucaryotic species, ribosomal genes are tandemly repeated about 100-5000 times per haploid genome. The 43 Kb human rDNA repeat consists of a 13 Kb coding region for the 18S, 5.8S, 28S ribosomal RNAs (rRNAs) and transcribed spacers separated by a 30 Kb intergenic spacer. For species such as frog, mouse and rat, sequences in the intergenic spacer other than the gene promoter have been shown to modulate transcription of the ribosomal gene. These sequences are spacer promoters, enhancers and the terminator for spacer transcription. We are addressing whether the human ribosomal gene promoter is similarly influenced. In-vitro transcription run-off assays have revealed that the 4.5 kb region (CBE), directly upstream of the gene promoter, has cis-stimulation and trans-competition properties. This suggests that the CBE fragment contains an enhancer(s) for ribosomal gene transcription. Further experiments have shown that a fragment (~1.6 kb) within the CBE fragment also has trans-competition function. Deletion subclones of this region are being tested to delineate the exact sequences responsible for these modulating activities. Previous sequence analysis and functional studies have revealed that CBE contains regions of DNA capable of adopting alternative structures such as bent DNA, Z-DNA, and triple-stranded DNA. Whether these structures are required for modulating transcription remains to be determined as does the specific DNA-protein interaction involved.
Repetition as the essence of life on this earth: music and genes.
Ohno, S
1987-01-01
In prebiotic nucleic acid replication, templates appear to have been in short supply. A single round of tandem duplication of existing oligomers assured progressive extension of templates to the length adequate for encoding of polypeptide chains. Thus, the first set of coding sequences had to be repeats of base oligomers encoding polypeptide chains of various periodicities. On one hand, the readiness of these periodical polypeptide chains to assume alpha-helical and/or beta-sheet secondary structures contributed to the extremely rapid initial functional diversification of these polypeptide chains. It would be recalled that most, if not all, of the sugar-metabolizing enzymes had already achieved the inviolable functional competence before the division of prokaryotes from eukaryotes. On the other hand, a certain (dipeptidic?) of the peptidic periodicities was apparently chosen as the timekeeping unit by the biological clock. Musical compositions too apparently evolved originally as a timekeeping device. Accordingly, repetitiousness is evident in all musical compositions. Evolution of musical compositions from the early Baroque to the late Romantic parallels that of coding sequences from rather exact repeats of base oligomers to more complex modern coding sequences in which repetitious elements are less conspicuous and more varied. Inasmuch as the earth is governed by the hierarchy of periodicities (days, months and years), such reliance on periodicities is rather expected.
Panax ginseng genome examination for ginsenoside biosynthesis.
Xu, Jiang; Chu, Yang; Liao, Baosheng; Xiao, Shuiming; Yin, Qinggang; Bai, Rui; Su, He; Dong, Linlin; Li, Xiwen; Qian, Jun; Zhang, Jingjing; Zhang, Yujun; Zhang, Xiaoyan; Wu, Mingli; Zhang, Jie; Li, Guozheng; Zhang, Lei; Chang, Zhenzhan; Zhang, Yuebin; Jia, Zhengwei; Liu, Zhixiang; Afreh, Daniel; Nahurira, Ruth; Zhang, Lianjuan; Cheng, Ruiyang; Zhu, Yingjie; Zhu, Guangwei; Rao, Wei; Zhou, Chao; Qiao, Lirui; Huang, Zhihai; Cheng, Yung-Chi; Chen, Shilin
2017-11-01
Ginseng, which contains ginsenosides as bioactive compounds, has been regarded as an important traditional medicine for several millennia. However, the genetic background of ginseng remains poorly understood, partly because of the plant's large and complex genome composition. We report the entire genome sequence of Panax ginseng using next-generation sequencing. The 3.5-Gb nucleotide sequence contains more than 60% repeats and encodes 42 006 predicted genes. Twenty-two transcriptome datasets and mass spectrometry images of ginseng roots were adopted to precisely quantify the functional genes. Thirty-one genes were identified to be involved in the mevalonic acid pathway. Eight of these genes were annotated as 3-hydroxy-3-methylglutaryl-CoA reductases, which displayed diverse structures and expression characteristics. A total of 225 UDP-glycosyltransferases (UGTs) were identified, and these UGTs accounted for one of the largest gene families of ginseng. Tandem repeats contributed to the duplication and divergence of UGTs. Molecular modeling of UGTs in the 71st, 74th, and 94th families revealed a regiospecific conserved motif located at the N-terminus. Molecular docking predicted that this motif captures ginsenoside precursors. The ginseng genome represents a valuable resource for understanding and improving the breeding, cultivation, and synthesis biology of this key herb. © The Author 2017. Published by Oxford University Press.
2011-01-01
Background Since Francisella noatunensis was first isolated from cultured Atlantic cod in 2004, it has emerged as a global fish pathogen causing disease in both warm and cold water species. Outbreaks of francisellosis occur in several important cultured fish species, making correct management of this disease a matter of major importance. Currently there are no vaccines or treatments available. A strain typing system for use in studies of F. noatunensis epizootics would be an important tool for disease management. However, the high genetic similarity within the Francisella spp. makes strain typing difficult, but such typing of the related human pathogen Francisella tularensis has been performed successfully by targeting loci with higher genetic variation than the traditional signature sequences. These loci are known as Variable Numbers of Tandem Repeat (VNTR). The aim of this study is to identify possible useful VNTRs in the genome of F. noatunensis. Results Seven polymorphic VNTR loci were identified in the preliminary genome sequence of F. noatunensis ssp. noatunensis GM2212 isolate. These VNTR-loci were sequenced in F. noatunensis isolates collected from Atlantic cod (Gadus morhua) from Norway (n = 21), Three-line grunt (Parapristipoma trilineatum) from Japan (n = 1), Tilapia (Oreochromis spp.) from Indonesia (n = 3) and Atlantic salmon (Salmo salar) from Chile (n = 1). The Norwegian isolates presented in this study show both nine allelic profiles and clades, and that the majority of the farmed isolates belong in two clades only, while the allelic profiles from wild cod are unique. Conclusions VNTRs can be used to separate isolates belonging to both subspecies of F. noatunensis. Low allelic diversity in F. noatunensis isolates from outbreaks in cod culture compared to isolates from wild cod indicates that transmission of these isolates may be a result of human activity. The sequence based MLVA system presented in this study should provide a good starting point for further development of a genotyping system that can be used in studies of epizootics and disease management of francisellosis. PMID:21261955
PTGBase: an integrated database to study tandem duplicated genes in plants.
Yu, Jingyin; Ke, Tao; Tehrim, Sadia; Sun, Fengming; Liao, Boshou; Hua, Wei
2015-01-01
Tandem duplication is a wide-spread phenomenon in plant genomes and plays significant roles in evolution and adaptation to changing environments. Tandem duplicated genes related to certain functions will lead to the expansion of gene families and bring increase of gene dosage in the form of gene cluster arrays. Many tandem duplication events have been studied in plant genomes; yet, there is a surprising shortage of efforts to systematically present the integration of large amounts of information about publicly deposited tandem duplicated gene data across the plant kingdom. To address this shortcoming, we developed the first plant tandem duplicated genes database, PTGBase. It delivers the most comprehensive resource available to date, spanning 39 plant genomes, including model species and newly sequenced species alike. Across these genomes, 54 130 tandem duplicated gene clusters (129 652 genes) are presented in the database. Each tandem array, as well as its member genes, is characterized in complete detail. Tandem duplicated genes in PTGBase can be explored through browsing or searching by identifiers or keywords of functional annotation and sequence similarity. Users can download tandem duplicated gene arrays easily to any scale, up to the complete annotation data set for an entire plant genome. PTGBase will be updated regularly with newly sequenced plant species as they become available. © The Author(s) 2015. Published by Oxford University Press.
Deng, Dong; Yan, Chuangye; Wu, Jianping; Pan, Xiaojing; Yan, Nieng
2014-04-01
Transcription activator-like (TAL) effectors specifically bind to double stranded (ds) DNA through a central domain of tandem repeats. Each TAL effector (TALE) repeat comprises 33-35 amino acids and recognizes one specific DNA base through a highly variable residue at a fixed position in the repeat. Structural studies have revealed the molecular basis of DNA recognition by TALE repeats. Examination of the overall structure reveals that the basic building block of TALE protein, namely a helical hairpin, is one-helix shifted from the previously defined TALE motif. Here we wish to suggest a structure-based re-demarcation of the TALE repeat which starts with the residues that bind to the DNA backbone phosphate and concludes with the base-recognition hyper-variable residue. This new numbering system is consistent with the α-solenoid superfamily to which TALE belongs, and reflects the structural integrity of TAL effectors. In addition, it confers an integral number of TALE repeats that matches the number of bound DNA bases. We then present fifteen crystal structures of engineered dHax3 variants in complex with target DNA molecules, which elucidate the structural basis for the recognition of bases adenine (A) and guanine (G) by reported or uncharacterized TALE codes. Finally, we analyzed the sequence-structure correlation of the amino acid residues within a TALE repeat. The structural analyses reported here may advance the mechanistic understanding of TALE proteins and facilitate the design of TALEN with improved affinity and specificity.
Jajou, Rana; de Neeling, Albert; Rasmussen, Erik Michael; Norman, Anders; Mulder, Arnout; van Hunen, Rianne; de Vries, Gerard; Haddad, Walid; Anthony, Richard; Lillebaek, Troels; van der Hoek, Wim; van Soolingen, Dick
2017-01-01
In many countries, Mycobacterium tuberculosis isolates are routinely subjected to variable-number tandem-repeat (VNTR) typing to investigate M. tuberculosis transmission. Unexpectedly, cross-border clusters were identified among African refugees in the Netherlands and Denmark, although transmission in those countries was unlikely. Whole-genome sequencing (WGS) was applied to analyze transmission in depth and to assess the precision of VNTR typing. WGS was applied to 40 M. tuberculosis isolates from refugees in the Netherlands and Denmark (most of whom were from the Horn of Africa) that shared the exact same VNTR profile. Cluster investigations were undertaken to identify in-country epidemiological links. Combining WGS results for the isolates (all members of the central Asian strain [CAS]/Delhi genotype), from both European countries, an average genetic distance of 80 single-nucleotide polymorphisms (SNPs) (maximum, 153 SNPs) was observed. The few pairs of isolates with confirmed epidemiological links, except for one pair, had a maximum distance of 12 SNPs. WGS divided this refugee cluster into several subclusters of patients from the same country of origin. Although the M. tuberculosis cases, mainly originating from African countries, shared the exact same VNTR profile, most were clearly distinguished by WGS. The average genetic distance in this specific VNTR cluster was 2 times greater than that in other VNTR clusters. Thus, identical VNTR profiles did not represent recent direct M. tuberculosis transmission for this group of patients. It appears that either these strains from Africa are extremely conserved genetically or there is ongoing transmission of this genotype among refugees on their long migration routes from Africa to Europe. PMID:29167288
Jajou, Rana; de Neeling, Albert; Rasmussen, Erik Michael; Norman, Anders; Mulder, Arnout; van Hunen, Rianne; de Vries, Gerard; Haddad, Walid; Anthony, Richard; Lillebaek, Troels; van der Hoek, Wim; van Soolingen, Dick
2018-02-01
In many countries, Mycobacterium tuberculosis isolates are routinely subjected to variable-number tandem-repeat (VNTR) typing to investigate M. tuberculosis transmission. Unexpectedly, cross-border clusters were identified among African refugees in the Netherlands and Denmark, although transmission in those countries was unlikely. Whole-genome sequencing (WGS) was applied to analyze transmission in depth and to assess the precision of VNTR typing. WGS was applied to 40 M. tuberculosis isolates from refugees in the Netherlands and Denmark (most of whom were from the Horn of Africa) that shared the exact same VNTR profile. Cluster investigations were undertaken to identify in-country epidemiological links. Combining WGS results for the isolates (all members of the central Asian strain [CAS]/Delhi genotype), from both European countries, an average genetic distance of 80 single-nucleotide polymorphisms (SNPs) (maximum, 153 SNPs) was observed. The few pairs of isolates with confirmed epidemiological links, except for one pair, had a maximum distance of 12 SNPs. WGS divided this refugee cluster into several subclusters of patients from the same country of origin. Although the M. tuberculosis cases, mainly originating from African countries, shared the exact same VNTR profile, most were clearly distinguished by WGS. The average genetic distance in this specific VNTR cluster was 2 times greater than that in other VNTR clusters. Thus, identical VNTR profiles did not represent recent direct M. tuberculosis transmission for this group of patients. It appears that either these strains from Africa are extremely conserved genetically or there is ongoing transmission of this genotype among refugees on their long migration routes from Africa to Europe. Copyright © 2018 Jajou et al.
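The average and maximum SNP distances quoted in the abstract above are summaries over pairs of isolates. The sketch below shows one simple way such a summary could be computed from pre-aligned sequences of equal length. It is an assumption-laden illustration, not the authors' pipeline: the function names are hypothetical, ambiguous calls (anything other than A, C, G, T) are skipped, and the toy sequences are made up.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a.upper(), b.upper())
        if x != y and x in "ACGT" and y in "ACGT"
    )


def pairwise_summary(aligned: dict[str, str]):
    """Return (distances per pair, mean distance, maximum distance)."""
    dists = {
        (i, j): snp_distance(aligned[i], aligned[j])
        for i, j in combinations(sorted(aligned), 2)
    }
    values = list(dists.values())
    return dists, sum(values) / len(values), max(values)


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    toy = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TCGTACNA"}
    dists, mean, largest = pairwise_summary(toy)
    print(dists, mean, largest)
```

Pairs at or below a chosen cut-off could then be flagged for epidemiological follow-up. The abstract reports 12 SNPs as the maximum seen among most confirmed links, but treating that figure as a general threshold would be an assumption.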
Coagulase-Negative Staphylococci in Human Milk From Mothers of Preterm Compared With Term Neonates.
Soeorg, Hiie; Metsvaht, Tuuli; Eelmäe, Imbi; Metsvaht, Hanna Kadri; Treumuth, Sirli; Merila, Mirjam; Ilmoja, Mari-Liis; Lutsar, Irja
2017-05-01
Human milk is the preferred nutrition for neonates and a source of bacteria. Research aim: The authors aimed to characterize the molecular epidemiology and genetic content of staphylococci in the human milk of mothers of preterm and term neonates. Staphylococci were isolated once per week in the 1st month postpartum from the human milk of mothers of 20 healthy term and 49 preterm neonates hospitalized in the neonatal intensive care unit. Multilocus variable-number tandem-repeats analysis and multilocus sequence typing were used. The presence of the mecA gene, icaA gene of the ica-operon, IS 256, and ACME genetic elements was determined by PCR. The human milk of mothers of preterm compared with term neonates had higher counts of staphylococci but lower species diversity. The human milk of mothers of preterm compared with term neonates more often contained Staphylococcus epidermidis mecA (32.7% vs. 2.6%), icaA (18.8% vs. 6%), IS 256 (7.9% vs. 0.9%), and ACME (15.4% vs. 5.1%), as well as Staphylococcus haemolyticus mecA (90.5% vs. 10%) and IS 256 (61.9% vs. 10%). The overall distribution of multilocus variable-number tandem-repeats analysis (MLVA) types and sequence types was similar between the human milk of mothers of preterm and term neonates, but a few mecA-IS 256-positive MLVA types colonized only mothers of preterm neonates. Maternal hospitalization within 1 month postpartum and the use of an arterial catheter or antibacterial treatment in the neonate increased the odds of harboring mecA-positive staphylococci in human milk. Limiting exposure of mothers of preterm neonates to the hospital could prevent human milk colonization with more pathogenic staphylococci.
Lavania, Mallika; Jadhav, Rupendra; Turankar, Ravindra P; Singh, Itu; Nigam, Astha; Sengupta, U
2015-12-01
Leprosy is still a major health problem in India which has the highest number of cases. Multiple locus variable number of tandem repeat analysis (MLVA) and single nucleotide polymorphism (SNP) have been proposed as tools of strain typing for tracking the transmission of leprosy. However, empirical data for a defined population from scale and duration were lacking for studying the transmission chain of leprosy. Seventy slit skin scrapings were collected from Purulia (West Bengal), Miraj (Maharashtra), Shahdara (Delhi), and Naini (UP) hospitals of The Leprosy Mission (TLM). SNP subtyping and MLVA on 10 VNTR loci were applied for the strain typing of Mycobacterium leprae. Along with the strain typing conventional epidemiological investigation was also performed to trace the transmission chain. In addition, phylogenetic analysis was done on variable number of tandem repeat (VNTR) data sets using sequence type analysis and recombinational tests (START) software. START software performs analyses to aid in the investigation of bacterial population structure using multilocus sequence data. These analyses include data summary, lineage assignment, and tests for recombination and selection. Diversity was observed in the cross-sectional survey of isolates obtained from 70 patients. Similarity in fingerprinting profiles observed in specimens of cases from the same family or neighborhood locations indicated a possible common source of infection. The data suggest that these VNTRs including subtyping of SNPs can be used to study the sources and transmission chain in leprosy, which could be very important in monitoring of the disease dynamics in high endemic foci. The present study strongly indicates that multi-case families might constitute epidemic foci and the main source of M. leprae in villages, causing the predominant strain or cluster infection leading to the spread of leprosy in the community. Copyright © 2015 Elsevier B.V. All rights reserved.
Dimovski, Karolina; Cao, Hanwei; Wijburg, Odilia L. C.; Strugnell, Richard A.; Mantena, Radha K.; Whipp, Margaret; Hogg, Geoff
2014-01-01
Variable-number tandem repeats (VNTRs) mutate rapidly and can be useful markers for genotyping. While multilocus VNTR analysis (MLVA) is increasingly used in the detection and investigation of food-borne outbreaks caused by Salmonella enterica serovar Typhimurium (S. Typhimurium) and other bacterial pathogens, MLVA data analysis usually relies on simple clustering approaches that may lead to incorrect interpretations. Here, we estimated the rates of copy number change at each of the five loci commonly used for S. Typhimurium MLVA, during in vitro and in vivo passage. We found that loci STTR5, STTR6, and STTR10 changed during passage but STTR3 and STTR9 did not. Relative rates of change were consistent across in vitro and in vivo growth and could be accurately estimated from diversity measures of natural variation observed during large outbreaks. Using a set of 203 isolates from a series of linked outbreaks and whole-genome sequencing of 12 representative isolates, we assessed the accuracy and utility of several alternative methods for analyzing and interpreting S. Typhimurium MLVA data. We show that eBURST analysis was accurate and informative. For construction of MLVA-based trees, a novel distance metric, based on the geometric model of VNTR evolution coupled with locus-specific weights, performed better than the commonly used simple or categorical distance metrics. The data suggest that, for the purpose of identifying potential transmission clusters for further investigation, isolates whose profiles differ at one of the rapidly changing STTR5, STTR6, and STTR10 loci should be collapsed into the same cluster. PMID:24957617
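The clustering recommendation at the end of the abstract above can be sketched as a linking rule plus a grouping step. This is one possible reading, not the authors' implementation: it assumes that two isolates are linked when their profiles are identical or differ at exactly one locus that belongs to the rapidly changing set, and that links are applied transitively (single linkage). The isolate names and copy numbers in the example are invented.

```python
LOCI = ("STTR9", "STTR5", "STTR6", "STTR10", "STTR3")
FAST_LOCI = {"STTR5", "STTR6", "STTR10"}  # loci reported to change during passage


def linked(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if profiles match, or differ at exactly one rapidly changing locus."""
    diff = [locus for locus in LOCI if a[locus] != b[locus]]
    return not diff or (len(diff) == 1 and diff[0] in FAST_LOCI)


def cluster(profiles: dict[str, dict[str, int]]) -> list[set[str]]:
    """Group isolates by single linkage over the `linked` relation (union-find)."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if linked(profiles[a], profiles[b]):
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical copy numbers, for illustration only.
    example = {
        "iso1": {"STTR9": 3, "STTR5": 12, "STTR6": 9, "STTR10": 5, "STTR3": 2},
        "iso2": {"STTR9": 3, "STTR5": 13, "STTR6": 9, "STTR10": 5, "STTR3": 2},
        "iso3": {"STTR9": 3, "STTR5": 12, "STTR6": 9, "STTR10": 5, "STTR3": 4},
    }
    print(cluster(example))
```

Single linkage can chain distant profiles together through intermediates, so whether transitive collapsing is appropriate for a given outbreak is a judgment the sketch leaves open.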
Microsatellite DNA capture from enriched libraries.
Gonzalez, Elena G; Zardoya, Rafael
2013-01-01
Microsatellites are DNA sequences of tandem repeats of one to six nucleotides, which are highly polymorphic, and thus the molecular markers of choice in many kinship, population genetic, and conservation studies. There have been significant technical improvements since the early methods for microsatellite isolation were developed, and today the most common procedures take advantage of the hybrid capture methods of enriched-targeted microsatellite DNA. Furthermore, recent advances in sequencing technologies (i.e., next-generation sequencing, NGS) have fostered the mining of microsatellite markers in non-model organisms, affording a cost-effective way of obtaining a large amount of sequence data potentially useful for loci characterization. The rapid improvements of NGS platforms together with the increase in available microsatellite information open new avenues to the understanding of the evolutionary forces that shape genetic structuring in wild populations. Here, we provide detailed methodological procedures for microsatellite isolation based on the screening of GT microsatellite-enriched libraries, either by cloning and Sanger sequencing of positive clones or by direct NGS. Guides for designing new species-specific primers and basic genotyping are also given.
Identification and characterization of cell-specific enhancer elements for the mouse ETF/Tead2 gene.
Tanoue, Y; Yasunami, M; Suzuki, K; Ohkubo, H
2001-12-21
We have identified and characterized by transient transfection assays the cell-specific 117-bp enhancer sequence in the first intron of the mouse ETF (Embryonic TEA domain-containing factor)/Tead2 gene required for transcriptional activation in ETF/Tead2 gene-expressing cells, such as P19 cells. The 117-bp enhancer contains one GC-rich sequence (5'-GGGGCGGGG-3'), termed the GC box, and two tandemly repeated GA-rich sequences (5'-GGGGGAGGGG-3'), termed the proximal and distal GA elements. Further analyses, including transfection studies and electrophoretic mobility shift assays using a series of deletion and mutation constructs, indicated that Sp1, a putative activator, may be required to predominate over its competition with another unknown putative repressor, termed the GA element-binding factor, for binding to both the GC box, which overlapped with the proximal GA element, and the distal GA element in the 117-bp sequence in order to achieve a full enhancer activity. We also discuss a possible mechanism underlying the cell-specific enhancer activity of the 117-bp sequence.
Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling
Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien
2012-01-01
The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697
Khan, Abdul Latif; Khan, Muhammad Aaqil; Shahzad, Raheem; Lubna; Kang, Sang Mo; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung
2018-01-01
Pinaceae, the largest family of conifers, has a diversified organization of chloroplast (cp) genomes with two typical highly reduced inverted repeats (IRs). In the current study, we determined the complete sequence of the cp genome of an economically and ecologically important conifer tree, the loblolly pine (Pinus taeda L.), using Illumina paired-end sequencing and compared the sequence with those of other pine species. The results revealed a genome size of 121,531 base pairs (bp) containing a pair of 830-bp IR regions, distinguished by a small single copy (42,258 bp) and large single copy (77,614 bp) region. The chloroplast genome of P. taeda encodes 120 genes, comprising 81 protein-coding genes, four ribosomal RNA genes, and 35 tRNA genes, with 151 randomly distributed microsatellites. Approximately 6 palindromic, 34 forward, and 22 tandem repeats were found in the P. taeda cp genome. Whole cp genome comparison with those of other Pinus species exhibited an overall high degree of sequence similarity, with some divergence in intergenic spacers. Higher and lower numbers of indels and single-nucleotide polymorphism substitutions were observed relative to P. contorta and P. monophylla, respectively. Phylogenomic analyses based on the complete genome sequence revealed that 60 shared genes generated trees with the same topologies, and P. taeda was closely related to P. contorta in the subgenus Pinus. Thus, the complete P. taeda genome provided valuable resources for population and evolutionary studies of gymnosperms and can be used to identify related species. PMID:29596414
Initial sequence and comparative analysis of the cat genome
Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.
2007-01-01
The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172
Pechlaner, Raimund; Willeit, Peter; Summerer, Monika; Santer, Peter; Egger, Georg; Kronenberg, Florian; Demetz, Egon; Weiss, Günter; Tsimikas, Sotirios; Witztum, Joseph L; Willeit, Karin; Iglseder, Bernhard; Paulweber, Bernhard; Kedenko, Lyudmyla; Haun, Margot; Meisinger, Christa; Gieger, Christian; Müller-Nurasyid, Martina; Peters, Annette; Willeit, Johann; Kiechl, Stefan
2015-01-01
The enzyme heme oxygenase-1 (HO-1) exerts cytoprotective effects in response to various cellular stressors. A variable number tandem repeat polymorphism in the HO-1 gene promoter region has previously been linked to cardiovascular disease. We examined this association prospectively in the general population. Incidence of stroke, myocardial infarction, or vascular death was registered between 1995 and 2010 in 812 participants of the Bruneck Study aged 45 to 84 years (49.4% males). Carotid atherosclerosis progression was quantified by high-resolution ultrasound. HO-1 variable number tandem repeat length was determined by polymerase chain reaction. Subjects with ≥32 tandem repeats on both HO-1 alleles compared with the rest of the population (recessive trait) featured substantially increased cardiovascular disease risk (hazard ratio [95% confidence interval], 5.45 [2.39, 12.42]; P<0.0001), enhanced atherosclerosis progression (median difference in atherosclerosis score [interquartile range], 2.1 [0.8, 5.6] versus 0.0 [0.0, 2.2] mm; P=0.0012), and a trend toward higher levels of oxidized phospholipids on apolipoprotein B-100 (median oxidized phospholipids/apolipoprotein B level [interquartile range], 11364 [4160, 18330] versus 4844 [3174, 12284] relative light units; P=0.0554). Increased cardiovascular disease risk in those homozygous for ≥32 repeats was also detected in a pooled analysis of 7848 participants of the Bruneck, SAPHIR, and KORA prospective studies (hazard ratio [95% confidence interval], 3.26 [1.50, 7.33]; P=0.0043). This study found a strong association between the HO-1 variable number tandem repeat polymorphism and cardiovascular disease risk confined to subjects with a high number of repeats on both HO-1 alleles and provides evidence for accelerated atherogenesis and decreased antioxidant defense in this vascular high-risk group. © 2014 American Heart Association, Inc.
Vera, Manuel; Bello, Xabier; Álvarez-Dios, Jose-Antonio; Pardo, Belen G; Sánchez, Laura; Carlsson, Jens; Carlsson, Jeanette E L; Bartolomé, Carolina; Maside, Xulio; Martinez, Paulino
2015-12-01
The flat oyster (Ostrea edulis) is one of the most appreciated molluscs in Europe, but its production has been greatly reduced by the parasite Bonamia ostreae. Here, new generation genomic resources were used to analyse the repetitive fraction of the oyster genome, with the aim of developing molecular markers to face this main oyster production challenge. The resulting oyster database consists of two sets of 10,318 and 7159 unique contigs (4.8 Mbp and 6.8 Mbp in total length) representing the oyster's genome (WG) and haemocyte transcriptome (HT), respectively. A total of 1083 sequences were identified as TE-derived, which corresponded to 4.0% of WG and 1.1% of HT. They were clustered into 142 homology groups, most of which were assigned to the Penelope order of retrotransposons, and to the Helitron and TIR DNA-transposons. Simple repeats and rRNA pseudogenes also made a significant contribution to the oyster's genome (0.5% and 0.3% of WG and HT, respectively). The most frequent short tandem repeats identified in WG were tetranucleotide motifs, while trinucleotide motifs were the most frequent in HT. Forty identified microsatellite loci, 20 from each database, were selected for technical validation. Success was much lower among WG than HT microsatellites (15% vs 55%), which could reflect higher variation in anonymous regions interfering with primer annealing. All microsatellites developed adjusted to Hardy-Weinberg proportions and represent a useful tool to support future breeding programmes and to manage genetic resources of natural flat oyster beds. Copyright © 2015 Elsevier B.V. All rights reserved.
Gao, Zhenrui; Liu, Chuanliang; Zhang, Yanzhao; Li, Ying; Yi, Keke; Zhao, Xinhua; Cui, Min-Long
2013-01-01
The red leaf coloration of Empire Red Leaf Cotton (ERLC) (Gossypium hirsutum L.), which results from anthocyanin accumulation in light, is a well-known dominant agricultural trait. However, the underlying molecular mechanism remains elusive. To explore this, we compared the molecular biological basis of anthocyanin accumulation in both ERLC and the green leaf cotton variety CCRI 24 (Gossypium hirsutum L.). Introduction of the R2R3-MYB transcription factor Rosea1, the master regulator of anthocyanin biosynthesis in Antirrhinum majus, into CCRI 24 induced anthocyanin accumulation, indicating that the structural genes for anthocyanin biosynthesis are not defective and that the leaf coloration might be caused by variation in the expression of regulatory genes. Expression analysis found that a transcription factor, RLC1 (Red Leaf Cotton 1), which encodes the ortholog of PAP1/Rosea1, was highly expressed in leaves of ERLC but barely expressed in CCRI 24 in light. Ectopic expression of RLC1 from ERLC and CCRI 24 in hairy roots of Antirrhinum majus and CCRI 24 significantly enhanced anthocyanin accumulation. Comparison of RLC1 promoter sequences between ERLC and CCRI 24 revealed two 228-bp tandem repeats in ERLC but only one repeat in CCRI 24. Transient assays in cotton leaf tissue showed that the tandem repeats in ERLC are responsible for light-induced RLC1 expression and therefore anthocyanin accumulation. Taken together, our results represent an important step toward understanding the role of R2R3-MYB transcription factors in the regulatory mechanisms of anthocyanin accumulation in red leaf cotton under light.
Rumore, Jillian Leigh; Tschetter, Lorelee; Nadon, Celine
2016-05-01
The lack of pattern diversity among pulsed-field gel electrophoresis (PFGE) profiles for Escherichia coli O157:H7 in Canada does not consistently provide optimal discrimination, and therefore, differentiating temporally and/or geographically associated sporadic cases from potential outbreak cases can at times impede investigations. To address this limitation, DNA sequence-based methods such as multilocus variable-number tandem-repeat analysis (MLVA) have been explored. To assess the performance of MLVA as a supplemental method to PFGE from the Canadian perspective, a retrospective analysis of all E. coli O157:H7 isolated in Canada from January 2008 to December 2012 (inclusive) was conducted. A total of 2285 E. coli O157:H7 isolates and 63 clusters of cases (by PFGE) were selected for the study. Based on the qualitative analysis, the addition of MLVA improved the categorization of cases for 60% of clusters and no change was observed for ∼40% of clusters investigated. In such situations, MLVA serves to confirm PFGE results, but may not add further information per se. The findings of this study demonstrate that MLVA data, when used in combination with PFGE-based analyses, provide additional resolution to the detection of clusters lacking PFGE diversity as well as demonstrate good epidemiological concordance. In addition, MLVA is able to identify cluster-associated isolates with variant PFGE pattern combinations that may have been previously missed by PFGE alone. Optimal laboratory surveillance in Canada is achieved with the application of PFGE and MLVA in tandem for routine surveillance, cluster detection, and outbreak response.
Perry, N; Cheasty, T; Dallman, T; Launders, N; Willshaw, G
2013-10-01
Evaluation of multilocus variable number tandem repeat analysis (MLVA) to subtype all isolates of Vero cytotoxin-producing Escherichia coli O157 phage type 8 in England and Wales. Over a 13 month period from December 2010, 483 isolates of VTEC O157 PT8 were tested by MLVA; 39% were received in the first 4 months of 2011, when infections are generally low. One profile, or single locus variants of it, was present in 249 (52%) isolates but was not common previously. These cases represented a national increase in PT8, associated epidemiologically with soil-contaminated vegetables. Most of the 177 other MLVA profiles were unique to a single isolate. Profiles shared by >1 isolate included cases from two small community, food-borne outbreaks and 11 households. Several shared profiles were found among 23 isolates without known links. Apart from one group, isolates linked to travel abroad had very diverse profiles. Multilocus variable number tandem repeat analysis discriminated apparent sporadic isolates of the same PT and assisted in detection of cases in an emerging national outbreak. Multilocus variable number tandem repeat analysis is an epidemiologically valid complement to surveillance and applicable as a rapid, practical test for large numbers of isolates. © 2013 The Society for Applied Microbiology.
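The notion of a profile and its "single locus variants" used above can be sketched as a comparison of repeat-count tuples. The profiles below are hypothetical values invented for illustration, and the helper names are assumptions; the abstract does not describe how the comparison was implemented.

```python
def loci_differing(profile_a, profile_b):
    """Count the loci at which two MLVA profiles differ.

    A profile is a tuple of repeat copy numbers, one entry per locus,
    with loci in the same order in both profiles.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def is_single_locus_variant(profile_a, profile_b):
    """True when the two profiles differ at exactly one locus."""
    return loci_differing(profile_a, profile_b) == 1


if __name__ == "__main__":
    # Hypothetical eight-locus profiles, not data from the study.
    reference = (10, 7, 13, 4, 5, 6, 8, 9)
    isolate = (10, 7, 14, 4, 5, 6, 8, 9)
    print(is_single_locus_variant(reference, isolate))
```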
Van der Bij, A K; Van der Zwan, D; Peirano, G; Severin, J A; Pitout, J D D; Van Westreenen, M; Goessens, W H F
2012-09-01
Recently, the first outbreak of clonally related VIM-2 metallo-β-lactamase (MBL)-producing Pseudomonas aeruginosa in a Dutch tertiary-care centre was described. Subsequently, a nationwide surveillance study was performed in 2010-2011, which identified the presence of VIM-2 MBL-producing P. aeruginosa in 11 different hospitals. Genotyping by multiple-locus variable-number tandem-repeat analysis (MLVA) showed that the majority of the 82 MBL-producing isolates found belonged to a single MLVA type (n = 70, 85%), identified as ST111 by multilocus sequence typing (MLST). As MBL-producing isolates cause serious infections that are difficult to treat, the presence of clonally related isolates in various hospitals throughout the Netherlands is of nationwide concern. © 2012 The Authors. Clinical Microbiology and Infection © 2012 European Society of Clinical Microbiology and Infectious Diseases.
Prolonged and mixed non-O157 Escherichia coli infection in an Australian household.
Staples, M; Graham, R M A; Doyle, C J; Smith, H V; Jennison, A V
2012-05-01
An Australian family was identified through a Public Health follow up on a Shiga-toxigenic Escherichia coli (STEC) positive bloody diarrhoea case, with three of the four family members experiencing either symptomatic or asymptomatic STEC shedding. Bacterial isolates were submitted to stx sequence sub-typing, multi-locus variable number tandem repeat analysis (MLVA), multi-locus sequence typing (MLST) and binary typing. The analysis revealed that there were multiple strains of STEC being shed by the family members, with similar virulence gene profiles and the same serogroup but differing in their MLVA and MLST profiles. This study illustrates the potentially complicated nature of non-O157 STEC infections and the importance of molecular epidemiology in understanding disease clusters. © 2012 QUEENSLAND HEALTH. Clinical Microbiology and Infection © 2012 European Society of Clinical Microbiology and Infectious Diseases.
Maroni, G.; Wise, J.; Young, J. E.; Otto, E.
1987-01-01
A search for duplications of the Drosophila melanogaster metallothionein gene (Mtn) yielded numerous examples of this type of chromosomal rearrangement. These duplications are distributed widely—we found them in samples from four continents, and they are functional—larvae carrying Mtn duplications produce more Mtn RNA and tolerate increased cadmium and copper concentrations. Six different duplication types were characterized by restriction-enzyme analyses using probes from the Mtn region. The restriction maps show that in four cases the sequences, ranging in size between 2.2 and 6.0 kb, are arranged as direct, tandem repeats; in two other cases, this basic pattern is modified by the insertion of a putative transposable element into one of the repeated units. Duplications of the D. melanogaster metallothionein gene such as those that we found in natural populations may represent early stages in the evolution of a gene family. PMID:2828157
Telomerase Mechanism of Telomere Synthesis
Wu, R. Alex; Upton, Heather E.; Vogan, Jacob M.; Collins, Kathleen
2017-01-01
Telomerase is the essential reverse transcriptase required for linear chromosome maintenance in most eukaryotes. Telomerase supplements the tandem array of simple-sequence repeats at chromosome ends to compensate for the DNA erosion inherent in genome replication. The template for telomerase reverse transcriptase is within the RNA subunit of the ribonucleoprotein complex, which in cells contains additional telomerase holoenzyme proteins that assemble the active ribonucleoprotein and promote its function at telomeres. Telomerase is distinct among polymerases in its reiterative reuse of an internal template. The template is precisely defined, processively copied, and regenerated by release of single-stranded product DNA. New specificities of nucleic acid handling that underlie the catalytic cycle of repeat synthesis derive from both active site specialization and new motif elaborations in protein and RNA subunits. Studies of telomerase provide unique insights into cellular requirements for genome stability, tissue renewal, and tumorigenesis as well as new perspectives on dynamic ribonucleoprotein machines. PMID:28141967
NASA Astrophysics Data System (ADS)
Young, James F.; Hockmeyer, Wayne T.; Gross, Mitchell; Ripley Ballou, W.; Wirtz, Robert A.; Trosper, James H.; Beaudoin, Richard L.; Hollingdale, Michael R.; Miller, Louis H.; Diggs, Carter L.; Rosenberg, Martin
1985-05-01
The circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum may be the most promising target for the development of a malaria vaccine. In this study, proteins composed of 16, 32, or 48 tandem copies of a tetrapeptide repeating sequence found in the CS protein were efficiently expressed in the bacterium Escherichia coli. When injected into mice, these recombinant products resulted in the production of high titers of antibodies that reacted with the authentic CS protein on live sporozoites and blocked sporozoite invasion of human hepatoma cells in vitro. These CS protein derivatives are therefore candidates for a human malaria vaccine.
Bhavanandan, V P; Gupta, D; Woitach, J; Guo, X; Jiang, W
1999-06-01
Secreted epithelial mucins are large macromolecules which exhibit extreme polydispersity, the molecular basis of which is not fully understood. We have obtained partial sequences of two genes (BSM1 and BSM2) coding for two distinct molecules. This is the first time that such closely-related genes have been identified for any mucin from an animal. We propose that a combination of multiple homologous genes, alternative splicing, differential glycosylation, and additional post-translational processing all contribute to the extreme polydispersity of mucins. The multiple domain structure and non-identical tandem repeats are also very important for the generation of the saccharide diversities of mucins.
Wyrwa, Katarzyna; Książkiewicz, Michał; Szczepaniak, Anna; Susek, Karolina; Podkowiński, Jan; Naganowska, Barbara
2016-09-01
Narrow-leafed lupin (Lupinus angustifolius L.) has recently been considered a reference genome for the Lupinus genus. In the present work, genetic and cytogenetic maps of L. angustifolius were supplemented with 30 new molecular markers representing lupin genome regions, harboring genes involved in nitrogen fixation during the symbiotic interaction of legumes and soil bacteria (Rhizobiaceae). Our studies resulted in the precise localization of bacterial artificial chromosomes (BACs) carrying sequence variants for early nodulin 40, nodulin 26, nodulin 45, aspartate aminotransferase P2, asparagine synthetase, cytosolic glutamine synthetase, and phosphoenolpyruvate carboxylase. Together with previously mapped chromosomes, the integrated L. angustifolius map encompasses 73 chromosome markers, including 5S ribosomal DNA (rDNA) and 45S rDNA, and anchors 20 L. angustifolius linkage groups to corresponding chromosomes. Chromosomal identification using BAC fluorescence in situ hybridization identified two BAC clones as narrow-leafed lupin centromere-specific markers, which served as templates for preliminary studies of centromere composition within the genus. Bioinformatic analysis of these two BACs revealed that centromeric/pericentromeric regions of narrow-leafed lupin chromosomes consisted of simple sequence repeats ordered into tandem repeats containing the trinucleotide and pentanucleotide simple sequence repeats AGG and GATAC, structured into long arrays. Moreover, cross-genus microsynteny analysis revealed syntenic patterns of 31 single-locus BAC clones among several legume species. The gene and chromosome level findings provide evidence of ancient duplication events that must have occurred very early in the divergence of papilionoid lineages. This work provides a strong foundation for future comparative mapping among legumes and may facilitate understanding of mechanisms involved in shaping legume chromosomes.
Nakahata, Yasukazu; Yoshida, Mayumi; Takano, Atsuko; Soma, Haruhiko; Yamamoto, Takuro; Yasuda, Akio; Nakatsu, Toru; Takumi, Toru
2008-01-01
Background: The circadian expression of the mammalian clock genes is based on transcriptional feedback loops. Two basic helix-loop-helix (bHLH) PAS (for Period-Arnt-Sim) domain-containing transcriptional activators, CLOCK and BMAL1, are known to regulate gene expression by interacting with a promoter element termed the E-box (CACGTG). Non-canonical E-boxes or E-box-like sequences have also been reported to be necessary for circadian oscillation. Results: We report a new cis-element required for cell-autonomous circadian transcription of clock genes. This new element consists of a canonical or non-canonical E-box followed in tandem by an E-box-like sequence, with a short interval of 6 base pairs between them. We demonstrate that both the E-box and the E-box-like sequence are needed to generate cell-autonomous oscillation. We also verify that a spacer of constant length between these two E-elements is crucial for robust oscillation. Furthermore, by in silico analysis we conclude that several clock and clock-controlled genes possess a direct repeat of the E-box-like elements in their promoter region. Conclusion: We propose a novel possible mechanism for circadian transcriptional oscillation that is regulated by double E-box-like elements rather than by a single E-box. The direct repeat of the E-box-like elements identified in this study is the minimal required element for the generation of cell-autonomous transcriptional oscillation of clock and clock-controlled genes. PMID:18177499
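The element described above (two E-box-type hexamers separated by a fixed 6-bp spacer) lends itself to a simple scan, sketched here. Only the canonical E-box CACGTG is taken from the abstract; the function name, the motif parameters, and the idea of passing in candidate E-box-like hexamers are assumptions, since the abstract does not list the non-canonical sequences.

```python
def find_double_ebox(seq, first=("CACGTG",), second=("CACGTG",), spacer=6):
    """Return (start, first_motif, second_motif) for each tandem pair.

    A pair is a motif from `first`, then exactly `spacer` nucleotides,
    then a motif from `second`. Callers supply their own E-box-like
    hexamers; the defaults contain only the canonical E-box.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq)):
        for motif_a in first:
            if not seq.startswith(motif_a, i):
                continue
            # Position where the second element must begin.
            j = i + len(motif_a) + spacer
            for motif_b in second:
                if seq.startswith(motif_b, j):
                    hits.append((i, motif_a, motif_b))
    return hits


if __name__ == "__main__":
    # Toy promoter fragment with two canonical E-boxes 6 bp apart.
    print(find_double_ebox("TTCACGTGAAAAAACACGTGTT"))
```

Keeping the spacer as an explicit parameter makes it easy to check how many hits depend on the exact 6-bp interval.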
A review of bioinformatic methods for forensic DNA analyses.
Liu, Yao-Yuan; Harbison, SallyAnn
2018-03-01
Short tandem repeats, single nucleotide polymorphisms, and whole mitochondrial analyses are three classes of markers which will play an important role in the future of forensic DNA typing. The arrival of massively parallel sequencing platforms in forensic science reveals new information such as insights into the complexity and variability of the markers that were previously unseen, along with amounts of data too immense for analyses by manual means. Along with the sequencing chemistries employed, bioinformatic methods are required to process and interpret this new and extensive data. As more is learnt about the use of these new technologies for forensic applications, development and standardization of efficient, favourable tools for each stage of data processing is being carried out, and faster, more accurate methods that improve on the original approaches have been developed. As forensic laboratories search for the optimal pipeline of tools, sequencer manufacturers have incorporated pipelines into sequencer software to make analyses convenient. This review explores the current state of bioinformatic methods and tools used for the analyses of forensic markers sequenced on the massively parallel sequencing (MPS) platforms currently most widely used. Copyright © 2017 Elsevier B.V. All rights reserved.
Organization and evolution of highly repeated satellite DNA sequences in plant chromosomes.
Sharma, S; Raina, S N
2005-01-01
A major component of the plant nuclear genome is constituted by different classes of repetitive DNA sequences. The structural, functional and evolutionary aspects of the satellite repetitive DNA families, and their organization in the chromosomes is reviewed. The tandem satellite DNA sequences exhibit characteristic chromosomal locations, usually at subtelomeric and centromeric regions. The repetitive DNA family(ies) may be widely distributed in a taxonomic family or a genus, or may be specific for a species, genome or even a chromosome. They may acquire large-scale variations in their sequence and copy number over an evolutionary time-scale. These features have formed the basis of extensive utilization of repetitive sequences for taxonomic and phylogenetic studies. Hybrid polyploids have especially proven to be excellent models for studying the evolution of repetitive DNA sequences. Recent studies explicitly show that some repetitive DNA families localized at the telomeres and centromeres have acquired important structural and functional significance. The repetitive elements are under different evolutionary constraints as compared to the genes. Satellite DNA families are thought to arise de novo as a consequence of molecular mechanisms such as unequal crossing over, rolling circle amplification, replication slippage and mutation that constitute "molecular drive". Copyright 2005 S. Karger AG, Basel.
Genome Dynamics and Evolution of the Mla (Powdery Mildew) Resistance Locus in Barley
Wei, Fusheng; Wing, Rod A.; Wise, Roger P.
2002-01-01
Genes that confer defense against pathogens often are clustered in the genome and evolve via diverse mechanisms. To evaluate the organization and content of a major defense gene complex in cereals, we determined the complete sequence of a 261-kb BAC contig from barley cv Morex that spans the Mla (powdery mildew) resistance locus. Among the 32 predicted genes on this contig, 15 are associated with plant defense responses; 6 of these are associated with defense responses to powdery mildew disease but function in different signaling pathways. The Mla region is organized as three gene-rich islands separated by two nested complexes of transposable elements and a 45-kb gene-poor region. A heterochromatic-like region is positioned directly proximal to Mla and is composed of a gene-poor core with 17 families of diverse tandem repeats that overlap a hypermethylated, but transcriptionally active, gene-dense island. Paleontology analysis of long terminal repeat retrotransposons indicates that the present Mla region evolved over a period of >7 million years through a variety of duplication, inversion, and transposon-insertion events. Sequence-based recombination estimates indicate that R genes positioned adjacent to nested long terminal repeat retrotransposons, such as Mla, do not favor recombination as a means of diversification. We present a model for the evolution of the Mla region that encompasses several emerging features of large cereal genomes. PMID:12172030
Signal recognition particle RNA in dinoflagellates and the Perkinsid Perkinsus marinus.
Zhang, Huan; Campbell, David A; Sturm, Nancy R; Rosenblad, Magnus A; Dungan, Christopher F; Lin, Senjie
2013-09-01
In dinoflagellates and perkinsids, the molecular structure of the protein translocating machinery is unclear. Here, we identified several types of full-length signal recognition particle (SRP) RNA genes from Karenia brevis (dinoflagellate) and Perkinsus marinus (perkinsid). We also identified the four SRP S-domain proteins, but not the two Alu domain proteins, from P. marinus and several dinoflagellates. We mapped both ends of SRP RNA transcripts from K. brevis and P. marinus, and obtained the 3' end from four other dinoflagellates. The lengths of SRP RNA are predicted to be ∼260-300 nt in dinoflagellates and 280-285 nt in P. marinus. Although these SRP RNA sequences are substantially variable, the predicted structures are similar. The genomic organization of the SRP RNA gene differs among species. In K. brevis, this gene is located downstream of the spliced leader (SL) RNA, either as SL RNA-SRP RNA-tRNA gene tandem repeats, or within a SL RNA-SRP RNA-tRNA-U6-5S rRNA gene cluster. In other dinoflagellates, SRP RNA does not cluster with SL RNA or 5S rRNA genes. The majority of P. marinus SRP RNA genes array as tandem repeats without the above-mentioned small RNA genes. Our results capture a snapshot of a potentially complex evolutionary history of SRP RNA in alveolates. Copyright © 2013 Elsevier GmbH. All rights reserved.
Short tandem repeat analysis in Japanese population.
Hashiyada, M
2000-01-01
Short tandem repeats (STRs), also known as microsatellites, are among the most informative genetic markers for characterizing biological materials. Because of the relatively small size of STR alleles (generally 100-350 nucleotides), amplification by polymerase chain reaction (PCR) is relatively easy, affording a high sensitivity of detection. In addition, STR loci can be amplified simultaneously in a multiplex PCR. Thus, substantial information can be obtained in a single analysis, with the benefits of using less template DNA, reducing labor, and reducing the risk of contamination. We investigated 14 STR loci in a Japanese population living in Sendai using three multiplex PCR kits: GenePrint PowerPlex 1.1 and 2.2 Fluorescent STR Systems (Promega, Madison, WI, USA) and AmpFlSTR Profiler (Perkin-Elmer, Norwalk, CT, USA). Genomic DNA was extracted using sodium dodecyl sulfate (SDS) proteinase K or Chelex 100 treatment followed by phenol/chloroform extraction. PCR was performed according to the manufacturers' protocols. Electrophoresis was carried out on an ABI 377 sequencer and the alleles were determined by GeneScan 2.0.2 software (Perkin-Elmer). Across the 14 STR loci, statistical parameters indicated a relatively high rate, and no significant deviation from Hardy-Weinberg equilibrium was detected. We apply this STR system to paternity testing and forensic casework, e.g., personal identification in rape cases. This system is an effective tool in the forensic sciences to obtain information on individual identification.
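As a rough illustration of what agreement with Hardy-Weinberg equilibrium means for one STR locus, the sketch below compares observed heterozygosity with the heterozygosity expected from allele frequencies, H_e = 1 - sum(p_i^2). The abstract does not state which test was used, so this is only a descriptive check and not the authors' method; the function name and the genotypes are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    `genotypes` is a list of (allele, allele) pairs, one per person.
    Expected heterozygosity under Hardy-Weinberg proportions is
    1 - sum(p_i ** 2) over the allele frequencies p_i.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    observed = sum(1 for a, b in genotypes if a != b) / n
    return observed, expected


if __name__ == "__main__":
    # Hypothetical repeat-number genotypes, not data from the study.
    sample = [(10, 10), (10, 12), (12, 12), (10, 12)]
    print(heterozygosity(sample))
```

A formal assessment would use an exact or chi-square test on genotype counts; this sketch only shows the quantities being compared.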
McDowall, Kenneth J.; Thamchaipenet, Arinthip; Hunter, Iain S.
1999-01-01
Physiological studies have shown that Streptomyces rimosus produces the polyketide antibiotic oxytetracycline abundantly when its mycelial growth is limited by phosphate starvation. We show here that transcripts originating from the promoter for one of the biosynthetic genes, otcC (encoding anhydrotetracycline oxygenase), and from a promoter for the divergent otcX genes peak in abundance at the onset of antibiotic production induced by phosphate starvation, indicating that the synthesis of oxytetracycline is controlled, at least in part, at the level of transcription. Furthermore, analysis of the sequences of the promoters for otcC, otcX, and the polyketide synthase (otcY) genes revealed tandem repeats having significant similarity to the DNA-binding sites of ActII-Orf4 and DnrI, which are Streptomyces antibiotic regulatory proteins (SARPs) related to the OmpR family of transcription activators. Together, the above results suggest that oxytetracycline production by S. rimosus requires a SARP-like transcription factor that is either produced or activated or both under conditions of low phosphate concentrations. We also provide evidence consistent with the otrA resistance gene being cotranscribed with otcC as part of a polycistronic message, suggesting a simple mechanism of coordinate regulation which ensures that resistance to the antibiotic increases in proportion to production. PMID:10322002
Böhme, M U; Fritzsch, G; Tippmann, A; Schlegel, M; Berendonk, T U
2007-06-01
For the first time, the complete mitochondrial genome was sequenced for a member of Lacertidae. Lacerta viridis viridis was sequenced in order to compare the phylogenetic relationships of this family to other reptilian lineages. Using long polymerase chain reaction (long PCR), we characterized a mitochondrial genome 17,156 bp long, showing a typical vertebrate pattern with 13 protein-coding genes, 22 transfer RNAs (tRNA), two ribosomal RNAs (rRNA) and one major noncoding region. The noncoding region of L. v. viridis was characterized by a conspicuous 35 bp tandem repeat at its 5' terminus. A phylogenetic study including all currently available squamate mitochondrial sequences demonstrates the position of Lacertidae within a monophyletic squamate group. We obtained a close relationship of Lacertidae to Scincidae, Iguanidae, Varanidae, Anguidae, and Cordylidae. Although the internal relationships within this group yielded only weak resolution and low bootstrap support, the revealed relationships were more congruent with morphological studies than with recent molecular analyses.
Single Amino Acid Repeats in the Proteome World: Structural, Functional, and Evolutionary Insights
Kumar, Amitha Sampath; Sowpati, Divya Tej; Mishra, Rakesh K.
2016-01-01
Microsatellites or simple sequence repeats (SSRs) are abundant, highly diverse stretches of short DNA repeats present in all genomes. Tandem mono/tri/hexanucleotide repeats in the coding regions contribute to single amino acid repeats (SAARs) in the proteome. While SSRs in the coding region always result in amino acid repeats, a majority of SAARs arise from a combination of various codons representing the same amino acid and not as a consequence of SSR events. Certain amino acids are abundant in repeat regions, indicating a positive selection pressure behind the accumulation of SAARs. By analysing 22 proteomes including the human proteome, we explored the functional and structural relationship of amino acid repeats in an evolutionary context. Only ~15% of repeats are present in any known functional domain, while ~74% of repeats are present in disordered regions, suggesting that SAARs add to the functionality of proteins by providing flexibility and stability and by acting as linker elements between domains. Comparison of SAAR-containing proteins across species reveals that while shorter repeats are conserved among orthologs, proteins with longer repeats, >15 amino acids, are unique to the respective organism. Lysine repeats are well conserved among orthologs with respect to their length and number of occurrences in a protein. Repeats of other amino acids, such as glutamic acid, proline, serine and alanine, are generally conserved among the orthologs with varying repeat lengths. These findings suggest that SAARs have accumulated in the proteome under positive selection pressure and that they provide flexibility for optimal folding of functional/structural domains of proteins. The insights gained from our observations can help in the effective design and engineering of proteins with novel features. PMID:27893794
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chain, P; Garcia, E
2003-02-06
The goal of this proposed effort was to assess the difficulty in identifying and characterizing virulence candidate genes in an organism for which very limited data exist. This was accomplished by first addressing the finishing phase of draft-sequenced F. tularensis genomes and conducting comparative analyses to determine the coding potential of each genome, to discover the differences in genome structure and content, and to identify potential genes whose products may be involved in the F. tularensis virulence process. The project was divided into three parts: (1) Genome finishing: This part involves determining the order and orientation of the consensus sequences of contigs obtained from Phrap assemblies of random draft genomic sequences. This tedious process consists of linking contig ends using information embedded in each sequence file that relates the sequence to the original cloned insert. Since inserts are sequenced from both ends, we can establish a link between these paired ends in different contigs and thus order and orient contigs. Since these genomes carry numerous copies of insertion sequences, these repeated elements "confuse" the Phrap assembly program. It is thus necessary to break these contigs apart at the repeated sequences and individually join the proper flanking regions using paired-end information, or using results of comparisons against a similar genome. Larger repeated elements such as the small subunit ribosomal RNA operon require verification with PCR. Tandem repeats require manual intervention and typically rely on single nucleotide polymorphisms to be resolved. Remaining gaps require PCR reactions and sequencing. Once the genomes have been "closed", low quality regions are addressed by resequencing reactions. (2) Genome analysis: The final consensus sequences are processed by combining the results of three gene modelers: Glimmer, Critica and Generation.
The final gene models are submitted to a battery of homology searches and domain prediction programs in order to annotate them (e.g. BLAST, Pfam, TIGRfam, COG, KEGG, InterPro, TMhmm, SignalP). The genome structure is also assessed in terms of G+C content, GC bias (GC skew), and locations of repeated regions (e.g. IS elements) and phage-like genes. (3) Comparative genomics: The results of the various genome analyses are compared between the finished (or almost finished) genomes. Here, we have compared the F. tularensis genomes from the extremely lethal strain Schu4 (subsp. tularensis), the vaccine strain LVS (subsp. holartica), and strain UT01-4992 of the less virulent, opportunistic subsp. novicida. Regions present in the highly virulent strain that are absent from the other less virulent strains may provide insight into what factors are required for the high level of virulence.
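The G+C content and GC skew assessment mentioned in the genome-analysis step can be illustrated with a minimal sketch. It assumes the conventional definition of GC skew, (G - C) / (G + C), computed over fixed windows; the function names, window size and step are hypothetical and are not taken from the report.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq):
    """GC skew (G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def windowed_gc_skew(seq, window=1000, step=1000):
    """Return (start, skew) pairs for windows along the sequence.

    A sequence shorter than one window yields a single entry
    computed over the whole sequence.
    """
    last_start = max(len(seq) - window + 1, 1)
    return [(start, gc_skew(seq[start:start + window]))
            for start in range(0, last_start, step)]
```

Plotting the windowed values along a bacterial chromosome is a common way to look for the sign change in skew that is often associated with the replication origin and terminus, although whether the report used this exact windowing is not stated.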
Kelly, Laura J; Renny-Byfield, Simon; Pellicer, Jaume; Macas, Jiří; Novák, Petr; Neumann, Pavel; Lysak, Martin A; Day, Peter D; Berger, Madeleine; Fay, Michael F; Nichols, Richard A; Leitch, Andrew R; Leitch, Ilia J
2015-10-01
Plants exhibit an extraordinary range of genome sizes, varying by > 2000-fold between the smallest and largest recorded values. In the absence of polyploidy, changes in the amount of repetitive DNA (transposable elements and tandem repeats) are primarily responsible for genome size differences between species. However, there is ongoing debate regarding the relative importance of amplification of repetitive DNA versus its deletion in governing genome size. Using data from 454 sequencing, we analysed the most repetitive fraction of some of the largest known genomes for diploid plant species, from members of Fritillaria. We revealed that genomic expansion has not resulted from the recent massive amplification of just a handful of repeat families, as shown in species with smaller genomes. Instead, the bulk of these immense genomes is composed of highly heterogeneous, relatively low-abundance repeat-derived DNA, supporting a scenario where amplified repeats continually accumulate due to infrequent DNA removal. Our results indicate that a lack of deletion and low turnover of repetitive DNA are major contributors to the evolution of extremely large genomes and show that their size cannot simply be accounted for by the activity of a small number of high-abundance repeat families. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
Quantitative analysis of TALE-DNA interactions suggests polarity effects.
Meckler, Joshua F; Bhakta, Mital S; Kim, Moon-Soo; Ovadia, Robert; Habrian, Chris H; Zykovich, Artem; Yu, Abigail; Lockwood, Sarah H; Morbitzer, Robert; Elsäesser, Janett; Lahaye, Thomas; Segal, David J; Baldwin, Enoch P
2013-04-01
Transcription activator-like effectors (TALEs) have revolutionized the field of genome engineering. We present here a systematic assessment of TALE DNA recognition, using quantitative electrophoretic mobility shift assays and reporter gene activation assays. Within TALE proteins, tandem 34-amino acid repeats recognize one base pair each and direct sequence-specific DNA binding through repeat variable di-residues (RVDs). We found that RVD choice can affect affinity by four orders of magnitude, with the relative RVD contribution in the order NG > HD ≈ NN > NI > NK. The NN repeat preferred the base G over A, whereas the NK repeat bound G with 10^3-fold lower affinity. We compared AvrBs3, a naturally occurring TALE that recognizes its target using some atypical RVD-base combinations, with a designed TALE that precisely matches 'standard' RVDs with the target bases. This comparison revealed unexpected differences in sensitivity to substitutions of the invariant 5'-T. Another surprising observation was that base mismatches at the 5' end of the target site had more disruptive effects on affinity than those at the 3' end, particularly in designed TALEs. These results provide evidence that TALE-DNA recognition exhibits a hitherto undescribed polarity effect, in which the N-terminal repeats contribute more to affinity than C-terminal ones.
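The one-repeat-per-base recognition described above can be sketched as a simple lookup. The mapping below is the commonly cited 'standard' RVD code (NI to A, HD to C, NG to T, NN and NK to G); it is an assumption for illustration and is not a table from this abstract, which only states the NN and NK preferences for G. The function name is hypothetical, and the model ignores the affinity and polarity effects the study reports.

```python
# Commonly cited 'standard' RVD-to-base code (assumed here for illustration).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G", "NK": "G"}


def predicted_target(rvds, include_5prime_t=True):
    """Predict the DNA target of a TALE from its ordered list of RVDs.

    Each repeat is assumed to recognize exactly one base; the invariant
    5'-T that precedes the repeat-specified bases is prepended by default.
    """
    try:
        bases = "".join(RVD_TO_BASE[rvd.upper()] for rvd in rvds)
    except KeyError as exc:
        raise ValueError(f"unknown RVD: {exc.args[0]}") from None
    return "T" + bases if include_5prime_t else bases
```

For example, `predicted_target(["NI", "HD", "NG", "NN"])` gives `"TACTG"` under this assumed code.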
Cassandra retrotransposons carry independently transcribed 5S RNA
Kalendar, Ruslan; Tanskanen, Jaakko; Chang, Wei; Antonius, Kristiina; Sela, Hanan; Peleg, Ofer; Schulman, Alan H.
2008-01-01
We report a group of TRIMs (terminal-repeat retrotransposons in miniature), which are small nonautonomous retrotransposons. These elements, named Cassandra, universally carry conserved 5S RNA sequences and associated RNA polymerase (pol) III promoters and terminators in their long terminal repeats (LTRs). They were found in all vascular plants investigated. Uniquely for LTR retrotransposons, Cassandra produces noncapped, polyadenylated transcripts from the 5S pol III promoter. Capped, read-through transcripts containing Cassandra sequences can also be detected in RNA and in EST databases. The predicted Cassandra RNA 5S secondary structures resemble those for cellular 5S rRNA, with high information content specifically in the pol III promoter region. Genic integration sites are common for Cassandra, an unusual feature for abundant retrotransposons. The 5S in each LTR produces a tandem 5S arrangement with an inter-5S spacing resembling that of cellular 5S. The distribution of 5S genes is very variable in flowering plants and may be partially explained by Cassandra activity. Cassandra thus appears both to have adapted a ubiquitous cellular gene for ribosomal RNA for use as a promoter and to parasitize an as-yet-unidentified group of retrotransposons for the proteins needed in its lifecycle. PMID:18408163
Ait-Arkoub, Zaïna; Voujon, Delphine; Deback, Claire; Abrao, Emiliana P.; Agut, Henri; Boutolleau, David
2013-01-01
The complete 154-kbp linear double-stranded genomic DNA sequence of herpes simplex virus 2 (HSV-2), consisting of two extended regions of unique sequences bounded by a pair of inverted repeat elements, was published in 1998 and has since been employed in a wide range of studies. Scattered throughout the HSV-2 genome are 150 microsatellites (also referred to as short tandem repeats) of 1- to 6-nucleotide motifs, mainly distributed in noncoding regions. Microsatellites are considered reliable markers for genetic mapping to differentiate herpesvirus strains, as shown for cytomegalovirus and HSV-1. The aim of this work was to characterize 12 polymorphic microsatellites within the HSV-2 genome by use of 3 multiplex PCR assays in combination with length polymorphism analysis for the rapid genetic differentiation of 56 HSV-2 clinical isolates and 2 HSV-2 laboratory strains (gHSV-2 and MS). This new system was applied to a specific new HSV-2 variant recently identified in HIV-1-infected patients originating from West Africa. Our results confirm that microsatellite polymorphism analysis is an accurate tool for studying the epidemiology of HSV-2 infections. PMID:23966512
Marzo, Mar; Liu, Danxu; Ruiz, Alfredo; Chalmers, Ronald
2013-01-01
Galileo is a DNA transposon responsible for the generation of several chromosomal inversions in Drosophila. In contrast to other members of the P-element superfamily, it has unusually long terminal inverted-repeats (TIRs) that resemble those of Foldback elements. To investigate the function of the long TIRs we derived consensus and ancestral sequences for the Galileo transposase in three species of Drosophilids. Following gene synthesis, we expressed and purified their constituent THAP domains and tested their binding activity towards the respective Galileo TIRs. DNase I footprinting located the most proximal DNA binding site about 70 bp from the transposon end. Using this sequence we identified further binding sites in the tandem repeats that are found within the long TIRs. This suggests that the synaptic complex between Galileo ends may be a complicated structure containing higher-order multimers of the transposase. We also attempted to reconstitute Galileo transposition in Drosophila embryos but no events were detected. Thus, although the limited numbers of Galileo copies in each genome were sufficient to provide functional consensus sequences for the THAP domains, they do not specify a fully active transposase. Since the THAP recognition sequence is short, and will occur many times in a large genome, it seems likely that the multiple binding sites within the long, internally repetitive, TIRs of Galileo and other Foldback-like elements may provide the transposase with its binding specificity. PMID:23648487
Linking maternal and somatic 5S rRNA types with different sequence-specific non-LTR retrotransposons
Pagano, Johanna F.B.; Ensink, Wim A.; van Olst, Marina; van Leeuwen, Selina; Nehrdich, Ulrike; Zhu, Kongju; Spaink, Herman P.; Girard, Geneviève; Rauwerda, Han; Jonker, Martijs J.; Dekker, Rob J.
2017-01-01
5S rRNA is a ribosomal core component, transcribed from many gene copies organized in genomic repeats. Some eukaryotic species have two 5S rRNA types defined by their predominant expression in oogenesis or adult tissue. Our next-generation sequencing study on zebrafish egg, embryo, and adult tissue identified maternal-type 5S rRNA that is exclusively accumulated during oogenesis, replaced throughout the embryogenesis by a somatic-type, and thus virtually absent in adult somatic tissue. The maternal-type 5S rDNA contains several thousands of gene copies on chromosome 4 in tandem repeats with small intergenic regions, whereas the somatic-type is present in only 12 gene copies on chromosome 18 with large intergenic regions. The nine-nucleotide variation between the two 5S rRNA types likely affects TFIII binding and riboprotein L5 binding, probably leading to storage of maternal-type rRNA. Remarkably, these sequence differences are located exactly at the sequence-specific target site for genome integration by the 5S rRNA-specific Mutsu retrotransposon family. Thus, we could define maternal- and somatic-type MutsuDr subfamilies. Furthermore, we identified four additional maternal-type and two new somatic-type MutsuDr subfamilies, each with their own target sequence. This target-site specificity, frequently intact maternal-type retrotransposon elements, plus specific presence of Mutsu retrotransposon RNA and piRNA in egg and adult tissue, suggest an involvement of retrotransposons in achieving the differential copy number of the two types of 5S rDNA loci. PMID:28003516
Schliemann, Sibylle; Schmidt, Christina; Elsner, Peter
2014-01-01
The objective of our study was to investigate the tandem irritation potential of two organic solvents with concurrent exposure to the hydrophilic detergent irritant sodium lauryl sulphate (SLS). A tandem repeated irritation test was performed with two undiluted organic solvents, cumene (C) and octane (O), with either alternating application with SLS 0.5% or twice daily application of each irritant alone in 27 volunteers on the skin of the back. The cumulative irritation induced over 4 days was quantified using visual scoring and non-invasive bioengineering measurements (skin colour reflectance, skin hydration and transepidermal water loss). Repeated application of C/SLS and O/SLS induced more decline of stratum corneum hydration and higher degrees of clinical irritation and erythema compared to each irritant alone. Our results demonstrate a further example of additive harmful skin effects induced by particular skin irritants and indicate that exposure to organic solvents together with detergents may increase the risk of acquiring occupational contact dermatitis. © 2014 S. Karger AG, Basel.
Thermal denaturation of the BRCT tandem repeat region of human tumour suppressor gene product BRCA1.
Pyrpassopoulos, Serapion; Ladopoulou, Angela; Vlassi, Metaxia; Papanikolau, Yannis; Vorgias, Constantinos E; Yannoukakos, Drakoulis; Nounesis, George
2005-04-01
Reduced stability of the tandem BRCT domains of human BReast CAncer 1 (BRCA1) due to missense mutations may be critical for loss of function in DNA repair and damage-induced checkpoint control. In the present thermal denaturation study of the BRCA1 BRCT region, high-precision differential scanning calorimetry (DSC) and circular dichroism (CD) spectroscopy provide evidence for the existence of a denatured state that is structurally very similar to the native. Consistency between theoretical structure-based estimates of the enthalpy (DeltaH) and heat capacity change (DeltaCp) and the calorimetric results is obtained when considering partial thermal unfolding contained in the region of the conserved hydrophobic pocket formed at the interface of the two BRCT repeats. The structural integrity of this region has been shown to be crucial for the interaction of BRCA1 with phosphorylated peptides. In addition, cancer-causing missense mutations located at the inter-BRCT-repeat interface have been linked to the destabilization of the tandem BRCT structure.
Molecular typing of Chinese Streptococcus pyogenes isolates.
You, Yuanhai; Wang, Haibin; Bi, Zhenwang; Walker, Mark; Peng, Xianhui; Hu, Bin; Zhou, Haijian; Song, Yanyan; Tao, Xiaoxia; Kou, Zengqiang; Meng, Fanliang; Zhang, Menghan; Bi, Zhenqiang; Luo, Fengji; Zhang, Jianzhong
2015-06-01
Streptococcus pyogenes causes human infections ranging from mild pharyngitis and impetigo to serious diseases including necrotizing fasciitis and streptococcal toxic shock syndrome. The objective of this study was to compare molecular emm typing and pulsed field gel electrophoresis (PFGE) with multiple-locus variable-number tandem-repeat analysis (MLVA) for genotyping of Chinese S. pyogenes isolates. Molecular emm typing and PFGE were performed using standard protocols. Seven variable number tandem repeat (VNTR) loci reported in a previous study were used to genotype 169 S. pyogenes geographically-diverse isolates from China isolated from a variety of disease syndromes. Multiple-locus variable-number tandem-repeat analysis provided greater discrimination between isolates when compared to emm typing and PFGE. Removal of a single VNTR locus (Spy2) reduced the sensitivity by only 0.7%, which suggests that Spy2 was not informative for the isolates screened. The results presented support the use of MLVA as a powerful epidemiological tool for genotyping S. pyogenes clinical isolates. Copyright © 2015 Elsevier Ltd. All rights reserved.
Full-length model of the human galectin-4 and insights into dynamics of inter-domain communication
NASA Astrophysics Data System (ADS)
Rustiguel, Joane K.; Soares, Ricardo O. S.; Meisburger, Steve P.; Davis, Katherine M.; Malzbender, Kristina L.; Ando, Nozomi; Dias-Baruffi, Marcelo; Nonato, Maria Cristina
2016-09-01
Galectins are proteins involved in diverse cellular contexts due to their capacity to decipher and respond to the information encoded by β-galactoside sugars. In particular, human galectin-4, normally expressed in the healthy gastrointestinal tract, displays differential expression in cancerous tissues and is considered a potential drug target for liver and lung cancer. Galectin-4 is a tandem-repeat galectin characterized by two carbohydrate recognition domains connected by a linker-peptide. Despite their relevance to cell function and pathogenesis, structural characterization of full-length tandem-repeat galectins has remained elusive. Here, we investigate galectin-4 using X-ray crystallography, small- and wide-angle X-ray scattering, molecular modelling, molecular dynamics simulations, and differential scanning fluorimetry assays and describe for the first time a structural model for human galectin-4. Our results provide insight into the structural role of the linker-peptide and shed light on the dynamic characteristics of the mechanism of carbohydrate recognition among tandem-repeat galectins.
Cho, Seongbeom; Boxrud, David J; Bartkus, Joanne M; Whittam, Thomas S; Saeed, Mahdi
2007-01-01
Simplified multiple-locus variable-number tandem repeat analysis (MLVA) was developed using one-shot multiplex PCR for seven variable-number tandem repeats (VNTR) markers with high diversity capacity. MLVA, phage typing, and PFGE methods were applied on 34 diverse Salmonella Enteritidis isolates from human and non-human sources. MLVA detected allelic variations that helped to classify the S. Enteritidis isolates into more evenly distributed subtypes than other methods. MLVA-based S. Enteritidis clonal groups were largely associated with sources of the isolates. Nei's diversity indices for polymorphism ranged from 0.25 to 0.70 for seven VNTR loci markers. Based on Simpson's and Shannon's diversity indices, MLVA had a higher discriminatory power than pulsed field gel electrophoresis (PFGE), phage typing, or multilocus enzyme electrophoresis. Therefore, MLVA may be used along with PFGE to enhance the effectiveness of the molecular epidemiologic investigation of S. Enteritidis infections. PMID:17692097
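The diversity indices named above can be sketched as follows. This is a minimal illustration that assumes the usual textbook forms: Nei's diversity as 1 minus the sum of squared allele frequencies, Simpson's index in the unbiased form often used for typing methods, and Shannon's index with natural logarithms. The abstract does not state which variants the authors used, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def nei_diversity(alleles):
    """Nei's diversity for one locus: 1 - sum(p_i^2) over allele frequencies."""
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())


def simpson_diversity(types):
    """Simpson's index of diversity: 1 - sum(n_j(n_j - 1)) / (N(N - 1)).

    Interpreted as the probability that two isolates drawn without
    replacement belong to different types.
    """
    n = len(types)
    if n < 2:
        raise ValueError("at least two isolates are required")
    same = sum(c * (c - 1) for c in Counter(types).values())
    return 1.0 - same / (n * (n - 1))


def shannon_diversity(types):
    """Shannon's index H = -sum(p_j * ln p_j) over type frequencies."""
    n = len(types)
    return -sum((c / n) * log(c / n) for c in Counter(types).values())
```

Each function takes a flat list of labels, for example one VNTR allele call per isolate for `nei_diversity`, or one MLVA or PFGE type per isolate for the other two, so the same isolate set can be scored under each typing method and compared.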
JGI Plant Genomics Gene Annotation Pipeline
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shu, Shengqiang; Rokhsar, Dan; Goodstein, David
2014-07-14
Plant genomes vary in size and are highly complex, with a high amount of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying an organism, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, and their transcriptomes have been sequenced as well. To use these vast amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for chlamy, which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.
A retrotransposable element from the mosquito Anopheles gambiae.
Besansky, N J
1990-01-01
A family of middle repetitive elements from the African malaria vector Anopheles gambiae is described. Approximately 100 copies of the element, designated T1Ag, are dispersed in the genome. Full-length elements are 4.6 kilobase pairs in length, but truncation of the 5' end is common. Nucleotide sequences of one full-length, two 5'-truncated, and two 5' ends of T1Ag elements were determined and aligned to define a consensus sequence. Sequence analysis revealed two long, overlapping open reading frames followed by a polyadenylation signal, AATAAA, and a tail consisting of tandem repetitions of the motif TGAAA. No direct or inverted long terminal repeats (LTRs) were detected. The first open reading frame, 442 amino acids in length, includes a domain resembling that of nucleic acid-binding proteins. The second open reading frame, 975 amino acids long, resembles the reverse transcriptases of a category of retrotransposable elements without LTRs, variously termed class II retrotransposons, class III elements or non-LTR retrotransposons. Similarity at the sequence and structural levels places T1Ag in this category. PMID:1689457
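A tail made of tandem repetitions of a known motif, such as the TGAAA tail described above, can be located with a short regular-expression scan. This is a minimal sketch that only finds exact, uninterrupted copies; the function name is hypothetical, and it is not the method used in the paper.

```python
import re


def longest_tandem_run(seq, motif):
    """Find the longest run of exact tandem copies of `motif` in `seq`.

    Returns (start, copies), where start is the 0-based position of the
    run, or (None, 0) if the motif does not occur. Ties go to the
    leftmost run.
    """
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    best = max(pattern.finditer(seq.upper()),
               key=lambda m: m.end() - m.start(),
               default=None)
    if best is None:
        return (None, 0)
    return (best.start(), (best.end() - best.start()) // len(motif))
```

Because the match is exact, a single substitution inside the tail splits it into two shorter runs; tolerant repeat finders handle such degenerate copies, which this sketch does not attempt.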
Satellite DNA in Plants: More than Just Rubbish.
Garrido-Ramos, Manuel A
2015-01-01
For decades, satellite DNAs have been the hidden part of genomes. Initially considered as junk DNA, there is currently an increasing appreciation of the functional significance of satellite DNA repeats and of their sequences. Satellite DNA families accumulate in the heterochromatin in different parts of the eukaryotic chromosomes, mainly in pericentromeric and subtelomeric regions, but they also span the functional centromere. Tandem repeat sequences may spread from subtelomeric to interstitial loci, leading to the formation of chromosome-specific loci or to the accumulation in equilocal sites in different chromosomes. They also appear as the main components of the heterochromatin in the sex-specific region of sex chromosomes. Satellite DNA, required for chromosome organization, also plays a role in pairing and segregation. Some satellite repeats are transcribed and can participate in the formation and maintenance of heterochromatin structure and in the modulation of gene expression. In addition to the identification of the different satellite DNA families, their characteristics and location, we are interested in determining their impact on the genomes, by identifying the mechanisms leading to their appearance and amplification as well as in understanding how they change over time, the factors affecting these changes, and the influence exerted by the evolutionary history of the organisms. On the other hand, satellite DNA sequences are rapidly evolving sequences that may cause reproductive barriers between organisms and promote speciation. The accumulation of experimental data collected in recent years and the emergence of new approaches based on next-generation sequencing and high-throughput genome analysis are opening new perspectives that are changing our understanding of satellite DNA. 
This review examines recent data to provide a timely update on the overall information gathered about this part of the genome, focusing on the advances in the knowledge of its origin, its evolution, and its potential functional roles. © 2015 S. Karger AG, Basel.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deidda, G.; Grisanti, P.; Vigneti, E.
1994-09-01
The gene for facioscapulohumeral muscular dystrophy (FSHD) has been localized by linkage analysis to the 4q35 region. The most telomeric probe, p13E-11, has been shown to detect 4q35 DNA rearrangements in both sporadic and familial cases of the disease. With the aim of constructing a detailed physical map of the 4q35 region and searching for the mutant gene, we used the p13E-11 probe to isolate cosmid clones from a human genomic library in a pCos-EMBL 2 vector. Two positive clones were isolated, clones 3 and 5, which partially overlap and carry human genomic inserts of 42 and 45 kb, respectively. The cosmids share a common region containing the p13E-11 region and a stretch of KpnI units consisting of 3.2 kb tandemly repeated sequences (about 10). The restriction maps were constructed using the following enzymes: BamHI, BglII, EcoRI, EcoRV, KpnI and SfiI. Clone 3 extends 4 kb upstream of C5 and stops within the Kpn repeats. Clone 5 extends 4 kb downstream from the Kpn repeats and presents an additional EcoRI site. Clone 5 contains a stretch of Kpn sequences of nearly 32 kb, corresponding to 10 Kpn repeats; clone 3 contains a stretch of 29 kb corresponding to 9 Kpn repeats, as determined by PFGE analysis of partial digestion of the clones. Clone 5 seems to contain the entire EcoRI region prone to rearrangements in FSHD patients. From clone 5 several subclones were obtained, from the Kpn region and from the region spanning from the last Kpn repeat to the cloning site. No single copy sequences were detected. Subclones from the 3' end region contain beta-satellite or Sau3A-like sequences. In situ hybridization with the whole C5 cosmid shows hybridization signals at the tip of chromosome 4 (4q35) and chromosome 10 (10q26), in the pericentromeric region of chromosome 1 (1q12) and in the p12 region of the acrocentric chromosomes (chr. 21, 22, 13, 14, 15).
Hu, Yaqin; Li, Shan; Li, Junhui; Ye, Xingqian; Ding, Tian; Liu, Donghong; Chen, Jianchu; Ge, Zhiwei; Chen, Shiguo
2015-12-10
Sea cucumber fucoidan is a major bioactive component of sea cucumber. The structures of fucoidans have significant influences on their biological activities. The present study clarified the fine structure of a fucoidan from Pearsonothuria graeffei. Fucoidan was obtained after papain digestion and purified by ion chromatography. The carbohydrate sequence of the fucoidan was first determined by negative-ion electrospray tandem mass spectrometry (ES-MS) with collision-induced dissociation of the oligosaccharide fragments, which were obtained by mild acid hydrolysis, and completed by NMR for assignment of the anomeric conformation. It was unambiguously identified as a tetrasaccharide repeating unit with a backbone of [→3Fuc(2S,4S)α1→3Fucα1→3Fuc(4S)α1→3Fucα1→]n. The glycosidic bonds between the non-sulfated and 2,4-O-disulfated fucose residues were selectively cleaved, and highly ordered oligosaccharide fragments with a tetrasaccharide repeating unit were obtained. The highly 4-O- and 2,4-di-O-sulfated polysaccharide deserves further development for pharmaceutical use. Copyright © 2015 Elsevier Ltd. All rights reserved.
Topological characteristics of helical repeat proteins.
Groves, M R; Barford, D
1999-06-01
The recent elucidation of protein structures based upon repeating amino acid motifs, including the armadillo motif, the HEAT motif and tetratricopeptide repeats, reveals that they belong to the class of helical repeat proteins. These proteins share the common property of being assembled from tandem repeats of an alpha-helical structural unit, creating extended superhelical structures that are ideally suited to create a protein recognition interface.
Voelker, Toni A.; Staswick, Paul; Chrispeels, Maarten J.
1986-01-01
Phytohemagglutinin (PHA), the seed lectin of the common bean, Phaseolus vulgaris, is encoded by two highly homologous, tandemly linked genes, dlec1 and dlec2, which are coordinately expressed at high levels in developing cotyledons. Their respective transcripts translate into closely related polypeptides, PHA-E and PHA-L, constituents of the tetrameric lectin which accumulates at high levels in developing seeds. In the bean cultivar Pinto UI111, PHA-E is not detectable, and PHA-L accumulates at very reduced levels. To investigate the cause of the Pinto phenotype, we cloned and sequenced the two PHA genes of Pinto, called Pdlec1 and Pdlec2, and determined the abundance of their respective mRNAs in developing cotyledons. Both genes are more than 90% homologous to the normal PHA genes found in other cultivars. Pdlec1 carries a 1-bp frameshift mutation close to the 5' end of its coding sequence. Only very truncated polypeptides could be made from its mRNA. The gene Pdlec2 encodes a polypeptide which resembles PHA-L, and its predicted amino acid sequence agrees with the available Pinto PHA amino acid sequence data. Analysis of the mRNA of developing cotyledons revealed that the Pdlec1 message is reduced 600-fold, and Pdlec2 mRNA is reduced 20-fold with respect to mRNA levels in normal cultivars. A comparison of the sequences which are upstream from the coding sequence shows that Pdlec2 has a 100-bp deletion compared to the other genes (dlec1, dlec2 and Pdlec1). This deletion, which contains a large tandem repeat, may be responsible for the low level of expression of Pdlec2. The very low expression of Pdlec1 is as yet unexplained. PMID:16453730
Sidorenko, Lyudmila; Dorweiler, Jane E; Cigan, A Mark; Arteaga-Vazquez, Mario; Vyas, Meenal; Kermicle, Jerry; Jurcin, Diane; Brzeski, Jan; Cai, Yu; Chandler, Vicki L
2009-11-01
Paramutation involves homologous sequence communication that leads to meiotically heritable transcriptional silencing. We demonstrate that mop2 (mediator of paramutation2), which alters paramutation at multiple loci, encodes a gene similar to Arabidopsis NRPD2/E2, the second-largest subunit of plant-specific RNA polymerases IV and V. In Arabidopsis, Pol-IV and Pol-V play major roles in RNA-mediated silencing and a single second-largest subunit is shared between Pol-IV and Pol-V. Maize encodes three second-largest subunit genes: all three genes potentially encode full length proteins with highly conserved polymerase domains, and each is expressed in multiple overlapping tissues. The isolation of a recessive paramutation mutation in mop2 from a forward genetic screen suggests limited or no functional redundancy of these three genes. Potential alternative Pol-IV/Pol-V-like complexes could provide maize with a greater diversification of RNA-mediated transcriptional silencing machinery relative to Arabidopsis. Mop2-1 disrupts paramutation at multiple loci when heterozygous, whereas previously silenced alleles are only up-regulated when Mop2-1 is homozygous. The dramatic reduction in b1 tandem repeat siRNAs, but no disruption of silencing in Mop2-1 heterozygotes, suggests the major role for tandem repeat siRNAs is not to maintain silencing. Instead, we hypothesize the tandem repeat siRNAs mediate the establishment of the heritable silent state, a process fully disrupted in Mop2-1 heterozygotes. The dominant Mop2-1 mutation, which has a single nucleotide change in a domain highly conserved among all polymerases (E. coli to eukaryotes), disrupts both siRNA biogenesis (Pol-IV-like) and potentially processes downstream (Pol-V-like). These results suggest either the wild-type protein is a subunit in both complexes or the dominant mutant protein disrupts both complexes. Dominant mutations in the same domain in E. coli RNA polymerase suggest a model for Mop2-1 dominance: complexes containing Mop2-1 subunits are non-functional and compete with wild-type complexes.
Using long ssDNA polynucleotides to amplify STRs loci in degraded DNA samples
Pérez Santángelo, Agustín; Corti Bielsa, Rodrigo M.; Sala, Andrea; Ginart, Santiago; Corach, Daniel
2017-01-01
Obtaining informative short tandem repeat (STR) profiles from degraded DNA samples is a challenging task usually undermined by locus or allele dropouts and peak-height imbalances observed in capillary electrophoresis (CE) electropherograms, especially for those markers with large amplicon sizes. We hereby show that the current STR assays may be greatly improved for the detection of genetic markers in degraded DNA samples by using long single stranded DNA polynucleotides (ssDNA polynucleotides) as surrogates for PCR primers. These long primers allow a closer annealing to the repeat sequences, thereby reducing the length of the template required for the amplification in fragmented DNA samples, while at the same time rendering amplicons of larger sizes suitable for multiplex assays. We also demonstrate that the annealing of long ssDNA polynucleotides does not need to be fully complementary in the 5’ region of the primers, thus allowing for the design of practically any long primer sequence for developing new multiplex assays. Furthermore, genotyping of intact DNA samples could also benefit from utilizing long primers since their close annealing to the target STR sequences may overcome wrong profiling generated by insertions/deletions present between the STR region and the annealing site of the primers. Additionally, long ssDNA polynucleotides might be utilized in multiplex PCR assays for other types of degraded or fragmented DNA, e.g. circulating, cell-free DNA (ccfDNA). PMID:29099837
Evolution of Transcription Activator-Like Effectors in Xanthomonas oryzae
Erkes, Annett; Reschke, Maik; Boch, Jens
2017-01-01
Transcription activator-like effectors (TALEs) are secreted by plant–pathogenic Xanthomonas bacteria into plant cells where they act as transcriptional activators and, hence, are major drivers in reprogramming the plant for the benefit of the pathogen. TALEs possess a highly repetitive DNA-binding domain of typically 34 amino acid (AA) tandem repeats, where AA 12 and 13, termed repeat variable di-residue (RVD), determine target specificity. Different Xanthomonas strains possess different repertoires of TALEs. Here, we study the evolution of TALEs from the level of RVDs determining target specificity down to the level of DNA sequence with focus on rice-pathogenic Xanthomonas oryzae pv. oryzae (Xoo) and Xanthomonas oryzae pv. oryzicola (Xoc) strains. We observe that codon pairs coding for individual RVDs are conserved to a similar degree as the flanking repeat sequence. We find strong indications that TALEs may evolve 1) by base substitutions in codon pairs coding for RVDs, 2) by recombination of N-terminal or C-terminal regions of existing TALEs, or 3) by deletion of individual TALE repeats, and we propose possible mechanisms. We find indications that the reassortment of TALE genes in clusters is mediated by an integron-like mechanism in Xoc. We finally study the effect of the presence/absence and evolutionary modifications of TALEs on transcriptional activation of putative target genes in rice, and find that even single RVD swaps may lead to considerable differences in activation. This correlation allowed a refined prediction of TALE targets, which is the crucial step to decipher their virulence activity. PMID:28637323
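The repeat layout described in this abstract (34-AA tandem repeats with the RVD at positions 12 and 13) can be illustrated with a short Python sketch. It is only a minimal illustration under simplifying assumptions: the function name `extract_rvds` is hypothetical, the input is assumed to be the repeat region alone, and every repeat is assumed to be exactly 34 residues long, which real TALE repeats do not always satisfy.

```python
def extract_rvds(repeat_region: str, repeat_len: int = 34) -> list[str]:
    """Return residues 12-13 (1-based) of each full-length repeat.

    Assumption: repeats are contiguous and all exactly `repeat_len` long.
    A trailing partial repeat is ignored.
    """
    rvds = []
    for start in range(0, len(repeat_region) - repeat_len + 1, repeat_len):
        repeat = repeat_region[start:start + repeat_len]
        rvds.append(repeat[11:13])  # 0-based slice for positions 12 and 13
    return rvds


if __name__ == "__main__":
    # Synthetic repeats (not real TALE sequence): 11 filler residues,
    # a 2-residue RVD, then 21 filler residues, for 34 in total.
    def make(rvd: str) -> str:
        return "A" * 11 + rvd + "A" * 21

    region = make("HD") + make("NG") + "A" * 20  # last 20 AA: partial repeat
    print(extract_rvds(region))
```

A real analysis would first locate the repeat boundaries by alignment rather than assume a fixed stride.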
Ordered mapping of 3 alphoid DNA subsets on human chromosome 22
DOE Office of Scientific and Technical Information (OSTI.GOV)
Antonacci, R.; Baldini, A.; Archidiacono, N.
1994-09-01
Alpha satellite DNA consists of tandemly repeated monomers of 171 bp clustered in the centromeric region of primate chromosomes. Sequence divergence between subsets located in different human chromosomes is usually high enough to ensure chromosome-specific hybridization. Alphoid probes specific for almost every human chromosome have been reported. A single chromosome can carry different subsets of alphoid DNA and some alphoid subsets can be shared by different chromosomes. We report the physical order of three alphoid DNA subsets on human chromosome 22 determined by a combination of low and high resolution cytological mapping methods. Results visually demonstrate the presence of three distinct alphoid DNA domains at the centromeric region of chromosome 22. We have measured the interphase distances between the three probes in three-color FISH experiments. Statistical analysis of the results indicated the order of the subsets. Two color experiments on prometaphase chromosomes established the order of the three domains relative to the arms of chromosome 22 and confirmed the results obtained using interphase mapping. This demonstrates the applicability of interphase mapping for alpha satellite DNA ordering. However, in our experiments, interphase mapping did not provide any information about the relationship between extremities of the repeat arrays. This information was gained from extended chromatin hybridization. The extremities of two of the repeat arrays were seen to be almost overlapping whereas the third repeat array was clearly separated from the other two. Our data show the value of extended chromatin hybridization as a complement of other cytological techniques for high resolution mapping of repetitive DNA sequences.
Takemura, Yuzuru; Miyachi, Hayato; Skelton, Lorraine; Jackman, Ann L.
1995-01-01
One of the resistance mechanisms to folate‐based thymidylate synthase (TS) inhibitors is the increase in TS activity in tumor cells. Human B lymphoblastoid cell line (W1L2) was made resistant to a lipophilic non‐polyglutamatable TS inhibitor (ZM249148), and the subline (W1L2:R179) showed a 20‐fold increase in TS enzyme activity with concomitant overexpression of TS mRNA. To overcome the resistance, we designed a ribozyme that can cleave the CUC sequences in a triple tandemly repeated sequence of TS mRNA. Expression of this ribozyme in W1L2:R179 cells transfected with Epstein Barr virus‐based expression vector resulted in sensitization to TS inhibitors concomitantly with a decrease of TS expression. The ribozyme expressed in transfectants was shown to be functional in cleaving artificial TS RNA in vitro. PMID:8567390
Ford, Laura; Wang, Qinning; Stafford, Russell; Ressler, Kelly-Anne; Norton, Sophie; Shadbolt, Craig; Hope, Kirsty; Franklin, Neil; Krsteski, Radomir; Carswell, Adrienne; Carter, Glen P; Seemann, Torsten; Howard, Peter; Valcanis, Mary; Castillo, Cristina Fabiola Sotomayor; Bates, John; Glass, Kathryn; Williamson, Deborah A; Sintchenko, Vitali; Howden, Benjamin P; Kirk, Martyn D
2018-05-01
Salmonella Typhimurium is a common cause of foodborne illness in Australia. We report on seven outbreaks of Salmonella Typhimurium multilocus variable-number tandem-repeat analysis (MLVA) 03-26-13-08-523 (European convention 2-24-12-7-0212) in three Australian states and territories investigated between November 2015 and March 2016. We identified a common egg grading facility in five of the outbreaks. While no Salmonella Typhimurium was detected at the grading facility and eggs could not be traced back to a particular farm, whole genome sequencing (WGS) of isolates from cases from all seven outbreaks indicated a common source. WGS was able to provide higher discriminatory power than MLVA and will likely link more Salmonella Typhimurium cases between states and territories in the future. National harmonization of Salmonella surveillance is important for effective implementation of WGS for Salmonella outbreak investigations.
The role of the environment in transmission of Dichelobacter nodosus between ewes and their lambs
Muzafar, Mohd; Calvo-Bado, Leo A.; Green, Laura E.; Smith, Edward M.; Russell, Claire L.; Grogono-Thomas, Rose; Wellington, Elizabeth M.H.
2015-01-01
Dichelobacter nodosus (D. nodosus) is the essential causative agent of footrot in sheep. The current study investigated when D. nodosus was detectable on newborn lambs and possible routes of transmission. Specific qPCR was used to detect and quantify the load of D. nodosus in foot swabs of lambs at birth and 5–13 h post-partum, and their mothers 5–13 h post-partum; and in samples of bedding, pasture, soil and faeces. D. nodosus was not detected on the feet of newborn lambs swabbed at birth, but was detected 5–13 h after birth, once they had stood on bedding containing naturally occurring D. nodosus. Multiple genotypes identified by cloning and sequencing a marker gene, pgrA, and by multi locus variable number tandem repeat analysis (MLVA) of community DNA from swabs on individual feet indicated a mixed population of D. nodosus was present on the feet of both ewes and lambs. There was high variation in pgrA tandem repeat number (between 3 and 21 repeats), and multiple MLVA types. The overall similarity index between the populations on ewes and lambs was 0.45, indicating moderate overlap. Mother offspring pairs shared some alleles but not all, suggesting lambs were infected from source(s) other than just their mother's feet. We hypothesise that D. nodosus is transferred to the feet of lambs via bedding containing naturally occurring populations of D. nodosus, probably as a result of transfer from the feet of the group of housed ewes. The results support the hypothesis that the environment plays a key role in the transmission of D. nodosus between ewes and lambs. PMID:25953734
Hattori, Eiji; Nakajima, Mizuho; Yamada, Kazuo; Iwayama, Yoshimi; Toyota, Tomoko; Saitou, Naruya; Yoshikawa, Takeo
2009-01-01
Associations have been reported between the variable number of tandem repeat (VNTR) polymorphisms in exon 3 of the dopamine D4 receptor gene and multiple psychiatric illnesses/traits. We examined the distribution of VNTR alleles of different length in a Japanese cohort and found that, as reported earlier, the allele '7R' was much rarer (0.5%) in Japanese than in Caucasian populations (∼20%). This presents a challenge to an earlier proposed hypothesis that positive selection favoring the allele 7R has contributed to its high frequency. To further address the issue of selection, we carried out sequencing of the VNTR region not only from human but also from chimpanzee samples, and made inference on the ancestral repeat motif and haplotype by use of a phylogenetic analysis program. The most common 4R variant was considered to be the ancestral haplotype as earlier proposed. However, in a gene tree of VNTR constructed on the basis of this inferred ancestral haplotype, the allele 7R had five descendent haplotypes in relatively long lineage, where genetic drift can have major influence. We also tested this length polymorphism for association with schizophrenia, studying two Japanese sample sets (one with 570 cases and 570 controls, and the other with 124 pedigrees). No evidence of association between the allele 7R and schizophrenia was found in either of the two data sets. Collectively, this study suggests that the VNTR variation does not have an effect large enough to cause either selection or a detectable association with schizophrenia in a study of samples of moderate size. PMID:19092778
Expressed sequence tags from the plant trypanosomatid Phytomonas serpens.
Pappas, Georgios J; Benabdellah, Karim; Zingales, Bianca; González, Antonio
2005-08-01
We have generated 2190 expressed sequence tags (ESTs) from a cDNA library of the plant trypanosomatid Phytomonas serpens. Upon processing and clustering the set of 1893 accepted sequences was reduced to 697 clusters consisting of 452 singletons and 245 contigs. Functional categories were assigned based on BLAST searches against a database of the eukaryotic orthologous groups of proteins (KOG). Thirty six percent of the generated sequences showed no hits against the KOG database and 39.6% presented similarity to the KOG classes corresponding to translation, ribosomal structure and biogenesis. The most populated cluster contained 45 ESTs homologous to members of the glucose transporter family. This fact can be immediately correlated to the reported Phytomonas dependence on anaerobic glycolytic ATP production due to the lack of cytochrome-mediated respiratory chain. In this context, not only a number of enzymes of the glycolytic pathway were identified but also of the Krebs cycle as well as specific components of the respiratory chain. The data here reported, including a few hundred unique sequences and the description of tandemly repeated motifs and putative transcript stability motifs at untranslated mRNA ends, represent an initial approach to overcome the lack of information on the molecular biology of this organism.
Tian, Shi-Lin; Li, Zheng; Li, Li; Shah, S N M; Gong, Zhen-Hui
2017-07-01
Capsanthin/capsorubin synthase (Ccs) gene is a key gene that regulates the synthesis of capsanthin and the development of red coloration in pepper fruits. There are three tandem repeat units in the promoter region of Ccs, but the potential effects of the number of repetitive units on the transcriptional regulation of Ccs have been unclear. In the present study, expression vectors carrying different numbers of repeat units of the Ccs promoter were constructed, and the transient expression of the β-glucuronidase (GUS) gene was used to detect differences in expression levels associated with the promoter fragments. These repeat fragments and the plant expression vector PBI121 containing the 35s CaMV promoter were ligated to form recombinant vectors that were transfected into Agrobacterium tumefaciens GV3101. A fluorescence spectrophotometer was used to analyze the expression associated with the various repeat units. It was concluded that the constructs containing at least one repeat were associated with GUS expression, though they did not differ from one another. This repeating unit likely plays a role in transcription and regulation of Ccs expression.
Takahashi, Hajime; Ohshima, Chihiro; Nakagawa, Miku; Thanatsang, Krittaporn; Phraephaisarn, Chirapiphat; Chaturongkasumrit, Yuphakhun; Keeratipibul, Suwimon; Kuda, Takashi; Kimura, Bon
2014-01-01
Listeria innocua is an important hygiene indicator bacterium in food industries because it behaves similarly to Listeria monocytogenes, which is pathogenic to humans. PFGE is often used to characterize bacterial strains and to track contamination sources. However, because PFGE is an expensive, complicated, time-consuming protocol that makes data sharing difficult, development of a new typing method is necessary. MLVA is a technique that identifies bacterial strains on the basis of the number of tandem repeats present in the genome, which varies depending on the strain. MLVA has gained attention due to its high reproducibility and ease of data sharing. In this study, we developed an MLVA protocol to assess L. innocua and evaluated it by tracking the contamination source of L. innocua in an actual food manufacturing factory by typing the bacterial strains isolated from the factory. Three VNTR regions of the L. innocua genome were chosen for use in the MLVA. The number of repeat units in each VNTR region was calculated based on the results of PCR product analysis using capillary electrophoresis (CE). The calculated number of repetitions was compared with the results of the gene sequence analysis to demonstrate the accuracy of the CE repeat number analysis. The developed technique was evaluated using 60 L. innocua strains isolated from a food factory. These 60 strains were classified into 11 patterns using MLVA. Many of the strains were classified into ST-6, revealing that this MLVA strain type can contaminate each manufacturing process in the factory. The MLVA protocol developed in this study for L. innocua allowed rapid and easy analysis through the use of CE. This technique was found to be very useful in hygiene control in factories because it allowed us to track contamination sources and provided information regarding whether the bacteria were present in the factories.
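The step of calculating repeat numbers from CE fragment sizes can be sketched in Python as follows. This is a generic illustration of how MLVA repeat counts are commonly derived, not the authors' protocol: the function names, locus names, flank lengths, repeat unit lengths, and fragment sizes are all hypothetical values chosen for the example.

```python
def repeat_count(amplicon_bp: float, flank_bp: int, unit_bp: int) -> int:
    """Estimate the number of repeat units in one VNTR amplicon.

    amplicon_bp: fragment size measured by capillary electrophoresis
    flank_bp:    combined length of non-repeat flanking sequence (assumed known)
    unit_bp:     length of one repeat unit
    """
    if unit_bp <= 0:
        raise ValueError("unit_bp must be positive")
    # CE sizes are approximate, so round to the nearest whole repeat.
    return round((amplicon_bp - flank_bp) / unit_bp)


def mlva_profile(sizes: dict[str, float],
                 loci: dict[str, tuple[int, int]]) -> tuple[int, ...]:
    """Combine per-locus repeat counts into one MLVA profile tuple."""
    return tuple(
        repeat_count(sizes[name], flank, unit)
        for name, (flank, unit) in loci.items()
    )


if __name__ == "__main__":
    # Hypothetical loci: name -> (flank length in bp, repeat unit length in bp)
    loci = {"VNTR1": (140, 12), "VNTR2": (95, 9), "VNTR3": (200, 6)}
    # Hypothetical CE fragment sizes for one isolate
    sizes = {"VNTR1": 212.4, "VNTR2": 149.3, "VNTR3": 229.8}
    print(mlva_profile(sizes, loci))
```

Isolates with identical profile tuples would fall into the same MLVA type; checking the rounded counts against sequencing, as the abstract describes, guards against sizing errors.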
Detecting long tandem duplications in genomic sequences.
Audemard, Eric; Schiex, Thomas; Faraut, Thomas
2012-05-08
Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. In this paper, we introduce ReD Tandem, a software using a flow based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS < 1) and that it is also able to predict tandem duplications involving non coding elements such as pseudo-genes or RNA genes. ReD Tandem allows large tandem duplications to be identified without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein gene based approach, which ignores duplications involving non coding regions. It is however inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also allow existing annotations to be improved.
Recombination, rearrangement, reshuffling, and divergence in a centromeric region of rice.
Ma, Jianxin; Bennetzen, Jeffrey L
2006-01-10
Centromeres have many unusual biological properties, including kinetochore attachment and severe repression of local meiotic recombination. These properties are partly an outcome, partly a cause, of unusual DNA structure in the centromeric region. Although several plant and animal genomes have been sequenced, most centromere sequences have not been completed or analyzed in depth. To shed light on the unique organization, variability, and evolution of centromeric DNA, detailed analysis of a 1.97-Mb sequence that includes centromere 8 (CEN8) of japonica rice was undertaken. Thirty-three long-terminal repeat (LTR)-retrotransposon families (including 11 previously unknown) were identified in the CEN8 region, totaling 245 elements and fragments that account for 67% of the region. The ratio of solo LTRs to intact elements in the CEN8 region is approximately 0.9:1, compared with approximately 2.2:1 in noncentromeric regions of rice. However, the ratio of solo LTRs to intact elements in the core of the CEN8 region (approximately 2.5:1) is higher than in any other region investigated in rice, suggesting a hotspot for unequal recombination. Comparison of the CEN8 region of japonica and its orthologous segments from indica rice indicated that approximately 15% of the intact retrotransposons and solo LTRs were inserted into CEN8 after the divergence of japonica and indica from a common ancestor, compared with approximately 50% for previously studied euchromatic regions. Frequent DNA rearrangements were observed in the CEN8 region, including a 212-kb subregion that was found to be composed of three rearranged tandem repeats. Phylogenetic analysis also revealed recent segmental duplication and extensive rearrangement and reshuffling of the CentO satellite repeats.
Disease-associated repeat instability and mismatch repair.
Schmidt, Monika H M; Pearson, Christopher E
2016-02-01
Expanded tandem repeat sequences in DNA are associated with at least 40 human genetic neurological, neurodegenerative, and neuromuscular diseases. Repeat expansion can occur during parent-to-offspring transmission, and arise at variable rates in specific tissues throughout the life of an affected individual. Since the ongoing somatic repeat expansions can affect disease age-of-onset, severity, and progression, targeting somatic expansion holds potential as a therapeutic target. Thus, understanding the factors that regulate this mutation is crucial. DNA repair, in particular mismatch repair (MMR), is the major driving force of disease-associated repeat expansions. In contrast to its anti-mutagenic roles, mammalian MMR curiously drives the expansion mutations of disease-associated (CAG)·(CTG) repeats. Recent advances have broadened our knowledge of both the MMR proteins involved in disease repeat expansions, including: MSH2, MSH3, MSH6, MLH1, PMS2, and MLH3, as well as the types of repeats affected by MMR, now including: (CAG)·(CTG), (CGG)·(CCG), and (GAA)·(TTC) repeats. Mutagenic slipped-DNA structures have been detected in patient tissues, and the size of the slip-out and their junction conformation can determine the involvement of MMR. Furthermore, the formation of other unusual DNA and R-loop structures is proposed to play a key role in MMR-mediated instability. A complex correlation is emerging between tissues showing varying amounts of repeat instability and MMR expression levels. Notably, naturally occurring polymorphic variants of DNA repair genes can have dramatic effects upon the levels of repeat instability, which may explain the variation in disease age-of-onset, progression and severity. An increasing grasp of these factors holds prognostic and therapeutic potential. Copyright © 2015 Elsevier B.V. All rights reserved.
Transcription of tandemly repetitive DNA: functional roles.
Biscotti, Maria Assunta; Canapa, Adriana; Forconi, Mariko; Olmo, Ettore; Barucca, Marco
2015-09-01
A considerable fraction of the eukaryotic genome is made up of satellite DNA constituted of tandemly repeated sequences. These elements are mainly located at centromeres, pericentromeres, and telomeres and are major components of constitutive heterochromatin. Although originally satellite DNA was thought silent and inert, an increasing number of studies are providing evidence on its transcriptional activity supporting, on the contrary, an unexpected dynamicity. This review summarizes the multiple structural roles of satellite noncoding RNAs at chromosome level. Indeed, satellite noncoding RNAs play a role in the establishment of a heterochromatic state at centromere and telomere. These highly condensed structures are indispensable to preserve chromosome integrity and genome stability, preventing recombination events, and ensuring the correct chromosome pairing and segregation. Moreover, these RNA molecules seem to be involved also in maintaining centromere identity and in elongation, capping, and replication of telomere. Finally, the abnormal variation of centromeric and pericentromeric DNA transcription across major eukaryotic lineages in stress condition and disease has evidenced the critical role that these transcripts may play and the potentially dire consequences for the organism.
Liu, Feng; Melton, James T; Bi, Yuping
2017-10-01
To further understand the trends in the evolution of mitochondrial genomes (mitogenomes or mtDNAs) in the Ulvophyceae, the mitogenomes of two separate thalli of Ulva pertusa were sequenced. Two U. pertusa mitogenomes (Up1 and Up2) were 69,333 bp and 64,602 bp in length. These mitogenomes shared two ribosomal RNAs (rRNAs), 28 transfer RNAs (tRNAs), 29 protein-coding genes, and 12 open reading frames. The 4.7 kb difference in size was attributed to variation in intron content and tandem repeat regions. A total of six introns were present in the smaller U. pertusa mtDNA (Up2), while the larger mtDNA (Up1) had eight. The larger mtDNA had two additional group II introns in two genes (cox1 and cox2) and tandem duplication mutations in noncoding regions. Our results showed the first case of intraspecific variation in chlorophytan mitogenomes and provided further genomic data for the undersampled Ulvophyceae. © 2017 Phycological Society of America.
Al-Saadi, Abdulwahid; Reddy, Joseph D; Duan, Yong P; Brunings, Asha M; Yuan, Qiaoping; Gabriel, Dean W
2007-08-01
Citrus canker disease is caused by five groups of Xanthomonas citri strains that are distinguished primarily by host range: three from Asia (A, A*, and A(w)) and two that form a phylogenetically distinct clade and originated in South America (B and C). Every X. citri strain carries multiple DNA fragments that hybridize with pthA, which is essential for the pathogenicity of wide-host-range X. citri group A strain 3213. DNA fragments that hybridized with pthA were cloned from a representative strain from all five groups. Each strain carried one and only one pthA homolog that functionally complemented a knockout mutation of pthA in 3213. Every complementing homolog was of identical size to pthA and carried 17.5 nearly identical, direct tandem repeats, including three new genes from narrow-host-range groups C (pthC), A(w) (pthAW), and A* (pthA*). Every noncomplementing paralog was of a different size; one of these was sequenced from group A* (pthA*-2) and was found to have an intact promoter and full-length reading frame but with 15.5 repeats. None of the complementing homologs nor any of the noncomplementing paralogs conferred avirulence to 3213 on grapefruit or suppressed avirulence of a group A* strain on grapefruit. A knockout mutation of pthC in a group C strain resulted in loss of pathogenicity on lime, but the strain was unaffected in ability to elicit an HR on grapefruit. This pthC- mutant was fully complemented by pthA, pthB, or pthC. Analysis of the predicted amino-acid sequences of all functional pthA homologs and nonfunctional paralogs indicated that the specific sequence of the 17th repeat may be essential for pathogenicity of X. citri on citrus.
Kurushima, J. D.; Lipinski, M. J.; Gandolfi, B.; Froenicke, L.; Grahn, J. C.; Grahn, R. A.; Lyons, L. A.
2012-01-01
Both cat breeders and the lay public have interests in the origins of their pets, not only in the genetic identity of the purebred individuals, but also the historical origins of common household cats. The cat fancy is a relatively new institution with over 85% of its 40–50 breeds arising only in the past 75 years, primarily through selection on single-gene aesthetic traits. The short, yet intense cat breed history poses a significant challenge to the development of a genetic marker-based breed identification strategy. Using different breed assignment strategies and methods, 477 cats representing 29 fancy breeds were analysed with 38 short tandem repeats, 148 intergenic and five phenotypic single nucleotide polymorphisms. Results suggest the frequentist method of Paetkau (accuracy single nucleotide polymorphisms = 0.78, short tandem repeats = 0.88) surpasses the Bayesian method of Rannala and Mountain (single nucleotide polymorphisms = 0.56, short tandem repeats = 0.83) for accurate assignment of individuals to the correct breed. Additionally, a post-assignment verification step with the five phenotypic single nucleotide polymorphisms accurately identified between 0.31 and 0.58 of the mis-assigned individuals raising the sensitivity of assignment with the frequentist method to 0.89 and 0.92 single nucleotide polymorphisms and short tandem repeats respectively. This study provides a novel multi-step assignment strategy and suggests that, despite their short breed history and breed family groupings, a majority of cats can be assigned to their proper breed or population of origin, i.e. race. PMID:23171373
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adams-Cioaba, Melanie A.; Guo, Yahong; Bian, ChuanBing
Expansion of the CGG trinucleotide repeat in the 5'-untranslated region of the FMR1, fragile X mental retardation 1, gene results in suppression of protein expression for this gene and is the underlying cause of Fragile X syndrome. In unaffected individuals, the FMRP protein, together with two additional paralogues (Fragile X Mental Retardation Syndrome-related Protein 1 and 2), associates with mRNA to form a ribonucleoprotein complex in the nucleus that is transported to dendrites and spines of neuronal cells. It is thought that the fragile X family of proteins contributes to the regulation of protein synthesis at sites where mRNAs are locally translated in response to stimuli. Here, we report the X-ray crystal structures of the non-canonical nuclear localization signals of the FXR1 and FXR2 autosomal paralogues of FMRP, which were determined at 2.50 and 1.92 Å, respectively. The nuclear localization signals of the FXR1 and FXR2 comprise tandem Tudor domain architectures, closely resembling that of UHRF1, which is proposed to bind methylated histone H3K9. The FMRP, FXR1 and FXR2 proteins comprise a small family of highly conserved proteins that appear to be important in translational regulation, particularly in neuronal cells. The crystal structures of the N-terminal tandem Tudor domains of FXR1 and FXR2 revealed a conserved architecture with that of FMRP. Biochemical analysis of the tandem Tudor domains reveals their ability to preferentially recognize trimethylated peptides in a sequence-specific manner.
MULTIPLE-LOCUS VARIABLE-NUMBER TANDEM REPEAT ANALYSIS OF BRUCELLA ISOLATES FROM THAILAND.
Kumkrong, Khurawan; Chankate, Phanita; Tonyoung, Wittawat; Intarapuk, Apiradee; Kerdsin, Anusak; Kalambaheti, Thareerat
2017-01-01
Brucellosis-induced abortion can result in significant economic loss to farm animals. Brucellosis can be transmitted to humans during slaughter of infected animals or via consumption of contaminated food products. Strain identification of Brucella isolates can reveal the route of transmission. Brucella strains were isolated from vaginal swabs of farm animals, cow milk and from human blood cultures. Multiplex PCR was used to identify Brucella species, and owing to high DNA homology among Brucella isolates, multiple-locus variable-number tandem repeat analysis (MLVA) based on the number of tandem repeats at 16 different genomic loci was used for strain identification. Multiplex PCR categorized the isolates into B. abortus (n = 7), B. melitensis (n = 37), B. suis (n = 3), and 5 of unknown Brucella spp. MLVA-16 clustering analysis differentiated the strains into various genotypes, with Brucella isolates from the same geographic region being closely related, and revealed that the Thai isolates were phylogenetically distinct from those in other countries, including within the Southeast Asian region. Thus, MLVA-16 typing has utility in epidemiological studies.
Hoogenboom, Jerry; van der Gaag, Kristiaan J; de Leeuw, Rick H; Sijen, Titia; de Knijff, Peter; Laros, Jeroen F J
2017-03-01
Massively parallel sequencing (MPS) is on the verge of broad-scale application in forensic research and casework. The improved capability to analyse evidentiary traces representing unbalanced mixtures is often mentioned as one of the major advantages of this technique. However, most of the available software packages that analyse forensic short tandem repeat (STR) sequencing data are not well suited for high-throughput analysis of such mixed traces. The largest challenge is the presence of stutter artefacts in STR amplifications, which are not readily discerned from minor contributions. FDSTools is an open-source software solution developed for this purpose. The level of stutter formation is influenced by various aspects of the sequence, such as the length of the longest uninterrupted stretch occurring in an STR. When MPS is used, STRs are evaluated as sequence variants that each have particular stutter characteristics which can be precisely determined. FDSTools uses a database of reference samples to determine stutter and other systemic PCR or sequencing artefacts for each individual allele. In addition, stutter models are created for each repeating element in order to predict stutter artefacts for alleles that are not included in the reference set. This information is subsequently used to recognise and compensate for the noise in a sequence profile. The result is a better representation of the true composition of a sample. Using Promega Powerseq™ Auto System data from 450 reference samples and 31 two-person mixtures, we show that the FDSTools correction module decreases stutter ratios above 20% to below 3%. Consequently, much lower levels of contributions in the mixed traces are detected. FDSTools contains modules to visualise the data in an interactive format allowing users to filter data with their own preferred thresholds. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
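To make the notion of a stutter ratio concrete, the following minimal sketch divides the read count of a stutter product (here assumed to be one repeat unit shorter or longer than the parent allele) by the read count of the parent allele. The function name and read counts are hypothetical, and this is not the FDSTools implementation, which works on full sequence variants and per-allele reference data.

```python
def stutter_ratio(read_counts, parent_repeats, shift=-1):
    """Ratio of stutter reads to parent-allele reads.

    read_counts maps repeat number -> read count for one STR locus.
    shift=-1 looks at the product one repeat shorter than the parent,
    shift=+1 at the product one repeat longer.
    """
    parent_reads = read_counts.get(parent_repeats, 0)
    if parent_reads == 0:
        raise ValueError("parent allele has no reads")
    return read_counts.get(parent_repeats + shift, 0) / parent_reads


# Hypothetical read counts for a single-source sample with a 12-repeat allele.
reads = {11: 80, 12: 1000, 13: 10}

minus_one = stutter_ratio(reads, 12, shift=-1)  # 80 / 1000
plus_one = stutter_ratio(reads, 12, shift=+1)   # 10 / 1000
print(f"-1 stutter: {minus_one:.1%}, +1 stutter: {plus_one:.1%}")
```

In a mixture, a minor contributor's allele that coincides with a stutter position can only be recognised if its read count clearly exceeds what the stutter ratio predicts, which is why reducing the residual ratio matters.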
Li, Shu-Fen; Zhang, Guo-Jun; Yuan, Jin-Hong; Deng, Chuan-Liang; Gao, Wu-Jun
2016-05-01
The present review discusses the roles that repetitive sequences play in plant sex chromosome evolution, and highlights epigenetic modification as a potential mechanism by which repetitive sequences are involved in sex chromosome evolution. Sex determination in plants is mostly based on sex chromosomes. Classic theory proposes that sex chromosomes evolve from a specific pair of autosomes with the emergence of one or more sex-determining genes. Subsequently, the newly formed sex chromosomes stop recombining in a small region around the sex-determining locus, and over time, the non-recombining region expands to almost all parts of the sex chromosomes. Accumulation of repetitive sequences, mostly transposable elements and tandem repeats, is a conspicuous feature of the non-recombining region of the Y chromosome, even in primitive ones. Repetitive sequences may play multiple roles in sex chromosome evolution, such as triggering heterochromatization and causing recombination suppression, leading to structural and morphological differentiation of sex chromosomes, and promoting Y chromosome degeneration and X chromosome dosage compensation. In this article, we review the current status of this field, and based on preliminary evidence, we posit that repetitive sequences are involved in sex chromosome evolution probably via epigenetic modification, such as DNA and histone methylation, with small interfering RNAs as the mediator.
Kim, Kyunghee; Lee, Sang-Choon; Lee, Junki; Lee, Hyun Oh; Joh, Ho Jun; Kim, Nam-Hoon; Park, Hyun-Seung; Yang, Tae-Jin
2015-01-01
We report complete sequences of the chloroplast (cp) genome and 45S nuclear ribosomal DNA (45S nrDNA) for 11 Panax ginseng cultivars. We have obtained complete sequences of cp and 45S nrDNA, the representative barcoding target sequences for the cytoplasmic and nuclear genomes, respectively, based on low-coverage NGS sequence of each cultivar. The cp genome sizes ranged from 156,241 to 156,425 bp and the major size variation was derived from differences in copy number of tandem repeats in the ycf1 gene and in the intergenic regions of rps16-trnUUG and rpl32-trnUAG. The complete 45S nrDNA unit sequences were 11,091 bp, representing a consensus single transcriptional unit with an intergenic spacer region. Comparative analysis of these sequences as well as those previously reported for three Chinese accessions identified very rare but unique polymorphism in the cp genome within P. ginseng cultivars. There were 12 intra-species polymorphisms (six SNPs and six InDels) among 14 cultivars. We also identified five SNPs from 45S nrDNA of 11 Korean ginseng cultivars. From the 17 unique informative polymorphic sites, we developed six reliable markers for analysis of ginseng diversity and cultivar authentication. PMID:26061692
Locati, Mauro D; Pagano, Johanna F B; Ensink, Wim A; van Olst, Marina; van Leeuwen, Selina; Nehrdich, Ulrike; Zhu, Kongju; Spaink, Herman P; Girard, Geneviève; Rauwerda, Han; Jonker, Martijs J; Dekker, Rob J; Breit, Timo M
2017-04-01
5S rRNA is a ribosomal core component, transcribed from many gene copies organized in genomic repeats. Some eukaryotic species have two 5S rRNA types defined by their predominant expression in oogenesis or adult tissue. Our next-generation sequencing study on zebrafish egg, embryo, and adult tissue identified maternal-type 5S rRNA that is exclusively accumulated during oogenesis, replaced throughout the embryogenesis by a somatic-type, and thus virtually absent in adult somatic tissue. The maternal-type 5S rDNA contains several thousands of gene copies on chromosome 4 in tandem repeats with small intergenic regions, whereas the somatic-type is present in only 12 gene copies on chromosome 18 with large intergenic regions. The nine-nucleotide variation between the two 5S rRNA types likely affects TFIII binding and riboprotein L5 binding, probably leading to storage of maternal-type rRNA. Remarkably, these sequence differences are located exactly at the sequence-specific target site for genome integration by the 5S rRNA-specific Mutsu retrotransposon family. Thus, we could define maternal- and somatic-type MutsuDr subfamilies. Furthermore, we identified four additional maternal-type and two new somatic-type MutsuDr subfamilies, each with their own target sequence. This target-site specificity, frequently intact maternal-type retrotransposon elements, plus specific presence of Mutsu retrotransposon RNA and piRNA in egg and adult tissue, suggest an involvement of retrotransposons in achieving the differential copy number of the two types of 5S rDNA loci. © 2017 Locati et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Palmieri, Ferdinando; Agrimi, Gennaro; Blanco, Emanuela; Castegna, Alessandra; Di Noia, Maria A; Iacobazzi, Vito; Lasorsa, Francesco M; Marobbio, Carlo M T; Palmieri, Luigi; Scarcia, Pasquale; Todisco, Simona; Vozza, Angelo; Walker, John
2006-01-01
The inner membranes of mitochondria contain a family of carrier proteins that are responsible for the transport in and out of the mitochondrial matrix of substrates, products, co-factors and biosynthetic precursors that are essential for the function and activities of the organelle. This family of proteins is characterized by containing three tandem homologous sequence repeats of approximately 100 amino acids, each folded into two transmembrane alpha-helices linked by an extensive polar loop. Each repeat contains a characteristic conserved sequence. These features have been used to determine the extent of the family in genome sequences. The genome of Saccharomyces cerevisiae contains 34 members of the family. The identity of five of them was known before the determination of the genome sequence, but the functions of the remaining family members were not. This review describes how the functions of 15 of these previously unknown transport proteins have been determined by a strategy that consists of expressing the genes in Escherichia coli or Saccharomyces cerevisiae, reconstituting the gene products into liposomes and establishing their functions by transport assay. Genetic and biochemical evidence as well as phylogenetic considerations have guided the choice of substrates that were tested in the transport assays. The physiological roles of these carriers have been verified by genetic experiments. Various pieces of evidence point to the functions of six additional members of the family, but these proposals await confirmation by transport assay. The sequences of many of the newly identified yeast carriers have been used to characterize orthologs in other species, and in man five diseases are presently known to be caused by defects in specific mitochondrial carrier genes. The roles of eight yeast mitochondrial carriers remain to be established.
High-throughput analysis of the satellitome illuminates satellite DNA evolution
NASA Astrophysics Data System (ADS)
Ruiz-Ruano, Francisco J.; López-León, María Dolores; Cabrero, Josefa; Camacho, Juan Pedro M.
2016-07-01
Satellite DNA (satDNA) is a major component of eukaryote genomes, yet it remains the great unknown and is clearly underrepresented in genome sequencing projects. Here we show the high-throughput analysis of satellite DNA content in the migratory locust by means of the bioinformatic analysis of Illumina reads with the RepeatExplorer and RepeatMasker programs. This unveiled 62 satDNA families and we propose the term “satellitome” for the whole collection of different satDNA families in a genome. The finding that satDNAs were present in many contigs of the migratory locust draft genome indicates that they occupy many genomic locations that are invisible to fluorescent in situ hybridization (FISH). The cytological pattern of five satellites showing common descent (belonging to the SF3 superfamily) suggests that non-clustered satDNAs can become clustered through local amplification at any of the many genomic loci resulting from previous dissemination of short satDNA arrays. The fact that all kinds of satDNA (micro-, mini- and satellites) can show the non-clustered and clustered states suggests that all these elements are mostly similar, except for repeat length. Finally, the presence of VNTRs in bacteria, showing similar properties to non-clustered satDNAs in eukaryotes, suggests that this kind of tandem repeat shows common properties in all living beings.
Analyses of carnivore microsatellites and their intimate association with tRNA-derived SINEs.
López-Giráldez, Francesc; Andrés, Olga; Domingo-Roura, Xavier; Bosch, Montserrat
2006-10-23
The popularity of microsatellites has greatly increased in the last decade on account of their many applications. However, little is currently understood about the factors that influence their genesis and distribution among and within species genomes. In this work, we analyzed carnivore microsatellite clones from GenBank to study their association with interspersed repeats and elucidate the role of the latter in microsatellite genesis and distribution. We constructed a comprehensive carnivore microsatellite database comprising 1236 clones from GenBank. Thirty-three species of 11 out of 12 carnivore families were represented, although two distantly related species, the domestic dog and cat, were clearly overrepresented. Of these clones, 330 contained tRNALys-derived SINEs and 357 contained other interspersed repeats. Our rough estimates of tRNA SINE copies per haploid genome were much higher than published ones. Our results also revealed a distinct juxtaposition of AG and A-rich repeats and tRNALys-derived SINEs suggesting their coevolution. Both microsatellites arose repeatedly in two regions of the interspersed repeat. Moreover, microsatellites associated with tRNALys-derived SINEs showed the highest complexity and less potential instability. Our results suggest that tRNALys-derived SINEs are a significant source for microsatellite generation in carnivores, especially for AG and A-rich repeat motifs. These observations indicate two modes of microsatellite generation: the expansion and variation of pre-existing tandem repeats and the conversion of sequences with high cryptic simplicity into a repeat array; mechanisms which are not specific to tRNALys-derived SINEs. 
Microsatellite and interspersed repeat coevolution could also explain the different distribution of repeat types among and within species genomes. Finally, because microsatellites associated with tRNALys-derived SINEs have higher complexity and lower potential informative content, we recommend avoiding their use as genetic markers.
SV40 host-substituted variants: a new look at the monkey DNA inserts and recombinant junctions.
Singer, Maxine; Winocour, Ernest
2011-04-10
The available monkey genomic data banks were examined in order to determine the chromosomal locations of the host DNA inserts in 8 host-substituted SV40 variant DNAs. Five of the 8 variants contained more than one linked monkey DNA insert per tandem repeat unit and in all cases but one, the 19 monkey DNA inserts in the 8 variants mapped to different locations in the monkey genome. The 50 parental DNAs (32 monkey and 18 SV40 DNA segments) which spanned the crossover and flanking regions that participated in monkey/monkey and monkey/SV40 recombinations were characterized by substantial levels of microhomology of up to 8 nucleotides in length; the parental DNAs also exhibited direct and inverted repeats at or adjacent to the crossover sequences. We discuss how the host-substituted SV40 variants arose and the nature of the recombination mechanisms involved. Copyright © 2011 Elsevier Inc. All rights reserved.
2013-01-01
Background Clavibacter michiganensis subsp. michiganensis (Cmm) causes bacterial wilt and canker in tomato. Cmm is present nearly in all European countries. During the last three years several local outbreaks were detected in Belgium. The lack of a convenient high-resolution strain-typing method has hampered the study of the routes of transmission of Cmm and epidemiology in tomato cultivation. In this study the genetic relatedness among a worldwide collection of Cmm strains and their relatives was approached by gyrB and dnaA gene sequencing. Further, we developed and applied a multilocus variable number of tandem repeats analysis (MLVA) scheme to discriminate among Cmm strains. Results A phylogenetic analysis of gyrB and dnaA gene sequences of 56 Cmm strains demonstrated that Belgian Cmm strains from recent outbreaks of 2010–2012 form a genetically uniform group within the Cmm clade, and Cmm is phylogenetically distinct from other Clavibacter subspecies and from non-pathogenic Clavibacter-like strains. MLVA conducted with eight minisatellite loci detected 25 haplotypes within Cmm. All strains from Belgian outbreaks, isolated between 2010 and 2012, together with two French strains from 2010 seem to form one monomorphic group. Regardless of the isolation year, location or tomato cultivar, Belgian strains from recent outbreaks belonged to the same haplotype. On the contrary, strains from diverse geographical locations or isolated over longer periods of time formed mostly singletons. Conclusions We hypothesise that the introduction might have originated from one lot of seeds or contaminated tomato seedlings that was the source of the outbreak in 2010 and that these Cmm strains persisted and induced infection in 2011 and 2012. Our results demonstrate that MLVA is a promising typing technique for a local surveillance and outbreaks investigation in epidemiological studies of Cmm. PMID:23738754
Alizadeh, F; Bozorgmehr, A; Tavakkoly-Bazzaz, J; Ohadi, M
2018-06-01
Differential expansion of a number of human short tandem repeats (STRs) at the critical core promoter and 5' untranslated region (UTR) supports the hypothesis that at least some of these STRs may provide a selective advantage in human evolution. Following a genome-wide screen of all human protein-coding gene 5' UTRs based on the Ensembl database (http://www.ensembl.org), we previously reported that the longest STR in this interval is a (GA)32, which belongs to the X-linked zinc finger MYM-type containing 3 (ZMYM3) gene. In the present study, we analyzed the evolutionary implication of this region across evolution and examined the allele and genotype distribution of the "exceptionally long" STR by direct sequencing of 486 Iranian unrelated male subjects consisting of 196 cases of schizophrenia (SCZ) and 290 controls. We found that the ZMYM3 transcript containing the STR is human-specific (ENST00000373998.5). A significant allele variance difference was observed between the cases and controls (Levene's test for equality of variances F = 4.00, p < 0.03). In addition, six alleles were observed in the SCZ patients that were not detected in the control group ("disease-only" alleles) (mid p exact < 0.0003). Those alleles were at the extreme short and long ends of the allele distribution curve and composed 4% of the genotypes in the SCZ group. In conclusion, we found skewing of the genetic architecture at the ZMYM3 STR in SCZ. Further, we found a bell-shaped distribution of alleles and selection against alleles at the extreme ends of this STR. The ZMYM3 STR sets a prototype, the evolutionary course of which determines the range of alleles in a particular species. Extreme "disease-only" alleles and genotypes may change our perspective of adaptive evolution and complex disorders. The ZMYM3 gene "exceptionally long" STR should be sequenced in SCZ and other human-specific phenotypes/characteristics.
Zaluga, Joanna; Stragier, Pieter; Van Vaerenbergh, Johan; Maes, Martine; De Vos, Paul
2013-06-05
Clavibacter michiganensis subsp. michiganensis (Cmm) causes bacterial wilt and canker in tomato. Cmm is present nearly in all European countries. During the last three years several local outbreaks were detected in Belgium. The lack of a convenient high-resolution strain-typing method has hampered the study of the routes of transmission of Cmm and epidemiology in tomato cultivation. In this study the genetic relatedness among a worldwide collection of Cmm strains and their relatives was approached by gyrB and dnaA gene sequencing. Further, we developed and applied a multilocus variable number of tandem repeats analysis (MLVA) scheme to discriminate among Cmm strains. A phylogenetic analysis of gyrB and dnaA gene sequences of 56 Cmm strains demonstrated that Belgian Cmm strains from recent outbreaks of 2010-2012 form a genetically uniform group within the Cmm clade, and Cmm is phylogenetically distinct from other Clavibacter subspecies and from non-pathogenic Clavibacter-like strains. MLVA conducted with eight minisatellite loci detected 25 haplotypes within Cmm. All strains from Belgian outbreaks, isolated between 2010 and 2012, together with two French strains from 2010 seem to form one monomorphic group. Regardless of the isolation year, location or tomato cultivar, Belgian strains from recent outbreaks belonged to the same haplotype. On the contrary, strains from diverse geographical locations or isolated over longer periods of time formed mostly singletons. We hypothesise that the introduction might have originated from one lot of seeds or contaminated tomato seedlings that was the source of the outbreak in 2010 and that these Cmm strains persisted and induced infection in 2011 and 2012. Our results demonstrate that MLVA is a promising typing technique for a local surveillance and outbreaks investigation in epidemiological studies of Cmm.
Sequence Ready Characterization of the Pericentromeric Region of 19p12
DOE Office of Scientific and Technical Information (OSTI.GOV)
Evan E. Eichler
2006-08-31
Current mapping and sequencing strategies have been inadequate within the proximal portion of 19p12 due, in part, to the presence of a recently expanded ZNF (zinc-finger) gene family and the presence of large (25-50 kb) inverted beta-satellite repeat structures which bracket this tandemly duplicated gene family. The virtual absence of classically defined “unique” sequence within the region has hampered efforts to identify and characterize a suitable minimal tiling path of clones which can be used as templates required for finished sequencing of the region. The goal of this proposal is to develop and implement a novel sequence-anchor strategy to generate a contiguous BAC map of the most proximal portion of chromosome 19p12 for the purpose of complete sequence characterization. The target region will be an estimated 4.5 Mb of DNA extending from STS marker D19S450 (the beginning of the ZNF gene cluster) to the centromeric (alpha-satellite) junction of 19p11. The approach will entail 1) pre-selection of 19p12 BAC and cosmid clones (NIH approved library) utilizing both 19p12-unique and 19p12-SPECIFIC repeat probes (Eichler et al., 1998); 2) the generation of a BAC/cosmid end-sequence map across the region with a density of one marker every 8 kb; 3) the development of a second generation of STSs (sequence-tagged sites) which will be used to identify and verify clonal overlap at the level of the sequence; 4) incorporation of these sequence-anchored overlapping clones into existing cosmid/BAC restriction maps developed at Livermore National Laboratory; and 5) validation of the organization of this region utilizing high-resolution FISH techniques (extended chromatin analysis) on monochromosomal 19 somatic cell hybrids and parental cell lines of source material.
The data generated will be used in the selection of the most parsimonious tiling path of BAC clones to be sequenced as part of the JGI effort on chromosome 19 and should serve as a model for the sequence characterization of other difficult regions of the human genome.
Harrison, Thomas; Ruiz, Jaime; Sloan, Daniel B.; Ben-Hur, Asa; Boucher, Christina
2016-01-01
Pentatricopeptide repeat containing proteins (PPRs) bind to RNA transcripts originating from mitochondria and plastids. There are two classes of PPR proteins. The P class contains tandem P-type motif sequences, and the PLS class contains alternating P, L and S type sequences. In this paper, we describe a novel tool that predicts PPR-RNA interaction; specifically, our method, which we call aPPRove, determines where and how a PLS-class PPR protein will bind to RNA when given a PPR and one or more RNA transcripts by using a combinatorial binding code for site specificity proposed by Barkan et al. Our results demonstrate that aPPRove successfully locates how and where a PPR protein belonging to the PLS class can bind to RNA. For each binding event it outputs the binding site, the amino-acid-nucleotide interaction, and its statistical significance. Furthermore, we show that our method can be used to predict binding events for PLS-class proteins using a known edit site and the statistical significance of aligning the PPR protein to that site. In particular, we use our method to make a conjecture regarding an interaction between CLB19 and the second intronic region of ycf3. The aPPRove web server can be found at www.cs.colostate.edu/~approve. PMID:27560805
Detecting microsatellites within genomes: significant variation among algorithms.
Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe
2007-04-18
Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameters settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions. PMID:17442102
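The sensitivity to user-defined parameters described above can be seen even in a toy detector. The sketch below is not any of the five programs compared in the study; it is a deliberately naive regular-expression scan for perfect tandem repeats, with a hypothetical `min_length` threshold and an invented test sequence, meant only to show how the reported set of microsatellites changes as the threshold moves.

```python
import re


def find_perfect_microsatellites(seq, max_motif=6, min_length=12):
    """Return (start, end, motif, copies) for perfect tandem repeats.

    Naive left-to-right scan: at each position the shortest repeating
    motif (1..max_motif bp) is taken, and only repeats spanning at
    least min_length bp are reported.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1+" % max_motif)
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        length = match.end() - match.start()
        if length >= min_length:
            hits.append((match.start(), match.end(), motif, length // len(motif)))
    return hits


# Invented sequence: (AC)8, (CAG)4 and (A)7 separated by short spacers.
seq = "GT" + "AC" * 8 + "GT" + "CAG" * 4 + "TC" + "A" * 7 + "GT"

for threshold in (6, 12, 14):
    hits = find_perfect_microsatellites(seq, min_length=threshold)
    print(threshold, hits)
```

Because the scan commits to the first motif that repeats at the leftmost position, it can report a repeat in a shifted phase or truncate it when a shorter repeat precedes it. Design choices of this kind, on top of length thresholds, are one plausible reason why different detectors disagree most on short microsatellites.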
Date Palm Genetic Diversity Analysis Using Microsatellite Polymorphism.
Khierallah, Hussam S M; Bader, Saleh M; Hamwieh, Alladin; Baum, Michael
2017-01-01
Date palm (Phoenix dactylifera L.) is considered one of the great socioeconomic resources in the Middle East and the Arab regions. The tree has been and still is at the center of the comprehensive agricultural development. The number of known date palm cultivars, distributed worldwide, is approximately 3000. The success of genetic diversity conservation or any breeding program depends on an understanding of the amount and distribution of the genetic variation already in existence in the genetic pool. Development of suitable DNA molecular markers for this tree may allow researchers to estimate genetic diversity, which will ultimately lead to the genetic conservation of date palm. Simple sequence repeats (SSRs) are DNA strands, consisting of tandemly repeated mono-, di-, tri-, tetra-, or penta-nucleotide units that are arranged throughout the genomes of most eukaryotic species. Microsatellite markers, developed from genomic libraries, belong to either the transcribed region or the non-transcribed region of the genome, and there is rarely available information on their functions. Microsatellite sequences are especially suited to distinguish closely related genotypes due to a high degree of variability making them ideally suitable in population studies and the identification of closely related cultivars. This chapter focuses on the methods employed to characterize date palm genotypes using SSR markers.
Coincidence of synteny breakpoints with malignancy-related deletions on human chromosome 3
Kost-Alimova, Maria; Kiss, Hajnalka; Fedorova, Ludmila; Yang, Ying; Dumanski, Jan P.; Klein, George; Imreh, Stefan
2003-01-01
We have found previously that during tumor growth intact human chromosome 3 transferred into tumor cells regularly loses certain 3p regions, among them the ≈1.4-Mb common eliminated region 1 (CER1) at 3p21.3. Fluorescence in situ hybridization analysis of 12 mouse orthologous loci revealed that CER1 splits into two segments in mouse and therefore contains a murine/human conservation breakpoint region (CBR). Several breaks occurred in tumors within the region surrounding the CBR, and this sequence has features that characterize unstable chromosomal regions: deletions in yeast artificial chromosome clones, late replication, gene and segment duplications, and pseudogene insertions. Sequence analysis of the entire 3p12-22 revealed that other cancer-associated deletions (regions eliminated from monochromosomal hybrids carrying an intact chromosome 3 during tumor growth and homozygous deletions found in human tumors) colocalized nonrandomly with murine/human CBRs and were characterized by an increased number of local gene duplications and murine/human conservation mismatches (single genes that do not match into the conserved chromosomal segment). The CBR within CER1 contains a simple tandem TATAGA repeat capable of forming a 40-bp-long secondary hairpin-like structure. This repeat is nonrandomly localized within the other tumor-associated deletions and in the vicinity of 3p12-22 CBRs. PMID:12738884
Valletta, Elisa; Kučera, Lukáš; Prokeš, Lubomír; Amato, Filippo; Pivetta, Tiziana; Hampl, Aleš; Havel, Josef; Vaňhara, Petr
2016-01-01
Cross-contamination of eukaryotic cell lines used in biomedical research represents a highly relevant problem. Analysis of repetitive DNA sequences, such as Short Tandem Repeats (STR), or Simple Sequence Repeats (SSR), is a widely accepted, simple, and commercially available technique to authenticate cell lines. However, it provides only qualitative information that depends on the extent of reference databases for interpretation. In this work, we developed and validated a rapid and routinely applicable method for evaluation of cell culture cross-contamination levels based on mass spectrometric fingerprints of intact mammalian cells coupled with artificial neural networks (ANNs). We used human embryonic stem cells (hESCs) contaminated by either mouse embryonic stem cells (mESCs) or mouse embryonic fibroblasts (MEFs) as a model. We determined the contamination level using a mass spectra database of known calibration mixtures that served as training input for an ANN. The ANN was then capable of correct quantification of the level of contamination of hESCs by mESCs or MEFs. We demonstrate that MS analysis, when linked to proper mathematical instruments, is a tangible tool for unraveling and quantifying heterogeneity in cell cultures. The analysis is applicable in routine scenarios for cell authentication and/or cell phenotyping in general. PMID:26821236
Characterization of the COL2A1 VNTR polymorphism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berg, E.S.; Olaisen, B.
1993-05-01
The variable number of tandem repeat (VNTR) region 3′ to the collagen type II gene (COL2A1) was amplified in vitro by the polymerase chain reaction. Subsequent high-resolution gel electrophoresis showed that the five earlier reported alleles could be further subtyped. A total of 17 allelic variants with a heterozygosity of 73.0% were found in 202 unrelated Norwegians. DNA sequencing of 19 COL2A1 alleles has been performed. The internal organization of the VNTR was common for all alleles, as previously shown for a few alleles. Moreover, the polymorphism in the COL2A1 locus is mainly due to variation in the numbers of copies of two repeat units, containing 34 and 31 bp, respectively, and/or to small deletions in either of the two units. DNA sequencing of alleles with the same electrophoretic size revealed no heterogeneity such as an alternating order of the different units, a feature that might have been expected to be the result of unequal crossing-over events. The observed ordered structure of the VNTR and the possibility of single-stranded DNA from the cores in the VNTR forming hairpins and loops suggest that the COL2A1 polymorphism may have evolved mainly by replication slippage mechanisms. 23 refs., 2 figs., 3 tabs.
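The heterozygosity figure quoted above can be illustrated with a minimal sketch. It assumes the common expected-heterozygosity definition, H = 1 - sum(p_i^2) over allele frequencies p_i; the abstract does not state which estimator was used, and the function name and allele labels below are hypothetical and not data from the study.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Return 1 - sum(p_i^2), where p_i is the frequency of allele i."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical allele calls for illustration only (not from the study).
sample = ["A1", "A1", "A2", "A3"]
print(expected_heterozygosity(sample))
```

A locus with many alleles at similar frequencies gives a value close to 1, while a locus with a single allele gives 0.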
Wolfgruber, Thomas K; Sharma, Anupma; Schneider, Kevin L; Albert, Patrice S; Koo, Dal-Hoe; Shi, Jinghua; Gao, Zhi; Han, Fangpu; Lee, Hyeran; Xu, Ronghui; Allison, Jamie; Birchler, James A; Jiang, Jiming; Dawe, R Kelly; Presting, Gernot G
2009-11-01
We describe a comprehensive and general approach for mapping centromeres and present a detailed characterization of two maize centromeres. Centromeres are difficult to map and analyze because they consist primarily of repetitive DNA sequences, which in maize are the tandem satellite repeat CentC and interspersed centromeric retrotransposons of maize (CRM). Centromeres are defined epigenetically by the centromeric histone H3 variant, CENH3. Using novel markers derived from centromere repeats, we have mapped all ten centromeres onto the physical and genetic maps of maize. We were able to completely traverse centromeres 2 and 5, confirm physical maps by fluorescence in situ hybridization (FISH), and delineate their functional regions by chromatin immunoprecipitation (ChIP) with anti-CENH3 antibody followed by pyrosequencing. These two centromeres differ substantially in size, apparent CENH3 density, and arrangement of centromeric repeats; and they are larger than the rice centromeres characterized to date. Furthermore, centromere 5 consists of two distinct CENH3 domains that are separated by several megabases. Succession of centromere repeat classes is evidenced by the fact that elements belonging to the recently active recombinant subgroups of CRM1 colonize the present day centromeres, while elements of the ancestral subgroups are also found in the flanking regions. Using abundant CRM and non-CRM retrotransposons that inserted in and near these two centromeres to create a historical record of centromere location, we show that maize centromeres are fluid genomic regions whose borders are heavily influenced by the interplay of retrotransposons and epigenetic marks. Furthermore, we propose that CRMs may be involved in removal of centromeric DNA (specifically CentC), invasion of centromeres by non-CRM retrotransposons, and local repositioning of the CENH3.
Albert, Patrice S.; Koo, Dal-Hoe; Shi, Jinghua; Gao, Zhi; Han, Fangpu; Lee, Hyeran; Xu, Ronghui; Allison, Jamie; Birchler, James A.; Jiang, Jiming; Dawe, R. Kelly; Presting, Gernot G.
2009-01-01
We describe a comprehensive and general approach for mapping centromeres and present a detailed characterization of two maize centromeres. Centromeres are difficult to map and analyze because they consist primarily of repetitive DNA sequences, which in maize are the tandem satellite repeat CentC and interspersed centromeric retrotransposons of maize (CRM). Centromeres are defined epigenetically by the centromeric histone H3 variant, CENH3. Using novel markers derived from centromere repeats, we have mapped all ten centromeres onto the physical and genetic maps of maize. We were able to completely traverse centromeres 2 and 5, confirm physical maps by fluorescence in situ hybridization (FISH), and delineate their functional regions by chromatin immunoprecipitation (ChIP) with anti-CENH3 antibody followed by pyrosequencing. These two centromeres differ substantially in size, apparent CENH3 density, and arrangement of centromeric repeats; and they are larger than the rice centromeres characterized to date. Furthermore, centromere 5 consists of two distinct CENH3 domains that are separated by several megabases. Succession of centromere repeat classes is evidenced by the fact that elements belonging to the recently active recombinant subgroups of CRM1 colonize the present day centromeres, while elements of the ancestral subgroups are also found in the flanking regions. Using abundant CRM and non-CRM retrotransposons that inserted in and near these two centromeres to create a historical record of centromere location, we show that maize centromeres are fluid genomic regions whose borders are heavily influenced by the interplay of retrotransposons and epigenetic marks. Furthermore, we propose that CRMs may be involved in removal of centromeric DNA (specifically CentC), invasion of centromeres by non-CRM retrotransposons, and local repositioning of the CENH3. PMID:19956743
Nonin-Lecomte, Sylvie; Dardel, Frédéric; Lestienne, Patrick
2005-08-01
Stretches of cytosines and guanosines have been shown in vitro to adopt non-canonical structures known as i-motifs and G-quartets, respectively. When combined, such sequences are expected to either retain their structure or form duplexes or triple helices. All these structures may occur in vivo whenever the sequence criteria are met. Such stretches are present in the circular genome of human mitochondria, as two 10 nucleotide-long perfect tandem direct repeats (DR1 and DR2). The DR1 and DR2 repeats are G-rich on the heavy strand and C-rich on the light strand. Previous results suggested that during replication, transient formation of a parallel GGC triple helix between the neo-synthesised G-rich DR1 and the double-stranded homologous DR2 could be involved in a rearrangement process leading to genome instability. In order to get structural insights into the interaction between the two repeats, we have studied by nuclear magnetic resonance (NMR) the assembly properties of a 24-mer oligodeoxyribonucleotide in which the C- and G-rich segments of the DRs are covalently tethered by a TTTT linker. We show here that this 24-mer self-associates into a triplex-containing symmetrical tetramer. The core of the structure is composed of anti-parallel Watson-Crick (WC) base pairs. Two additional strands are hydrogen-bonded to the Hoogsteen side of the Gs, thus forming CGC(+) triple helices, with G-rich ends folding into G-quartets. These results suggest that such structures could occur when the two DRs are put to close proximity in a biological context.
2011-01-01
Background The rpoB-psbZ (BZ) region of some fern plastid genomes (plastomes) has been noted to go through considerable genomic changes. Unraveling its evolutionary dynamics across all fern lineages will help clarify the fundamental process shaping fern plastome structure and organization. Results A total of 24 fern BZ sequences were investigated with taxon sampling covering all the extant fern orders. We found that: (i) a tree fern Plagiogyria japonica contained a novel gene order that can be generated from either the ancestral Angiopteris type or the derived Adiantum type via a single inversion; (ii) the trnY-trnE intergenic spacer (IGS) of the filmy fern Vandenboschia radicans was expanded 3-fold due to the tandem 27-bp repeats which showed strong sequence similarity with the anticodon domain of trnY; (iii) the trnY-trnE IGSs of two horsetail ferns Equisetum ramosissimum and E. arvense underwent an unprecedented 5-kb long expansion, more than a quarter of which consisted of a single type of direct repeats also relevant to the trnY anticodon domain; and (iv) ycf66 has been independently lost at least four times in ferns. Conclusions Our results provided fresh insights into the evolutionary process of fern BZ regions. The intermediate BZ gene order was not detected, supporting that the Adiantum type was generated by two inversions occurring in pairs. The occurrence of Vandenboschia 27-bp repeats represents the first evidence of partial tRNA gene duplication in fern plastomes. Repeats potentially forming a stem-loop structure play major roles in the expansion of the trnY-trnE IGS. PMID:21486489
Gao, Lei; Zhou, Yuan; Wang, Zhi-Wei; Su, Ying-Juan; Wang, Ting
2011-04-13
The rpoB-psbZ (BZ) region of some fern plastid genomes (plastomes) has been noted to go through considerable genomic changes. Unraveling its evolutionary dynamics across all fern lineages will help clarify the fundamental process shaping fern plastome structure and organization. A total of 24 fern BZ sequences were investigated with taxon sampling covering all the extant fern orders. We found that: (i) a tree fern Plagiogyria japonica contained a novel gene order that can be generated from either the ancestral Angiopteris type or the derived Adiantum type via a single inversion; (ii) the trnY-trnE intergenic spacer (IGS) of the filmy fern Vandenboschia radicans was expanded 3-fold due to the tandem 27-bp repeats which showed strong sequence similarity with the anticodon domain of trnY; (iii) the trnY-trnE IGSs of two horsetail ferns Equisetum ramosissimum and E. arvense underwent an unprecedented 5-kb long expansion, more than a quarter of which consisted of a single type of direct repeats also relevant to the trnY anticodon domain; and (iv) ycf66 has been independently lost at least four times in ferns. Our results provided fresh insights into the evolutionary process of fern BZ regions. The intermediate BZ gene order was not detected, supporting that the Adiantum type was generated by two inversions occurring in pairs. The occurrence of Vandenboschia 27-bp repeats represents the first evidence of partial tRNA gene duplication in fern plastomes. Repeats potentially forming a stem-loop structure play major roles in the expansion of the trnY-trnE IGS.
Yasukochi, Yuji; Miura, Nami; Nakano, Ryo; Sahara, Ken; Ishikawa, Yukio
2011-01-01
Background Tuning of the olfactory system of male moths to conspecific female sex pheromones is crucial for correct species recognition; however, little is known about the genetic changes that drive speciation in this system. Moths of the genus Ostrinia are good models to elucidate this question, since significant differences in pheromone blends are observed within and among species. Odorant receptors (ORs) play a critical role in recognition of female sex pheromones; eight types of OR genes expressed in male antennae were previously reported in Ostrinia moths. Methodology/Principal Findings We screened an O. nubilalis bacterial artificial chromosome (BAC) library by PCR, and constructed three contigs from isolated clones containing the reported OR genes. Fluorescence in situ hybridization (FISH) analysis using these clones as probes demonstrated that the largest contig, which contained eight OR genes, was located on the Z chromosome; two others harboring two and one OR genes were found on two autosomes. Sequence determination of BAC clones revealed the Z-linked OR genes were closely related and tandemly arrayed; moreover, four of them shared 181-bp direct repeats spanning exon 7 and intron 7. Conclusions/Significance This is the first report of tandemly arrayed sex pheromone receptor genes in Lepidoptera. The localization of an OR gene cluster on the Z chromosome agrees with previous findings for a Z-linked locus responsible for O. nubilalis male behavioral response to sex pheromone. The 181-bp direct repeats might enhance gene duplications by unequal crossovers. An autosomal locus responsible for male response to sex pheromone in Heliothis virescens and H. subflexa was recently reported to contain at least four OR genes. 
Taken together, these findings support the hypothesis that generation of additional copies of OR genes can increase the potential for male moths to acquire altered specificity for pheromone components, and accordingly, facilitate differentiation of sex pheromones. PMID:21526121
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jaing, Crystal; Vergez, Lisa; Hinckley, Aubree
2011-06-21
The objective of this project is to provide DHS a comprehensive evaluation of the current genomic technologies including genotyping, Taqman PCR, multiple locus variable tandem repeat analysis (MLVA), microarray and high-throughput DNA sequencing in the analysis of biothreat agents from complex environmental samples. As the result of a different DHS project, we have selected for and isolated a large number of ciprofloxacin resistant B. anthracis Sterne isolates. These isolates vary in the concentrations of ciprofloxacin that they can tolerate, suggesting multiple mutations in the samples. In collaboration with University of Houston, Eureka Genomics and Oak Ridge National Laboratory, we analyzed the ciprofloxacin resistant B. anthracis Sterne isolates by microarray hybridization, Illumina and Roche 454 sequencing to understand the error rates and sensitivity of the different methods. The report provides an assessment of the results and a complete set of all protocols used and all data generated along with information to interpret the protocols and data sets.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K; Doyle, C Kuyler; Lykidis, A
2006-01-01
Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative, α-proteobacterium, is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp predicted to encode 925 proteins, 40 stable RNA species, 17 putative pseudogenes, and a substantial proportion of noncoding sequence (27%). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences and a unique serine-threonine bias associated with the potential for O glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein families associated with immune evasion were identified, one of which contains poly(G-C) tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Genes associated with pathogen-host interactions were identified, including a small group encoding proteins (n = 12) with tandem repeats and another group encoding proteins with eukaryote-like ankyrin domains (n = 7).
Mitochondrial DNA repairs double-strand breaks in yeast chromosomes.
Ricchetti, M; Fairhead, C; Dujon, B
1999-11-04
The endosymbiotic theory for the origin of eukaryotic cells proposes that genetic information can be transferred from mitochondria to the nucleus of a cell, and genes that are probably of mitochondrial origin have been found in nuclear chromosomes. Occasionally, short or rearranged sequences homologous to mitochondrial DNA are seen in the chromosomes of different organisms including yeast, plants and humans. Here we report a mechanism by which fragments of mitochondrial DNA, in single or tandem array, are transferred to yeast chromosomes under natural conditions during the repair of double-strand breaks in haploid mitotic cells. These repair insertions originate from noncontiguous regions of the mitochondrial genome. Our analysis of the Saccharomyces cerevisiae mitochondrial genome indicates that the yeast nuclear genome does indeed contain several short sequences of mitochondrial origin which are similar in size and composition to those that repair double-strand breaks. These sequences are located predominantly in non-coding regions of the chromosomes, frequently in the vicinity of retrotransposon long terminal repeats, and appear as recent integration events. Thus, colonization of the yeast genome by mitochondrial DNA is an ongoing process.
Pavelitz, Thomas; Bailey, Arnold D.; Elco, Christopher P.; Weiner, Alan M.
2008-01-01
In mammals, small multigene families generate spliceosomal U snRNAs that are nearly as abundant as rRNA. Using the tandemly repeated human U2 genes as a model, we show by footprinting with DNase I and permanganate that nearly all sequences between the enhancer-like distal sequence element and the initiation site are protected during interphase whereas the upstream half of the U2 snRNA coding region is exposed. We also show by chromatin immunoprecipitation that the SNAPc complex, which binds the TATA-like proximal sequence element, is removed at metaphase but remains bound under conditions that induce locus-specific metaphase fragility of the U2 genes, such as loss of CSB, BRCA1, or BRCA2 function, treatment with actinomycin D, or overexpression of the tetrameric p53 C terminus. We propose that the U2 snRNA promoter establishes a persistently open state to facilitate rapid reinitiation and perhaps also to bypass TFIIH-dependent promoter melting; this open state would then be disassembled to allow metaphase chromatin condensation. PMID:18378697
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K.; Kuyler Doyle, C.; Lykidis, A.
2005-09-01
Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative, α-proteobacterium, is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp predicted to encode 925 proteins, 40 stable RNA species, and 17 putative pseudogenes, and a substantial proportion of non-coding sequence (27 percent). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences, and a unique serine-threonine bias associated with the potential for O-glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein families associated with immune evasion were identified, one of which contains poly G:C tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Proteins associated with pathogen-host interactions were identified including a small group of proteins (12) with tandem repeats and another with eukaryotic-like ankyrin domains (7).
Short-read, high-throughput sequencing technology for STR genotyping
Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.
2013-01-01
DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315
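As a rough illustration of what length-based STR allele calling from sequence reads involves, the sketch below counts the longest uninterrupted run of a repeat motif in a read. This is a simplification under stated assumptions and not the alignment-based workflow described in the abstract; the function name, motif and read are hypothetical.

```python
import re


def longest_tandem_run(read, motif):
    """Return the largest number of back-to-back copies of motif in read."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(read)]
    return max(runs, default=0)


# Hypothetical read: flanking bases around three copies of GATA.
print(longest_tandem_run("TTGATAGATAGATACC", "GATA"))
```

Real STR alleles often contain interrupted or compound repeats, so an exact-run count such as this would undercount them; that is one reason nucleotide-level sequence information is valuable.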
Wegrzyn, Jill L.; Liechty, John D.; Stevens, Kristian A.; Wu, Le-Shin; Loopstra, Carol A.; Vasquez-Gross, Hans A.; Dougherty, William M.; Lin, Brian Y.; Zieve, Jacob J.; Martínez-García, Pedro J.; Holt, Carson; Yandell, Mark; Zimin, Aleksey V.; Yorke, James A.; Crepeau, Marc W.; Puiu, Daniela; Salzberg, Steven L.; de Jong, Pieter J.; Mockaitis, Keithanne; Main, Doreen; Langley, Charles H.; Neale, David B.
2014-01-01
The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of their genomes (∼20–40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths of 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%. PMID:24653211
Wang, Jing; McCord, Bruce
2011-06-01
A common problem in the analysis of forensic DNA evidence is the presence of environmentally degraded and inhibited DNA. Such samples produce a variety of interpretational problems such as allele imbalance, allele dropout and sequence specific inhibition. In an attempt to develop methods to enhance the recovery of this type of evidence, magnetic bead hybridization has been applied to extract and preconcentrate DNA sequences containing short tandem repeat (STR) alleles of interest. In this work, genomic DNA was fragmented by heating, and sequences associated with STR alleles were selectively hybridized to allele-specific biotinylated probes. Each particular biotinylated probe-DNA complex was bound to streptavidin-coated magnetic beads, enabling enrichment of target DNA sequences. Experiments conducted using degraded DNA samples, as well as samples containing a large concentration of inhibitory substances, showed good specificity and recovery of missing alleles. Based on the favorable results obtained with these specific probes, this method should prove useful as a tool to improve the recovery of alleles from degraded and inhibited DNA samples. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Redwan, R M; Saidin, A; Kumar, S V
2015-08-12
Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple that was sequenced using the PacBio sequencing technology. In this study, the high error rate of PacBio long sequence reads of A. comosus's total genomic DNA was reduced by leveraging the high-accuracy but short Illumina reads for error correction via the latest error correction module from Novocraft. Error-corrected long PacBio reads were assembled by using a single tool to produce a contig representing the pineapple chloroplast genome. The genome, 159,636 bp in length, has the conserved quadripartite structure of chloroplasts, containing a large single copy region (LSC) of 87,482 bp, a small single copy region (SSC) of 18,622 bp and two inverted repeat regions (IRA and IRB) of 26,766 bp each. Overall, the genome contained 117 unique coding regions and 30 were repeated in the IR region, with gene content, structure and arrangement similar to those of its sister taxon, Typha latifolia. A total of 35 repeat structures were detected in both the coding and non-coding regions, the majority being tandem repeats. In addition, 205 SSRs were detected in the genome, with six protein-coding genes containing more than two SSRs. Comparative chloroplast genomes from the subclass Commelinidae revealed a conservative protein coding gene albeit located in a highly divergent region. Analysis of selection pressure on protein-coding genes using Ka/Ks ratio showed significant positive selection exerted on the rps7 gene of the pineapple chloroplast with P less than 0.05. 
Phylogenetic analysis confirmed the recent taxonomical relation among the member of commelinids which support the monophyly relationship between Arecales and Dasypogonaceae and between Zingiberales to the Poales, which includes the A. comosus. The complete sequence of the chloroplast of pineapple provides insights to the divergence of genic chloroplast sequences from the members of the subclass Commelinidae. The complete pineapple chloroplast will serve as a reference for in-depth taxonomical studies in the Bromeliaceae family when more species under the family are sequenced in the future. The genetic sequence information will also make feasible other molecular applications of the pineapple chloroplast for plant genetic improvement.
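The region sizes reported for the pineapple chloroplast genome can be cross-checked with simple arithmetic: in a quadripartite plastome the total length is the large single copy region plus the small single copy region plus two copies of the inverted repeat. A minimal sketch, using only the figures given in the abstract (the function name is hypothetical):

```python
def plastome_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Sizes in bp as reported in the abstract.
total = plastome_length(lsc=87_482, ssc=18_622, ir=26_766)
print(total)  # 159636, matching the reported genome length
```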
Organisation of the plant genome in chromosomes.
Heslop-Harrison, J S Pat; Schwarzacher, Trude
2011-04-01
The plant genome is organized into chromosomes that provide the structure for the genetic linkage groups and allow faithful replication, transcription and transmission of the hereditary information. Genome sizes in plants are remarkably diverse, with a 2350-fold range from 63 to 149,000 Mb, divided into n=2 to n= approximately 600 chromosomes. Despite this huge range, structural features of chromosomes like centromeres, telomeres and chromatin packaging are well-conserved. The smallest genomes consist of mostly coding and regulatory DNA sequences present in low copy, along with highly repeated rDNA (rRNA genes and intergenic spacers), centromeric and telomeric repetitive DNA and some transposable elements. The larger genomes have similar numbers of genes, with abundant tandemly repeated sequence motifs, and transposable elements alone represent more than half the DNA present. Chromosomes evolve by fission, fusion, duplication and insertion events, allowing evolution of chromosome size and chromosome number. A combination of sequence analysis, genetic mapping and molecular cytogenetic methods with comparative analysis, all only becoming widely available in the 21st century, is elucidating the exact nature of the chromosome evolution events at all timescales, from the base of the plant kingdom, to intraspecific or hybridization events associated with recent plant breeding. As well as being of fundamental interest, understanding and exploiting evolutionary mechanisms in plant genomes is likely to be a key to crop development for food production. © 2011 The Authors. The Plant Journal © 2011 Blackwell Publishing Ltd.
Wu, S W; De Lencastre, H
1999-01-01
Screening of a library of Tn551 insertional mutants selected for reduction in the methicillin resistance level of the parental Staphylococcus aureus strain COL resulted in the isolation of mutant RUSA266 in which the minimal inhibitory concentration (MIC) of the parent was reduced from 1,600 to 1.5 micrograms/mL. Cloning and sequencing of the vicinity of the insertion site omega 726 identified an open reading frame (orf1365) encoding a very large polypeptide of more than 1,365 amino acids. A unique feature of the deduced amino acid sequence was the presence of multiple tandem repeats of 75 amino acids in the polypeptide, reminiscent of the structure of high-molecular-weight cell-surface proteins EF* and Emb identified in some streptococcal strains. Mutant RUSA266 with the inactivated gene, which we shall provisionally refer to as mrp (for multiple repeat polypeptide), produced a peptidoglycan with altered muropeptide composition, and both the reduced antibiotic resistance and the altered cell wall composition were co-transduced in back-crosses into the parental strain COL. Additional sequencing upstream of mrp has revealed that this gene was part of a five-gene cluster occupying a 9.2-kb region of the staphylococcal chromosome and was composed of glmM (directly upstream of mrp), two open reading frames orf310 and orf269 coding for two hypothetical proteins, and the gene encoding the staphylococcal arginase (arg). Transcriptional analysis demonstrated that the five genes in the cluster were transcribed together.
Yuryev, A.; Corden, J. L.
1996-01-01
The largest subunit of RNA polymerase II contains a repetitive C-terminal domain (CTD) consisting of tandem repeats of the consensus sequence Tyr(1)Ser(2)Pro(3)Thr(4)Ser(5)Pro(6)Ser(7). Substitution of nonphosphorylatable amino acids at positions two or five of the Saccharomyces cerevisiae CTD is lethal. We developed a selection system for isolating suppressors of this lethal phenotype and cloned a gene, SCA1 (suppressor of CTD alanine), which complements recessive suppressors of lethal multiple-substitution mutations. A partial deletion of SCA1 (sca1Δ::hisG) suppresses alanine or glutamate substitutions at position two of the consensus CTD sequence, and a lethal CTD truncation mutation, but SCA1 deletion does not suppress alanine or glutamate substitutions at position five. SCA1 is identical to SRB9, a suppressor of a cold-sensitive CTD truncation mutation. Strains carrying dominant SRB mutations have the same suppression properties as a sca1Δ::hisG strain. These results reveal a functional difference between positions two and five of the consensus CTD heptapeptide repeat. The ability of SCA1 and SRB mutant alleles to suppress CTD truncation mutations suggests that substitutions at position two, but not at position five, cause a defect in RNA polymerase II function similar to that introduced by CTD truncation. PMID:8725217
Hou, Lifang; Zhang, Xiao; Zheng, Yinan; Wang, Sheng; Dou, Chang; Guo, Liqiong; Byun, Hyang-Min; Motta, Valeria; McCracken, John; Díaz, Anaité; Kang, Choong-Min; Koutrakis, Petros; Bertazzi, Pier Alberto; Li, Jingyun; Schwartz, Joel; Baccarelli, Andrea A.
2014-01-01
Exposure to particulate matter (PM) has been associated with lung cancer risk in epidemiology investigations. Elemental components of PM have been suggested to have critical roles in PM toxicity, but the molecular mechanisms underlying their association with cancer risks remain poorly understood. DNA methylation has emerged as a promising biomarker for environmental-related diseases, including lung cancer. In this study, we evaluated the effects of PM elemental components on methylation of three tandem repeats in a highly-exposed population in Beijing, China. The Beijing Truck Driver Air Pollution Study was conducted shortly before the 2008 Beijing Olympic Games (June 15-July 27, 2008) and included 60 truck drivers and 60 office workers. On two days separated by 1-2 weeks, we measured blood DNA methylation of SATα, NBL2, and D4Z4, and personal exposure to eight elemental components in PM2.5, including aluminum (Al), silicon (Si), sulfur (S), potassium (K), calcium (Ca), titanium (Ti), iron (Fe), and zinc (Zn). We estimated the associations of each individual elemental component with each tandem repeat methylation in generalized estimating equations (GEE) models adjusted for PM2.5 mass and other covariates. Out of the eight examined elements, NBL2 methylation was positively associated with concentrations of Si (0.121, 95%CI: 0.030; 0.212, FDR=0.047) and Ca (0.065, 95%CI: 0.014; 0.115, FDR=0.047) in truck drivers. In office workers, SATα methylation was positively associated with concentrations of S (0.115, 95%CI: 0.034; 0.196, FDR=0.042). PM-associated differences in blood tandem-repeat methylation may help detect biological effects of the exposure and identify individuals who may eventually experience higher lung cancer risk. PMID:24273195
Rahbarizadeh, Fatemeh; Rasaee, Mohammad J; Forouzandeh, Mehdi; Allameh, Abdolamir; Sarrami, Ramin; Nasiry, Habib; Sadeghizadeh, Majid
2005-01-01
Camelidae are known to produce immunoglobulins (Igs) devoid of light chains and constant heavy-chain domains (CH1). Antigen-specific fragments of these heavy-chain IgGs (VHH) are of great interest in biotechnology applications. This paper describes the first example of heavy-chain antibodies successfully raised in Camelus dromedarius (single-humped camel) and Camelus bactrianus (two-humped camel) against a MUC1-related peptide that is an important epitope expressed in cancerous tissue. Camels were immunized against a synthetic peptide corresponding to the tandem repeat region of MUC1 mucin and a cancerous tissue preparation obtained from patients suffering from breast carcinoma. Three IgG subclasses with different binding properties to protein A and G were purified by affinity chromatography. Both conventional and heavy-chain IgG antibodies were produced in response to the MUC1-related peptide. The elicited antibodies reacted specifically with the tandem repeat region of MUC1 mucin in an enzyme-linked immunosorbent assay (ELISA). Anti-peptide antibodies were purified by passing antiserum over two affinity chromatography columns. Using ELISA, immunocytochemistry and Western blotting, the interaction of purified antibodies with different antigens was evaluated. The antibodies bound selectively to the following antigens: MUC1 peptide (tandem repeat region), human milk fat globule membrane (HMFG), deglycosylated human milk fat globule membrane (D-HMFG), homogenized cancerous breast tissue and a native MUC1 purified from ascitic fluid. Ka values of specific polyclonal antipeptide antibodies were estimated in C. dromedarius and C. bactrianus as 7 × 10^10 M^-1 and 1.4 × 10^10 M^-1, respectively.
Fine organization of genomic regions tagged to the 5S rDNA locus of the bread wheat 5B chromosome.
Sergeeva, Ekaterina M; Shcherban, Andrey B; Adonina, Irina G; Nesterov, Michail A; Beletsky, Alexey V; Rakitin, Andrey L; Mardanov, Andrey V; Ravin, Nikolai V; Salina, Elena A
2017-11-14
The multigene family encoding the 5S rRNA, one of the most important structural and functional parts of the large ribosomal subunit, is an obligate component of all eukaryotic genomes. 5S rDNA has long been a favored target for cytological and phylogenetic studies due to the inherent peculiarities of its structural organization, such as the tandem arrays of repetitive units and their high interspecific divergence. The complex polyploid nature of the genome of bread wheat, Triticum aestivum, and the technically difficult task of sequencing clusters of tandem repeats mean that the detailed organization of extended genomic regions containing 5S rRNA genes remains unclear, despite the recent progress made in wheat genomic sequencing. In this work, we used pyrosequencing of BAC clones to study the organization of two distinct 5S rDNA-tagged regions of the 5BS chromosome of bread wheat. Three BAC clones containing 5S rDNA were identified in the 5BS chromosome-specific BAC library of Triticum aestivum. Using the results of pyrosequencing and assembly, we obtained six 5S rDNA-containing contigs with a total length of 140,417 bp, and two sets (pools) of individual 5S rDNA sequences belonging to separate but closely located genomic regions on the 5BS chromosome. Both regions are characterized by the presence of approximately 70-80 copies of 5S rDNA; however, they are completely different in their structural organization. The first region contained highly diverged short-type 5S rDNA units that were disrupted by multiple insertions of transposable elements. The second region contained the more conserved long-type 5S rDNA, organized as a single tandem array. FISH using probes specific to both 5S rDNA unit types showed differences in the distribution and intensity of signals on the chromosomes of polyploid wheat species and their diploid progenitors.
A detailed structural organization of two closely located 5S rDNA-tagged genomic regions on the 5BS chromosome of bread wheat has been established. These two regions differ in the organization of both 5S rDNA and the neighboring sequences comprised of transposable elements, implying different modes of evolution for these regions.
Ligand binding by repeat proteins: natural and designed
Grove, Tijana Z; Cortajarena, Aitziber L; Regan, Lynne
2012-01-01
Repeat proteins contain tandem arrays of small structural motifs. As a consequence of this architecture, they adopt non-globular, extended structures that present large, highly specific surfaces for ligand binding. Here we discuss recent advances toward understanding the functional role of this unique modular architecture. We showcase specific examples of natural repeat proteins interacting with diverse ligands and also present examples of designed repeat protein–ligand interactions. PMID:18602006
DeNovoGUI: An Open Source Graphical User Interface for de Novo Sequencing of Tandem Mass Spectra
2013-01-01
De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissive Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com. PMID:24295440
DeNovoGUI: an open source graphical user interface for de novo sequencing of tandem mass spectra.
Muth, Thilo; Weilnböck, Lisa; Rapp, Erdmann; Huber, Christian G; Martens, Lennart; Vaudel, Marc; Barsnes, Harald
2014-02-07
De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissive Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com.
Nozeret, Karine; Bonan, Marc; Yarmoluk, Serguiy M; Novopashina, Darya S; Boutorine, Alexandre S
2015-09-01
Synthetic minor groove-binding pyrrole-imidazole polyamides labeled with fluorophores are promising candidates for fluorescence imaging of double-stranded DNA in isolated chromosomes or in fixed and living cells. We synthesized nine hairpin and two head-to-head tandem polyamides targeting repeated sequences from mouse major satellites. Their interaction with synthetic target dsDNA was studied by physico-chemical methods in vitro before and after coupling to various fluorophores. The great variability in affinities and fluorescence properties leads to the conclusion that these properties depend not only on recognition rules but also on other known and unknown structural factors. Individual testing of each probe is needed before cellular applications.
Saccharomyces cerevisiae SSB1 protein and its relationship to nucleolar RNA-binding proteins.
Jong, A Y; Clark, M W; Gilbert, M; Oehm, A; Campbell, J L
1987-08-01
To better define the function of Saccharomyces cerevisiae SSB1, an abundant single-stranded nucleic acid-binding protein, we determined the nucleotide sequence of the SSB1 gene and compared it with those of other proteins of known function. The amino acid sequence contains 293 amino acid residues and has an Mr of 32,853. There are several stretches of sequence characteristic of other eucaryotic single-stranded nucleic acid-binding proteins. At the amino terminus, residues 39 to 54 are highly homologous to a peptide in calf thymus UP1 and UP2 and a human heterogeneous nuclear ribonucleoprotein. Residues 125 to 162 constitute a fivefold tandem repeat of the sequence RGGFRG, the composition of which suggests a nucleic acid-binding site. Near the C terminus, residues 233 to 245 are homologous to several RNA-binding proteins. Of 18 C-terminal residues, 10 are acidic, a characteristic of the procaryotic single-stranded DNA-binding proteins and eucaryotic DNA- and RNA-binding proteins. In addition, examination of the subcellular distribution of SSB1 by immunofluorescence microscopy indicated that SSB1 is a nuclear protein, predominantly located in the nucleolus. Sequence homologies and the nucleolar localization make it likely that SSB1 functions in RNA metabolism in vivo, although an additional role in DNA metabolism cannot be excluded.
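The tandem repeat described above can be illustrated with a small search routine. This is only a sketch: it assumes exact, back-to-back copies of the motif, whereas the reported span (residues 125 to 162, i.e. 38 residues) is longer than five exact copies of a 6-residue motif (30 residues), so the real copies are presumably imperfect or separated by extra residues. The function name `longest_tandem_run` and the toy sequence are hypothetical; the toy sequence is not the SSB1 sequence.

```python
import re


def longest_tandem_run(sequence, motif):
    """Return (start, copies) for the longest run of exact, adjacent motif copies.

    start is a 0-based index into sequence; (-1, 0) means the motif is absent.
    """
    # One or more consecutive copies of the literal motif.
    pattern = re.compile("(?:%s)+" % re.escape(motif))
    best_start, best_copies = -1, 0
    for match in pattern.finditer(sequence):
        copies = (match.end() - match.start()) // len(motif)
        if copies > best_copies:
            best_start, best_copies = match.start(), copies
    return best_start, best_copies


# Toy sequence built for illustration only (NOT the real SSB1 protein).
toy_sequence = "MSA" + "RGGFRG" * 5 + "DEE"
run_start, run_copies = longest_tandem_run(toy_sequence, "RGGFRG")
```

Detecting degenerate repeats such as those likely present in the real protein would need approximate matching, which this exact-match sketch does not attempt.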
Construction of trypanosome artificial mini-chromosomes.
Lee, M G; E, Y; Axelrod, N
1995-01-01
We report the preparation of two linear constructs which, when transformed into the procyclic form of Trypanosoma brucei, become stably inherited artificial mini-chromosomes. Both constructs, one of 10 kb and the other of 13 kb, contain a T. brucei PARP promoter driving a chloramphenicol acetyltransferase (CAT) gene. In the 10 kb construct the CAT gene is followed by one hygromycin phosphotransferase (Hph) gene, and in the 13 kb construct the CAT gene is followed by three tandemly linked Hph genes. At each end of these linear molecules are telomere repeats and subtelomeric sequences. Electroporation of these linear DNA constructs into the procyclic form of T. brucei generated hygromycin-B resistant cell lines. In these cell lines, the input DNA remained linear and bounded by the telomere ends, but it increased in size. In the cell lines generated by the 10 kb construct, the input DNA increased in size to 20-50 kb. In the cell lines generated by the 13 kb construct, two sizes of linear DNAs containing the input plasmid were detected: one of 40-50 kb and the other of 150 kb. The increase in size was not the result of in vivo tandem repetitions of the input plasmid, but represented the addition of new sequences. These Hph-containing linear DNA molecules were maintained stably in cell lines for at least 20 generations in the absence of drug selection and were subsequently referred to as trypanosome artificial mini-chromosomes, or TACs. PMID:8532534
Characterization of a dopamine transporter polymorphism and behavior in Belgian Malinois
2013-01-01
Background: The Belgian Malinois dog breed (MAL) is frequently used in law enforcement and military environments. Owners have reported seizures and unpredictable behavioral changes including dogs’ eyes “glazing over,” dogs’ lack of response to environmental stimuli, and loss of behavioral inhibition including owner-directed biting behavior. Dogs with severe behavioral changes may be euthanized as they can represent a danger to humans and other dogs. In the dog, the dopamine transporter gene (DAT) contains a 38-base pair variable number tandem repeat (DAT-VNTR); alleles have either one or two copies of the 38-base pair sequence. The objective of this study was to assess frequency of DAT-VNTR alleles, and characterize the association between DAT-VNTR alleles and behavior in MAL and other breeds. Results: In an American sample of 280 dogs comprising 26 breeds, most breeds are predominantly homozygous for the DAT-VNTR two-tandem-repeat allele (2/2). The one-tandem-repeat allele is over-represented in American MAL (AM-MAL) (n = 144), both as heterozygotes (1/2) and homozygotes (1/1). All AM-MAL with reported seizures (n = 5) were 1/1 genotype. For AM-MAL with at least one “1” allele (1/1 or 1/2 genotype, n = 121), owners reported higher levels of attention, increased frequency of episodic aggression, and increased frequency of loss of responsiveness to environmental stimuli. In behavior observations, Belgian Military Working Dogs (MWD) with 1/1 or 1/2 genotypes displayed fewer distracted behaviors and more stress-related behaviors such as lower posture and increased yawning. Handlers’ treatment of MWD varied with DAT-VNTR genotype as did dogs’ responses to handlers’ behavior.
For 1/1 or 1/2 genotype MWD, 1) lower posture after the first aversive stimulus given by handlers was associated with poorer obedience performance; 2) increased aversive stimuli during protection exercises were associated with decreased performance; 3) more aversive stimuli during obedience were associated with more aversive stimuli during protection; and 4) handlers used more aversive stimuli in protection compared with obedience exercises. Conclusions: The single copy allele of DAT-VNTR is associated with owner-reported seizures, loss of responsiveness to environmental stimuli, episodic aggression, and hyper-vigilance in MAL. Behavioral changes are associated with differential treatment by handlers. Findings should be considered preliminary until replicated in a larger sample. PMID:23718893