single sequence repeat: Topics by Science.gov

Sample records for single sequence repeat

Recombination-dependent replication and gene conversion homogenize repeat sequences and diversify plastid genome structure.

PubMed

Ruhlman, Tracey A; Zhang, Jin; Blazier, John C; Sabir, Jamal S M; Jansen, Robert K

2017-04-01

There is a misinterpretation in the literature regarding the variable orientation of the small single copy region of plastid genomes (plastomes). The common phenomenon of small and large single copy inversion, hypothesized to occur through intramolecular recombination between inverted repeats (IR) in a circular, single unit-genome, in fact, more likely occurs through recombination-dependent replication (RDR) of linear plastome templates. If RDR can be primed through both intra- and intermolecular recombination, then this mechanism could not only create inversion isomers of so-called single copy regions, but also an array of alternative sequence arrangements. We used Illumina paired-end and PacBio single-molecule real-time (SMRT) sequences to characterize repeat structure in the plastome of Monsonia emarginata (Geraniaceae). We used OrgConv and inspected nucleotide alignments to infer ancestral nucleotides and identify gene conversion among repeats and mapped long (>1 kb) SMRT reads against the unit-genome assembly to identify alternative sequence arrangements. Although M. emarginata lacks the canonical IR, we found that large repeats (>1 kilobase; kb) represent ∼22% of the plastome nucleotide content. Among the largest repeats (>2 kb), we identified GC-biased gene conversion and mapping filtered, long SMRT reads to the M. emarginata unit-genome assembly revealed alternative, substoichiometric sequence arrangements. We offer a model based on RDR and gene conversion between long repeated sequences in the M. emarginata plastome and provide support that both intra-and intermolecular recombination between large repeats, particularly in repeat-rich plastomes, varies unit-genome structure while homogenizing the nucleotide sequence of repeats. © 2017 Botanical Society of America.
Analysis of sequence repeats of proteins in the PDB.

PubMed

Mary Rajathei, David; Selvaraj, Samuel

2013-12-01

Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that identity of the sequence repeats falls in the range of 20-40% and, the superimposed structures of the most of the sequence repeats maintain similar overall folding. Analysis sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.
An integrated genetic linkage map of watermelon and genetic diversity based on single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers

USDA-ARS?s Scientific Manuscript database

Watermelon (Citrullus lanatus var. lanatus) is an important vegetable fruit throughout the world. A high number of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers should provide large coverage of the watermelon genome and high phylogenetic resolution of germplasm acces...
Variation, Repetition, And Choice

PubMed Central

Abreu-Rodrigues, Josele; Lattal, Kennon A; dos Santos, Cristiano V; Matos, Ricardo A

2005-01-01

Experiment 1 investigated the controlling properties of variability contingencies on choice between repeated and variable responding. Pigeons were exposed to concurrent-chains schedules with two alternatives. In the REPEAT alternative, reinforcers in the terminal link depended on a single sequence of four responses. In the VARY alternative, a response sequence in the terminal link was reinforced only if it differed from the n previous sequences (lag criterion). The REPEAT contingency generated low, constant levels of sequence variation whereas the VARY contingency produced levels of sequence variation that increased with the lag criterion. Preference for the REPEAT alternative tended to increase directly with the degree of variation required for reinforcement. Experiment 2 examined the potential confounding effects in Experiment 1 of immediacy of reinforcement by yoking the interreinforcer intervals in the REPEAT alternative to those in the VARY alternative. Again, preference for REPEAT was a function of the lag criterion. Choice between varying and repeating behavior is discussed with respect to obtained behavioral variability, probability of reinforcement, delay of reinforcement, and switching within a sequence. PMID:15828592
Algorithm to find distant repeats in a single protein sequence

PubMed Central

Banerjee, Nirjhar; Sarani, Rangarajan; Ranjani, Chellamuthu Vasuki; Sowmiya, Govindaraj; Michael, Daliah; Balakrishnan, Narayanasamy; Sekar, Kanagaraj

2008-01-01

Distant repeats in protein sequence play an important role in various aspects of protein analysis. A keen analysis of the distant repeats would enable to establish a firm relation of the repeats with respect to their function and three-dimensional structure during the evolutionary process. Further, it enlightens the diversity of duplication during the evolution. To this end, an algorithm has been developed to find all distant repeats in a protein sequence. The scores from Point Accepted Mutation (PAM) matrix has been deployed for the identification of amino acid substitutions while detecting the distant repeats. Due to the biological importance of distant repeats, the proposed algorithm will be of importance to structural biologists, molecular biologists, biochemists and researchers involved in phylogenetic and evolutionary studies. PMID:19052663
The CentO satellite confers translational and rotational phasing on cenH3 nucleosomes in rice centromeres.

PubMed

Zhang, Tao; Talbert, Paul B; Zhang, Wenli; Wu, Yufeng; Yang, Zujun; Henikoff, Jorja G; Henikoff, Steven; Jiang, Jiming

2013-12-10

Plant and animal centromeres comprise megabases of highly repeated satellite sequences, yet centromere function can be specified epigenetically on single-copy DNA by the presence of nucleosomes containing a centromere-specific variant of histone H3 (cenH3). We determined the positions of cenH3 nucleosomes in rice (Oryza sativa), which has centromeres composed of both the 155-bp CentO satellite repeat and single-copy non-CentO sequences. We find that cenH3 nucleosomes protect 90-100 bp of DNA from micrococcal nuclease digestion, sufficient for only a single wrap of DNA around the cenH3 nucleosome core. cenH3 nucleosomes are translationally phased with 155-bp periodicity on CentO repeats, but not on non-CentO sequences. CentO repeats have an ∼10-bp periodicity in WW dinucleotides and in micrococcal nuclease cleavage, providing evidence for rotational phasing of cenH3 nucleosomes on CentO and suggesting that satellites evolve for translational and rotational stabilization of centromeric nucleosomes.
The complete chloroplast genome of Cinnamomum camphora and its comparison with related Lauraceae species.

PubMed

Chen, Caihui; Zheng, Yongjie; Liu, Sian; Zhong, Yongda; Wu, Yanfang; Li, Jiang; Xu, Li-An; Xu, Meng

2017-01-01

Cinnamomum camphora , a member of the Lauraceae family, is a valuable aromatic and timber tree that is indigenous to the south of China and Japan. All parts of Cinnamomum camphora have secretory cells containing different volatile chemical compounds that are utilized as herbal medicines and essential oils. Here, we reported the complete sequencing of the chloroplast genome of Cinnamomum camphora using illumina technology. The chloroplast genome of Cinnamomum camphora is 152,570 bp in length and characterized by a relatively conserved quadripartite structure containing a large single copy region of 93,705 bp, a small single copy region of 19,093 bp and two inverted repeat (IR) regions of 19,886 bp. Overall, the genome contained 123 coding regions, of which 15 were repeated in the IR regions. An analysis of chloroplast sequence divergence revealed that the small single copy region was highly variable among the different genera in the Lauraceae family. A total of 40 repeat structures and 83 simple sequence repeats were detected in both the coding and non-coding regions. A phylogenetic analysis indicated that Calycanthus is most closely related to Lauraceae , both being members of Laurales , which forms a sister group to Magnoliids . The complete sequence of the chloroplast of Cinnamomum camphora will aid in in-depth taxonomical studies of the Lauraceae family in the future. The genetic sequence information will also have valuable applications for chloroplast genetic engineering.
The complete chloroplast genome sequences of Lychnis wilfordii and Silene capitata and comparative analyses with other Caryophyllaceae genomes.

PubMed

Kang, Jong-Soo; Lee, Byoung Yoon; Kwak, Myounghai

2017-01-01

The complete chloroplast genomes of Lychnis wilfordii and Silene capitata were determined and compared with ten previously reported Caryophyllaceae chloroplast genomes. The chloroplast genome sequences of L. wilfordii and S. capitata contain 152,320 bp and 150,224 bp, respectively. The gene contents and orders among 12 Caryophyllaceae species are consistent, but several microstructural changes have occurred. Expansion of the inverted repeat (IR) regions at the large single copy (LSC)/IRb and small single copy (SSC)/IR boundaries led to partial or entire gene duplications. Additionally, rearrangements of the LSC region were caused by gene inversions and/or transpositions. The 18 kb inversions, which occurred three times in different lineages of tribe Sileneae, were thought to be facilitated by the intermolecular duplicated sequences. Sequence analyses of the L. wilfordii and S. capitata genomes revealed 39 and 43 repeats, respectively, including forward, palindromic, and reverse repeats. In addition, a total of 67 and 56 simple sequence repeats were discovered in the L. wilfordii and S. capitata chloroplast genomes, respectively. Finally, we constructed phylogenetic trees of the 12 Caryophyllaceae species and two Amaranthaceae species based on 73 protein-coding genes using both maximum parsimony and likelihood methods.
Direct repeat sequences in the Streptomyces chitinase-63 promoter direct both glucose repression and chitin induction

PubMed Central

Ni, Xiangyang; Westpheling, Janet

1997-01-01

The chi63 promoter directs glucose-sensitive, chitin-dependent transcription of a gene involved in the utilization of chitin as carbon source. Analysis of 5′ and 3′ deletions of the promoter region revealed that a 350-bp segment is sufficient for wild-type levels of expression and regulation. The analysis of single base changes throughout the promoter region, introduced by random and site-directed mutagenesis, identified several sequences to be important for activity and regulation. Single base changes at −10, −12, −32, −33, −35, and −37 upstream of the transcription start site resulted in loss of activity from the promoter, suggesting that bases in these positions are important for RNA polymerase interaction. The sequences centered around −10 (TATTCT) and −35 (TTGACC) in this promoter are, in fact, prototypical of eubacterial promoters. Overlapping the RNA polymerase binding site is a perfect 12-bp direct repeat sequence. Some base changes within this direct repeat resulted in constitutive expression, suggesting that this sequence is an operator for negative regulation. Other base changes resulted in loss of glucose repression while retaining the requirement for chitin induction, suggesting that this sequence is also involved in glucose repression. The fact that cis-acting mutations resulted in glucose resistance but not inducer independence rules out the possibility that glucose repression acts exclusively by inducer exclusion. The fact that mutations that affect glucose repression and chitin induction fall within the same direct repeat sequence module suggests that the direct repeat sequence facilitates both chitin induction and glucose repression. PMID:9371809
Evidence for Long-Timescale Patterns of Synaptic Inputs in CA1 of Awake Behaving Mice.

PubMed

Kolb, Ilya; Talei Franzesi, Giovanni; Wang, Michael; Kodandaramaiah, Suhasa B; Forest, Craig R; Boyden, Edward S; Singer, Annabelle C

2018-02-14

Repeated sequences of neural activity are a pervasive feature of neural networks in vivo and in vitro In the hippocampus, sequential firing of many neurons over periods of 100-300 ms reoccurs during behavior and during periods of quiescence. However, it is not known whether the hippocampus produces longer sequences of activity or whether such sequences are restricted to specific network states. Furthermore, whether long repeated patterns of activity are transmitted to single cells downstream is unclear. To answer these questions, we recorded intracellularly from hippocampal CA1 of awake, behaving male mice to examine both subthreshold activity and spiking output in single neurons. In eight of nine recordings, we discovered long (900 ms) reoccurring subthreshold fluctuations or "repeats." Repeats generally were high-amplitude, nonoscillatory events reoccurring with 10 ms precision. Using statistical controls, we determined that repeats occurred more often than would be expected from unstructured network activity (e.g., by chance). Most spikes occurred during a repeat, and when a repeat contained a spike, the spike reoccurred with precision on the order of ≤20 ms, showing that long repeated patterns of subthreshold activity are strongly connected to spike output. Unexpectedly, we found that repeats occurred independently of classic hippocampal network states like theta oscillations or sharp-wave ripples. Together, these results reveal surprisingly long patterns of repeated activity in the hippocampal network that occur nonstochastically, are transmitted to single downstream neurons, and strongly shape their output. This suggests that the timescale of information transmission in the hippocampal network is much longer than previously thought. SIGNIFICANCE STATEMENT We found long (≥900 ms), repeated, subthreshold patterns of activity in CA1 of awake, behaving mice. These repeated patterns ("repeats") occurred more often than expected by chance and with 10 ms precision. Most spikes occurred within repeats and reoccurred with a precision on the order of 20 ms. Surprisingly, there was no correlation between repeat occurrence and classical network states such as theta oscillations and sharp-wave ripples. These results provide strong evidence that long patterns of activity are repeated and transmitted to downstream neurons, suggesting that the hippocampus can generate longer sequences of repeated activity than previously thought. Copyright © 2018 the authors 0270-6474/18/381822-14$15.00/0.
Translocation and gross deletion breakpoints in human inherited disease and cancer II: Potential involvement of repetitive sequence elements in secondary structure formation between DNA ends.

PubMed

Chuzhanova, Nadia; Abeysinghe, Shaun S; Krawczak, Michael; Cooper, David N

2003-09-01

Translocations and gross deletions are responsible for a significant proportion of both cancer and inherited disease. Although such gene rearrangements are nonuniformly distributed in the human genome, the underlying mutational mechanisms remain unclear. We have studied the potential involvement of various types of repetitive sequence elements in the formation of secondary structure intermediates between the single-stranded DNA ends that recombine during rearrangements. Complexity analysis was used to assess the potential of these ends to form secondary structures, the maximum decrease in complexity consequent to a gross rearrangement being used as an indicator of the type of repeat and the specific DNA ends involved. A total of 175 pairs of deletion/translocation breakpoint junction sequences available from the Gross Rearrangement Breakpoint Database [GRaBD; www.uwcm.ac.uk/uwcm/mg/grabd/grabd.html] were analyzed. Potential secondary structure was noted between the 5' flanking sequence of the first breakpoint and the 3' flanking sequence of the second breakpoint in 49% of rearrangements and between the 5' flanking sequence of the second breakpoint and the 3' flanking sequence of the first breakpoint in 36% of rearrangements. Inverted repeats, inversions of inverted repeats, and symmetric elements were found in association with gross rearrangements at approximately the same frequency. However, inverted repeats and inversions of inverted repeats accounted for the vast majority (83%) of deletions plus small insertions, symmetric elements for one-half of all antigen receptor-mediated translocations, while direct repeats appear only to be involved in mediating simple deletions. These findings extend our understanding of illegitimate recombination by highlighting the importance of secondary structure formation between single-stranded DNA ends at breakpoint junctions. Copyright 2003 Wiley-Liss, Inc.
Nucleotide sequence of soybean chloroplast DNA regions which contain the psb A and trn H genes and cover the ends of the large single copy region and one end of the inverted repeats.

PubMed

Spielmann, A; Stutz, E

1983-10-25

The soybean chloroplast psb A gene (photosystem II thylakoid membrane protein of Mr 32 000, lysine-free) and the trn H gene (tRNAHisGUG), which both map in the large single copy region adjacent to one of the inverted repeat structures (IR1), have been sequenced including flanking regions. The psb A gene shows in its structural part 92% sequence homology with the corresponding genes of spinach and N. debneyi and contains also an open reading frame for 353 aminoacids. The aminoacid sequence of a potential primary translation product (calculated Mr, 38 904, no lysine) diverges from that of spinach and N. debneyi in only two positions in the C-terminal part. The trn H gene has the same polarity as the psb A gene and the coding region is located at the very end of the large single copy region. The deduced sequence of the soybean chloroplast tRNAHisGUG is identical with that of Zea mays chloroplasts. Both ends of the large single copy region were sequenced including a small segment of the adjacent IR1 and IR2.
A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

USDA-ARS?s Scientific Manuscript database

A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...
Structural analysis of two length variants of the rDNA intergenic spacer from Eruca sativa.

PubMed

Lakshmikumaran, M; Negi, M S

1994-03-01

Restriction enzyme analysis of the rRNA genes of Eruca sativa indicated the presence of many length variants within a single plant and also between different cultivars which is unusual for most crucifers studied so far. Two length variants of the rDNA intergenic spacer (IGS) from a single individual E. sativa (cv. Itsa) plant were cloned and characterized. The complete nucleotide sequences of both the variants (3 kb and 4 kb) were determined. The intergenic spacer contains three families of tandemly repeated DNA sequences denoted as A, B and C. However, the long (4 kb) variant shows the presence of an additional repeat, denoted as D, which is a duplication of a 224 bp sequence just upstream of the putative transcription initiation site. Repeat units belonging to the three different families (A, B and C) were in the size range of 22 to 30 bp. Such short repeat elements are present in the IGS of most of the crucifers analysed so far. Sequence analysis of the variants (3 kb and 4 kb) revealed that the length heterogeneity of the spacer is located at three different regions and is due to the varying copy numbers of repeat units belonging to families A and B. Length variation of the spacer is also due to the presence of a large duplication (D repeats) in the 4 kb variant which is absent in the 3 kb variant. The putative transcription initiation site was identified by comparisons with the rDNA sequences from other plant species.
A novel tandem repeat sequence located on human chromosome 4p: isolation and characterization.

PubMed

Kogi, M; Fukushige, S; Lefevre, C; Hadano, S; Ikeda, J E

1997-06-01

In an effort to analyze the genomic region of the distal half of human chromosome 4p, to where Huntington disease and other diseases have been mapped, we have isolated the cosmid clone (CRS447) that was likely to contain a region with specific repeat sequences. Clone CRS447 was subjected to detailed analysis, including chromosome mapping, restriction mapping, and DNA sequencing. Chromosome mapping by both a human-CHO hybrid cell panel and FISH revealed that CRS447 was predominantly located in the 4p15.1-15.3 region. CRS447 was shown to consist of tandem repeats of 4.7-kb units present on chromosome 4p. A single EcoRI unit was subcloned (pRS447), and the complete sequence was determined as 4752 nucleotides. When pRS447 was used as a probe, the number of copies of this repeat per haploid genome was estimated to be 50-70. Sequence analysis revealed that it contained two internal CA repeats and one putative ORF. Database search established that this sequence was unreported. However, two homologous STS markers were found in the database. We concluded that CRS447/pRS447 is a novel tandem repeat sequence that is mainly specific to human chromosome 4p.
Stability of Tandem Repeats in the Drosophila Melanogaster HSR-Omega Nuclear RNA

PubMed Central

Hogan, N. C.; Slot, F.; Traverse, K. L.; Garbe, J. C.; Bendena, W. G.; Pardue, M. L.

1995-01-01

The Drosophila melanogaster Hsr-omega locus produces a nuclear RNA containing >5 kb of tandem repeat sequences. These repeats are unique to Hsr-omega and show concerted evolution similar to that seen with classical satellite DNAs. In D. melanogaster the monomer is ~280 bp. Sequences of 191/2 monomers differ by 8 +/- 5% (mean +/- SD), when all pairwise comparisons are considered. Differences are single nucleotide substitutions and 1-3 nucleotide deletions/insertions. Changes appear to be randomly distributed over the repeat unit. Outer repeats do not show the decrease in monomer homogeneity that might be expected if homogeneity is maintained by recombination. However, just outside the last complete repeat at each end, there are a few fragments of sequence similar to the monomer. The sequences in these flanking regions are not those predicted for sequences decaying in the absence of recombination. Instead, the fragmentation of the sequence homology suggests that flanking regions have undergone more severe disruptions, possibly during an insertion or amplification event. Hsr-omega alleles differing in the number of repeats are detected and appear to be stable over a few thousand generations; however, both increases and decreases in repeat numbers have been observed. The new alleles appear to be as stable as their predecessors. No alleles of less than ~5 kb nor more than ~16 kb of repeats were seen in any stocks examined. The evidence that there is a limit on the minimum number of repeats is consistent with the suggestion that these repeats are important in the function of the unusual Hsr-omega nuclear RNA. PMID:7540581
Long-read sequencing and de novo assembly of a Chinese genome

USDA-ARS?s Scientific Manuscript database

Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arr...
Regions of conservation and divergence in the 3' untranslated sequences of genomic RNA from Ross River virus isolates.

PubMed

Faragher, S G; Dalgarno, L

1986-07-20

The 3' untranslated (UT) sequences of the genomic RNAs of five geographic variants of the alphavirus Ross River virus (RRV) were determined and compared with the 3' UT sequence of RRV T48, the prototype strain. Part of the 3' UT region of Getah virus, a close serological relative of RRV, was also sequenced. The RRV 3' UT region varies markedly in length between variants. Large deletions or insertions, sequence rearrangements and single nucleotide substitutions are observed. A sequence tract of 49 to 58 nucleotides, which is repeated as four blocks in the RRV T48 3' UT region, occurs only once in the 3' UT region of one RRV strain (NB5092), indicating that the existence of repeat sequence blocks is not essential for RRV replication. However, the precise sequence of the 3' proximal copy of the repeat block and its position relative to the poly(A) tail were identical in all RRV isolates examined, suggesting that it has an important role in RRV replication. Nucleotide substitutions between RRV variants are distributed non-randomly along the length of the 3' UT region. The sequence of 120 to 130 nucleotides adjacent to the poly(A) tail is strongly conserved. Getah virus RNA contains three repeat sequence blocks in the 3' UT region. These are similar in sequence to those in RRV RNA but differ in their arrangement. Homology between the RRV and Getah 3' UT sequences is greatest in the 3' proximal repeat sequence block that shows three differences in 49 nucleotides. The 3' proximal repeat in Getah RNA occurs at the same position, relative to the poly(A) tail, as in all RRV variants. The RRV and Getah virus 3' UT sequences show extensive homology in the region between the 3' proximal repeat and the poly(A) tail but, apart from the repeat blocks themselves, they show no significant homology elsewhere.
Nucleotide sequence of soybean chloroplast DNA regions which contain the psb A and trn H genes and cover the ends of the large single copy region and one end of the inverted repeats.

PubMed Central

Spielmann, A; Stutz, E

1983-01-01

The soybean chloroplast psb A gene (photosystem II thylakoid membrane protein of Mr 32 000, lysine-free) and the trn H gene (tRNAHisGUG), which both map in the large single copy region adjacent to one of the inverted repeat structures (IR1), have been sequenced including flanking regions. The psb A gene shows in its structural part 92% sequence homology with the corresponding genes of spinach and N. debneyi and contains also an open reading frame for 353 aminoacids. The aminoacid sequence of a potential primary translation product (calculated Mr, 38 904, no lysine) diverges from that of spinach and N. debneyi in only two positions in the C-terminal part. The trn H gene has the same polarity as the psb A gene and the coding region is located at the very end of the large single copy region. The deduced sequence of the soybean chloroplast tRNAHisGUG is identical with that of Zea mays chloroplasts. Both ends of the large single copy region were sequenced including a small segment of the adjacent IR1 and IR2. PMID:6314279
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum.

PubMed

VanBuren, Robert; Bryant, Doug; Edger, Patrick P; Tang, Haibao; Burgess, Diane; Challabathula, Dinakar; Spittle, Kristi; Hall, Richard; Gu, Jenny; Lyons, Eric; Freeling, Michael; Bartels, Dorothea; Ten Hallers, Boudewijn; Hastie, Alex; Michael, Todd P; Mockler, Todd C

2015-11-26

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.

Repeatless and repeat-based centromeres in potato: implications for centromere evolution.

PubMed

Gong, Zhiyun; Wu, Yufeng; Koblízková, Andrea; Torres, Giovana A; Wang, Kai; Iovene, Marina; Neumann, Pavel; Zhang, Wenli; Novák, Petr; Buell, C Robin; Macas, Jirí; Jiang, Jiming

2012-09-01

Centromeres in most higher eukaryotes are composed of long arrays of satellite repeats. By contrast, most newly formed centromeres (neocentromeres) do not contain satellite repeats and instead include DNA sequences representative of the genome. An unknown question in centromere evolution is how satellite repeat-based centromeres evolve from neocentromeres. We conducted a genome-wide characterization of sequences associated with CENH3 nucleosomes in potato (Solanum tuberosum). Five potato centromeres (Cen4, Cen6, Cen10, Cen11, and Cen12) consisted primarily of single- or low-copy DNA sequences. No satellite repeats were identified in these five centromeres. At least one transcribed gene was associated with CENH3 nucleosomes. Thus, these five centromeres structurally resemble neocentromeres. By contrast, six potato centromeres (Cen1, Cen2, Cen3, Cen5, Cen7, and Cen8) contained megabase-sized satellite repeat arrays that are unique to individual centromeres. The satellite repeat arrays likely span the entire functional cores of these six centromeres. At least four of the centromeric repeats were amplified from retrotransposon-related sequences and were not detected in Solanum species closely related to potato. The presence of two distinct types of centromeres, coupled with the boom-and-bust cycles of centromeric satellite repeats in Solanum species, suggests that repeat-based centromeres can rapidly evolve from neocentromeres by de novo amplification and insertion of satellite repeats in the CENH3 domains.
Repeatless and Repeat-Based Centromeres in Potato: Implications for Centromere Evolution[C][W

PubMed Central

Gong, Zhiyun; Wu, Yufeng; Koblížková, Andrea; Torres, Giovana A.; Wang, Kai; Iovene, Marina; Neumann, Pavel; Zhang, Wenli; Novák, Petr; Buell, C. Robin; Macas, Jiří; Jiang, Jiming

2012-01-01

Centromeres in most higher eukaryotes are composed of long arrays of satellite repeats. By contrast, most newly formed centromeres (neocentromeres) do not contain satellite repeats and instead include DNA sequences representative of the genome. An unknown question in centromere evolution is how satellite repeat-based centromeres evolve from neocentromeres. We conducted a genome-wide characterization of sequences associated with CENH3 nucleosomes in potato (Solanum tuberosum). Five potato centromeres (Cen4, Cen6, Cen10, Cen11, and Cen12) consisted primarily of single- or low-copy DNA sequences. No satellite repeats were identified in these five centromeres. At least one transcribed gene was associated with CENH3 nucleosomes. Thus, these five centromeres structurally resemble neocentromeres. By contrast, six potato centromeres (Cen1, Cen2, Cen3, Cen5, Cen7, and Cen8) contained megabase-sized satellite repeat arrays that are unique to individual centromeres. The satellite repeat arrays likely span the entire functional cores of these six centromeres. At least four of the centromeric repeats were amplified from retrotransposon-related sequences and were not detected in Solanum species closely related to potato. The presence of two distinct types of centromeres, coupled with the boom-and-bust cycles of centromeric satellite repeats in Solanum species, suggests that repeat-based centromeres can rapidly evolve from neocentromeres by de novo amplification and insertion of satellite repeats in the CENH3 domains. PMID:22968715
Analysis of SSR information in EST resources of sugarcane

USDA-ARS?s Scientific Manuscript database

Expressed sequence tags ( ESTs) offer the opportunity to exploit single, low -copy, conserved sequence motifs for the development of simple sequence repeats ( SSRs). The total of 262 113 ESTs of sugarcane (Saccharum officinarum) in the database of NCBI were downloaded and analyzed, which resulted in...
Mapping Simple Repeated DNA Sequences in Heterochromatin of Drosophila Melanogaster

PubMed Central

Lohe, A. R.; Hilliker, A. J.; Roberts, P. A.

1993-01-01

Heterochromatin in Drosophila has unusual genetic, cytological and molecular properties. Highly repeated DNA sequences (satellites) are the principal component of heterochromatin. Using probes from cloned satellites, we have constructed a chromosome map of 10 highly repeated, simple DNA sequences in heterochromatin of mitotic chromosomes of Drosophila melanogaster. Despite extensive sequence homology among some satellites, chromosomal locations could be distinguished by stringent in situ hybridizations for each satellite. Only two of the localizations previously determined using gradient-purified bulk satellite probes are correct. Eight new satellite localizations are presented, providing a megabase-level chromosome map of one-quarter of the genome. Five major satellites each exhibit a multichromosome distribution, and five minor satellites hybridize to single sites on the Y chromosome. Satellites closely related in sequence are often located near one another on the same chromosome. About 80% of Y chromosome DNA is composed of nine simple repeated sequences, in particular (AAGAC)(n) (8 Mb), (AAGAG)(n) (7 Mb) and (AATAT)(n) (6 Mb). Similarly, more than 70% of the DNA in chromosome 2 heterochromatin is composed of five simple repeated sequences. We have also generated a high resolution map of satellites in chromosome 2 heterochromatin, using a series of translocation chromosomes whose breakpoints in heterochromatin were ordered by N-banding. Finally, staining and banding patterns of heterochromatic regions are correlated with the locations of specific repeated DNA sequences. The basis for the cytochemical heterogeneity in banding appears to depend exclusively on the different satellite DNAs present in heterochromatin. PMID:8375654
The complete chloroplast genome sequence of Curcuma flaviflora (Curcuma).

PubMed

Zhang, Yan; Deng, Jiabin; Li, Yangyi; Gao, Gang; Ding, Chunbang; Zhang, Li; Zhou, Yonghong; Yang, Ruiwu

2016-09-01

The complete chloroplast (cp) genome of Curcuma flaviflora, a medicinal plant in Southeast Asia, was sequenced. The genome size was 160 478 bp in length, with 36.3% GC content. A pair of inverted repeats (IRs) of 26 946 bp were separated by a large single copy (LSC) of 88 008 bp and a small single copy (SSC) of 18 578 bp, respectively. The cp genome contained 132 annotated genes, including 79 protein coding genes, 30 tRNA genes, and four rRNA genes. And 19 of these genes were duplicated in inverted repeat regions.
Characterization of the complete chloroplast genome of Platycarya strobilacea (Juglandaceae)

Treesearch

Jing Yan; Kai Han; Shuyun Zeng; Peng Zhao; Keith Woeste; Jianfang Li; Zhan-Lin Liu

2017-01-01

The whole chloroplast genome (cp genome) sequence of Platycarya strobilacea was characterized from Illumina pair-end sequencing data. The complete cp genome was 160,994 bp in length and contained a large single copy region (LSC) of 90,225 bp and a small single copy region (SSC) of 18,371 bp, which were separated by a pair of inverted repeat regions...
Molecular architecture of classical cytological landmarks: Centromeres and telomeres

DOE Office of Scientific and Technical Information (OSTI.GOV)

Meyne, J.

1994-11-01

Both the human telomere repeat and the pericentromeric repeat sequence (GGAAT)n were isolated based on evolutionary conservation. Their isolation was based on the premise that chromosomal features as structurally and functionally important as telomeres and centromeres should be highly conserved. Both sequences were isolated by high stringency screening of a human repetitive DNA library with rodent repetitive DNA. The pHuR library (plasmid Human Repeat) used for this project was enriched for repetitive DNA by using a modification of the standard DNA library preparation method. Usually DNA for a library is cut with restriction enzymes, packaged, infected, and the library ismore » screened. A problem with this approach is that many tandem repeats don`t have any (or many) common restriction sites. Therefore, many of the repeat sequences will not be represented in the library because they are not restricted to a viable length for the vector used. To prepare the pHuR library, human DNA was mechanically sheared to a small size. These relatively short DNA fragments were denatured and then renatured to C{sub o}t 50. Theoretically only repetitive DNA sequences should renature under C{sub o}t 50 conditions. The single-stranded regions were digested using S1 nuclease, leaving the double-stranded, renatured repeat sequences.« less
Sequences spanning the leader-repeat junction mediate CRISPR adaptation to phage in Streptococcus thermophilus

PubMed Central

Wei, Yunzhou; Chesne, Megan T.; Terns, Rebecca M.; Terns, Michael P.

2015-01-01

CRISPR-Cas systems are RNA-based immune systems that protect prokaryotes from invaders such as phages and plasmids. In adaptation, the initial phase of the immune response, short foreign DNA fragments are captured and integrated into host CRISPR loci to provide heritable defense against encountered foreign nucleic acids. Each CRISPR contains a ∼100–500 bp leader element that typically includes a transcription promoter, followed by an array of captured ∼35 bp sequences (spacers) sandwiched between copies of an identical ∼35 bp direct repeat sequence. New spacers are added immediately downstream of the leader. Here, we have analyzed adaptation to phage infection in Streptococcus thermophilus at the CRISPR1 locus to identify cis-acting elements essential for the process. We show that the leader and a single repeat of the CRISPR locus are sufficient for adaptation in this system. Moreover, we identified a leader sequence element capable of stimulating adaptation at a dormant repeat. We found that sequences within 10 bp of the site of integration, in both the leader and repeat of the CRISPR, are required for the process. Our results indicate that information at the CRISPR leader-repeat junction is critical for adaptation in this Type II-A system and likely other CRISPR-Cas systems. PMID:25589547
Site directed recombination

DOEpatents

Jurka, Jerzy W.

1997-01-01

Enhanced homologous recombination is obtained by employing a consensus sequence which has been found to be associated with integration of repeat sequences, such as Alu and ID. The consensus sequence or sequence having a single transition mutation determines one site of a double break which allows for high efficiency of integration at the site. By introducing single or double stranded DNA having the consensus sequence flanking region joined to a sequence of interest, one can reproducibly direct integration of the sequence of interest at one or a limited number of sites. In this way, specific sites can be identified and homologous recombination achieved at the site by employing a second flanking sequence associated with a sequence proximal to the 3'-nick.
Diversity and Plasticity of the Intracellular Plant Pathogen and Insect Symbiont “Candidatus Liberibacter asiaticus” as Revealed by Hypervariable Prophage Genes with Intragenic Tandem Repeats ▿ †

PubMed Central

Zhou, Lijuan; Powell, Charles A.; Hoffman, Michele T.; Li, Wenbin; Fan, Guocheng; Liu, Bo; Lin, Hong; Duan, Yongping

2011-01-01

“Candidatus Liberibacter asiaticus” is a psyllid-transmitted, phloem-limited alphaproteobacterium and the most prevalent species of “Ca. Liberibacter” associated with a devastating worldwide citrus disease known as huanglongbing (HLB). Two related and hypervariable genes (hyvI and hyvII) were identified in the prophage regions of the Psy62 “Ca. Liberibacter asiaticus” genome. Sequence analyses of the hyvI and hyvII genes in 35 “Ca. Liberibacter asiaticus” DNA isolates collected globally revealed that the hyvI gene contains up to 12 nearly identical tandem repeats (NITRs, 132 bp) and 4 partial repeats, while hyvII contains up to 2 NITRs and 4 partial repeats and shares homology with hyvI. Frequent deletions or insertions of these repeats within the hyvI and hyvII genes were observed, none of which disrupted the open reading frames. Sequence conservation within the individual repeats but an extensive variation in repeat numbers, rearrangement, and the sequences flanking the repeat region indicate the diversity and plasticity of “Ca. Liberibacter asiaticus” bacterial populations in the world. These differences were found not only in samples of distinct geographical origins but also in samples from a single origin and even from a single “Ca. Liberibacter asiaticus”-infected sample. This is the first evidence of different “Ca. Liberibacter asiaticus” populations coexisting in a single HLB-affected sample. The Florida “Ca. Liberibacter asiaticus” isolates contain both hyvI and hyvII, while all other global “Ca. Liberibacter asiaticus” isolates contain either one or the other. Interclade assignments of the putative HyvI and HyvII proteins from Florida isolates with other global isolates in phylogenetic trees imply multiple “Ca. Liberibacter asiaticus” populations in the world and a multisource introduction of the “Ca. Liberibacter asiaticus” bacterium into Florida. PMID:21784907
The complete chloroplast genome of Gentiana straminea (Gentianaceae), an endemic species to the Sino-Himalayan subregion.

PubMed

Ni, Lianghong; Zhao, Zhili; Xu, Hongxi; Chen, Shilin; Dorje, Gaawe

2016-02-15

Endemic to the Sino-Himalayan subregion, the medicinal alpine plant Gentiana straminea is a threatened species. The genetic and molecular data about it is deficient. Here we report the complete chloroplast (cp) genome sequence of G. straminea, as the first sequenced member of the family Gentianaceae. The cp genome is 148,991bp in length, including a large single copy (LSC) region of 81,240bp, a small single copy (SSC) region of 17,085bp and a pair of inverted repeats (IRs) of 25,333bp. It contains 112 unique genes, including 78 protein-coding genes, 30 tRNAs and 4 rRNAs. The rps16 gene lacks exon2 between trnK-UUU and trnQ-UUG, which is the first rps16 pseudogene found in the nonparasitic plants of Asterids clade. Sequence analysis revealed the presence of 13 forward repeats, 13 palindrome repeats and 39 simple sequence repeats (SSRs). An entire cp genome comparison study of G. straminea and four other species in Gentianales was carried out. Phylogenetic analyses using maximum likelihood (ML) and maximum parsimony (MP) were performed based on 69 protein-coding genes from 36 species of Asterids. The results strongly supported the position of Gentianaceae as one member of the order Gentianales. The complete chloroplast genome sequence will provide intragenic information for its conservation and contribute to research on the genetic and phylogenetic analyses of Gentianales and Asterids. Copyright © 2015 Elsevier B.V. All rights reserved.
Variation, Repetition, and Choice

ERIC Educational Resources Information Center

Abreu-Rodrigues, Josele; Lattal, Kennon A.; dos Santos, Cristiano V.; Matos, Ricardo A.

2005-01-01

Experiment 1 investigated the controlling properties of variability contingencies on choice between repeated and variable responding. Pigeons were exposed to concurrent-chains schedules with two alternatives. In the REPEAT alternative, reinforcers in the terminal link depended on a single sequence of four responses. In the VARY alternative, a…
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum

DOE Office of Scientific and Technical Information (OSTI.GOV)

VanBuren, Robert; Bryant, Doug; Edger, Patrick P.

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly1. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetiummore » genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. As a result, the Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.« less
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum

DOE PAGES

VanBuren, Robert; Bryant, Doug; Edger, Patrick P.; ...

2015-11-11

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly1. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetiummore » genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. As a result, the Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.« less
Competition between B-Z and B-L transitions in a single DNA molecule: Computational studies

NASA Astrophysics Data System (ADS)

Kwon, Ah-Young; Nam, Gi-Moon; Johner, Albert; Kim, Seyong; Hong, Seok-Cheol; Lee, Nam-Kyung

2016-02-01

Under negative torsion, DNA adopts left-handed helical forms, such as Z-DNA and L-DNA. Using the random copolymer model developed for a wormlike chain, we represent a single DNA molecule with structural heterogeneity as a helical chain consisting of monomers which can be characterized by different helical senses and pitches. By Monte Carlo simulation, where we take into account bending and twist fluctuations explicitly, we study sequence dependence of B-Z transitions under torsional stress and tension focusing on the interaction with B-L transitions. We consider core sequences, (GC) n repeats or (TG) n repeats, which can interconvert between the right-handed B form and the left-handed Z form, imbedded in a random sequence, which can convert to left-handed L form with different (tension dependent) helical pitch. We show that Z-DNA formation from the (GC) n sequence is always supported by unwinding torsional stress but Z-DNA formation from the (TG) n sequence, which are more costly to convert but numerous, can be strongly influenced by the quenched disorder in the surrounding random sequence.
Stabilization of perfect and imperfect tandem repeats by single-strand DNA exonucleases

PubMed Central

Feschenko, Vladimir V.; Rajman, Luis A.; Lovett, Susan T.

2003-01-01

Rearrangements between tandemly repeated DNA sequences are a common source of genetic instability. Such rearrangements underlie several human genetic diseases. In many organisms, the mismatch-repair (MMR) system functions to stabilize repeats when the repeat unit is short or when sequence imperfections are present between the repeats. We show here that the action of single-stranded DNA (ssDNA) exonucleases plays an additional, important role in stabilizing tandem repeats, independent of their role in MMR. For perfect repeats of ≈100 bp in Escherichia coli that are not susceptible to MMR, exonuclease (Exo)-I, ExoX, and RecJ exonuclease redundantly inhibit deletion. Our data suggest that >90% of potential deletion events are avoided by the combined action of these three exonucleases. Imperfect tandem repeats, less prone to rearrangements, are stabilized by both the MMR-pathway and ssDNA-specific exonucleases. For 100-bp repeats containing four mispairs, ExoI alone aborts most deletion events, even in the presence of a functional MMR system. By genetic analysis, we show that the inhibitory effect of ssDNA exonucleases on deletion formation is independent of the MutS and UvrD proteins. Exonuclease degradation of DNA displaced during the deletion process may abort slipped misalignment. Exonuclease action is therefore a significant force in genetic stabilization of many forms of repetitive DNA. PMID:12538867
Single Amino Acid Substitutions at Specific Positions of the Heptad Repeat Sequence of Piscidin-1 Yielded Novel Analogs That Show Low Cytotoxicity and In Vitro and In Vivo Antiendotoxin Activity

PubMed Central

Kumar, Amit; Tripathi, Amit Kumar; Kathuria, Manoj; Shree, Sonal; Tripathi, Jitendra Kumar; Purshottam, R. K.; Ramachandran, Ravishankar; Mitra, Kalyan

2016-01-01

Piscidin-1 possesses significant antimicrobial and cytotoxic activities. To recognize the primary amino acid sequence(s) in piscidin-1 that could be important for its biological activity, a long heptad repeat sequence located in the region from amino acids 2 to 19 was identified. To comprehend the possible role of this motif, six analogs of piscidin-1 were designed by selectively replacing a single isoleucine residue at a d (5th) position or at an a (9th or 16th) position with either an alanine or a valine residue. Two more analogs, namely, I5F,F6A-piscidin-1 and V12I-piscidin-1, were designed for investigating the effect of interchanging an alanine residue at a d position with an adjacent phenylalanine residue and replacing a valine residue with an isoleucine residue at another d position of the heptad repeat of piscidin-1, respectively. Single alanine-substituted analogs exhibited significantly reduced cytotoxicity against mammalian cells compared with that of piscidin-1 but appreciably retained the antibacterial and antiendotoxin activities of piscidin-1. All the single valine-substituted piscidin-1 analogs and I5F,F6A-piscidin-1 showed cytotoxicity greater than that of the corresponding alanine-substituted analogs, antibacterial activity marginally greater than or similar to that of the corresponding alanine-substituted analogs, and also antiendotoxin activity superior to that of the corresponding alanine-substituted analogs. Interestingly, among these peptides, V12I-piscidin-1 showed the highest cytotoxicity and antibacterial and antiendotoxin activities. Lipopolysaccharide (12 mg/kg of body weight)-treated mice, further treated with I16A-piscidin-1, the piscidin-1 analog with the highest therapeutic index, at a single dose of 1 or 2 mg/kg of body weight, showed 80 and 100% survival, respectively. Structural and functional characterization of these peptides revealed the basis of their biological activity and demonstrated that nontoxic piscidin-1 analogs with significant antimicrobial and antiendotoxin activities can be designed by incorporating single alanine substitutions in the piscidin-1 heptad repeat. PMID:27067326
The complete chloroplast genome sequence of American bird pepper (Capsicum annuum var. glabriusculum).

PubMed

Zeng, Fan-chun; Gao, Cheng-wen; Gao, Li-zhi

2016-01-01

The complete chloroplast genome sequence of American bird pepper (Capsicum annuum var. glabriusculum) is reported and characterized in this study. The genome size is 156,612 bp, containing a pair of inverted repeats (IRs) of 25,776 bp separated by a large single-copy region of 87,213 bp and a small single-copy region of 17,851 bp. The chloroplast genome harbors 130 known genes, including 89 protein-coding genes, 8 ribosomal RNA genes, and 37 tRNA genes. A total of 18 of these genes are duplicated in the inverted repeat regions, 16 genes contain 1 intron, and 2 genes and one ycf have 2 introns.
Identification and Analysis of Novel Amino-Acid Sequence Repeats in Bacillus anthracis str. Ames Proteome Using Computational Tools

PubMed Central

Hemalatha, G. R.; Rao, D. Satyanarayana; Guruprasad, L.

2007-01-01

We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region comprising less than 55-amino-acid residues that occur more than once in the protein sequence and sometimes present in tandem. A “domain” corresponds to a conserved region with greater than 55-amino-acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure. PMID:17538688
Optimization of sequence alignment for simple sequence repeat regions.

PubMed

Jighly, Abdulqader; Hamwieh, Aladdin; Ogbonnaya, Francis C

2011-07-20

Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSR has been used as a molecular marker because it is easy to detect and is used in a range of applications, including genetic diversity, genome mapping, and marker assisted selection. It is also very mutable because of slipping in the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDELs) mutation frequency to a high ratio - more than other types of molecular markers such as single nucleotide polymorphism (SNPs).SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions, as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units which result in false evolutionary relationships. To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using PERL script with a Tk graphical interface. This program is based on aligning sequences after determining the repeated units first, and the last SSR nucleotides positions. This results in a shifting process according to the inserted repeated unit type.When studying the phylogenic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different linage had been observed after applying the new algorithm. The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different linages. It reduces overlapping during SSR alignment, which results in a more realistic phylogenic relationship.

Development of genome-wide SNP assays for rice

USDA-ARS?s Scientific Manuscript database

With the introduction of new sequencing technologies, single nucleotide polymorphisms (SNPs) are rapidly replacing simple sequence repeats (SSRs) as the DNA marker of choice for applications in plant breeding and genetics because they are more abundant, stable, amenable to automation, efficient, and...
Nucleotide sequences of Dictyostelium discoideum developmentally regulated cDNAs rich in (AAC) imply proteins that contain clusters of asparagine, glutamine, or threonine.

PubMed

Shaw, D R; Richter, H; Giorda, R; Ohmachi, T; Ennis, H L

1989-09-01

A Dictyostelium discoideum repetitive element composed of long repeats of the codon (AAC) is found in developmentally regulated transcripts. The concentration of (AAC) sequences is low in mRNA from dormant spores and growing cells and increases markedly during spore germination and multicellular development. The sequence hybridizes to many different sized Dictyostelium DNA restriction fragments indicating that it is scattered throughout the genome. Four cDNA clones isolated contain (AAC) sequences in the deduced coding region. Interestingly, the (AAC)-rich sequences are present in all three reading frames in the deduced proteins, i.e., AAC (asparagine), ACA (threonine) and CAA (glutamine). Three of the clones contain only one of these in-frame so that the individual proteins carry either asparagine, threonine, or glutamine clusters, not mixtures. However, one clone is both glutamine- and asparagine-rich. The (AAC) portion of the transcripts are reiterated 300 times in the haploid genome while the other portions of the cDNAs represent single copy genes, whose sequences show no similarity other than the (AAC) repeats. The repeated sequence is similar to the opa or M sequence found in Drosophila melanogaster notch and homeo box genes and in fly developmentally regulated transcripts. The transcripts are present on polysomes suggesting that they are translated. Although the function of these repeats is unknown, long amino acid repeats are a characteristic feature of extracellular proteins of lower eukaryotes.
Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing.

PubMed

Zhang, Suhua; Niu, Yong; Bian, Yingnan; Dong, Rixia; Liu, Xiling; Bao, Yun; Jin, Chao; Zheng, Hancheng; Li, Chengtao

2018-05-01

STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since they can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel. MPS performance were evaluated from sequencing reads analysis, concordance study and sensitivity testing. High coverage sequencing data were obtained to determine the constitute ratios and heterozygous balance. No actual inconsistent genotypes were observed between capillary electrophoresis (CE) and MPS, demonstrating the reliability of the panel and the MPS technology. With the sequencing data from the 200 investigated individuals, 346 and 418 alleles were obtained via CE and MPS technologies at the 34 STRs, indicating MPS technology provides higher discrimination than CE detection. The whole study demonstrated that STR genotyping with the custom panel and MPS technology has the potential not only to reveal length and sequence variations but also to satisfy the demands of high throughput and high multiplexing with acceptable sensitivity.
Billions of basepairs of recently expanded, repetitive sequences are eliminated from the somatic genome during copepod development.

PubMed

Sun, Cheng; Wyngaard, Grace; Walton, D Brian; Wichman, Holly A; Mueller, Rachel Lockridge

2014-03-11

Chromatin diminution is the programmed deletion of DNA from presomatic cell or nuclear lineages during development, producing single organisms that contain two different nuclear genomes. Phylogenetically diverse taxa undergo chromatin diminution--some ciliates, nematodes, copepods, and vertebrates. In cyclopoid copepods, chromatin diminution occurs in taxa with massively expanded germline genomes; depending on species, germline genome sizes range from 15 - 75 Gb, 12-74 Gb of which are lost from pre-somatic cell lineages at germline--soma differentiation. This is more than an order of magnitude more sequence than is lost from other taxa. To date, the sequences excised from copepods have not been analyzed using large-scale genomic datasets, and the processes underlying germline genomic gigantism in this clade, as well as the functional significance of chromatin diminution, have remained unknown. Here, we used high-throughput genomic sequencing and qPCR to characterize the germline and somatic genomes of Mesocyclops edax, a freshwater cyclopoid copepod with a germline genome of ~15 Gb and a somatic genome of ~3 Gb. We show that most of the excised DNA consists of repetitive sequences that are either 1) verifiable transposable elements (TEs), or 2) non-simple repeats of likely TE origin. Repeat elements in both genomes are skewed towards younger (i.e. less divergent) elements. Excised DNA is a non-random sample of the germline repeat element landscape; younger elements, and high frequency DNA transposons and LINEs, are disproportionately eliminated from the somatic genome. Our results suggest that germline genome expansion in M. edax reflects explosive repeat element proliferation, and that billions of base pairs of such repeats are deleted from the somatic genome every generation. Thus, we hypothesize that chromatin diminution is a mechanism that controls repeat element load, and that this load can evolve to be divergent between tissue types within single organisms.
Billions of basepairs of recently expanded, repetitive sequences are eliminated from the somatic genome during copepod development

PubMed Central

2014-01-01

Background Chromatin diminution is the programmed deletion of DNA from presomatic cell or nuclear lineages during development, producing single organisms that contain two different nuclear genomes. Phylogenetically diverse taxa undergo chromatin diminution — some ciliates, nematodes, copepods, and vertebrates. In cyclopoid copepods, chromatin diminution occurs in taxa with massively expanded germline genomes; depending on species, germline genome sizes range from 15 – 75 Gb, 12–74 Gb of which are lost from pre-somatic cell lineages at germline – soma differentiation. This is more than an order of magnitude more sequence than is lost from other taxa. To date, the sequences excised from copepods have not been analyzed using large-scale genomic datasets, and the processes underlying germline genomic gigantism in this clade, as well as the functional significance of chromatin diminution, have remained unknown. Results Here, we used high-throughput genomic sequencing and qPCR to characterize the germline and somatic genomes of Mesocyclops edax, a freshwater cyclopoid copepod with a germline genome of ~15 Gb and a somatic genome of ~3 Gb. We show that most of the excised DNA consists of repetitive sequences that are either 1) verifiable transposable elements (TEs), or 2) non-simple repeats of likely TE origin. Repeat elements in both genomes are skewed towards younger (i.e. less divergent) elements. Excised DNA is a non-random sample of the germline repeat element landscape; younger elements, and high frequency DNA transposons and LINEs, are disproportionately eliminated from the somatic genome. Conclusions Our results suggest that germline genome expansion in M. edax reflects explosive repeat element proliferation, and that billions of base pairs of such repeats are deleted from the somatic genome every generation. Thus, we hypothesize that chromatin diminution is a mechanism that controls repeat element load, and that this load can evolve to be divergent between tissue types within single organisms. PMID:24618421
Genetic and DNA sequence analysis of the kanamycin resistance transposon Tn903.

PubMed Central

Grindley, N D; Joyce, C M

1980-01-01

The kanamycin resistance transposon Tn903 consists of a unique region of about 1000 base pairs bounded by a pair of 1050-base-pair inverted repeat sequences. Each repeat contains two Pvu II endonuclease cleavage sites separated by 520 base pairs. We have constructed derivatives of Tn903 in which this 520-base-pair fragment is deleted from one or both repeats. Those derivatives that lack both 520-base-pair fragments cannot transpose, whereas those that lack just one remain transposition proficient. One such transposable derivative, Tn903 delta I, has been selected for further study. We have determined the sequence of the intact inverted repeat. The 18 base pairs at each end are identical and inverted relative to one another, a structure characteristic of insertion sequences. Additional experiments indicate that a single inverted repeat from Tn903 can, in fact, transpose; we propose that this element be called IS903. To correlate the DNA sequence with genetic activities, we have created mutations by inserting a 10-base-pair DNA fragment at several sites within the intact repeat of Tn903 delta 1, and we have examined the effect of such insertions on transposability. The results suggest that IS903 encodes a 307-amino-acid polypeptide (a "transposase") that is absolutely required for transposition of IS903 or Tn903. Images PMID:6261245
Exploring single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes in the jellyfish (Rhopilema esculentum) by transcriptome sequencing.

PubMed

Li, Yunfeng; Zhou, Zunchun; Tian, Meilin; Tian, Yi; Dong, Ying; Li, Shilei; Liu, Weidong; He, Chongbo

2017-08-01

In this study, single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes (DEGs) in the oral parts, gonads, and umbrella parts of the jellyfish Rhopilema esculentum were analyzed by RNA-Seq technology. A total of 76.4 million raw reads and 72.1 million clean reads were generated from deep sequencing. Approximately 119,874 tentative unigenes and 149,239 transcripts were obtained. A total of 1,034,708 SNP markers were detected in the three tissues. For microsatellite mining, 5088 SSRs were identified from the unigene sequences. The most frequent repeat motifs were mononucleotide repeats, which accounted for 61.93%. Transcriptome comparison of the three tissues yielded a total of 8841 DEGs, of which 3560 were up-regulated and 5281 were down-regulated. This study represents the greatest sequencing effort carried out for a jellyfish and provides the first high-throughput transcriptomic resource for jellyfish. Copyright © 2017 Elsevier B.V. All rights reserved.
Novel protein domains and repeats in Drosophila melanogaster: insights into structure, function, and evolution.

PubMed

Ponting, C P; Mott, R; Bork, P; Copley, R R

2001-12-01

Sequence database searching methods such as BLAST, are invaluable for predicting molecular function on the basis of sequence similarities among single regions of proteins. Searches of whole databases however, are not optimized to detect multiple homologous regions within a single polypeptide. Here we have used the prospero algorithm to perform self-comparisons of all predicted Drosophila melanogaster gene products. Predicted repeats, and their homologs from all species, were analyzed further to detect hitherto unappreciated evolutionary relationships. Results included the identification of novel tandem repeats in the human X-linked retinitis pigmentosa type-2 gene product, repeated segments in cystinosin, associated with a defect in cystine transport, and 'nested' homologous domains in dysferlin, whose gene is mutated in limb girdle muscular dystrophy. Novel signaling domain families were found that may regulate the microtubule-based cytoskeleton and ubiquitin-mediated proteolysis, respectively. Two families of glycosyl hydrolases were shown to contain internal repetitions that hint at their evolution via a piecemeal, modular approach. In addition, three examples of fruit fly genes were detected with tandem exons that appear to have arisen via internal duplication. These findings demonstrate how completely sequenced genomes can be exploited to further understand the relationships between molecular structure, function, and evolution.
Highly sensitive detection of individual HEAT and ARM repeats with HHpred and COACH.

PubMed

Kippert, Fred; Gerloff, Dietlind L

2009-09-24

HEAT and ARM repeats occur in a large number of eukaryotic proteins. As these repeats are often highly diverged, the prediction of HEAT or ARM domains can be challenging. Except for the most clear-cut cases, identification at the individual repeat level is indispensable, in particular for determining domain boundaries. However, methods using single sequence queries do not have the sensitivity required to deal with more divergent repeats and, when applied to proteins with known structures, in some cases failed to detect a single repeat. Testing algorithms which use multiple sequence alignments as queries, we found two of them, HHpred and COACH, to detect HEAT and ARM repeats with greatly enhanced sensitivity. Calibration against experimentally determined structures suggests the use of three score classes with increasing confidence in the prediction, and prediction thresholds for each method. When we applied a new protocol using both HHpred and COACH to these structures, it detected 82% of HEAT repeats and 90% of ARM repeats, with the minimum for a given protein of 57% for HEAT repeats and 60% for ARM repeats. Application to bona fide HEAT and ARM proteins or domains indicated that similar numbers can be expected for the full complement of HEAT/ARM proteins. A systematic screen of the Protein Data Bank for false positive hits revealed their number to be low, in particular for ARM repeats. Double false positive hits for a given protein were rare for HEAT and not at all observed for ARM repeats. In combination with fold prediction and consistency checking (multiple sequence alignments, secondary structure prediction, and position analysis), repeat prediction with the new HHpred/COACH protocol dramatically improves prediction in the twilight zone of fold prediction methods, as well as the delineation of HEAT/ARM domain boundaries. A protocol is presented for the identification of individual HEAT or ARM repeats which is straightforward to implement. It provides high sensitivity at a low false positive rate and will therefore greatly enhance the accuracy of predictions of HEAT and ARM domains.
Highly Sensitive Detection of Individual HEAT and ARM Repeats with HHpred and COACH

PubMed Central

Kippert, Fred; Gerloff, Dietlind L.

2009-01-01

Background HEAT and ARM repeats occur in a large number of eukaryotic proteins. As these repeats are often highly diverged, the prediction of HEAT or ARM domains can be challenging. Except for the most clear-cut cases, identification at the individual repeat level is indispensable, in particular for determining domain boundaries. However, methods using single sequence queries do not have the sensitivity required to deal with more divergent repeats and, when applied to proteins with known structures, in some cases failed to detect a single repeat. Methodology and Principal Findings Testing algorithms which use multiple sequence alignments as queries, we found two of them, HHpred and COACH, to detect HEAT and ARM repeats with greatly enhanced sensitivity. Calibration against experimentally determined structures suggests the use of three score classes with increasing confidence in the prediction, and prediction thresholds for each method. When we applied a new protocol using both HHpred and COACH to these structures, it detected 82% of HEAT repeats and 90% of ARM repeats, with the minimum for a given protein of 57% for HEAT repeats and 60% for ARM repeats. Application to bona fide HEAT and ARM proteins or domains indicated that similar numbers can be expected for the full complement of HEAT/ARM proteins. A systematic screen of the Protein Data Bank for false positive hits revealed their number to be low, in particular for ARM repeats. Double false positive hits for a given protein were rare for HEAT and not at all observed for ARM repeats. In combination with fold prediction and consistency checking (multiple sequence alignments, secondary structure prediction, and position analysis), repeat prediction with the new HHpred/COACH protocol dramatically improves prediction in the twilight zone of fold prediction methods, as well as the delineation of HEAT/ARM domain boundaries. Significance A protocol is presented for the identification of individual HEAT or ARM repeats which is straightforward to implement. It provides high sensitivity at a low false positive rate and will therefore greatly enhance the accuracy of predictions of HEAT and ARM domains. PMID:19777061
Chloroplast genome of Aconitum barbatum var. puberulum (Ranunculaceae) derived from CCS reads using the PacBio RS platform.

PubMed

Chen, Xiaochen; Li, Qiushi; Li, Ying; Qian, Jun; Han, Jianping

2015-01-01

The chloroplast genome (cp genome) of Aconitum barbatum var. puberulum was sequenced using the third-generation sequencing platform based on the single-molecule real-time (SMRT) sequencing approach. To our knowledge, this is the first reported complete cp genome of Aconitum, and we anticipate that it will have great value for phylogenetic studies of the Ranunculaceae family. In total, 23,498 CCS reads and 20,685,462 base pairs were generated, the mean read length was 880 bp, and the longest read was 2,261 bp. Genome coverage of 100% was achieved with a mean coverage of 132× and no gaps. The accuracy of the assembled genome is 99.973%; the assembly was validated using Sanger sequencing of six selected genes from the cp genome. The complete cp genome of A. barbatum var. puberulum is 156,749 bp in length, including a large single-copy region of 87,630 bp and a small single-copy region of 16,941 bp separated by two inverted repeats of 26,089 bp. The cp genome contains 130 genes, including 84 protein-coding genes, 34 tRNA genes and eight rRNA genes. Four forward, five inverted and eight tandem repeats were identified. According to the SSR analysis, the longest poly structure is a 20-T repeat. Our results presented in this paper will facilitate the phylogenetic studies and molecular authentication on Aconitum.
Chloroplast genome of Aconitum barbatum var. puberulum (Ranunculaceae) derived from CCS reads using the PacBio RS platform

PubMed Central

Chen, Xiaochen; Li, Qiushi; Li, Ying; Qian, Jun; Han, Jianping

2015-01-01

The chloroplast genome (cp genome) of Aconitum barbatum var. puberulum was sequenced using the third-generation sequencing platform based on the single-molecule real-time (SMRT) sequencing approach. To our knowledge, this is the first reported complete cp genome of Aconitum, and we anticipate that it will have great value for phylogenetic studies of the Ranunculaceae family. In total, 23,498 CCS reads and 20,685,462 base pairs were generated, the mean read length was 880 bp, and the longest read was 2,261 bp. Genome coverage of 100% was achieved with a mean coverage of 132× and no gaps. The accuracy of the assembled genome is 99.973%; the assembly was validated using Sanger sequencing of six selected genes from the cp genome. The complete cp genome of A. barbatum var. puberulum is 156,749 bp in length, including a large single-copy region of 87,630 bp and a small single-copy region of 16,941 bp separated by two inverted repeats of 26,089 bp. The cp genome contains 130 genes, including 84 protein-coding genes, 34 tRNA genes and eight rRNA genes. Four forward, five inverted and eight tandem repeats were identified. According to the SSR analysis, the longest poly structure is a 20-T repeat. Our results presented in this paper will facilitate the phylogenetic studies and molecular authentication on Aconitum. PMID:25705213
A Simple Sequence Repeat- and Single-Nucleotide Polymorphism-Based Genetic Linkage Map of the Brown Planthopper, Nilaparvata lugens

PubMed Central

Jairin, Jirapong; Kobayashi, Tetsuya; Yamagata, Yoshiyuki; Sanada-Morimura, Sachiyo; Mori, Kazuki; Tashiro, Kosuke; Kuhara, Satoru; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Yamamoto, Kimiko; Matsumura, Masaya; Yasui, Hideshi

2013-01-01

In this study, we developed the first genetic linkage map for the major rice insect pest, the brown planthopper (BPH, Nilaparvata lugens). The linkage map was constructed by integrating linkage data from two backcross populations derived from three inbred BPH strains. The consensus map consists of 474 simple sequence repeats, 43 single-nucleotide polymorphisms, and 1 sequence-tagged site, for a total of 518 markers at 472 unique positions in 17 linkage groups. The linkage groups cover 1093.9 cM, with an average distance of 2.3 cM between loci. The average number of marker loci per linkage group was 27.8. The sex-linkage group was identified by exploiting X-linked and Y-specific markers. Our linkage map and the newly developed markers used to create it constitute an essential resource and a useful framework for future genetic analyses in BPH. PMID:23204257
Reducing assembly complexity of microbial genomes with single-molecule sequencing.

PubMed

Koren, Sergey; Harhay, Gregory P; Smith, Timothy P L; Bono, James L; Harhay, Dayna M; Mcvey, Scott D; Radune, Diana; Bergman, Nicholas H; Phillippy, Adam M

2013-01-01

The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem. To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads. Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.
Regulation of cell fate determination by single-repeat R3 MYB transcription factors in Arabidopsis

PubMed Central

Wang, Shucai; Chen, Jin-Gui

2014-01-01

MYB transcription factors regulate multiple aspects of plant growth and development. Among the large family of MYB transcription factors, single-repeat R3 MYBs are characterized by their short sequence (<120 amino acids) consisting largely of the single MYB DNA-binding repeat. In the model plant Arabidopsis, R3 MYBs mediate lateral inhibition during epidermal patterning and are best characterized for their regulatory roles in trichome and root hair development. R3 MYBs act as negative regulators for trichome formation but as positive regulators for root hair development. In this article, we provide a comprehensive review on the role of R3 MYBs in the regulation of cell type specification in the model plant Arabidopsis. PMID:24782874
Regulation of Cell Fate Determination by Single-Repeat R3 MYB Transcription Factors in Arabidopsis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Shucai; Chen, Jay

2014-01-01

MYB transcription factors regulate multiple aspects of plant growth and development. Among the large family of MYB transcription factors, single-repeat R3 MYB are characterized by their short sequence (<120 amino acids) consisting largely of the single MYB DNA-binding repeat. In the model plant Arabidopsis, R3 MYBs mediate lateral inhibition during epidermal patterning and are best characterized for their regulatory roles in trichome and root hair development. R3 MYBs act as negative regulators for trichome formation but as positive regulators for root hair development. In this article, we provide a comprehensive review on the role of R3 MYBs in the regulationmore » of cell type specification in the model plant Arabidopsis.« less
The impact of CRISPR repeat sequence on structures of a Cas6 protein-RNA complex

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Ruiying; Zheng, Han; Preamplume, Gan

The repeat-associated mysterious proteins (RAMPs) comprise the most abundant family of proteins involved in prokaryotic immunity against invading genetic elements conferred by the clustered regularly interspaced short palindromic repeat (CRISPR) system. Cas6 is one of the first characterized RAMP proteins and is a key enzyme required for CRISPR RNA maturation. Despite a strong structural homology with other RAMP proteins that bind hairpin RNA, Cas6 distinctly recognizes single-stranded RNA. Previous structural and biochemical studies show that Cas6 captures the 5' end while cleaving the 3' end of the CRISPR RNA. Here, we describe three structures and complementary biochemical analysis of amore » noncatalytic Cas6 homolog from Pyrococcus horikoshii bound to CRISPR repeat RNA of different sequences. Our study confirms the specificity of the Cas6 protein for single-stranded RNA and further reveals the importance of the bases at Positions 5-7 in Cas6-RNA interactions. Substitutions of these bases result in structural changes in the protein-RNA complex including its oligomerization state.« less
Transcription arrest by a G quadruplex forming-trinucleotide repeat sequence from the human c-myb gene.

PubMed

Broxson, Christopher; Beckett, Joshua; Tornaletti, Silvia

2011-05-17

Non canonical DNA structures correspond to genomic regions particularly susceptible to genetic instability. The transcription process facilitates formation of these structures and plays a major role in generating the instability associated with these genomic sites. However, little is known about how non canonical structures are processed when encountered by an elongating RNA polymerase. Here we have studied the behavior of T7 RNA polymerase (T7RNAP) when encountering a G quadruplex forming-(GGA)(4) repeat located in the human c-myb proto-oncogene. To make direct correlations between formation of the structure and effects on transcription, we have taken advantage of the ability of the T7 polymerase to transcribe single-stranded substrates and of G4 DNA to form in single-stranded G-rich sequences in the presence of potassium ions. Under physiological KCl concentrations, we found that T7 RNAP transcription was arrested at two sites that mapped to the c-myb (GGA)(4) repeat sequence. The extent of arrest did not change with time, indicating that the c-myb repeat represented an absolute block and not a transient pause to T7 RNAP. Consistent with G4 DNA formation, arrest was not observed in the absence of KCl or in the presence of LiCl. Furthermore, mutations in the c-myb (GGA)(4) repeat, expected to prevent transition to G4, also eliminated the transcription block. We show T7 RNAP arrest at the c-myb repeat in double-stranded DNA under conditions mimicking the cellular concentration of biomolecules and potassium ions, suggesting that the G4 structure formed in the c-myb repeat may represent a transcription roadblock in vivo. Our results support a mechanism of transcription-coupled DNA repair initiated by arrest of transcription at G4 structures.
Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies.

PubMed

Zeng, Lu; Kortschak, R Daniel; Raison, Joy M; Bertozzi, Terry; Adelson, David L

2018-01-01

Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package.
Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies

PubMed Central

Zeng, Lu; Kortschak, R. Daniel; Raison, Joy M.

2018-01-01

Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package. PMID:29538441

DNA sequences and composition from 12 BAC clones-derived MUSB SSR markers mapped to cotton (Gossypium Hirsutum L. x G. Barbadense L.)chromosomes 11 and 21

USDA-ARS?s Scientific Manuscript database

To discover resistance (R) and/or pathogen-induced (PR) genes involved in disease response, 12 bacterial artificial chromosome (BAC) clones from cv. Acala Maxxa (G. hirsutum) were sequenced at the Clemson University, Genomics Institute, Clemson, SC. These BACs derived MUSB single sequence repeat (SS...
The complete chloroplast genome of Capsicum annuum var. glabriusculum using Illumina sequencing.

PubMed

Raveendar, Sebastin; Na, Young-Wang; Lee, Jung-Ro; Shim, Donghwan; Ma, Kyung-Ho; Lee, Sok-Young; Chung, Jong-Wook

2015-07-20

Chloroplast (cp) genome sequences provide a valuable source for DNA barcoding. Molecular phylogenetic studies have concentrated on DNA sequencing of conserved gene loci. However, this approach is time consuming and more difficult to implement when gene organization differs among species. Here we report the complete re-sequencing of the cp genome of Capsicum pepper (Capsicum annuum var. glabriusculum) using the Illumina platform. The total length of the cp genome is 156,817 bp with a 37.7% overall GC content. A pair of inverted repeats (IRs) of 50,284 bp were separated by a small single copy (SSC; 18,948 bp) and a large single copy (LSC; 87,446 bp). The number of cp genes in C. annuum var. glabriusculum is the same as that in other Capsicum species. Variations in the lengths of LSC; SSC and IR regions were the main contributors to the size variation in the cp genome of this species. A total of 125 simple sequence repeat (SSR) and 48 insertions or deletions variants were found by sequence alignment of Capsicum cp genome. These findings provide a foundation for further investigation of cp genome evolution in Capsicum and other higher plants.
DNA CTG triplet repeats involved in dynamic mutations of neurologically related gene sequences form stable duplexes

NASA Technical Reports Server (NTRS)

Smith, G. K.; Jie, J.; Fox, G. E.; Gao, X.

1995-01-01

DNA triplet repeats, 5'-d(CTG)n and 5'-d(CAG)n, are present in genes which have been implicated in several neurodegenerative disorders. To investigate possible stable structures formed by these repeating sequences, we have examined d(CTG)n, d(CAG)n and d(CTG).d(CAG)n (n = 2 and 3) using NMR and UV optical spectroscopy. These studies reveal that single stranded (CTG)n (n > 2) forms stable, antiparallel helical duplexes, while the single stranded (CAG)n requires at least three repeating units to form a duplex. NMR and UV melting experiments show that the Tm increases in the order of [(CAG)3]2 < [(CTG)3]2 << (CAG)3.(CTG)3. The (CTG)3 duplex is stable and exhibits similar NMR spectra in solutions containing 0.1-4 M NaCl and at a pH range from 4.6 to 8.8. The (CTG)3 duplex, which contains multiple-T.T mismatches, displays many NMR spectral characteristics similar to those of B-form DNA. However, unique NOE and 1H-31P coupling patterns associated with the repetitive T.T mismatches in the CTG repeats are discerned. These results, in conjunction with recent in vitro studies suggest that longer CTG repeats may form hairpin structures, which can potentially cause interruption in replication, leading to dynamic expansion or deletion of triplet repeats.
A Dynamic Tandem Repeat in Monocotyledons Inferred from a Comparative Analysis of Chloroplast Genomes in Melanthiaceae.

PubMed

Do, Hoang Dang Khoa; Kim, Joo-Hwan

2017-01-01

Chloroplast genomes (cpDNA) are highly valuable resources for evolutionary studies of angiosperms, since they are highly conserved, are small in size, and play critical roles in plants. Slipped-strand mispairing (SSM) was assumed to be a mechanism for generating repeat units in cpDNA. However, research on the employment of different small repeated sequences through SSM events, which may induce the accumulation of distinct types of repeats within the same region in cpDNA, has not been documented. Here, we sequenced two chloroplast genomes from the endemic species Heloniopsis tubiflora (Korea) and Xerophyllum tenax (USA) to cover the gap between molecular data and explore "hot spots" for genomic events in Melanthiaceae. Comparative analysis of 23 complete cpDNA sequences revealed that there were different stages of deletion in the rps16 region across the Melanthiaceae. Based on the partial or complete loss of rps16 gene in cpDNA, we have firstly reported potential molecular markers for recognizing two sections ( Veratrum and Fuscoveratrum ) of Veratrum . Melathiaceae exhibits a significant change in the junction between large single copy and inverted repeat regions, ranging from trnH_GUG to a part of rps3 . Our results show an accumulation of tandem repeats in the rpl23-ycf2 regions of cpDNAs. Small conserved sequences exist and flank tandem repeats in further observation of this region across most of the examined taxa of Liliales. Therefore, we propose three scenarios in which different small repeated sequences were used during SSM events to generate newly distinct types of repeats. Occasionally, prior to the SSM process, point mutation event and double strand break repair occurred and induced the formation of initial repeat units which are indispensable in the SSM process. SSM may have likely occurred more frequently for short repeats than for long repeat sequences in tribe Parideae (Melanthiaceae, Liliales). Collectively, these findings add new evidence of dynamic results from SSM in chloroplast genomes which can be useful for further evolutionary studies in angiosperms. Additionally, genomics events in cpDNA are potential resources for mining molecular markers in Liliales.
Direct repeat sequences are essential for function of the cis-acting locus of transfer (clt) of Streptomyces phaeochromogenes plasmid pJV1.

PubMed

Franco, Bernardo; González-Cerón, Gabriela; Servín-González, Luis

2003-11-01

The functionality of direct and inverted repeat sequences inside the cis acting locus of transfer (clt) of the Streptomyces plasmid pJV1 was determined by testing the effect of different deletions on plasmid transfer. The results show that the single most important element for pJV1 clt function is a series of evenly spaced 9 bp long direct repeats which match the consensus CCGCACA(C/G)(C/G), since their deletion caused a dramatic reduction in plasmid transfer. The presence of these repeats in the absence of any other clt sequences allowed plasmid transfer to occur at a frequency that was at least two orders of magnitude higher than that obtained in the complete absence of clt. A database search revealed regions with a similar organization, and in the same position, in Streptomyces plasmids pSN22 and pSLS, which have transfer proteins homologous to those of pJV1.
Short intronic repeat sequences facilitate circular RNA production

PubMed Central

Liang, Dongming

2014-01-01

Recent deep sequencing studies have revealed thousands of circular noncoding RNAs generated from protein-coding genes. These RNAs are produced when the precursor messenger RNA (pre-mRNA) splicing machinery “backsplices” and covalently joins, for example, the two ends of a single exon. However, the mechanism by which the spliceosome selects only certain exons to circularize is largely unknown. Using extensive mutagenesis of expression plasmids, we show that miniature introns containing the splice sites along with short (∼30- to 40-nucleotide) inverted repeats, such as Alu elements, are sufficient to allow the intervening exons to circularize in cells. The intronic repeats must base-pair to one another, thereby bringing the splice sites into close proximity to each other. More than simple thermodynamics is clearly at play, however, as not all repeats support circularization, and increasing the stability of the hairpin between the repeats can sometimes inhibit circular RNA biogenesis. The intronic repeats and exonic sequences must collaborate with one another, and a functional 3′ end processing signal is required, suggesting that circularization may occur post-transcriptionally. These results suggest detailed and generalizable models that explain how the splicing machinery determines whether to produce a circular noncoding RNA or a linear mRNA. PMID:25281217
Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

PubMed Central

2011-01-01

Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii-the diploid source of the wheat D genome, and with a genome size of 4.02 Gb, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 was sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml). PMID:21266061
Electronic Transport in Single-Stranded DNA Molecule Related to Huntington's Disease

NASA Astrophysics Data System (ADS)

Sarmento, R. G.; Silva, R. N. O.; Madeira, M. P.; Frazão, N. F.; Sousa, J. O.; Macedo-Filho, A.

2018-04-01

We report a numerical analysis of the electronic transport in single chain DNA molecule consisting of 182 nucleotides. The DNA chains studied were extracted from a segment of the human chromosome 4p16.3, which were modified by expansion of CAG (cytosine-adenine-guanine) triplet repeats to mimics Huntington's disease. The mutated DNA chains were connected between two platinum electrodes to analyze the relationship between charge propagation in the molecule and Huntington's disease. The computations were performed within a tight-binding model, together with a transfer matrix technique, to investigate the current-voltage (I-V) of 23 types of DNA sequence and compare them with the distributions of the related CAG repeat numbers with the disease. All DNA sequences studied have a characteristic behavior of a semiconductor. In addition, the results showed a direct correlation between the current-voltage curves and the distributions of the CAG repeat numbers, suggesting possible applications in the development of DNA-based biosensors for molecular diagnostics.
Modular probes for enriching and detecting complex nucleic acid sequences

NASA Astrophysics Data System (ADS)

Wang, Juexiao Sherry; Yan, Yan Helen; Zhang, David Yu

2017-12-01

Complex DNA sequences are difficult to detect and profile, but are important contributors to human health and disease. Existing hybridization probes lack the capability to selectively bind and enrich hypervariable, long or repetitive sequences. Here, we present a generalized strategy for constructing modular hybridization probes (M-Probes) that overcomes these challenges. We demonstrate that M-Probes can tolerate sequence variations of up to 7 nt at prescribed positions while maintaining single nucleotide sensitivity at other positions. M-Probes are also shown to be capable of sequence-selectively binding a continuous DNA sequence of more than 500 nt. Furthermore, we show that M-Probes can detect genes with triplet repeats exceeding a programmed threshold. As a demonstration of this technology, we have developed a hybrid capture method to determine the exact triplet repeat expansion number in the Huntington's gene of genomic DNA using quantitative PCR.
Evolution and selection of Rhg1, a copy-number variant nematode-resistance locus

PubMed Central

Lee, Tong Geon; Kumar, Indrajit; Diers, Brian W; Hudson, Matthew E

2015-01-01

The soybean cyst nematode (SCN) resistance locus Rhg1 is a tandem repeat of a 31.2 kb unit of the soybean genome. Each 31.2-kb unit contains four genes. One allele of Rhg1, Rhg1-b, is responsible for protecting most US soybean production from SCN. Whole-genome sequencing was performed, and PCR assays were developed to investigate allelic variation in sequence and copy number of the Rhg1 locus across a population of soybean germplasm accessions. Four distinct sequences of the 31.2-kb repeat unit were identified, and some Rhg1 alleles carry up to three different types of repeat unit. The total number of copies of the repeat varies from 1 to 10 per haploid genome. Both copy number and sequence of the repeat correlate with the resistance phenotype, and the Rhg1 locus shows strong signatures of selection. Significant linkage disequilibrium in the genome outside the boundaries of the repeat allowed the Rhg1 genotype to be inferred using high-density single nucleotide polymorphism genotyping of 15 996 accessions. Over 860 germplasm accessions were found likely to possess Rhg1 alleles. The regions surrounding the repeat show indications of non-neutral evolution and high genetic variability in populations from different geographic locations, but without evidence of fixation of the resistant genotype. A compelling explanation of these results is that balancing selection is in operation at Rhg1. PMID:25735447
Targeting of Repeated Sequences Unique to a Gene Results in Significant Increases in Antisense Oligonucleotide Potency

PubMed Central

Vickers, Timothy A.; Freier, Susan M.; Bui, Huynh-Hoa; Watt, Andrew; Crooke, Stanley T.

2014-01-01

A new strategy for identifying potent RNase H-dependent antisense oligonucleotides (ASOs) is presented. Our analysis of the human transcriptome revealed that a significant proportion of genes contain unique repeated sequences of 16 or more nucleotides in length. Activities of ASOs targeting these repeated sites in several representative genes were compared to those of ASOs targeting unique single sites in the same transcript. Antisense activity at repeated sites was also evaluated in a highly controlled minigene system. Targeting both native and minigene repeat sites resulted in significant increases in potency as compared to targeting of non-repeated sites. The increased potency at these sites is a result of increased frequency of ASO/RNA interactions which, in turn, increases the probability of a productive interaction between the ASO/RNA heteroduplex and human RNase H1 in the cell. These results suggest a new, highly efficient strategy for rapid identification of highly potent ASOs. PMID:25334092
The complete chloroplast genome sequence of Epipremnum aureum and its comparative analysis among eight Araceae species

PubMed Central

Han, Limin; Chen, Chen; Wang, Zhezhi

2018-01-01

Epipremnum aureum is an important foliage plant in the Araceae family. In this study, we have sequenced the complete chloroplast genome of E. aureum by using Illumina Hiseq sequencing platforms. This genome is a double-stranded circular DNA sequence of 164,831 bp that contains 35.8% GC. The two inverted repeats (IRa and IRb; 26,606 bp) are spaced by a small single-copy region (22,868 bp) and a large single-copy region (88,751 bp). The chloroplast genome has 131 (113 unique) functional genes, including 86 (79 unique) protein-coding genes, 37 (30 unique) tRNA genes, and eight (four unique) rRNA genes. Tandem repeats comprise the majority of the 43 long repetitive sequences. In addition, 111 simple sequence repeats are present, with mononucleotides being the most common type and di- and tetranucleotides being infrequent events. Positive selection pressure on rps12 in the E. aureum chloroplast has been demonstrated via synonymous and nonsynonymous substitution rates and selection pressure sites analyses. Ycf15 and infA are pseudogenes in this species. We constructed a Maximum Likelihood phylogenetic tree based on the complete chloroplast genomes of 38 species from 13 families. Those results strongly indicated that E. aureum is positioned as the sister of Colocasia esculenta within the Araceae family. This work may provide information for further study of the molecular phylogenetic relationships within Araceae, as well as molecular markers and breeding novel varieties by chloroplast genetic-transformation of E. aureum in particular. PMID:29529038
Construction of a SNP and SSR linkage map in autotetraploid blueberry using genotyping by sequencing

USDA-ARS?s Scientific Manuscript database

A mapping population developed from a cross between two key highbush blueberry cultivars, Draper × Jewel (Vaccinium corymbosum), segregating for a number of important phenotypic traits, has been utilized to produce a genetic linkage map. Data on 233 single sequence repeat (SSR) markers and 1794 sing...
Discovery and mapping of a new expressed sequence tag-single nucleotide polymorphism and simple sequence repeat panel for large-scale genetic studies and breeding of Theobroma cacao L.

PubMed Central

Allegre, Mathilde; Argout, Xavier; Boccara, Michel; Fouet, Olivier; Roguet, Yolande; Bérard, Aurélie; Thévenin, Jean Marc; Chauveau, Aurélie; Rivallan, Ronan; Clement, Didier; Courtois, Brigitte; Gramacho, Karina; Boland-Augé, Anne; Tahi, Mathias; Umaharan, Pathmanathan; Brunel, Dominique; Lanaud, Claire

2012-01-01

Theobroma cacao is an economically important tree of several tropical countries. Its genetic improvement is essential to provide protection against major diseases and improve chocolate quality. We discovered and mapped new expressed sequence tag-single nucleotide polymorphism (EST-SNP) and simple sequence repeat (SSR) markers and constructed a high-density genetic map. By screening 149 650 ESTs, 5246 SNPs were detected in silico, of which 1536 corresponded to genes with a putative function, while 851 had a clear polymorphic pattern across a collection of genetic resources. In addition, 409 new SSR markers were detected on the Criollo genome. Lastly, 681 new EST-SNPs and 163 new SSRs were added to the pre-existing 418 co-dominant markers to construct a large consensus genetic map. This high-density map and the set of new genetic markers identified in this study are a milestone in cocoa genomics and for marker-assisted breeding. The data are available at http://tropgenedb.cirad.fr. PMID:22210604
Complete Sequence and Comparative Analysis of the Chloroplast Genome of Coconut Palm (Cocos nucifera)

PubMed Central

Huang, Ya-Yi; Matzke, Antonius J. M.; Matzke, Marjori

2013-01-01

Coconut, a member of the palm family (Arecaceae), is one of the most economically important trees used by mankind. Despite its diverse morphology, coconut is recognized taxonomically as only a single species (Cocos nucifera L.). There are two major coconut varieties, tall and dwarf, the latter of which displays traits resulting from selection by humans. We report here the complete chloroplast (cp) genome of a dwarf coconut plant, and describe the gene content and organization, inverted repeat fluctuations, repeated sequence structure, and occurrence of RNA editing. Phylogenetic relationships of monocots were inferred based on 47 chloroplast protein-coding genes. Potential nodes for events of gene duplication and pseudogenization related to inverted repeat fluctuation were mapped onto the tree using parsimony criteria. We compare our findings with those from other palm species for which complete cp genome sequences are available. PMID:24023703
Complete sequence and comparative analysis of the chloroplast genome of coconut palm (Cocos nucifera).

PubMed

Huang, Ya-Yi; Matzke, Antonius J M; Matzke, Marjori

2013-01-01

Coconut, a member of the palm family (Arecaceae), is one of the most economically important trees used by mankind. Despite its diverse morphology, coconut is recognized taxonomically as only a single species (Cocos nucifera L.). There are two major coconut varieties, tall and dwarf, the latter of which displays traits resulting from selection by humans. We report here the complete chloroplast (cp) genome of a dwarf coconut plant, and describe the gene content and organization, inverted repeat fluctuations, repeated sequence structure, and occurrence of RNA editing. Phylogenetic relationships of monocots were inferred based on 47 chloroplast protein-coding genes. Potential nodes for events of gene duplication and pseudogenization related to inverted repeat fluctuation were mapped onto the tree using parsimony criteria. We compare our findings with those from other palm species for which complete cp genome sequences are available.
Discovery, Validation and Characterization of 1039 Cattle Single Nucleotide Polymorphisms

USDA-ARS?s Scientific Manuscript database

We identified approximately 13000 putative single nucleotide polymorphisms (SNPs) by comparison of repeat-masked BAC-end sequences from the cattle RPCI-42 BAC library with whole-genome shotgun contigs of cattle genome assembly Btau 1.0. Genotyping of a subset of these SNPs was performed on a panel ...
Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

PubMed Central

Macas, Jiří; Neumann, Pavel; Navrátilová, Alice

2007-01-01

Background Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). Results Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. These data provide a starting point for further investigations of legume plant genomes based on their global comparative analysis and for the development of more sophisticated approaches for data mining. PMID:18031571
Complete Chloroplast Genome of Pinus massoniana (Pinaceae): Gene Rearrangements, Loss of ndh Genes, and Short Inverted Repeats Contraction, Expansion.

PubMed

Ni, ZhouXian; Ye, YouJu; Bai, Tiandao; Xu, Meng; Xu, Li-An

2017-09-11

The chloroplast genome (CPG) of Pinus massoniana belonging to the genus Pinus (Pinaceae), which is a primary source of turpentine, was sequenced and analyzed in terms of gene rearrangements, ndh genes loss, and the contraction and expansion of short inverted repeats (IRs). P. massoniana CPG has a typical quadripartite structure that includes large single copy (LSC) (65,563 bp), small single copy (SSC) (53,230 bp) and two IRs (IRa and IRb, 485 bp). The 108 unique genes were identified, including 73 protein-coding genes, 31 tRNAs, and 4 rRNAs. Most of the 81 simple sequence repeats (SSRs) identified in CPG were mononucleotides motifs of A/T types and located in non-coding regions. Comparisons with related species revealed an inversion (21,556 bp) in the LSC region; P. massoniana CPG lacks all 11 intact ndh genes (four ndh genes lost completely; the five remained truncated as pseudogenes; and the other two ndh genes remain as pseudogenes because of short insertions or deletions). A pair of short IRs was found instead of large IRs, and size variations among pine species were observed, which resulted from short insertions or deletions and non-synchronized variations between "IRa" and "IRb". The results of phylogenetic analyses based on whole CPG sequences of 16 conifers indicated that the whole CPG sequences could be used as a powerful tool in phylogenetic analyses.
Complete Chloroplast Genome Sequences of Important Oilseed Crop Sesamum indicum L

PubMed Central

Yi, Dong-Keun; Kim, Ki-Joong

2012-01-01

Sesamum indicum is an important crop plant species for yielding oil. The complete chloroplast (cp) genome of S. indicum (GenBank acc no. JN637766) is 153,324 bp in length, and has a pair of inverted repeat (IR) regions consisting of 25,141 bp each. The lengths of the large single copy (LSC) and the small single copy (SSC) regions are 85,170 bp and 17,872 bp, respectively. Comparative cp DNA sequence analyses of S. indicum with other cp genomes reveal that the genome structure, gene order, gene and intron contents, AT contents, codon usage, and transcription units are similar to the typical angiosperm cp genomes. Nucleotide diversity of the IR region between Sesamum and three other cp genomes is much lower than that of the LSC and SSC regions in both the coding region and noncoding region. As a summary, the regional constraints strongly affect the sequence evolution of the cp genomes, while the functional constraints weakly affect the sequence evolution of cp genomes. Five short inversions associated with short palindromic sequences that form step-loop structures were observed in the chloroplast genome of S. indicum. Twenty-eight different simple sequence repeat loci have been detected in the chloroplast genome of S. indicum. Almost all of the SSR loci were composed of A or T, so this may also contribute to the A-T richness of the cp genome of S. indicum. Seven large repeated loci in the chloroplast genome of S. indicum were also identified and these loci are useful to developing S. indicum-specific cp genome vectors. The complete cp DNA sequences of S. indicum reported in this paper are prerequisite to modifying this important oilseed crop by cp genetic engineering techniques. PMID:22606240

Typing of artiodactyl MHC-DRB genes with the help of intronic simple repeated DNA sequences.

PubMed

Schwaiger, F W; Buitkamp, J; Weyers, E; Epplen, J T

1993-02-01

An efficient oligonucleotide typing method for the highly polymorphic MHC-DRB genes is described for artiodactyls like cattle, sheep and goat. By means of the polymerase chain reaction, the second exon of MHC-DRB is amplified as well as part of the adjacent intron containing a mixed simple repeat sequence. Using this primer combination we were able to amplify the MHC-DRB exons 2 and adjacent introns from all of the investigated 10 species of the family of Bovidae and giraffes. Therefore, the DRB genes of novel artiodactyl species can also be readily studied. Oligonucleotide probes specific for the polymorphisms of ungulate DRB genes are used with which sequences differing in at least one single base can be distinguished. Exonic polymorphism was found to be correlated with the allele lengths and the patterns of the repeat structures. Hence oligonucleotide probes specific for different simple repeats and polymorphic positions serve also for typing across species barriers. The strict correlation of sequence length and exonic polymorphism permits a preselection of specific oligonucleotides for hybridization. Thus more than 20 alleles can already be differentiated from each of the three species.
Construction of a small Mus musculus repetitive DNA library: identification of a new satellite sequence in Mus musculus.

PubMed Central

Pietras, D F; Bennett, K L; Siracusa, L D; Woodworth-Gutai, M; Chapman, V M; Gross, K W; Kane-Haas, C; Hastie, N D

1983-01-01

We report the construction of a small library of recombinant plasmids containing Mus musculus repetitive DNA inserts. The repetitive cloned fraction was derived from denatured genomic DNA by reassociation to a Cot value at which repetitive, but not unique, sequences have reannealed followed by exhaustive S1 nuclease treatment to degrade single stranded DNA. Initial characterizations of this library by colony filter hybridizations have led to the identification of a previously undetected M. musculus minor satellite as well as to clones containing M. musculus major satellite sequences. This new satellite is repeated 10-20 times less than the major satellite in the M. musculus genome. It has a repeat length of 130 nucleotides compared with the M. musculus major satellite with a repeat length of 234 nucleotides. Sequence analysis of the minor satellite has shown that it has a 29 base pair region with extensive homology to one of the major satellite repeating subunits. We also show by in situ hybridization that this minor satellite sequence is located at the centromeres and possibly the arms of at least half the M musculus chromosomes. Sequences related to the minor satellite have been found in the DNA of a related Mus species, Mus spretus, and may represent the major satellite of that species. Images PMID:6314268
The complete chloroplast DNA sequence of Eleutherococcus senticosus (Araliaceae); comparative evolutionary analyses with other three asterids.

PubMed

Yi, Dong-Keun; Lee, Hae-Lim; Sun, Byung-Yun; Chung, Mi Yoon; Kim, Ki-Joong

2012-05-01

This study reports the complete chloroplast (cp) DNA sequence of Eleutherococcus senticosus (GenBank: JN 637765), an endangered endemic species. The genome is 156,768 bp in length, and contains a pair of inverted repeat (IR) regions of 25,930 bp each, a large single copy (LSC) region of 86,755 bp and a small single copy (SSC) region of 18,153 bp. The structural organization, gene and intron contents, gene order, AT content, codon usage, and transcription units of the E. senticosus chloroplast genome are similar to that of typical land plant cp DNA. We aligned and analyzed the sequences of 86 coding genes, 19 introns and 113 intergenic spacers (IGS) in three different taxonomic hierarchies; Eleutherococcus vs. Panax, Eleutherococcus vs. Daucus, and Eleutherococcus vs. Nicotiana. The distribution of indels, the number of polymorphic sites and nucleotide diversity indicate that positional constraint is more important than functional constraint for the evolution of cp genome sequences in Asterids. For example, the intron sequences in the LSC region exhibited base substitution rates 5-11-times higher than that of the IR regions, while the intron sequences in the SSC region evolved 7-14-times faster than those in the IR region. Furthermore, the Ka/Ks ratio of the gene coding sequences supports a stronger evolutionary constraint in the IR region than in the LSC or SSC regions. Therefore, our data suggest that selective sweeps by base collection mechanisms more frequently eliminate polymorphisms in the IR region than in other regions. Chloroplast genome regions that have high levels of base substitutions also show higher incidences of indels. Thirty-five simple sequence repeat (SSR) loci were identified in the Eleutherococcus chloroplast genome. Of these, 27 are homopolymers, while six are di-polymers and two are tri-polymers. In addition to the SSR loci, we also identified 18 medium size repeat units ranging from 22 to 79 bp, 11 of which are distributed in the IGS or intron regions. These medium size repeats may contribute to developing a cp genome-specific gene introduction vector because the region may use for specific recombination sites.
Isolation and mapping of telomeric pentanucleotide (TAACC)n repeats of the Pacific whiteleg shrimp, Penaeus vannamei, using fluorescence in situ hybridization.

PubMed

Alcivar-Warren, Acacia; Meehan-Meola, Dawn; Wang, Yongping; Guo, Ximing; Zhou, Linghua; Xiang, Jianhai; Moss, Shaun; Arce, Steve; Warren, William; Xu, Zhenkang; Bell, Kireina

2006-01-01

To develop genetic and physical maps for shrimp, accurate information on the actual number of chromosomes and a large number of genetic markers is needed. Previous reports have shown two different chromosome numbers for the Pacific whiteleg shrimp, Penaeus vannamei, the most important penaeid shrimp species cultured in the Western hemisphere. Preliminary results obtained by direct sequencing of clones from a Sau3A-digested genomic library of P. vannamei ovary identified a large number of (TAACC/GGTTA)-containing SSRs. The objectives of this study were to (1) examine the frequency of (TAACC)n repeats in 662 P. vannamei genomic clones that were directly sequenced, and perform homology searches of these clones, (2) confirm the number of chromosomes in testis of P. vannamei, and (3) localize the TAACC repeats in P. vannamei chromosome spreads using fluorescence in situ hybridization (FISH). Results for objective 1 showed that 395 out of the 662 clones sequenced contained single or multiple SSRs with three or more repeat motifs, 199 of which contained variable tandem repeats of the pentanucleotide (TAACC/GGTTA)n, with 3 to 14 copies per sequence. The frequency of (TAACC)n repeats in P. vannamei is 4.68 kb for SSRs with five or more repeat motifs. Sequence comparisons using the BLASTN nonredundant and expressed sequence tag (EST) databases indicated that most of the TAACC-containing clones were similar to either the core pentanucleotide repeat in PVPENTREP locus (GenBank accession no. X82619) or portions of 28S rRNA. Transposable elements (transposase for Tn1000 and reverse transcriptase family members), hypothetical or unnamed protein products, and genes of known function such as 18S and 28S rRNAs, heat shock protein 70, and thrombospondin were identified in non-TAACC-containing clones. For objective 2, the meiotic chromosome number of P. vannamei was confirmed as N = 44. For objective 3, four FISH probes (P1 to P4) containing different numbers of TAACC repeats produced positive signals on telomeres of P. vannamei chromosomes. A few chromosomes had positive signals interstitially. Probe signal strength and chromosome coverage differed in the general order of P1>P2>P3>P4, which correlated with the length of TAACC repeats within the probes: 83, 66, 35, and 30 bp, respectively, suggesting that the TAACC repeats, and not the flanking sequences, produced the TAACC signals at chromosome ends and TAACC is likely the telomere sequence for P. vannamei.
Short-Sequence DNA Repeats in Prokaryotic Genomes

PubMed Central

van Belkum, Alex; Scherer, Stewart; van Alphen, Loek; Verbrugh, Henri

1998-01-01

Short-sequence DNA repeat (SSR) loci can be identified in all eukaryotic and many prokaryotic genomes. These loci harbor short or long stretches of repeated nucleotide sequence motifs. DNA sequence motifs in a single locus can be identical and/or heterogeneous. SSRs are encountered in many different branches of the prokaryote kingdom. They are found in genes encoding products as diverse as microbial surface components recognizing adhesive matrix molecules and specific bacterial virulence factors such as lipopolysaccharide-modifying enzymes or adhesins. SSRs enable genetic and consequently phenotypic flexibility. SSRs function at various levels of gene expression regulation. Variations in the number of repeat units per locus or changes in the nature of the individual repeat sequences may result from recombination processes or polymerase inadequacy such as slipped-strand mispairing (SSM), either alone or in combination with DNA repair deficiencies. These rather complex phenomena can occur with relative ease, with SSM approaching a frequency of 10−4 per bacterial cell division and allowing high-frequency genetic switching. Bacteria use this random strategy to adapt their genetic repertoire in response to selective environmental pressure. SSR-mediated variation has important implications for bacterial pathogenesis and evolutionary fitness. Molecular analysis of changes in SSRs allows epidemiological studies on the spread of pathogenic bacteria. The occurrence, evolution and function of SSRs, and the molecular methods used to analyze them are discussed in the context of responsiveness to environmental factors, bacterial pathogenicity, epidemiology, and the availability of full-genome sequences for increasing numbers of microorganisms, especially those that are medically relevant. PMID:9618442
Selective recognition of N4-methylcytosine in DNA by engineered transcription-activator-like effectors.

PubMed

Rathi, Preeti; Maurer, Sara; Summerer, Daniel

2018-06-05

The epigenetic DNA nucleobases 5-methylcytosine (5mC) and N 4-methylcytosine (4mC) coexist in bacterial genomes and have important functions in host defence and transcription regulation. To better understand the individual biological roles of both methylated nucleobases, analytical strategies for distinguishing unmodified cytosine (C) from 4mC and 5mC are required. Transcription-activator-like effectors (TALEs) are programmable DNA-binding repeat proteins, which can be re-engineered for the direct detection of epigenetic nucleobases in user-defined DNA sequences. We here report the natural, cytosine-binding TALE repeat to not strongly differentiate between 5mC and 4mC. To engineer repeats with selectivity in the context of C, 5mC and 4mC, we developed a homogeneous fluorescence assay and screened a library of size-reduced TALE repeats for binding to all three nucleobases. This provided insights into the requirements of size-reduced TALE repeats for 4mC binding and revealed a single mutant repeat as a selective binder of 4mC. Employment of a TALE with this repeat in affinity enrichment enabled the isolation of a user-defined DNA sequence containing a single 4mC but not C or 5mC from the background of a bacterial genome. Comparative enrichments with TALEs bearing this or the natural C-binding repeat provides an approach for the complete, programmable decoding of all cytosine nucleobases found in bacterial genomes.This article is part of a discussion meeting issue 'Frontiers in epigenetic chemical biology'. © 2018 The Author(s).
Recombination–deletion between homologous cassettes in retrovirus is suppressed via a strategy of degenerate codon substitution

PubMed Central

Im, Eung Jun; Bais, Anthony J; Yang, Wen; Ma, Qiangzhong; Guo, Xiuyang; Sepe, Steven M; Junghans, Richard P

2014-01-01

Transduction and expression procedures in gene therapy protocols may optimally transfer more than a single gene to correct a defect and/or transmit new functions to recipient cells or organisms. This may be accomplished by transduction with two (or more) vectors, or, more efficiently, in a single vector. Occasionally, it may be useful to coexpress homologous genes or chimeric proteins with regions of shared homology. Retroviridae include the dominant vector systems for gene transfer (e.g., gamma-retro and lentiviruses) and are capable of such multigene expression. However, these same viruses are known for efficient recombination–deletion when domains are duplicated within the viral genome. This problem can be averted by resorting to two-vector strategies (two-chain two-vector), but at a penalty to cost, convenience, and efficiency. Employing a chimeric antigen receptor system as an example, we confirm that coexpression of two genes with homologous domains in a single gamma-retroviral vector (two-chain single-vector) leads to recombination–deletion between repeated sequences, excising the equivalent of one of the chimeric antigen receptors. Here, we show that a degenerate codon substitution strategy in the two-chain single-vector format efficiently suppressed intravector deletional loss with rescue of balanced gene coexpression by minimizing sequence homology between repeated domains and preserving the final protein sequence. PMID:25419532
Short intronic repeat sequences facilitate circular RNA production.

PubMed

Liang, Dongming; Wilusz, Jeremy E

2014-10-15

Recent deep sequencing studies have revealed thousands of circular noncoding RNAs generated from protein-coding genes. These RNAs are produced when the precursor messenger RNA (pre-mRNA) splicing machinery "backsplices" and covalently joins, for example, the two ends of a single exon. However, the mechanism by which the spliceosome selects only certain exons to circularize is largely unknown. Using extensive mutagenesis of expression plasmids, we show that miniature introns containing the splice sites along with short (∼ 30- to 40-nucleotide) inverted repeats, such as Alu elements, are sufficient to allow the intervening exons to circularize in cells. The intronic repeats must base-pair to one another, thereby bringing the splice sites into close proximity to each other. More than simple thermodynamics is clearly at play, however, as not all repeats support circularization, and increasing the stability of the hairpin between the repeats can sometimes inhibit circular RNA biogenesis. The intronic repeats and exonic sequences must collaborate with one another, and a functional 3' end processing signal is required, suggesting that circularization may occur post-transcriptionally. These results suggest detailed and generalizable models that explain how the splicing machinery determines whether to produce a circular noncoding RNA or a linear mRNA. © 2014 Liang and Wilusz; Published by Cold Spring Harbor Laboratory Press.
New chloroplast microsatellite markers suitable for assessing genetic diversity of Lolium perenne and other related grass species

PubMed Central

Diekmann, Kerstin; Hodkinson, Trevor R.; Barth, Susanne

2012-01-01

Background and Aims Lolium perenne (perennial ryegrass) is the most important forage grass species of temperate regions. We have previously released the chloroplast genome sequence of L. perenne ‘Cashel’. Here nine chloroplast microsatellite markers are published, which were designed based on knowledge about genetically variable regions within the L. perenne chloroplast genome. These markers were successfully used for characterizing the genetic diversity in Lolium and different grass species. Methods Chloroplast genomes of 14 Poaceae taxa were screened for mononucleotide microsatellite repeat regions and primers designed for their amplification from nine loci. The potential of these markers to assess genetic diversity was evaluated on a set of 16 Irish and 15 European L. perenne ecotypes, nine L. perenne cultivars, other Lolium taxa and other grass species. Key Results All analysed Poaceae chloroplast genomes contained more than 200 mononucleotide repeats (chloroplast simple sequence repeats, cpSSRs) of at least 7 bp in length, concentrated mainly in the large single copy region of the genome. Nucleotide composition varied considerably among subfamilies (with Pooideae biased towards poly A repeats). The nine new markers distinguish L. perenne from all non-Lolium taxa. TeaCpSSR28 was able to distinguish between all Lolium species and Lolium multiflorum due to an elongation of an A8 mononucleotide repeat in L. multiflorum. TeaCpSSR31 detected a considerable degree of microsatellite length variation and single nucleotide polymorphism. TeaCpSSR27 revealed variation within some L. perenne accessions due to a 44-bp indel and was hence readily detected by simple agarose gel electrophoresis. Smaller insertion/deletion events or single nucleotide polymorphisms detected by these new markers could be visualized by polyacrylamide gel electrophoresis or DNA sequencing, respectively. Conclusions The new markers are a valuable tool for plant breeding companies, seed testing agencies and the wider scientific community due to their ability to monitor genetic diversity within breeding pools, to trace maternal inheritance and to distinguish closely related species. PMID:22419761
New chloroplast microsatellite markers suitable for assessing genetic diversity of Lolium perenne and other related grass species.

PubMed

Diekmann, Kerstin; Hodkinson, Trevor R; Barth, Susanne

2012-11-01

Lolium perenne (perennial ryegrass) is the most important forage grass species of temperate regions. We have previously released the chloroplast genome sequence of L. perenne 'Cashel'. Here nine chloroplast microsatellite markers are published, which were designed based on knowledge about genetically variable regions within the L. perenne chloroplast genome. These markers were successfully used for characterizing the genetic diversity in Lolium and different grass species. Chloroplast genomes of 14 Poaceae taxa were screened for mononucleotide microsatellite repeat regions and primers designed for their amplification from nine loci. The potential of these markers to assess genetic diversity was evaluated on a set of 16 Irish and 15 European L. perenne ecotypes, nine L. perenne cultivars, other Lolium taxa and other grass species. All analysed Poaceae chloroplast genomes contained more than 200 mononucleotide repeats (chloroplast simple sequence repeats, cpSSRs) of at least 7 bp in length, concentrated mainly in the large single copy region of the genome. Nucleotide composition varied considerably among subfamilies (with Pooideae biased towards poly A repeats). The nine new markers distinguish L. perenne from all non-Lolium taxa. TeaCpSSR28 was able to distinguish between all Lolium species and Lolium multiflorum due to an elongation of an A(8) mononucleotide repeat in L. multiflorum. TeaCpSSR31 detected a considerable degree of microsatellite length variation and single nucleotide polymorphism. TeaCpSSR27 revealed variation within some L. perenne accessions due to a 44-bp indel and was hence readily detected by simple agarose gel electrophoresis. Smaller insertion/deletion events or single nucleotide polymorphisms detected by these new markers could be visualized by polyacrylamide gel electrophoresis or DNA sequencing, respectively. The new markers are a valuable tool for plant breeding companies, seed testing agencies and the wider scientific community due to their ability to monitor genetic diversity within breeding pools, to trace maternal inheritance and to distinguish closely related species.
Action potential propagation recorded from single axonal arbors using multi-electrode arrays.

PubMed

Tovar, Kenneth R; Bridges, Daniel C; Wu, Bian; Randall, Connor; Audouard, Morgane; Jang, Jiwon; Hansma, Paul K; Kosik, Kenneth S

2018-04-11

We report the presence of co-occurring extracellular action potentials (eAPs) from cultured mouse hippocampal neurons among groups of planar electrodes on multi-electrode arrays (MEAs). The invariant sequences of eAPs among co-active electrode groups, repeated co-occurrences and short inter-electrode latencies are consistent with action potential propagation in unmyelinated axons. Repeated eAP co-detection by multiple electrodes was widespread in all our data records. Co-detection of eAPs confirms they result from the same neuron and allows these eAPs to be isolated from all other spikes independently of spike sorting algorithms. We averaged co-occurring events and revealed additional electrodes with eAPs that would otherwise be below detection threshold. We used these eAP cohorts to explore the temperature sensitivity of action potential propagation and the relationship between voltage-gated sodium channel density and propagation velocity. The sequence of eAPs among co-active electrodes 'fingerprints' neurons giving rise to these events and identifies them within neuronal ensembles. We used this property and the non-invasive nature of extracellular recording to monitor changes in excitability at multiple points in single axonal arbors simultaneously over several hours, demonstrating independence of axonal segments. Over several weeks, we recorded changes in inter-electrode propagation latencies and ongoing changes in excitability in different regions of single axonal arbors. Our work illustrates how repeated eAP co-occurrences can be used to extract physiological data from single axons with low electrode density MEAs. However, repeated eAP co-occurrences leads to over-sampling spikes from single neurons and thus can confound traditional spike-train analysis.
Alu repeated DNAs are differentially methylated in primate germ cells.

PubMed Central

Rubin, C M; VandeVoort, C A; Teplitz, R L; Schmid, C W

1994-01-01

A significant fraction of Alu repeats in human sperm DNA, previously found to be unmethylated, is nearly completely methylated in DNA from many somatic tissues. A similar fraction of unmethylated Alus is observed here in sperm DNA from rhesus monkey. However, Alus are almost completely methylated at the restriction sites tested in monkey follicular oocyte DNA. The Alu methylation patterns in mature male and female monkey germ cells are consistent with Alu methylation in human germ cell tumors. Alu sequences are hypomethylated in seminoma DNAs and more methylated in a human ovarian dysgerminoma. These results contrast with methylation patterns reported for germ cell single-copy, CpG island, satellite, and L1 sequences. The function of Alu repeats is not known, but differential methylation of Alu repeats in the male and female germ lines suggests that they may serve as markers for genomic imprinting or in maintaining differences in male and female meiosis. Images PMID:7800508
“One code to find them all”: a perl tool to conveniently parse RepeatMasker output files

PubMed Central

2014-01-01

Background Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Different possibilities can lead to multiple hits corresponding to a unique copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus corresponding to a single full-length sequence (like for long terminal repeat (LTR)-retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results We have developed a perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element. In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.
SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

USGS Publications Warehouse

Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

2013-01-01

SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data.

PubMed

Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M

2013-01-01

SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
Estimation of pea (Pisum sativum L.) microsatellite mutation rate based on pedigree and single-seed descent analyses.

PubMed

Cieslarová, Jaroslava; Hanáček, Pavel; Fialová, Eva; Hýbl, Miroslav; Smýkal, Petr

2011-11-01

Microsatellites, or simple sequence repeats (SSRs) are widespread class of repetitive DNA sequences, used in population genetics, genetic diversity and mapping studies. In spite of the SSR utility, the genetic and evolutionary mechanisms are not fully understood. We have investigated three microsatellite loci with different position in the pea (Pisum sativum L.) genome, the A9 locus residing in LTR region of abundant retrotransposon, AD270 as intergenic and AF016458 located in 5'untranslated region of expressed gene. Comparative analysis of a 35 pair samples from seven pea varieties propagated by single-seed descent for ten generations, revealed single 4 bp mutation in 10th generation sample at AD270 locus corresponding to stepwise increase in one additional ATCT repeat unit. The estimated mutation rate was 4.76 × 10(-3) per locus per generation, with a 95% confidence interval of 1.2 × 10(-4) to 2.7 × 10(-2). The comparison of cv. Bohatýr accessions retrieved from different collections, showed intra-, inter-accession variation and differences in flanking and repeat sequences. Fragment size and sequence alternations were also found in long term in vitro organogenic culture, established at 1983, indicative of somatic mutation process. The evidence of homoplasy was detected across of unrelated pea genotypes, which adversaly affects the reliability of diversity estimates not only for diverse germplasm but also highly bred material. The findings of this study have important implications for Pisum phylogeny studies, variety identification and registration process in pea breeding where mutation rate influences the genetic diversity and the effective population size estimates.
Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

PubMed

Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi

2017-07-01

PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.
2-D Structure of the A Region of Xist RNA and Its Implication for PRC2 Association

PubMed Central

Maenner, Sylvain; Blaud, Magali; Fouillen, Laetitia; Savoye, Anne; Marchand, Virginie; Dubois, Agnès; Sanglier-Cianférani, Sarah; Van Dorsselaer, Alain; Clerc, Philippe; Avner, Philip; Visvikis, Athanase; Branlant, Christiane

2010-01-01

In placental mammals, inactivation of one of the X chromosomes in female cells ensures sex chromosome dosage compensation. The 17 kb non-coding Xist RNA is crucial to this process and accumulates on the future inactive X chromosome. The most conserved Xist RNA region, the A region, contains eight or nine repeats separated by U-rich spacers. It is implicated in the recruitment of late inactivated X genes to the silencing compartment and likely in the recruitment of complex PRC2. Little is known about the structure of the A region and more generally about Xist RNA structure. Knowledge of its structure is restricted to an NMR study of a single A repeat element. Our study is the first experimental analysis of the structure of the entire A region in solution. By the use of chemical and enzymatic probes and FRET experiments, using oligonucleotides carrying fluorescent dyes, we resolved problems linked to sequence redundancies and established a 2-D structure for the A region that contains two long stem-loop structures each including four repeats. Interactions formed between repeats and between repeats and spacers stabilize these structures. Conservation of the spacer terminal sequences allows formation of such structures in all sequenced Xist RNAs. By combination of RNP affinity chromatography, immunoprecipitation assays, mass spectrometry, and Western blot analysis, we demonstrate that the A region can associate with components of the PRC2 complex in mouse ES cell nuclear extracts. Whilst a single four-repeat motif is able to associate with components of this complex, recruitment of Suz12 is clearly more efficient when the entire A region is present. Our data with their emphasis on the importance of inter-repeat pairing change fundamentally our conception of the 2-D structure of the A region of Xist RNA and support its possible implication in recruitment of the PRC2 complex. PMID:20052282
Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

PubMed

Fu, Jianmin; Liu, Huimin; Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng

2016-01-01

Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.
Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis

PubMed Central

Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng

2016-01-01

Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros ‘Jinzaoshi’ were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. ‘Jinzaoshi’, support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales. PMID:27442423

The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis.

PubMed

Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

2015-01-01

Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5' portion of the psbA gene and IR contraction occurred in Actinidia. The clap gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16 taxa angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.
The La-related protein 1-specific domain repurposes HEAT-like repeats to directly bind a 5'TOP sequence.

PubMed

Lahr, Roni M; Mack, Seshat M; Héroux, Annie; Blagden, Sarah P; Bousquet-Antonelli, Cécile; Deragon, Jean-Marc; Berman, Andrea J

2015-09-18

La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5'TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5' UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5'TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. A putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. These studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
The La-related protein 1-specific domain repurposes HEAT-like repeats to directly bind a 5'TOP sequence

DOE PAGES

Lahr, Roni M.; Mack, Seshat M.; Heroux, Annie; ...

2015-07-22

La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5'TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5' UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5'TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. Amore » putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. Ultimately, these studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis.« less
Complete chloroplast genome sequence and comparative analysis of loblolly pine (Pinus taeda L.) with related species

PubMed Central

Khan, Abdul Latif; Khan, Muhammad Aaqil; Shahzad, Raheem; Lubna; Kang, Sang Mo; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

2018-01-01

Pinaceae, the largest family of conifers, has a diversified organization of chloroplast (cp) genomes with two typical highly reduced inverted repeats (IRs). In the current study, we determined the complete sequence of the cp genome of an economically and ecologically important conifer tree, the loblolly pine (Pinus taeda L.), using Illumina paired-end sequencing and compared the sequence with those of other pine species. The results revealed a genome size of 121,531 base pairs (bp) containing a pair of 830-bp IR regions, distinguished by a small single copy (42,258 bp) and large single copy (77,614 bp) region. The chloroplast genome of P. taeda encodes 120 genes, comprising 81 protein-coding genes, four ribosomal RNA genes, and 35 tRNA genes, with 151 randomly distributed microsatellites. Approximately 6 palindromic, 34 forward, and 22 tandem repeats were found in the P. taeda cp genome. Whole cp genome comparison with those of other Pinus species exhibited an overall high degree of sequence similarity, with some divergence in intergenic spacers. Higher and lower numbers of indels and single-nucleotide polymorphism substitutions were observed relative to P. contorta and P. monophylla, respectively. Phylogenomic analyses based on the complete genome sequence revealed that 60 shared genes generated trees with the same topologies, and P. taeda was closely related to P. contorta in the subgenus Pinus. Thus, the complete P. taeda genome provided valuable resources for population and evolutionary studies of gymnosperms and can be used to identify related species. PMID:29596414
A divergent Pumilio repeat protein family for pre-rRNA processing and mRNA localization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Qiu, Chen; McCann, Kathleen L.; Wine, Robert N.

Pumilio/feminization of XX and XO animals (fem)-3 mRNA-binding factor (PUF) proteins bind sequence specifically to mRNA targets using a single-stranded RNA-binding domain comprising eight Pumilio (PUM) repeats. PUM repeats have now been identified in proteins that function in pre-rRNA processing, including human Puf-A and yeast Puf6. This is a role not previously ascribed to PUF proteins. In this paper we present crystal structures of human Puf-A that reveal a class of nucleic acid-binding proteins with 11 PUM repeats arranged in an “L”-like shape. In contrast to classical PUF proteins, Puf-A forms sequence-independent interactions with DNA or RNA, mediated by conservedmore » basic residues. We demonstrate that equivalent basic residues in yeast Puf6 are important for RNA binding, pre-rRNA processing, and mRNA localization. Finally, PUM repeats can be assembled into alternative folds that bind to structured nucleic acids in addition to forming canonical eight-repeat crescent-shaped RNA-binding domains found in classical PUF proteins.« less
A divergent Pumilio repeat protein family for pre-rRNA processing and mRNA localization

DOE PAGES

Qiu, Chen; McCann, Kathleen L.; Wine, Robert N.; ...

2014-12-15

Pumilio/feminization of XX and XO animals (fem)-3 mRNA-binding factor (PUF) proteins bind sequence specifically to mRNA targets using a single-stranded RNA-binding domain comprising eight Pumilio (PUM) repeats. PUM repeats have now been identified in proteins that function in pre-rRNA processing, including human Puf-A and yeast Puf6. This is a role not previously ascribed to PUF proteins. In this paper we present crystal structures of human Puf-A that reveal a class of nucleic acid-binding proteins with 11 PUM repeats arranged in an “L”-like shape. In contrast to classical PUF proteins, Puf-A forms sequence-independent interactions with DNA or RNA, mediated by conservedmore » basic residues. We demonstrate that equivalent basic residues in yeast Puf6 are important for RNA binding, pre-rRNA processing, and mRNA localization. Finally, PUM repeats can be assembled into alternative folds that bind to structured nucleic acids in addition to forming canonical eight-repeat crescent-shaped RNA-binding domains found in classical PUF proteins.« less
Epitope mapping of the variable repetitive region with the MB antigen of Ureaplasma urealyticum.

PubMed Central

Zheng, X; Lau, K; Frazier, M; Cassell, G H; Watson, H L

1996-01-01

One of the major surface structures of Ureaplasma urealyticum recognized by antibodies of patients during infection is the MB antigen. Previously, we showed by Western blot (immunoblot) analysis that any one of the anti-MB monoclonal antibodies (MAbs) 3B1.5, 5B1.1, and 10C6.6 could block the binding of patient antibodies to MB. Subsequent DNA sequencing revealed that a unique six-amino-acid direct tandem repeat region composed the carboxy two-thirds of this antigen. In the present study, using antibody-reactive peptide scanning of this repeat region, we demonstrated that the amino acids defining the epitopes for MAbs 3B1.5 5B1.1 and 10C6.6 are EQP, GK, and KEQPA, respectively. Peptide scanning analysis of an infected patient's serum antibody response showed that the dominant epitope was defined by the sequence PAGK. Mapping of these continuous epitopes revealed overlap between all MAb and patient polyclonal antibody binding sites, thus explaining the ability of a single MAb to apparently block all polyclonal antibody binding sites. We also show that a single amino acid difference in the sequence of the repeats of serovars 3 and 14 accounts for the lack of reactivity with serovar 14 of two of the serovar 3-specific MAbs. Finally, the data demonstrate the need to obtain the sequences of the mba genes of all serovars before an effective serovar-specific antibody detection method can be developed. PMID:8914774
Intrastrain heterogeneity of the mgpB gene in Mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences.

PubMed

Iverson-Cabral, Stefanie L; Astete, Sabina G; Cohen, Craig R; Rocha, Eduardo P C; Totten, Patricia A

2006-07-01

Mycoplasma genitalium is associated with reproductive tract disease in women and may persist in the lower genital tract for months, potentially increasing the risk of upper tract infection and transmission to uninfected partners. Despite its exceptionally small genome (580 kb), approximately 4% is composed of repeated elements known as MgPar sequences (MgPa repeats) based on their homology to the mgpB gene that encodes the immunodominant MgPa adhesin protein. The presence of these MgPar sequences, as well as mgpB variability between M. genitalium strains, suggests that mgpB and MgPar sequences recombine to produce variant MgPa proteins. To examine the extent and generation of diversity within single strains of the organism, we examined mgpB variation within M. genitalium strain G-37 and observed sequence heterogeneity that could be explained by recombination between the mgpB expression site and putative donor MgPar sequences. Similarly, we analyzed mgpB sequences from cervical specimens from a persistently infected woman (21 months) and identified 17 different mgpB variants within a single infecting M. genitalium strain, confirming that mgpB heterogeneity occurs over the course of a natural infection. These observations support the hypothesis that recombination occurs between the mgpB gene and MgPar sequences and that the resulting antigenically distinct MgPa variants may contribute to immune evasion and persistence of infection.
The complete chloroplast genome sequence of Actinidia arguta using the PacBio RS II platform

PubMed Central

Lin, Miaomiao; Qi, Xiujuan; Chen, Jinyong; Sun, Leiming; Zhong, Yunpeng; Fang, Jinbao; Hu, Chungen

2018-01-01

Actinidia arguta is the most basal species in a phylogenetically and economically important genus in the family Actinidiaceae. To better understand the molecular basis of the Actinidia arguta chloroplast (cp), we sequenced the complete cp genome from A. arguta using Illumina and PacBio RS II sequencing technologies. The cp genome from A. arguta was 157,611 bp in length and composed of a pair of 24,232 bp inverted repeats (IRs) separated by a 20,463 bp small single copy region (SSC) and an 88,684 bp large single copy region (LSC). Overall, the cp genome contained 113 unique genes. The cp genomes from A. arguta and three other Actinidia species from GenBank were subjected to a comparative analysis. Indel mutation events and high frequencies of base substitution were identified, and the accD and ycf2 genes showed a high degree of variation within Actinidia. Forty-seven simple sequence repeats (SSRs) and 155 repetitive structures were identified, further demonstrating the rapid evolution in Actinidia. The cp genome analysis and the identification of variable loci provide vital information for understanding the evolution and function of the chloroplast and for characterizing Actinidia population genetics. PMID:29795601
The complete chloroplast genome sequence of Dendrobium officinale.

PubMed

Yang, Pei; Zhou, Hong; Qian, Jun; Xu, Haibin; Shao, Qingsong; Li, Yonghua; Yao, Hui

2016-01-01

The complete chloroplast sequence of Dendrobium officinale, an endangered and economically important traditional Chinese medicine, was reported and characterized. The genome size is 152,018 bp, with 37.5% GC content. A pair of inverted repeats (IRs) of 26,284 bp are separated by a large single-copy region (LSC, 84,944 bp) and a small single-copy region (SSC, 14,506 bp). The complete cp DNA contains 83 protein-coding genes, 39 tRNA genes and 8 rRNA genes. Fourteen genes contained one or two introns.
spa typing for epidemiological surveillance of Staphylococcus aureus.

PubMed

Hallin, Marie; Friedrich, Alexander W; Struelens, Marc J

2009-01-01

The spa typing method is based on sequencing of the polymorphic X region of the protein A gene (spa), present in all strains of Staphylococcus aureus. The X region is constituted of a variable number of 24-bp repeats flanked by well-conserved regions. This single-locus sequence-based typing method combines a number of technical advantages, such as rapidity, reproducibility, and portability. Moreover, due to its repeat structure, the spa locus simultaneously indexes micro- and macrovariations, enabling the use of spa typing in both local and global epidemiological studies. These studies are facilitated by the establishment of standardized spa type nomenclature and Internet shared databases.
Single-Stranded Condensation Stochastically Blocks G-Quadruplex Assembly in Human Telomeric RNA.

PubMed

Gutiérrez, Irene; Garavís, Miguel; de Lorenzo, Sara; Villasante, Alfredo; González, Carlos; Arias-Gonzalez, J Ricardo

2018-05-17

TERRA is an RNA molecule transcribed from human subtelomeric regions toward chromosome ends potentially involved in regulation of heterochromatin stability, semiconservative replication, and telomerase inhibition, among others. TERRA contains tandem repeats of the sequence GGGUUA, with a strong tendency to fold into a four-stranded arrangement known as a parallel G-quadruplex. Here, we demonstrate by using single-molecule force spectroscopy that this potential is limited by the inherent capacity of RNA to self-associate randomly and further condense into entropically more favorable structures. We stretched RNA constructions with more than four and less than eight hexanucleotide repeats, thus unable to form several G-quadruplexes in tandem, flanked by non-G-rich overhangs of random sequence by optical tweezers on a one by one basis. We found that condensed RNA stochastically blocks G-quadruplex folding pathways with a near 20% probability, a behavior that is not found in DNA analogous molecules.
Complete chloroplast genome sequence of MD-2 pineapple and its comparative analysis among nine other plants from the subclass Commelinidae.

PubMed

Redwan, R M; Saidin, A; Kumar, S V

2015-08-12

Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple that was sequenced using the PacBio sequencing technology. In this study, the high error rate of PacBio long sequence reads of A. comosus's total genomic DNA were improved by leveraging on the high accuracy but short Illumina reads for error-correction via the latest error correction module from Novocraft. Error corrected long PacBio reads were assembled by using a single tool to produce a contig representing the pineapple chloroplast genome. The genome of 159,636 bp in length is featured with the conserved quadripartite structure of chloroplast containing a large single copy region (LSC) with a size of 87,482 bp, a small single copy region (SSC) with a size of 18,622 bp and two inverted repeat regions (IRA and IRB) each with the size of 26,766 bp. Overall, the genome contained 117 unique coding regions and 30 were repeated in the IR region with its genes contents, structure and arrangement similar to its sister taxon, Typha latifolia. A total of 35 repeats structure were detected in both the coding and non-coding regions with a majority being tandem repeats. In addition, 205 SSRs were detected in the genome with six protein-coding genes contained more than two SSRs. Comparative chloroplast genomes from the subclass Commelinidae revealed a conservative protein coding gene albeit located in a highly divergence region. Analysis of selection pressure on protein-coding genes using Ka/Ks ratio showed significant positive selection exerted on the rps7 gene of the pineapple chloroplast with P less than 0.05. Phylogenetic analysis confirmed the recent taxonomical relation among the member of commelinids which support the monophyly relationship between Arecales and Dasypogonaceae and between Zingiberales to the Poales, which includes the A. comosus. The complete sequence of the chloroplast of pineapple provides insights to the divergence of genic chloroplast sequences from the members of the subclass Commelinidae. The complete pineapple chloroplast will serve as a reference for in-depth taxonomical studies in the Bromeliaceae family when more species under the family are sequenced in the future. The genetic sequence information will also make feasible other molecular applications of the pineapple chloroplast for plant genetic improvement.
The Fusarium Graminearum Genome Reveals a Link Between Localized Polymorphism and Pathogen Specialization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cuomo, Christina A.; Guldener, Ulrich; Xu, Jin Rong

2007-09-07

We sequenced and annotated the genome of the filamentous fungus Fusarium graminearum, a major pathogen of cultivated cereals. Very few repetitive sequences were detected, and the process of repeat-induced point mutation, in which duplicated sequences are subject to extensive mutation, may partially account for the reduced repeat content and apparent low number of paralogous (ancestrally duplicated) genes. A second strain of F. graminearum contained more than 10,000 single-nucleotide polymorphisms, which were frequently located near telomeres and within other discrete chromosomal segments. Many highly polymorphic regions contained sets of genes implicated in plant-fungus interactions and were unusually divergent, with higher ratesmore » of recombination. These regions of genome innovation may result from selection due to interactions of F. graminearum with its plant hosts.« less
The structure of cell adhesion molecule uvomorulin. Insights into the molecular mechanism of Ca2+-dependent cell adhesion.

PubMed Central

Ringwald, M; Schuh, R; Vestweber, D; Eistetter, H; Lottspeich, F; Engel, J; Dölz, R; Jähnig, F; Epplen, J; Mayer, S

1987-01-01

We have determined the amino acid sequence of the Ca2+-dependent cell adhesion molecule uvomorulin as it appears on the cell surface. The extracellular part of the molecule exhibits three internally repeated domains of 112 residues which are most likely generated by gene duplication. Each of the repeated domains contains two highly conserved units which could represent putative Ca2+-binding sites. Secondary structure predictions suggest that the putative Ca2+-binding units are located in external loops at the surface of the protein. The protein sequence exhibits a single membrane-spanning region and a cytoplasmic domain. Sequence comparison reveals extensive homology to the chicken L-CAM. Both uvomorulin and L-CAM are identical in 65% of their entire amino acid sequence suggesting a common origin for both CAMs. Images Fig. 1. Fig. 4. Fig. 7. PMID:3501370
The sequence and de novo assembly of the giant panda genome

PubMed Central

Li, Ruiqiang; Fan, Wei; Tian, Geng; Zhu, Hongmei; He, Lin; Cai, Jing; Huang, Quanfei; Cai, Qingle; Li, Bo; Bai, Yinqi; Zhang, Zhihe; Zhang, Yaping; Wang, Wen; Li, Jun; Wei, Fuwen; Li, Heng; Jian, Min; Li, Jianwen; Zhang, Zhaolei; Nielsen, Rasmus; Li, Dawei; Gu, Wanjun; Yang, Zhentao; Xuan, Zhaoling; Ryder, Oliver A.; Leung, Frederick Chi-Ching; Zhou, Yan; Cao, Jianjun; Sun, Xiao; Fu, Yonggui; Fang, Xiaodong; Guo, Xiaosen; Wang, Bo; Hou, Rong; Shen, Fujun; Mu, Bo; Ni, Peixiang; Lin, Runmao; Qian, Wubin; Wang, Guodong; Yu, Chang; Nie, Wenhui; Wang, Jinhuan; Wu, Zhigang; Liang, Huiqing; Min, Jiumeng; Wu, Qi; Cheng, Shifeng; Ruan, Jue; Wang, Mingwei; Shi, Zhongbin; Wen, Ming; Liu, Binghang; Ren, Xiaoli; Zheng, Huisong; Dong, Dong; Cook, Kathleen; Shan, Gao; Zhang, Hao; Kosiol, Carolin; Xie, Xueying; Lu, Zuhong; Zheng, Hancheng; Li, Yingrui; Steiner, Cynthia C.; Lam, Tommy Tsan-Yuk; Lin, Siyuan; Zhang, Qinghui; Li, Guoqing; Tian, Jing; Gong, Timing; Liu, Hongde; Zhang, Dejin; Fang, Lin; Ye, Chen; Zhang, Juanbin; Hu, Wenbo; Xu, Anlong; Ren, Yuanyuan; Zhang, Guojie; Bruford, Michael W.; Li, Qibin; Ma, Lijia; Guo, Yiran; An, Na; Hu, Yujie; Zheng, Yang; Shi, Yongyong; Li, Zhiqiang; Liu, Qing; Chen, Yanling; Zhao, Jing; Qu, Ning; Zhao, Shancen; Tian, Feng; Wang, Xiaoling; Wang, Haiyin; Xu, Lizhi; Liu, Xiao; Vinar, Tomas; Wang, Yajun; Lam, Tak-Wah; Yiu, Siu-Ming; Liu, Shiping; Zhang, Hemin; Li, Desheng; Huang, Yan; Wang, Xia; Yang, Guohua; Jiang, Zhi; Wang, Junyi; Qin, Nan; Li, Li; Li, Jingxiang; Bolund, Lars; Kristiansen, Karsten; Wong, Gane Ka-Shu; Olson, Maynard; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun

2013-01-01

Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes. PMID:20010809
Single molecule real-time sequencing of Xanthomonas oryzae genomes reveals a dynamic structure and complex TAL (transcription activator-like) effector gene relationships

PubMed Central

Booher, Nicholas J.; Carpenter, Sara C. D.; Sebra, Robert P.; Wang, Li; Salzberg, Steven L.; Leach, Jan E.

2015-01-01

Pathogen-injected, direct transcriptional activators of host genes, TAL (transcription activator-like) effectors play determinative roles in plant diseases caused by Xanthomonas spp. A large domain of nearly identical, 33–35 aa repeats in each protein mediates DNA recognition. This modularity makes TAL effectors customizable and thus important also in biotechnology. However, the repeats render TAL effector (tal) genes nearly impossible to assemble using next-generation, short reads. Here, we demonstrate that long-read, single molecule real-time (SMRT) sequencing solves this problem. Taking an ensemble approach to first generate local, tal gene contigs, we correctly assembled de novo the genomes of two strains of the rice pathogen X. oryzae completed previously using the Sanger method and even identified errors in those references. Sequencing two more strains revealed a dynamic genome structure and a striking plasticity in tal gene content. Our results pave the way for population-level studies to inform resistance breeding, improve biotechnology and probe TAL effector evolution. PMID:27148456
Two different factors act separately or together to specify functionally distinct activities at a single transcriptional enhancer.

PubMed Central

DeFranco, D; Yamamoto, K R

1986-01-01

The expression of genes fused downstream of the Moloney murine sarcoma virus (MoMSV) long terminal repeat is stimulated by glucocorticoids. We mapped the glucocorticoid response element that conferred this hormonal regulation and found that it is a hormone-dependent transcriptional enhancer, designated Sg; it resides within DNA fragments that also carry a previously described enhancer element (B. Levinson, G. Khoury, G. Vande Woude, and P. Gruss, Nature [London] 295:568-572, 1982), here termed Sa, whose activity is independent of the hormone. Nuclease footprinting revealed that purified glucocorticoid receptor bound at multiple discrete sites within and at the borders of the tandemly repeated sequence motif that defines Sa. The Sa and Sg activities stimulated the apparent efficiency of cognate or heterologous promoter utilization, individually providing modest enhancement and in concert yielding higher levels of activity. A deletion mutant lacking most of the tandem repeat but retaining a single receptor footprint sequence lost Sa activity but still conferred Sg activity. The two enhancer components could also be distinguished physiologically: both were operative within cultured rat fibroblasts, but only Sg activity was detectable in rat exocrine pancreas cells. Therefore, the sequence determinants of Sa and Sg activity may be interdigitated, and when both components are active, the receptor and a putative Sa factor can apparently bind and act simultaneously. We concluded that MoMSV enhancer activity is effected by at least two distinct binding factors, suggesting that combinatorial regulation of promoter function can be mediated even from a single genetic element. Images PMID:3023887
Identification of genes in anonymous DNA sequences. Annual performance report, February 1, 1991--January 31, 1992

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fields, C.A.

1996-06-01

The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human, and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler) has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0 has now been completed, and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryoyic gene sequences are also underway. This progressmore » report also contains papers coming out of the project including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distruted highly variable repeat family; Information contents and dinucleotide compostions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.« less
A set of plastid loci for use in multiplex fragment length genotyping for intraspecific variation in Pinus (Pinaceae)1

PubMed Central

Wofford, Austin M.; Finch, Kristen; Bigott, Adam; Willyard, Ann

2014-01-01

• Premise of the study: Recently released Pinus plastome sequences support characterization of 15 plastid simple sequence repeat (cpSSR) loci originally published for P. contorta and P. thunbergii. This allows selection of loci for single-tube PCR multiplexed genotyping in any subsection of the genus. • Methods: Unique placement of primers and primer conservation across the genus were investigated, and a set of six loci were selected for single-tube multiplexing. We compared interspecific variation between cpSSRs and nucleotide sequences of ycf1 and tested intraspecific variation for cpSSRs using 911 samples in the P. ponderosa species complex. • Results: The cpSSR loci contain mononucleotide and complex repeats with additional length variation in flanking regions. They are not located in hypervariable regions, and most primers are conserved across the genus. A single PCR per sample multiplexed for six loci yielded 45 alleles in 911 samples. • Discussion: The protocol allows efficient genotyping of many samples. The cpSSR loci are too variable for Pinus phylogenies but are useful for the study of genetic structure within and among populations. The multiplex method could easily be extended to other plant groups by choosing primers for cpSSR loci in a plastome alignment for the target group. PMID:25202625

Whole genome sequencing of Salmonella Typhimurium illuminates distinct outbreaks caused by an endemic multi-locus variable number tandem repeat analysis type in Australia, 2014.

PubMed

Phillips, Anastasia; Sotomayor, Cristina; Wang, Qinning; Holmes, Nadine; Furlong, Catriona; Ward, Kate; Howard, Peter; Octavia, Sophie; Lan, Ruiting; Sintchenko, Vitali

2016-09-15

Salmonella Typhimurium (STM) is an important cause of foodborne outbreaks worldwide. Subtyping of STM remains critical to outbreak investigation, yet current techniques (e.g. multilocus variable number tandem repeat analysis, MLVA) may provide insufficient discrimination. Whole genome sequencing (WGS) offers potentially greater discriminatory power to support infectious disease surveillance. We performed WGS on 62 STM isolates of a single, endemic MLVA type associated with two epidemiologically independent, food-borne outbreaks along with sporadic cases in New South Wales, Australia, during 2014. Genomes of case and environmental isolates were sequenced using HiSeq (Illumina) and the genetic distance between them was assessed by single nucleotide polymorphism (SNP) analysis. SNP analysis was compared to the epidemiological context. The WGS analysis supported epidemiological evidence and genomes of within-outbreak isolates were nearly identical. Sporadic cases differed from outbreak cases by a small number of SNPs, although their close relationship to outbreak cases may represent an unidentified common food source that may warrant further public health follow up. Previously unrecognised mini-clusters were detected. WGS of STM can discriminate foodborne community outbreaks within a single endemic MLVA clone. Our findings support the translation of WGS into public health laboratory surveillance of salmonellosis.
Heterogeneity of the Epstein-Barr Virus (EBV) Major Internal Repeat Reveals Evolutionary Mechanisms of EBV and a Functional Defect in the Prototype EBV Strain B95-8.

PubMed

Ba Abdullah, Mohammed M; Palermo, Richard D; Palser, Anne L; Grayson, Nicholas E; Kellam, Paul; Correia, Samantha; Szymula, Agnieszka; White, Robert E

2017-12-01

Epstein-Barr virus (EBV) is a ubiquitous pathogen of humans that can cause several types of lymphoma and carcinoma. Like other herpesviruses, EBV has diversified through both coevolution with its host and genetic exchange between virus strains. Sequence analysis of the EBV genome is unusually challenging because of the large number and lengths of repeat regions within the virus. Here we describe the sequence assembly and analysis of the large internal repeat 1 of EBV (IR1; also known as the BamW repeats) for more than 70 strains. The diversity of the latency protein EBV nuclear antigen leader protein (EBNA-LP) resides predominantly within the exons downstream of IR1. The integrity of the putative BWRF1 open reading frame (ORF) is retained in over 80% of strains, and deletions truncating IR1 always spare BWRF1. Conserved regions include the IR1 latency promoter (Wp) and one zone upstream of and two within BWRF1. IR1 is heterogeneous in 70% of strains, and this heterogeneity arises from sequence exchange between strains as well as from spontaneous mutation, with interstrain recombination being more common in tumor-derived viruses. This genetic exchange often incorporates regions of <1 kb, and allelic gene conversion changes the frequency of small regions within the repeat but not close to the flanks. These observations suggest that IR1-and, by extension, EBV-diversifies through both recombination and breakpoint repair, while concerted evolution of IR1 is driven by gene conversion of small regions. Finally, the prototype EBV strain B95-8 contains four nonconsensus variants within a single IR1 repeat unit, including a stop codon in the EBNA-LP gene. Repairing IR1 improves EBNA-LP levels and the quality of transformation by the B95-8 bacterial artificial chromosome (BAC). IMPORTANCE Epstein-Barr virus (EBV) infects the majority of the world population but causes illness in only a small minority of people. Nevertheless, over 1% of cancers worldwide are attributable to EBV. Recent sequencing projects investigating virus diversity to see if different strains have different disease impacts have excluded regions of repeating sequence, as they are more technically challenging. Here we analyze the sequence of the largest repeat in EBV (IR1). We first characterized the variations in protein sequences encoded across IR1. In studying variations within the repeat of each strain, we identified a mutation in the main laboratory strain of EBV that impairs virus function, and we suggest that tumor-associated viruses may be more likely to contain DNA mixed from two strains. The patterns of this mixing suggest that sequences can spread between strains (and also within the repeat) by copying sequence from another strain (or repeat unit) to repair DNA damage. Copyright © 2017 Ba abdullah et al.
Comparative Chloroplast Genomics of Gossypium Species: Insights Into Repeat Sequence Variations and Phylogeny

PubMed Central

Wu, Ying; Liu, Fang; Yang, Dai-Gang; Li, Wei; Zhou, Xiao-Jian; Pei, Xiao-Yu; Liu, Yan-Gai; He, Kun-Lun; Zhang, Wen-Sheng; Ren, Zhong-Ying; Zhou, Ke-Hai; Ma, Xiong-Feng; Li, Zhong-Hu

2018-01-01

Cotton is one of the most economically important fiber crop plants worldwide. The genus Gossypium contains a single allotetraploid group (AD) and eight diploid genome groups (A–G and K). However, the evolution of repeat sequences in the chloroplast genomes and the phylogenetic relationships of Gossypium species are unclear. Thus, we determined the variations in the repeat sequences and the evolutionary relationships of 40 cotton chloroplast genomes, which represented the most diverse in the genus, including five newly sequenced diploid species, i.e., G. nandewarense (C1-n), G. armourianum (D2-1), G. lobatum (D7), G. trilobum (D8), and G. schwendimanii (D11), and an important semi-wild race of upland cotton, G. hirsutum race latifolium (AD1). The genome structure, gene order, and GC content of cotton species were similar to those of other higher plant plastid genomes. In total, 2860 long sequence repeats (>10 bp in length) were identified, where the F-genome species had the largest number of repeats (G. longicalyx F1: 108) and E-genome species had the lowest (G. stocksii E1: 53). Large-scale repeat sequences possibly enrich the genetic information and maintain genome stability in cotton species. We also identified 10 divergence hotspot regions, i.e., rpl33-rps18, psbZ-trnG (GCC), rps4-trnT (UGU), trnL (UAG)-rpl32, trnE (UUC)-trnT (GGU), atpE, ndhI, rps2, ycf1, and ndhF, which could be useful molecular genetic markers for future population genetics and phylogenetic studies. Site-specific selection analysis showed that some of the coding sites of 10 chloroplast genes (atpB, atpE, rps2, rps3, petB, petD, ccsA, cemA, ycf1, and rbcL) were under protein sequence evolution. Phylogenetic analysis based on the whole plastomes suggested that the Gossypium species grouped into six previously identified genetic clades. Interestingly, all 13 D-genome species clustered into a strong monophyletic clade. Unexpectedly, the cotton species with C, G, and K-genomes were admixed and nested in a large clade, which could have been due to their recent radiation, incomplete lineage sorting, and introgression hybridization among different cotton lineages. In conclusion, the results of this study provide new insights into the evolution of repeat sequences in chloroplast genomes and interspecific relationships in the genus Gossypium. PMID:29619041
The first complete chloroplast genome sequence of a lycophyte,Huperzia lucidula (Lycopodiaceae)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wolf, Paul G.; Karol, Kenneth G.; Mandoli, Dina F.

2005-02-01

We used a unique combination of techniques to sequence the first complete chloroplast genome of a lycophyte, Huperzia lucidula. This plant belongs to a significant clade hypothesized to represent the sister group to all other vascular plants. We used fluorescence-activated cell sorting (FACS) to isolate the organelles, rolling circle amplification (RCA) to amplify the genome, and shotgun sequencing to 8x depth coverage to obtain the complete chloroplast genome sequence. The genome is 154,373bp, containing inverted repeats of 15,314 bp each, a large single-copy region of 104,088 bp, and a small single-copy region of 19,671 bp. Gene order is more similarmore » to those of mosses, liverworts, and hornworts than to gene order for other vascular plants. For example, the Huperziachloroplast genome possesses the bryophyte gene order for a previously characterized 30 kb inversion, thus supporting the hypothesis that lycophytes are sister to all other extant vascular plants. The lycophytechloroplast genome data also enable a better reconstruction of the basaltracheophyte genome, which is useful for inferring relationships among bryophyte lineages. Several unique characters are observed in Huperzia, such as movement of the gene ndhF from the small single copy region into the inverted repeat. We present several analyses of evolutionary relationships among land plants by using nucleotide data, amino acid sequences, and by comparing gene arrangements from chloroplast genomes. The results, while still tentative pending the large number of chloroplast genomes from other key lineages that are soon to be sequenced, are intriguing in themselves, and contribute to a growing comparative database of genomic and morphological data across the green plants.« less
A single bout of high-intensity aerobic exercise facilitates response to paired associative stimulation and promotes sequence-specific implicit motor learning

PubMed Central

Mang, Cameron S.; Snow, Nicholas J.; Campbell, Kristin L.; Ross, Colin J. D.

2014-01-01

The objectives of the present study were to evaluate the impact of a single bout of high-intensity aerobic exercise on 1) long-term potentiation (LTP)-like neuroplasticity via response to paired associative stimulation (PAS) and 2) the temporal and spatial components of sequence-specific implicit motor learning. Additionally, relationships between exercise-induced increases in systemic brain-derived neurotrophic factor (BDNF) and response to PAS and motor learning were evaluated. Sixteen young healthy participants completed six experimental sessions, including the following: 1) rest followed by PAS; 2) aerobic exercise followed by PAS; 3) rest followed by practice of a continuous tracking (CT) task and 4) a no-exercise 24-h retention test; and 5) aerobic exercise followed by CT task practice and 6) a no-exercise 24-h retention test. The CT task included an embedded repeated sequence allowing for evaluation of sequence-specific implicit learning. Slope of motor-evoked potential recruitment curves generated with transcranial magnetic stimulation showed larger increases when PAS was preceded by aerobic exercise (59.8% increase) compared with rest (14.2% increase, P = 0.02). Time lag of CT task performance on the repeated sequence improved under the aerobic exercise condition from early (−100.8 ms) to late practice (−75.2 ms, P < 0.001) and was maintained at retention (−79.2 ms, P = 0.004) but did not change under the rest condition (P > 0.16). Systemic BDNF increased on average by 3.4-fold following aerobic exercise (P = 0.003), but the changes did not relate to neurophysiological or behavioral measures (P > 0.42). These results indicate that a single bout of high-intensity aerobic exercise can prime LTP-like neuroplasticity and promote sequence-specific implicit motor learning. PMID:25257866
A space-efficient algorithm for local similarities.

PubMed

Huang, X Q; Hardison, R C; Miller, W

1990-10-01

Existing dynamic-programming algorithms for identifying similar regions of two sequences require time and space proportional to the product of the sequence lengths. Often this space requirement is more limiting than the time requirement. We describe a dynamic-programming local-similarity algorithm that needs only space proportional to the sum of the sequence lengths. The method can also find repeats within a single long sequence. To illustrate the algorithm's potential, we discuss comparison of a 73,360 nucleotide sequence containing the human beta-like globin gene cluster and a corresponding 44,594 nucleotide sequence for rabbit, a problem well beyond the capabilities of other dynamic-programming software.
Molecular tandem repeat strategy for elucidating mechanical properties of high-strength proteins

PubMed Central

Jung, Huihun; Pena-Francesch, Abdon; Saadat, Alham; Sebastian, Aswathy; Kim, Dong Hwan; Hamilton, Reginald F.; Albert, Istvan; Allen, Benjamin D.; Demirel, Melik C.

2016-01-01

Many globular and structural proteins have repetitions in their sequences or structures. However, a clear relationship between these repeats and their contribution to the mechanical properties remains elusive. We propose a new approach for the design and production of synthetic polypeptides that comprise one or more tandem copies of a single unit with distinct amorphous and ordered regions. Our designed sequences are based on a structural protein produced in squid suction cups that has a segmented copolymer structure with amorphous and crystalline domains. We produced segmented polypeptides with varying repeat number, while keeping the lengths and compositions of the amorphous and crystalline regions fixed. We showed that mechanical properties of these synthetic proteins could be tuned by modulating their molecular weights. Specifically, the toughness and extensibility of synthetic polypeptides increase as a function of the number of tandem repeats. This result suggests that the repetitions in native squid proteins could have a genetic advantage for increased toughness and flexibility. PMID:27222581
Preliminary Genomic Characterization of Ten Hardwood Tree Species from Multiplexed Low Coverage Whole Genome Sequencing

PubMed Central

Staton, Margaret; Best, Teodora; Khodwekar, Sudhir; Owusu, Sandra; Xu, Tao; Xu, Yi; Jennings, Tara; Cronn, Richard; Arumuganathan, A. Kathiravetpilla; Coggeshall, Mark; Gailing, Oliver; Liang, Haiying; Romero-Severson, Jeanne; Schlarbaum, Scott; Carlson, John E.

2015-01-01

Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low coverage whole genome sequencing from multiplex Illumina paired ends to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence. PMID:26698853
The complete chloroplast genome sequence of Hibiscus syriacus.

PubMed

Kwon, Hae-Yun; Kim, Joon-Hyeok; Kim, Sea-Hyun; Park, Ji-Min; Lee, Hyoshin

2016-09-01

The complete chloroplast genome sequence of Hibiscus syriacus L. is presented in this study. The genome is composed of 161 019 bp in length, with a typical circular structure containing a pair of inverted repeats of 25 745 bp of length separated by a large single-copy region and a small single-copy region of 89 698 bp and 19 831 bp of length, respectively. The overall GC content is 36.8%. One hundred and fourteen genes were annotated, including 81 protein-coding genes, 4 ribosomal RNA genes and 29 transfer RNA genes.
Evolutional dynamics of 45S and 5S ribosomal DNA in ancient allohexaploid Atropa belladonna.

PubMed

Volkov, Roman A; Panchuk, Irina I; Borisjuk, Nikolai V; Hosiawa-Baranska, Marta; Maluszynska, Jolanta; Hemleben, Vera

2017-01-23

Polyploid hybrids represent a rich natural resource to study molecular evolution of plant genes and genomes. Here, we applied a combination of karyological and molecular methods to investigate chromosomal structure, molecular organization and evolution of ribosomal DNA (rDNA) in nightshade, Atropa belladonna (fam. Solanaceae), one of the oldest known allohexaploids among flowering plants. Because of their abundance and specific molecular organization (evolutionarily conserved coding regions linked to variable intergenic spacers, IGS), 45S and 5S rDNA are widely used in plant taxonomic and evolutionary studies. Molecular cloning and nucleotide sequencing of A. belladonna 45S rDNA repeats revealed a general structure characteristic of other Solanaceae species, and a very high sequence similarity of two length variants, with the only difference in number of short IGS subrepeats. These results combined with the detection of three pairs of 45S rDNA loci on separate chromosomes, presumably inherited from both tetraploid and diploid ancestor species, example intensive sequence homogenization that led to substitution/elimination of rDNA repeats of one parent. Chromosome silver-staining revealed that only four out of six 45S rDNA sites are frequently transcriptionally active, demonstrating nucleolar dominance. For 5S rDNA, three size variants of repeats were detected, with the major class represented by repeats containing all functional IGS elements required for transcription, the intermediate size repeats containing partially deleted IGS sequences, and the short 5S repeats containing severe defects both in the IGS and coding sequences. While shorter variants demonstrate increased rate of based substitution, probably in their transition into pseudogenes, the functional 5S rDNA variants are nearly identical at the sequence level, pointing to their origin from a single parental species. Localization of the 5S rDNA genes on two chromosome pairs further supports uniparental inheritance from the tetraploid progenitor. The obtained molecular, cytogenetic and phylogenetic data demonstrate complex evolutionary dynamics of rDNA loci in allohexaploid species of Atropa belladonna. The high level of sequence unification revealed in 45S and 5S rDNA loci of this ancient hybrid species have been seemingly achieved by different molecular mechanisms.
Genetic diversity of Babesia bovis in virulent and attenuated strains.

PubMed

Mazuz, M L; Molad, T; Fish, L; Leibovitz, B; Wolkomirsky, R; Fleiderovitz, L; Shkap, V

2012-03-01

The aim of this study was to compare the genetic diversity of the single copy Bv80 gene sequences of Babesia bovis in populations of attenuated and virulent parasites. PCR/ RT-PCR followed by cloning and sequence analyses of 4 attenuated and 4 virulent strains were performed. Multiple fragments in the range of 420 to 744 bp were amplified by PCR or RT-PCR. Cloning of the PCR fragments and sequence analyses revealed the presence of mixed subpopulations in either virulent or attenuated parasites with a total of 19 variants with 12 different sequences that differed in number and type of tandem repeats. High levels of intra- and inter-strain diversity of the Bv80 gene, with the presence of mixed populations of parasites were found in both the virulent field isolates and the attenuated vaccine strains. In addition, during the attenuation process, sequence analyses showed changes in the pattern of the parasite subpopulations. Despite high polymorphism found by sequence analyses, the patterns observed and the number of repeats, order, or motifs found could not discriminate between virulent field isolates and attenuated vaccine strains of the parasite.
Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

PubMed

Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas

2013-06-01

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
Chloroplast genomes of Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea: Structures and comparative analysis.

PubMed

Asaf, Sajjad; Khan, Abdul Latif; Khan, Muhammad Aaqil; Waqas, Muhammad; Kang, Sang-Mo; Yun, Byung-Wook; Lee, In-Jung

2017-08-08

We investigated the complete chloroplast (cp) genomes of non-model Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea using Illumina paired-end sequencing to understand their genetic organization and structure. Detailed bioinformatics analysis revealed genome sizes of both subspecies ranging between 154.4~154.5 kbp, with a large single-copy region (84,197~84,158 bp), a small single-copy region (17,738~17,813 bp) and pair of inverted repeats (IRa/IRb; 26,264~26,259 bp). Both cp genomes encode 130 genes, including 85 protein-coding genes, eight ribosomal RNA genes and 37 transfer RNA genes. Whole cp genome comparison of A. halleri ssp. gemmifera and A. lyrata ssp. petraea, along with ten other Arabidopsis species, showed an overall high degree of sequence similarity, with divergence among some intergenic spacers. The location and distribution of repeat sequences were determined, and sequence divergences of shared genes were calculated among related species. Comparative phylogenetic analysis of the entire genomic data set and 70 shared genes between both cp genomes confirmed the previous phylogeny and generated phylogenetic trees with the same topologies. The sister species of A. halleri ssp. gemmifera is A. umezawana, whereas the closest relative of A. lyrata spp. petraea is A. arenicola.
The complete mitochondrial genome sequence of Malus hupehensis var. pinyiensis.

PubMed

Duan, Naibin; Sun, Honghe; Wang, Nan; Fei, Zhangjun; Chen, Xuesen

2016-07-01

The complete mitochondrial genome sequence of Malus hupehensis var. pinyiensis, a widely used apple rootstock, was determined using the Illumina high-throughput sequencing approach. The genome is 422,555 bp in length and has a GC content of 45.21%. It is separated by a pair of inverted repeats of 32,504 bp, to form a large single copy region of 213,055 bp and a small single copy region of 144,492 bp. The genome contains 38 protein-coding genes, four pseudogenes, 25 tRNA genes, and three rRNA genes. The genome is 25,608 bp longer than that of M. domestica, and several structural variations between these two mitogenomes were detected.
Duplication in DNA Sequences

NASA Astrophysics Data System (ADS)

Ito, Masami; Kari, Lila; Kincaid, Zachary; Seki, Shinnosuke

The duplication and repeat-deletion operations are the basis of a formal language theoretic model of errors that can occur during DNA replication. During DNA replication, subsequences of a strand of DNA may be copied several times (resulting in duplications) or skipped (resulting in repeat-deletions). As formal language operations, iterated duplication and repeat-deletion of words and languages have been well studied in the literature. However, little is known about single-step duplications and repeat-deletions. In this paper, we investigate several properties of these operations, including closure properties of language families in the Chomsky hierarchy and equations involving these operations. We also make progress toward a characterization of regular languages that are generated by duplicating a regular language.
Bioinformatic mining of EST-SSR loci in the Pacific oyster, Crassostrea gigas.

PubMed

Wang, Y; Ren, R; Yu, Z

2008-06-01

A set of expressed sequence tag-simple sequence repeat (EST-SSR) markers of the Pacific oyster, Crassostrea gigas, was developed through bioinformatic mining of the GenBank public database. As of June 30, 2007, a total of 5132 EST sequences from GenBank were downloaded and screened for di-, tri- and tetra-nucleotide repeats, with criteria set at a minimum of 5, 4 and 4 repeats for the three categories of SSRs respectively. Seventeen polymorphic microsatellite markers were characterized. Allele numbers ranged from 3 to 10, and the observed and expected heterozygosity values varied from 0.125 to 0.770 and from 0.113 to 0.732 respectively. Eleven loci were at Hardy-Weinberg equilibrium (HWE); the other six loci showed significant departure from HWE (P < 0.01), suggesting possible presence of null alleles. Pairwise check of linkage disequilibrium (LD) indicated that 11 of 136 pairs of loci showed significant LD (P < 0.01), likely due to HWE present in single markers. Cross-species amplification was examined for five other Crassostrea species and reasonable results were obtained, promising usefulness of these markers in oyster genetics.
Genotyping and Molecular Identification of Date Palm Cultivars Using Inter-Simple Sequence Repeat (ISSR) Markers.

PubMed

Ayesh, Basim M

2017-01-01

Molecular markers are credible for the discrimination of genotypes and estimation of the extent of genetic diversity and relatedness in a set of genotypes. Inter-simple sequence repeat (ISSR) markers rapidly reveal high polymorphic fingerprints and have been used frequently to determine the genetic diversity among date palm cultivars. This chapter describes the application of ISSR markers for genotyping of date palm cultivars. The application involves extraction of genomic DNA from the target cultivars with reliable quality and quantity. Subsequently the extracted DNA serves as a template for amplification of genomic regions flanked by inverted simple sequence repeats using a single primer. The similarity of each pair of samples is measured by calculating the number of mono- and polymorphic bands revealed by gel electrophoresis. Matrices constructed for similarity and genetic distance are used to build a phylogenetic tree and cluster analysis, to determine the molecular relatedness of cultivars. The protocol describes 3 out of 9 tested primers consistently amplified 31 loci in 6 date palm cultivars, with 28 polymorphic loci.
Origin-Dependent Inverted-Repeat Amplification: Tests of a Model for Inverted DNA Amplification

PubMed Central

Brewer, Bonita J.; Payen, Celia; Di Rienzi, Sara C.; Higgins, Megan M.; Ong, Giang; Dunham, Maitreya J.; Raghuraman, M. K.

2015-01-01

DNA replication errors are a major driver of evolution—from single nucleotide polymorphisms to large-scale copy number variations (CNVs). Here we test a specific replication-based model to explain the generation of interstitial, inverted triplications. While no genetic information is lost, the novel inversion junctions and increased copy number of the included sequences create the potential for adaptive phenotypes. The model—Origin-Dependent Inverted-Repeat Amplification (ODIRA)—proposes that a replication error at pre-existing short, interrupted, inverted repeats in genomic sequences generates an extrachromosomal, inverted dimeric, autonomously replicating intermediate; subsequent genomic integration of the dimer yields this class of CNV without loss of distal chromosomal sequences. We used a combination of in vitro and in vivo approaches to test the feasibility of the proposed replication error and its downstream consequences on chromosome structure in the yeast Saccharomyces cerevisiae. We show that the proposed replication error—the ligation of leading and lagging nascent strands to create “closed” forks—can occur in vitro at short, interrupted inverted repeats. The removal of molecules with two closed forks results in a hairpin-capped linear duplex that we show replicates in vivo to create an inverted, dimeric plasmid that subsequently integrates into the genome by homologous recombination, creating an inverted triplication. While other models have been proposed to explain inverted triplications and their derivatives, our model can also explain the generation of human, de novo, inverted amplicons that have a 2:1 mixture of sequences from both homologues of a single parent—a feature readily explained by a plasmid intermediate that arises from one homologue and integrates into the other homologue prior to meiosis. Our tests of key features of ODIRA lend support to this mechanism and suggest further avenues of enquiry to unravel the origins of interstitial, inverted CNVs pivotal in human health and evolution. PMID:26700858
Origin-Dependent Inverted-Repeat Amplification: Tests of a Model for Inverted DNA Amplification.

PubMed

Brewer, Bonita J; Payen, Celia; Di Rienzi, Sara C; Higgins, Megan M; Ong, Giang; Dunham, Maitreya J; Raghuraman, M K

2015-12-01

DNA replication errors are a major driver of evolution--from single nucleotide polymorphisms to large-scale copy number variations (CNVs). Here we test a specific replication-based model to explain the generation of interstitial, inverted triplications. While no genetic information is lost, the novel inversion junctions and increased copy number of the included sequences create the potential for adaptive phenotypes. The model--Origin-Dependent Inverted-Repeat Amplification (ODIRA)-proposes that a replication error at pre-existing short, interrupted, inverted repeats in genomic sequences generates an extrachromosomal, inverted dimeric, autonomously replicating intermediate; subsequent genomic integration of the dimer yields this class of CNV without loss of distal chromosomal sequences. We used a combination of in vitro and in vivo approaches to test the feasibility of the proposed replication error and its downstream consequences on chromosome structure in the yeast Saccharomyces cerevisiae. We show that the proposed replication error-the ligation of leading and lagging nascent strands to create "closed" forks-can occur in vitro at short, interrupted inverted repeats. The removal of molecules with two closed forks results in a hairpin-capped linear duplex that we show replicates in vivo to create an inverted, dimeric plasmid that subsequently integrates into the genome by homologous recombination, creating an inverted triplication. While other models have been proposed to explain inverted triplications and their derivatives, our model can also explain the generation of human, de novo, inverted amplicons that have a 2:1 mixture of sequences from both homologues of a single parent--a feature readily explained by a plasmid intermediate that arises from one homologue and integrates into the other homologue prior to meiosis. Our tests of key features of ODIRA lend support to this mechanism and suggest further avenues of enquiry to unravel the origins of interstitial, inverted CNVs pivotal in human health and evolution.
Efficient generation of complete sequences of MDR-encoding plasmids by rapid assembly of MinION barcoding sequencing data.

PubMed

Li, Ruichao; Xie, Miaomiao; Dong, Ning; Lin, Dachuan; Yang, Xuemei; Wong, Marcus Ho Yin; Chan, Edward Wai-Chi; Chen, Sheng

2018-03-01

Multidrug resistance (MDR)-encoding plasmids are considered major molecular vehicles responsible for transmission of antibiotic resistance genes among bacteria of the same or different species. Delineating the complete sequences of such plasmids could provide valuable insight into the evolution and transmission mechanisms underlying bacterial antibiotic resistance development. However, due to the presence of multiple repeats of mobile elements, complete sequencing of MDR plasmids remains technically complicated, expensive, and time-consuming. Here, we demonstrate a rapid and efficient approach to obtaining multiple MDR plasmid sequences through the use of the MinION nanopore sequencing platform, which is incorporated in a portable device. By assembling the long sequencing reads generated by a single MinION run according to a rapid barcoding sequencing protocol, we obtained the complete sequences of 20 plasmids harbored by multiple bacterial strains. Importantly, single long reads covering a plasmid end-to-end were recorded, indicating that de novo assembly may be unnecessary if the single reads exhibit high accuracy. This workflow represents a convenient and cost-effective approach for systematic assessment of MDR plasmids responsible for treatment failure of bacterial infections, offering the opportunity to perform detailed molecular epidemiological studies to probe the evolutionary and transmission mechanisms of MDR-encoding elements.

PolyQ repeat expansions in ATXN2 associated with ALS are CAA interrupted repeats.

PubMed

Yu, Zhenming; Zhu, Yongqing; Chen-Plotkin, Alice S; Clay-Falcone, Dana; McCluskey, Leo; Elman, Lauren; Kalb, Robert G; Trojanowski, John Q; Lee, Virginia M-Y; Van Deerlin, Vivianna M; Gitler, Aaron D; Bonini, Nancy M

2011-03-29

Amyotrophic lateral sclerosis (ALS) is a devastating, rapidly progressive disease leading to paralysis and death. Recently, intermediate length polyglutamine (polyQ) repeats of 27-33 in ATAXIN-2 (ATXN2), encoding the ATXN2 protein, were found to increase risk for ALS. In ATXN2, polyQ expansions of ≥ 34, which are pure CAG repeat expansions, cause spinocerebellar ataxia type 2. However, similar length expansions that are interrupted with other codons, can present atypically with parkinsonism, suggesting that configuration of the repeat sequence plays an important role in disease manifestation in ATXN2 polyQ expansion diseases. Here we determined whether the expansions in ATXN2 associated with ALS were pure or interrupted CAG repeats, and defined single nucleotide polymorphisms (SNPs) rs695871 and rs695872 in exon 1 of the gene, to assess haplotype association. We found that the expanded repeat alleles of 40 ALS patients and 9 long-repeat length controls were all interrupted, bearing 1-3 CAA codons within the CAG repeat. 21/21 expanded ALS chromosomes with 3CAA interruptions arose from one haplotype (GT), while 18/19 expanded ALS chromosomes with <3CAA interruptions arose from a different haplotype (CC). Moreover, age of disease onset was significantly earlier in patients bearing 3 interruptions vs fewer, and was distinct between haplotypes. These results indicate that CAG repeat expansions in ATXN2 associated with ALS are uniformly interrupted repeats and that the nature of the repeat sequence and haplotype, as well as length of polyQ repeat, may play a role in the neurological effect conferred by expansions in ATXN2.
Mutations that alter a repeated ACCA element located at the 5' end of the Potato virus X genome affect RNA accumulation.

PubMed

Park, Mi-Ri; Kwon, Sun-Jung; Choi, Hong-Soo; Hemenway, Cynthia L; Kim, Kook-Hyung

2008-08-15

The repeated ACCA or AC-rich sequence and structural (SL1) elements in the 5' non-translated region (NTR) of the Potato virus X (PVX) RNA play vital roles in the PVX life cycle by controlling translation, RNA replication, movement, and assembly. It has already been shown that the repeated ACCA or AC-rich sequence affect both gRNA and sgRNA accumulation, while not affecting minus-strand RNA accumulation, and are also required for host protein binding. The functional significance of the repeated ACCA sequence elements in the 5' NTR region was investigated by analyzing the effects of deletion and site-directed mutations on PVX replication in Nicotiana benthamiana plants and NT1 protoplasts. Substitution (ACCA into AAAA or UUUU) mutations introduced in the first (nt 10-13) element in the 5' NTR of the PVX RNA significantly affected viral replication, while mutations introduced in the second (nt 17-20) and third (nt 20-23) elements did not. The fourth (nt 29-32) ACCA element weakly affected virus replication, whereas mutations in the fifth (nt 38-41) significantly reduced virus replication due to the structure disruption of SL1 by AAAA and/or UUUU substitutions. Further characterization of the first ACCA element indicated that duplication of ACCA at nt 10-13 (nt 10-17, ACCAACCA) caused severe symptom development as compared to that of wild type, while deletion of the single element (nt 10-13), DeltaACCA) or tripling of this element caused reduced symptom development. Single- and double-nucleotide substitutions introduced into the first ACCA element revealed the importance of CC located at nt positions 11 and 12. Altogether, these results indicate that the first ACCA element is important for PVX replication.
Repeated-Sprint Sequences During Female Soccer Matches Using Fixed and Individual Speed Thresholds.

PubMed

Nakamura, Fábio Y; Pereira, Lucas A; Loturco, Irineu; Rosseti, Marcelo; Moura, Felipe A; Bradley, Paul S

2017-07-01

Nakamura, FY, Pereira, LA, Loturco, I, Rosseti, M, Moura, FA, and Bradley, PS. Repeated-sprint sequences during female soccer matches using fixed and individual speed thresholds. J Strength Cond Res 31(7): 1802-1810, 2017-The main objective of this study was to characterize the occurrence of single sprint and repeated-sprint sequences (RSS) during elite female soccer matches, using fixed (20 km·h) and individually based speed thresholds (>90% of the mean speed from a 20-m sprint test). Eleven elite female soccer players from the same team participated in the study. All players performed a 20-m linear sprint test, and were assessed in up to 10 official matches using Global Positioning System technology. Magnitude-based inferences were used to test for meaningful differences. Results revealed that irrespective of adopting fixed or individual speed thresholds, female players produced only a few RSS during matches (2.3 ± 2.4 sequences using the fixed threshold and 3.3 ± 3.0 sequences using the individually based threshold), with most sequences composing of just 2 sprints. Additionally, central defenders performed fewer sprints (10.2 ± 4.1) than other positions (fullbacks: 28.1 ± 5.5; midfielders: 21.9 ± 10.5; forwards: 31.9 ± 11.1; with the differences being likely to almost certainly associated with effect sizes ranging from 1.65 to 2.72), and sprinting ability declined in the second half. The data do not support the notion that RSS occurs frequently during soccer matches in female players, irrespective of using fixed or individual speed thresholds to define sprint occurrence. However, repeated-sprint ability development cannot be ruled out from soccer training programs because of its association with match-related performance.
Detection of possible restriction sites for type II restriction enzymes in DNA sequences.

PubMed

Gagniuc, P; Cimponeriu, D; Ionescu-Tîrgovişte, C; Mihai, Andrada; Stavarachi, Monica; Mihai, T; Gavrilă, L

2011-01-01

In order to make a step forward in the knowledge of the mechanism operating in complex polygenic disorders such as diabetes and obesity, this paper proposes a new algorithm (PRSD -possible restriction site detection) and its implementation in Applied Genetics software. This software can be used for in silico detection of potential (hidden) recognition sites for endonucleases and for nucleotide repeats identification. The recognition sites for endonucleases may result from hidden sequences through deletion or insertion of a specific number of nucleotides. Tests were conducted on DNA sequences downloaded from NCBI servers using specific recognition sites for common type II restriction enzymes introduced in the software database (n = 126). Each possible recognition site indicated by the PRSD algorithm implemented in Applied Genetics was checked and confirmed by NEBcutter V2.0 and Webcutter 2.0 software. In the sequence NG_008724.1 (which includes 63632 nucleotides) we found a high number of potential restriction sites for ECO R1 that may be produced by deletion (n = 43 sites) or insertion (n = 591 sites) of one nucleotide. The second module of Applied Genetics has been designed to find simple repeats sizes with a real future in understanding the role of SNPs (Single Nucleotide Polymorphisms) in the pathogenesis of the complex metabolic disorders. We have tested the presence of simple repetitive sequences in five DNA sequence. The software indicated exact position of each repeats detected in the tested sequences. Future development of Applied Genetics can provide an alternative for powerful tools used to search for restriction sites or repetitive sequences or to improve genotyping methods.
The Complete Chloroplast Genome of Banana (Musa acuminata, Zingiberales): Insight into Plastid Monocotyledon Evolution

PubMed Central

Martin, Guillaume; Baurens, Franc-Christophe; Cardi, Céline; Aury, Jean-Marc; D’Hont, Angélique

2013-01-01

Background Banana (genus Musa) is a crop of major economic importance worldwide. It is a monocotyledonous member of the Zingiberales, a sister group of the widely studied Poales. Most cultivated bananas are natural Musa inter-(sub-)specific triploid hybrids. A Musa acuminata reference nuclear genome sequence was recently produced based on sequencing of genomic DNA enriched in nucleus. Methodology/Principal Findings The Musa acuminata chloroplast genome was assembled with chloroplast reads extracted from whole-genome-shotgun sequence data. The Musa chloroplast genome is a circular molecule of 169,972 bp with a quadripartite structure containing two single copy regions, a Large Single Copy region (LSC, 88,338 bp) and a Small Single Copy region (SSC, 10,768 bp) separated by Inverted Repeat regions (IRs, 35,433 bp). Two forms of the chloroplast genome relative to the orientation of SSC versus LSC were found. The Musa chloroplast genome shows an extreme IR expansion at the IR/SSC boundary relative to the most common structures found in angiosperms. This expansion consists of the integration of three additional complete genes (rps15, ndhH and ycf1) and part of the ndhA gene. No such expansion has been observed in monocots so far. Simple Sequence Repeats were identified in the Musa chloroplast genome and a new set of Musa chloroplastic markers was designed. Conclusion The complete sequence of M. acuminata ssp malaccensis chloroplast we reported here is the first one for the Zingiberales order. As such it provides new insight in the evolution of the chloroplast of monocotyledons. In particular, it reinforces that IR/SSC expansion has occurred independently several times within monocotyledons. The discovery of new polymorphic markers within Musa chloroplast opens new perspectives to better understand the origin of cultivated triploid bananas. PMID:23840670
The complete chloroplast genome of banana (Musa acuminata, Zingiberales): insight into plastid monocotyledon evolution.

PubMed

Martin, Guillaume; Baurens, Franc-Christophe; Cardi, Céline; Aury, Jean-Marc; D'Hont, Angélique

2013-01-01

Banana (genus Musa) is a crop of major economic importance worldwide. It is a monocotyledonous member of the Zingiberales, a sister group of the widely studied Poales. Most cultivated bananas are natural Musa inter-(sub-)specific triploid hybrids. A Musa acuminata reference nuclear genome sequence was recently produced based on sequencing of genomic DNA enriched in nucleus. The Musa acuminata chloroplast genome was assembled with chloroplast reads extracted from whole-genome-shotgun sequence data. The Musa chloroplast genome is a circular molecule of 169,972 bp with a quadripartite structure containing two single copy regions, a Large Single Copy region (LSC, 88,338 bp) and a Small Single Copy region (SSC, 10,768 bp) separated by Inverted Repeat regions (IRs, 35,433 bp). Two forms of the chloroplast genome relative to the orientation of SSC versus LSC were found. The Musa chloroplast genome shows an extreme IR expansion at the IR/SSC boundary relative to the most common structures found in angiosperms. This expansion consists of the integration of three additional complete genes (rps15, ndhH and ycf1) and part of the ndhA gene. No such expansion has been observed in monocots so far. Simple Sequence Repeats were identified in the Musa chloroplast genome and a new set of Musa chloroplastic markers was designed. The complete sequence of M. acuminata ssp malaccensis chloroplast we reported here is the first one for the Zingiberales order. As such it provides new insight in the evolution of the chloroplast of monocotyledons. In particular, it reinforces that IR/SSC expansion has occurred independently several times within monocotyledons. The discovery of new polymorphic markers within Musa chloroplast opens new perspectives to better understand the origin of cultivated triploid bananas.
Sequence repeats and protein structure

NASA Astrophysics Data System (ADS)

Hoang, Trinh X.; Trovato, Antonio; Seno, Flavio; Banavar, Jayanth R.; Maritan, Amos

2012-11-01

Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.
Repeated detection of European bat lyssavirus type 2 in dead bats found at a single roost site in the UK.

PubMed

Banyard, Ashley C; Johnson, N; Voller, K; Hicks, D; Nunez, A; Hartley, M; Fooks, A R

2009-01-01

In August 2007, European bat lyssavirus type 2 (EBLV-2) was isolated from a Daubenton's bat found at Stokesay Castle. In September 2008, another bat from the same vicinity of Stokesay Castle also tested positive for EBLV-2. This is the first occurrence of repeated detection of EBLV-2 from a single site. Here, we report the detection of low levels of viral RNA in various bat organs by qRT-PCR and detection of viral antigen by immunohistochemistry. We also report sequence data from both cases and compare data with those derived from other EBLV-2 isolations in the UK.
Exploring the loblolly pine (Pinus taeda L.) genome by BAC sequencing and Cot analysis.

PubMed

Perera, Dinum; Magbanua, Zenaida V; Thummasuwan, Supaphan; Mukherjee, Dipaloke; Arick, Mark; Chouvarine, Philippe; Nairn, Campbell J; Schmutz, Jeremy; Grimwood, Jane; Dean, Jeffrey F D; Peterson, Daniel G

2018-07-15

Loblolly pine (LP; Pinus taeda L.) is an economically and ecologically important tree in the southeastern U.S. To advance understanding of the loblolly pine (LP; Pinus taeda L.) genome, we sequenced and analyzed 100 BAC clones and performed a Cot analysis. The Cot analysis indicates that the genome is composed of 57, 24, and 10% highly-repetitive, moderately-repetitive, and single/low-copy sequences, respectively (the remaining 9% of the genome is a combination of fold back and damaged DNA). Although single/low-copy DNA only accounts for 10% of the LP genome, the amount of single/low-copy DNA in LP is still 14 times the size of the Arabidopsis genome. Since gene numbers in LP are similar to those in Arabidopsis, much of the single/low-copy DNA of LP would appear to be composed of DNA that is both gene- and repeat-poor. Macroarrays prepared from a LP bacterial artificial chromosome (BAC) library were hybridized with probes designed from cell wall synthesis/wood development cDNAs, and 50 of the "targeted" clones were selected for further analysis. An additional 25 clones were selected because they contained few repeats, while 25 more clones were selected at random. The 100 BAC clones were Sanger sequenced and assembled. Of the targeted BACs, 80% contained all or part of the cDNA used to target them. One targeted BAC was found to contain fungal DNA and was eliminated from further analysis. Combinations of similarity-based and ab initio gene prediction approaches were utilized to identify and characterize potential coding regions in the 99 BACs containing LP DNA. From this analysis, we identified 154 gene models (GMs) representing both putative protein-coding genes and likely pseudogenes. Ten of the GMs (all of which were specifically targeted) had enough support to be classified as intact genes. Interestingly, the 154 GMs had statistically indistinguishable (α = 0.05) distributions in the targeted and random BAC clones (15.18 and 12.61 GM/Mb, respectively), whereas the low-repeat BACs contained significantly fewer GMs (7.08 GM/Mb). However, when GM length was considered, the targeted BACs had a significantly greater percentage of their length in GMs (3.26%) when compared to random (1.63%) and low-repeat (0.62%) BACs. The results of our study provide insight into LP evolution and inform ongoing efforts to produce a reference genome sequence for LP, while characterization of genes involved in cell wall production highlights carbon metabolism pathways that can be leveraged for increasing wood production. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Contrasting Patterns of rDNA Homogenization within the Zygosaccharomyces rouxii Species Complex

PubMed Central

Chand Dakal, Tikam; Giudici, Paolo; Solieri, Lisa

2016-01-01

Arrays of repetitive ribosomal DNA (rDNA) sequences are generally expected to evolve as a coherent family, where repeats within such a family are more similar to each other than to orthologs in related species. The continuous homogenization of repeats within individual genomes is a recombination process termed concerted evolution. Here, we investigated the extent and the direction of concerted evolution in 43 yeast strains of the Zygosaccharomyces rouxii species complex (Z. rouxii, Z. sapae, Z. mellis), by analyzing two portions of the 35S rDNA cistron, namely the D1/D2 domains at the 5’ end of the 26S rRNA gene and the segment including the internal transcribed spacers (ITS) 1 and 2 (ITS regions). We demonstrate that intra-genomic rDNA sequence variation is unusually frequent in this clade and that rDNA arrays in single genomes consist of an intermixing of Z. rouxii, Z. sapae and Z. mellis-like sequences, putatively evolved by reticulate evolutionary events that involved repeated hybridization between lineages. The levels and distribution of sequence polymorphisms vary across rDNA repeats in different individuals, reflecting four patterns of rDNA evolution: I) rDNA repeats that are homogeneous within a genome but are chimeras derived from two parental lineages via recombination: Z. rouxii in the ITS region and Z. sapae in the D1/D2 region; II) intra-genomic rDNA repeats that retain polymorphisms only in ITS regions; III) rDNA repeats that vary only in their D1/D2 domains; IV) heterogeneous rDNA arrays that have both polymorphic ITS and D1/D2 regions. We argue that an ongoing process of homogenization following allodiplodization or incomplete lineage sorting gave rise to divergent evolutionary trajectories in different strains, depending upon temporal, structural and functional constraints. We discuss the consequences of these findings for Zygosaccharomyces species delineation and, more in general, for yeast barcoding. PMID:27501051
The complete chloroplast genome sequence of Dianthus superbus var. longicalycinus.

PubMed

Gurusamy, Raman; Lee, Do-Hyung; Park, SeonJoo

2016-05-01

The complete chloroplast genome (cpDNA) sequence of Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicine was reported and characterized. The cpDNA of Dianthus superbus var. longicalycinus is 149,539 bp, with 36.3% GC content. A pair of inverted repeats (IRs) of 24,803 bp is separated by a large single-copy region (LSC, 82,805 bp) and a small single-copy region (SSC, 17,128 bp). It encodes 85 protein-coding genes, 36 tRNA genes and 8 rRNA genes. Of 129 individual genes, 13 genes encoded one intron and three genes have two introns.
The complete chloroplast genome sequence of Dendrobium nobile.

PubMed

Yan, Wenjin; Niu, Zhitao; Zhu, Shuying; Ye, Meirong; Ding, Xiaoyu

2016-11-01

The complete chloroplast (cp) genome sequence of Dendrobium nobile, an endangered and traditional Chinese medicine with important economic value, is presented in this article. The total genome size is 150,793 bp, containing a large single copy (LSC) region (84,939 bp) and a small single copy region (SSC) (13,310 bp) which were separated by two inverted repeat (IRs) regions (26,272 bp). The overall GC contents of the plastid genome were 38.8%. In total, 130 unique genes were annotated and they were consisted of 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. Fourteen genes contained one or two introns.
The structural basis of actinomycin D–binding induces nucleotide flipping out, a sharp bend and a left-handed twist in CGG triplet repeats

PubMed Central

Lo, Yu-Sheng; Tseng, Wen-Hsuan; Chuang, Chien-Ying; Hou, Ming-Hon

2013-01-01

The potent anticancer drug actinomycin D (ActD) functions by intercalating into DNA at GpC sites, thereby interrupting essential biological processes including replication and transcription. Certain neurological diseases are correlated with the expansion of (CGG)n trinucleotide sequences, which contain many contiguous GpC sites separated by a single G:G mispair. To characterize the binding of ActD to CGG triplet repeat sequences, the structural basis for the strong binding of ActD to neighbouring GpC sites flanking a G:G mismatch has been determined based on the crystal structure of ActD bound to ATGCGGCAT, which contains a CGG triplet sequence. The binding of ActD molecules to GCGGC causes many unexpected conformational changes including nucleotide flipping out, a sharp bend and a left-handed twist in the DNA helix via a two site-binding model. Heat denaturation, circular dichroism and surface plasmon resonance analyses showed that adjacent GpC sequences flanking a G:G mismatch are preferred ActD-binding sites. In addition, ActD was shown to bind the hairpin conformation of (CGG)16 in a pairwise combination and with greater stability than that of other DNA intercalators. Our results provide evidence of a possible biological consequence of ActD binding to CGG triplet repeat sequences. PMID:23408860
Gibbs motif sampling: detection of bacterial outer membrane protein repeats.

PubMed Central

Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.

1995-01-01

The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
The Complete Chloroplast Genome Sequences of Five Epimedium Species: Lights into Phylogenetic and Taxonomic Analyses

PubMed Central

Zhang, Yanjun; Du, Liuwen; Liu, Ao; Chen, Jianjun; Wu, Li; Hu, Weiming; Zhang, Wei; Kim, Kyunghee; Lee, Sang-Choon; Yang, Tae-Jin; Wang, Ying

2016-01-01

Epimedium L. is a phylogenetically and economically important genus in the family Berberidaceae. We here sequenced the complete chloroplast (cp) genomes of four Epimedium species using Illumina sequencing technology via a combination of de novo and reference-guided assembly, which was also the first comprehensive cp genome analysis on Epimedium combining the cp genome sequence of E. koreanum previously reported. The five Epimedium cp genomes exhibited typical quadripartite and circular structure that was rather conserved in genomic structure and the synteny of gene order. However, these cp genomes presented obvious variations at the boundaries of the four regions because of the expansion and contraction of the inverted repeat (IR) region and the single-copy (SC) boundary regions. The trnQ-UUG duplication occurred in the five Epimedium cp genomes, which was not found in the other basal eudicotyledons. The rapidly evolving cp genome regions were detected among the five cp genomes, as well as the difference of simple sequence repeats (SSR) and repeat sequence were identified. Phylogenetic relationships among the five Epimedium species based on their cp genomes showed accordance with the updated system of the genus on the whole, but reminded that the evolutionary relationships and the divisions of the genus need further investigation applying more evidences. The availability of these cp genomes provided valuable genetic information for accurately identifying species, taxonomy and phylogenetic resolution and evolution of Epimedium, and assist in exploration and utilization of Epimedium plants. PMID:27014326
Translocation and deletion breakpoints in cancer genomes are associated with potential non-B DNA-forming sequences.

PubMed

Bacolla, Albino; Tainer, John A; Vasquez, Karen M; Cooper, David N

2016-07-08

Gross chromosomal rearrangements (including translocations, deletions, insertions and duplications) are a hallmark of cancer genomes and often create oncogenic fusion genes. An obligate step in the generation of such gross rearrangements is the formation of DNA double-strand breaks (DSBs). Since the genomic distribution of rearrangement breakpoints is non-random, intrinsic cellular factors may predispose certain genomic regions to breakage. Notably, certain DNA sequences with the potential to fold into secondary structures [potential non-B DNA structures (PONDS); e.g. triplexes, quadruplexes, hairpin/cruciforms, Z-DNA and single-stranded looped-out structures with implications in DNA replication and transcription] can stimulate the formation of DNA DSBs. Here, we tested the postulate that these DNA sequences might be found at, or in close proximity to, rearrangement breakpoints. By analyzing the distribution of PONDS-forming sequences within ±500 bases of 19 947 translocation and 46 365 sequence-characterized deletion breakpoints in cancer genomes, we find significant association between PONDS-forming repeats and cancer breakpoints. Specifically, (AT)n, (GAA)n and (GAAA)n constitute the most frequent repeats at translocation breakpoints, whereas A-tracts occur preferentially at deletion breakpoints. Translocation breakpoints near PONDS-forming repeats also recur in different individuals and patient tumor samples. Hence, PONDS-forming sequences represent an intrinsic risk factor for genomic rearrangements in cancer genomes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Gene Conversion Violates the Stepwise Mutation Model for Microsatellites in Y-Chromosomal Palindromic Repeats

PubMed Central

Balaresque, Patricia; King, Turi E; Parkin, Emma J; Heyer, Evelyne; Carvalho-Silva, Denise; Kraaijenbrink, Thirsa; de Knijff, Peter; Tyler-Smith, Chris; Jobling, Mark A

2014-01-01

The male-specific region of the human Y chromosome (MSY) contains eight large inverted repeats (palindromes), in which high-sequence similarity between repeat arms is maintained by gene conversion. These palindromes also harbor microsatellites, considered to evolve via a stepwise mutation model (SMM). Here, we ask whether gene conversion between palindrome microsatellites contributes to their mutational dynamics. First, we study the duplicated tetranucleotide microsatellite DYS385a,b lying in palindrome P4. We show, by comparing observed data with simulated data under a SMM within haplogroups, that observed heteroallelic combinations in which the modal repeat number difference between copies was large, can give rise to homoallelic combinations with zero-repeats difference, equivalent to many single-step mutations. These are unlikely to be generated under a strict SMM, suggesting the action of gene conversion. Second, we show that the intercopy repeat number difference for a large set of duplicated microsatellites in all palindromes in the MSY reference sequence is significantly reduced compared with that for nonpalindrome-duplicated microsatellites, suggesting that the former are characterized by unusual evolutionary dynamics. These observations indicate that gene conversion violates the SMM for microsatellites in palindromes, homogenizing copies within individual Y chromosomes, but increasing overall haplotype diversity among chromosomes within related groups. PMID:24610746
Fragile DNA Motifs Trigger Mutagenesis at Distant Chromosomal Loci in Saccharomyces cerevisiae

PubMed Central

Saini, Natalie; Zhang, Yu; Nishida, Yuri; Sheng, Ziwei; Choudhury, Shilpa; Mieczkowski, Piotr; Lobachev, Kirill S.

2013-01-01

DNA sequences capable of adopting non-canonical secondary structures have been associated with gross-chromosomal rearrangements in humans and model organisms. Previously, we have shown that long inverted repeats that form hairpin and cruciform structures and triplex-forming GAA/TTC repeats induce the formation of double-strand breaks which trigger genome instability in yeast. In this study, we demonstrate that breakage at both inverted repeats and GAA/TTC repeats is augmented by defects in DNA replication. Increased fragility is associated with increased mutation levels in the reporter genes located as far as 8 kb from both sides of the repeats. The increase in mutations was dependent on the presence of inverted or GAA/TTC repeats and activity of the translesion polymerase Polζ. Mutagenesis induced by inverted repeats also required Sae2 which opens hairpin-capped breaks and initiates end resection. The amount of breakage at the repeats is an important determinant of mutations as a perfect palindromic sequence with inherently increased fragility was also found to elevate mutation rates even in replication-proficient strains. We hypothesize that the underlying mechanism for mutagenesis induced by fragile motifs involves the formation of long single-stranded regions in the broken chromosome, invasion of the undamaged sister chromatid for repair, and faulty DNA synthesis employing Polζ. These data demonstrate that repeat-mediated breaks pose a dual threat to eukaryotic genome integrity by inducing chromosomal aberrations as well as mutations in flanking genes. PMID:23785298
The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.

PubMed

Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin

2013-01-01

Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
The complete chloroplast genome sequence of Euonymus japonicus (Celastraceae).

PubMed

Choi, Kyoung Su; Park, SeonJoo

2016-09-01

The complete chloroplast (cp) genome sequence of the Euonymus japonicus, the first sequenced of the genus Euonymus, was reported in this study. The total length was 157 637 bp, containing a pair of 26 678 bp inverted repeat region (IR), which were separated by small single copy (SSC) region and large single copy (LSC) region of 18 340 bp and 85 941 bp, respectively. This genome contains 107 unique genes, including 74 coding genes, four rRNA genes, and 29 tRNA genes. Seventeen genes contain intron of E. japonicus, of which three genes (clpP, ycf3, and rps12) include two introns. The maximum likelihood (ML) phylogenetic analysis revealed that E. japonicus was closely related to Manihot and Populus.

The repeat organizer, a specialized insulator element within the intergenic spacer of the Xenopus rRNA genes.

PubMed Central

Robinett, C C; O'Connor, A; Dunaway, M

1997-01-01

We have identified a novel activity for the region of the intergenic spacer of the Xenopus laevis rRNA genes that contains the 35- and 100-bp repeats. We devised a new assay for this region by constructing DNA plasmids containing a tandem repeat of rRNA reporter genes that were separated by the 35- and 100-bp repeat region and a rRNA gene enhancer. When the 35- and 100-bp repeat region is present in its normal position and orientation at the 3' end of the rRNA reporter genes, the enhancer activates the adjacent downstream promoter but not the upstream rRNA promoter on the same plasmid. Because this element can restrict the range of an enhancer's activity in the context of tandem genes, we have named it the repeat organizer (RO). The ability to restrict enhancer action is a feature of insulator elements, but unlike previously described insulator elements the RO does not block enhancer action in a simple enhancer-blocking assay. Instead, the activity of the RO requires that it be in its normal position and orientation with respect to the other sequence elements of the rRNA genes. The enhancer-binding transcription factor xUBF also binds to the repetitive sequences of the RO in vitro, but these sequences do not activate transcription in vivo. We propose that the RO is a specialized insulator element that organizes the tandem array of rRNA genes into single-gene expression units by promoting activation of a promoter by its proximal enhancers. PMID:9111359
Inheritance patterns of ATCCT repeat interruptions in spinocerebellar ataxia type 10 (SCA10) expansions.

PubMed

Landrian, Ivette; McFarland, Karen N; Liu, Jilin; Mulligan, Connie J; Rasmussen, Astrid; Ashizawa, Tetsuo

2017-01-01

Spinocerebellar ataxia type 10 (SCA10), an autosomal dominant cerebellar ataxia disorder, is caused by a non-coding ATTCT microsatellite repeat expansion in the ataxin 10 gene. In a subset of SCA10 families, the 5'-end of the repeat expansion contains a complex sequence of penta- and heptanucleotide interruption motifs which is followed by a pure tract of tandem ATCCT repeats of unknown length at its 3'-end. Intriguingly, expansions that carry these interruption motifs correlate with an epileptic seizure phenotype and are unstable despite the theory that interruptions are expected to stabilize expanded repeats. To examine the apparent contradiction of unstable, interruption-positive SCA10 expansion alleles and to determine whether the instability originates outside of the interrupted region, we sequenced approximately 1 kb of the 5'-end of SCA10 expansions using the ATCCT-PCR product in individuals across multiple generations from four SCA10 families. We found that the greatest instability within this region occurred in paternal transmissions of the allele in stretches of pure ATTCT motifs while the intervening interrupted sequences were stable. Overall, the ATCCT interruption changes by only one to three repeat units and therefore cannot account for the instability across the length of the disease allele. We conclude that the AT-rich interruptions locally stabilize the SCA10 expansion at the 5'-end but do not completely abolish instability across the entire span of the expansion. In addition, analysis of the interruption alleles across these families support a parsimonious single origin of the mutation with a shared distant ancestor.
High Genetic Diversity Revealed by Variable-Number Tandem Repeat Genotyping and Analysis of hsp65 Gene Polymorphism in a Large Collection of “Mycobacterium canettii” Strains Indicates that the M. tuberculosis Complex Is a Recently Emerged Clone of “M. canettii”

PubMed Central

Fabre, Michel; Koeck, Jean-Louis; Le Flèche, Philippe; Simon, Fabrice; Hervé, Vincent; Vergnaud, Gilles; Pourcel, Christine

2004-01-01

We have analyzed, using complementary molecular methods, the diversity of 43 strains of “Mycobacterium canettii” originating from the Republic of Djibouti, on the Horn of Africa, from 1998 to 2003. Genotyping by multiple-locus variable-number tandem repeat analysis shows that all the strains belong to a single but very distant group when compared to strains of the Mycobacterium tuberculosis complex (MTBC). Thirty-one strains cluster into one large group with little variability and five strains form another group, whereas the other seven are more diverged. In total, 14 genotypes are observed. The DR locus analysis reveals additional variability, some strains being devoid of a direct repeat locus and others having unique spacers. The hsp65 gene polymorphism was investigated by restriction enzyme analysis and sequencing of PCR amplicons. Four new single nucleotide polymorphisms were discovered. One strain was characterized by three nucleotide changes in 441 bp, creating new restriction enzyme polymorphisms. As no sequence variability was found for hsp65 in the whole MTBC, and as a single point mutation separates M. tuberculosis from the closest “M. canettii” strains, this diversity within “M. canettii” subspecies strongly suggests that it is the most probable source species of the MTBC rather than just another branch of the MTBC. PMID:15243089
Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB.

PubMed

Xu, Qifang; Dunbrack, Roland L

2012-11-01

Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM-HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
Single Molecule Study of the Intrinsically Disordered FG-Repeat Nucleoporin 153

PubMed Central

Milles, Sigrid; Lemke, Edward A.

2011-01-01

Nucleoporins (Nups), which are intrinsically disordered, form a selectivity filter inside the nuclear pore complex, taking a central role in the vital nucleocytoplasmic transport mechanism. These Nups display a complex and nonrandom amino-acid architecture of phenylalanine glycine (FG)-repeat clusters and intra-FG linkers. How such heterogeneous sequence composition relates to function and could give rise to a transport mechanism is still unclear. Here we describe a combined chemical biology and single-molecule fluorescence approach to study the large human Nup153 FG-domain. In order to obtain insights into the properties of this domain beyond the average behavior, we probed the end-to-end distance (RE) of several ∼50-residues long FG-repeat clusters in the context of the whole protein domain. Despite the sequence heterogeneity of these FG-clusters, we detected a reoccurring and consistent compaction from a relaxed coil behavior under denaturing conditions (RE/RE,RC = 0.99 ± 0.15 with RE,RC corresponding to ideal relaxed coil behavior) to a collapsed state under native conditions (RE/RE,RC = 0.79 ± 0.09). We then analyzed the properties of this protein on the supramolecular level, and determined that this human FG-domain was in fact able to form a hydrogel with physiological permeability barrier properties. PMID:21961597
Ancient DNA in human bone remains from Pompeii archaeological site.

PubMed

Cipollaro, M; Di Bernardo, G; Galano, G; Galderisi, U; Guarino, F; Angelini, F; Cascino, A

1998-06-29

aDNA extraction and amplification procedures have been optimized for Pompeian human bone remains whose diagenesis has been determined by histological analysis. Single copy genes amplification (X and Y amelogenin loci and Y specific alphoid repeat sequences) have been performed and compared with anthropometric data on sexing.
Interstitial telomere-like repeats in the Arabidopsis thaliana genome.

PubMed

Uchida, Wakana; Matsunaga, Sachihiro; Sugiyama, Ryuji; Kawano, Shigeyuki

2002-02-01

Eukaryotic chromosomal ends are protected by telomeres, which are thought to play an important role in ensuring the complete replication of chromosomes. On the other hand, non-functional telomere-like repeats in the interchromosomal regions (interstitial telomeric repeats; ITRs) have been reported in several eukaryotes. In this study, we identified eight ITRs in the Arabidopsis thaliana genome, each consisting of complete and degenerate 300- to 1200-bp sequences. The ITRs were grouped into three classes (class IA-B, class II, and class IIIA-E) based on the degeneracy of the telomeric repeats in ITRs. The telomeric repeats of the two ITRs in class I were conserved for the most part, whereas the single ITR in class II, and the five ITRs in class III were relatively degenerated. In addition, degenerate ITRs were surrounded by common sequences that shared 70-100% homology to each other; these are named ITR-adjacent sequences (IAS). Although the genomic regions around ITRs in class I lacked IAS, those around ITRs in class II contained IAS (IASa), and those around five ITRs in class III had nine types of IAS (IASb, c, d, e, f, g, h, i, and j). Ten IAS types in classes II and III showed no significant homology to each other. The chromosomal locations of ITRs and IAS were not category-related, but most of them were adjacent to, or part of, a centromere. These results show that the A. thaliana genome has undergone chromosomal rearrangements, such as end-fusions and segmental duplications.
Comparison of taxon-specific versus general locus sets for targeted sequence capture in plant phylogenomics.

PubMed

Chau, John H; Rahfeldt, Wolfgang A; Olmstead, Richard G

2018-03-01

Targeted sequence capture can be used to efficiently gather sequence data for large numbers of loci, such as single-copy nuclear loci. Most published studies in plants have used taxon-specific locus sets developed individually for a clade using multiple genomic and transcriptomic resources. General locus sets can also be developed from loci that have been identified as single-copy and have orthologs in large clades of plants. We identify and compare a taxon-specific locus set and three general locus sets (conserved ortholog set [COSII], shared single-copy nuclear [APVO SSC] genes, and pentatricopeptide repeat [PPR] genes) for targeted sequence capture in Buddleja (Scrophulariaceae) and outgroups. We evaluate their performance in terms of assembly success, sequence variability, and resolution and support of inferred phylogenetic trees. The taxon-specific locus set had the most target loci. Assembly success was high for all locus sets in Buddleja samples. For outgroups, general locus sets had greater assembly success. Taxon-specific and PPR loci had the highest average variability. The taxon-specific data set produced the best-supported tree, but all data sets showed improved resolution over previous non-sequence capture data sets. General locus sets can be a useful source of sequence capture targets, especially if multiple genomic resources are not available for a taxon.
Use of the LUS in sequence allele designations to facilitate probabilistic genotyping of NGS-based STR typing results.

PubMed

Just, Rebecca S; Irwin, Jodi A

2018-05-01

Some of the expected advantages of next generation sequencing (NGS) for short tandem repeat (STR) typing include enhanced mixture detection and genotype resolution via sequence variation among non-homologous alleles of the same length. However, at the same time that NGS methods for forensic DNA typing have advanced in recent years, many caseworking laboratories have implemented or are transitioning to probabilistic genotyping to assist the interpretation of complex autosomal STR typing results. Current probabilistic software programs are designed for length-based data, and were not intended to accommodate sequence strings as the product input. Yet to leverage the benefits of NGS for enhanced genotyping and mixture deconvolution, the sequence variation among same-length products must be utilized in some form. Here, we propose use of the longest uninterrupted stretch (LUS) in allele designations as a simple method to represent sequence variation within the STR repeat regions and facilitate - in the nearterm - probabilistic interpretation of NGS-based typing results. An examination of published population data indicated that a reference LUS region is straightforward to define for most autosomal STR loci, and that using repeat unit plus LUS length as the allele designator can represent greater than 80% of the alleles detected by sequencing. A proof of concept study performed using a freely available probabilistic software demonstrated that the LUS length can be used in allele designations when a program does not require alleles to be integers, and that utilizing sequence information improves interpretation of both single-source and mixed contributor STR typing results as compared to using repeat unit information alone. The LUS concept for allele designation maintains the repeat-based allele nomenclature that will permit backward compatibility to extant STR databases, and the LUS lengths themselves will be concordant regardless of the NGS assay or analysis tools employed. Further, these biologically based, easy-to-derive designations uphold clear relationships between parent alleles and their stutter products, enabling analysis in fully continuous probabilistic programs that model stutter while avoiding the algorithmic complexities that come with string based searches. Though using repeat unit plus LUS length as the allele designator does not capture variation that occurs outside of the core repeat regions, this straightforward approach would permit the large majority of known STR sequence variation to be used for mixture deconvolution and, in turn, result in more informative mixture statistics in the near term. Ultimately, the method could bridge the gap from current length-based probabilistic systems to facilitate broader adoption of NGS by forensic DNA testing laboratories. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Markers and mapping revisited: finding your gene.

PubMed

Jones, Neil; Ougham, Helen; Thomas, Howard; Pasakinskiene, Izolda

2009-01-01

This paper is an update of our earlier review (Jones et al., 1997, Markers and mapping: we are all geneticists now. New Phytologist 137: 165-177), which dealt with the genetics of mapping, in terms of recombination as the basis of the procedure, and covered some of the first generation of markers, including restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNA (RAPDs), simple sequence repeats (SSRs) and quantitative trait loci (QTLs). In the intervening decade there have been numerous developments in marker science with many new systems becoming available, which are herein described: cleavage amplification polymorphism (CAP), sequence-specific amplification polymorphism (S-SAP), inter-simple sequence repeat (ISSR), sequence tagged site (STS), sequence characterized amplification region (SCAR), selective amplification of microsatellite polymorphic loci (SAMPL), single nucleotide polymorphism (SNP), expressed sequence tag (EST), sequence-related amplified polymorphism (SRAP), target region amplification polymorphism (TRAP), microarrays, diversity arrays technology (DArT), single-strand conformation polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE) and methylation-sensitive PCR. In addition there has been an explosion of knowledge and databases in the area of genomics and bioinformatics. The number of flowering plant ESTs is c. 19 million and counting, with all the opportunity that this provides for gene-hunting, while the survey of bioinformatics and computer resources points to a rapid growth point for future activities in unravelling and applying the burst of new information on plant genomes. A case study is presented on tracking down a specific gene (stay-green (SGR), a post-transcriptional senescence regulator) using the full suite of mapping tools and comparative mapping resources. We end with a brief speculation on how genome analysis may progress into the future of this highly dynamic arena of plant science.
MIG-seq: an effective PCR-based method for genome-wide single-nucleotide polymorphism genotyping using the next-generation sequencing platform

PubMed Central

Suyama, Yoshihisa; Matsuki, Yu

2015-01-01

Restriction-enzyme (RE)-based next-generation sequencing methods have revolutionized marker-assisted genetic studies; however, the use of REs has limited their widespread adoption, especially in field samples with low-quality DNA and/or small quantities of DNA. Here, we developed a PCR-based procedure to construct reduced representation libraries without RE digestion steps, representing de novo single-nucleotide polymorphism discovery, and its genotyping using next-generation sequencing. Using multiplexed inter-simple sequence repeat (ISSR) primers, thousands of genome-wide regions were amplified effectively from a wide variety of genomes, without prior genetic information. We demonstrated: 1) Mendelian gametic segregation of the discovered variants; 2) reproducibility of genotyping by checking its applicability for individual identification; and 3) applicability in a wide variety of species by checking standard population genetic analysis. This approach, called multiplexed ISSR genotyping by sequencing, should be applicable to many marker-assisted genetic studies with a wide range of DNA qualities and quantities. PMID:26593239
Analyses of Expressed Sequence Tags from Apple1

PubMed Central

Newcomb, Richard D.; Crowhurst, Ross N.; Gleave, Andrew P.; Rikkerink, Erik H.A.; Allan, Andrew C.; Beuning, Lesley L.; Bowen, Judith H.; Gera, Emma; Jamieson, Kim R.; Janssen, Bart J.; Laing, William A.; McArtney, Steve; Nain, Bhawana; Ross, Gavin S.; Snowden, Kimberley C.; Souleyre, Edwige J.F.; Walton, Eric F.; Yauk, Yar-Khing

2006-01-01

The domestic apple (Malus domestica; also known as Malus pumila Mill.) has become a model fruit crop in which to study commercial traits such as disease and pest resistance, grafting, and flavor and health compound biosynthesis. To speed the discovery of genes involved in these traits, develop markers to map genes, and breed new cultivars, we have produced a substantial expressed sequence tag collection from various tissues of apple, focusing on fruit tissues of the cultivar Royal Gala. Over 150,000 expressed sequence tags have been collected from 43 different cDNA libraries representing 34 different tissues and treatments. Clustering of these sequences results in a set of 42,938 nonredundant sequences comprising 17,460 tentative contigs and 25,478 singletons, together representing what we predict are approximately one-half the expressed genes from apple. Many potential molecular markers are abundant in the apple transcripts. Dinucleotide repeats are found in 4,018 nonredundant sequences, mainly in the 5′-untranslated region of the gene, with a bias toward one repeat type (containing AG, 88%) and against another (repeats containing CG, 0.1%). Trinucleotide repeats are most common in the predicted coding regions and do not show a similar degree of sequence bias in their representation. Bi-allelic single-nucleotide polymorphisms are highly abundant with one found, on average, every 706 bp of transcribed DNA. Predictions of the numbers of representatives from protein families indicate the presence of many genes involved in disease resistance and the biosynthesis of flavor and health-associated compounds. Comparisons of some of these gene families with Arabidopsis (Arabidopsis thaliana) suggest instances where there have been duplications in the lineages leading to apple of biosynthetic and regulatory genes that are expressed in fruit. This resource paves the way for a concerted functional genomics effort in this important temperate fruit crop. PMID:16531485
The complete chloroplast genome of Sinopodophyllum hexandrum Ying (Berberidaceae).

PubMed

Meng, Lihua; Liu, Ruijuan; Chen, Jianbing; Ding, Chenxu

2017-05-01

The complete nucleotide sequence of the Sinopodophyllum hexandrum Ying chloroplast genome (cpDNA) was determined based on next-generation sequencing technologies in this study. The genome was 157 203 bp in length, containing a pair of inverted repeat (IRa and IRb) regions of 25 960 bp, which were separated by a large single-copy (LSC) region of 87 065 bp and a small single-copy (SSC) region of 18 218 bp, respectively. The cpDNA contained 148 genes, including 96 protein-coding genes, 8 ribosomal RNA genes, and 44 tRNA genes. In these genes, eight harbored a single intron, and two (ycf3 and clpP) contained a couple of introns. The cpDNA AT content of S. hexandrum cpDNA is 61.5%.
Nature of frequent deletions in CEBPA.

PubMed

Fuchs, Ota; Kostecka, Arnost; Provaznikova, Dana; Krasna, Blazena; Brezinova, Jana; Filkukova, Jitka; Kotlin, Roman; Kouba, Michal; Kobylka, Petr; Neuwirtova, Radana; Jonasova, Anna; Caniga, Miroslav; Schwarz, Jiri; Markova, Jana; Maaloufova, Jacqueline; Sponerova, Dana; Novakova, Ludmila; Cermak, Jaroslav

2009-01-01

C/EBPalpha (CCAAT/enhancer binding protein alpha) belongs to the family of leucine zipper transcription factors and is necessary for transcriptional control of granulocyte, adipocyte and hepatocyte differentiation, glucose metabolism and lung development. C/EBPalpha is encoded by an intronless gene. CEBPA mutations cause a myeloid differentiation block and were detected in acute myeloid leukemia (AML), myelodysplastic syndrome (MDS), multiple myeloma and non-Hodgkin's lymphoma (NHL) patients. In this study we identified in 41 individuals from 824 screened individuals (290 AML patients, 382 MDS patients, 56 NHL patients and 96 healthy individuals) a single class of 23 deletions in CEBPA gene which involved a direct repeat of at least 2 bp. These mutations are characterised by the loss of one of two same repeats at the ends of deleted sequence. Three most frequent repeats included in these deletions in CEBPA gene are CGCGAG (493-498_865-870), GCCAAGCAGC (508-517_907-916) and GG (486-487_885-886), all according to GenBank accession no. NM_004364.2. A mechanism for deletion formation between two repetitive sequences can be recombination events in the repair process. Double-stranded cut in DNA can initiate these recombination events of adjacent DNA sequences.
Comparison of simple sequence repeats in 19 Archaea.

PubMed

Trivedi, S

2006-12-05

All organisms that have been studied until now have been found to have differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea genomes where complete chromosome sequences of 19 Archaea were analyzed with the program SPUTNIK to find di- to penta-nucleotide repeats. The number of repeats was determined for the complete chromosome sequences and for the coding and non-coding sequences. Different from what has been found for other groups of organisms, there is an abundance of SSRs in coding regions of the genome of some Archaea. Dinucleotide repeats were rare and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism specific. In general, repeats are short and CG-rich repeats are present in Archaea having a CG-rich genome. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density had an inverse correlation with the CG content of the genome.
Single Amino Acid Repeats in the Proteome World: Structural, Functional, and Evolutionary Insights

PubMed Central

Kumar, Amitha Sampath; Sowpati, Divya Tej; Mishra, Rakesh K.

2016-01-01

Microsatellites or simple sequence repeats (SSR) are abundant, highly diverse stretches of short DNA repeats present in all genomes. Tandem mono/tri/hexanucleotide repeats in the coding regions contribute to single amino acids repeats (SAARs) in the proteome. While SSRs in the coding region always result in amino acid repeats, a majority of SAARs arise due to a combination of various codons representing the same amino acid and not as a consequence of SSR events. Certain amino acids are abundant in repeat regions indicating a positive selection pressure behind the accumulation of SAARs. By analysing 22 proteomes including the human proteome, we explored the functional and structural relationship of amino acid repeats in an evolutionary context. Only ~15% of repeats are present in any known functional domain, while ~74% of repeats are present in the disordered regions, suggesting that SAARs add to the functionality of proteins by providing flexibility, stability and act as linker elements between domains. Comparison of SAAR containing proteins across species reveals that while shorter repeats are conserved among orthologs, proteins with longer repeats, >15 amino acids, are unique to the respective organism. Lysine repeats are well conserved among orthologs with respect to their length and number of occurrences in a protein. Other amino acids such as glutamic acid, proline, serine and alanine repeats are generally conserved among the orthologs with varying repeat lengths. These findings suggest that SAARs have accumulated in the proteome under positive selection pressure and that they provide flexibility for optimal folding of functional/structural domains of proteins. The insights gained from our observations can help in effective designing and engineering of proteins with novel features. PMID:27893794
The complete chloroplast genome of salt cress (Eutrema salsugineum).

PubMed

Guo, Xinyi; Hao, Guoqian; Ma, Tao

2016-07-01

The complete chloroplast (cp) sequence of the salt cress (Eutrema salsugineum), a plant well-adapted to salt stress, was presented in this study. The circular molecule is 153,407 bp in length and exhibit a typical quadripartite structure containing an 83,894 bp large single copy (LSC) region, a 17,607 bp small single copy (SSC) region, and the two 25,953 bp inverted repeats (IRs). The salt cress cp genome contains 135 known genes, including 87 protein-coding genes, 8 ribosomal RNA genes, and 40 tRNA genes; 21 of these are located in the inverted repeat region. As expected, phylogenetic analysis support the idea that E. salsugineum is sister to Brassiceae species within the Brassicaceae family.
Conservation of the Human Integrin-Type Beta-Propeller Domain in Bacteria

PubMed Central

Chouhan, Bhanupratap; Denesyuk, Alexander; Heino, Jyrki; Johnson, Mark S.; Denessiouk, Konstantin

2011-01-01

Integrins are heterodimeric cell-surface receptors with key functions in cell-cell and cell-matrix adhesion. Integrin α and β subunits are present throughout the metazoans, but it is unclear whether the subunits predate the origin of multicellular organisms. Several component domains have been detected in bacteria, one of which, a specific 7-bladed β-propeller domain, is a unique feature of the integrin α subunits. Here, we describe a structure-derived motif, which incorporates key features of each blade from the X-ray structures of human αIIbβ3 and αVβ3, includes elements of the FG-GAP/Cage and Ca2+-binding motifs, and is specific only for the metazoan integrin domains. Separately, we searched for the metazoan integrin type β-propeller domains among all available sequences from bacteria and unicellular eukaryotic organisms, which must incorporate seven repeats, corresponding to the seven blades of the β-propeller domain, and so that the newly found structure-derived motif would exist in every repeat. As the result, among 47 available genomes of unicellular eukaryotes we could not find a single instance of seven repeats with the motif. Several sequences contained three repeats, a predicted transmembrane segment, and a short cytoplasmic motif associated with some integrins, but otherwise differ from the metazoan integrin α subunits. Among the available bacterial sequences, we found five examples containing seven sequential metazoan integrin-specific motifs within the seven repeats. The motifs differ in having one Ca2+-binding site per repeat, whereas metazoan integrins have three or four sites. The bacterial sequences are more conserved in terms of motif conservation and loop length, suggesting that the structure is more regular and compact than those example structures from human integrins. Although the bacterial examples are not full-length integrins, the full-length metazoan-type 7-bladed β-propeller domains are present, and sometimes two tandem copies are found. PMID:22022374
Reconstruction of the ancestral plastid genome in Geraniaceae reveals a correlation between genome rearrangements, repeats, and nucleotide substitution rates.

PubMed

Weng, Mao-Lun; Blazier, John C; Govindu, Madhumita; Jansen, Robert K

2014-03-01

Geraniaceae plastid genomes are highly rearranged, and each of the four genera already sequenced in the family has a distinct genome organization. This study reports plastid genome sequences of six additional species, Francoa sonchifolia, Melianthus villosus, and Viviania marifolia from Geraniales, and Pelargonium alternans, California macrophylla, and Hypseocharis bilobata from Geraniaceae. These genome sequences, combined with previously published species, provide sufficient taxon sampling to reconstruct the ancestral plastid genome organization of Geraniaceae and the rearrangements unique to each genus. The ancestral plastid genome of Geraniaceae has a 4 kb inversion and a reduced, Pelargonium-like small single copy region. Our ancestral genome reconstruction suggests that a few minor rearrangements occurred in the stem branch of Geraniaceae followed by independent rearrangements in each genus. The genomic comparison demonstrates that a series of inverted repeat boundary shifts and inversions played a major role in shaping genome organization in the family. The distribution of repeats is strongly associated with breakpoints in the rearranged genomes, and the proportion and the number of large repeats (>20 bp and >60 bp) are significantly correlated with the degree of genome rearrangements. Increases in the degree of plastid genome rearrangements are correlated with the acceleration in nonsynonymous substitution rates (dN) but not with synonymous substitution rates (dS). Possible mechanisms that might contribute to this correlation, including DNA repair system and selection, are discussed.
[Mutation Analysis of 19 STR Loci in 20 723 Cases of Paternity Testing].

PubMed

Bi, J; Chang, J J; Li, M X; Yu, C Y

2017-06-01

To observe and analyze the confirmed cases of paternity testing, and to explore the mutation rules of STR loci. The mutant STR loci were screened from 20 723 confirmed cases of paternity testing by Goldeneye 20A system．The mutation rates, and the sources, fragment length, steps and increased or decreased repeat sequences of mutant alleles were counted for the analysis of the characteristics of mutation-related factors. A total of 548 mutations were found on 19 STR loci, and 557 mutation events were observed. The loci mutation rate was 0.07‰-2.23‰. The ratio of paternal to maternal mutant events was 3.06:1. One step mutation was the main mutation, and the number of the increased repeat sequences was almost the same as the decreased repeat sequences. The repeat sequences were more likely to decrease in two steps mutation and above. Mutation mainly occurred in the medium allele, and the number of the increased repeat sequences was almost the same as the decreased repeat sequences. In long allele mutations, the decreased repeat sequences were significantly more than the increased repeat sequences. The number of the increased repeat sequences was almost the same as the decreased repeat sequences in paternal mutation, while the decreased repeat sequences were more than the increased in maternal mutation. There are significant differences in the mutation rate of each locus. When one or two loci do not conform to the genetic law, other detection system should be added, and PI value should be calculated combined with the information of the mutate STR loci in order to further clarify the identification opinions. Copyright© by the Editorial Department of Journal of Forensic Medicine

Effect of repeat copy number on variable-number tandem repeat mutations in Escherichia coli O157:H7.

PubMed

Vogler, Amy J; Keys, Christine; Nemoto, Yoshimi; Colman, Rebecca E; Jay, Zack; Keim, Paul

2006-06-01

Variable-number tandem repeat (VNTR) loci have shown a remarkable ability to discriminate among isolates of the recently emerged clonal pathogen Escherichia coli O157:H7, making them a very useful molecular epidemiological tool. However, little is known about the rates at which these sequences mutate, the factors that affect mutation rates, or the mechanisms by which mutations occur at these loci. Here, we measure mutation rates for 28 VNTR loci and investigate the effects of repeat copy number and mismatch repair on mutation rate using in vitro-generated populations for 10 E. coli O157:H7 strains. We find single-locus rates as high as 7.0 x 10(-4) mutations/generation and a combined 28-locus rate of 6.4 x 10(-4) mutations/generation. We observed single- and multirepeat mutations that were consistent with a slipped-strand mispairing mutation model, as well as a smaller number of large repeat copy number mutations that were consistent with recombination-mediated events. Repeat copy number within an array was strongly correlated with mutation rate both at the most mutable locus, O157-10 (r2= 0.565, P = 0.0196), and across all mutating loci. The combined locus model was significant whether locus O157-10 was included (r2= 0.833, P < 0.0001) or excluded (r2= 0.452, P < 0.0001) from the analysis. Deficient mismatch repair did not affect mutation rate at any of the 28 VNTRs with repeat unit sizes of >5 bp, although a poly(G) homomeric tract was destabilized in the mutS strain. Finally, we describe a general model for VNTR mutations that encompasses insertions and deletions, single- and multiple-repeat mutations, and their relative frequencies based upon our empirical mutation rate data.
In situ optical sequencing and structure analysis of a trinucleotide repeat genome region by localization microscopy after specific COMBO-FISH nano-probing

NASA Astrophysics Data System (ADS)

Stuhlmüller, M.; Schwarz-Finsterle, J.; Fey, E.; Lux, J.; Bach, M.; Cremer, C.; Hinderhofer, K.; Hausmann, M.; Hildenbrand, G.

2015-10-01

Trinucleotide repeat expansions (like (CGG)n) of chromatin in the genome of cell nuclei can cause neurological disorders such as for example the Fragile-X syndrome. Until now the mechanisms are not clearly understood as to how these expansions develop during cell proliferation. Therefore in situ investigations of chromatin structures on the nanoscale are required to better understand supra-molecular mechanisms on the single cell level. By super-resolution localization microscopy (Spectral Position Determination Microscopy; SPDM) in combination with nano-probing using COMBO-FISH (COMBinatorial Oligonucleotide FISH), novel insights into the nano-architecture of the genome will become possible. The native spatial structure of trinucleotide repeat expansion genome regions was analysed and optical sequencing of repetitive units was performed within 3D-conserved nuclei using SPDM after COMBO-FISH. We analysed a (CGG)n-expansion region inside the 5' untranslated region of the FMR1 gene. The number of CGG repeats for a full mutation causing the Fragile-X syndrome was found and also verified by Southern blot. The FMR1 promotor region was similarly condensed like a centromeric region whereas the arrangement of the probes labelling the expansion region seemed to indicate a loop-like nano-structure. These results for the first time demonstrate that in situ chromatin structure measurements on the nanoscale are feasible. Due to further methodological progress it will become possible to estimate the state of trinucleotide repeat mutations in detail and to determine the associated chromatin strand structural changes on the single cell level. In general, the application of the described approach to any genome region will lead to new insights into genome nano-architecture and open new avenues for understanding mechanisms and their relevance in the development of heredity diseases.
Isolation and molecular characterization of dTnp1, a mobile and defective transposable element of Nicotiana plumbaginifolia.

PubMed

Meyer, C; Pouteau, S; Rouzé, P; Caboche, M

1994-01-01

By Northern blot analysis of nitrate reductase-deficient mutants of Nicotiana plumbaginifolia, we identified a mutant (mutant D65), obtained after gamma-ray irradiation of protoplasts, which contained an insertion sequence in the nitrate reductase (NR) mRNA. This insertion sequence was localized by polymerase chain reaction (PCR) in the first exon of NR and was also shown to be present in the NR gene. The mutant gene contained a 565 bp insertion sequence that exhibits the sequence characteristics of a transposable element, which was thus named dTnp1. The dTnp1 element has 14 bp terminal inverted repeats and is flanked by an 8-bp target site duplication generated upon transposition. These inverted repeats have significant sequence homology with those of other transposable elements. Judging by its size and the absence of a long open reading frame, dTnp1 appears to represent a defective, although mobile, transposable element. The octamer motif TTTAGGCC was found several times in direct orientation near the 5' and 3' ends of dTnp1 together with a perfect palindrome located after the 5' inverted repeat. Southern blot analysis using an internal probe of dTnp1 suggested that this element occurs as a single copy in the genome of N. plumbaginifolia. It is also present in N. tabacum, but absent in tomato or petunia. The dTnp1 element is therefore of potential use for gene tagging in Nicotiana species.
DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats

PubMed Central

de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

2015-01-01

Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. PMID:26481363
Long-read whole genome sequencing and comparative analysis of six strains of the human pathogen Orientia tsutsugamushi.

PubMed

Batty, Elizabeth M; Chaemchuen, Suwittra; Blacksell, Stuart; Richards, Allen L; Paris, Daniel; Bowden, Rory; Chan, Caroline; Lachumanan, Ramkumar; Day, Nicholas; Donnelly, Peter; Chen, Swaine; Salje, Jeanne

2018-06-01

Orientia tsutsugamushi is a clinically important but neglected obligate intracellular bacterial pathogen of the Rickettsiaceae family that causes the potentially life-threatening human disease scrub typhus. In contrast to the genome reduction seen in many obligate intracellular bacteria, early genetic studies of Orientia have revealed one of the most repetitive bacterial genomes sequenced to date. The dramatic expansion of mobile elements has hampered efforts to generate complete genome sequences using short read sequencing methodologies, and consequently there have been few studies of the comparative genomics of this neglected species. We report new high-quality genomes of O. tsutsugamushi, generated using PacBio single molecule long read sequencing, for six strains: Karp, Kato, Gilliam, TA686, UT76 and UT176. In comparative genomics analyses of these strains together with existing reference genomes from Ikeda and Boryong strains, we identify a relatively small core genome of 657 genes, grouped into core gene islands and separated by repeat regions, and use the core genes to infer the first whole-genome phylogeny of Orientia. Complete assemblies of multiple Orientia genomes verify initial suggestions that these are remarkable organisms. They have larger genomes compared with most other Rickettsiaceae, with widespread amplification of repeat elements and massive chromosomal rearrangements between strains. At the gene level, Orientia has a relatively small set of universally conserved genes, similar to other obligate intracellular bacteria, and the relative expansion in genome size can be accounted for by gene duplication and repeat amplification. Our study demonstrates the utility of long read sequencing to investigate complex bacterial genomes and characterise genomic variation.
Group B Streptococcus Vaginal Carriage in Pregnant Women as Deciphered by Clustered Regularly Interspaced Short Palindromic Repeat Analysis.

PubMed

Beauruelle, Clemence; Pastuszka, Adeline; Mereghetti, Laurent; Lanotte, Philippe

2018-06-01

We evaluated the diversity of group B Streptococcus (GBS) vaginal carriage populations in pregnant women. For this purpose, we studied each isolate present in a primary culture of a vaginal swab using a new approach based on clustered regularly interspaced short palindromic repeats (CRISPR) locus analysis. To evaluate the CRISPR array composition rapidly, a restriction fragment length polymorphism (RFLP) analysis was performed. For each different pattern observed, the CRISPR array was sequenced and capsular typing and multilocus sequence typing (MLST) were performed. A total of 970 isolates from 10 women were analyzed by CRISPR-RFLP. Each woman carrying GBS isolates presented one to five specific "personal" patterns. Five women showed similar isolates with specific and unique restriction patterns, suggesting the carriage of a single GBS clone. Different patterns were observed among isolates from the other five women. For three of these, CRISPR locus sequencing highlighted low levels of internal modifications in the locus backbone, whereas there were high levels of modifications for the last two women, suggesting the carriage of two different clones. These two clones were closely related, having the same ancestral spacer(s), the same capsular type and, in one case, the same ST, but showed different antibiotic resistance patterns in pairs. Eight of 10 women were colonized by a single GBS clone, while two of them were colonized by two strains, leading to a risk of selection of more-virulent and/or more-resistant clones during antibiotic prophylaxis. This CRISPR analysis made it possible to separate isolates belonging to a single capsular type and sequence type, highlighting the greater discriminating power of this approach. Copyright © 2018 American Society for Microbiology.
Mammalian DNA enriched for replication origins is enriched for snap-back sequences.

PubMed

Zannis-Hadjopoulos, M; Kaufmann, G; Martin, R G

1984-11-15

Using the instability of replication loops as a method for the isolation of double-stranded nascent DNA, extruded DNA enriched for replication origins was obtained and denatured. Snap-back DNA, single-stranded DNA with inverted repeats (palindromic sequences), reassociates rapidly into stem-loop structures with zero-order kinetics when conditions are changed from denaturing to renaturing, and can be assayed by chromatography on hydroxyapatite. Origin-enriched nascent DNA strands from mouse, rat and monkey cells growing either synchronously or asynchronously were purified and assayed for the presence of snap-back sequences. The results show that origin-enriched DNA is also enriched for snap-back sequences, implying that some origins for mammalian DNA replication contain or lie near palindromic sequences.
Structure and possible function of a G-quadruplex in the long terminal repeat of the proviral HIV-1 genome

PubMed Central

De Nicola, Beatrice; Lech, Christopher J.; Heddi, Brahim; Regmi, Sagar; Frasson, Ilaria; Perrone, Rosalba; Richter, Sara N.; Phan, Anh Tuân

2016-01-01

The long terminal repeat (LTR) of the proviral human immunodeficiency virus (HIV)-1 genome is integral to virus transcription and host cell infection. The guanine-rich U3 region within the LTR promoter, previously shown to form G-quadruplex structures, represents an attractive target to inhibit HIV transcription and replication. In this work, we report the structure of a biologically relevant G-quadruplex within the LTR promoter region of HIV-1. The guanine-rich sequence designated LTR-IV forms a well-defined structure in physiological cationic solution. The nuclear magnetic resonance (NMR) structure of this sequence reveals a parallel-stranded G-quadruplex containing a single-nucleotide thymine bulge, which participates in a conserved stacking interaction with a neighboring single-nucleotide adenine loop. Transcription analysis in a HIV-1 replication competent cell indicates that the LTR-IV region may act as a modulator of G-quadruplex formation in the LTR promoter. Consequently, the LTR-IV G-quadruplex structure presented within this work could represent a valuable target for the design of HIV therapeutics. PMID:27298260
Genetic markers, genotyping methods & next generation sequencing in Mycobacterium tuberculosis

PubMed Central

Desikan, Srinidhi; Narayanan, Sujatha

2015-01-01

Molecular epidemiology (ME) is one of the main areas in tuberculosis research which is widely used to study the transmission epidemics and outbreaks of tubercle bacilli. It exploits the presence of various polymorphisms in the genome of the bacteria that can be widely used as genetic markers. Many DNA typing methods apply these genetic markers to differentiate various strains and to study the evolutionary relationships between them. The three widely used genotyping tools to differentiate Mycobacterium tuberculosis strains are IS6110 restriction fragment length polymorphism (RFLP), spacer oligotyping (Spoligotyping), and mycobacterial interspersed repeat units - variable number of tandem repeats (MIRU-VNTR). A new prospect towards ME was introduced with the development of whole genome sequencing (WGS) and the next generation sequencing (NGS) methods, where the entire genome is sequenced that not only helps in pointing out minute differences between the various sequences but also saves time and the cost. NGS is also found to be useful in identifying single nucleotide polymorphisms (SNPs), comparative genomics and also various aspects about transmission dynamics. These techniques enable the identification of mycobacterial strains and also facilitate the study of their phylogenetic and evolutionary traits. PMID:26205019
Protein domain analysis of genomic sequence data reveals regulation of LRR related domains in plant transpiration in Ficus.

PubMed

Lang, Tiange; Yin, Kangquan; Liu, Jinyu; Cao, Kunfang; Cannon, Charles H; Du, Fang K

2014-01-01

Predicting protein domains is essential for understanding a protein's function at the molecular level. However, up till now, there has been no direct and straightforward method for predicting protein domains in species without a reference genome sequence. In this study, we developed a functionality with a set of programs that can predict protein domains directly from genomic sequence data without a reference genome. Using whole genome sequence data, the programming functionality mainly comprised DNA assembly in combination with next-generation sequencing (NGS) assembly methods and traditional methods, peptide prediction and protein domain prediction. The proposed new functionality avoids problems associated with de novo assembly due to micro reads and small single repeats. Furthermore, we applied our functionality for the prediction of leucine rich repeat (LRR) domains in four species of Ficus with no reference genome, based on NGS genomic data. We found that the LRRNT_2 and LRR_8 domains are related to plant transpiration efficiency, as indicated by the stomata index, in the four species of Ficus. The programming functionality established in this study provides new insights for protein domain prediction, which is particularly timely in the current age of NGS data expansion.
Sequences characterization of microsatellite DNA sequences in Pacific abalone ( Haliotis discus hannai)

NASA Astrophysics Data System (ADS)

Li, Qi; Akihiro, Kijima

2007-01-01

The microsatellite-enriched library was constructed using magnetic bead hybridization selection method, and the microsatellite DNA sequences were analyzed in Pacific abalone Haliotis discus hannai. Three hundred and fifty white colonies were screened using PCR-based technique, and 84 clones were identified to potentially contain microsatellite repeat motif. The 84 clones were sequenced, and 42 microsatellites and 4 minisatellites with a minimum of five repeats were found (13.1% of white colonies screened). Besides the motif of CA contained in the oligoprobe, we also found other 16 types of microsatellite repeats including a dinucleotide repeat, two tetranucleotide repeats, twelve pentanucleotide repeats and a hexanucleotide repeat. According to Weber (1990), the microsatellite sequences obtained could be categorized structurally into perfect repeats (73.3%), imperfect repeats (13.3%), and compound repeats (13.4%). Among the microsatellite repeats, relatively short arrays (<20 repeats) were most abundant, accounting for 75.0%. The largest length of microsatellites was 48 repeats, and the average number of repeats was 13.4. The data on the composition and length distribution of microsatellites obtained in the present study can be useful for choosing the repeat motifs for microsatellite isolation in other abalone species.
Complete Chloroplast Genome of the Multifunctional Crop Globe Artichoke and Comparison with Other Asteraceae

PubMed Central

Curci, Pasquale L.; De Paola, Domenico; Danzi, Donatella; Vendramin, Giovanni G.; Sonnante, Gabriella

2015-01-01

With over 20,000 species, Asteraceae is the second largest plant family. High-throughput sequencing of nuclear and chloroplast genomes has allowed for a better understanding of the evolutionary relationships within large plant families. Here, the globe artichoke chloroplast (cp) genome was obtained by a combination of whole-genome and BAC clone high-throughput sequencing. The artichoke cp genome is 152,529 bp in length, consisting of two single-copy regions separated by a pair of inverted repeats (IRs) of 25,155 bp, representing the longest IRs found in the Asteraceae family so far. The large (LSC) and the small (SSC) single-copy regions span 83,578 bp and 18,641 bp, respectively. The artichoke cp sequence was compared to the other eight Asteraceae complete cp genomes available, revealing an IR expansion at the SSC/IR boundary. This expansion consists of 17 bp of the ndhF gene generating an overlap between the ndhF and ycf1 genes. A total of 127 cp simple sequence repeats (cpSSRs) were identified in the artichoke cp genome, potentially suitable for future population studies in the Cynara genus. Parsimony-informative regions were evaluated and allowed to place a Cynara species within the Asteraceae family tree. The eight most informative coding regions were also considered and tested for “specific barcode” purpose in the Asteraceae family. Our results highlight the usefulness of cp genome sequencing in exploring plant genome diversity and retrieving reliable molecular resources for phylogenetic and evolutionary studies, as well as for specific barcodes in plants. PMID:25774672
Complete chloroplast genome of the multifunctional crop globe artichoke and comparison with other Asteraceae.

PubMed

Curci, Pasquale L; De Paola, Domenico; Danzi, Donatella; Vendramin, Giovanni G; Sonnante, Gabriella

2015-01-01

With over 20,000 species, Asteraceae is the second largest plant family. High-throughput sequencing of nuclear and chloroplast genomes has allowed for a better understanding of the evolutionary relationships within large plant families. Here, the globe artichoke chloroplast (cp) genome was obtained by a combination of whole-genome and BAC clone high-throughput sequencing. The artichoke cp genome is 152,529 bp in length, consisting of two single-copy regions separated by a pair of inverted repeats (IRs) of 25,155 bp, representing the longest IRs found in the Asteraceae family so far. The large (LSC) and the small (SSC) single-copy regions span 83,578 bp and 18,641 bp, respectively. The artichoke cp sequence was compared to the other eight Asteraceae complete cp genomes available, revealing an IR expansion at the SSC/IR boundary. This expansion consists of 17 bp of the ndhF gene generating an overlap between the ndhF and ycf1 genes. A total of 127 cp simple sequence repeats (cpSSRs) were identified in the artichoke cp genome, potentially suitable for future population studies in the Cynara genus. Parsimony-informative regions were evaluated and allowed to place a Cynara species within the Asteraceae family tree. The eight most informative coding regions were also considered and tested for "specific barcode" purpose in the Asteraceae family. Our results highlight the usefulness of cp genome sequencing in exploring plant genome diversity and retrieving reliable molecular resources for phylogenetic and evolutionary studies, as well as for specific barcodes in plants.
Complete chloroplast genome sequence of a major invasive species, crofton weed (Ageratina adenophora).

PubMed

Nie, Xiaojun; Lv, Shuzuo; Zhang, Yingxin; Du, Xianghong; Wang, Le; Biradar, Siddanagouda S; Tan, Xiufang; Wan, Fanghao; Weining, Song

2012-01-01

Crofton weed (Ageratina adenophora) is one of the most hazardous invasive plant species, which causes serious economic losses and environmental damages worldwide. However, the sequence resource and genome information of A. adenophora are rather limited, making phylogenetic identification and evolutionary studies very difficult. Here, we report the complete sequence of the A. adenophora chloroplast (cp) genome based on Illumina sequencing. The A. adenophora cp genome is 150, 689 bp in length including a small single-copy (SSC) region of 18, 358 bp and a large single-copy (LSC) region of 84, 815 bp separated by a pair of inverted repeats (IRs) of 23, 755 bp. The genome contains 130 unique genes and 18 duplicated in the IR regions, with the gene content and organization similar to other Asteraceae cp genomes. Comparative analysis identified five DNA regions (ndhD-ccsA, psbI-trnS, ndhF-ycf1, ndhI-ndhG and atpA-trnR) containing parsimony-informative characters higher than 2%, which may be potential informative markers for barcoding and phylogenetic analysis. Repeat structure, codon usage and contraction of the IR were also investigated to reveal the pattern of evolution. Phylogenetic analysis demonstrated a sister relationship between A. adenophora and Guizotia abyssinica and supported a monophyly of the Asterales. We have assembled and analyzed the chloroplast genome of A. adenophora in this study, which was the first sequenced plastome in the Eupatorieae tribe. The complete chloroplast genome information is useful for plant phylogenetic and evolutionary studies within this invasive species and also within the Asteraceae family.
Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB

PubMed Central

Dunbrack, Roland L.

2012-01-01

Motivation: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. Results: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM–HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. Availability: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly. Contact: Roland.Dunbracks@fccc.edu PMID:22942020
Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant Clostridia

PubMed Central

2014-01-01

Background Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, nine copies of the rRNA gene operons. It has a low G + C content of 31.1%. Illumina, 454, Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Paloindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum. Other differences include important metabolic genes for central metabolism (as an additional hydrogenase and the absence of a phophoenolpyruvate synthase) and substrate utilization pathway (mannose and aromatics utilization) that might explain phenotypic differences between C. autoethanogenum and C. ljungdahlii. Conclusions Single molecule sequencing will be increasingly used to produce finished microbial genomes. The complete genome will facilitate comparative genomics and functional genomics and support future comparisons between Clostridia and studies that examine the evolution of plasmids, bacteriophage and CRISPR systems. PMID:24655715
Comparative genomic analysis of single-molecule sequencing and hybrid approaches for finishing the Clostridium autoethanogenum JA1-1 strain DSM 10061 genome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, Steven D; Nagaraju, Shilpa; Utturkar, Sagar M

Background Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, nine copies of the rRNA gene operons. It has a low G +more » C content of 31.1%. Illumina, 454, Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Paloindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum. Other differences include important metabolic genes for central metabolism (as an additional hydrogenase and the absence of a phophoenolpyruvate synthase) and substrate utilization pathway (mannose and aromatics utilization) that might explain phenotypic differences between C. autoethanogenum and C. ljungdahlii. Conclusions Single molecule sequencing will be increasingly used to produce finished microbial genomes. The complete genome will facilitate comparative genomics and functional genomics and support future comparisons between Clostridia and studies that examine the evolution of plasmids, bacteriophage and CRISPR systems.« less
Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution.

PubMed

Melters, Daniël P; Bradnam, Keith R; Young, Hugh A; Telis, Natalie; May, Michael R; Ruby, J Graham; Sebra, Robert; Peluso, Paul; Eid, John; Rank, David; Garcia, José Fernando; DeRisi, Joseph L; Smith, Timothy; Tobias, Christian; Ross-Ibarra, Jeffrey; Korf, Ian; Chan, Simon W L

2013-01-30

Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution. While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.
Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution

PubMed Central

2013-01-01

Background Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. Results Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution. Conclusions While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes. PMID:23363705
A phylogenetic framework facilitates Y-STR variant discovery and classification via massively parallel sequencing.

PubMed

Huszar, Tunde I; Jobling, Mark A; Wetton, Jon H

2018-04-12

Short tandem repeats on the male-specific region of the Y chromosome (Y-STRs) are permanently linked as haplotypes, and therefore Y-STR sequence diversity can be considered within the robust framework of a phylogeny of haplogroups defined by single nucleotide polymorphisms (SNPs). Here we use massively parallel sequencing (MPS) to analyse the 23 Y-STRs in Promega's prototype PowerSeq™ Auto/Mito/Y System kit (containing the markers of the PowerPlex® Y23 [PPY23] System) in a set of 100 diverse Y chromosomes whose phylogenetic relationships are known from previous megabase-scale resequencing. Including allele duplications and alleles resulting from likely somatic mutation, we characterised 2311 alleles, demonstrating 99.83% concordance with capillary electrophoresis (CE) data on the same sample set. The set contains 267 distinct sequence-based alleles (an increase of 58% compared to the 169 detectable by CE), including 60 novel Y-STR variants phased with their flanking sequences which have not been reported previously to our knowledge. Variation includes 46 distinct alleles containing non-reference variants of SNPs/indels in both repeat and flanking regions, and 145 distinct alleles containing repeat pattern variants (RPV). For DYS385a,b, DYS481 and DYS390 we observed repeat count variation in short flanking segments previously considered invariable, and suggest new MPS-based structural designations based on these. We considered the observed variation in the context of the Y phylogeny: several specific haplogroup associations were observed for SNPs and indels, reflecting the low mutation rates of such variant types; however, RPVs showed less phylogenetic coherence and more recurrence, reflecting their relatively high mutation rates. In conclusion, our study reveals considerable additional diversity at the Y-STRs of the PPY23 set via MPS analysis, demonstrates high concordance with CE data, facilitates nomenclature standardisation, and places Y-STR sequence variants in their phylogenetic context. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm

PubMed Central

Glunčić, Matko; Paar, Vladimir

2013-01-01

The main feature of global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. The efficacy is due to the use of complete set of a K-string ensemble which enables a new method of direct mapping of symbolic DNA sequence into frequency domain, with straightforward identification of repeats as peaks in GRM diagram. In this way, we obtain very fast, efficient and highly automatized repeat finding tool. The method is robust to substitutions and insertions/deletions, as well as to various complexities of the sequence pattern. We present several case studies of GRM use, in order to illustrate its capabilities: identification of α-satellite tandem repeats and higher order repeats (HORs), identification of Alu dispersed repeats and of Alu tandems, identification of Period 3 pattern in exons, implementation of ‘magnifying glass’ effect, identification of complex HOR pattern, identification of inter-tandem transitional dispersed repeat sequences and identification of long segmental duplications. GRM algorithm is convenient for use, in particular, in cases of large repeat units, of highly mutated and/or complex repeats, and of global repeat maps for large genomic sequences (chromosomes and genomes). PMID:22977183
Identification and characterization of tandem repeats in exon III of dopamine receptor D4 (DRD4) genes from different mammalian species.

PubMed

Larsen, Svend Arild; Mogensen, Line; Dietz, Rune; Baagøe, Hans Jørgen; Andersen, Mogens; Werge, Thomas; Rasmussen, Henrik Berg

2005-12-01

In this study we have identified and characterized dopamine receptor D4 (DRD4) exon III tandem repeats in 33 public available nucleotide sequences from different mammalian species. We found that the tandem repeat in canids could be described in a novel and simple way, namely, as a structure composed of 15- and 12- bp modules. Tandem repeats composed of 18-bp modules were found in sequences from the horse, zebra, onager, and donkey, Asiatic bear, polar bear, common raccoon, dolphin, harbor porpoise, and domestic cat. Several of these sequences have been analyzed previously without a tandem repeat being found. In the domestic cow and gray seal we identified tandem repeats composed of 36-bp modules, each consisting of two closely related 18-bp basic units. A tandem repeat consisting of 9-bp modules was identified in sequences from mink and ferret. In the European otter we detected an 18-bp tandem repeat, while a tandem repeat consisting of 27-bp modules was identified in a sequence from European badger. Both these tandem repeats were composed of 9-bp basic units, which were closely related with the 9-bp repeat modules identified in the mink and ferret. Tandem repeats could not be identified in sequences from rodents. All tandem repeats possessed a high GC content with a strong bias for C. On phylogenetic analysis of the tandem repeats evolutionary related species were clustered into the same groups. The degree of conservation of the tandem repeats varied significantly between species. The deduced amino acid sequences of most of the tandem repeats exhibited a high propensity for disorder. This was also the case with an amino acid sequence of the human DRD4 exon III tandem repeat, which was included in the study for comparative purposes. We identified proline-containing motifs for SH3 and WW domain binding proteins, potential phosphorylation sites, PDZ domain binding motifs, and FHA domain binding motifs in the amino acid sequences of the tandem repeats. The numbers of potential functional sites varied pronouncedly between species. Our observations provide a platform for future studies of the architecture and evolution of the DRD4 exon III tandem repeat, and they suggest that differences in the structure of this tandem repeat contribute to specialization and generation of diversity in receptor function.
Rate-determining Step of Flap Endonuclease 1 (FEN1) Reflects a Kinetic Bias against Long Flaps and Trinucleotide Repeat Sequences.

PubMed

Tarantino, Mary E; Bilotti, Katharina; Huang, Ji; Delaney, Sarah

2015-08-21

Flap endonuclease 1 (FEN1) is a structure-specific nuclease responsible for removing 5'-flaps formed during Okazaki fragment maturation and long patch base excision repair. In this work, we use rapid quench flow techniques to examine the rates of 5'-flap removal on DNA substrates of varying length and sequence. Of particular interest are flaps containing trinucleotide repeats (TNR), which have been proposed to affect FEN1 activity and cause genetic instability. We report that FEN1 processes substrates containing flaps of 30 nucleotides or fewer at comparable single-turnover rates. However, for flaps longer than 30 nucleotides, FEN1 kinetically discriminates substrates based on flap length and flap sequence. In particular, FEN1 removes flaps containing TNR sequences at a rate slower than mixed sequence flaps of the same length. Furthermore, multiple-turnover kinetic analysis reveals that the rate-determining step of FEN1 switches as a function of flap length from product release to chemistry (or a step prior to chemistry). These results provide a kinetic perspective on the role of FEN1 in DNA replication and repair and contribute to our understanding of FEN1 in mediating genetic instability of TNR sequences. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Complete chloroplast genome and 45S nrDNA sequences of the medicinal plant species Glycyrrhiza glabra and Glycyrrhiza uralensis.

PubMed

Kang, Sang-Ho; Lee, Jeong-Hoon; Lee, Hyun Oh; Ahn, Byoung Ohg; Won, So Youn; Sohn, Seong-Han; Kim, Jung Sun

2017-10-06

Glycyrrhiza uralensis and G. glabra, members of the Fabaceae, are medicinally important species that are native to Asia and Europe. Extracts from these plants are widely used as natural sweeteners because of their much greater sweetness than sucrose. In this study, the three complete chloroplast genomes and five 45S nuclear ribosomal (nr)DNA sequences of these two licorice species and an interspecific hybrid are presented. The chloroplast genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp, respectively. The three chloroplast genomes harbored 110 annotated genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The 45S nrDNA sequences were either 5,947 or 5,948 bp in length. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. The complete 45S nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA. We identified simple sequence repeat and tandem repeat sequences. We also developed four reliable markers for analysis of Glycyrrhiza diversity authentication.
DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats.

PubMed

de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

2015-11-16

Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Single molecule study of the intrinsically disordered FG-repeat nucleoporin 153.

PubMed

Milles, Sigrid; Lemke, Edward A

2011-10-05

Nucleoporins (Nups), which are intrinsically disordered, form a selectivity filter inside the nuclear pore complex, taking a central role in the vital nucleocytoplasmic transport mechanism. These Nups display a complex and nonrandom amino-acid architecture of phenylalanine glycine (FG)-repeat clusters and intra-FG linkers. How such heterogeneous sequence composition relates to function and could give rise to a transport mechanism is still unclear. Here we describe a combined chemical biology and single-molecule fluorescence approach to study the large human Nup153 FG-domain. In order to obtain insights into the properties of this domain beyond the average behavior, we probed the end-to-end distance (R(E)) of several ∼50-residues long FG-repeat clusters in the context of the whole protein domain. Despite the sequence heterogeneity of these FG-clusters, we detected a reoccurring and consistent compaction from a relaxed coil behavior under denaturing conditions (R(E)/R(E,RC) = 0.99 ± 0.15 with R(E,RC) corresponding to ideal relaxed coil behavior) to a collapsed state under native conditions (R(E)/R(E,RC) = 0.79 ± 0.09). We then analyzed the properties of this protein on the supramolecular level, and determined that this human FG-domain was in fact able to form a hydrogel with physiological permeability barrier properties. Copyright © 2011 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Chloroplast Genome Differences between Asian and American Equisetum arvense (Equisetaceae) and the Origin of the Hypervariable trnY-trnE Intergenic Spacer

PubMed Central

Kim, Hyoung Tae; Kim, Ki-Joong

2014-01-01

Comparative analyses of complete chloroplast (cp) DNA sequences within a species may provide clues to understand the population dynamics and colonization histories of plant species. Equisetum arvense (Equisetaceae) is a widely distributed fern species in northeastern Asia, Europe, and North America. The complete cp DNA sequences from Asian and American E. arvense individuals were compared in this study. The Asian E. arvense cp genome was 583 bp shorter than that of the American E. arvense. In total, 159 indels were observed between two individuals, most of which were concentrated on the hypervariable trnY-trnE intergenic spacer (IGS) in the large single-copy (LSC) region of the cp genome. This IGS region held a series of 19 bp repeating units. The numbers of the 19 bp repeat unit were responsible for 78% of the total length difference between the two cp genomes. Furthermore, only other closely related species of Equisetum also show the hypervariable nature of the trnY-trnE IGS. By contrast, only a single indel was observed in the gene coding regions: the ycf1 gene showed 24 bp differences between the two continental individuals due to a single tandem-repeat indel. A total of 165 single-nucleotide polymorphisms (SNPs) were recorded between the two cp genomes. Of these, 52 SNPs (31.5%) were distributed in coding regions, 13 SNPs (7.9%) were in introns, and 100 SNPs (60.6%) were in intergenic spacers (IGS). The overall difference between the Asian and American E. arvense cp genomes was 0.12%. Despite the relatively high genetic diversity between Asian and American E. arvense, the two populations are recognized as a single species based on their high morphological similarity. This indicated that the two regional populations have been in morphological stasis. PMID:25157804
The complete plastid genome sequence of Eustrephus latifolius (Asparagaceae: Lomandroideae).

PubMed

Kim, Hyoung Tae; Kim, Jung Sung; Kim, Joo-Hwan

2016-01-01

The complete chloroplast (cp) genome sequence of Eustrephus latifolius was firstly determined in subfamily Lomandriodeae of family Asparagaceae. It was 159,736 bp and contained a large single copy region (82,403 bp) and a small single copy region (13,607 bp) which were separated by two inverted repeat regions (31,863 bp). In total, 132 genes were identified and they were consisted of 83 coding genes, 8 rRNA genes, 38 tRNA genes, 3 pseudogenes. rpl23 and clpP were pseudogenes due to sequence deletions. Among 23 genes containing introns, rps12 and ycf3 contained two introns and the rest had just one intron. The intact ycf68 was identified within an intron of trnI-GAU. The amino acid sequence was almost identical with Phoenix dactylifera in Aracales. Ycf1 of E. latifolius was completely located in IR. It was similar to cp genome structure of Lemna minor, Spirodela polyrhiza, Wolffiella lingulata, Wolffia australiana in Alismatales.
Complete plastid genome sequence of goosegrass (Eleusine indica) and comparison with other Poaceae.

PubMed

Zhang, Hui; Hall, Nathan; McElroy, J Scott; Lowe, Elijah K; Goertzen, Leslie R

2017-02-05

Eleusine indica, also known as goosegrass, is a serious weed in at least 42 countries. In this paper we report the complete plastid genome sequence of goosegrass obtained by de novo assembly of paired-end and mate-paired reads generated by Illumina sequencing of total genomic DNA. The goosegrass plastome is a circular molecule of 135,151bp in length, consisting of two single-copy regions separated by a pair of inverted repeats (IRs) of 20,919 bases. The large (LSC) and the small (SSC) single-copy regions span 80,667 bases and 12,646 bases, respectively. The plastome of goosegrass has 38.19% GC content and includes 108 unique genes, of which 76 are protein-coding, 28 are transfer RNA, and 4 are ribosomal RNA. The goosegrass plastome sequence was compared to eight other species of Poaceae. Although generally conserved with respect to Poaceae, this genomic resource will be useful for evolutionary studies within this weed species and the genus Eleusine. Copyright © 2016. Published by Elsevier B.V.
Repeated sequence sets in mitochondrial DNA molecules of root knot nematodes (Meloidogyne): nucleotide sequences, genome location and potential for host-race identification.

PubMed Central

Okimoto, R; Chamberlin, H M; Macfarlane, J L; Wolstenholme, D R

1991-01-01

Within a 7 kb segment of the mtDNA molecule of the root knot nematode, Meloidogyne javanica, that lacks standard mitochondrial genes, are three sets of strictly tandemly arranged, direct repeat sequences: approximately 36 copies of a 102 ntp sequence that contains a TaqI site; 11 copies of a 63 ntp sequence, and 5 copies of an 8 ntp sequence. The 7 kb repeat-containing segment is bounded by putative tRNAasp and tRNAf-met genes and the arrangement of sequences within this segment is: the tRNAasp gene; a unique 1,528 ntp segment that contains two highly stable hairpin-forming sequences; the 102 ntp repeat set; the 8 ntp repeat set; a unique 1,068 ntp segment; the 63 ntp repeat set; and the tRNAf-met gene. The nucleotide sequences of the 102 ntp copies and the 63 ntp copies have been conserved among the species examined. Data from Southern hybridization experiments indicate that 102 ntp and 63 ntp repeats occur in the mtDNAs of three, two and two races of M.incognita, M.hapla and M.arenaria, respectively. Nucleotide sequences of the M.incognita Race-3 102 ntp repeat were found to be either identical or highly similar to those of the M.javanica 102 ntp repeat. Differences in migration distance and number of 102 ntp repeat-containing bands seen in Southern hybridization autoradiographs of restriction-digested mtDNAs of M.javanica and the different host races of M.incognita, M.hapla and M.arenaria are sufficient to distinguish the different host races of each species. Images PMID:2027769
Novel Structure of Ty3 Reverse Transcriptase | Center for Cancer Research

Cancer.gov

Retrotransposons are mobile genetic elements that self amplify via a single-stranded RNA intermediate, which is converted to double-stranded DNA by an encoded reverse transcriptase (RT) with both DNA polymerase (pol) and ribonuclease H (RNase) activities. Categorized by whether they contain flanking long terminal repeat (LTR) sequences, retrotransposons play a critical role in
A New SNP Haplotype associated with blue disease resistance gene in cotton (Gossypium hirsutum L.)

USDA-ARS?s Scientific Manuscript database

Resistance to cotton blue disease (CBD) was evaluated in 364 F2.3 families of 3 populations derived from resistant variety ‘Delta Opal’. The CBD resistance in ‘Delta Opal’ was controlled by one single dominant gene designated Cbd. Two simple sequence repeat (SSR) markers were identified as linked t...
The complete chloroplast genome of a medicinal plant Epimedium koreanum Nakai (Berberidaceae).

PubMed

Lee, Jung-Hoon; Kim, Kyunghee; Kim, Na-Rae; Lee, Sang-Choon; Yang, Tae-Jin; Kim, Young-Dong

2016-11-01

Epimedium koreanum is a perennial medicinal plant distributed in Eastern Asia. The complete chloroplast genome sequences of E. koreanum was obtained by de novo assembly using whole genome next-generation sequences. The chloroplast genome of E. koreanum was 157 218 bp in length and separated into four distinct regions such as large single copy region (89 600 bp), small single copy region (17 222 bp) and a pair of inverted repeat regions (25 198 bp). The genome contained a total of 112 genes including 78 protein-coding genes, 30 tRNA genes, and 4 rRNA genes. Phylogenetic analysis with the reported chloroplast genomes revealed that E. koreanum is most closely related to Berberis bealei, a traditional medicinal plant in the Berberidaceae family.
Deletions at short direct repeats and base substitutions are characteristic mutations for bleomycin-induced double- and single-strand breaks, respectively, in a human shuttle vector system

NASA Technical Reports Server (NTRS)

Dar, M. E.; Jorgensen, T. J.

1995-01-01

Using the radiomimetic drug, bleomycin, we have determined the mutagenic potential of DNA strand breaks in the shuttle vector pZ189 in human fibroblasts. The bleomycin treatment conditions used produce strand breaks with 3'-phosphoglycolate termini as > 95% of the detectable dose-dependent lesions. Breaks with this end group represent 50% of the strand break damage produced by ionizing radiation. We report that such strand breaks are mutagenic lesions. The type of mutation produced is largely determined by the type of strand break on the plasmid (i.e. single versus double). Mutagenesis studies with purified DNA forms showed that nicked plasmids (i.e. those containing single-strand breaks) predominantly produce base substitutions, the majority of which are multiples, which presumably originate from error-prone polymerase activity at strand break sites. In contrast, repair of linear plasmids (i.e. those containing double-strand breaks) mainly results in deletions at short direct repeat sequences, indicating the involvement of illegitimate recombination. The data characterize the nature of mutations produced by single- and double-strand breaks in human cells, and suggests that deletions at direct repeats may be a 'signature' mutation for the processing of DNA double-strand breaks.
Chloroplast Genome of the Folk Medicine and Vegetable Plant Talinum paniculatum (Jacq.) Gaertn.: Gene Organization, Comparative and Phylogenetic Analysis.

PubMed

Liu, Xia; Li, Yuan; Yang, Hongyuan; Zhou, Boyang

2018-04-09

The complete chloroplast (cp) genome of Talinum paniculatum (Caryophyllale), a source of pharmaceutical efficacy similar to ginseng, and a widely distributed and planted edible vegetable, were sequenced and analyzed. The cp genome size of T. paniculatum is 156,929 bp, with a pair of inverted repeats (IRs) of 25,751 bp separated by a large single copy (LSC) region of 86,898 bp and a small single copy (SSC) region of 18,529 bp. The genome contains 83 protein-coding genes, 37 transfer RNA (tRNA) genes, eight ribosomal RNA (rRNA) genes and four pseudogenes. Fifty one (51) repeat units and ninety two (92) simple sequence repeats (SSRs) were found in the genome. The pseudogene rpl23 (Ribosomal protein L23) was insert AATT than other Caryophyllale species by sequence alignment, which located in IRs region. The gene of trnK-UUU (tRNA-Lys) and rpl16 (Ribosomal protein L16) have larger introns in T. paniculatum , and the existence of matK (maturase K) genes, which usually located in the introns of trnK-UUU , rich sequence divergence in Caryophyllale. Complete cp genome comparison with other eight Caryophyllales species indicated that the differences between T. paniculatum and P. oleracea were very slight, and the most highly divergent regions occurred in intergenic spacers. Comparisons of IR boundaries among nine Caryophyllales species showed that T. paniculatum have larger IRs region and the contraction is relatively slight. The phylogenetic analysis among 35 Caryophyllales species and two outgroup species revealed that T. paniculatum and P. oleracea do not belong to the same family. All these results give good opportunities for future identification, barcoding of Talinum species, understanding the evolutionary mode of Caryophyllale cp genome and molecular breeding of T. paniculatum with high pharmaceutical efficacy.
Assessing the performance of the Oxford Nanopore Technologies MinION

PubMed Central

Laver, T.; Harrison, J.; O’Neill, P.A.; Moore, K.; Farbos, A.; Paszkiewicz, K.; Studholme, D.J.

2015-01-01

The Oxford Nanopore Technologies (ONT) MinION is a new sequencing technology that potentially offers read lengths of tens of kilobases (kb) limited only by the length of DNA molecules presented to it. The device has a low capital cost, is by far the most portable DNA sequencer available, and can produce data in real-time. It has numerous prospective applications including improving genome sequence assemblies and resolution of repeat-rich regions. Before such a technology is widely adopted, it is important to assess its performance and limitations in respect of throughput and accuracy. In this study we assessed the performance of the MinION by re-sequencing three bacterial genomes, with very different nucleotide compositions ranging from 28.6% to 70.7%; the high G + C strain was underrepresented in the sequencing reads. We estimate the error rate of the MinION (after base calling) to be 38.2%. Mean and median read lengths were 2 kb and 1 kb respectively, while the longest single read was 98 kb. The whole length of a 5 kb rRNA operon was covered by a single read. As the first nanopore-based single molecule sequencer available to researchers, the MinION is an exciting prospect; however, the current error rate limits its ability to compete with existing sequencing technologies, though we do show that MinION sequence reads can enhance contiguity of de novo assembly when used in conjunction with Illumina MiSeq data. PMID:26753127
Saccharomyces cerevisiae SSB1 protein and its relationship to nucleolar RNA-binding proteins.

PubMed

Jong, A Y; Clark, M W; Gilbert, M; Oehm, A; Campbell, J L

1987-08-01

To better define the function of Saccharomyces cerevisiae SSB1, an abundant single-stranded nucleic acid-binding protein, we determined the nucleotide sequence of the SSB1 gene and compared it with those of other proteins of known function. The amino acid sequence contains 293 amino acid residues and has an Mr of 32,853. There are several stretches of sequence characteristic of other eucaryotic single-stranded nucleic acid-binding proteins. At the amino terminus, residues 39 to 54 are highly homologous to a peptide in calf thymus UP1 and UP2 and a human heterogeneous nuclear ribonucleoprotein. Residues 125 to 162 constitute a fivefold tandem repeat of the sequence RGGFRG, the composition of which suggests a nucleic acid-binding site. Near the C terminus, residues 233 to 245 are homologous to several RNA-binding proteins. Of 18 C-terminal residues, 10 are acidic, a characteristic of the procaryotic single-stranded DNA-binding proteins and eucaryotic DNA- and RNA-binding proteins. In addition, examination of the subcellular distribution of SSB1 by immunofluorescence microscopy indicated that SSB1 is a nuclear protein, predominantly located in the nucleolus. Sequence homologies and the nucleolar localization make it likely that SSB1 functions in RNA metabolism in vivo, although an additional role in DNA metabolism cannot be excluded.
Characterization of proviruses cloned from mink cell focus-forming virus-infected cellular DNA.

PubMed Central

Khan, A S; Repaske, R; Garon, C F; Chan, H W; Rowe, W P; Martin, M A

1982-01-01

Two proviruses were cloned from EcoRI-digested DNA extracted from mink cells chronically infected with AKR mink cell focus-forming (MCF) 247 murine leukemia virus (MuLV), using a lambda phage host vector system. One cloned MuLV DNA fragment (designated MCF 1) contained sequences extending 6.8 kilobases from an EcoRI restriction site in the 5' long terminal repeat (LTR) to an EcoRI site located in the envelope (env) region and was indistinguishable by restriction endonuclease mapping for 5.1 kilobases (except for the EcoRI site in the LTR) from the 5' end of AKR ecotropic proviral DNA. The DNA segment extending from 5.1 to 6.8 kilobases contained several restriction sites that were not present in the AKR ecotropic provirus. A 0.5-kilobase DNA segment located at the 3' end of MCF 1 DNA contained sequences which hybridized to a xenotropic env-specific DNA probe but not to labeled ecotropic env-specific DNA. This dual character of MCF 1 proviral DNA was also confirmed by analyzing heteroduplex molecules by electron microscopy. The second cloned proviral DNA (designated MCF 2) was a 6.9-kilobase EcoRI DNA fragment which contained LTR sequences at each end and a 2.0-kilobase deletion encompassing most of the env region. The MCF 2 proviral DNA proved to be a useful reagent for detecting LTRs electron microscopically due to the presence of nonoverlapping, terminally located LTR sequences which effected its circularization with DNAs containing homologous LTR sequences. Nucleotide sequence analysis demonstrated the presence of a 104-base-pair direct repeat in the LTR of MCF 2 DNA. In contrast, only a single copy of the reiterated component of the direct repeat was present in MCF 1 DNA. Images PMID:6281459
Gapless genome assembly of Colletotrichum higginsianum reveals chromosome structure and association of transposable elements with secondary metabolite gene clusters.

PubMed

Dallery, Jean-Félix; Lapalu, Nicolas; Zampounis, Antonios; Pigné, Sandrine; Luyten, Isabelle; Amselem, Joëlle; Wittenberg, Alexander H J; Zhou, Shiguo; de Queiroz, Marisa V; Robin, Guillaume P; Auger, Annie; Hainaut, Matthieu; Henrissat, Bernard; Kim, Ki-Tae; Lee, Yong-Hwan; Lespinet, Olivier; Schwartz, David C; Thon, Michael R; O'Connell, Richard J

2017-08-29

The ascomycete fungus Colletotrichum higginsianum causes anthracnose disease of brassica crops and the model plant Arabidopsis thaliana. Previous versions of the genome sequence were highly fragmented, causing errors in the prediction of protein-coding genes and preventing the analysis of repetitive sequences and genome architecture. Here, we re-sequenced the genome using single-molecule real-time (SMRT) sequencing technology and, in combination with optical map data, this provided a gapless assembly of all twelve chromosomes except for the ribosomal DNA repeat cluster on chromosome 7. The more accurate gene annotation made possible by this new assembly revealed a large repertoire of secondary metabolism (SM) key genes (89) and putative biosynthetic pathways (77 SM gene clusters). The two mini-chromosomes differed from the ten core chromosomes in being repeat- and AT-rich and gene-poor but were significantly enriched with genes encoding putative secreted effector proteins. Transposable elements (TEs) were found to occupy 7% of the genome by length. Certain TE families showed a statistically significant association with effector genes and SM cluster genes and were transcriptionally active at particular stages of fungal development. All 24 subtelomeres were found to contain one of three highly-conserved repeat elements which, by providing sites for homologous recombination, were probably instrumental in four segmental duplications. The gapless genome of C. higginsianum provides access to repeat-rich regions that were previously poorly assembled, notably the mini-chromosomes and subtelomeres, and allowed prediction of the complete SM gene repertoire. It also provides insights into the potential role of TEs in gene and genome evolution and host adaptation in this asexual pathogen.
Sperm Bindin Divergence under Sexual Selection and Concerted Evolution in Sea Stars.

PubMed

Patiño, Susana; Keever, Carson C; Sunday, Jennifer M; Popovic, Iva; Byrne, Maria; Hart, Michael W

2016-08-01

Selection associated with competition among males or sexual conflict between mates can create positive selection for high rates of molecular evolution of gamete recognition genes and lead to reproductive isolation between species. We analyzed coding sequence and repetitive domain variation in the gene encoding the sperm acrosomal protein bindin in 13 diverse sea star species. We found that bindin has a conserved coding sequence domain structure in all 13 species, with several repeated motifs in a large central region that is similar among all sea stars in organization but highly divergent among genera in nucleotide and predicted amino acid sequence. More bindin codons and lineages showed positive selection for high relative rates of amino acid substitution in genera with gonochoric outcrossing adults (and greater expected strength of sexual selection) than in selfing hermaphrodites. That difference is consistent with the expectation that selfing (a highly derived mating system) may moderate the strength of sexual selection and limit the accumulation of bindin amino acid differences. The results implicate both positive selection on single codons and concerted evolution within the repetitive region in bindin divergence, and suggest that both single amino acid differences and repeat differences may affect sperm-egg binding and reproductive compatibility. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Combined deficiency of MSH2 and Sμ region abolishes class switch recombination.

PubMed

Leduc, Claire; Haddad, Dania; Laviolette-Malirat, Nathalie; Nguyen Huu, Ngoc-Sa; Khamlichi, Ahmed Amine

2010-10-01

Class switch recombination (CSR) is mediated by G-rich tandem repeated sequences termed switch regions. Transcription of switch regions generates single-stranded R loops that provide substrates for activation-induced cytidine deaminase. Mice deficient in MSH2 have a mild defect in CSR and analysis of their switch junctions has led to a model in which MSH2 is more critical for switch recombination events outside than within the tandem repeats. It is also known that deletion of the whole Sμ region severely impairs but does not abrogate CSR despite the lack of detectable R loops. Here, we demonstrate that deficiency of both MSH2 and the Sμ region completely abolishes CSR and that the abrogation occurs at the genomic level. This finding further supports the crucial role of MSH2 outside the tandem repeats. It also indicates that during CSR, MSH2 has access to activation-induced cytidine deaminase targets in R-loop-deficient Iμ-Cμ sequences rarely used in CSR, suggesting an MSH2-dependent DNA processing activity at the Iμ exon that may decrease with transcription elongation across the Sμ region.
5′CAG and 5′CTG Repeats Create Differential Impediment to the Progression of a Minimal Reconstituted T4 Replisome Depending on the Concentration of dNTPs

PubMed Central

Delagoutte, Emmanuelle; Baldacci, Giuseppe

2011-01-01

Instability of repetitive sequences originates from strand misalignment during repair or replicative DNA synthesis. To investigate the activity of reconstituted T4 replisomes across trinucleotide repeats (TNRs) during leading strand DNA synthesis, we developed a method to build replication miniforks containing a TNR unit of defined sequence and length. Each minifork consists of three strands, primer, leading strand template, and lagging strand template with a 5′ single-stranded (ss) tail. Each strand is prepared independently, and the minifork is assembled by hybridization of the three strands. Using these miniforks and a minimal reconstituted T4 replisome, we show that during leading strand DNA synthesis, the dNTP concentration dictates which strand of the structure-forming 5′CAG/5′CTG repeat creates the strongest impediment to the minimal replication complex. We discuss this result in the light of the known fluctuation of dNTP concentration during the cell cycle and cell growth and the known concentration balance among individual dNTPs. PMID:22096622
Expansion of inverted repeat does not decrease substitution rates in Pelargonium plastid genomes.

PubMed

Weng, Mao-Lun; Ruhlman, Tracey A; Jansen, Robert K

2017-04-01

For species with minor inverted repeat (IR) boundary changes in the plastid genome (plastome), nucleotide substitution rates were previously shown to be lower in the IR than the single copy regions (SC). However, the impact of large-scale IR expansion/contraction on plastid nucleotide substitution rates among closely related species remains unclear. We included plastomes from 22 Pelargonium species, including eight newly sequenced genomes, and used both pairwise and model-based comparisons to investigate the impact of the IR on sequence evolution in plastids. Ten types of plastome organization with different inversions or IR boundary changes were identified in Pelargonium. Inclusion in the IR was not sufficient to explain the variation of nucleotide substitution rates. Instead, the rate heterogeneity in Pelargonium plastomes was a mixture of locus-specific, lineage-specific and IR-dependent effects. Our study of Pelargonium plastomes that vary in IR length and gene content demonstrates that the evolutionary consequences of retaining these repeats are more complicated than previously suggested. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
Systematic Observation of an Expert Driver's Gaze Strategy—An On-Road Case Study

PubMed Central

Lappi, Otto; Rinkkala, Paavo; Pekkanen, Jami

2017-01-01

In this paper we present and qualitatively analyze an expert driver's gaze behavior in natural driving on a real road, with no specific experimental task or instruction. Previous eye tracking research on naturalistic tasks has revealed recurring patterns of gaze behavior that are surprisingly regular and repeatable. Lappi (2016) identified in the literature seven “qualitative laws of gaze behavior in the wild”: recurring patterns that tend to go together, the more so the more naturalistic the setting, all of them expected in extended sequences of fully naturalistic behavior. However, no study to date has observed all in a single experiment. Here, we wanted to do just that: present observations supporting all the “laws” in a single behavioral sequence by a single subject. We discuss the laws in terms of unresolved issues in driver modeling and open challenges for experimental and theoretical development. PMID:28496422
Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

PubMed

Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S

2015-01-01

In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes show putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

PubMed Central

Rehm, Charlotte; Wurmthaler, Lena A.; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S.

2015-01-01

In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1–5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6–9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes show putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria. PMID:26695179
Bisulfite-independent analysis of CpG island methylation enables genome-scale stratification of single cells

PubMed Central

Han, Lin; Wu, Hua-Jun; Zhu, Haiying; Kim, Kun-Yong; Marjani, Sadie L.; Riester, Markus; Euskirchen, Ghia; Zi, Xiaoyuan; Yang, Jennifer; Han, Jasper; Snyder, Michael; Park, In-Hyun; Irizarry, Rafael; Weissman, Sherman M.

2017-01-01

Abstract Conventional DNA bisulfite sequencing has been extended to single cell level, but the coverage consistency is insufficient for parallel comparison. Here we report a novel method for genome-wide CpG island (CGI) methylation sequencing for single cells (scCGI-seq), combining methylation-sensitive restriction enzyme digestion and multiple displacement amplification for selective detection of methylated CGIs. We applied this method to analyzing single cells from two types of hematopoietic cells, K562 and GM12878 and small populations of fibroblasts and induced pluripotent stem cells. The method detected 21 798 CGIs (76% of all CGIs) per cell, and the number of CGIs consistently detected from all 16 profiled single cells was 20 864 (72.7%), with 12 961 promoters covered. This coverage represents a substantial improvement over results obtained using single cell reduced representation bisulfite sequencing, with a 66-fold increase in the fraction of consistently profiled CGIs across individual cells. Single cells of the same type were more similar to each other than to other types, but also displayed epigenetic heterogeneity. The method was further validated by comparing the CpG methylation pattern, methylation profile of CGIs/promoters and repeat regions and 41 classes of known regulatory markers to the ENCODE data. Although not every minor methylation differences between cells are detectable, scCGI-seq provides a solid tool for unsupervised stratification of a heterogeneous cell population. PMID:28126923
Analysis of simple sequence repeat (SSR) structure and sequence within Epichloë endophyte genomes reveals impacts on gene structure and insights into ancestral hybridization events.

PubMed

Clayton, William; Eaton, Carla Jane; Dupont, Pierre-Yves; Gillanders, Tim; Cameron, Nick; Saikia, Sanjay; Scott, Barry

2017-01-01

Epichloë grass endophytes comprise a group of filamentous fungi of both sexual and asexual species. Known for the beneficial characteristics they endow upon their grass hosts, the identification of these endophyte species has been of great interest agronomically and scientifically. The use of simple sequence repeat loci and the variation in repeat elements has been used to rapidly identify endophyte species and strains, however, little is known of how the structure of repeat elements changes between species and strains, and where these repeat elements are located in the fungal genome. We report on an in-depth analysis of the structure and genomic location of the simple sequence repeat locus B10, commonly used for Epichloë endophyte species identification. The B10 repeat was found to be located within an exon of a putative bZIP transcription factor, suggesting possible impacts on polypeptide sequence and thus protein function. Analysis of this repeat in the asexual endophyte hybrid Epichloë uncinata revealed that the structure of B10 alleles reflects the ancestral species that hybridized to give rise to this species. Understanding the structure and sequence of these simple sequence repeats provides a useful set of tools for readily distinguishing strains and for gaining insights into the ancestral species that have undergone hybridization events.
5S ribosomal RNA gene repeat sequences define at least eight groups of plant trypanosomatids (Phytomonas spp.): phloem-restricted pathogens form a distinct section.

PubMed

Dollet, M; Sturm, N R; Sánchez-Moreno, M; Campbell, D A

2000-01-01

Trypanosomatids isolated from plants have been assigned typically into the genus Phytomonas. Such designations do not reflect the biology of the diverse isolates; confusion may arise due to the transient presence in plants of monogenetic (insect) trypanosomatids deposited by phytophagous bugs. To develop further molecular markers for the plant kinetoplastids, we have obtained the DNA sequence of the 5S ribosomal RNA gene from 24 isolates harvested from phloem, latex, and fruit. Small, distinct sequence differences were found at the 3'-ends of the transcribed regions; substantial sequence and size differences were found in the non-transcribed regions. Alignment of the gene sequences from all the isolates suggested the presence of eight groupings. While six groups contained isolates from single plant tissues, groups C and A contained isolates from both fruit and latex. The DNA sequences of the 10 phloem-restricted pathogenic isolates from South America and the Carribean were highly conserved and thus comprised a single group (H). The conserved nature of the 5S ribosomal RNA genes in these plant pathogens supports the proposal that they be considered as a distinct section, the phloemicola.
The sequence of camelpox virus shows it is most closely related to variola virus, the cause of smallpox.

PubMed

Gubser, Caroline; Smith, Geoffrey L

2002-04-01

Camelpox virus (CMPV) and variola virus (VAR) are orthopoxviruses (OPVs) that share several biological features and cause high mortality and morbidity in their single host species. The sequence of a virulent CMPV strain was determined; it is 202182 bp long, with inverted terminal repeats (ITRs) of 6045 bp and has 206 predicted open reading frames (ORFs). As for other poxviruses, the genes are tightly packed with little non-coding sequence. Most genes within 25 kb of each terminus are transcribed outwards towards the terminus, whereas genes within the centre of the genome are transcribed from either DNA strand. The central region of the genome contains genes that are highly conserved in other OPVs and 87 of these are conserved in all sequenced chordopoxviruses. In contrast, genes towards either terminus are more variable and encode proteins involved in host range, virulence or immunomodulation. In some cases, these are broken versions of genes found in other OPVs. The relationship of CMPV to other OPVs was analysed by comparisons of DNA and predicted protein sequences, repeats within the ITRs and arrangement of ORFs within the terminal regions. Each comparison gave the same conclusion: CMPV is the closest known virus to variola virus, the cause of smallpox.
TRAP: automated classification, quantification and annotation of tandemly repeated sequences.

PubMed

Sobreira, Tiago José P; Durham, Alan M; Gruber, Arthur

2006-02-01

TRAP, the Tandem Repeats Analysis Program, is a Perl program that provides a unified set of analyses for the selection, classification, quantification and automated annotation of tandemly repeated sequences. TRAP uses the results of the Tandem Repeats Finder program to perform a global analysis of the satellite content of DNA sequences, permitting researchers to easily assess the tandem repeat content for both individual sequences and whole genomes. The results can be generated in convenient formats such as HTML and comma-separated values. TRAP can also be used to automatically generate annotation data in the format of feature table and GFF files.
Repeated dose comparison of nomifensine, imipramine and placebo on subjective assessments of sleep and objective measures of psychomotor performance

PubMed Central

Hindmarch, I.; Parrott, A. C.

1977-01-01

1 Nine normal subjects volunteered to participate in a randomized single-blind crossover study of nomifensine 75 mg and two comparators, imipramine 75 mg and placebo. 2 Each volunteer received placebo for 3 d, then the first test drug for 4 days. This sequence was repeated twice more, so that each subject received each comparator. All medication was taken three times daily. 3 Assessments were made on days 3, 5 and 7 of each sequence, and consisted of a Sleep Evaluation Questionnaire, a test of Critical Flicker Fusion and a measurement of Complex Reaction Time (CRT). 4 There were no significant differences in the CRT. There was a significant increase in critical flicker fusion with nomifensine. 5 Although both nomifensine and imipramine disturbed the quality of sleep, only imipramine produced a hangover. PMID:334219
Auditory Attentional Capture during Serial Recall: Violations at Encoding of an Algorithm-Based Neural Model?

ERIC Educational Resources Information Center

Hughes, Robert W.; Vachon, Francois; Jones, Dylan M.

2005-01-01

A novel attentional capture effect is reported in which visual-verbal serial recall was disrupted if a single deviation in the interstimulus interval occurred within otherwise regularly presented task-irrelevant spoken items. The degree of disruption was the same whether the temporal deviant was embedded in a sequence made up of a repeating item…
Structure and possible function of a G-quadruplex in the long terminal repeat of the proviral HIV-1 genome.

PubMed

De Nicola, Beatrice; Lech, Christopher J; Heddi, Brahim; Regmi, Sagar; Frasson, Ilaria; Perrone, Rosalba; Richter, Sara N; Phan, Anh Tuân

2016-07-27

The long terminal repeat (LTR) of the proviral human immunodeficiency virus (HIV)-1 genome is integral to virus transcription and host cell infection. The guanine-rich U3 region within the LTR promoter, previously shown to form G-quadruplex structures, represents an attractive target to inhibit HIV transcription and replication. In this work, we report the structure of a biologically relevant G-quadruplex within the LTR promoter region of HIV-1. The guanine-rich sequence designated LTR-IV forms a well-defined structure in physiological cationic solution. The nuclear magnetic resonance (NMR) structure of this sequence reveals a parallel-stranded G-quadruplex containing a single-nucleotide thymine bulge, which participates in a conserved stacking interaction with a neighboring single-nucleotide adenine loop. Transcription analysis in a HIV-1 replication competent cell indicates that the LTR-IV region may act as a modulator of G-quadruplex formation in the LTR promoter. Consequently, the LTR-IV G-quadruplex structure presented within this work could represent a valuable target for the design of HIV therapeutics. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
The Energy Landscapes of Repeat-Containing Proteins: Topology, Cooperativity, and the Folding Funnels of One-Dimensional Architectures

PubMed Central

Komives, Elizabeth A.; Wolynes, Peter G.

2008-01-01

Repeat-proteins are made up of near repetitions of 20– to 40–amino acid stretches. These polypeptides usually fold up into non-globular, elongated architectures that are stabilized by the interactions within each repeat and those between adjacent repeats, but that lack contacts between residues distant in sequence. The inherent symmetries both in primary sequence and three-dimensional structure are reflected in a folding landscape that may be analyzed as a quasi–one-dimensional problem. We present a general description of repeat-protein energy landscapes based on a formal Ising-like treatment of the elementary interaction energetics in and between foldons, whose collective ensemble are treated as spin variables. The overall folding properties of a complete “domain” (the stability and cooperativity of the repeating array) can be derived from this microscopic description. The one-dimensional nature of the model implies there are simple relations for the experimental observables: folding free-energy (ΔGwater) and the cooperativity of denaturation (m-value), which do not ordinarily apply for globular proteins. We show how the parameters for the “coarse-grained” description in terms of foldon spin variables can be extracted from more detailed folding simulations on perfectly funneled landscapes. To illustrate the ideas, we present a case-study of a family of tetratricopeptide (TPR) repeat proteins and quantitatively relate the results to the experimentally observed folding transitions. Based on the dramatic effect that single point mutations exert on the experimentally observed folding behavior, we speculate that natural repeat proteins are “poised” at particular ratios of inter- and intra-element interaction energetics that allow them to readily undergo structural transitions in physiologically relevant conditions, which may be intrinsically related to their biological functions. PMID:18483553
Attentional capture by taboo words: A functional view of auditory distraction.

PubMed

Röer, Jan P; Körner, Ulrike; Buchner, Axel; Bell, Raoul

2017-06-01

It is well established that task-irrelevant, to-be-ignored speech adversely affects serial short-term memory (STM) for visually presented items compared with a quiet control condition. However, there is an ongoing debate about whether the semantic content of the speech has the capacity to capture attention and to disrupt memory performance. In the present article, we tested whether taboo words are more difficult to ignore than neutral words. Taboo words or neutral words were presented as (a) steady state sequences in which the same distractor word was repeated, (b) changing state sequences in which different distractor words were presented, and (c) auditory deviant sequences in which a single distractor word deviated from a sequence of repeated words. Experiments 1 and 2 showed that taboo words disrupted performance more than neutral words. This taboo effect did not habituate and it did not differ between individuals with high and low working memory capacity. In Experiments 3 and 4, in which only a single deviant taboo word was presented, no taboo effect was obtained. These results do not support the idea that the processing of the auditory distractors' semantic content is the result of occasional attention switches to the auditory modality. Instead, the overall pattern of results is more in line with a functional view of auditory distraction, according to which the to-be-ignored modality is routinely monitored for potentially important stimuli (e.g., self-relevant or threatening information), the detection of which draws processing resources away from the primary task. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Evolution of the rpoB-psbZ region in fern plastid genomes: notable structural rearrangements and highly variable intergenic spacers

PubMed Central

2011-01-01

Background The rpoB-psbZ (BZ) region of some fern plastid genomes (plastomes) has been noted to go through considerable genomic changes. Unraveling its evolutionary dynamics across all fern lineages will lead to clarify the fundamental process shaping fern plastome structure and organization. Results A total of 24 fern BZ sequences were investigated with taxon sampling covering all the extant fern orders. We found that: (i) a tree fern Plagiogyria japonica contained a novel gene order that can be generated from either the ancestral Angiopteris type or the derived Adiantum type via a single inversion; (ii) the trnY-trnE intergenic spacer (IGS) of the filmy fern Vandenboschia radicans was expanded 3-fold due to the tandem 27-bp repeats which showed strong sequence similarity with the anticodon domain of trnY; (iii) the trnY-trnE IGSs of two horsetail ferns Equisetum ramosissimum and E. arvense underwent an unprecedented 5-kb long expansion, more than a quarter of which was consisted of a single type of direct repeats also relevant to the trnY anticodon domain; and (iv) ycf66 has independently lost at least four times in ferns. Conclusions Our results provided fresh insights into the evolutionary process of fern BZ regions. The intermediate BZ gene order was not detected, supporting that the Adiantum type was generated by two inversions occurring in pairs. The occurrence of Vandenboschia 27-bp repeats represents the first evidence of partial tRNA gene duplication in fern plastomes. Repeats potentially forming a stem-loop structure play major roles in the expansion of the trnY-trnE IGS. PMID:21486489
Evolution of the rpoB-psbZ region in fern plastid genomes: notable structural rearrangements and highly variable intergenic spacers.

PubMed

Gao, Lei; Zhou, Yuan; Wang, Zhi-Wei; Su, Ying-Juan; Wang, Ting

2011-04-13

The rpoB-psbZ (BZ) region of some fern plastid genomes (plastomes) has been noted to go through considerable genomic changes. Unraveling its evolutionary dynamics across all fern lineages will lead to clarify the fundamental process shaping fern plastome structure and organization. A total of 24 fern BZ sequences were investigated with taxon sampling covering all the extant fern orders. We found that: (i) a tree fern Plagiogyria japonica contained a novel gene order that can be generated from either the ancestral Angiopteris type or the derived Adiantum type via a single inversion; (ii) the trnY-trnE intergenic spacer (IGS) of the filmy fern Vandenboschia radicans was expanded 3-fold due to the tandem 27-bp repeats which showed strong sequence similarity with the anticodon domain of trnY; (iii) the trnY-trnE IGSs of two horsetail ferns Equisetum ramosissimum and E. arvense underwent an unprecedented 5-kb long expansion, more than a quarter of which was consisted of a single type of direct repeats also relevant to the trnY anticodon domain; and (iv) ycf66 has independently lost at least four times in ferns. Our results provided fresh insights into the evolutionary process of fern BZ regions. The intermediate BZ gene order was not detected, supporting that the Adiantum type was generated by two inversions occurring in pairs. The occurrence of Vandenboschia 27-bp repeats represents the first evidence of partial tRNA gene duplication in fern plastomes. Repeats potentially forming a stem-loop structure play major roles in the expansion of the trnY-trnE IGS.
Effect of Repeat Copy Number on Variable-Number Tandem Repeat Mutations in Escherichia coli O157:H7

PubMed Central

Vogler, Amy J.; Keys, Christine; Nemoto, Yoshimi; Colman, Rebecca E.; Jay, Zack; Keim, Paul

2006-01-01

Variable-number tandem repeat (VNTR) loci have shown a remarkable ability to discriminate among isolates of the recently emerged clonal pathogen Escherichia coli O157:H7, making them a very useful molecular epidemiological tool. However, little is known about the rates at which these sequences mutate, the factors that affect mutation rates, or the mechanisms by which mutations occur at these loci. Here, we measure mutation rates for 28 VNTR loci and investigate the effects of repeat copy number and mismatch repair on mutation rate using in vitro-generated populations for 10 E. coli O157:H7 strains. We find single-locus rates as high as 7.0 × 10−4 mutations/generation and a combined 28-locus rate of 6.4 × 10−4 mutations/generation. We observed single- and multirepeat mutations that were consistent with a slipped-strand mispairing mutation model, as well as a smaller number of large repeat copy number mutations that were consistent with recombination-mediated events. Repeat copy number within an array was strongly correlated with mutation rate both at the most mutable locus, O157-10 (r2 = 0.565, P = 0.0196), and across all mutating loci. The combined locus model was significant whether locus O157-10 was included (r2 = 0.833, P < 0.0001) or excluded (r2 = 0.452, P < 0.0001) from the analysis. Deficient mismatch repair did not affect mutation rate at any of the 28 VNTRs with repeat unit sizes of >5 bp, although a poly(G) homomeric tract was destabilized in the mutS strain. Finally, we describe a general model for VNTR mutations that encompasses insertions and deletions, single- and multiple-repeat mutations, and their relative frequencies based upon our empirical mutation rate data. PMID:16740932
The complete chloroplast genome sequence of Mahonia bealei (Berberidaceae) reveals a significant expansion of the inverted repeat and phylogenetic relationship with other angiosperms.

PubMed

Ma, Ji; Yang, Bingxian; Zhu, Wei; Sun, Lianli; Tian, Jingkui; Wang, Xumin

2013-10-10

Mahonia bealei (Berberidaceae) is a frequently-used traditional Chinese medicinal plant with efficient anti-inflammatory ability. This plant is one of the sources of berberine, a new cholesterol-lowering drug with anti-diabetic activity. We have sequenced the complete nucleotide sequence of the chloroplast (cp) genome of M. bealei. The complete cp genome of M. bealei is 164,792 bp in length, and has a typical structure with large (LSC 73,052 bp) and small (SSC 18,591 bp) single-copy regions separated by a pair of inverted repeats (IRs 36,501 bp) of large size. The Mahonia cp genome contains 111 unique genes and 39 genes are duplicated in the IR regions. The gene order and content of M. bealei are almost unarranged which is consistent with the hypothesis that large IRs stabilize cp genome and reduce gene loss-and-gain probabilities during evolutionary process. A large IR expansion of over 12 kb has occurred in M. bealei, 15 genes (rps19, rpl22, rps3, rpl16, rpl14, rps8, infA, rpl36, rps11, petD, petB, psbH, psbN, psbT and psbB) have expanded to have an additional copy in the IRs. The IR expansion rearrangement occurred via a double-strand DNA break and subsequence repair, which is different from the ordinary gene conversion mechanism. Repeat analysis identified 39 direct/inverted repeats 30 bp or longer with a sequence identity ≥ 90%. Analysis also revealed 75 simple sequence repeat (SSR) loci and almost all are composed of A or T, contributing to a distinct bias in base composition. Comparison of protein-coding sequences with ESTs reveals 9 putative RNA edits and 5 of them resulted in non-synonymous modifications in rpoC1, rps2, rps19 and ycf1. Phylogenetic analysis using maximum parsimony (MP) and maximum likelihood (ML) was performed on a dataset composed of 65 protein-coding genes from 25 taxa, which yields an identical tree topology as previous plastid-based trees, and provides strong support for the sister relationship between Ranunculaceae and Berberidaceae. Molecular dating analyses suggest that Ranunculaceae and Berberidaceae diverged between 90 and 84 mya, which is congruent with the fossil records and with recent estimates of the divergence time of these two taxa. © 2013.

Characterization of the patterns of polymorphism in a [open quotes]cryptic repeat[close quotes] reveals a novel type of hypervariable sequence

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jacobson, D.P.; Schmeling, P.; Sommer, S.S.

Alternating purine and pyrimidine repeats (RY(i)) are an abundant source of polymorphism. The subset with long tandem repeats of GT or AC (GT(i)) have been studied extensively, but cryptic RY(i) (i.e., no single tandem repeat predominates) have received little attention. The factor IX gene has a polymorphic cryptic RY(i) of 142-216 bp. Previously, there were four known polymorphic alleles, of the form AB, A[sub 2]B, A[sub 2]B[sub 2], and A[sub 3]B[sub 2], where A = (GT)(AC)[sub 3](AT)[sub 3](GT)(AT)[sub 4] and B = A with an additional 3' AT dinucleotide. To further characterize this locus, the authors examined more than 1,700more » additional human chromosomes and determined the sequences of the homologous sites in orangutans and chimpanzees. The novel alleles found in humans expand the repertoire of A/B alleles to A[sub 0-4]B[sub 1] and A[sub 1-3]B[sub 2]. The A[sub n]B[sub 2] series are abundant in Caucasians but are absent in blacks and Asians. Conversely, the A[sub 0]B[sub 1] allele is common in blacks but is not found in more than 1,700 Caucasian chromosomes. The data are compatible with a model in which recombination is more frequent than polymerase slippage at this locus. In orangutans, the RY(i) is present, but the sequence is markedly different. An A/B-type of pattern was discerned in which B differs from A by an additional six (AT) dinucleotides at the 3' end. In chimpanzees, the size of the RY(i) locus was greatly expanded, and the sequence showed a novel pattern of hypervariability in which there are many tandem repeats of the form (GT)[sub n](AC)[sub 0](AT)[sub p](GT)[sub q](AT)[sub s], where n, o, p, q, and s are different integers. The sequences of the factor IX intron 1 cryptic RY(i) in three primates provide perspective on the range of possible patterns of polymorphism. Analysis of the patterns suggests how the RY(i) can be conserved during evolution, while the precise sequence varies. 25 refs., 5 figs., 3 tabs.« less
The complete chloroplast genome of the Dendrobium strongylanthum (Orchidaceae: Epidendroideae).

PubMed

Li, Jing; Chen, Chen; Wang, Zhe-Zhi

2016-07-01

Complete chloroplast genome sequence is very useful for studying the phylogenetic and evolution of species. In this study, the complete chloroplast genome of Dendrobium strongylanthum was constructed from whole-genome Illumina sequencing data. The chloroplast genome is 153 058 bp in length with 37.6% GC content and consists of two inverted repeats (IRs) of 26 316 bp. The IR regions are separated by large single-copy region (LSC, 85 836 bp) and small single-copy (SSC, 14 590 bp) region. A total of 130 chloroplast genes were successfully annotated, including 84 protein coding genes, 38 tRNA genes, and eight rRNA genes. Phylogenetic analyses showed that the chloroplast genome of Dendrobium strongylanthum is related to that of the Dendrobium officinal.
Large diversity of the piggyBac-like elements in the genome of Tribolium castaneum

PubMed Central

Wang, Jianjun; Du, Yuzhou; Wang, Suzhi; Brown, Sue; Park, Yoonseong

2011-01-01

The piggyBac transposable element, originally discovered in the cabbage looper, Trichoplusia ni, has been widely used in insect transgenesis including the red flour beetle Tribolium castaneum. We surveyed piggyBac-like (PLE) sequences in the genome of Tribolium castaneum by homology searches using as queries the diverse PLE sequences that have been described previously. The search yielded a total of 32 piggyBac-like elements (TcPLEs) which were classified into 14 distinct groups. Most of the TcPLEs contain defective functional motifs in that they are lacking inverted terminal repeats or have disrupted open reading frames. Only one single copy of TcPLE1 appears to be intact with imperfect 16 bp inverted terminal repeats flanking an open reading frame encoding a transposase of 571 amino acid residues. Many copies of TcPLEs were found to be inserted into or close to other transposon-like sequences. This large diversity of TcPLEs with generally low copy numbers suggests multiple invasions of the TcPLEs over a long evolutionary time without extensive multiplications or occurrence of rapid loss of TcPLEs copies. PMID:18342253
A Rapid Method for Optimizing Running Temperature of Electrophoresis through Repetitive On-Chip CE Operations

PubMed Central

Kaneda, Shohei; Ono, Koichi; Fukuba, Tatsuhiro; Nojima, Takahiko; Yamamoto, Takatoki; Fujii, Teruo

2011-01-01

In this paper, a rapid and simple method to determine the optimal temperature conditions for denaturant electrophoresis using a temperature-controlled on-chip capillary electrophoresis (CE) device is presented. Since on-chip CE operations including sample loading, injection and separation are carried out just by switching the electric field, we can repeat consecutive run-to-run CE operations on a single on-chip CE device by programming the voltage sequences. By utilizing the high-speed separation and the repeatability of the on-chip CE, a series of electrophoretic operations with different running temperatures can be implemented. Using separations of reaction products of single-stranded DNA (ssDNA) with a peptide nucleic acid (PNA) oligomer, the effectiveness of the presented method to determine the optimal temperature conditions required to discriminate a single-base substitution (SBS) between two different ssDNAs is demonstrated. It is shown that a single run for one temperature condition can be executed within 4 min, and the optimal temperature to discriminate the SBS could be successfully found using the present method. PMID:21845077
Non-radioactive detection of trinucleotide repeat size variability.

PubMed

Tomé, Stéphanie; Nicole, Annie; Gomes-Pereira, Mario; Gourdon, Genevieve

2014-03-06

Many human diseases are associated with the abnormal expansion of unstable trinucleotide repeat sequences. The mechanisms of trinucleotide repeat size mutation have not been fully dissected, and their understanding must be grounded on the detailed analysis of repeat size distributions in human tissues and animal models. Small-pool PCR (SP-PCR) is a robust, highly sensitive and efficient PCR-based approach to assess the levels of repeat size variation, providing both quantitative and qualitative data. The method relies on the amplification of a very low number of DNA molecules, through sucessive dilution of a stock genomic DNA solution. Radioactive Southern blot hybridization is sensitive enough to detect SP-PCR products derived from single template molecules, separated by agarose gel electrophoresis and transferred onto DNA membranes. We describe a variation of the detection method that uses digoxigenin-labelled locked nucleic acid probes. This protocol keeps the sensitivity of the original method, while eliminating the health risks associated with the manipulation of radiolabelled probes, and the burden associated with their regulation, manipulation and waste disposal.
Excess single-stranded DNA inhibits meiotic double-strand break repair.

PubMed

Johnson, Rebecca; Borde, Valérie; Neale, Matthew J; Bishop-Bailey, Anna; North, Matthew; Harris, Sheila; Nicolas, Alain; Goldman, Alastair S H

2007-11-01

During meiosis, self-inflicted DNA double-strand breaks (DSBs) are created by the protein Spo11 and repaired by homologous recombination leading to gene conversions and crossovers. Crossover formation is vital for the segregation of homologous chromosomes during the first meiotic division and requires the RecA orthologue, Dmc1. We analyzed repair during meiosis of site-specific DSBs created by another nuclease, VMA1-derived endonuclease (VDE), in cells lacking Dmc1 strand-exchange protein. Turnover and resection of the VDE-DSBs was assessed in two different reporter cassettes that can repair using flanking direct repeat sequences, thereby obviating the need for a Dmc1-dependent DNA strand invasion step. Access of the single-strand binding complex replication protein A, which is normally used in all modes of DSB repair, was checked in chromatin immunoprecipitation experiments, using antibody against Rfa1. Repair of the VDE-DSBs was severely inhibited in dmc1Delta cells, a defect that was associated with a reduction in the long tract resection required to initiate single-strand annealing between the flanking repeat sequences. Mutants that either reduce Spo11-DSB formation or abolish resection at Spo11-DSBs rescued the repair block. We also found that a replication protein A component, Rfa1, does not accumulate to expected levels at unrepaired single-stranded DNA (ssDNA) in dmc1Delta cells. The requirement of Dmc1 for VDE-DSB repair using flanking repeats appears to be caused by the accumulation of large quantities of ssDNA that accumulate at Spo11-DSBs when Dmc1 is absent. We propose that these resected DSBs sequester both resection machinery and ssDNA binding proteins, which in wild-type cells would normally be recycled as Spo11-DSBs repair. The implication is that repair proteins are in limited supply, and this could reflect an underlying mechanism for regulating DSB repair in wild-type cells, providing protection from potentially harmful effects of overabundant repair proteins.
Excess Single-Stranded DNA Inhibits Meiotic Double-Strand Break Repair

PubMed Central

Bishop-Bailey, Anna; North, Matthew; Harris, Sheila; Nicolas, Alain; Goldman, Alastair S. H

2007-01-01

During meiosis, self-inflicted DNA double-strand breaks (DSBs) are created by the protein Spo11 and repaired by homologous recombination leading to gene conversions and crossovers. Crossover formation is vital for the segregation of homologous chromosomes during the first meiotic division and requires the RecA orthologue, Dmc1.We analyzed repair during meiosis of site-specific DSBs created by another nuclease, VMA1-derived endonuclease (VDE), in cells lacking Dmc1 strand-exchange protein. Turnover and resection of the VDE-DSBs was assessed in two different reporter cassettes that can repair using flanking direct repeat sequences, thereby obviating the need for a Dmc1-dependent DNA strand invasion step. Access of the single-strand binding complex replication protein A, which is normally used in all modes of DSB repair, was checked in chromatin immunoprecipitation experiments, using antibody against Rfa1. Repair of the VDE-DSBs was severely inhibited in dmc1Δ cells, a defect that was associated with a reduction in the long tract resection required to initiate single-strand annealing between the flanking repeat sequences. Mutants that either reduce Spo11-DSB formation or abolish resection at Spo11-DSBs rescued the repair block. We also found that a replication protein A component, Rfa1, does not accumulate to expected levels at unrepaired single-stranded DNA (ssDNA) in dmc1Δ cells. The requirement of Dmc1 for VDE-DSB repair using flanking repeats appears to be caused by the accumulation of large quantities of ssDNA that accumulate at Spo11-DSBs when Dmc1 is absent. We propose that these resected DSBs sequester both resection machinery and ssDNA binding proteins, which in wild-type cells would normally be recycled as Spo11-DSBs repair. The implication is that repair proteins are in limited supply, and this could reflect an underlying mechanism for regulating DSB repair in wild-type cells, providing protection from potentially harmful effects of overabundant repair proteins. PMID:18081428
Disruption of Higher Order DNA Structures in Friedreich’s Ataxia (GAA)n Repeats by PNA or LNA Targeting

PubMed Central

Bergquist, Helen; Rocha, Cristina S. J.; Álvarez-Asencio, Rubén; Nguyen, Chi-Hung; Rutland, Mark. W.; Smith, C. I. Edvard; Good, Liam; Nielsen, Peter E.; Zain, Rula

2016-01-01

Expansion of (GAA)n repeats in the first intron of the Frataxin gene is associated with reduced mRNA and protein levels and the development of Friedreich’s ataxia. (GAA)n expansions form non-canonical structures, including intramolecular triplex (H-DNA), and R-loops and are associated with epigenetic modifications. With the aim of interfering with higher order H-DNA (like) DNA structures within pathological (GAA)n expansions, we examined sequence-specific interaction of peptide nucleic acid (PNA) with (GAA)n repeats of different lengths (short: n=9, medium: n=75 or long: n=115) by chemical probing of triple helical and single stranded regions. We found that a triplex structure (H-DNA) forms at GAA repeats of different lengths; however, single stranded regions were not detected within the medium size pathological repeat, suggesting the presence of a more complex structure. Furthermore, (GAA)4-PNA binding of the repeat abolished all detectable triplex DNA structures, whereas (CTT)5-PNA did not. We present evidence that (GAA)4-PNA can invade the DNA at the repeat region by binding the DNA CTT strand, thereby preventing non-canonical-DNA formation, and that triplex invasion complexes by (CTT)5-PNA form at the GAA repeats. Locked nucleic acid (LNA) oligonucleotides also inhibited triplex formation at GAA repeat expansions, and atomic force microscopy analysis showed significant relaxation of plasmid morphology in the presence of GAA-LNA. Thus, by inhibiting disease related higher order DNA structures in the Frataxin gene, such PNA and LNA oligomers may have potential for discovery of drugs aiming at recovering Frataxin expression. PMID:27846236
Quantitative comparison between a multiecho sequence and a single-echo sequence for susceptibility-weighted phase imaging.

PubMed

Gilbert, Guillaume; Savard, Geneviève; Bard, Céline; Beaudoin, Gilles

2012-06-01

The aim of this study was to investigate the benefits arising from the use of a multiecho sequence for susceptibility-weighted phase imaging using a quantitative comparison with a standard single-echo acquisition. Four healthy adult volunteers were imaged on a clinical 3-T system using a protocol comprising two different three-dimensional susceptibility-weighted gradient-echo sequences: a standard single-echo sequence and a multiecho sequence. Both sequences were repeated twice in order to evaluate the local noise contribution by a subtraction of the two acquisitions. For the multiecho sequence, the phase information from each echo was independently unwrapped, and the background field contribution was removed using either homodyne filtering or the projection onto dipole fields method. The phase information from all echoes was then combined using a weighted linear regression. R2 maps were also calculated from the multiecho acquisitions. The noise standard deviation in the reconstructed phase images was evaluated for six manually segmented regions of interest (frontal white matter, posterior white matter, globus pallidus, putamen, caudate nucleus and lateral ventricle). The use of the multiecho sequence for susceptibility-weighted phase imaging led to a reduction of the noise standard deviation for all subjects and all regions of interest investigated in comparison to the reference single-echo acquisition. On average, the noise reduction ranged from 18.4% for the globus pallidus to 47.9% for the lateral ventricle. In addition, the amount of noise reduction was found to be strongly inversely correlated to the estimated R2 value (R=-0.92). In conclusion, the use of a multiecho sequence is an effective way to decrease the noise contribution in susceptibility-weighted phase images, while preserving both contrast and acquisition time. The proposed approach additionally permits the calculation of R2 maps. Copyright © 2012 Elsevier Inc. All rights reserved.
In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae.

PubMed

Macas, Jiří; Novák, Petr; Pellicer, Jaume; Čížková, Jana; Koblížková, Andrea; Neumann, Pavel; Fuková, Iva; Doležel, Jaroslav; Kelly, Laura J; Leitch, Ilia J

2015-01-01

The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.
Methods for sequencing GC-rich and CCT repeat DNA templates

DOEpatents

Robinson, Donna L.

2007-02-20

The present invention is directed to a PCR-based method of cycle sequencing DNA and other polynucleotide sequences having high CG content and regions of high GC content, and includes for example DNA strands with a high Cytosine and/or Guanosine content and repeated motifs such as CCT repeats.
The Contribution of Short Repeats of Low Sequence Complexity to Large Conifer Genomes

Treesearch

A. Schmidt; R.L. Doudrick; J.S. Heslop-Harrison; T. Schmidt

2000-01-01

Abstract: The abundance and genomic organization of six simple sequence repeats, consisting of di-, tri-, and tetranucleotide sequence motifs, and a minisatellite repeat have been analyzed in different gymnosperms by Southern hybridization. Within the gymnosperm genomes investigated, the abundance and genomic organization of micro- and...
Always look on both sides: Phylogenetic information conveyed by simple sequence repeat allele sequences

USDA-ARS?s Scientific Manuscript database

Simple sequence repeat (SSR) markers are widely used tools for inferences about genetic diversity, phylogeography and spatial genetic structure. Their applications assume that variation among alleles is essentially caused by an expansion or contraction of the number of repeats and that, accessorily,...
A Nomadic Subtelomeric Disease Resistance Gene Cluster in Common Bean1[W

PubMed Central

David, Perrine; Chen, Nicolas W.G.; Pedrosa-Harand, Andrea; Thareau, Vincent; Sévignac, Mireille; Cannon, Steven B.; Debouck, Daniel; Langin, Thierry; Geffroy, Valérie

2009-01-01

The B4 resistance (R) gene cluster is one of the largest clusters known in common bean (Phaseolus vulgaris [Pv]). It is located in a peculiar genomic environment in the subtelomeric region of the short arm of chromosome 4, adjacent to two heterochromatic blocks (knobs). We sequenced 650 kb spanning this locus and annotated 97 genes, 26 of which correspond to Coiled-Coil-Nucleotide-Binding-Site-Leucine-Rich-Repeat (CNL). Conserved microsynteny was observed between the Pv B4 locus and corresponding regions of Medicago truncatula and Lotus japonicus in chromosomes Mt6 and Lj2, respectively. The notable exception was the CNL sequences, which were completely absent in these regions. The origin of the Pv B4-CNL sequences was investigated through phylogenetic analysis, which reveals that, in the Pv genome, paralogous CNL genes are shared among nonhomologous chromosomes (4 and 11). Together, our results suggest that Pv B4-CNL was derived from CNL sequences from another cluster, the Co-2 cluster, through an ectopic recombination event. Integration of the soybean (Glycine max) genome data enables us to date more precisely this event and also to infer that a single CNL moved from the Co-2 to the B4 cluster. Moreover, we identified a new 528-bp satellite repeat, referred to as khipu, specific to the Phaseolus genus, present both between B4-CNL sequences and in the two knobs identified at the B4 R gene cluster. The khipu repeat is present on most chromosomal termini, indicating the existence of frequent ectopic recombination events in Pv subtelomeric regions. Our results highlight the importance of ectopic recombination in R gene evolution. PMID:19776165
Final Report for LDRD Project 02-ERD-069: Discovering the Unknown Mechanism(s) of Virulence in a BW, Class A Select Agent

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chain, P; Garcia, E

2003-02-06

The goal of this proposed effort was to assess the difficulty in identifying and characterizing virulence candidate genes in an organism for which very limited data exists. This was accomplished by first addressing the finishing phase of draft-sequenced F. tularensis genomes and conducting comparative analyses to determine the coding potential of each genome; to discover the differences in genome structure and content, and to identify potential genes whose products may be involved in the F. tularensis virulence process. The project was divided into three parts: (1) Genome finishing: This part involves determining the order and orientation of the consensus sequencesmore » of contigs obtained from Phrap assemblies of random draft genomic sequences. This tedious process consists of linking contig ends using information embedded in each sequence file that relates the sequence to the original cloned insert. Since inserts are sequenced from both ends, we can establish a link between these paired-ends in different contigs and thus order and orient contigs. Since these genomes carry numerous copies of insertion sequences, these repeated elements ''confuse'' the Phrap assembly program. It is thus necessary to break these contigs apart at the repeated sequences and individually join the proper flanking regions using paired-end information, or using results of comparisons against a similar genome. Larger repeated elements such as the small subunit ribosomal RNA operon require verification with PCR. Tandem repeats require manual intervention and typically rely on single nucleotide polymorphisms to be resolved. Remaining gaps require PCR reactions and sequencing. Once the genomes have been ''closed'', low quality regions are addressed by resequencing reactions. (2) Genome analysis: The final consensus sequences are processed by combining the results of three gene modelers: Glimmer, Critica and Generation. The final gene models are submitted to a battery of homology searches and domain prediction programs in order to annotate them (e.g. BLAST, Pfam, TIGRfam, COG, KEGG, InterPro, TMhmm, SignalP). The genome structure is also assessed in terms of G+C content, GC bias (GC skew), and locations of repeated regions (e.g. IS elements) and phage-like genes. (3) Comparative genomics: The results of the various genome analyses are compared between the finished (or almost finished) genomes. Here, we have compared the F. tularensis genomes from the extremely lethal strain Schu4 (subsp. tularensis), the vaccine strain LVS (subsp. holartica), and strain UT01-4992 of the less virulent, opportunistic subsp. novicida. Regions present in the highly virulent strain that are absent from the other less virulent strains may provide insight into what factors are required for the high level of virulence.« less
[Bioinformatics Analysis of Clustered Regularly Interspaced Short Palindromic Repeats in the Genomes of Shigella].

PubMed

Wang, Pengfei; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Guo, Xiangjiao; Yang, Haiyan; Xi, Yuanlin

2015-04-01

This study was aimed to explore the features of clustered regularly interspaced short palindromic repeats (CRISPR) structures in Shigella by using bioinformatics. We used bioinformatics methods, including BLAST, alignment and RNA structure prediction, to analyze the CRISPR structures of Shigella genomes. The results showed that the CRISPRs existed in the four groups of Shigella, and the flanking sequences of upstream CRISPRs could be classified into the same group with those of the downstream. We also found some relatively conserved palindromic motifs in the leader sequences. Repeat sequences had the same group with corresponding flanking sequences, and could be classified into two different types by their RNA secondary structures, which contain "stem" and "ring". Some spacers were found to homologize with part sequences of plasmids or phages. The study indicated that there were correlations between repeat sequences and flanking sequences, and the repeats might act as a kind of recognition mechanism to mediate the interaction between foreign genetic elements and Cas proteins.
A novel species-specific tandem repeat DNA family from Sinapis arvensis: detection of telomere-like sequences.

PubMed

Kapila, R; Das, S; Srivastava, P S; Lakshmikumaran, M

1996-08-01

DNA sequences representing a tandemly repeated DNA family of the Sinapis arvensis genome were cloned and characterized. The 700-bp tandem repeat family is represented by two clones, pSA35 and pSA52, which are 697 and 709 bp in length, respectively. Dot matrix analysis of the sequences indicates the presence of repeated elements within each monomeric unit. Sequence analysis of the repetitive region of clones pSA35 and pSA52 shows that there are several copies of a 7-bp repeat element organized in tandem. The consensus sequence of this repeat element is 5'-TTTAGGG-3'. These elements are highly mutated and the difference in length between the two clones is due to different copy numbers of these elements. The repetitive region of clone pSA35 has 26 copies of the element TTTAGGG, whereas clone pSA52 has 28 copies. The repetitive region in both clones is flanked on either side by inverted repeats that may be footprints of a transposition event. Sequence comparison indicates that the element TTTAGGG is identical to telomeric repeats present in Arabidopsis, maize, tomato, and other plants. However, Bal31 digestion kinetics indicates non-telomeric localization of the 700-bp tandem repeats. The clones represent a novel repeat family as (i) they contain telomere-like motifs as subrepeats within each unit; and (ii) they do not hybridize to related crucifers and are species-specific in nature.
Comprehensive analysis of single molecule sequencing-derived complete genome and whole transcriptome of Hyposidra talaca nuclear polyhedrosis virus.

PubMed

Nguyen, Thong T; Suryamohan, Kushal; Kuriakose, Boney; Janakiraman, Vasantharajan; Reichelt, Mike; Chaudhuri, Subhra; Guillory, Joseph; Divakaran, Neethu; Rabins, P E; Goel, Ridhi; Deka, Bhabesh; Sarkar, Suman; Ekka, Preety; Tsai, Yu-Chih; Vargas, Derek; Santhosh, Sam; Mohan, Sangeetha; Chin, Chen-Shan; Korlach, Jonas; Thomas, George; Babu, Azariah; Seshagiri, Somasekar

2018-06-12

We sequenced the Hyposidra talaca NPV (HytaNPV) double stranded circular DNA genome using PacBio single molecule sequencing technology. We found that the HytaNPV genome is 139,089 bp long with a GC content of 39.6%. It encodes 141 open reading frames (ORFs) including the 37 baculovirus core genes, 25 genes conserved among lepidopteran baculoviruses, 72 genes known in baculovirus, and 7 genes unique to the HytaNPV genome. It is a group II alphabaculovirus that codes for the F protein and lacks the gp64 gene found in group I alphabaculovirus viruses. Using RNA-seq, we confirmed the expression of the ORFs identified in the HytaNPV genome. Phylogenetic analysis showed HytaNPV to be closest to BusuNPV, SujuNPV and EcobNPV that infect other tea pests, Buzura suppressaria, Sucra jujuba, and Ectropis oblique, respectively. We identified repeat elements and a conserved non-coding baculovirus element in the genome. Analysis of the putative promoter sequences identified motif consistent with the temporal expression of the genes observed in the RNA-seq data.
Microsatellites in the Eukaryotic DNA Mismatch Repair Genes as Modulators of Evolutionary Mutation Rate

NASA Technical Reports Server (NTRS)

Chang, Dong Kyung; Metzgar, David; Wills, Christopher; Boland, C. Richard

2003-01-01

All "minor" components of the human DNA mismatch repair (MMR) system-MSH3, MSH6, PMS2, and the recently discovered MLH3-contain mononucleotide microsatellites in their coding sequences. This intriguing finding contrasts with the situation found in the major components of the DNA MMR system-MSH2 and MLH1-and, in fact, most human genes. Although eukaryotic genomes are rich in microsatellites, non-triplet microsatellites are rare in coding regions. The recurring presence of exonal mononucleotide repeat sequences within a single family of human genes would therefore be considered exceptional.
Formation and Repair of Mismatches Containing Ribonucleotides and Oxidized Bases at Repeated DNA Sequences*

PubMed Central

Cilli, Piera; Minoprio, Anna; Bossa, Cecilia; Bignami, Margherita; Mazzei, Filomena

2015-01-01

The cellular pool of ribonucleotide triphosphates (rNTPs) is higher than that of deoxyribonucleotide triphosphates. To ensure genome stability, DNA polymerases must discriminate against rNTPs and incorporated ribonucleotides must be removed by ribonucleotide excision repair (RER). We investigated DNA polymerase β (POL β) capacity to incorporate ribonucleotides into trinucleotide repeated DNA sequences and the efficiency of base excision repair (BER) and RER enzymes (OGG1, MUTYH, and RNase H2) when presented with an incorrect sugar and an oxidized base. POL β incorporated rAMP and rCMP opposite 7,8-dihydro-8-oxoguanine (8-oxodG) and extended both mispairs. In addition, POL β was able to insert and elongate an oxidized rGMP when paired with dA. We show that RNase H2 always preserves the capacity to remove a single ribonucleotide when paired to an oxidized base or to incise an oxidized ribonucleotide in a DNA duplex. In contrast, BER activity is affected by the presence of a ribonucleotide opposite an 8-oxodG. In particular, MUTYH activity on 8-oxodG:rA mispairs is fully inhibited, although its binding capacity is retained. This results in the reduction of RNase H2 incision capability of this substrate. Thus complex mispairs formed by an oxidized base and a ribonucleotide can compromise BER and RER in repeated sequences. PMID:26338705

Saccharomyces cerevisiae SSB1 protein and its relationship to nucleolar RNA-binding proteins.

PubMed Central

Jong, A Y; Clark, M W; Gilbert, M; Oehm, A; Campbell, J L

1987-01-01

To better define the function of Saccharomyces cerevisiae SSB1, an abundant single-stranded nucleic acid-binding protein, we determined the nucleotide sequence of the SSB1 gene and compared it with those of other proteins of known function. The amino acid sequence contains 293 amino acid residues and has an Mr of 32,853. There are several stretches of sequence characteristic of other eucaryotic single-stranded nucleic acid-binding proteins. At the amino terminus, residues 39 to 54 are highly homologous to a peptide in calf thymus UP1 and UP2 and a human heterogeneous nuclear ribonucleoprotein. Residues 125 to 162 constitute a fivefold tandem repeat of the sequence RGGFRG, the composition of which suggests a nucleic acid-binding site. Near the C terminus, residues 233 to 245 are homologous to several RNA-binding proteins. Of 18 C-terminal residues, 10 are acidic, a characteristic of the procaryotic single-stranded DNA-binding proteins and eucaryotic DNA- and RNA-binding proteins. In addition, examination of the subcellular distribution of SSB1 by immunofluorescence microscopy indicated that SSB1 is a nuclear protein, predominantly located in the nucleolus. Sequence homologies and the nucleolar localization make it likely that SSB1 functions in RNA metabolism in vivo, although an additional role in DNA metabolism cannot be excluded. Images PMID:2823109
Gross rearrangements within the 5'-untranslated region of the picornaviral genomes.

PubMed

Pilipenko, E V; Blinov, V M; Agol, V I

1990-06-11

An analysis of reported nucleotide sequences revealed several cases of gross rearrangements in the 5'-untranslated region (5-UTR) of picornaviral genomes. A large (greater than 100 nt) duplication was discovered in a downstream region of poliovirus 5-UTR involved in the translational control. Properties of the poliovirus mutants with large deletions [Kuge and Nomoto (1987) J. Virol. 61, 1478-1487] show that a single copy of the appropriate repeating unit is compatible with a wild type phenotype of the virus. In contrast to poliovirus and another enterovirus genomes, human rhinovirus RNAs contain only a single copy of this repeating unit. Another similarly large repeat was found in an upstream segment of the bovine enterovirus 5-UTR. A comparison of the primary and secondary structures of cardio- and aphthovirus 5-UTRs demonstrated the existence of a large (ca. 250 nucleotides) insertion/deletion in a region preceding the poly(C) tract. The two latter rearrangements appear to involve elements of the viral genome replication machinery. Possible origin as well as evolutionary and functional implications of these structural peculiarities are discussed.
Engineering customized TALE nucleases (TALENs) and TALE transcription factors by fast ligation-based automatable solid-phase high-throughput (FLASH) assembly.

PubMed

Reyon, Deepak; Maeder, Morgan L; Khayter, Cyd; Tsai, Shengdar Q; Foley, Jonathan E; Sander, Jeffry D; Joung, J Keith

2013-07-01

Customized DNA-binding domains made using transcription activator-like effector (TALE) repeats are rapidly growing in importance as widely applicable research tools. TALE nucleases (TALENs), composed of an engineered array of TALE repeats fused to the FokI nuclease domain, have been used successfully for directed genome editing in various organisms and cell types. TALE transcription factors (TALE-TFs), consisting of engineered TALE repeat arrays linked to a transcriptional regulatory domain, have been used to up- or downregulate expression of endogenous genes in human cells and plants. This unit describes a detailed protocol for the recently described fast ligation-based automatable solid-phase high-throughput (FLASH) assembly method. FLASH enables automated high-throughput construction of engineered TALE repeats using an automated liquid handling robot or manually using a multichannel pipet. Using the automated approach, a single researcher can construct up to 96 DNA fragments encoding TALE repeat arrays of various lengths in a single day, and then clone these to construct sequence-verified TALEN or TALE-TF expression plasmids in a week or less. Plasmids required for FLASH are available by request from the Joung lab (http://eGenome.org). This unit also describes improvements to the Zinc Finger and TALE Targeter (ZiFiT Targeter) web server (http://ZiFiT.partners.org) that facilitate the design and construction of FLASH TALE repeat arrays in high throughput. © 2013 by John Wiley & Sons, Inc.
DNA hypomethylation of individual sequences in aborted cloned bovine fetuses.

PubMed

Chen, Tao; Jiang, Yan; Zhang, Yan-Ling; Liu, Jing-He; Hou, Yi; Schatten, Heide; Chen, Da-Yuan; Sun, Qing-Yuan

2005-09-01

Cloned bovines have a much higher abortion rate than those derived in vivo. Available evidence indicates that inappropriate epigenetic reprogramming of donor nuclei is the primary cause of cloning failure. To gain a better understanding of the DNA methylation changes associated with the high abortion rate of cloned bovines, we examined the DNA methylation status of a repeated sequence (satellite I) and the promoter regions of two single-copy genes (interleukin 3/cytokeratin) in aborted cloned fetuses, aborted fetuses derived from artificial insemination (AI), cloned adults and AI adults by bisulfite sequencing and restriction enzyme analysis. Two of four aborted cloned fetuses show very low methylation levels in the two single-copy gene promoter regions. One of the two fetuses also showed undermethylated status in the satellite I sequence. The other two aborted cloned fetuses have similar methylation levels to those of aborted AI fetuses. However, no difference in methylation was observed between cloned adults and AI adults. Our results demonstrate for the first time the undermethylated status of individual sequences in aborted cloned fetuses. These findings suggest that aberrant DNA methylation may contribute to the developmental failure of cloned bovine fetuses.
Typing Clostridium difficile strains based on tandem repeat sequences

PubMed Central

2009-01-01

Background Genotyping of epidemic Clostridium difficile strains is necessary to track their emergence and spread. Portability of genotyping data is desirable to facilitate inter-laboratory comparisons and epidemiological studies. Results This report presents results from a systematic screen for variation in repetitive DNA in the genome of C. difficile. We describe two tandem repeat loci, designated 'TR6' and 'TR10', which display extensive sequence variation that may be useful for sequence-based strain typing. Based on an investigation of 154 C. difficile isolates comprising 75 ribotypes, tandem repeat sequencing demonstrated excellent concordance with widely used PCR ribotyping and equal discriminatory power. Moreover, tandem repeat sequences enabled the reconstruction of the isolates' largely clonal population structure and evolutionary history. Conclusion We conclude that sequence analysis of the two repetitive loci introduced here may be highly useful for routine typing of C. difficile. Tandem repeat sequence typing resolves phylogenetic diversity to a level equivalent to PCR ribotypes. DNA sequences may be stored in databases accessible over the internet, obviating the need for the exchange of reference strains. PMID:19133124
Primary analysis of repeat elements of the Asian seabass (Lates calcarifer) transcriptome and genome

PubMed Central

Kuznetsova, Inna S.; Thevasagayam, Natascha M.; Sridatta, Prakki S. R.; Komissarov, Aleksey S.; Saju, Jolly M.; Ngoh, Si Y.; Jiang, Junhui; Shen, Xueyan; Orbán, László

2014-01-01

As part of our Asian seabass genome project, we are generating an inventory of repeat elements in the genome and transcriptome. The karyotype showed a diploid number of 2n = 24 chromosomes with a variable number of B-chromosomes. The transcriptome and genome of Asian seabass were searched for repetitive elements with experimental and bioinformatics tools. Six different types of repeats constituting 8–14% of the genome were characterized. Repetitive elements were clustered in the pericentromeric heterochromatin of all chromosomes, but some of them were preferentially accumulated in pretelomeric and pericentromeric regions of several chromosomes pairs and have chromosomes specific arrangement. From the dispersed class of fish-specific non-LTR retrotransposon elements Rex1 and MAUI-like repeats were analyzed. They were wide-spread both in the genome and transcriptome, accumulated on the pericentromeric and peritelomeric areas of all chromosomes. Every analyzed repeat was represented in the Asian seabass transcriptome, some showed differential expression between the gonads. The other group of repeats analyzed belongs to the rRNA multigene family. FISH signal for 5S rDNA was located on a single pair of chromosomes, whereas that for 18S rDNA was found on two pairs. A BAC-derived contig containing rDNA was sequenced and assembled into a scaffold containing incomplete fragments of 18S rDNA. Their assembly and chromosomal position revealed that this part of Asian seabass genome is extremely rich in repeats containing evolutionarily conserved and novel sequences. In summary, transcriptome assemblies and cDNA data are suitable for the identification of repetitive DNA from unknown genomes and for comparative investigation of conserved elements between teleosts and other vertebrates. PMID:25120555
First complete genome sequence of infectious laryngotracheitis virus

PubMed Central

2011-01-01

Background Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528
[Comparative analysis of clustered regularly interspaced short palindromic repeats (CRISPRs) loci in the genomes of halophilic archaea].

PubMed

Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian

2009-11-01

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to genome-widely analyze the CRISPR in extreme halophilic archaea, of which the whole genome sequences are available at present time. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were greatly conserved, and two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as recognition site by having palindromic structures in flanking regions, and the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.
Identification of the gene for disaggregatase from Methanosarcina mazei.

PubMed

Osumi, Naoki; Kakehashi, Yoshihiro; Matsumoto, Shiho; Nagaoka, Kazunari; Sakai, Junichi; Miyashita, Kiyotaka; Kimura, Makoto; Asakawa, Susumu

2008-12-01

The gene sequences encoding disaggregatase (Dag), the enzyme responsible for dispersion of cell aggregates of Methanosarcina mazei to single cells, were determined for three strains of M. mazei (S-6(T), LYC and TMA). The dag genes of the three strains were 3234 bp in length and had almost the same sequences with 97% amino acid sequence identities. Dag was predicted to comprise 1077 amino acid residues and to have a molecular mass of 120 kDa containing three repeats of the DNRLRE domain in the C terminus, which is specific to the genus Methanosarcina and may be responsible for structural organization and cell wall function. Recombinant Dag was overexpressed in Escherichia coli and preparations of the expressed protein exhibited enzymatic activity. The RT-PCR analysis showed that dag was transcribed to mRNA in M. mazei LYC and indicated that the gene was expressed in vivo. This is the first time the gene involved in the morphological change of Methanosarcina spp. from aggregate to single cells has been identified.
Microsatellite markers identify three lineages of Phytophthora ramorum in US nurseries, yet single lineages in US forest and European nursery populations.

PubMed

Ivors, K; Garbelotto, M; Vries, I D E; Ruyter-Spira, C; Te Hekkert, B; Rosenzweig, N; Bonants, P

2006-05-01

Analysis of 12 polymorphic simple sequence repeats identified in the genome sequence of Phytophthora ramorum, causal agent of 'sudden oak death', revealed genotypic diversity to be significantly higher in nurseries (91% of total) than in forests (18% of total). Our analysis identified only two closely related genotypes in US forests, while the genetic structure of populations from European nurseries was of intermediate complexity, including multiple, closely related genotypes. Multilocus analysis determined populations in US forests reproduce clonally and are likely descendants of a single introduced individual. The 151 isolates analysed clustered in three clades. US forest and European nursery isolates clustered into two distinct clades, while one isolate from a US nursery belonged to a third novel clade. The combined microsatellite, sequencing and morphological analyses suggest the three clades represent distinct evolutionary lineages. All three clades were identified in some US nurseries, emphasizing the role of commercial plant trade in the movement of this pathogen.
Sustained State-Independent Quantum Contextual Correlations from a Single Ion

NASA Astrophysics Data System (ADS)

Leupold, F. M.; Malinowski, M.; Zhang, C.; Negnevitsky, V.; Alonso, J.; Home, J. P.; Cabello, A.

2018-05-01

We use a single trapped-ion qutrit to demonstrate the quantum-state-independent violation of noncontextuality inequalities using a sequence of randomly chosen quantum nondemolition projective measurements. We concatenate 53 ×106 sequential measurements of 13 observables, and unambiguously violate an optimal noncontextual bound. We use the same data set to characterize imperfections including signaling and repeatability of the measurements. The experimental sequence was generated in real time with a quantum random number generator integrated into our control system to select the subsequent observable with a latency below 50 μ s , which can be used to constrain contextual hidden-variable models that might describe our results. The state-recycling experimental procedure is resilient to noise and independent of the qutrit state, substantiating the fact that the contextual nature of quantum physics is connected to measurements and not necessarily to designated states. The use of extended sequences of quantum nondemolition measurements finds applications in the fields of sensing and quantum information.
Bisulfite-independent analysis of CpG island methylation enables genome-scale stratification of single cells.

PubMed

Han, Lin; Wu, Hua-Jun; Zhu, Haiying; Kim, Kun-Yong; Marjani, Sadie L; Riester, Markus; Euskirchen, Ghia; Zi, Xiaoyuan; Yang, Jennifer; Han, Jasper; Snyder, Michael; Park, In-Hyun; Irizarry, Rafael; Weissman, Sherman M; Michor, Franziska; Fan, Rong; Pan, Xinghua

2017-06-02

Conventional DNA bisulfite sequencing has been extended to single cell level, but the coverage consistency is insufficient for parallel comparison. Here we report a novel method for genome-wide CpG island (CGI) methylation sequencing for single cells (scCGI-seq), combining methylation-sensitive restriction enzyme digestion and multiple displacement amplification for selective detection of methylated CGIs. We applied this method to analyzing single cells from two types of hematopoietic cells, K562 and GM12878 and small populations of fibroblasts and induced pluripotent stem cells. The method detected 21 798 CGIs (76% of all CGIs) per cell, and the number of CGIs consistently detected from all 16 profiled single cells was 20 864 (72.7%), with 12 961 promoters covered. This coverage represents a substantial improvement over results obtained using single cell reduced representation bisulfite sequencing, with a 66-fold increase in the fraction of consistently profiled CGIs across individual cells. Single cells of the same type were more similar to each other than to other types, but also displayed epigenetic heterogeneity. The method was further validated by comparing the CpG methylation pattern, methylation profile of CGIs/promoters and repeat regions and 41 classes of known regulatory markers to the ENCODE data. Although not every minor methylation differences between cells are detectable, scCGI-seq provides a solid tool for unsupervised stratification of a heterogeneous cell population. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Seven New Complete Plastome Sequences Reveal Rampant Independent Loss of the ndh Gene Family across Orchids and Associated Instability of the Inverted Repeat/Small Single-Copy Region Boundaries.

PubMed

Kim, Hyoung Tae; Kim, Jung Sung; Moore, Michael J; Neubig, Kurt M; Williams, Norris H; Whitten, W Mark; Kim, Joo-Hwan

2015-01-01

Earlier research has revealed that the ndh loci have been pseudogenized, truncated, or deleted from most orchid plastomes sequenced to date, including in all available plastomes of the two most species-rich subfamilies, Orchidoideae and Epidendroideae. This study sought to resolve deeper-level phylogenetic relationships among major orchid groups and to refine the history of gene loss in the ndh loci across orchids. The complete plastomes of seven orchids, Oncidium sphacelatum (Epidendroideae), Masdevallia coccinea (Epidendroideae), Sobralia callosa (Epidendroideae), Sobralia aff. bouchei (Epidendroideae), Elleanthus sodiroi (Epidendroideae), Paphiopedilum armeniacum (Cypripedioideae), and Phragmipedium longifolium (Cypripedioideae) were sequenced and analyzed in conjunction with all other available orchid and monocot plastomes. Most ndh loci were found to be pseudogenized or lost in Oncidium, Paphiopedilum and Phragmipedium, but surprisingly, all ndh loci were found to retain full, intact reading frames in Sobralia, Elleanthus and Masdevallia. Character mapping suggests that the ndh genes were present in the common ancestor of orchids but have experienced independent, significant losses at least eight times across four subfamilies. In addition, ndhF gene loss was correlated with shifts in the position of the junction of the inverted repeat (IR) and small single-copy (SSC) regions. The Orchidaceae have unprecedented levels of homoplasy in ndh gene presence/absence, which may be correlated in part with the unusual life history of orchids. These results also suggest that ndhF plays a role in IR/SSC junction stability.
Genome Wide Characterization of Simple Sequence Repeats in Cucumber

USDA-ARS?s Scientific Manuscript database

The whole genome sequence of the cucumber cultivar Gy14 was recently sequenced at 15× coverage with the Roche 454 Titanium technology. The microsatellite DNA sequences (simple sequence repeats, SSRs) in the assembled scaffolds were computationally explored and characterized. A total of 112,073 SSRs ...
Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa.

PubMed

Shahin, Arwa; van Kaauwen, Martijn; Esselink, Danny; Bargsten, Joachim W; van Tuyl, Jaap M; Visser, Richard G F; Arens, Paul

2012-11-20

Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, there are hardly any genetic studies performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Successfully, we developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice more abundant in UTRs (UnTranslated Regions) compared to coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSR. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Two transcriptome sets were built that are valuable resources for marker development, comparative genomic studies and candidate gene approaches. Next generation sequencing of leaf transcriptome is very effective; however, deeper sequencing and using more tissues and stages is advisable for extended comparative studies.
Telomere extension by telomerase and ALT generates variant repeats by mechanistically distinct processes

PubMed Central

Lee, Michael; Hills, Mark; Conomos, Dimitri; Stutz, Michael D.; Dagg, Rebecca A.; Lau, Loretta M.S.; Reddel, Roger R.; Pickett, Hilda A.

2014-01-01

Telomeres are terminal repetitive DNA sequences on chromosomes, and are considered to comprise almost exclusively hexameric TTAGGG repeats. We have evaluated telomere sequence content in human cells using whole-genome sequencing followed by telomere read extraction in a panel of mortal cell strains and immortal cell lines. We identified a wide range of telomere variant repeats in human cells, and found evidence that variant repeats are generated by mechanistically distinct processes during telomerase- and ALT-mediated telomere lengthening. Telomerase-mediated telomere extension resulted in biased repeat synthesis of variant repeats that differed from the canonical sequence at positions 1 and 3, but not at positions 2, 4, 5 or 6. This indicates that telomerase is most likely an error-prone reverse transcriptase that misincorporates nucleotides at specific positions on the telomerase RNA template. In contrast, cell lines that use the ALT pathway contained a large range of variant repeats that varied greatly between lines. This is consistent with variant repeats spreading from proximal telomeric regions throughout telomeres in a stochastic manner by recombination-mediated templating of DNA synthesis. The presence of unexpectedly large numbers of variant repeats in cells utilizing either telomere maintenance mechanism suggests a conserved role for variant sequences at human telomeres. PMID:24225324
Identification of Simple Sequence Repeats in Chloroplast Genomes of Magnoliids Through Bioinformatics Approach.

PubMed

Srivastava, Deepika; Shanker, Asheesh

2016-12-01

Basal angiosperms or Magnoliids is an important clade of commercially important plants which mainly include spices and edible fruits. In this study, 17 chloroplast genome sequences belonging to clade Magnoliids were screened for the identification of chloroplast simple sequence repeats (cpSSRs). Simple sequence repeats or microsatellites are short stretches of DNA up to 1-6 base pair in length. These repeats are ubiquitous and play important role in the development of molecular markers and to study the mapping of traits of economic, medical or ecological interest. A total of 479 SSRs were detected, showing average density of 1 SSR/6.91 kb. Depending on the repeat units, the length of SSRs ranged from 12 to 24 bp for mono-, 12 to 18 bp for di-, 12 to 26 bp for tri-, 12 to 24 bp for tetra-, 15 bp for penta- and 18 bp for hexanucleotide repeats. Mononucleotide repeats were the most frequent (207, 43.21 %) followed by tetranucleotide repeats (130, 27.13 %). Penta- and hexanucleotide repeats were least frequent or absent in these chloroplast genomes.
Sequence of a second gene encoding bovine submaxillary mucin: implication for mucin heterogeneity and cloning.

PubMed

Jiang, W; Woitach, J T; Gupta, D; Bhavanandan, V P

1998-10-20

Secreted epithelial mucins are extremely large and heterogeneous glycoproteins. We report the 5 kilobase DNA sequence of a second gene, BSM2, which encodes bovine submaxillary mucin. The determined nucleotide and deduced amino acid sequences of BSM2 are 95.2% and 92. 2% identical, respectively, to those of the previously described BSM1 gene isolated from the same cow. Further, the five predicted protein domains of the two genes are 100%, 94%, 93%, 77%, and 88% identical. Based on the above results, we propose that expression of multiple homologous core proteins from a single animal is a factor in generating diversity of saccharides in mucins and in providing resistance of the molecules to proteolysis. In addition, this work raises several important issues in mucin cloning such as assembling sequences from seemingly overlapping clones and deducing consensus sequences for nearly identical tandem repeats. Copyright 1998 Academic Press.
The complete chloroplast genome of North American ginseng, Panax quinquefolius.

PubMed

Han, Zeng-Jie; Li, Wei; Liu, Yuan; Gao, Li-Zhi

2016-09-01

We report complete nucleotide sequence of the Panax quinquefolius chloroplast genome using next-generation sequencing technology. The genome size is 156 359 bp, including two inverted repeats (IRs) of 52 153 bp, separated by the large single-copy (LSC 86 184 bp) and small single-copy (SSC 18 081 bp) regions. This cp genome encodes 114 unigenes (80 protein-coding genes, four rRNA genes, and 30 tRNA genes), in which 18 are duplicated in the IR regions. Overall GC content of the genome is 38.08%. A phylogenomic analysis of the 10 complete chloroplast genomes from Araliaceae using Daucus carota from Apiaceae as outgroup showed that P. quinquefolius is closely related to the other two members of the genus Panax, P. ginseng and P. notoginseng.
Metallo-β-lactamase-producing Pseudomonas aeruginosa in the Netherlands: the nationwide emergence of a single sequence type.

PubMed

Van der Bij, A K; Van der Zwan, D; Peirano, G; Severin, J A; Pitout, J D D; Van Westreenen, M; Goessens, W H F

2012-09-01

Recently, the first outbreak of clonally related VIM-2 metallo-β-lactamase (MBL)-producing Pseudomonas aeruginosa in a Dutch tertiary-care centre was described. Subsequently, a nationwide surveillance study was performed in 2010-2011, which identified the presence of VIM-2 MBL-producing P. aeruginosa in 11 different hospitals. Genotyping by multiple-locus variable-number tandem-repeat analysis (MLVA) showed that the majority of the 82 MBL-producing isolates found belonged to a single MLVA type (n = 70, 85%), identified as ST111 by multilocus sequence typing (MLST). As MBL-producing isolates cause serious infections that are difficult to treat, the presence of clonally related isolates in various hospitals throughout the Netherlands is of nationwide concern. © 2012 The Authors. Clinical Microbiology and Infection © 2012 European Society of Clinical Microbiology and Infectious Diseases.

The complete chloroplast genome sequence of Chikusichloa aquatica (Poaceae: Oryzeae).

PubMed

Zhang, Jie; Zhang, Dan; Shi, Chao; Gao, Ju; Gao, Li-Zhi

2016-07-01

The complete chloroplast sequence of the Chikusichloa aquatica was determined in this study. The genome consists of 136 563 bp containing a pair of inverted repeats (IRs) of 20 837 bp, which was separated by a large single-copy region and a small single-copy region of 82 315 bp and 33 411 bp, respectively. The C. aquatica cp genome encodes 111 functional genes (71 protein-coding genes, four rRNA genes, and 36 tRNA genes): 92 are unique, while 19 are duplicated in the IR regions. The genic regions account for 58.9% of whole cp genome, and the GC content of the plastome is 39.0%. A phylogenomic analysis showed that C. aquatica is closely related to Rhynchoryza subulata that belongs to the tribe Oryzeae.
The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection

PubMed Central

Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael

2015-01-01

With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulties in indel detection: the insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them. PMID:26130710
DNA-based differentiation of the Ecuadorian cocoa types CCN-51 and Arriba based on sequence differences in the chloroplast genome.

PubMed

Herrmann, Luise; Haase, Ilka; Blauhut, Maike; Barz, Nadine; Fischer, Markus

2014-12-17

Two cocoa types, Arriba and CCN-51, are being cultivated in Ecuador. With regard to the unique aroma, Arriba is considered a fine cocoa type, while CCN-51 is a bulk cocoa because of its weaker aroma. Because it is being assumed that Arriba is mixed with CCN-51, there is an interest in the analytical differentiation of the two types. Two methods to identify CCN-51 adulterations in Arriba cocoa were developed on the basis of differences in the chloroplast DNA. On the one hand, a different repeat of the sequence TAAAG in the inverted repeat region results in a different length of amplicons for the two cocoa types, which can be detected by agarose gel electrophoresis, capillary gel electrophoresis, and denaturing high-performance liquid chromatography. On the other hand, single nucleotide polymorphisms (SNPs) between the CCN-51 and Arriba sequences represent restriction sites, which can be used for restriction fragment length polymorphism analysis. A semi-quantitative analysis based on these SNPs is feasible. A method for an exact quantitation based on these results is not realizable. These sequence variations were confirmed for a comprehensive cultivar collection of Arriba and CCN-51, for both bean and leaf samples.
Complete chloroplast genome sequences of Praxelis (Eupatorium catarium Veldkamp), an important invasive species.

PubMed

Zhang, Ying; Li, Lei; Yan, Ting Liang; Liu, Qiang

2014-10-01

Praxelis (Eupatorium catarium Veldkamp) is a new hazardous invasive plant species that has caused serious economic losses and environmental damage in the Northern hemisphere tropical and subtropical regions. Although previous studies focused on detecting the biological characteristics of this plant to prevent its expansion, little effort has been made to understand the impact of Praxelis on the ecosystem in an evolutionary process. The genetic information of Praxelis is required for further phylogenetic identification and evolutionary studies. Here, we report the complete Praxelis chloroplast (cp) genome sequence. The Praxelis chloroplast genome is 151,410 bp in length including a small single-copy region (18,547 bp) and a large single-copy region (85,311 bp) separated by a pair of inverted repeats (IRs; 23,776 bp). The genome contains 85 unique and 18 duplicated genes in the IR region. The gene content and organization are similar to other Asteraceae tribe cp genomes. We also analyzed the whole cp genome sequence, repeat structure, codon usage, contraction of the IR and gene structure/organization features between native and invasive Asteraceae plants, in order to understand the evolution of organelle genomes between native and invasive Asteraceae. Comparative analysis identified the 14 markers containing greater than 2% parsimony-informative characters, indicating that they are potential informative markers for barcoding and phylogenetic analysis. Moreover, a sister relationship between Praxelis and seven other species in Asteraceae was found based on phylogenetic analysis of 28 protein-coding sequences. Complete cp genome information is useful for plant phylogenetic and evolutionary studies within this invasive species and also within the Asteraceae family. Copyright © 2014 Elsevier B.V. All rights reserved.
Construction and sequence sampling of deep-coverage, large-insert BAC libraries for three model lepidopteran species

PubMed Central

Wu, Chengcang; Proestou, Dina; Carter, Dorothy; Nicholson, Erica; Santos, Filippe; Zhao, Shaying; Zhang, Hong-Bin; Goldsmith, Marian R

2009-01-01

Background Manduca sexta, Heliothis virescens, and Heliconius erato represent three widely-used insect model species for genomic and fundamental studies in Lepidoptera. Large-insert BAC libraries of these insects are critical resources for many molecular studies, including physical mapping and genome sequencing, but not available to date. Results We report the construction and characterization of six large-insert BAC libraries for the three species and sampling sequence analysis of the genomes. The six BAC libraries were constructed with two restriction enzymes, two libraries for each species, and each has an average clone insert size ranging from 152–175 kb. We estimated that the genome coverage of each library ranged from 6–9 ×, with the two combined libraries of each species being equivalent to 13.0–16.3 × haploid genomes. The genome coverage, quality and utility of the libraries were further confirmed by library screening using 6~8 putative single-copy probes. To provide a first glimpse into these genomes, we sequenced and analyzed the BAC ends of ~200 clones randomly selected from the libraries of each species. The data revealed that the genomes are AT-rich, contain relatively small fractions of repeat elements with a majority belonging to the category of low complexity repeats, and are more abundant in retro-elements than DNA transposons. Among the species, the H. erato genome is somewhat more abundant in repeat elements and simple repeats than those of M. sexta and H. virescens. The BLAST analysis of the BAC end sequences suggested that the evolution of the three genomes is widely varied, with the genome of H. virescens being the most conserved as a typical lepidopteran, whereas both genomes of H. erato and M. sexta appear to have evolved significantly, resulting in a higher level of species- or evolutionary lineage-specific sequences. Conclusion The high-quality and large-insert BAC libraries of the insects, together with the identified BACs containing genes of interest, provide valuable information, resources and tools for comprehensive understanding and studies of the insect genomes and for addressing many fundamental questions in Lepidoptera. The sample of the genomic sequences provides the first insight into the constitution and evolution of the insect genomes. PMID:19558662
Ultrafast DNA sequencing on a microchip by a hybrid separation mechanism that gives 600 bases in 6.5 minutes.

PubMed

Fredlake, Christopher P; Hert, Daniel G; Kan, Cheuk-Wai; Chiesl, Thomas N; Root, Brian E; Forster, Ryan E; Barron, Annelise E

2008-01-15

To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require approximately 70 min to deliver approximately 650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered "hybrid" mechanism of DNA electromigration, in which DNA molecules alternate rapidly between repeating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs.
Molecular characterization and distribution of a 145-bp tandem repeat family in the genus Populus.

PubMed

Rajagopal, J; Das, S; Khurana, D K; Srivastava, P S; Lakshmikumaran, M

1999-10-01

This report aims to describe the identification and molecular characterization of a 145-bp tandem repeat family that accounts for nearly 1.5% of the Populus genome. Three members of this repeat family were cloned and sequenced from Populus deltoides and P. ciliata. The dimers of the repeat were sequenced in order to confirm the head-to-tail organization of the repeat. Hybridization-based analysis using the 145-bp tandem repeat as a probe on genomic DNA gave rise to ladder patterns which were identified to be a result of methylation and (or) sequence heterogeneity. Analysis of the methylation pattern of the repeat family using methylation-sensitive isoschizomers revealed variable methylation of the C residues and lack of methylation of the A residues. Sequence comparisons between the monomers revealed a high degree of sequence divergence that ranged between 6% and 11% in P. deltoides and between 4.2% and 8.3% in P. ciliata. This indicated the presence of sub-families within the 145-bp tandem family of repeats. Divergence was mainly due to the accumulation of point mutations and was concentrated in the central region of the repeat. The 145-bp tandem repeat family did not show significant homology to known tandem repeats from plants. A short stretch of 36 bp was found to show homology of 66.7% to a centromeric repeat from Chironomus plumosus. Dot-blot analysis and Southern hybridization data revealed the presence of the repeat family in 13 of the 14 Populus species examined. The absence of the 145-bp repeat from P. euphratica suggested that this species is relatively distant from other members of the genus, which correlates with taxonomic classifications. The widespread occurrence of the tandem family in the genus indicated that this family may be of ancient origin.
Small tandemly repeated DNA sequences of higher plants likely originate from a tRNA gene ancestor.

PubMed Central

Benslimane, A A; Dron, M; Hartmann, C; Rode, A

1986-01-01

Several monomers (177 bp) of a tandemly arranged repetitive nuclear DNA sequence of Brassica oleracea have been cloned and sequenced. They share up to 95% homology between one another and up to 80% with other satellite DNA sequences of Cruciferae, suggesting a common ancestor. Both strands of these monomers show more than 50% homology with many tRNA genes; the best homologies have been obtained with Lys and His yeast mitochondrial tRNA genes (respectively 64% and 60%). These results suggest that small tandemly repeated DNA sequences of plants may have evolved from a tRNA gene ancestor. These tandem repeats have probably arisen via a process involving reverse transcription of polymerase III RNA intermediates, as is the case for interspersed DNA sequences of mammalians. A model is proposed to explain the formation of such small tandemly repeated DNA sequences. Images PMID:3774553
Two tandemly repeated telomere-associated sequences in Nicotiana plumbaginifolia.

PubMed

Chen, C M; Wang, C T; Wang, C J; Ho, C H; Kao, Y Y; Chen, C C

1997-12-01

Two tandemly repeated telomere-associated sequences, NP3R and NP4R, have been isolated from Nicotiana plumbaginifolia. The length of a repeating unit for NP3R and NP4R is 165 and 180 nucleotides respectively. The abundance of NP3R, NP4R and telomeric repeats is, respectively, 8.4 x 10(4), 6 x 10(3) and 1.5 x 10(6) copies per haploid genome of N. plumbaginifolia. Fluorescence in situ hybridization revealed that NP3R is located at the ends and/or in interstitial regions of all 10 chromosomes and NP4R on the terminal regions of three chromosomes in the haploid genome of N. plumbaginifolia. Sequence homology search revealed that not only are NP3R and NP4R homologous to HRS60 and GRS, respectively, two tandem repeats isolated from N. tabacum, but that NP3R and NP4R are also related to each other, suggesting that they originated from a common ancestral sequence. The role of these repeated sequences in chromosome healing is discussed based on the observation that two to three copies of a telomere-similar sequence were present in each repeating unit of NP3R and NP4R.
Development of Pineapple Microsatellite Markers and Germplasm Genetic Diversity Analysis

PubMed Central

Tong, Helin; Chen, You; Wang, Jingyi; Chen, Yeyuan; Sun, Guangming; He, Junhu; Wu, Yaoting

2013-01-01

Two methods were used to develop pineapple microsatellite markers. Genomic library-based SSR development: using selectively amplified microsatellite assay, 86 sequences were generated from pineapple genomic library. 91 (96.8%) of the 94 Simple Sequence Repeat (SSR) loci were dinucleotide repeats (39 AC/GT repeats and 52 GA/TC repeats, accounting for 42.9% and 57.1%, resp.), and the other three were mononucleotide repeats. Thirty-six pairs of SSR primers were designed; 24 of them generated clear bands of expected sizes, and 13 of them showed polymorphism. EST-based SSR development: 5659 pineapple EST sequences obtained from NCBI were analyzed; among 1397 nonredundant EST sequences, 843 were found containing 1110 SSR loci (217 of them contained more than one SSR locus). Frequency of SSRs in pineapple EST sequences is 1SSR/3.73 kb, and 44 types were found. Mononucleotide, dinucleotide, and trinucleotide repeats dominate, accounting for 95.6% in total. AG/CT and AGC/GCT were the dominant type of dinucleotide and trinucleotide repeats, accounting for 83.5% and 24.1%, respectively. Thirty pairs of primers were designed for each of randomly selected 30 sequences; 26 of them generated clear and reproducible bands, and 22 of them showed polymorphism. Eighteen pairs of primers obtained by the one or the other of the two methods above that showed polymorphism were selected to carry out germplasm genetic diversity analysis for 48 breeds of pineapple; similarity coefficients of these breeds were between 0.59 and 1.00, and they can be divided into four groups accordingly. Amplification products of five SSR markers were extracted and sequenced, corresponding repeat loci were found and locus mutations are mainly in copy number of repeats and base mutations in the flanking region. PMID:24024187
Complete genome sequence of Menghai rhabdovirus, a novel mosquito-borne rhabdovirus from China.

PubMed

Sun, Qiang; Zhao, Qiumin; An, Xiaoping; Guo, Xiaofang; Zuo, Shuqing; Zhang, Xianglilan; Pei, Guangqian; Liu, Wenli; Cheng, Shi; Wang, Yunfei; Shu, Peng; Mi, Zhiqiang; Huang, Yong; Zhang, Zhiyi; Tong, Yigang; Zhou, Hongning; Zhang, Jiusong

2017-04-01

Menghai rhabdovirus (MRV) was isolated from Aedes albopictus in Menghai county of Yunnan Province, China, in August 2010. Whole-genome sequencing of MRV was performed using an Ion PGM™ Sequencer. We found that MRV is a single-stranded, negative-sense RNA virus. The complete genome of MRV has 10,744 nt, with short inverted repeat termini, encoding five typical rhabdovirus proteins (N, P, M, G, and L) and an additional small hypothetical protein. Nucleotide BLAST analysis using the BLASTn method showed that the genome sequence most similar to that of MRV is that of Arboretum virus (NC_025393.1), with a Max score of 322, query coverage of 14%, and 66% identity. Genomic and phylogenetic analyses both demonstrated that MRV should be considered a member of a novel species of the family Rhabdoviridae.
Auditory attentional capture during serial recall: violations at encoding of an algorithm-based neural model?

PubMed

Hughes, Robert W; Vachon, François; Jones, Dylan M

2005-07-01

A novel attentional capture effect is reported in which visual-verbal serial recall was disrupted if a single deviation in the interstimulus interval occurred within otherwise regularly presented task-irrelevant spoken items. The degree of disruption was the same whether the temporal deviant was embedded in a sequence made up of a repeating item or a sequence of changing items. Moreover, the effect was evident during the presentation of the to-be-remembered sequence but not during rehearsal just prior to recall, suggesting that the encoding of sequences is particularly susceptible. The results suggest that attentional capture is due to a violation of an algorithm rather than an aggregate-based neural model and further undermine an attentional capture-based account of the classical changing-state irrelevant sound effect. ((c) 2005 APA, all rights reserved).
De Novo Transcriptome Sequencing Reveals Important Molecular Networks and Metabolic Pathways of the Plant, Chlorophytum borivilianum

PubMed Central

Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

2013-01-01

Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum. PMID:24376689
Nucleotide sequence of the gene for the Mr 32,000 thylakoid membrane protein from Spinacia oleracea and Nicotiana debneyi predicts a totally conserved primary translation product of Mr 38,950

PubMed Central

Zurawski, Gerard; Bohnert, Hans J.; Whitfeld, Paul R.; Bottomley, Warwick

1982-01-01

The gene for the so-called Mr 32,000 rapidly labeled photosystem II thylakoid membrane protein (here designated psbA) of spinach (Spinacia oleracea) chloroplasts is located on the chloroplast DNA in the large single-copy region immediately adjacent to one of the inverted repeat sequences. In this paper we show that the size of the mRNA for this protein is ≈ 1.25 kilobases and that the direction of transcription is towards the inverted repeat unit. The nucleotide sequence of the gene and its flanking regions is presented. The only large open reading frame in the sequence codes for a protein of Mr 38,950. The nucleotide sequence of psbA from Nicotiana debneyi also has been determined, and comparison of the sequences from the two species shows them to be highly conserved (>95% homology) throughout the entire reading frame. Conservation of the amino acid sequence is absolute, there being no changes in a total of 353 residues. This leads us to conclude that the primary translation product of psbA must be a protein of Mr 38,950. The protein is characterized by the complete absence of lysine residues and is relatively rich in hydrophobic amino acids, which tend to be clustered. Transcription of spinach psbA starts about 86 base pairs before the first ATG codon. Immediately upstream from this point there is a sequence typical of that found in E. coli promoters. An almost identical sequence occurs in the equivalent region of N. debneyi DNA. Images PMID:16593262
Analytical and Clinical Validation of a Digital Sequencing Panel for Quantitative, Highly Accurate Evaluation of Cell-Free Circulating Tumor DNA

PubMed Central

Zill, Oliver A.; Sebisanovic, Dragan; Lopez, Rene; Blau, Sibel; Collisson, Eric A.; Divers, Stephen G.; Hoon, Dave S. B.; Kopetz, E. Scott; Lee, Jeeyun; Nikolinakos, Petros G.; Baca, Arthur M.; Kermani, Bahram G.; Eltoukhy, Helmy; Talasaz, AmirAli

2015-01-01

Next-generation sequencing of cell-free circulating solid tumor DNA addresses two challenges in contemporary cancer care. First this method of massively parallel and deep sequencing enables assessment of a comprehensive panel of genomic targets from a single sample, and second, it obviates the need for repeat invasive tissue biopsies. Digital SequencingTM is a novel method for high-quality sequencing of circulating tumor DNA simultaneously across a comprehensive panel of over 50 cancer-related genes with a simple blood test. Here we report the analytic and clinical validation of the gene panel. Analytic sensitivity down to 0.1% mutant allele fraction is demonstrated via serial dilution studies of known samples. Near-perfect analytic specificity (> 99.9999%) enables complete coverage of many genes without the false positives typically seen with traditional sequencing assays at mutant allele frequencies or fractions below 5%. We compared digital sequencing of plasma-derived cell-free DNA to tissue-based sequencing on 165 consecutive matched samples from five outside centers in patients with stage III-IV solid tumor cancers. Clinical sensitivity of plasma-derived NGS was 85.0%, comparable to 80.7% sensitivity for tissue. The assay success rate on 1,000 consecutive samples in clinical practice was 99.8%. Digital sequencing of plasma-derived DNA is indicated in advanced cancer patients to prevent repeated invasive biopsies when the initial biopsy is inadequate, unobtainable for genomic testing, or uninformative, or when the patient’s cancer has progressed despite treatment. Its clinical utility is derived from reduction in the costs, complications and delays associated with invasive tissue biopsies for genomic testing. PMID:26474073
De Novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant, Chlorophytum borivilianum.

PubMed

Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

2013-01-01

Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum.
Comparative Analysis of the Complete Plastomes of Apostasia wallichii and Neuwiedia singapureana (Apostasioideae) Reveals Different Evolutionary Dynamics of IR/SSC Boundary among Photosynthetic Orchids.

PubMed

Niu, Zhitao; Pan, Jiajia; Zhu, Shuying; Li, Ludan; Xue, Qingyun; Liu, Wei; Ding, Xiaoyu

2017-01-01

Apostasioideae, consists of only two genera, Apostasia and Neuwiedia , which are mainly distributed in Southeast Asia and northern Australia. The floral structure, taxonomy, biogeography, and genome variation of Apostasioideae have been intensively studied. However, detailed analyses of plastome composition and structure and comparisons with those of other orchid subfamilies have not yet been conducted. Here, the complete plastome sequences of Apostasia wallichii and Neuwiedia singapureana were sequenced and compared with 43 previously published photosynthetic orchid plastomes to characterize the plastome structure and evolution in the orchids. Unlike many orchid plastomes (e.g., Paphiopedilum and Vanilla ), the plastomes of Apostasioideae contain a full set of 11 functional NADH dehydrogenase ( ndh ) genes. The distribution of repeat sequences and simple sequence repeat elements enhanced the view that the mutation rate of non-coding regions was higher than that of coding regions. The 10 loci- ndhA intron, matK-5'trnK , clpP-psbB , rps8-rpl14 , trnT-trnL , 3'trnK-matK , clpP intron , psbK-trnK , trnS-psbC , and ndhF-rpl32 -that had the highest degrees of sequence variability were identified as mutational hotspots for the Apostasia plastome. Furthermore, our results revealed that plastid genes exhibited a variable evolution rate within and among different orchid genus. Considering the diversified evolution of both coding and non-coding regions, we suggested that the plastome-wide evolution of orchid species was disproportional. Additionally, the sequences flanking the inverted repeat/small single copy (IR/SSC) junctions of photosynthetic orchid plastomes were categorized into three types according to the presence/absence of ndh genes. Different evolutionary dynamics for each of the three IR/SSC types of photosynthetic orchid plastomes were also proposed.
Comparative Analysis of the Complete Plastomes of Apostasia wallichii and Neuwiedia singapureana (Apostasioideae) Reveals Different Evolutionary Dynamics of IR/SSC Boundary among Photosynthetic Orchids

PubMed Central

Niu, Zhitao; Pan, Jiajia; Zhu, Shuying; Li, Ludan; Xue, Qingyun; Liu, Wei; Ding, Xiaoyu

2017-01-01

Apostasioideae, consists of only two genera, Apostasia and Neuwiedia, which are mainly distributed in Southeast Asia and northern Australia. The floral structure, taxonomy, biogeography, and genome variation of Apostasioideae have been intensively studied. However, detailed analyses of plastome composition and structure and comparisons with those of other orchid subfamilies have not yet been conducted. Here, the complete plastome sequences of Apostasia wallichii and Neuwiedia singapureana were sequenced and compared with 43 previously published photosynthetic orchid plastomes to characterize the plastome structure and evolution in the orchids. Unlike many orchid plastomes (e.g., Paphiopedilum and Vanilla), the plastomes of Apostasioideae contain a full set of 11 functional NADH dehydrogenase (ndh) genes. The distribution of repeat sequences and simple sequence repeat elements enhanced the view that the mutation rate of non-coding regions was higher than that of coding regions. The 10 loci—ndhA intron, matK-5′trnK, clpP-psbB, rps8-rpl14, trnT-trnL, 3′trnK-matK, clpP intron, psbK-trnK, trnS-psbC, and ndhF-rpl32—that had the highest degrees of sequence variability were identified as mutational hotspots for the Apostasia plastome. Furthermore, our results revealed that plastid genes exhibited a variable evolution rate within and among different orchid genus. Considering the diversified evolution of both coding and non-coding regions, we suggested that the plastome-wide evolution of orchid species was disproportional. Additionally, the sequences flanking the inverted repeat/small single copy (IR/SSC) junctions of photosynthetic orchid plastomes were categorized into three types according to the presence/absence of ndh genes. Different evolutionary dynamics for each of the three IR/SSC types of photosynthetic orchid plastomes were also proposed. PMID:29046685
Small interfering RNA-producing loci in the ancient parasitic eukaryote Trypanosoma brucei

PubMed Central

2012-01-01

Background At the core of the RNA interference (RNAi) pathway in Trypanosoma brucei is a single Argonaute protein, TbAGO1, with an established role in controlling retroposon and repeat transcripts. Recent evidence from higher eukaryotes suggests that a variety of genomic sequences with the potential to produce double-stranded RNA are sources for small interfering RNAs (siRNAs). Results To test whether such endogenous siRNAs are present in T. brucei and to probe the individual role of the two Dicer-like enzymes, we affinity purified TbAGO1 from wild-type procyclic trypanosomes, as well as from cells deficient in the cytoplasmic (TbDCL1) or nuclear (TbDCL2) Dicer, and subjected the bound RNAs to Illumina high-throughput sequencing. In wild-type cells the majority of reads originated from two classes of retroposons. We also considerably expanded the repertoire of trypanosome siRNAs to encompass a family of 147-bp satellite-like repeats, many of the regions where RNA polymerase II transcription converges, large inverted repeats and two pseudogenes. Production of these newly described siRNAs is strictly dependent on the nuclear DCL2. Notably, our data indicate that putative centromeric regions, excluding the CIR147 repeats, are not a significant source for endogenous siRNAs. Conclusions Our data suggest that endogenous RNAi targets may be as evolutionarily old as the mechanism itself. PMID:22925482
Genetic Diversity of Blumeria graminis f. sp. hordei in Central Europe and Its Comparison with Australian Population

PubMed Central

Komínková, Eva; Dreiseitl, Antonín; Malečková, Eva; Doležel, Jaroslav

2016-01-01

Population surveys of Blumeria graminis f. sp. hordei (Bgh), a causal agent of more than 50% of barley fungal infections in the Czech Republic, have been traditionally based on virulence tests, at times supplemented with non-specific Restriction fragment length polymorphism or Random amplified polymorphic DNA markers. A genomic sequence of Bgh, which has become available recently, enables identification of potential markers suitable for population genetics studies. Two major strategies relying on transposable elements and microsatellites were employed in this work to develop a set of Repeat junction markers, Single sequence repeat and Single nucleotide polymorphism markers. A resolution power of the new panel of markers comprising 33 polymorphisms was demonstrated by a phylogenetic analysis of 158 Bgh isolates. A core set of 97 Czech isolates was compared to a set 50 Australian isolates on the background of 11 diverse isolates collected throughout the world. 73.2% of Czech isolates were found to be genetically unique. An extreme diversity of this collection was in strong contrast with the uniformity of the Australian one. This work paves the way for studies of population structure and dynamics based on genetic variability among different Bgh isolates originating from geographically limited regions. PMID:27875588

Alu Sb2 subfamily is present in all higher primates but was most succesfully amplified in humans

DOE Office of Scientific and Technical Information (OSTI.GOV)

Richer, C.; Zietkiewicz, E.; Labuda, D.

Alu repeats can be classified into subfamilies which amplified in primate genomes at different evolutionary time periods. A young Alu subfamily, Sb2, with a characteristic 7-nucleotide duplication at position 256, has been described in seven human loci. An Sb2 insertion found near the HD gene was unique to two HD families, indicating that Sb2 was still retropositionally active. Here, we have shown that the Sb2 insertion in the CHOL locus was similarly rare, being absent in 120 individuals of Caucasian, Oriental and Black origin. In contrast, Sb2 inserts in five other loci were found fixed (non-polymorphic), based on measurements inmore » the same population sample, but absent from orthologous positions in higher apes. This suggest that Sb2 repeats spread relatively early in the human lineage following divergence from other primates and that these elements may be human-specific. By quantitative PCR, we investigated the presence of Sb2 sequences in different primate DNA, using one PCR primer anchored at the 5{prime} Alu-end and the other complementary to the duplicated Sb2-specific segment. With an Sb2-containing plasmid as a standard, we estimated the number of Sb2 repeats at 1500-1800 copies per human haploid equivalent; corresponding numbers in chimpanzee and gorilla were almost two orders of magnitude lower, while the signal observed in orangutan and gibbon DNAs was consistent with the presence of a single copy. The analysis of 22 human, 11 chimpanzee and 10 gorilla sequences indicates that the Alu Sb2 dispersed independently in these three primate lineages; gorilla consensus differs from the human Sb2 sequence by one position, while all chimpanzee repeats have their linker expanded by up to eight A-residues. Should they be thus considered as separate subfamilies? It is possible that sequence modifications with respect to the human consensus are responsible for poor retroposition of Sb2 in apes.« less
Genome-wide characterization of centromeric satellites from multiple mammalian genomes.

PubMed

Alkan, Can; Cardone, Maria Francesca; Catacchio, Claudia Rita; Antonacci, Francesca; O'Brien, Stephen J; Ryder, Oliver A; Purgato, Stefania; Zoli, Monica; Della Valle, Giuliano; Eichler, Evan E; Ventura, Mario

2011-01-01

Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog in contrast to elephant and armadillo, which showed high-centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.
Enactment versus Observation: Item-Specific and Relational Processing in Goal-Directed Action Sequences (and Lists of Single Actions)

PubMed Central

Schult, Janette; von Stülpnagel, Rul; Steffens, Melanie C.

2014-01-01

What are the memory-related consequences of learning actions (such as “apply the patch”) by enactment during study, as compared to action observation? Theories converge in postulating that enactment encoding increases item-specific processing, but not the processing of relational information. Typically, in the laboratory enactment encoding is studied for lists of unrelated single actions in which one action execution has no overarching purpose or relation with other actions. In contrast, real-life actions are usually carried out with the intention to achieve such a purpose. When actions are embedded in action sequences, relational information provides efficient retrieval cues. We contrasted memory for single actions with memory for action sequences in three experiments. We found more reliance on relational processing for action-sequences than single actions. To what degree can this relational information be used after enactment versus after the observation of an actor? We found indicators of superior relational processing after observation than enactment in ordered pair recall (Experiment 1A) and in emerging subjective organization of repeated recall protocols (recall runs 2–3, Experiment 2). An indicator of superior item-specific processing after enactment compared to observation was recognition (Experiment 1B, Experiment 2). Similar net recall suggests that observation can be as good a learning strategy as enactment. We discuss possible reasons why these findings only partly converge with previous research and theorizing. PMID:24927279
Mutation of Hip’s Carboxy-Terminal Region Inhibits a Transitional Stage of Progesterone Receptor Assembly

PubMed Central

Prapapanich, Viravan; Chen, Shiying; Smith, David F.

1998-01-01

Steroid receptor complexes are assembled through an ordered, multistep pathway involving multiple components of the cytoplasmic chaperone machinery. Two of these components are Hsp70-binding proteins, Hip and Hop, that have some limited homology in their C-terminal regions, outside the sequences mapped for Hsp70 binding. Within this region of Hip is a DPEV sequence that occurs twice; in Hop, one DPEV sequence plus a partial second sequence occurs. In an effort to better understand Hip function as it relates to assembly of progesterone receptor complexes, the DPEV region of Hip was targeted for mutations. Each DPEV sequence was mutated to an APAV sequence, singly or in combination. The combined mutation, APAV2, was further combined with a deletion of Hip’s tetratricopeptide repeat region that is required for Hsp70 binding or with a deletion of Hip’s GGMP repeat. An additional mutant was prepared by truncation of Hip’s DPEV-containing C terminus. By comparing interactions of various Hip forms with Hsp70, it was determined that mutation of the DPEV sequences created a dominant inhibitory form of Hip. The mutant Hip-Hsp70 complex was not prevented from interacting with progesterone receptor, but the mutant caused a dose-dependent inhibition of receptor assembly with Hsp90. The behavior of the Hip mutant is consistent with a model in which Hip and Hop are required to facilitate the transition from an early receptor complex with Hsp70 into later complexes containing Hsp90. PMID:9447991
Survey and Analysis of Microsatellites in the Silkworm, Bombyx mori

PubMed Central

Prasad, M. Dharma; Muthulakshmi, M.; Madhu, M.; Archak, Sunil; Mita, K.; Nagaraju, J.

2005-01-01

We studied microsatellite frequency and distribution in 21.76-Mb random genomic sequences, 0.67-Mb BAC sequences from the Z chromosome, and 6.3-Mb EST sequences of Bombyx mori. We mined microsatellites of ≥15 bases of mononucleotide repeats and ≥5 repeat units of other classes of repeats. We estimated that microsatellites account for 0.31% of the genome of B. mori. Microsatellite tracts of A, AT, and ATT were the most abundant whereas their number drastically decreased as the length of the repeat motif increased. In general, tri- and hexanucleotide repeats were overrepresented in the transcribed sequences except TAA, GTA, and TGA, which were in excess in genomic sequences. The Z chromosome sequences contained shorter repeat types than the rest of the chromosomes in addition to a higher abundance of AT-rich repeats. Our results showed that base composition of the flanking sequence has an influence on the origin and evolution of microsatellites. Transitions/transversions were high in microsatellites of ESTs, whereas the genomic sequence had an equal number of substitutions and indels. The average heterozygosity value for 23 polymorphic microsatellite loci surveyed in 13 diverse silkmoth strains having 2–14 alleles was 0.54. Only 36 (18.2%) of 198 microsatellite loci were polymorphic between the two divergent silkworm populations and 10 (5%) loci revealed null alleles. The microsatellite map generated using these polymorphic markers resulted in 8 linkage groups. B. mori microsatellite loci were the most conserved in its immediate ancestor, B. mandarina, followed by the wild saturniid silkmoth, Antheraea assama. PMID:15371363
Isolation and characterization of two overlapping cosmid clones from the 4q35 region, near the facioscapulohumeral muscular dystrophy locus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deidda, G.; Grisanti, P.; Vigneti, E.

1994-09-01

The gene for facioscapulohumeral muscular dystrophy (FSHD) has been localized by linkage analysis to the 4q35 region. The most telomeric p13E-11 prove has been shown to detect 4q35 DNA rearrangements in both sporadic and familial cases of the disease. With the aim of constructing a detailed physical map of the 4q35 region and searching for the mutant gene, we used p13E-11 probe to isolate cosmid clones from a human genomic library in a pCos-EMBL 2 vector. Two positive clones were isolated, clones 3 and 5, which partially overlap and carry human genomic inserts of 42 and 45 kb, respectively. Themore » cosmids share a common region containing the p13E-11 region and a stretch of KpnI units consisting of 3.2 kb tandemly repeated sequences (about 10). The restriction maps were constructed using the following enzymes: Bam HI, BgIII, Eco RI, EcoRV, KpnI and Sfi I. Clone 3 extends 4 kb upstream of C5 and stops within the Kpn repeats. Clone 5 extends 4 kb downstream from the Kpn repeats and it presents an additional EcoRI site. Clone 5 contains a stretch of Kpn sequences of nearly 32 kb, corresponding to 10 Kpn repeats; clone 3 contains a stretch of 29 kb corresponding to 9 Kpn repeats, as determined by PFGE analysis of partial digestion of the clones. Clone 5 seems to contain the entire Eco RI region prone to rearrangements in FSHD patients. From clone 5 several subclones were obtained, from the Kpn region and from the region spanning from the last Kpn repeat to the cloning site. No single copy sequences were detected. Subclones from the 3{prime} end region contain beta-satellite or Sau3A-like sequences. In situ hybridization with the whole C5 cosmid shows hybridization signals at the tip of chromosome 4 (4q35) and chromosome 10 (10q26), in the pericentromeric region of chromosome 1 (1q12) and in the p12 region of the acrocentric chromosomes (chr. 21, 22, 13, 14, 15).« less
Spatio-temporal Variations of Characteristic Repeating Earthquake Sequences along the Middle America Trench in Mexico

NASA Astrophysics Data System (ADS)

Dominguez, L. A.; Taira, T.; Hjorleifsdottir, V.; Santoyo, M. A.

2015-12-01

Repeating earthquake sequences are sets of events that are thought to rupture the same area on the plate interface and thus provide nearly identical waveforms. We systematically analyzed seismic records from 2001 through 2014 to identify repeating earthquakes with highly correlated waveforms occurring along the subduction zone of the Cocos plate. Using the correlation coefficient (cc) and spectral coherency (coh) of the vertical components as selection criteria, we found a set of 214 sequences whose waveforms exceed cc≥95% and coh≥95%. Spatial clustering along the trench shows large variations in repeating earthquakes activity. Particularly, the rupture zone of the M8.1, 1985 earthquake shows an almost absence of characteristic repeating earthquakes, whereas the Guerrero Gap zone and the segment of the trench close to the Guerrero-Oaxaca border shows a significantly larger number of repeating earthquakes sequences. Furthermore, temporal variations associated to stress changes due to major shows episodes of unlocking and healing of the interface. Understanding the different components that control the location and recurrence time of characteristic repeating sequences is a key factor to pinpoint areas where large megathrust earthquakes may nucleate and consequently to improve the seismic hazard assessment.
Structures of Human Pumilio with Noncognate RNAs Reveal Molecular Mechanisms for Binding Promiscuity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gupta,Y.; Nair, D.; Wharton, R.

2008-01-01

Pumilio is a founder member of the evolutionarily conserved Puf family of RNA-binding proteins that control a number of physiological processes in eukaryotes. A structure of human Pumilio (hPum) Puf domain bound to a Drosophila regulatory sequence showed that each Puf repeat recognizes a single nucleotide. Puf domains in general bind promiscuously to a large set of degenerate sequences, but the structural basis for this promiscuity has been unclear. Here, we describe the structures of hPum Puf domain complexed to two noncognate RNAs, CycBreverse and Puf5. In each complex, one of the nucleotides is ejected from the binding surface, inmore » effect, acting as a 'spacer.' The complexes also reveal the plasticity of several Puf repeats, which recognize noncanonical nucleotides. Together, these complexes provide a molecular basis for recognition of degenerate binding sites, which significantly increases the number of mRNAs targeted for regulation by Puf proteins in vivo.« less
Dog leukocyte antigen class II-associated genetic risk testing for immune disorders of dogs: simplified approaches using Pug dog necrotizing meningoencephalitis as a model.

PubMed

Pedersen, Niels; Liu, Hongwei; Millon, Lee; Greer, Kimberly

2011-01-01

A significantly increased risk for a number of autoimmune and infectious diseases in purebred and mixed-breed dogs has been associated with certain alleles or allele combinations of the dog leukocyte antigen (DLA) class II complex containing the DRB1, DQA1, and DQB1 genes. The exact level of risk depends on the specific disease, the alleles in question, and whether alleles exist in a homozygous or heterozygous state. The gold standard for identifying high-risk alleles and their zygosity has involved direct sequencing of the exon 2 regions of each of the 3 genes. However, sequencing and identification of specific alleles at each of the 3 loci are relatively expensive and sequencing techniques are not ideal for additional parentage or identity determination. However, it is often possible to get the same information from sequencing only 1 gene given the small number of possible alleles at each locus in purebred dogs, extensive homozygosity, and tendency for disease-causing alleles at each of the 3 loci to be strongly linked to each other into haplotypes. Therefore, genetic testing in purebred dogs with immune diseases can be often simplified by sequencing alleles at 1 rather than 3 loci. Further simplification of genetic tests for canine immune diseases can be achieved by the use of alternative genetic markers in the DLA class II region that are also strongly linked with the disease genotype. These markers consist of either simple tandem repeats or single nucleotide polymorphisms that are also in strong linkage with specific DLA class II genotypes and/or haplotypes. The current study uses necrotizing meningoencephalitis of Pug dogs as a paradigm to assess simple alternative genetic tests for disease risk. It was possible to attain identical necrotizing meningoencephalitis risk assessments to 3-locus DLA class II sequencing by sequencing only the DQB1 gene, using 3 DLA class II-linked simple tandem repeat markers, or with a small single nucleotide polymorphism array designed to identify breed-specific DQB1 alleles.
The open reading frames in the 3' long terminal repeats of several mouse mammary tumor virus integrants encode V beta 3-specific superantigens

PubMed Central

1992-01-01

Mice expressing the minor lymphocyte stimulation antigens, Mls-1a, -2a, or -3a, singly on the B10.BR background have been generated. Mls phenotypes correlate with the integration of mouse mammary tumor viruses (MTV) in the mouse genome. The open reading frames within the 3' long terminal repeats of the integrated MTVs 1, 3, 6, and 13 encode V beta 3-specific superantigens. Sequence data for these viral superantigens is presented, indicating that it is the COOH-terminal portion of the viral superantigen that interacts with the T cell receptor V beta element. PMID:1309854
Transcriptome sequencing and marker development in winged bean (Psophocarpus tetragonolobus; Leguminosae).

PubMed

Vatanparast, Mohammad; Shetty, Prateek; Chopra, Ratan; Doyle, Jeff J; Sathyanarayana, N; Egan, Ashley N

2016-06-30

Winged bean, Psophocarpus tetragonolobus (L.) DC., is similar to soybean in yield and nutritional value but more viable in tropical conditions. Here, we strengthen genetic resources for this orphan crop by producing a de novo transcriptome assembly and annotation of two Sri Lankan accessions (denoted herein as CPP34 [PI 491423] and CPP37 [PI 639033]), developing simple sequence repeat (SSR) markers, and identifying single nucleotide polymorphisms (SNPs) between geographically separated genotypes. A combined assembly based on 804,757 reads from two accessions produced 16,115 contigs with an N50 of 889 bp, over 90% of which has significant sequence similarity to other legumes. Combining contigs with singletons produced 97,241 transcripts. We identified 12,956 SSRs, including 2,594 repeats for which primers were designed and 5,190 high-confidence SNPs between Sri Lankan and Nigerian genotypes. The transcriptomic data sets generated here provide new resources for gene discovery and marker development in this orphan crop, and will be vital for future plant breeding efforts. We also analyzed the soybean trypsin inhibitor (STI) gene family, important plant defense genes, in the context of related legumes and found evidence for radiation of the Kunitz trypsin inhibitor (KTI) gene family within winged bean.
TCW: Transcriptome Computational Workbench

PubMed Central

Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R.

2013-01-01

Background The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. Methodology The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. Conclusion It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw. PMID:23874959
TCW: transcriptome computational workbench.

PubMed

Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R

2013-01-01

The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw.
Characterization of (CA)n microsatellite repeats from large-insert clones.

PubMed

Litt, M; Browne, D

2001-05-01

The most laborious part of developing (CA)n microsatellite repeats as genetic markers is constructing DNA clones to permit determination of sequences flanking the microsatellites. When cosmids or large-insert phage clones are used as primary sources of (CA)n repeat markers, they have traditionally been subcloned into plasmid vectors such as pUC18 or M13 mp 18/19 cloning vectors to obtain fragments of suitable size for DNA sequencing. This unit presents an alternative approach whereby a set of degenerate sequencing primers that anneal directly to (CA)n microsatellites can be used to determine sequences that are inaccessible with vector-derived primers. Because the primers anneal to the repeat and not to the vector, they can be used with subclones containing inserts of several kilobases and should, in theory, always give sequence in the regions directly flanking the repeat. Degeneracy at the 3 end of each of these primers prevents elongation of primers that have annealed out-of-register. The most laborious part of developing (CA)n microsatellite repeats as genetic markers is constructing DNA clones to permit.
Instability of plasmid DNA sequences: macro and micro evolution of the antibiotic resistance plasmid R6-5.

PubMed

Timmis, K N; Cabello, F; Andrés, I; Nordheim, A; Burkhardt, H J; Cohen, S N

1978-11-16

Detailed examination of the structure of cloned DNA fragments of the R6-5 antibiotic resistance plasmid has revealed a substantial degree of polynucleotide sequence heterogeneity and indicates that sequence rearrangements in plasmids and possible other replicons occur more frequently than has hitherto been appreciated. The sequences changes in cloned R6-5 fragments were shown in some instances to have occurred prior to cloning, i.e. existing in the original population of R6-5 molecules that was obtained from a single bacterial clone and by several different criteria judged to be homogeneous, and in others to have occurred either during the cloning procedure or during subsequent propagation of hybrid molecules. The molecular changes that are described involved insertion/deletion of the previously characterized IS2 insertion element, formation of a new inverted repeat structure probably by duplication of a preexisting R6-5 DNA sequence, sequence inversion, and loss and gain of restriction endonuclease cleavage sites.
Nucleic Acid Detection Methods

DOEpatents

Smith, Cassandra L.; Yaar, Ron; Szafranski, Przemyslaw; Cantor, Charles R.

1998-05-19

The invention relates to methods for rapidly determining the sequence and/or length a target sequence. The target sequence may be a series of known or unknown repeat sequences which are hybridized to an array of probes. The hybridized array is digested with a single-strand nuclease and free 3'-hydroxyl groups extended with a nucleic acid polymerase. Nuclease cleaved heteroduplexes can be easily distinguish from nuclease uncleaved heteroduplexes by differential labeling. Probes and target can be differentially labeled with detectable labels. Matched target can be detected by cleaving resulting loops from the hybridized target and creating free 3-hydroxyl groups. These groups are recognized and extended by polymerases added into the reaction system which also adds or releases one label into solution. Analysis of the resulting products using either solid phase or solution. These methods can be used to detect characteristic nucleic acid sequences, to determine target sequence and to screen for genetic defects and disorders. Assays can be conducted on solid surfaces allowing for multiple reactions to be conducted in parallel and, if desired, automated.
Noninvasive Prenatal Paternity Testing (NIPAT) through Maternal Plasma DNA Sequencing: A Pilot Study.

PubMed

Jiang, Haojun; Xie, Yifan; Li, Xuchao; Ge, Huijuan; Deng, Yongqiang; Mu, Haofang; Feng, Xiaoli; Yin, Lu; Du, Zhou; Chen, Fang; He, Nongyue

2016-01-01

Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) have been already used to perform noninvasive prenatal paternity testing from maternal plasma DNA. The frequently used technologies were PCR followed by capillary electrophoresis and SNP typing array, respectively. Here, we developed a noninvasive prenatal paternity testing (NIPAT) based on SNP typing with maternal plasma DNA sequencing. We evaluated the influence factors (minor allele frequency (MAF), the number of total SNP, fetal fraction and effective sequencing depth) and designed three different selective SNP panels in order to verify the performance in clinical cases. Combining targeted deep sequencing of selective SNP and informative bioinformatics pipeline, we calculated the combined paternity index (CPI) of 17 cases to determine paternity. Sequencing-based NIPAT results fully agreed with invasive prenatal paternity test using STR multiplex system. Our study here proved that the maternal plasma DNA sequencing-based technology is feasible and accurate in determining paternity, which may provide an alternative in forensic application in the future.
Multiple regulatory mechanisms of hepatocyte growth factor expression in malignant cells with a short poly(dA) sequence in the HGF gene promoter.

PubMed

Sakai, Kazuko; Takeda, Masayuki; Okamoto, Isamu; Nakagawa, Kazuhiko; Nishio, Kazuto

2015-01-01

Hepatocyte growth factor (HGF) expression is a poor prognostic factor in various types of cancer. Expression levels of HGF have been reported to be regulated by shorter poly(dA) sequences in the promoter region. In the present study, the poly(dA) mononucleotide tract in various types of human cancer cell lines was examined and compared with the HGF expression levels in those cells. Short deoxyadenosine repeat sequences were detected in five of the 55 cell lines used in the present study. The H69, IM95, CCK-81, Sui73 and H28 cells exhibited a truncated poly(dA) sequence in which the number of poly(dA) repeats was reduced by ≥5 bp. Two of the cell lines exhibited high HGF expression, determined by reverse transcription quantitative polymerase chain reaction and enzyme-linked immunosorbent assay. The CCK-81, Sui73 and H28 cells with shorter poly(dA) sequences exhibited low HGF expression. The cause of the suppression of HGF expression in the CCK-81, Sui73 and H28 cells was clarified by two approaches, suppression by methylation and single nucleotide polymorphisms in the HGF gene. Exposure to 5-Aza-dC, an inhibitor of DNA methyltransferase 1, induced an increased expression of HGF in the CCK-81 cells, but not in the other cells. Single-nucleotide polymorphism (SNP) rs72525097 in intron 1 was detected in the Sui73 and H28 cells. Taken together, it was found that the defect of poly(dA) in the HGF promoter was present in various types of cancer, including lung, stomach, colorectal, pancreas and mesothelioma. The present study proposes the negative regulation mechanisms by methylation and SNP in intron 1 of HGF for HGF expression in cancer cells with short poly(dA).
Structural analysis of the rDNA intergenic spacer of Brassica nigra: evolutionary divergence of the spacers of the three diploid Brassica species.

PubMed

Bhatia, S; Singh Negi, M; Lakshmikumaran, M

1996-11-01

EcoRI restriction of the B. nigra rDNA recombinants, isolated from a lambda genomic library, showed that the 3.9-kb fragment corresponded to the Intergenic Spacer (IGS), which was sequenced and found to be 3,928 bp in size. Sequence and dot-matrix analyses showed that the organization of the B. nigra rDNA IGS was typical of most rDNA spacers, consisting of a central repetitive region and flanking unique sequences on either side. The repetitive region was composed of two repeat families-RF 'A' and RF 'B.' The B. nigra RF 'A' consisted of a tandem array of three full-length copies of a 106-bp sequence element. RF 'B' was composed of 66 tandemly repeated elements. Each 'B' element was only 21-bp in size and this is the smallest repeat unit identified in plant rDNA to date. The putative transcription initiation site (TIS) was identified as nucleotide position 3,110. Based on the sequence analysis it was suggested that the present organization of the repeat families was generated by successive cycles of deletions and amplifications and was being maintained by homogenization processes such as gene conversion and crossing-over.A detailed comparison of the rDNA IGS sequences of the three diploid Brassica species-namely, B. nigra, B. campestris, and B. oleracea-was carried out. First, comparisons revealed that B. campestris and B. oleracea were close to each other as the repeat families in both showed high sequence homology between each other. Second, the repeat elements in both the species were organized in an interspersed manner. Third, a 52-bp sequence, present just downstream of the repeats in B. campestris, was found to be identical to the B. oleracea repeats, thereby suggesting a common progenitor. On the other hand, in B. nigra no interspersion pattern of organization of repeats was observed. Further, the B. nigra RF 'A' was identified as distinct from the repeat families of B. campestris and B. oleracea. Based on this analysis, it was suggested that during speciation B. campestris and B. oleracea evolved in one lineage whereas B. nigra diverged into a separate lineage. The comparative analysis of the IGS helped in identifying not only conserved ancestral sequence motifs of possible functional significance such as promoters and enhancers, but also sequences which showed variation between the three diploid species and were therefore identified as species-specific sequences.
The complete chloroplast genome sequence of Dodonaea viscosa: comparative and phylogenetic analyses.

PubMed

Saina, Josphat K; Gichira, Andrew W; Li, Zhi-Zhong; Hu, Guang-Wan; Wang, Qing-Feng; Liao, Kuo

2018-02-01

The plant chloroplast (cp) genome is a highly conserved structure which is beneficial for evolution and systematic research. Currently, numerous complete cp genome sequences have been reported due to high throughput sequencing technology. However, there is no complete chloroplast genome of genus Dodonaea that has been reported before. To better understand the molecular basis of Dodonaea viscosa chloroplast, we used Illumina sequencing technology to sequence its complete genome. The whole length of the cp genome is 159,375 base pairs (bp), with a pair of inverted repeats (IRs) of 27,099 bp separated by a large single copy (LSC) 87,204 bp, and small single copy (SSC) 17,972 bp. The annotation analysis revealed a total of 115 unique genes of which 81 were protein coding, 30 tRNA, and four ribosomal RNA genes. Comparative genome analysis with other closely related Sapindaceae members showed conserved gene order in the inverted and single copy regions. Phylogenetic analysis clustered D. viscosa with other species of Sapindaceae with strong bootstrap support. Finally, a total of 249 SSRs were detected. Moreover, a comparison of the synonymous (Ks) and nonsynonymous (Ka) substitution rates in D. viscosa showed very low values. The availability of cp genome reported here provides a valuable genetic resource for comprehensive further studies in genetic variation, taxonomy and phylogenetic evolution of Sapindaceae family. In addition, SSR markers detected will be used in further phylogeographic and population structure studies of the species in this genus.

Microevolution of Pandemic Vibrio parahaemolyticus Assessed by the Number of Repeat Units in Short Sequence Tandem Repeat Regions

PubMed Central

García, Katherine; Gavilán, Ronnie G.; Höfle, Manfred G.; Martínez-Urtaza, Jaime; Espejo, Romilio T.

2012-01-01

The emergence of the pandemic strain Vibrio parahaemolyticus O3:K6 in 1996 caused a large increase of diarrhea outbreaks related to seafood consumption in Southeast Asia, and later worldwide. Isolates of this strain constitutes a clonal complex, and their effectual differentiation is possible by comparison of their variable number tandem repeats (VNTRs). The differentiation of the isolates by the differences in VNTRs will allow inferring the population dynamics and microevolution of this strain but this requires knowing the rate and mechanism of VNTRs' variation. Our study of mutants obtained after serial cultivation of clones showed that mutation rates of the six VNTRs examined are on the order of 10−4 mutant per generation and that difference increases by stepwise addition of single mutations. The single stepwise mutation (SSM) was deduced because mutants with 1, 2, 3, or more repeat unit deletions or insertions follow a geometric distribution. Plausible phylogenetic trees are obtained when, according to SSM, the genetic distance between clusters with different number of repeats is assessed by the absolute differences in repeats. Using this approach, mutants originated from different isolates of pandemic V. parahaemolyticus after serial cultivation are clustered with their parental isolates. Additionally, isolates of pandemic V. parahaemolyticus from Southeast Asia, Tokyo, and northern and southern Chile are clustered according their geographical origin. The deepest split in these four populations is observed between the Tokyo and southern Chile populations. We conclude that proper phylogenetic relations and successful tracing of pandemic V. parahaemolyticus requires measuring the differences between isolates by the absolute number of repeats in the VNTRs considered. PMID:22292049
Isothermal circular-strand-displacement polymerization of DNA and microRNA in digital microfluidic devices.

PubMed

Giuffrida, Maria Chiara; Zanoli, Laura Maria; D'Agata, Roberta; Finotti, Alessia; Gambari, Roberto; Spoto, Giuseppe

2015-02-01

Nucleic-acid amplification is a crucial step in nucleic-acid-sequence-detection assays. The use of digital microfluidic devices to miniaturize amplification techniques reduces the required sample volume and the analysis time and offers new possibilities for process automation and integration in a single device. The recently introduced droplet polymerase-chain-reaction (PCR) amplification methods require repeated cycles of two or three temperature-dependent steps during the amplification of the nucleic-acid target sequence. In contrast, low-temperature isothermal-amplification methods have no need for thermal cycling, thus requiring simplified microfluidic-device features. Here, the combined use of digital microfluidics and molecular-beacon (MB)-assisted isothermal circular-strand-displacement polymerization (ICSDP) to detect microRNA-210 sequences is described. MicroRNA-210 has been described as the most consistently and predominantly upregulated hypoxia-inducible factor. The nmol L(-1)-pmol L(-1) detection capabilities of the method were first tested by targeting single-stranded DNA sequences from the genetically modified Roundup Ready soybean. The ability of the droplet-ICSDP method to discriminate between full-matched, single-mismatched, and unrelated sequences was also investigated. The detection of a range of nmol L(-1)-pmol L(-1) microRNA-210 solutions compartmentalized in nanoliter-sized droplets was performed, establishing the ability of the method to detect as little as 10(-18) mol of microRNA target sequences compartmentalized in 20 nL droplets. The suitability of the method for biological samples was tested by detecting microRNA-210 from transfected K562 cells.
Formation and Repair of Mismatches Containing Ribonucleotides and Oxidized Bases at Repeated DNA Sequences.

PubMed

Cilli, Piera; Minoprio, Anna; Bossa, Cecilia; Bignami, Margherita; Mazzei, Filomena

2015-10-23

The cellular pool of ribonucleotide triphosphates (rNTPs) is higher than that of deoxyribonucleotide triphosphates. To ensure genome stability, DNA polymerases must discriminate against rNTPs and incorporated ribonucleotides must be removed by ribonucleotide excision repair (RER). We investigated DNA polymerase β (POL β) capacity to incorporate ribonucleotides into trinucleotide repeated DNA sequences and the efficiency of base excision repair (BER) and RER enzymes (OGG1, MUTYH, and RNase H2) when presented with an incorrect sugar and an oxidized base. POL β incorporated rAMP and rCMP opposite 7,8-dihydro-8-oxoguanine (8-oxodG) and extended both mispairs. In addition, POL β was able to insert and elongate an oxidized rGMP when paired with dA. We show that RNase H2 always preserves the capacity to remove a single ribonucleotide when paired to an oxidized base or to incise an oxidized ribonucleotide in a DNA duplex. In contrast, BER activity is affected by the presence of a ribonucleotide opposite an 8-oxodG. In particular, MUTYH activity on 8-oxodG:rA mispairs is fully inhibited, although its binding capacity is retained. This results in the reduction of RNase H2 incision capability of this substrate. Thus complex mispairs formed by an oxidized base and a ribonucleotide can compromise BER and RER in repeated sequences. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Development and application of microsatellites in candidate genes related to wood properties in the Chinese white poplar (Populus tomentosa Carr.).

PubMed

Du, Qingzhang; Gong, Chenrui; Pan, Wei; Zhang, Deqiang

2013-02-01

Gene-derived simple sequence repeats (genic SSRs), also known as functional markers, are often preferred over random genomic markers because they represent variation in gene coding and/or regulatory regions. We characterized 544 genic SSR loci derived from 138 candidate genes involved in wood formation, distributed throughout the genome of Populus tomentosa, a key ecological and cultivated wood production species. Of these SSRs, three-quarters were located in the promoter or intron regions, and dinucleotide (59.7%) and trinucleotide repeat motifs (26.5%) predominated. By screening 15 wild P. tomentosa ecotypes, we identified 188 polymorphic genic SSRs with 861 alleles, 2-7 alleles for each marker. Transferability analysis of 30 random genic SSRs, testing whether these SSRs work in 26 genotypes of five genus Populus sections (outgroup, Salix matsudana), showed that 72% of the SSRs could be amplified in Turanga and 100% could be amplified in Leuce. Based on genotyping of these 26 genotypes, a neighbour-joining analysis showed the expected six phylogenetic groupings. In silico analysis of SSR variation in 220 sequences that are homologous between P. tomentosa and Populus trichocarpa suggested that genic SSR variations between relatives were predominantly affected by repeat motif variations or flanking sequence mutations. Inheritance tests and single-marker associations demonstrated the power of genic SSRs in family-based linkage mapping and candidate gene-based association studies, as well as marker-assisted selection and comparative genomic studies of P. tomentosa and related species.
Interstitial telomeric sequences in vertebrate chromosomes: Origin, function, instability and evolution.

PubMed

Bolzán, Alejandro D

2017-07-01

By definition, telomeric sequences are located at the very ends or terminal regions of chromosomes. However, several vertebrate species show blocks of (TTAGGG)n repeats present in non-terminal regions of chromosomes, the so-called interstitial telomeric sequences (ITSs), interstitial telomeric repeats or interstitial telomeric bands, which include those intrachromosomal telomeric-like repeats located near (pericentromeric ITSs) or within the centromere (centromeric ITSs) and those telomeric repeats located between the centromere and the telomere (i.e., truly interstitial telomeric sequences) of eukaryotic chromosomes. According with their sequence organization, localization and flanking sequences, ITSs can be classified into four types: 1) short ITSs, 2) subtelomeric ITSs, 3) fusion ITSs, and 4) heterochromatic ITSs. The first three types have been described mainly in the human genome, whereas heterochromatic ITSs have been found in several vertebrate species but not in humans. Several lines of evidence suggest that ITSs play a significant role in genome instability and evolution. This review aims to summarize our current knowledge about the origin, function, instability and evolution of these telomeric-like repeats in vertebrate chromosomes. Copyright © 2017 Elsevier B.V. All rights reserved.
Clustered regularly interspaced short palindromic repeats (CRISPRs) for the genotyping of bacterial pathogens.

PubMed

Grissa, Ibtissem; Vergnaud, Gilles; Pourcel, Christine

2009-01-01

Clustered regularly interspaced short palindromic repeats (CRISPRs) are DNA sequences composed of a succession of repeats (23- to 47-bp long) separated by unique sequences called spacers. Polymorphism can be observed in different strains of a species and may be used for genotyping. We describe protocols and bioinformatics tools that allow the identification of CRISPRs from sequenced genomes, their comparison, and their component determination (the direct repeats and the spacers). A schematic representation of the spacer organization can be produced, allowing an easy comparison between strains.
Identification of the centromeric repeat in the threespine stickleback fish (Gasterosteus aculeatus).

PubMed

Cech, Jennifer N; Peichel, Catherine L

2015-12-01

Centromere sequences exist as gaps in many genome assemblies due to their repetitive nature. Here we take an unbiased approach utilizing centromere protein A (CENP-A) chomatin immunoprecipitation followed by high-throughput sequencing to identify the centromeric repeat sequence in the threespine stickleback fish (Gasterosteus aculeatus). A 186-bp, AT-rich repeat was validated as centromeric using both fluorescence in situ hybridization (FISH) and immunofluorescence combined with FISH (IF-FISH) on interphase nuclei and metaphase spreads. This repeat hybridizes strongly to the centromere on all chromosomes, with the exception of weak hybridization to the Y chromosome. Together, our work provides the first validated sequence information for the threespine stickleback centromere.
Variation in the genomic locations and sequence conservation of STAR elements among staphylococcal species provides insight into DNA repeat evolution

PubMed Central

2012-01-01

Background Staphylococcus aureus Repeat (STAR) elements are a type of interspersed intergenic direct repeat. In this study the conservation and variation in these elements was explored by bioinformatic analyses of published staphylococcal genome sequences and through sequencing of specific STAR element loci from a large set of S. aureus isolates. Results Using bioinformatic analyses, we found that the STAR elements were located in different genomic loci within each staphylococcal species. There was no correlation between the number of STAR elements in each genome and the evolutionary relatedness of staphylococcal species, however higher levels of repeats were observed in both S. aureus and S. lugdunensis compared to other staphylococcal species. Unexpectedly, sequencing of the internal spacer sequences of individual repeat elements from multiple isolates showed conservation at the sequence level within deep evolutionary lineages of S. aureus. Whilst individual STAR element loci were demonstrated to expand and contract, the sequences associated with each locus were stable and distinct from one another. Conclusions The high degree of lineage and locus-specific conservation of these intergenic repeat regions suggests that STAR elements are maintained due to selective or molecular forces with some of these elements having an important role in cell physiology. The high prevalence in two of the more virulent staphylococcal species is indicative of a potential role for STAR elements in pathogenesis. PMID:23020678
Molecular Characterization and Comparative Sequence Analysis of Defense-Related Gene, Oryza rufipogon Receptor-Like Protein Kinase 1

PubMed Central

Law, Yee-Song; Gudimella, Ranganath; Song, Beng-Kah; Ratnam, Wickneswari; Harikrishna, Jennifer Ann

2012-01-01

Many of the plant leucine rich repeat receptor-like kinases (LRR-RLKs) have been found to regulate signaling during plant defense processes. In this study, we selected and sequenced an LRR-RLK gene, designated as Oryza rufipogon receptor-like protein kinase 1 (OrufRPK1), located within yield QTL yld1.1 from the wild rice Oryza rufipogon (accession IRGC105491). A 2055 bp coding region and two exons were identified. Southern blotting determined OrufRPK1 to be a single copy gene. Sequence comparison with cultivated rice orthologs (OsI219RPK1, OsI9311RPK1 and OsJNipponRPK1, respectively derived from O. sativa ssp. indica cv. MR219, O. sativa ssp. indica cv. 9311 and O. sativa ssp. japonica cv. Nipponbare) revealed the presence of 12 single nucleotide polymorphisms (SNPs) with five non-synonymous substitutions, and 23 insertion/deletion sites. The biological role of the OrufRPK1 as a defense related LRR-RLK is proposed on the basis of cDNA sequence characterization, domain subfamily classification, structural prediction of extra cellular domains, cluster analysis and comparative gene expression. PMID:22942769
Complete sequence and comparative analysis of the chloroplast genome of Plinia trunciflora

PubMed Central

Eguiluz, Maria; Yuyama, Priscila Mary; Guzman, Frank; Rodrigues, Nureyev Ferreira; Margis, Rogerio

2017-01-01

Abstract Plinia trunciflora is a Brazilian native fruit tree from the Myrtaceae family, also known as jaboticaba. This species has great potential by its fruit production. Due to the high content of essential oils in their leaves and of anthocyanins in the fruits, there is also an increasing interest by the pharmaceutical industry. Nevertheless, there are few studies focusing on its molecular biology and genetic characterization. We herein report the complete chloroplast (cp) genome of P. trunciflora using high-throughput sequencing and compare it to other previously sequenced Myrtaceae genomes. The cp genome of P. trunciflora is 159,512 bp in size, comprising inverted repeats of 26,414 bp and single-copy regions of 88,097 bp (LSC) and 18,587 bp (SSC). The genome contains 111 single-copy genes (77 protein-coding, 30 tRNA and four rRNA genes). Phylogenetic analysis using 57 cp protein-coding genes demonstrated that P. trunciflora, Eugenia uniflora and Acca sellowiana form a cluster with closer relationship to Syzygium cumini than with Eucalyptus. The complete cp sequence reported here can be used in evolutionary and population genetics studies, contributing to resolve the complex taxonomy of this species and fill the gap in genetic characterization. PMID:29111566
A leucine repeat motif in AbiA is required for resistance of Lactococcus lactis to phages representing three species.

PubMed

Dinsmore, P K; O'Sullivan, D J; Klaenhammer, T R

1998-05-28

The abiA gene encodes an abortive bacteriophage infection mechanism that can protect Lactococcus species from infection by a variety of bacteriophages including three unrelated phage species. Five heptad leucine repeats suggestive of a leucine zipper motif were identified between residues 232 and 266 in the predicted amino acid sequence of the AbiA protein. The biological role of residues in the repeats was investigated by incorporating amino acid substitutions via site-directed mutagenesis. Each mutant was tested for phage resistance against three phages, phi 31, sk1, and c2, belonging to species P335, 936, and c2, respectively. The five residues that comprise the heptad repeats were designated L234, L242, A249, L256, and L263. Three single conservative mutations of leucine to valine in positions L235, L242, and L263 and a double mutation of two leucines (L235 and L242) to valines did not affect AbiA activity on any phages tested. Non-conservative single substitutions of charged amino acids for three of the leucines (L235, L242, and L256) virtually eliminated AbiA activity on all phages tested. Substitution of the alanine residue in the third repeat (A249) with a charged residue did not affect AbiA activity. Replacement of L242 with an alanine elimination phage resistance against phi 31, but partial resistance to sk1 and c2 remained. Two single proline substitutions for leucines L242 and L263 virtually eliminated AbiA activity against all phages, indicating that the predicted alpha-helical structure of this region is important. Mutations in an adjacent region of basic amino acids had various effects on phage resistance, suggesting that these basic residues are also important for AbiA activity. This directed mutagenesis analysis of AbiA indicated that the leucine repeat structure is essential for conferring phage resistance against three species of lactococcal bacteriophages.
SSR_pipeline--computer software for the identification of microsatellite sequences from paired-end Illumina high-throughput DNA sequence data

USGS Publications Warehouse

Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

2013-01-01

SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.
Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae).

PubMed

Walker, Joseph F; Zanis, Michael J; Emery, Nancy C

2014-04-01

Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.
The Complete Chloroplast Genome of 17 Individuals of Pest Species Jacobaea vulgaris: SNPs, Microsatellites and Barcoding Markers for Population and Phylogenetic Studies

PubMed Central

Doorduin, Leonie; Gravendeel, Barbara; Lammers, Youri; Ariyurek, Yavuz; Chin-A-Woeng, Thomas; Vrieling, Klaas

2011-01-01

Invasive individuals from the pest species Jacobaea vulgaris show different allocation patterns in defence and growth compared with native individuals. To examine if these changes are caused by fast evolution, it is necessary to identify native source populations and compare these with invasive populations. For this purpose, we are in need of intraspecific polymorphic markers. We therefore sequenced the complete chloroplast genomes of 12 native and 5 invasive individuals of J. vulgaris with next generation sequencing and discovered single-nucleotide polymorphisms (SNPs) and microsatellites. This is the first study in which the chloroplast genome of that many individuals within a single species was sequenced. Thirty-two SNPs and 34 microsatellite regions were found. For none of the individuals, differences were found between the inverted repeats. Furthermore, being the first chloroplast genome sequenced in the Senecioneae clade, we compared it with four other members of the Asteraceae family to identify new regions for phylogentic inference within this clade and also within the Asteraceae family. Five markers (ndhC-trnV, ndhC-atpE, rps18-rpl20, clpP and psbM-trnD) contained parsimony-informative characters higher than 2%. Finally, we compared two procedures of preparing chloroplast DNA for next generation sequencing. PMID:21444340
Molecular structure and chromosome distribution of three repetitive DNA families in Anemone hortensis L. (Ranunculaceae).

PubMed

Mlinarec, Jelena; Chester, Mike; Siljak-Yakovlev, Sonja; Papes, Drazena; Leitch, Andrew R; Besendorfer, Visnja

2009-01-01

The structure, abundance and location of repetitive DNA sequences on chromosomes can characterize the nature of higher plant genomes. Here we report on three new repeat DNA families isolated from Anemone hortensis L.; (i) AhTR1, a family of satellite DNA (stDNA) composed of a 554-561 bp long EcoRV monomer; (ii) AhTR2, a stDNA family composed of a 743 bp long HindIII monomer and; (iii) AhDR, a repeat family composed of a 945 bp long HindIII fragment that exhibits some sequence similarity to Ty3/gypsy-like retroelements. Fluorescence in-situ hybridization (FISH) to metaphase chromosomes of A. hortensis (2n = 16) revealed that both AhTR1 and AhTR2 sequences co-localized with DAPI-positive AT-rich heterochromatic regions. AhTR1 sequences occur at intercalary DAPI bands while AhTR2 sequences occur at 8-10 terminally located heterochromatic blocks. In contrast AhDR sequences are dispersed over all chromosomes as expected of a Ty3/gypsy-like element. AhTR2 and AhTR1 repeat families include polyA- and polyT-tracks, AT/TA-motifs and a pentanucleotide sequence (CAAAA) that may have consequences for chromatin packing and sequence homogeneity. AhTR2 repeats also contain TTTAGGG motifs and degenerate variants. We suggest that they arose by interspersion of telomeric repeats with subtelomeric repeats, before hybrid unit(s) amplified through the heterochromatic domain. The three repetitive DNA families together occupy approximately 10% of the A. hortensis genome. Comparative analyses of eight Anemone species revealed that the divergence of the A. hortensis genome was accompanied by considerable modification and/or amplification of repeats.
A septal chromosome segregator protein evolved into a conjugative DNA-translocator protein

PubMed Central

Sepulveda, Edgardo; Vogelmann, Jutta

2011-01-01

Streptomycetes, Gram-positive soil bacteria well known for the production of antibiotics feature a unique conjugative DNA transfer system. In contrast to classical conjugation which is characterized by the secretion of a pilot protein covalently linked to a single-stranded DNA molecule, in Streptomyces a double-stranded DNA molecule is translocated during conjugative transfer. This transfer involves a single plasmid encoded protein, TraB. A detailed biochemical and biophysical characterization of TraB, revealed a close relationship to FtsK, mediating chromosome segregation during bacterial cell division. TraB translocates plasmid DNA by recognizing 8-bp direct repeats located in a specific plasmid region clt. Similar sequences accidentally also occur on chromosomes and have been shown to be bound by TraB. We suggest that TraB mobilizes chromosomal genes by the interaction with these chromosomal clt-like sequences not relying on the integration of the conjugative plasmid into the chromosome. PMID:22479692
The complete chloroplast genome of an irreplaceable dietary and model crop, foxtail millet (Setaria italica).

PubMed

Wang, Shuo; Gao, Li-Zhi

2016-11-01

The complete chloroplast genome sequence of foxtail millet (Setaria italica), an important food and fodder crop in the family Poaceae, is first reported in this study. The genome consists of 1 35 516 bp containing a pair of inverted repeats (IRs) of 21 804 bp separated by a large single-copy (LSC) region and a small single-copy (SSC) region of 79 896 bp and 12 012 bp, respectively. Coding sequences constitute 58.8% of the genome harboring 111 unique genes, 71 of which are protein-coding genes, 4 are rRNA genes, and 36 are tRNA genes. Phylogenetic analysis indicated foxtail millet clustered with Panicum virgatum and Echinochloa crus-galli belonging to the tribe Paniceae of the subfamily Panicoideae. This newly determined chloroplast genome will provide valuable information for the future breeding programs of valuable cereal crops in the family Poaceae.
Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

DOEpatents

Weier, H.U.G.; Gray, J.W.

1995-06-27

A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity. 18 figs.
Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

DOEpatents

Weier, Heinz-Ulrich G.; Gray, Joe W.

1995-01-01

A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers ard probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity.
Novel Structure of Ty3 Reverse Transcriptase | Center for Cancer Research

Cancer.gov

Retrotransposons are mobile genetic elements that self amplify via a single-stranded RNA intermediate, which is converted to double-stranded DNA by an encoded reverse transcriptase (RT) with both DNA polymerase (pol) and ribonuclease H (RNase) activities. Categorized by whether they contain flanking long terminal repeat (LTR) sequences, retrotransposons play a critical role in the architecture of eukaryotic genomes and are the evolutionary origin of retroviruses, including human immunodeficiency virus (HIV).

De novo identification of highly diverged protein repeats by probabilistic consistency.

PubMed

Biegert, A; Söding, J

2008-03-15

An estimated 25% of all eukaryotic proteins contain repeats, which underlines the importance of duplication for evolving new protein functions. Internal repeats often correspond to structural or functional units in proteins. Methods capable of identifying diverged repeated segments or domains at the sequence level can therefore assist in predicting domain structures, inferring hypotheses about function and mechanism, and investigating the evolution of proteins from smaller fragments. We present HHrepID, a method for the de novo identification of repeats in protein sequences. It is able to detect the sequence signature of structural repeats in many proteins that have not yet been known to possess internal sequence symmetry, such as outer membrane beta-barrels. HHrepID uses HMM-HMM comparison to exploit evolutionary information in the form of multiple sequence alignments of homologs. In contrast to a previous method, the new method (1) generates a multiple alignment of repeats; (2) utilizes the transitive nature of homology through a novel merging procedure with fully probabilistic treatment of alignments; (3) improves alignment quality through an algorithm that maximizes the expected accuracy; (4) is able to identify different kinds of repeats within complex architectures by a probabilistic domain boundary detection method and (5) improves sensitivity through a new approach to assess statistical significance. Server: http://toolkit.tuebingen.mpg.de/hhrepid; Executables: ftp://ftp.tuebingen.mpg.de/pub/protevo/HHrepID
Differential effects of the mismatch repair genes MSH2 and MSH3 on homeologous recombination in Saccharomyces cerevisiae.

PubMed

Selva, E M; Maderazo, A B; Lahue, R S

1997-12-01

The products of the yeast mismatch repair genes MSH2 and MSH3 participate in the inhibition of genetic recombination between homeologous (divergent) DNA sequences. In strains deficient for these genes, homeologous recombination rates between repeated elements are elevated due to the loss of this inhibition. In this study, the effects of these mutations were further analyzed by quantitation of mitotic homeologous recombinants as crossovers, gene conversions or exceptional events in wild-type, msh2, msh3 and msh2 msh3 mutant strains. When homeologous sequences were present as a direct repeat in one orientation, crossovers and gene conversions were elevated in msh2, msh3 and msh2 msh3 strains. The increases were greater in the msh2 msh3 double mutant than in either single mutant. When the order of the homeologous sequences was reversed, the msh2 mutation again yielded increased rates of crossovers and gene conversions. However, in an msh3 strain, gene conversions occurred at higher levels but interchromosomal crossovers were not increased and intrachromosomal crossovers were reduced relative to wild type. The msh2 msh3 double mutant behaved like the msh2 single mutant in this orientation. Control strains harboring homologous duplications were largely but not entirely unaffected in mutant strains, suggesting specificity for the mismatched intermediates of homeologous recombination. In all strains, very few (< 10%) recombinants could be attributed to exceptional events. These results suggest that MSH2 and MSH3 can function differentially to control homeologous exchanges.
Seven New Complete Plastome Sequences Reveal Rampant Independent Loss of the ndh Gene Family across Orchids and Associated Instability of the Inverted Repeat/Small Single-Copy Region Boundaries

PubMed Central

Moore, Michael J.; Neubig, Kurt M.; Williams, Norris H.; Whitten, W. Mark; Kim, Joo-Hwan

2015-01-01

Earlier research has revealed that the ndh loci have been pseudogenized, truncated, or deleted from most orchid plastomes sequenced to date, including in all available plastomes of the two most species-rich subfamilies, Orchidoideae and Epidendroideae. This study sought to resolve deeper-level phylogenetic relationships among major orchid groups and to refine the history of gene loss in the ndh loci across orchids. The complete plastomes of seven orchids, Oncidium sphacelatum (Epidendroideae), Masdevallia coccinea (Epidendroideae), Sobralia callosa (Epidendroideae), Sobralia aff. bouchei (Epidendroideae), Elleanthus sodiroi (Epidendroideae), Paphiopedilum armeniacum (Cypripedioideae), and Phragmipedium longifolium (Cypripedioideae) were sequenced and analyzed in conjunction with all other available orchid and monocot plastomes. Most ndh loci were found to be pseudogenized or lost in Oncidium, Paphiopedilum and Phragmipedium, but surprisingly, all ndh loci were found to retain full, intact reading frames in Sobralia, Elleanthus and Masdevallia. Character mapping suggests that the ndh genes were present in the common ancestor of orchids but have experienced independent, significant losses at least eight times across four subfamilies. In addition, ndhF gene loss was correlated with shifts in the position of the junction of the inverted repeat (IR) and small single-copy (SSC) regions. The Orchidaceae have unprecedented levels of homoplasy in ndh gene presence/absence, which may be correlated in part with the unusual life history of orchids. These results also suggest that ndhF plays a role in IR/SSC junction stability. PMID:26558895
ATM kinase is required for telomere elongation in mouse and human cells

PubMed Central

Lee, Stella Suyong; Bohrson, Craig; Pike, Alexandra Mims; Wheelan, Sarah Jo; Greider, Carol Widney

2015-01-01

Summary Short telomeres induce a DNA damage response, senescence and apoptosis; thus, maintaining telomere length equilibrium is essential for cell viability. Telomerase addition of telomere repeats is tightly regulated in cells. To probe pathways that regulate telomere addition, we developed the ADDIT assay to measure new telomere addition at a single telomere in vivo. Sequence analysis showed telomerase specific addition of repeats onto a new telomere occurred in just 48 hr. Using the ADDIT assay, we found that ATM is required for addition of new repeats onto telomeres in mouse cells. Evaluation of bulk telomeres, in both human and mouse cells, showed that blocking ATM inhibited telomere elongation. Finally, the activation of ATM through the inhibition of PARP1 resulted in increased telomere elongation, supporting the central role of the ATM pathway in regulating telomere addition. Understanding this role of ATM may yield new areas for possible therapeutic intervention in telomere-mediated disease. PMID:26586427
New development and validation of 50 SSR markers in breadfruit (Artocarpus altilis, Moraceae) by next-generation sequencing.

PubMed

De Bellis, Fabien; Malapa, Roger; Kagy, Valérie; Lebegin, Stéphane; Billot, Claire; Labouisse, Jean-Pierre

2016-08-01

Using next-generation sequencing technology, new microsatellite loci were characterized in Artocarpus altilis (Moraceae) and two congeners to increase the number of available markers for genotyping breadfruit cultivars. A total of 47,607 simple sequence repeat loci were obtained by sequencing a library of breadfruit genomic DNA with an Illumina MiSeq system. Among them, 50 single-locus markers were selected and assessed using 41 samples (39 A. altilis, one A. camansi, and one A. heterophyllus). All loci were polymorphic in A. altilis, 44 in A. camansi, and 21 in A. heterophyllus. The number of alleles per locus ranged from two to 19. The new markers will be useful for assessing the identity and genetic diversity of breadfruit cultivars on a small geographical scale, gaining a better understanding of farmer management practices, and will help to optimize breadfruit genebank management.
Repeated sampling of genes from a single cell - implications for gravitropism research

NASA Astrophysics Data System (ADS)

Scherp, P.; Hasenstein, K. H.

The need for repeated but independent extractions of mRNA from single cells and plant tissues prompted the development of Solid Phase Gene Extraction (SPGE, patent pending). Oligo dT18 coated glass needles hybridize during a 2 to 3 min sampling time with the poly A+ mRNA. The needle is withdrawn and can be used directly for RT-PCR. Because of the small probe size, no cytoplasm is lost and repeated sampling of the same cell is possible. SPGE of Chara rhizoids and internodal cells showed fluctuations of type and quantity of mRNA in specific areas of the cytoplasm of rhizoids and time-dependent gene expression in internodal cells as a function of light/dark intervals. Despite extensive cytoplasmic streaming, mRNA-samples taken in the vicinity of the nucleus revealed a higher variability than the distal ends of the cell. In rhizoids, the mRNA/cDNA varied between the different zones of cytoplasm. In Arabidopsis, we isolated cDNA species from root tips, shoots and leaves and determined their sequences. Growth studies on SPGE-sampled individuals showed that after a short recovery period, all sampled plants resumed growth with normal growth rates and graviresponse. The data indicate that SPGE is a powerful method to study gene expression in single cells and in tissues of higher plants with high spatial and temporal resolution. Supported by NASA: NAG 2-1423
Detecting and Characterizing Repeating Earthquake Sequences During Volcanic Eruptions

NASA Astrophysics Data System (ADS)

Tepp, G.; Haney, M. M.; Wech, A.

2017-12-01

A major challenge in volcano seismology is forecasting eruptions. Repeating earthquake sequences often precede volcanic eruptions or lava dome activity, providing an opportunity for short-term eruption forecasting. Automatic detection of these sequences can lead to timely eruption notification and aid in continuous monitoring of volcanic systems. However, repeating earthquake sequences may also occur after eruptions or along with magma intrusions that do not immediately lead to an eruption. This additional challenge requires a better understanding of the processes involved in producing these sequences to distinguish those that are precursory. Calculation of the inverse moment rate and concepts from the material failure forecast method can lead to such insights. The temporal evolution of the inverse moment rate is observed to differ for precursory and non-precursory sequences, and multiple earthquake sequences may occur concurrently. These observations suggest that sequences may occur in different locations or through different processes. We developed an automated repeating earthquake sequence detector and near real-time alarm to send alerts when an in-progress sequence is identified. Near real-time inverse moment rate measurements can further improve our ability to forecast eruptions by allowing for characterization of sequences. We apply the detector to eruptions of two Alaskan volcanoes: Bogoslof in 2016-2017 and Redoubt Volcano in 2009. The Bogoslof eruption produced almost 40 repeating earthquake sequences between its start in mid-December 2016 and early June 2017, 21 of which preceded an explosive eruption, and 2 sequences in the months before eruptive activity. Three of the sequences occurred after the implementation of the alarm in late March 2017 and successfully triggered alerts. The nearest seismometers to Bogoslof are over 45 km away, requiring a detector that can work with few stations and a relatively low signal-to-noise ratio. During the Redoubt eruption, earthquake sequences were observed in the months leading up to the eruptive activity beginning in March 2009 as well as immediately preceding 7 of the 19 explosive events. In contrast to Bogoslof, Redoubt has a local monitoring network which allows for better detection and more detailed analysis of the repeating earthquake sequences.
The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats

PubMed Central

Grissa, Ibtissem; Vergnaud, Gilles; Pourcel, Christine

2007-01-01

Background In Archeae and Bacteria, the repeated elements called CRISPRs for "clustered regularly interspaced short palindromic repeats" are believed to participate in the defence against viruses. Short sequences called spacers are stored in-between repeated elements. In the current model, motifs comprising spacers and repeats may target an invading DNA and lead to its degradation through a proposed mechanism similar to RNA interference. Analysis of intra-species polymorphism shows that new motifs (one spacer and one repeated element) are added in a polarised fashion. Although their principal characteristics have been described, a lot remains to be discovered on the way CRISPRs are created and evolve. As new genome sequences become available it appears necessary to develop automated scanning tools to make available CRISPRs related information and to facilitate additional investigations. Description We have produced a program, CRISPRFinder, which identifies CRISPRs and extracts the repeated and unique sequences. Using this software, a database is constructed which is automatically updated monthly from newly released genome sequences. Additional tools were created to allow the alignment of flanking sequences in search for similarities between different loci and to build dictionaries of unique sequences. To date, almost six hundred CRISPRs have been identified in 475 published genomes. Two Archeae out of thirty-seven and about half of Bacteria do not possess a CRISPR. Fine analysis of repeated sequences strongly supports the current view that new motifs are added at one end of the CRISPR adjacent to the putative promoter. Conclusion It is hoped that availability of a public database, regularly updated and which can be queried on the web will help in further dissecting and understanding CRISPR structure and flanking sequences evolution. Subsequent analyses of the intra-species CRISPR polymorphism will be facilitated by CRISPRFinder and the dictionary creator. CRISPRdb is accessible at PMID:17521438
Variable presence of the inverted repeat and plastome stability in Erodium

PubMed Central

Blazier, John C.; Jansen, Robert K.; Mower, Jeffrey P.; Govindu, Madhu; Zhang, Jin; Weng, Mao-Lun; Ruhlman, Tracey A.

2016-01-01

Background and Aims Several unrelated lineages such as plastids, viruses and plasmids, have converged on quadripartite genomes of similar size with large and small single copy regions and a large inverted repeat (IR). Except for Erodium (Geraniaceae), saguaro cactus and some legumes, the plastomes of all photosynthetic angiosperms display this structure. The functional significance of the IR is not understood and Erodium provides a system to examine the role of the IR in the long-term stability of these genomes. We compared the degree of genomic rearrangement in plastomes of Erodium that differ in the presence and absence of the IR. Methods We sequenced 17 new Erodium plastomes. Using 454, Illumina, PacBio and Sanger sequences, 16 genomes were assembled and categorized along with one incomplete and two previously published Erodium plastomes. We conducted phylogenetic analyses among these species using a dataset of 19 protein-coding genes and determined if significantly higher evolutionary rates had caused the long branch seen previously in phylogenetic reconstructions within the genus. Bioinformatic comparisons were also performed to evaluate plastome evolution across the genus. Key Results Erodium plastomes fell into four types (Type 1–4) that differ in their substitution rates, short dispersed repeat content and degree of genomic rearrangement, gene and intron content and GC content. Type 4 plastomes had significantly higher rates of synonymous substitutions (dS) for all genes and for 14 of the 19 genes non-synonymous substitutions (dN) were significantly accelerated. We evaluated the evidence for a single IR loss in Erodium and in doing so discovered that Type 4 plastomes contain a novel IR. Conclusions The presence or absence of the IR does not affect plastome stability in Erodium. Rather, the overall repeat content shows a negative correlation with genome stability, a pattern in agreement with other angiosperm groups and recent findings on genome stability in bacterial endosymbionts. PMID:27192713
Variable presence of the inverted repeat and plastome stability in Erodium.

PubMed

Blazier, John C; Jansen, Robert K; Mower, Jeffrey P; Govindu, Madhu; Zhang, Jin; Weng, Mao-Lun; Ruhlman, Tracey A

2016-06-01

Several unrelated lineages such as plastids, viruses and plasmids, have converged on quadripartite genomes of similar size with large and small single copy regions and a large inverted repeat (IR). Except for Erodium (Geraniaceae), saguaro cactus and some legumes, the plastomes of all photosynthetic angiosperms display this structure. The functional significance of the IR is not understood and Erodium provides a system to examine the role of the IR in the long-term stability of these genomes. We compared the degree of genomic rearrangement in plastomes of Erodium that differ in the presence and absence of the IR. We sequenced 17 new Erodium plastomes. Using 454, Illumina, PacBio and Sanger sequences, 16 genomes were assembled and categorized along with one incomplete and two previously published Erodium plastomes. We conducted phylogenetic analyses among these species using a dataset of 19 protein-coding genes and determined if significantly higher evolutionary rates had caused the long branch seen previously in phylogenetic reconstructions within the genus. Bioinformatic comparisons were also performed to evaluate plastome evolution across the genus. Erodium plastomes fell into four types (Type 1-4) that differ in their substitution rates, short dispersed repeat content and degree of genomic rearrangement, gene and intron content and GC content. Type 4 plastomes had significantly higher rates of synonymous substitutions (dS) for all genes and for 14 of the 19 genes non-synonymous substitutions (dN) were significantly accelerated. We evaluated the evidence for a single IR loss in Erodium and in doing so discovered that Type 4 plastomes contain a novel IR. The presence or absence of the IR does not affect plastome stability in Erodium. Rather, the overall repeat content shows a negative correlation with genome stability, a pattern in agreement with other angiosperm groups and recent findings on genome stability in bacterial endosymbionts. © The Author 2016. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Identifying uniformly mutated segments within repeats.

PubMed

Sahinalp, S Cenk; Eichler, Evan; Goldberg, Paul; Berenbrink, Petra; Friedetzky, Tom; Ergun, Funda

2004-12-01

Given a long string of characters from a constant size alphabet we present an algorithm to determine whether its characters have been generated by a single i.i.d. random source. More specifically, consider all possible n-coin models for generating a binary string S, where each bit of S is generated via an independent toss of one of the n coins in the model. The choice of which coin to toss is decided by a random walk on the set of coins where the probability of a coin change is much lower than the probability of using the same coin repeatedly. We present a procedure to evaluate the likelihood of a n-coin model for given S, subject a uniform prior distribution over the parameters of the model (that represent mutation rates and probabilities of copying events). In the absence of detailed prior knowledge of these parameters, the algorithm can be used to determine whether the a posteriori probability for n=1 is higher than for any other n>1. Our algorithm runs in time O(l4logl), where l is the length of S, through a dynamic programming approach which exploits the assumed convexity of the a posteriori probability for n. Our test can be used in the analysis of long alignments between pairs of genomic sequences in a number of ways. For example, functional regions in genome sequences exhibit much lower mutation rates than non-functional regions. Because our test provides means for determining variations in the mutation rate, it may be used to distinguish functional regions from non-functional ones. Another application is in determining whether two highly similar, thus evolutionarily related, genome segments are the result of a single copy event or of a complex series of copy events. This is particularly an issue in evolutionary studies of genome regions rich with repeat segments (especially tandemly repeated segments).
Assembly of a phased diploid Candida albicans genome facilitates allele-specific measurements and provides a simple model for repeat and indel structure

PubMed Central

2013-01-01

Background Candida albicans is a ubiquitous opportunistic fungal pathogen that afflicts immunocompromised human hosts. With rare and transient exceptions the yeast is diploid, yet despite its clinical relevance the respective sequences of its two homologous chromosomes have not been completely resolved. Results We construct a phased diploid genome assembly by deep sequencing a standard laboratory wild-type strain and a panel of strains homozygous for particular chromosomes. The assembly has 700-fold coverage on average, allowing extensive revision and expansion of the number of known SNPs and indels. This phased genome significantly enhances the sensitivity and specificity of allele-specific expression measurements by enabling pooling and cross-validation of signal across multiple polymorphic sites. Additionally, the diploid assembly reveals pervasive and unexpected patterns in allelic differences between homologous chromosomes. Firstly, we see striking clustering of indels, concentrated primarily in the repeat sequences in promoters. Secondly, both indels and their repeat-sequence substrate are enriched near replication origins. Finally, we reveal an intimate link between repeat sequences and indels, which argues that repeat length is under selective pressure for most eukaryotes. This connection is described by a concise one-parameter model that explains repeat-sequence abundance in C. albicans as a function of the indel rate, and provides a general framework to interpret repeat abundance in species ranging from bacteria to humans. Conclusions The phased genome assembly and insights into repeat plasticity will be valuable for better understanding allele-specific phenomena and genome evolution. PMID:24025428
Unrelated sequences at the 5' end of mouse LINE-1 repeated elements define two distinct subfamilies.

PubMed Central

Wincker, P; Jubier-Maurin, V; Roizès, G

1987-01-01

Some full length members of the mouse long interspersed repeated DNA family L1Md have been shown to be associated at their 5' end with a variable number of tandem repetitions, the A repeats, that have been suggested to be transcription controlling elements. We report that the other type of repeat, named F, found at the 5' end of a few L1 elements is also an integral part of full length L1 copies. Sequencing shows that the F repeats are GC rich, and organized in tandem. The L1 copies associated with either A or F repeats can be correlated with two different subsets of L1 sequences distinguished by a series of variant nucleotides specific to each and by unassociated but frequent restriction sites. These findings suggest that sequence replacement has occurred at least once in 5' of L1Md, and is related to the generation of specific subfamilies. Images PMID:3684566
Plant chromosomes from end to end: telomeres, heterochromatin and centromeres.

PubMed

Lamb, Jonathan C; Yu, Weichang; Han, Fangpu; Birchler, James A

2007-04-01

Recent evidence indicates that heterochromatin in plants is composed of heterogeneous sequences, which are usually composed of transposable elements or tandem repeat arrays. These arrays are associated with chromatin modifications that produce a closed configuration that limits transcription. Centromere sequences in plants are usually composed of tandem repeat arrays that are homogenized across the genome. Analysis of such arrays in closely related taxa suggests a rapid turnover of the repeat unit that is typical of a particular species. In addition, two lines of evidence for an epigenetic component of centromere specification have been reported, namely an example of a neocentromere formed over sequences without the typical repeat array and examples of centromere inactivation. Although the telomere repeat unit is quite prevalent in the plant kingdom, unusual repeats have been found in some families. Recently, it was demonstrated that the introduction of telomere sequences into plants cells causes truncation of the chromosomes, and that this technique can be used to produce artificial chromosome platforms.
SSRscanner: a program for reporting distribution and exact location of simple sequence repeats.

PubMed

Anwar, Tamanna; Khan, Asad U

2006-02-20

Simple sequence repeats (SSRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker assisted selection of crop plants and a range of molecular ecology and diversity studies. These repeated DNA sequences are found in both prokaryotes and eukaryotes. They are distributed almost at random throughout the genome, ranging from mononucleotide to trinucleotide repeats. They are also found at longer lengths (> 6 repeating units) of tracts. Most of the computer programs that find SSRs do not report its exact position. A computer program SSRscanner was written to find out distribution, frequency and exact location of each SSR in the genome. SSRscanner is user friendly. It can search repeats of any length and produce outputs with their exact position on chromosome and their frequency of occurrence in the sequence. This program has been written in PERL and is freely available for non-commercial users by request from the authors. Please contact the authors by E-mail: huzzi99@hotmail.com.
Sequence analysis of the complete genome of Trichoplusia ni single nucleopolyhedrovirus and the identification of a baculoviral photolyase gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Willis, Leslie G.; Siepp, Robyn; Stewart, Taryn M.

2005-08-01

The genome of the Trichoplusia ni single nucleopolyhedrovirus (TnSNPV), a group II NPV which infects the cabbage looper (T. ni), has been completely sequenced and analyzed. The TnSNPV DNA genome consists of 134,394 bp and has an overall G + C content of 39%. Gene analysis predicted 144 open reading frames (ORFs) of 150 nucleotides or greater that showed minimal overlap. Comparisons with previously sequenced baculoviruses indicate that 119 TnSNPV ORFs were homologues of previously reported viral gene sequences. Ninety-four TnSNPV ORFs returned an Autographa californica multiple NPV (AcMNPV) homologue while 25 ORFs returned poor or no sequence matches withmore » the current databases. A putative photolyase gene was also identified that had highest amino acid identity to the photolyase genes of Chrysodeixis chalcites NPV (ChchNPV) (47%) and Danio rerio (zebrafish) (40%). In addition unlike all other baculoviruses no obvious homologous repeat (hr) sequences were identified. Comparison of the TnSNPV and AcMNPV genomes provides a unique opportunity to examine two baculoviruses that are highly virulent for a common insect host (T. ni) yet belong to diverse baculovirus taxonomic groups and possess distinct biological features. In vitro fusion assays demonstrated that the TnSNPV F protein induces membrane fusion and syncytia formation and were compared to syncytia formed by AcMNPV GP64.« less
Simultaneous Myocardial Strain and Dark-Blood Perfusion Imaging Using a Displacement-Encoded MRI Pulse Sequence

PubMed Central

Le, Yuan; Stein, Ashley; Berry, Colin; Kellman, Peter; Bennett, Eric E.; Taylor, Joni; Lucas, Katherine; Kopace, Rael; Chefd’Hotel, Christophe; Lorenz, Christine H.; Croisille, Pierre; Wen, Han

2010-01-01

The purpose of this study is to develop and evaluate a displacement-encoded pulse sequence for simultaneous perfusion and strain imaging. Displacement-encoded images in 2–3 myocardial slices were repeatedly acquired using a single shot pulse sequence for 3 to 4 minutes, which covers a bolus infusion of Gd. The magnitudes of the images were T1 weighted and provided quantitative measures of perfusion, while the phase maps yielded strain measurements. In an acute coronary occlusion swine protocol (n=9), segmental perfusion measurements were validated against microsphere reference standard with a linear regression (slope 0.986, R2 = 0.765, Bland-Altman standard deviation = 0.15 ml/min/g). In a group of ST-elevation myocardial infarction(STEMI) patients (n=11), the scan success rate was 76%. Short-term contrast washout rate and perfusion are highly correlated (R2=0.72), and the pixel-wise relationship between circumferential strain and perfusion was better described with a sigmoidal Hill curve than linear functions. This study demonstrates the feasibility of measuring strain and perfusion from a single set of images. PMID:20544714
A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

PubMed Central

2010-01-01

Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840
A TALE-inspired computational screen for proteins that contain approximate tandem repeats.

PubMed

Perycz, Malgorzata; Krwawicz, Joanna; Bochtler, Matthias

2017-01-01

TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen.
A TALE-inspired computational screen for proteins that contain approximate tandem repeats

PubMed Central

Krwawicz, Joanna

2017-01-01

TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen. PMID:28617832

Dfam: a database of repetitive DNA based on profile hidden Markov models.

PubMed

Wheeler, Travis J; Clements, Jody; Eddy, Sean R; Hubley, Robert; Jones, Thomas A; Jurka, Jerzy; Smit, Arian F A; Finn, Robert D

2013-01-01

We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps.
Complete sequence of Tvv1, a family of Ty 1 copia-like retrotransposons of Vitis vinifera L., reconstituted by chromosome walking.

PubMed

Pelsy, F.; Merdinoglu, D.

2002-09-01

A chromosome-walking strategy was used to sequence and characterize retrotransposons in the grapevine genome. The reconstitution of a family of retroelements, named Tvv1, was achieved by six successive steps. These elements share a single, highly conserved open reading frame 4,153 nucleotides-long, putatively encoding the gag, pro, int, rt and rh proteins. Comparison of the Tvv1 open reading frame coding potential with those of drosophila copia and tobacco Tnt1, revealed that Tvv1 is closely related to Ty 1 copia-like retrotransposons. A highly variable untranslated leader region, upstream of the open reading frame, allowed us to differentiate Tvv1 variants, which represent a family of at least 28 copies, in varying sizes. This internal region is flanked by two long terminal repeats in direct orientation, sized between 149 and 157 bp. Among elements theoretically sized from 4,970 to 5,550 bp, we describe the full-length sequence of a reference element Tvv1-1, 5,343 nucleotides-long. The full-length sequence of Tvv1-1 compared to pea PDR1 shows a 53.3% identity. In addition, both elements contain long terminal repeats of nearly the same size in which the U5 region could be entirely absent. Therefore, we assume that Tvv1 and PDR1 could constitute a particular class of short LTRs retroelements.
Distant sequences determine 5′ end formation of cox3 transcripts in Arabidopsis thaliana ecotype C24

PubMed Central

Forner, Joachim; Weber, Bärbel; Wiethölter, Caterina; Meyer, Rhonda C.; Binder, Stefan

2005-01-01

The genomic environments and the transcripts of the mitochondrial cox3 gene are investigated in three Arabidopsis thaliana ecotypes. While the proximate 5′ sequences up to nucleotide position −584, the coding regions and the 3′ flanking regions are identical in Columbia (Col), C24 and Landsberg erecta (Ler), genomic variation is detected in regions further upstream. In the mitochondrial DNA of Col, a 1790 bp fragment flanked by a nonanucleotide direct repeat is present beyond position −584 with respect to the ATG. While in Ler only part of this insertion is conserved, this sequence is completely absent in C24, except for a single copy of the nonanucleotide direct repeat. Northern hybridization reveals identical major transcripts in the three ecotypes, but identifies an additional abundant 60 nt larger mRNA species in C24. The extremities of the most abundant mRNA species are identical in the three ecotypes. In C24, an extra major 5′ end is abundant. This terminus and the other major 5′ ends are located in identical sequence regions. Inspection of Atcox3 transcripts in C24/Col hybrids revealed a female inheritance of the mRNA species with the extra 5′ terminus. Thus, a mitochondrially encoded factor determines the generation of an extra 5′ mRNA end. PMID:16107557
Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing.

PubMed

Hribová, Eva; Neumann, Pavel; Matsumoto, Takashi; Roux, Nicolas; Macas, Jirí; Dolezel, Jaroslav

2010-09-16

Bananas and plantains (Musa spp.) are grown in more than a hundred tropical and subtropical countries and provide staple food for hundreds of millions of people. They are seed-sterile crops propagated clonally and this makes them vulnerable to a rapid spread of devastating diseases and at the same time hampers breeding improved cultivars. Although the socio-economic importance of bananas and plantains cannot be overestimated, they remain outside the focus of major research programs. This slows down the study of nuclear genome and the development of molecular tools to facilitate banana improvement. In this work, we report on the first thorough characterization of the repeat component of the banana (M. acuminata cv. 'Calcutta 4') genome. Analysis of almost 100 Mb of sequence data (0.15× genome coverage) permitted partial sequence reconstruction and characterization of repetitive DNA, making up about 30% of the genome. The results showed that the banana repeats are predominantly made of various types of Ty1/copia and Ty3/gypsy retroelements representing 16 and 7% of the genome respectively. On the other hand, DNA transposons were found to be rare. In addition to new families of transposable elements, two new satellite repeats were discovered and found useful as cytogenetic markers. To help in banana sequence annotation, a specific Musa repeat database was created, and its utility was demonstrated by analyzing the repeat composition of 62 genomic BAC clones. A low-depth 454 sequencing of banana nuclear genome provided the largest amount of DNA sequence data available until now for Musa and permitted reconstruction of most of the major types of DNA repeats. The information obtained in this study improves the knowledge of the long-range organization of banana chromosomes, and provides sequence resources needed for repeat masking and annotation during the Musa genome sequencing project. It also provides sequence data for isolation of DNA markers to be used in genetic diversity studies and in marker-assisted selection.
Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing

PubMed Central

2010-01-01

Background Bananas and plantains (Musa spp.) are grown in more than a hundred tropical and subtropical countries and provide staple food for hundreds of millions of people. They are seed-sterile crops propagated clonally and this makes them vulnerable to a rapid spread of devastating diseases and at the same time hampers breeding improved cultivars. Although the socio-economic importance of bananas and plantains cannot be overestimated, they remain outside the focus of major research programs. This slows down the study of nuclear genome and the development of molecular tools to facilitate banana improvement. Results In this work, we report on the first thorough characterization of the repeat component of the banana (M. acuminata cv. 'Calcutta 4') genome. Analysis of almost 100 Mb of sequence data (0.15× genome coverage) permitted partial sequence reconstruction and characterization of repetitive DNA, making up about 30% of the genome. The results showed that the banana repeats are predominantly made of various types of Ty1/copia and Ty3/gypsy retroelements representing 16 and 7% of the genome respectively. On the other hand, DNA transposons were found to be rare. In addition to new families of transposable elements, two new satellite repeats were discovered and found useful as cytogenetic markers. To help in banana sequence annotation, a specific Musa repeat database was created, and its utility was demonstrated by analyzing the repeat composition of 62 genomic BAC clones. Conclusion A low-depth 454 sequencing of banana nuclear genome provided the largest amount of DNA sequence data available until now for Musa and permitted reconstruction of most of the major types of DNA repeats. The information obtained in this study improves the knowledge of the long-range organization of banana chromosomes, and provides sequence resources needed for repeat masking and annotation during the Musa genome sequencing project. It also provides sequence data for isolation of DNA markers to be used in genetic diversity studies and in marker-assisted selection. PMID:20846365
The complete chloroplast genome sequence of the chlorophycean green alga Scenedesmus obliquus reveals a compact gene organization and a biased distribution of genes on the two DNA strands

PubMed Central

de Cambiaire, Jean-Charles; Otis, Christian; Lemieux, Claude; Turmel, Monique

2006-01-01

Background The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. While the basal position of the Prasinophyceae is well established, the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae (UTC) remains uncertain. The five complete chloroplast DNA (cpDNA) sequences currently available for representatives of these classes display considerable variability in overall structure, gene content, gene density, intron content and gene order. Among these genomes, that of the chlorophycean green alga Chlamydomonas reinhardtii has retained the least ancestral features. The two single-copy regions, which are separated from one another by the large inverted repeat (IR), have similar sizes, rather than unequal sizes, and differ radically in both gene contents and gene organizations relative to the single-copy regions of prasinophyte and ulvophyte cpDNAs. To gain insights into the various changes that underwent the chloroplast genome during the evolution of chlorophycean green algae, we have sequenced the cpDNA of Scenedesmus obliquus, a member of a distinct chlorophycean lineage. Results The 161,452 bp IR-containing genome of Scenedesmus features single-copy regions of similar sizes, encodes 96 genes, i.e. only two additional genes (infA and rpl12) relative to its Chlamydomonas homologue and contains seven group I and two group II introns. It is clearly more compact than the four UTC algal cpDNAs that have been examined so far, displays the lowest proportion of short repeats among these algae and shows a stronger bias in clustering of genes on the same DNA strand compared to Chlamydomonas cpDNA. Like the latter genome, Scenedesmus cpDNA displays only a few ancestral gene clusters. The two chlorophycean genomes share 11 gene clusters that are not found in previously sequenced trebouxiophyte and ulvophyte cpDNAs as well as a few genes that have an unusual structure; however, their single-copy regions differ considerably in gene content. Conclusion Our results underscore the remarkable plasticity of the chlorophycean chloroplast genome. Owing to this plasticity, only a sketchy portrait could be drawn for the chloroplast genome of the last common ancestor of Scenedesmus and Chlamydomonas. PMID:16638149
Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa

PubMed Central

2012-01-01

Background Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, there are hardly any genetic studies performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Results Successfully, we developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice more abundant in UTRs (UnTranslated Regions) compared to coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSR. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Conclusions Two transcriptome sets were built that are valuable resources for marker development, comparative genomic studies and candidate gene approaches. Next generation sequencing of leaf transcriptome is very effective; however, deeper sequencing and using more tissues and stages is advisable for extended comparative studies. PMID:23167289
Completion of the mitochondrial genome sequence of onion (Allium cepa L.) containing the CMS-S male-sterile cytoplasm and identification of an independent event of the ccmF N gene split.

PubMed

Kim, Bongju; Kim, Kyunghee; Yang, Tae-Jin; Kim, Sunggil

2016-11-01

Cytoplasmic male-sterility (CMS) conferred by the CMS-S cytoplasm has been most commonly used for onion (Allium cepa L.) F 1 hybrid seed production. We first report the complete mitochondrial genome sequence containing CMS-S cytoplasm in this study. Initially, seven contigs were de novo assembled from 150-bp paired-end raw reads produced from the total genomic DNA using the Illumina NextSeq500 platform. These contigs were connected into a single circular genome consisting of 316,363 bp (GenBank accession: KU318712) by PCR amplification. Although all 24 core protein-coding genes were present, no ribosomal protein-coding genes, except rps12, were identified in the onion mitochondrial genome. Unusual trans-splicing of the cox2 gene was verified, and the cox1 gene was identified as part of the chimeric orf725 gene, which is a candidate gene responsible for inducing CMS. In addition to orf725, two small chimeric genes were identified, but no transcripts were detected for these two open reading frames. Thirteen chloroplast-derived sequences, with sizes of 126-13,986 bp, were identified in the intergenic regions. Almost 10 % of the onion mitochondrial genome was composed of repeat sequences. The vast majority of repeats were short repeats of <100 base pairs. Interestingly, the gene encoding ccmF N was split into two genes. The ccmF N gene split is first identified outside the Brassicaceae family. The breakpoint in the onion ccmF N gene was different from that of other Brassicaceae species. This split of the ccmF N gene was also present in 30 other Allium species. The complete onion mitochondrial genome sequence reported in this study would be fundamental information for elucidation of onion CMS evolution.
Synthetic signal sequences that enable efficient secretory protein production in the yeast Kluyveromyces marxianus.

PubMed

Yarimizu, Tohru; Nakamura, Mikiko; Hoshida, Hisashi; Akada, Rinji

2015-02-14

Targeting of cellular proteins to the extracellular environment is directed by a secretory signal sequence located at the N-terminus of a secretory protein. These signal sequences usually contain an N-terminal basic amino acid followed by a stretch containing hydrophobic residues, although no consensus signal sequence has been identified. In this study, simple modeling of signal sequences was attempted using Gaussia princeps secretory luciferase (GLuc) in the yeast Kluyveromyces marxianus, which allowed comprehensive recombinant gene construction to substitute synthetic signal sequences. Mutational analysis of the GLuc signal sequence revealed that the GLuc hydrophobic peptide length was lower limit for effective secretion and that the N-terminal basic residue was indispensable. Deletion of the 16th Glu caused enhanced levels of secreted protein, suggesting that this hydrophilic residue defined the boundary of a hydrophobic peptide stretch. Consequently, we redesigned this domain as a repeat of a single hydrophobic amino acid between the N-terminal Lys and C-terminal Glu. Stretches consisting of Phe, Leu, Ile, or Met were effective for secretion but the number of residues affected secretory activity. A stretch containing sixteen consecutive methionine residues (M16) showed the highest activity; the M16 sequence was therefore utilized for the secretory production of human leukemia inhibitory factor protein in yeast, resulting in enhanced secreted protein yield. We present a new concept for the provision of secretory signal sequence ability in the yeast K. marxianus, determined by the number of residues of a single hydrophobic residue located between N-terminal basic and C-terminal acidic amino acid boundaries.
Characterization of mango (Mangifera indica L.) transcriptome and chloroplast genome.

PubMed

Azim, M Kamran; Khan, Ishtaiq A; Zhang, Yong

2014-05-01

We characterized mango leaf transcriptome and chloroplast genome using next generation DNA sequencing. The RNA-seq output of mango transcriptome generated >12 million reads (total nucleotides sequenced >1 Gb). De novo transcriptome assembly generated 30,509 unigenes with lengths in the range of 300 to ≥3,000 nt and 67× depth of coverage. Blast searching against nonredundant nucleotide databases and several Viridiplantae genomic datasets annotated 24,593 mango unigenes (80% of total) and identified Citrus sinensis as closest neighbor of mango with 9,141 (37%) matched sequences. The annotation with gene ontology and Clusters of Orthologous Group terms categorized unigene sequences into 57 and 25 classes, respectively. More than 13,500 unigenes were assigned to 293 KEGG pathways. Besides major plant biology related pathways, KEGG based gene annotation pointed out active presence of an array of biochemical pathways involved in (a) biosynthesis of bioactive flavonoids, flavones and flavonols, (b) biosynthesis of terpenoids and lignins and (c) plant hormone signal transduction. The mango transcriptome sequences revealed 235 proteases belonging to five catalytic classes of proteolytic enzymes. The draft genome of mango chloroplast (cp) was obtained by a combination of Sanger and next generation sequencing. The draft mango cp genome size is 151,173 bp with a pair of inverted repeats of 27,093 bp separated by small and large single copy regions, respectively. Out of 139 genes in mango cp genome, 91 found to be protein coding. Sequence analysis revealed cp genome of C. sinensis as closest neighbor of mango. We found 51 short repeats in mango cp genome supposed to be associated with extensive rearrangements. This is the first report of transcriptome and chloroplast genome analysis of any Anacardiaceae family member.
Sequencing and de novo assembly of visceral mass transcriptome of the critically endangered land snail Satsuma myomphala: Annotation and SSR discovery.

PubMed

Kang, Se Won; Patnaik, Bharat Bhusan; Hwang, Hee-Ju; Park, So Young; Chung, Jong Min; Song, Dae Kwon; Patnaik, Hongray Howrelia; Lee, Jae Bong; Kim, Changmu; Kim, Soonok; Park, Hong Seog; Park, Seung-Hwan; Park, Young-Su; Han, Yeon Soo; Lee, Jun Sang; Lee, Yong Seok

2017-03-01

Satsuma myomphala is critically endangered through loss of natural habitats, predation by natural enemies, and indiscriminate collection. It is a protected species in Korea but lacks genomic resources for an understanding of varied functional processes attributable to evolutionary success under natural habitats. For assessing the genetic information of S. myomphala, we performed for the first time, de novo transcriptome sequencing and functional annotation of expressed sequences using Illumina Next-Generation Sequencing (NGS) platform and bioinformatics analysis. We identified 103,774 unigenes of which 37,959, 12,890, and 17,699 were annotated in the PANM (Protostome DB), Unigene, and COG (Clusters of Orthologous Groups) databases, respectively. In addition, 14,451 unigenes were predicted under Gene Ontology functional categories, with 4581 assigned to a single category. Furthermore, 3369 sequences with 646 having Enzyme Commission (EC) numbers were mapped to 122 pathways in the Kyoto Encyclopedia of Genes and Genomes Pathway database. The prominent protein domains included the Zinc finger (C2H2-like), Reverse Transcriptase, Thioredoxin-like fold, and RNA recognition motif domain. Many unigenes with homology to immunity, defense, and reproduction-related genes were screened in the transcriptome. We also detected 3120 putative simple sequence repeats (SSRs) encompassing dinucleotide to hexanucleotide repeat motifs from >1kb unigene sequences. A list of PCR primers of SSR loci have been identified to study the genetic polymorphisms. The transcriptome data represents a valuable resource for further investigations on the species genome structure and biology. The unigenes information and microsatellites would provide an indispensable tool for conservation of the species in natural and adaptive environments. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Molecular and bioinformatic analysis of the FB-NOF transposable element.

PubMed

Badal, Martí; Portela, Anna; Xamena, Noel; Cabré, Oriol

2006-04-12

The Drosophila melanogaster transposable element FB-NOF is known to play a role in genome plasticity through the generation of all sort of genomic rearrangements. Moreover, several insertional mutants due to FB mobilizations have been reported. Its structure and sequence, however, have been poorly studied mainly as a consequence of the long, complex and repetitive sequence of FB inverted repeats. This repetitive region is composed of several 154 bp blocks, each with five almost identical repeats. In this paper, we report the sequencing process of 2 kb long FB inverted repeats of a complete FB-NOF element, with high precision and reliability. This achievement has been possible using a new map of the FB repetitive region, which identifies unambiguously each repeat with new features that can be used as landmarks. With this new vision of the element, a list of FB-NOF in the D. melanogaster genomic clones has been done, improving previous works that used only bioinformatic algorithms. The availability of many FB and FB-NOF sequences allowed an analysis of the FB insertion sequences that showed no sequence specificity, but a preference for A/T rich sequences. The position of NOF into FB is also studied, revealing that it is always located after a second repeat in a random block. With the results of this analysis, we propose a model of transposition in which NOF jumps from FB to FB, using an unidentified transposase enzyme that should specifically recognize the second repeat end of the FB blocks.
The repetitive landscape of the chicken genome.

PubMed

Wicker, Thomas; Robertson, Jon S; Schulze, Stefan R; Feltus, F Alex; Magrini, Vincent; Morrison, Jason A; Mardis, Elaine R; Wilson, Richard K; Peterson, Daniel G; Paterson, Andrew H; Ivarie, Robert

2005-01-01

Cot-based cloning and sequencing (CBCS) is a powerful tool for isolating and characterizing the various repetitive components of any genome, combining the established principles of DNA reassociation kinetics with high-throughput sequencing. CBCS was used to generate sequence libraries representing the high, middle, and low-copy fractions of the chicken genome. Sequencing high-copy DNA of chicken to about 2.7 x coverage of its estimated sequence complexity led to the initial identification of several new repeat families, which were then used for a survey of the newly released first draft of the complete chicken genome. The analysis provided insight into the diversity and biology of known repeat structures such as CR1 and CNM, for which only limited sequence data had previously been available. Cot sequence data also resulted in the identification of four novel repeats (Birddawg, Hitchcock, Kronos, and Soprano), two new subfamilies of CR1 repeats, and many elements absent from the chicken genome assembly. Multiple autonomous elements were found for a novel Mariner-like transposon, Galluhop, in addition to nonautonomous deletion derivatives. Phylogenetic analysis of the high-copy repeats CR1, Galluhop, and Birddawg provided insight into two distinct genome dispersion strategies. This study also exemplifies the power of the CBCS method to create representative databases for the repetitive fractions of genomes for which only limited sequence data is available.
The repetitive landscape of the chicken genome

PubMed Central

Wicker, Thomas; Robertson, Jon S.; Schulze, Stefan R.; Feltus, F. Alex; Magrini, Vincent; Morrison, Jason A.; Mardis, Elaine R.; Wilson, Richard K.; Peterson, Daniel G.; Paterson, Andrew H.; Ivarie, Robert

2005-01-01

Cot-based cloning and sequencing (CBCS) is a powerful tool for isolating and characterizing the various repetitive components of any genome, combining the established principles of DNA reassociation kinetics with high-throughput sequencing. CBCS was used to generate sequence libraries representing the high, middle, and low-copy fractions of the chicken genome. Sequencing high-copy DNA of chicken to about 2.7× coverage of its estimated sequence complexity led to the initial identification of several new repeat families, which were then used for a survey of the newly released first draft of the complete chicken genome. The analysis provided insight into the diversity and biology of known repeat structures such as CR1 and CNM, for which only limited sequence data had previously been available. Cot sequence data also resulted in the identification of four novel repeats (Birddawg, Hitchcock, Kronos, and Soprano), two new subfamilies of CR1 repeats, and many elements absent from the chicken genome assembly. Multiple autonomous elements were found for a novel Mariner-like transposon, Galluhop, in addition to nonautonomous deletion derivatives. Phylogenetic analysis of the high-copy repeats CR1, Galluhop, and Birddawg provided insight into two distinct genome dispersion strategies. This study also exemplifies the power of the CBCS method to create representative databases for the repetitive fractions of genomes for which only limited sequence data is available. PMID:15256510
Museum genomics: low-cost and high-accuracy genetic data from historical specimens.

PubMed

Rowe, Kevin C; Singhal, Sonal; Macmanes, Matthew D; Ayroles, Julien F; Morelli, Toni Lyn; Rubidge, Emily M; Bi, Ke; Moritz, Craig C

2011-11-01

Natural history collections are unparalleled repositories of geographical and temporal variation in faunal conditions. Molecular studies offer an opportunity to uncover much of this variation; however, genetic studies of historical museum specimens typically rely on extracting highly degraded and chemically modified DNA samples from skins, skulls or other dried samples. Despite this limitation, obtaining short fragments of DNA sequences using traditional PCR amplification of DNA has been the primary method for genetic study of historical specimens. Few laboratories have succeeded in obtaining genome-scale sequences from historical specimens and then only with considerable effort and cost. Here, we describe a low-cost approach using high-throughput next-generation sequencing to obtain reliable genome-scale sequence data from a traditionally preserved mammal skin and skull using a simple extraction protocol. We show that single-nucleotide polymorphisms (SNPs) from the genome sequences obtained independently from the skin and from the skull are highly repeatable compared to a reference genome. © 2011 Blackwell Publishing Ltd.
Effects of questions' repetition and variation on the efficiency of the guilty knowledge test: a reexamination.

PubMed

Ben-Shakhar, Gershon; Elaad, Eitan

2002-10-01

The effect of question repetition and variation on the efficiency of the Guilty Knowledge Test (GKT), based on electrodermal and respiration measures, was examined in a between-subjects experiment with 3 conditions. Each participant was presented with a sequence of 12 biographical questions. In Condition 1, a single question was repeated 12 times; in Condition 4, each of 4 different questions was repeated 3 times; and in Condition 12, 12 different questions were used. A monotonic relationship between the number of different questions used and detection efficiency was observed only with the electrodermal measure (the areas under the receiver operating characteristic curves, obtained with this measure in Conditions 1, 4, and 12 were .68, .81, and .99, respectively). These results demonstrate that a GKT based on multiple questions is superior to the use of many repetitions of a single or a few questions, and it can reach an almost perfect detection efficiency.
Genetic Variation and Association Mapping of Seed-Related Traits in Cultivated Peanut (Arachis hypogaea L.) Using Single-Locus Simple Sequence Repeat Markers.

PubMed

Zhao, Jiaojiao; Huang, Li; Ren, Xiaoping; Pandey, Manish K; Wu, Bei; Chen, Yuning; Zhou, Xiaojing; Chen, Weigang; Xia, Youlin; Li, Zeqing; Luo, Huaiyong; Lei, Yong; Varshney, Rajeev K; Liao, Boshou; Jiang, Huifang

2017-01-01

Cultivated peanut ( Arachis hypogaea L.) is an allotetraploid (AABB, 2 n = 4 x = 40), valued for its edible oil and digestible protein. Seed size and weight are important agronomical traits significantly influence the yield and nutritional composition of peanut. However, the genetic basis of seed-related traits remains ambiguous. Association mapping is a powerful approach for quickly and efficiently exploring the genetic basis of important traits in plants. In this study, a total of 104 peanut accessions were used to identify molecular markers associated with seed-related traits using 554 single-locus simple sequence repeat (SSR) markers. Most of the accessions had no or weak relationship in the peanut panel. The linkage disequilibrium (LD) decayed with the genetic distance of 1cM at the genome level and the LD of B subgenome decayed faster than that of the A subgenome. Large phenotypic variation was observed for four seed-related traits in the association panel. Using mixed linear model with population structure and kinship, a total of 30 significant SSR markers were detected to be associated with four seed-related traits ( P < 1.81 × 10 -3 ) in different environments, which explained 11.22-32.30% of the phenotypic variation for each trait. The marker AHGA44686 was simultaneously and repeatedly associated with seed length and hundred-seed weight in multiple environments with large phenotypic variance (26.23 ∼ 32.30%). The favorable alleles of associated markers for each seed-related trait and the optimal combination of favorable alleles of associated markers were identified to significantly enhance trait performance, revealing a potential of utilization of these associated markers in peanut breeding program.
ATP hydrolysis provides functions that promote rejection of pairings between different copies of long repeated sequences

PubMed Central

Danilowicz, Claudia; Hermans, Laura; Coljee, Vincent; Prévost, Chantal

2017-01-01

Abstract During DNA recombination and repair, RecA family proteins must promote rapid joining of homologous DNA. Repeated sequences with >100 base pair lengths occupy more than 1% of bacterial genomes; however, commitment to strand exchange was believed to occur after testing ∼20–30 bp. If that were true, pairings between different copies of long repeated sequences would usually become irreversible. Our experiments reveal that in the presence of ATP hydrolysis even 75 bp sequence-matched strand exchange products remain quite reversible. Experiments also indicate that when ATP hydrolysis is present, flanking heterologous dsDNA regions increase the reversibility of sequence matched strand exchange products with lengths up to ∼75 bp. Results of molecular dynamics simulations provide insight into how ATP hydrolysis destabilizes strand exchange products. These results inspired a model that shows how pairings between long repeated sequences could be efficiently rejected even though most homologous pairings form irreversible products. PMID:28854739
Massively parallel sequencing of forensic STRs and SNPs using the Illumina® ForenSeq™ DNA Signature Prep Kit on the MiSeq FGx™ Forensic Genomics System.

PubMed

Guo, Fei; Yu, Jiao; Zhang, Lu; Li, Jun

2017-11-01

The ForenSeq™ DNA Signature Prep Kit (ForenSeq Kit) is designed to detect more than 200 forensically relevant markers in a single reaction on the MiSeq FGx™ Forensic Genomics System (MiSeq FGx System), including Amelogenin, 27 autosomal short tandem repeats (A-STRs), 7 X chromosomal STRs (X-STRs), 24 Y chromosomal STRs (Y-STRs) and 94 identity-informative single nucleotide polymorphisms (iSNPs) with the option to contain 22 phenotypic-informative SNPs (pSNPs) and 56 ancestry-informative SNPs (aSNPs). In this study, we evaluated the MiSeq FGx System on three major parts: methodological optimization (DNA extraction, sample quantification, library normalization, diluted libraries concentration, and sample-to-cell arrangement), massively parallel sequencing (MPS) performance (depth of coverage, sequence coverage ratio, and allele coverage ratio), and ForenSeq Kit characteristics (repeatability and concordance, sensitivity, mixture, stability and case-type samples). Results showed that quantitative polymerase chain reaction (qPCR)-based sample quantification and library normalization and the appropriate number of pooled libraries and concentration of diluted libraries provided a greater level of MPS performance and repeatability. Repeatable and concordant genotypes were obtained by the ForenSeq Kit. Full profiles were obtained from ≥100pg input DNA for STRs and ≥200pg for SNPs. A sample with ≥5% minor contributors was considered as a mixture by imbalanced allele coverage ratio distribution, and full profiles from minor contributors were easily detected between 9:1 and 1:9 mixtures with known reference profiles. The ForenSeq Kit tolerated considerable concentrations of inhibitors like ≤200μM hematin and ≤50μg/ml humic acid, and >56% STR profiles and >88% SNP profiles were obtained from ≥200-bp degraded samples. Also, it was adapted to case-type samples. As a whole, the ForenSeq Kit is a well-performed, robust, reliable, reproducible and highly informative assay, and it can fully meet requirements for human identification. Further, sensitive QC indicator and automated sample comparison function in the ForenSeq™ Universal Analysis Software are quite helpful, so that we can concentrate on questionable genotypes and avoid tedious and time-consuming labor to maximum the time spent in data analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
The Influence of Primary and Secondary DNA Structure in Deletion and Duplication between Direct Repeats in Escherichia Coli

PubMed Central

Trinh, T. Q.; Sinden, R. R.

1993-01-01

We describe a system to measure the frequency of both deletions and duplications between direct repeats. Short 17- and 18-bp palindromic and nonpalindromic DNA sequences were cloned into the EcoRI site within the chloramphenicol acetyltransferase gene of plasmids pBR325 and pJT7. This creates an insert between direct repeated EcoRI sites and results in a chloramphenicol-sensitive phenotype. Selection for chloramphenicol resistance was utilized to select chloramphenicol resistant revertants that included those with precise deletion of the insert from plasmid pBR325 and duplication of the insert in plasmid pJT7. The frequency of deletion or duplication varied more than 500-fold depending on the sequence of the short sequence inserted into the EcoRI site. For the nonpalindromic inserts, multiple internal direct repeats and the length of the direct repeats appear to influence the frequency of deletion. Certain palindromic DNA sequences with the potential to form DNA hairpin structures that might stabilize the misalignment of direct repeats had a high frequency of deletion. Other DNA sequences with the potential to form structures that might destabilize misalignment of direct repeats had a very low frequency of deletion. Duplication mutations occurred at the highest frequency when the DNA between the direct repeats contained no direct or inverted repeats. The presence of inverted repeats dramatically reduced the frequency of duplications. The results support the slippage-misalignment model, suggesting that misalignment occurring during DNA replication leads to deletion and duplication mutations. The results also support the idea that the formation of DNA secondary structures during DNA replication can facilitate and direct specific mutagenic events. PMID:8325478

Visual ModuleOrganizer: a graphical interface for the detection and comparative analysis of repeat DNA modules

PubMed Central

2014-01-01

Background DNA repeats, such as transposable elements, minisatellites and palindromic sequences, are abundant in sequences and have been shown to have significant and functional roles in the evolution of the host genomes. In a previous study, we introduced the concept of a repeat DNA module, a flexible motif present in at least two occurences in the sequences. This concept was embedded into ModuleOrganizer, a tool allowing the detection of repeat modules in a set of sequences. However, its implementation remains difficult for larger sequences. Results Here we present Visual ModuleOrganizer, a Java graphical interface that enables a new and optimized version of the ModuleOrganizer tool. To implement this version, it was recoded in C++ with compressed suffix tree data structures. This leads to less memory usage (at least 120-fold decrease in average) and decreases by at least four the computation time during the module detection process in large sequences. Visual ModuleOrganizer interface allows users to easily choose ModuleOrganizer parameters and to graphically display the results. Moreover, Visual ModuleOrganizer dynamically handles graphical results through four main parameters: gene annotations, overlapping modules with known annotations, location of the module in a minimal number of sequences, and the minimal length of the modules. As a case study, the analysis of FoldBack4 sequences clearly demonstrated that our tools can be extended to comparative and evolutionary analyses of any repeat sequence elements in a set of genomic sequences. With the increasing number of sequences available in public databases, it is now possible to perform comparative analyses of repeated DNA modules in a graphic and friendly manner within a reasonable time period. Availability Visual ModuleOrganizer interface and the new version of the ModuleOrganizer tool are freely available at: http://lcb.cnrs-mrs.fr/spip.php?rubrique313. PMID:24678954
The complete chloroplast genome sequence of the medicinal plant Andrographis paniculata.

PubMed

Ding, Ping; Shao, Yanhua; Li, Qian; Gao, Junli; Zhang, Runjing; Lai, Xiaoping; Wang, Deqin; Zhang, Huiye

2016-07-01

The complete chloroplast genome of Andrographis paniculata, an important medicinal plant with great economic value, has been studied in this article. The genome size is 150,249 bp in length, with 38.3% GC content. A pair of inverted repeats (IRs, 25,300 bp) are separated by a large single copy region (LSC, 82,459 bp) and a small single-copy region (SSC, 17,190 bp). The chloroplast genome contains 114 unique genes, 80 protein-coding genes, 30 tRNA genes and 4 rRNA genes. In these genes, 15 genes contained 1 intron and 3 genes comprised of 2 introns.
Deterministic nonlinear phase gates induced by a single qubit

NASA Astrophysics Data System (ADS)

Park, Kimin; Marek, Petr; Filip, Radim

2018-05-01

We propose deterministic realizations of nonlinear phase gates by repeating a finite sequence of non-commuting Rabi interactions between a harmonic oscillator and only a single two-level ancillary qubit. We show explicitly that the key nonclassical features of the ideal cubic phase gate and the quartic phase gate are generated in the harmonic oscillator faithfully by our method. We numerically analyzed the performance of our scheme under realistic imperfections of the oscillator and the two-level system. The methodology is extended further to higher-order nonlinear phase gates. This theoretical proposal completes the set of operations required for continuous-variable quantum computation.
Phylogenomic relationship of feijoa (Acca sellowiana (O.Berg) Burret) with other Myrtaceae based on complete chloroplast genome sequences.

PubMed

Machado, Lilian de Oliveira; Vieira, Leila do Nascimento; Stefenon, Valdir Marcos; Oliveira Pedrosa, Fábio de; Souza, Emanuel Maltempi de; Guerra, Miguel Pedro; Nodari, Rubens Onofre

2017-04-01

Given their distribution, importance, and richness, Myrtaceae species comprise a model system for studying the evolution of tropical plant diversity. In addition, chloroplast (cp) genome sequencing is an efficient tool for phylogenetic relationship studies. Feijoa [Acca sellowiana (O. Berg) Burret; CN: pineapple-guava] is a Myrtaceae species that occurs naturally in southern Brazil and northern Uruguay. Feijoa is known for its exquisite perfume and flavorful fruits, pharmacological properties, ornamental value and increasing economic relevance. In the present work, we reported the complete cp genome of feijoa. The feijoa cp genome is a circular molecule of 159,370 bp with a quadripartite structure containing two single copy regions, a Large Single Copy region (LSC 88,028 bp) and a Small Single Copy region (SSC 18,598 bp) separated by Inverted Repeat regions (IRs 26,372 bp). The genome structure, gene order, GC content and codon usage are similar to those of typical angiosperm cp genomes. When compared to other cp genome sequences of Myrtaceae, feijoa showed closest relationship with pitanga (Eugenia uniflora L.). Furthermore, a comparison of pitanga synonymous (Ks) and nonsynonymous (Ka) substitution rates revealed extremely low values. Maximum Likelihood and Bayesian Inference analyses produced phylogenomic trees identical in topology. These trees supported monophyly of three Myrtoideae clades.
Genetic variation and evolutionary stability of the FMR1 CGG repeat in six closed human populations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Eichler, E.E.; Nelson, D.L.

1996-07-12

In an attempt to understand the allelic diversity and mutability of the human FMR1 CGG repeat, we have analyzed the AGG substructure of this locus within six genetically-closed populations (Mbuti pygmy, Baka pygmy, R. surui, Karitiana, Mayan, and Hutterite). Most alleles (61/92 or 66%) possessed two AGG interspersions occurring with a periodicity of one AGG every nine or ten CGG repeats, indicating that this pattern is highly conserved in all human populations. Significant differences in allele distribution were observed among the populations for rare variants possessing fewer or more AGG interruptions than the canonical FMR1 CGG repeat sequence. Comparisons ofmore » expected heterozygosity of the FMR1 CGG repeat locus with 30 other microsatellite loci, demonstrated remarkably similar levels of polymorphism within each population, suggesting that most FMR1 CGG repeat alleles mutate at rates indistinguishable from other microsatellite loci. A single allele (1 out of 92) was identified with a large uninterrupted tract of pure repeats (42 pure CGG triplets). Retrospective pedigree analysis indicated that this allele had been transmitted unstably. Although such alleles mutate rapidly and likely represent evolving premutations, our analysis suggests that in spite of the estimated frequency of their occurrence, these unstable alleles do not significantly alter the expected heterozygosity of the FMR1 CGG repeat in the human population. 45 refs., 1 fig., 2 tabs.« less
The Complete Chloroplast Genome of Wild Rice (Oryza minuta) and Its Comparison to Related Species.

PubMed

Asaf, Sajjad; Waqas, Muhammad; Khan, Abdul L; Khan, Muhammad A; Kang, Sang-Mo; Imran, Qari M; Shahzad, Raheem; Bilal, Saqib; Yun, Byung-Wook; Lee, In-Jung

2017-01-01

Oryza minuta , a tetraploid wild relative of cultivated rice (family Poaceae), possesses a BBCC genome and contains genes that confer resistance to bacterial blight (BB) and white-backed (WBPH) and brown (BPH) plant hoppers. Based on the importance of this wild species, this study aimed to understand the phylogenetic relationships of O. minuta with other Oryza species through an in-depth analysis of the composition and diversity of the chloroplast (cp) genome. The analysis revealed a cp genome size of 135,094 bp with a typical quadripartite structure and consisting of a pair of inverted repeats separated by small and large single copies, 139 representative genes, and 419 randomly distributed microsatellites. The genomic organization, gene order, GC content and codon usage are similar to those of typical angiosperm cp genomes. Approximately 30 forward, 28 tandem and 20 palindromic repeats were detected in the O . minuta cp genome. Comparison of the complete O. minuta cp genome with another eleven Oryza species showed a high degree of sequence similarity and relatively high divergence of intergenic spacers. Phylogenetic analyses were conducted based on the complete genome sequence, 65 shared genes and matK gene showed same topologies and O. minuta forms a single clade with parental O. punctata . Thus, the complete O . minuta cp genome provides interesting insights and valuable information that can be used to identify related species and reconstruct its phylogeny.
Long interspersed repeated DNA (LINE) causes polymorphism at the rat insulin 1 locus.

PubMed

Lakshmikumaran, M S; D'Ambrosio, E; Laimins, L A; Lin, D T; Furano, A V

1985-09-01

The insulin 1, but not the insulin 2, locus is polymorphic (i.e., exhibits allelic variation) in rats. Restriction enzyme analysis and hybridization studies showed that the polymorphic region is 2.2 kilobases upstream of the insulin 1 coding region and is due to the presence or absence of an approximately 2.7-kilobase repeated DNA element. DNA sequence determination showed that this DNA element is a member of a long interspersed repeated DNA family (LINE) that is highly repeated (greater than 50,000 copies) and highly transcribed in the rat. Although the presence or absence of LINE sequences at the insulin 1 locus occurs in both the homozygous and heterozygous states, LINE-containing insulin 1 alleles are more prevalent in the rat population than are alleles without LINEs. Restriction enzyme analysis of the LINE-containing alleles indicated that at least two versions of the LINE sequence may be present at the insulin 1 locus in different rats. Either repeated transposition of LINE sequences or gene conversion between the resident insulin 1 LINE and other sequences in the genome are possible explanations for this.
C9orf72 hexanucleotide repeat expansion and Guam amyotrophic lateral sclerosis-Parkinsonism-dementia complex.

PubMed

Dombroski, Beth A; Galasko, Douglas R; Mata, Ignacio F; Zabetian, Cyrus P; Craig, Ulla-Katrina; Garruto, Ralph M; Oyanagi, Kiyomitsu; Schellenberg, Gerard D

2013-06-01

High-prevalence foci of amyotrophic lateral sclerosis (ALS) and parkinsonism-dementia complex (PDC) exist in Japanese on the Kii Peninsula of Japan and in the Chamorros of Guam. Clinical and neuropathologic similarities suggest that the disease in these 2 populations may be related. Recent findings showed that some of the Kii Peninsula ALS cases had pathogenic C9orf72 repeat expansions, a genotype that causes ALS in Western populations. To perform genotyping among Guam residents to determine if the C9orf72 expanded repeat allele contributes to ALS-PDC in this population and to evaluate LRRK2 for mutations in the same population. Case-control series from neurodegenerative disease research programs on Guam that screened residents for ALS, PDC, and dementia. Study participants included 24 with ALS and 22 with PDC and 43 older control subjects with normal cognition ascertained between 1956 and 2006. All but one participant were Chamorro, the indigenous people of Guam. A single individual of white race/ethnicity with ALS was ascertained on Guam during the study. Participants were screened for C9orf72 hexanucleotide repeat length. Participants with repeat numbers in great excess of 30 were considered to have pathogenic repeat expansions. LRRK2 was screened for point mutations by DNA sequencing. We found a single individual with an expanded pathogenic hexanucleotide repeat. This individual of white race/ethnicity with ALS was living on Guam at the time of ascertainment but had been born in the United States. All Chamorro participants with ALS and PDC and control subjects had normal repeats, ranging from 2 to 17 copies. No pathogenic LRRK2 mutations were found. Unlike participants with ALS from the Kii Peninsula, C9orf72 expansions do not cause ALS-PDC in Chamorros. Likewise, LRRK2 mutations do not cause Guam ALS-PDC.
Genome-wide characterization and selection of expressed sequence tag simple sequence repeat primers for optimized marker distribution and reliability in peach

USDA-ARS?s Scientific Manuscript database

Expressed sequence tag (EST) simple sequence repeats (SSRs) in Prunus were mined, and flanking primers designed and used for genome-wide characterization and selection of primers to optimize marker distribution and reliability. A total of 12,618 contigs were assembled from 84,727 ESTs, along with 34...
Microsatellite analysis in the genome of Acanthaceae: An in silico approach.

PubMed

Kaliswamy, Priyadharsini; Vellingiri, Srividhya; Nathan, Bharathi; Selvaraj, Saravanakumar

2015-01-01

Acanthaceae is one of the advanced and specialized families with conventionally used medicinal plants. Simple sequence repeats (SSRs) play a major role as molecular markers for genome analysis and plant breeding. The microsatellites existing in the complete genome sequences would help to attain a direct role in the genome organization, recombination, gene regulation, quantitative genetic variation, and evolution of genes. The current study reports the frequency of microsatellites and appropriate markers for the Acanthaceae family genome sequences. The whole nucleotide sequences of Acanthaceae species were obtained from National Center for Biotechnology Information database and screened for the presence of SSRs. SSR Locator tool was used to predict the microsatellites and inbuilt Primer3 module was used for primer designing. Totally 110 repeats from 108 sequences of Acanthaceae family plant genomes were identified, and the occurrence of dinucleotide repeats was found to be abundant in the genome sequences. The essential amino acid isoleucine was found rich in all the sequences. We also designed the SSR-based primers/markers for 59 sequences of this family that contains microsatellite repeats in their genome. The identified microsatellites and primers might be useful for breeding and genetic studies of plants that belong to Acanthaceae family in the future.
Complete mitochondrial genome of the whiter-spotted flower chafer, Protaetia brevitarsis (Coleoptera: Scarabaeidae).

PubMed

Kim, Min Jee; Im, Hyun Hwak; Lee, Kwang Youll; Han, Yeon Soo; Kim, Iksoo

2014-06-01

Abstract The complete nucleotide sequences of the mitochondrial genome from the whiter-spotted flower chafer, Protaetia brevitarsis (Coleoptera: Scarabaeidae), was determined. The 20,319-bp long circular genome is the longest among completely sequenced Coleoptera. As is typical in animals, the P. brevitarsis genome consisted of two ribosomal RNAs, 22 transfer RNAs, 13 protein-coding genes and one A + T-rich region. Although the size of the coding genes was typical, the non-coding A + T-rich region was 5654 bp, which is the longest in insects. The extraordinary length of this region was composed of 28,117-bp tandem repeats and 782-bp tandem repeats. These repeat sequences were encompassed by three non-repeat sequences constituting 1804 bp.
Improved maize reference genome with single-molecule technologies.

PubMed

Jiao, Yinping; Peluso, Paul; Shi, Jinghua; Liang, Tiffany; Stitzer, Michelle C; Wang, Bo; Campbell, Michael S; Stein, Joshua C; Wei, Xuehong; Chin, Chen-Shan; Guill, Katherine; Regulski, Michael; Kumari, Sunita; Olson, Andrew; Gent, Jonathan; Schneider, Kevin L; Wolfgruber, Thomas K; May, Michael R; Springer, Nathan M; Antoniou, Eric; McCombie, W Richard; Presting, Gernot G; McMullen, Michael; Ross-Ibarra, Jeffrey; Dawe, R Kelly; Hastie, Alex; Rank, David R; Ware, Doreen

2017-06-22

Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.
Massively parallel sequencing of forensic STRs: Considerations of the DNA commission of the International Society for Forensic Genetics (ISFG) on minimal nomenclature requirements.

PubMed

Parson, Walther; Ballard, David; Budowle, Bruce; Butler, John M; Gettings, Katherine B; Gill, Peter; Gusmão, Leonor; Hares, Douglas R; Irwin, Jodi A; King, Jonathan L; Knijff, Peter de; Morling, Niels; Prinz, Mechthild; Schneider, Peter M; Neste, Christophe Van; Willuweit, Sascha; Phillips, Christopher

2016-05-01

The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data that provide a precise description of the repeat allele structure of a STR marker and variants that may reside in the flanking areas of the repeat region. When a STR contains a complex arrangement of repeat motifs, the level of genetic polymorphism revealed by the sequence data can increase substantially. As repeat structures can be complex and include substitutions, insertions, deletions, variable tandem repeat arrangements of multiple nucleotide motifs, and flanking region SNPs, established capillary electrophoresis (CE) allele descriptions must be supplemented by a new system of STR allele nomenclature, which retains backward compatibility with the CE data that currently populate national DNA databases and that will continue to be produced for the coming years. Thus, there is a pressing need to produce a standardized framework for describing complex sequences that enable comparison with currently used repeat allele nomenclature derived from conventional CE systems. It is important to discern three levels of information in hierarchical order (i) the sequence, (ii) the alignment, and (iii) the nomenclature of STR sequence data. We propose a sequence (text) string format the minimal requirement of data storage that laboratories should follow when adopting MPS of STRs. We further discuss the variant annotation and sequence comparison framework necessary to maintain compatibility among established and future data. This system must be easy to use and interpret by the DNA specialist, based on a universally accessible genome assembly, and in place before the uptake of MPS by the general forensic community starts to generate sequence data on a large scale. While the established nomenclature for CE-based STR analysis will remain unchanged in the future, the nomenclature of sequence-based STR genotypes will need to follow updated rules and be generated by expert systems that translate MPS sequences to match CE conventions in order to guarantee compatibility between the different generations of STR data. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
SSRscanner: a program for reporting distribution and exact location of simple sequence repeats

PubMed Central

Anwar, Tamanna; Khan, Asad U

2006-01-01

Simple sequence repeats (SSRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker assisted selection of crop plants and a range of molecular ecology and diversity studies. These repeated DNA sequences are found in both prokaryotes and eukaryotes. They are distributed almost at random throughout the genome, ranging from mononucleotide to trinucleotide repeats. They are also found at longer lengths (> 6 repeating units) of tracts. Most of the computer programs that find SSRs do not report its exact position. A computer program SSRscanner was written to find out distribution, frequency and exact location of each SSR in the genome. SSRscanner is user friendly. It can search repeats of any length and produce outputs with their exact position on chromosome and their frequency of occurrence in the sequence. Availability This program has been written in PERL and is freely available for non-commercial users by request from the authors. Please contact the authors by E-mail: huzzi99@hotmail.com PMID:17597863
Structure and stability of the ankyrin domain of the Drosophila Notch receptor.

PubMed

Zweifel, Mark E; Leahy, Daniel J; Hughson, Frederick M; Barrick, Doug

2003-11-01

The Notch receptor contains a conserved ankyrin repeat domain that is required for Notch-mediated signal transduction. The ankyrin domain of Drosophila Notch contains six ankyrin sequence repeats previously identified as closely matching the ankyrin repeat consensus sequence, and a putative seventh C-terminal sequence repeat that exhibits lower similarity to the consensus sequence. To better understand the role of the Notch ankyrin domain in Notch-mediated signaling and to examine how structure is distributed among the seven ankyrin sequence repeats, we have determined the crystal structure of this domain to 2.0 angstroms resolution. The seventh, C-terminal, ankyrin sequence repeat adopts a regular ankyrin fold, but the first, N-terminal ankyrin repeat, which contains a 15-residue insertion, appears to be largely disordered. The structure reveals a substantial interface between ankyrin polypeptides, showing a high degree of shape and charge complementarity, which may be related to homotypic interactions suggested from indirect studies. However, the Notch ankyrin domain remains largely monomeric in solution, demonstrating that this interface alone is not sufficient to promote tight association. Using the structure, we have classified reported mutations within the Notch ankyrin domain that are known to disrupt signaling into those that affect buried residues and those restricted to surface residues. We show that the buried substitutions greatly decrease protein stability, whereas the surface substitutions have only a marginal affect on stability. The surface substitutions are thus likely to interfere with Notch signaling by disrupting specific Notch-effector interactions and map the sites of these interactions.
Molecular characterization and in situ mRNA localization of the neural recognition molecule J1-160/180: a modular structure similar to tenascin

PubMed Central

1993-01-01

The oligodendrocyte-derived extracellular matrix glycoprotein J1- 160/180 is a recognition molecule expressed exclusively in the central nervous system. J1-160/180 has been shown to be adhesive for astrocytes and repellent towards neurons and growth cones. We report here the complete nucleotide sequence of J1-160/180 in the rat. The predicted amino acid sequence showed a structural architecture very similar to tenascin: a cysteine-rich amino terminal region is followed by 4.5 epidermal growth factor-like repeats, 9 fibronectin type III homologous repeats and a domain homologous to fibrinogen. Sequence comparison analysis revealed highest homology of rat J1-160/180 to mouse tenascin and chicken restrictin with a similarity of 66% and 85%, respectively. The J1-160/180-coding mRNA is derived from a single copy gene. Using the polymerase chain reaction we could show that two J1-160/180 isoforms are generated by alternative splicing of the sixth fibronectin type III homologous repeat. Localization of J1-160/180 mRNA by in situ hybridization in the cerebellum, hippocampus and olfactory bulb confirmed the expression of J1-160/180 by oligodendrocytes with a peak of transcription at 7-14 d after birth, indicating a functional role during myelination. In addition, J1-160/180-specific RNA was found in a small subset of neurons in all three structures of the CNS analyzed. These neurons continue to express J1-160/180 in the adult. PMID:7679676
Integration of Lupinus angustifolius L. (narrow-leafed lupin) genome maps and comparative mapping within legumes.

PubMed

Wyrwa, Katarzyna; Książkiewicz, Michał; Szczepaniak, Anna; Susek, Karolina; Podkowiński, Jan; Naganowska, Barbara

2016-09-01

Narrow-leafed lupin (Lupinus angustifolius L.) has recently been considered a reference genome for the Lupinus genus. In the present work, genetic and cytogenetic maps of L. angustifolius were supplemented with 30 new molecular markers representing lupin genome regions, harboring genes involved in nitrogen fixation during the symbiotic interaction of legumes and soil bacteria (Rhizobiaceae). Our studies resulted in the precise localization of bacterial artificial chromosomes (BACs) carrying sequence variants for early nodulin 40, nodulin 26, nodulin 45, aspartate aminotransferase P2, asparagine synthetase, cytosolic glutamine synthetase, and phosphoenolpyruvate carboxylase. Together with previously mapped chromosomes, the integrated L. angustifolius map encompasses 73 chromosome markers, including 5S ribosomal DNA (rDNA) and 45S rDNA, and anchors 20 L. angustifolius linkage groups to corresponding chromosomes. Chromosomal identification using BAC fluorescence in situ hybridization identified two BAC clones as narrow-leafed lupin centromere-specific markers, which served as templates for preliminary studies of centromere composition within the genus. Bioinformatic analysis of these two BACs revealed that centromeric/pericentromeric regions of narrow-leafed lupin chromosomes consisted of simple sequence repeats ordered into tandem repeats containing the trinucleotide and pentanucleotide simple sequence repeats AGG and GATAC, structured into long arrays. Moreover, cross-genus microsynteny analysis revealed syntenic patterns of 31 single-locus BAC clones among several legume species. The gene and chromosome level findings provide evidence of ancient duplication events that must have occurred very early in the divergence of papilionoid lineages. This work provides a strong foundation for future comparative mapping among legumes and may facilitate understanding of mechanisms involved in shaping legume chromosomes.
A direct repeat of E-box-like elements is required for cell-autonomous circadian rhythm of clock genes

PubMed Central

Nakahata, Yasukazu; Yoshida, Mayumi; Takano, Atsuko; Soma, Haruhiko; Yamamoto, Takuro; Yasuda, Akio; Nakatsu, Toru; Takumi, Toru

2008-01-01

Background The circadian expression of the mammalian clock genes is based on transcriptional feedback loops. Two basic helix-loop-helix (bHLH) PAS (for Period-Arnt-Sim) domain-containing transcriptional activators, CLOCK and BMAL1, are known to regulate gene expression by interacting with a promoter element termed the E-box (CACGTG). The non-canonical E-boxes or E-box-like sequences have also been reported to be necessary for circadian oscillation. Results We report a new cis-element required for cell-autonomous circadian transcription of clock genes. This new element consists of a canonical E-box or a non-canonical E-box and an E-box-like sequence in tandem with the latter with a short interval, 6 base pairs, between them. We demonstrate that both E-box or E-box-like sequences are needed to generate cell-autonomous oscillation. We also verify that the spacing nucleotides with constant length between these 2 E-elements are crucial for robust oscillation. Furthermore, by in silico analysis we conclude that several clock and clock-controlled genes possess a direct repeat of the E-box-like elements in their promoter region. Conclusion We propose a novel possible mechanism regulated by double E-box-like elements, not to a single E-box, for circadian transcriptional oscillation. The direct repeat of the E-box-like elements identified in this study is the minimal required element for the generation of cell-autonomous transcriptional oscillation of clock and clock-controlled genes. PMID:18177499
Whole-genome multiple displacement amplification from single cells.

PubMed

Spits, Claudia; Le Caignec, Cédric; De Rycke, Martine; Van Haute, Lindsey; Van Steirteghem, André; Liebaers, Inge; Sermon, Karen

2006-01-01

Multiple displacement amplification (MDA) is a recently described method of whole-genome amplification (WGA) that has proven efficient in the amplification of small amounts of DNA, including DNA from single cells. Compared with PCR-based WGA methods, MDA generates DNA with a higher molecular weight and shows better genome coverage. This protocol was developed for preimplantation genetic diagnosis, and details a method for performing single-cell MDA using the phi29 DNA polymerase. It can also be useful for the amplification of other minute quantities of DNA, such as from forensic material or microdissected tissue. The protocol includes the collection and lysis of single cells, and all materials and steps involved in the MDA reaction. The whole procedure takes 3 h and generates 1-2 microg of DNA from a single cell, which is suitable for multiple downstream applications, such as sequencing, short tandem repeat analysis or array comparative genomic hybridization.
An atypical CNG channel activated by a single cGMP molecule controls sperm chemotaxis.

PubMed

Bönigk, Wolfgang; Loogen, Astrid; Seifert, Reinhard; Kashikar, Nachiket; Klemm, Clementine; Krause, Eberhard; Hagen, Volker; Kremmer, Elisabeth; Strünker, Timo; Kaupp, U Benjamin

2009-10-27

Sperm of the sea urchin Arbacia punctulata can respond to a single molecule of chemoattractant released by an egg. The mechanism underlying this extreme sensitivity is unknown. Crucial signaling events in the response of A. punctulata sperm to chemoattractant include the rapid synthesis of the intracellular messenger guanosine 3',5'-monophosphate (cGMP) and the ensuing membrane hyperpolarization that results from the opening of potassium-selective cyclic nucleotide-gated (CNGK) channels. Here, we use calibrated photolysis of caged cGMP to show that approximately 45 cGMP molecules are generated during the response to a single molecule of chemoattractant. The CNGK channel can respond to such small cGMP changes because it is exquisitely sensitive to cGMP and activated in a noncooperative fashion. Like voltage-activated Ca(v) and Na(v) channels, the CNGK polypeptide consists of four homologous repeat sequences. Disabling each of the four cyclic nucleotide-binding sites through mutagenesis revealed that binding of a single cGMP molecule to repeat 3 is necessary and sufficient to activate the CNGK channel. Thus, CNGK has developed a mechanism of activation that is different from the activation of other CNG channels, which requires the cooperative binding of several ligands and operates in the micromolar rather than the nanomolar range.

Transcriptome sequencing and marker development in winged bean (Psophocarpus tetragonolobus; Leguminosae)

PubMed Central

Vatanparast, Mohammad; Shetty, Prateek; Chopra, Ratan; Doyle, Jeff J.; Sathyanarayana, N.; Egan, Ashley N.

2016-01-01

Winged bean, Psophocarpus tetragonolobus (L.) DC., is similar to soybean in yield and nutritional value but more viable in tropical conditions. Here, we strengthen genetic resources for this orphan crop by producing a de novo transcriptome assembly and annotation of two Sri Lankan accessions (denoted herein as CPP34 [PI 491423] and CPP37 [PI 639033]), developing simple sequence repeat (SSR) markers, and identifying single nucleotide polymorphisms (SNPs) between geographically separated genotypes. A combined assembly based on 804,757 reads from two accessions produced 16,115 contigs with an N50 of 889 bp, over 90% of which has significant sequence similarity to other legumes. Combining contigs with singletons produced 97,241 transcripts. We identified 12,956 SSRs, including 2,594 repeats for which primers were designed and 5,190 high-confidence SNPs between Sri Lankan and Nigerian genotypes. The transcriptomic data sets generated here provide new resources for gene discovery and marker development in this orphan crop, and will be vital for future plant breeding efforts. We also analyzed the soybean trypsin inhibitor (STI) gene family, important plant defense genes, in the context of related legumes and found evidence for radiation of the Kunitz trypsin inhibitor (KTI) gene family within winged bean. PMID:27356763
Unprecedented large inverted repeats at the replication terminus of circular bacterial chromosomes suggest a novel mode of chromosome rescue

PubMed Central

El Kafsi, Hela; Loux, Valentin; Mariadassou, Mahendra; Blin, Camille; Chiapello, Hélène; Abraham, Anne-Laure; Maguin, Emmanuelle; van de Guchte, Maarten

2017-01-01

The first Lactobacillus delbrueckii ssp. bulgaricus genome sequence revealed the presence of a very large inverted repeat (IR), a DNA sequence arrangement which thus far seemed inconceivable in a non-manipulated circular bacterial chromosome, at the replication terminus. This intriguing observation prompted us to investigate if similar IRs could be found in other bacteria. IRs with sizes varying from 38 to 76 kbp were found at the replication terminus of all 5 L. delbrueckii ssp. bulgaricus chromosomes analysed, but in none of 1373 other chromosomes. They represent the first naturally occurring very large IRs detected in circular bacterial genomes. A comparison of the L. bulgaricus replication terminus regions and the corresponding regions without IR in 5 L. delbrueckii ssp. lactis genomes leads us to propose a model for the formation and evolution of the IRs. The DNA sequence data are consistent with a novel model of chromosome rescue after premature replication termination or irreversible chromosome damage near the replication terminus, involving mechanisms analogous to those proposed in the formation of very large IRs in human cancer cells. We postulate that the L. delbrueckii ssp. bulgaricus-specific IRs in different strains derive from a single ancestral IR of at least 93 kbp. PMID:28281695
Repetition as the essence of life on this earth: music and genes.

PubMed

Ohno, S

1987-01-01

In prebiotic nucleic acid replication, templates appear to have been in short supply. A single round of tandem duplication of existing oligomers assured progressive extension of templates to the length adequate for encoding of polypeptide chains. Thus, the first set of coding sequences had to be repeats of base oligomers encoding polypeptide chains of various periodicities. On one hand, the readiness of these periodical polypeptide chains to assume alpha-helical and/or beta-sheet secondary structures contributed to the extremely rapid initial functional diversification of these polypeptide chains. It would be recalled that most, if not all, of the sugar-metabolizing enzymes had already achieved the inviolable functional competence before the division of prokaryotes from eukaryotes. On the other hand, a certain (dipeptidic?) of the peptidic periodicities was apparently chosen as the timekeeping unit by the biological clock. Musical compositions too apparently evolved originally as a timekeeping device. Accordingly, repetitiousness is evident in all musical compositions. Evolution of musical compositions from the early Baroque to the late Romantic parallels that of coding sequences from rather exact repeats of base oligomers to more complex modern coding sequences in which repetitious elements are less conspicuous and more varied. Inasmuch as the earth is governed by the hierarchy of periodicities (days, months and years), such reliance on periodicities is rather expected.
Structure, organization, and sequence of alpha satellite DNA from human chromosome 17: evidence for evolution by unequal crossing-over and an ancestral pentamer repeat shared with the human X chromosome.

PubMed

Waye, J S; Willard, H F

1986-09-01

The centromeric regions of all human chromosomes are characterized by distinct subsets of a diverse tandemly repeated DNA family, alpha satellite. On human chromosome 17, the predominant form of alpha satellite is a 2.7-kilobase-pair higher-order repeat unit consisting of 16 alphoid monomers. We present the complete nucleotide sequence of the 16-monomer repeat, which is present in 500 to 1,000 copies per chromosome 17, as well as that of a less abundant 15-monomer repeat, also from chromosome 17. These repeat units were approximately 98% identical in sequence, differing by the exclusion of precisely 1 monomer from the 15-monomer repeat. Homologous unequal crossing-over is suggested as a probable mechanism by which the different repeat lengths on chromosome 17 were generated, and the putative site of such a recombination event is identified. The monomer organization of the chromosome 17 higher-order repeat unit is based, in part, on tandemly repeated pentamers. A similar pentameric suborganization has been previously demonstrated for alpha satellite of the human X chromosome. Despite the organizational similarities, substantial sequence divergence distinguishes these subsets. Hybridization experiments indicate that the chromosome 17 and X subsets are more similar to each other than to the subsets found on several other human chromosomes. We suggest that the chromosome 17 and X alpha satellite subsets may be related components of a larger alphoid subfamily which have evolved from a common ancestral repeat into the contemporary chromosome-specific subsets.
Design, production and molecular structure of a new family of artificial alpha-helicoidal repeat proteins (αRep) based on thermostable HEAT-like repeats.

PubMed

Urvoas, Agathe; Guellouz, Asma; Valerio-Lepiniec, Marie; Graille, Marc; Durand, Dominique; Desravines, Danielle C; van Tilbeurgh, Herman; Desmadril, Michel; Minard, Philippe

2010-11-26

Repeat proteins have a modular organization and a regular architecture that make them attractive models for design and directed evolution experiments. HEAT repeat proteins, although very common, have not been used as a scaffold for artificial proteins, probably because they are made of long and irregular repeats. Here, we present and validate a consensus sequence for artificial HEAT repeat proteins. The sequence was defined from the structure-based sequence analysis of a thermostable HEAT-like repeat protein. Appropriate sequences were identified for the N- and C-caps. A library of genes coding for artificial proteins based on this sequence design, named αRep, was assembled using new and versatile methodology based on circular amplification. Proteins picked randomly from this library are expressed as soluble proteins. The biophysical properties of proteins with different numbers of repeats and different combinations of side chains in hypervariable positions were characterized. Circular dichroism and differential scanning calorimetry experiments showed that all these proteins are folded cooperatively and are very stable (T(m) >70 °C). Stability of these proteins increases with the number of repeats. Detailed gel filtration and small-angle X-ray scattering studies showed that the purified proteins form either monomers or dimers. The X-ray structure of a stable dimeric variant structure was solved. The protein is folded with a highly regular topology and the repeat structure is organized, as expected, as pairs of alpha helices. In this protein variant, the dimerization interface results directly from the variable surface enriched in aromatic residues located in the randomized positions of the repeats. The dimer was crystallized both in an apo and in a PEG-bound form, revealing a very well defined binding crevice and some structure flexibility at the interface. This fortuitous binding site could later prove to be a useful binding site for other low molecular mass partners. Copyright © 2010 Elsevier Ltd. All rights reserved.
Nucleic acid detection methods

DOEpatents

Smith, C.L.; Yaar, R.; Szafranski, P.; Cantor, C.R.

1998-05-19

The invention relates to methods for rapidly determining the sequence and/or length a target sequence. The target sequence may be a series of known or unknown repeat sequences which are hybridized to an array of probes. The hybridized array is digested with a single-strand nuclease and free 3{prime}-hydroxyl groups extended with a nucleic acid polymerase. Nuclease cleaved heteroduplexes can be easily distinguish from nuclease uncleaved heteroduplexes by differential labeling. Probes and target can be differentially labeled with detectable labels. Matched target can be detected by cleaving resulting loops from the hybridized target and creating free 3-hydroxyl groups. These groups are recognized and extended by polymerases added into the reaction system which also adds or releases one label into solution. Analysis of the resulting products using either solid phase or solution. These methods can be used to detect characteristic nucleic acid sequences, to determine target sequence and to screen for genetic defects and disorders. Assays can be conducted on solid surfaces allowing for multiple reactions to be conducted in parallel and, if desired, automated. 18 figs.
TRedD—A database for tandem repeats over the edit distance

PubMed Central

Sokol, Dina; Atagun, Firat

2010-01-01

A ‘tandem repeat’ in DNA is a sequence of two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats are common in the genomes of both eukaryotic and prokaryotic organisms. They are significant markers for human identity testing, disease diagnosis, sequence homology and population studies. In this article, we describe a new database, TRedD, which contains the tandem repeats found in the human genome. The database is publicly available online, and the software for locating the repeats is also freely available. The definition of tandem repeats used by TRedD is a new and innovative definition based upon the concept of ‘evolutive tandem repeats’. In addition, we have developed a tool, called TandemGraph, to graphically depict the repeats occurring in a sequence. This tool can be coupled with any repeat finding software, and it should greatly facilitate analysis of results. Database URL: http://tandem.sci.brooklyn.cuny.edu/ PMID:20624712
Molecular Architecture of Full-length TRF1 Favors Its Interaction with DNA.

PubMed

Boskovic, Jasminka; Martinez-Gago, Jaime; Mendez-Pertuz, Marinela; Buscato, Alberto; Martinez-Torrecuadrada, Jorge Luis; Blasco, Maria A

2016-10-07

Telomeres are specific DNA-protein structures found at both ends of eukaryotic chromosomes that protect the genome from degradation and from being recognized as double-stranded breaks. In vertebrates, telomeres are composed of tandem repeats of the TTAGGG sequence that are bound by a six-subunit complex called shelterin. Molecular mechanisms of telomere functions remain unknown in large part due to lack of structural data on shelterins, shelterin complex, and its interaction with the telomeric DNA repeats. TRF1 is one of the best studied shelterin components; however, the molecular architecture of the full-length protein remains unknown. We have used single-particle electron microscopy to elucidate the structure of TRF1 and its interaction with telomeric DNA sequence. Our results demonstrate that full-length TRF1 presents a molecular architecture that assists its interaction with telometic DNA and at the same time makes TRFH domains accessible to other TRF1 binding partners. Furthermore, our studies suggest hypothetical models on how other proteins as TIN2 and tankyrase contribute to regulate TRF1 function. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Molecular Architecture of Full-length TRF1 Favors Its Interaction with DNA*

PubMed Central

Boskovic, Jasminka; Martinez-Gago, Jaime; Mendez-Pertuz, Marinela; Buscato, Alberto; Martinez-Torrecuadrada, Jorge Luis; Blasco, Maria A.

2016-01-01

Telomeres are specific DNA-protein structures found at both ends of eukaryotic chromosomes that protect the genome from degradation and from being recognized as double-stranded breaks. In vertebrates, telomeres are composed of tandem repeats of the TTAGGG sequence that are bound by a six-subunit complex called shelterin. Molecular mechanisms of telomere functions remain unknown in large part due to lack of structural data on shelterins, shelterin complex, and its interaction with the telomeric DNA repeats. TRF1 is one of the best studied shelterin components; however, the molecular architecture of the full-length protein remains unknown. We have used single-particle electron microscopy to elucidate the structure of TRF1 and its interaction with telomeric DNA sequence. Our results demonstrate that full-length TRF1 presents a molecular architecture that assists its interaction with telometic DNA and at the same time makes TRFH domains accessible to other TRF1 binding partners. Furthermore, our studies suggest hypothetical models on how other proteins as TIN2 and tankyrase contribute to regulate TRF1 function. PMID:27563064
IN SITU DEMONSTRATION OF DNA HYBRIDIZING WITH CHROMOSOMAL AND NUCLEAR SAP RNA IN CHIRONOMUS TENTANS

PubMed Central

Lambert, B.; Wieslander, L.; Daneholt, B.; Egyházi, E.; Ringborg, U.

1972-01-01

Cytological hybridization combined with microdissection of Chironomus tentans salivary gland cells was used to locate DNA complementary to newly synthesized RNA from chromosomes and nuclear sap and from a single chromosomal puff, the Balbiani ring 2 (BR 2). Salivary glands were incubated with tritiated nucleosides. The labeled RNA was extracted from microdissected nuclei and hybridized to denatured squash preparations of salivary gland cells under conditions which primarily allow repeated sequences to interact. The bound RNA, resistant to ribonuclease treatment, was detected radioautographically. It was found that BR 2 RNA hybridizes specifically with the BR 2 region of chromosome IV. Nuclear sap RNA was fractionated into high and low molecular-weight RNA; the former hybridizes with the BR 2 region of chromosome IV, the latter in a diffuse distribution over the whole chromosome set. RNA from chromosome I hybridizes diffusely with all chromosomes. Nucleolar RNA hybridizes specifically with the nucleolar organizers, contained in chromosomes II and III. It is concluded that the BR 2 region of chromosome IV contains repeated DNA sequences and that nuclear sap contains BR 2 RNA. PMID:5025107
Comparative Analysis of the Complete Chloroplast Genome of Four Endangered Herbals of Notopterygium

PubMed Central

Yang, Jiao; Yue, Ming; Niu, Chuan; Ma, Xiong-Feng; Li, Zhong-Hu

2017-01-01

Notopterygium H. de Boissieu (Apiaceae) is an endangered perennial herb endemic to China. A good knowledge of phylogenetic evolution and population genomics is conducive to the establishment of effective management and conservation strategies of the genus Notopterygium. In this study, the complete chloroplast (cp) genomes of four Notopterygium species (N. incisum C. C. Ting ex H. T. Chang, N. oviforme R. H. Shan, N. franchetii H. de Boissieu and N. forrestii H. Wolff) were assembled and characterized using next-generation sequencing. We investigated the gene organization, order, size and repeat sequences of the cp genome and constructed the phylogenetic relationships of Notopterygium species based on the chloroplast DNA and nuclear internal transcribed spacer (ITS) sequences. Comparative analysis of plastid genome showed that the cp DNA are the standard double-stranded molecule, ranging from 157,462 bp (N. oviforme) to 159,607 bp (N. forrestii) in length. The circular DNA each contained a large single-copy (LSC) region, a small single-copy (SSC) region, and a pair of inverted repeats (IRs). The cp DNA of four species contained 85 protein-coding genes, 37 transfer RNA (tRNA) genes and 8 ribosomal RNA (rRNA) genes, respectively. We determined the marked conservation of gene content and sequence evolutionary rate in the cp genome of four Notopterygium species. Three genes (psaI, psbI and rpoA) were possibly under positive selection among the four sampled species. Phylogenetic analysis showed that four Notopterygium species formed a monophyletic clade with high bootstrap support. However, the inconsistent interspecific relationships with the genus Notopterygium were identified between the cp DNA and ITS markers. The incomplete lineage sorting, convergence evolution or hybridization, gene infiltration and different sampling strategies among species may have caused the incongruence between the nuclear and cp DNA relationships. The present results suggested that Notopterygium species may have experienced a complex evolutionary history and speciation process. PMID:28422071
ST proteins, a new family of plant tandem repeat proteins with a DUF2775 domain mainly found in Fabaceae and Asteraceae.

PubMed

Albornos, Lucía; Martín, Ignacio; Iglesias, Rebeca; Jiménez, Teresa; Labrador, Emilia; Dopico, Berta

2012-11-07

Many proteins with tandem repeats in their sequence have been described and classified according to the length of the repeats: I) Repeats of short oligopeptides (from 2 to 20 amino acids), including structural cell wall proteins and arabinogalactan proteins. II) Repeats that range in length from 20 to 40 residues, including proteins with a well-established three-dimensional structure often involved in mediating protein-protein interactions. (III) Longer repeats in the order of 100 amino acids that constitute structurally and functionally independent units. Here we analyse ShooT specific (ST) proteins, a family of proteins with tandem repeats of unknown function that were first found in Leguminosae, and their possible similarities to other proteins with tandem repeats. ST protein sequences were only found in dicotyledonous plants, limited to several plant families, mainly the Fabaceae and the Asteraceae. ST mRNAs accumulate mainly in the roots and under biotic interactions. Most ST proteins have one or several Domain(s) of Unknown Function 2775 (DUF2775). All deduced ST proteins have a signal peptide, indicating that these proteins enter the secretory pathway, and the mature proteins have tandem repeat oligopeptides that share a hexapeptide (E/D)FEPRP followed by 4 partially conserved amino acids, which could determine a putative N-glycosylation signal, and a fully conserved tyrosine. In a phylogenetic tree, the sequences clade according to taxonomic group. A possible involvement in symbiosis and abiotic stress as well as in plant cell elongation is suggested, although different STs could play different roles in plant development. We describe a new family of proteins called ST whose presence is limited to the plant kingdom, specifically to a few families of dicotyledonous plants. They present 20 to 40 amino acid tandem repeat sequences with different characteristics (signal peptide, DUF2775 domain, conservative repeat regions) from the described group of 20 to 40 amino acid tandem repeat proteins and also from known cell wall proteins with repeat sequences. Several putative roles in plant physiology can be inferred from the characteristics found.
ST proteins, a new family of plant tandem repeat proteins with a DUF2775 domain mainly found in Fabaceae and Asteraceae

PubMed Central

2012-01-01

Background Many proteins with tandem repeats in their sequence have been described and classified according to the length of the repeats: I) Repeats of short oligopeptides (from 2 to 20 amino acids), including structural cell wall proteins and arabinogalactan proteins. II) Repeats that range in length from 20 to 40 residues, including proteins with a well-established three-dimensional structure often involved in mediating protein-protein interactions. (III) Longer repeats in the order of 100 amino acids that constitute structurally and functionally independent units. Here we analyse ShooT specific (ST) proteins, a family of proteins with tandem repeats of unknown function that were first found in Leguminosae, and their possible similarities to other proteins with tandem repeats. Results ST protein sequences were only found in dicotyledonous plants, limited to several plant families, mainly the Fabaceae and the Asteraceae. ST mRNAs accumulate mainly in the roots and under biotic interactions. Most ST proteins have one or several Domain(s) of Unknown Function 2775 (DUF2775). All deduced ST proteins have a signal peptide, indicating that these proteins enter the secretory pathway, and the mature proteins have tandem repeat oligopeptides that share a hexapeptide (E/D)FEPRP followed by 4 partially conserved amino acids, which could determine a putative N-glycosylation signal, and a fully conserved tyrosine. In a phylogenetic tree, the sequences clade according to taxonomic group. A possible involvement in symbiosis and abiotic stress as well as in plant cell elongation is suggested, although different STs could play different roles in plant development. Conclusions We describe a new family of proteins called ST whose presence is limited to the plant kingdom, specifically to a few families of dicotyledonous plants. They present 20 to 40 amino acid tandem repeat sequences with different characteristics (signal peptide, DUF2775 domain, conservative repeat regions) from the described group of 20 to 40 amino acid tandem repeat proteins and also from known cell wall proteins with repeat sequences. Several putative roles in plant physiology can be inferred from the characteristics found. PMID:23134664
A candidate gene for choanal atresia in alpaca.

PubMed

Reed, Kent M; Bauer, Miranda M; Mendoza, Kristelle M; Armién, Aníbal G

2010-03-01

Choanal atresia (CA) is a common nasal craniofacial malformation in New World domestic camelids (alpaca and llama). CA results from abnormal development of the nasal passages and is especially debilitating to newborn crias. CA in camelids shares many of the clinical manifestations of a similar condition in humans (CHARGE syndrome). Herein we report on the regulatory gene CHD7 of alpaca, whose homologue in humans is most frequently associated with CHARGE. Sequence of the CHD7 coding region was obtained from a non-affected cria. The complete coding region was 9003 bp, corresponding to a translated amino acid sequence of 3000 aa. Additional genomic sequences corresponding to a significant portion of the CHD7 gene were identified and assembled from the 2x alpaca whole genome sequence, providing confirmatory sequence for much of the CHD7 coding region. The alpaca CHD7 mRNA sequence was 97.9% similar to the human sequence, with the greatest sequence difference being an insertion in exon 38 that results in a polyalanine repeat (A12). Polymorphism in this repeat was tested for association with CA in alpaca by cloning and sequencing the repeat from both affected and non-affected individuals. Variation in length of the poly-A repeat was not associated with CA. Complete sequencing of the CHD7 gene will be necessary to determine whether other mutations in CHD7 are the cause of CA in camelids.
A theory that may explain the Hayflick limit--a means to delete one copy of a repeating sequence during each cell cycle in certain human cells such as fibroblasts.

PubMed

Naveilhan, P; Baudet, C; Jabbour, W; Wion, D

1994-09-01

A model that may explain the limited division potential of certain cells such as human fibroblasts in culture is presented. The central postulate of this theory is that there exists, prior to certain key exons that code for materials needed for cell division, a unique sequence of specific repeating segments of DNA. One copy of such repeating segments is deleted during each cell cycle in cells that are not protected from such deletion through methylation of their cytosine residues. According to this theory, the means through which such repeated sequences are removed, one per cycle, is through the sequential action of enzymes that act much as bacterial restriction enzymes do--namely to produce scissions in both strands of DNA in areas that correspond to the DNA base sequence recognition specificities of such enzymes. After the first scission early in a replicative cycle, that enzyme becomes inhibited, but the cleavage of the first site exposes the closest site in the repetitive element to the action of a second restriction enzyme after which that enzyme also becomes inhibited. Then repair occurs, regenerating the original first site. Through this sequential activation and inhibition of two different restriction enzymes, only one copy of the repeating sequence is deleted during each cell cycle. In effect, the repeating sequence operates as a precise counter of the numbers of cell doubling that have occurred since the cells involved differentiated during development.
Molecular characterization and physical localization of highly repetitive DNA sequences from Brazilian Alstroemeria species.

PubMed

Kuipers, A G J; Kamstra, S A; de Jeu, M J; Visser, R G F

2002-01-01

Highly repetitive DNA sequences were isolated from genomic DNA libraries of Alstroemeria psittacina and A. inodora. Among the repetitive sequences that were isolated, tandem repeats as well as dispersed repeats could be discerned. The tandem repeats belonged to a family of interlinked Sau3A subfragments with sizes varying from 68-127 bp, and constituted a larger HinfI repeat of approximately 400 bp. Southern hybridization showed a similar molecular organization of the tandem repeats in each of the Brazilian Alstroemeria species tested. None of the repeats hybridized with DNA from Chilean Alstroemeria species, which indicates that they are specific for the Brazilian species. In-situ localization studies revealed the tandem repeats to be localized in clusters on the chromosomes of A. inodora and A. psittacina: distal hybridization sites were found on chromosome arms 2PS, 6PL, 7PS, 7PL and 8PL, interstitial sites on chromosome arms 2PL, 3PL, 4PL and 5PL. The applicability of the tandem repeats for cytogenetic analysis of interspecific hybrids and their role in heterochromatin organization are discussed.
Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

PubMed Central

Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

2012-01-01

The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697
Accurate typing of short tandem repeats from genome-wide sequencing data and its applications.

PubMed

Fungtammasan, Arkarachai; Ananda, Guruprasad; Hile, Suzanne E; Su, Marcia Shu-Wei; Sun, Chen; Harris, Robert; Medvedev, Paul; Eckert, Kristin; Makova, Kateryna D

2015-05-01

Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution. © 2015 Fungtammasan et al.; Published by Cold Spring Harbor Laboratory Press.
Evolutionary genomics of miniature inverted-repeat transposable elements (MITEs) in Brassica.

PubMed

Nouroz, Faisal; Noreen, Shumaila; Heslop-Harrison, J S

2015-12-01

Miniature inverted-repeat transposable elements (MITEs) are truncated derivatives of autonomous DNA transposons, and are dispersed abundantly in most eukaryotic genomes. We aimed to characterize various MITEs families in Brassica in terms of their presence, sequence characteristics and evolutionary activity. Dot plot analyses involving comparison of homoeologous bacterial artificial chromosome (BAC) sequences allowed identification of 15 novel families of mobile MITEs. Of which, 5 were Stowaway-like with TA Target Site Duplications (TSDs), 4 Tourist-like with TAA/TTA TSDs, 5 Mutator-like with 9-10 bp TSDs and 1 novel MITE (BoXMITE1) flanked by 3 bp TSDs. Our data suggested that there are about 30,000 MITE-related sequences in Brassica rapa and B. oleracea genomes. In situ hybridization showed one abundant family was dispersed in the A-genome, while another was located near 45S rDNA sites. PCR analysis using primers flanking sequences of MITE elements detected MITE insertion polymorphisms between and within the three Brassica (AA, BB, CC) genomes, with many insertions being specific to single genomes and others showing evidence of more recent evolutionary insertions. Our BAC sequence comparison strategy enables identification of evolutionarily active MITEs with no prior knowledge of MITE sequences. The details of MITE families reported in Brassica enable their identification, characterization and annotation. Insertion polymorphisms of MITEs and their transposition activity indicated important mechanism of genome evolution and diversification. MITE families derived from known Mariner, Harbinger and Mutator DNA transposons were discovered, as well as some novel structures. The identification of Brassica MITEs will have broad applications in Brassica genomics, breeding, hybridization and phylogeny through their use as DNA markers.
The Danish STR sequence database: duplicate typing of 363 Danes with the ForenSeq™ DNA Signature Prep Kit.

PubMed

Hussing, C; Bytyci, R; Huber, C; Morling, N; Børsting, C

2018-05-24

Some STR loci have internal sequence variations, which are not revealed by the standard STR typing methods used in forensic genetics (PCR and fragment length analysis by capillary electrophoresis (CE)). Typing of STRs with next-generation sequencing (NGS) uncovers the sequence variation in the repeat region and in the flanking regions. In this study, 363 Danish individuals were typed for 56 STRs (26 autosomal STRs, 24 Y-STRs, and 6 X-STRs) using the ForenSeq™ DNA Signature Prep Kit to establish a Danish STR sequence database. Increased allelic diversity was observed in 34 STRs by the PCR-NGS assay. The largest increases were found in DYS389II and D12S391, where the numbers of sequenced alleles were around four times larger than the numbers of alleles determined by repeat length alone. Thirteen SNPs and one InDel were identified in the flanking regions of 12 STRs. Furthermore, 36 single positions and five longer stretches in the STR flanking regions were found to have dubious genotyping quality. The combined match probability of the 26 autosomal STRs was 10,000 times larger using the PCR-NGS assay than by using PCR-CE. The typical paternity indices for trios and duos were 500 and 100 times larger, respectively, than those obtained with PCR-CE. The assay also amplified 94 SNPs selected for human identification. Eleven of these loci were not in Hardy-Weinberg equilibrium in the Danish population, most likely because the minimum threshold for allele calling (30 reads) in the ForenSeq™ Universal Analysis Software was too low and frequent allele dropouts were not detected.

Developing expressed sequence tag libraries and the discovery of simple sequence repeat markers for two species of raspberry (Rubus L.)

USDA-ARS?s Scientific Manuscript database

Background: Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed S...
Development and characterization of simple sequence repeats for Bipolaris sokiniana and cross transferability to related species

USDA-ARS?s Scientific Manuscript database

Simple sequence repeats (SSR) markers were developed from a small insert genomic library for Bipolaris sorokiniana, a mitosporic fungal pathogen that causes spot blotch and root rot in switchgrass. About 59% of sequenced clones (n=384) harbored various SSR motifs. After eliminating the redundant seq...
Are the TTAGG and TTAGGG telomeric repeats phylogenetically conserved in aculeate Hymenoptera?

NASA Astrophysics Data System (ADS)

Menezes, Rodolpho S. T.; Bardella, Vanessa B.; Cabral-de-Mello, Diogo C.; Lucena, Daercio A. A.; Almeida, Eduardo A. B.

2017-10-01

Despite the (TTAGG)n telomeric repeat supposed being the ancestral DNA motif of telomeres in insects, it was repeatedly lost within some insect orders. Notably, parasitoid hymenopterans and the social wasp Metapolybia decorata (Gribodo) lack the (TTAGG)n sequence, but in other representatives of Hymenoptera, this motif was noticed, such as different ant species and the honeybee. These findings raise the question of whether the insect telomeric repeat is or not phylogenetically predominant in Hymenoptera. Thus, we evaluated the occurrence of both the (TTAGG)n sequence and the vertebrate telomere sequence (TTAGGG)n using dot-blotting hybridization in 25 aculeate species of Hymenoptera. Our results revealed the absence of (TTAGG)n sequence in all tested species, elevating the number of hymenopteran families lacking this telomeric sequence to 13 out of the 15 tested families so far. The (TTAGGG)n was not observed in any tested species. Based on our data and compiled information, we suggest that the (TTAGG)n sequence was putatively lost in the ancestor of Apocrita with at least two subsequent independent regains (in Formicidae and Apidae).
Molecular basis of length polymorphism in the human zeta-globin gene complex.

PubMed Central

Goodbourn, S E; Higgs, D R; Clegg, J B; Weatherall, D J

1983-01-01

The length polymorphism between the human zeta-globin gene and its pseudogene is caused by an allele-specific variation in the copy number of a tandemly repeating 36-base-pair sequence. This sequence is related to a tandemly repeated 14-base-pair sequence in the 5' flanking region of the human insulin gene, which is known to cause length polymorphism, and to a repetitive sequence in intervening sequence (IVS) 1 of the pseudo-zeta-globin gene. Evidence is presented that the latter is also of variable length, probably because of differences in the copy number of the tandem repeat. The homology between the three length polymorphisms may be an indication of the presence of a more widespread group of related sequences in the human genome, which might be useful for generalized linkage studies. PMID:6308667
Single-stranded DNA-binding Protein in Vitro Eliminates the Orientation-dependent Impediment to Polymerase Passage on CAG/CTG Repeats*

PubMed Central

Delagoutte, Emmanuelle; Goellner, Geoffrey M.; Guo, Jie; Baldacci, Giuseppe; McMurray, Cynthia T.

2008-01-01

Small insertions and deletions of trinucleotide repeats (TNRs) can occur by polymerase slippage and hairpin formation on either template or newly synthesized strands during replication. Although not predicted by a slippage model, deletions occur preferentially when 5′-CTG is in the lagging strand template and are highly favored over insertion events in rapidly replicating cells. The mechanism for the deletion bias and the orientation dependence of TNR instability is poorly understood. We report here that there is an orientation-dependent impediment to polymerase progression on 5′-CAG and 5′-CTG repeats that can be relieved by the binding of single-stranded DNA-binding protein. The block depends on the primary sequence of the TNR but does not correlate with the thermodynamic stability of hairpins. The orientation-dependent block of polymerase passage is the strongest when 5′-CAG is the template. We propose a “template-push” model in which the slow speed of DNA polymerase across the 5′-CAG leading strand template creates a threat to helicase-polymerase coupling. To prevent uncoupling, the TNR template is pushed out and by-passed. Hairpins do not cause the block, but appear to occur as a consequence of polymerase pass-over. PMID:18263578
Complete mitochondrial genome of the larch hawk moth, Sphinx morio (Lepidoptera: Sphingidae).

PubMed

Kim, Min Jee; Choi, Sei-Woong; Kim, Iksoo

2013-12-01

The larch hawk moth, Sphinx morio, belongs to the lepidopteran family Sphingidae that has long been studied as a family of model insects in a diverse field. In this study, we describe the complete mitochondrial genome (mitogenome) sequences of the species in terms of general genomic features and characteristic short repetitive sequences found in the A + T-rich region. The 15,299-bp-long genome consisted of a typical set of genes (13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes) and one major non-coding A + T-rich region, with the typical arrangement found in Lepidoptera. The 316-bp-long A + T-rich region located between srRNA and tRNA(Met) harbored the conserved sequence blocks that are typically found in lepidopteran insects. Additionally, the A + T-rich region of S. morio contained three characteristic repeat sequences that are rarely found in Lepidoptera: two identical 12-bp repeat, three identical 5-bp-long tandem repeat, and six nearly identical 5-6 bp long repeat sequences.
Length measurement and spatial orientation reconstruction of single nanowires.

PubMed

Prestopino, Giuseppe; Orsini, Andrea; Falconi, Christian; Bietti, Sergio; Verona-Rinati, Gianluca; Caselli, Federica; Bisegna, Paolo

2018-06-27

The accurate determination of the geometrical features of quasi one-dimensional nanostructures is mandatory for reducing errors and improving repeatability in the estimation of a number of geometry-dependent properties in nanotechnology. In this paper a method for the reconstruction of length and spatial orientation of single nanowires is presented. Those quantities are calculated from a sequence of scanning electron microscope images taken at different tilt angles using a simple 3D geometric model. The proposed method is evaluated on a collection of scanning electron microscope images of single GaAs nanowires. It is validated through the reconstruction of known geometric features of a standard reference calibration pattern. An overall uncertainty of about 1% in the estimated length of the nanowires is achieved. © 2018 IOP Publishing Ltd.
PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes.

PubMed

Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A

2011-01-01

PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are the tandem repeats of nucleotide motifs of the sizes 1-6 bp and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin phase variations and antigenic variations seen in some bacteria. Although SSR-mediated phase variation and antigenic variations have been well-studied in some bacteria there seems a lot of other species of prokaryotes yet to be investigated for SSR mediated adaptive and other evolutionary advantages. As a part of our on-going studies on SSR polymorphism in prokaryotes we compared the genome sequences of various strains and isolates available for 85 different species of prokaryotes and extracted a number of SSRs showing length variations and created a relational database called PSSRdb. This database gives useful information such as location of PSSRs in genomes, length variation across genomes, the regions harboring PSSRs, etc. The information provided in this database is very useful for further research and analysis of SSRs in prokaryotes.
Long interspersed repeated DNA (LINE) causes polymorphism at the rat insulin 1 locus.

PubMed Central

Lakshmikumaran, M S; D'Ambrosio, E; Laimins, L A; Lin, D T; Furano, A V

1985-01-01

The insulin 1, but not the insulin 2, locus is polymorphic (i.e., exhibits allelic variation) in rats. Restriction enzyme analysis and hybridization studies showed that the polymorphic region is 2.2 kilobases upstream of the insulin 1 coding region and is due to the presence or absence of an approximately 2.7-kilobase repeated DNA element. DNA sequence determination showed that this DNA element is a member of a long interspersed repeated DNA family (LINE) that is highly repeated (greater than 50,000 copies) and highly transcribed in the rat. Although the presence or absence of LINE sequences at the insulin 1 locus occurs in both the homozygous and heterozygous states, LINE-containing insulin 1 alleles are more prevalent in the rat population than are alleles without LINEs. Restriction enzyme analysis of the LINE-containing alleles indicated that at least two versions of the LINE sequence may be present at the insulin 1 locus in different rats. Either repeated transposition of LINE sequences or gene conversion between the resident insulin 1 LINE and other sequences in the genome are possible explanations for this. Images PMID:3016521
Molecular characterization of Atractolytocestus sagittatus (Cestoda: Caryophyllidea), monozoic parasite of common carp, and its differentiation from the invasive species Atractolytocestus huronensis.

PubMed

Bazsalovicsová, Eva; Králová-Hromadová, Ivica; Stefka, Jan; Scholz, Tomáš

2012-05-01

Sequence structure of complete internal transcribed spacer 1 and 2 (ITS1 and ITS2) of the ribosomal DNA region and partial mitochondrial cytochrome c oxidase subunit I (cox1) gene sequences were studied in the monozoic tapeworm Atractolytocestus sagittatus (Kulakovskaya et Akhmerov, 1965) (Cestoda: Caryophyllidea), a parasite of common carp (Cyprinus carpio carpio L.). Intraindividual sequence diversity was observed in both ribosomal spacers. In ITS1, a total number of 19 recombinant clones yielded eight different sequence types (pairwise sequence identity, 99.7-100%) which, however, did not resemble the structure typical for divergent intragenomic ITS copies (paralogues). Polymorphism was displayed by several single nucleotide mutations present exclusively in single clones, but variation in the number of short repetitive motifs was not observed. In ITS2, a total of 21 recombinant clones yielded ten different sequence types (pairwise sequence identity, 97.5-100%). They were mostly characterized by a varying number of (TCGT)(n) repeats resulting in assortment of ITS2 sequences into two sequence variants, which reflected the structure specific for ITS paralogues. The third DNA region analysed, mitochondrial cox1 gene (669 bp) was detected to be 100% identical in all studied A. sagittatus individuals. Comparison of molecular data on A. sagittatus with those on Atractolytocestus huronensis Anthony, 1958, an invasive parasite of common carp, has shown that interspecific differences significantly exceeded intraspecific variation in both ribosomal spacers (81.4-82.5% in ITS1, 74.4-75.2% in ITS2) as well as in mitochondrial cox1, which confirms validity of both congeneric tapeworms parasitic in the same fish host.
Polycistronic lentiviral vector for "hit and run" reprogramming of adult skin fibroblasts to induced pluripotent stem cells.

PubMed

Chang, Chia-Wei; Lai, Yi-Shin; Pawlik, Kevin M; Liu, Kaimao; Sun, Chiao-Wang; Li, Chao; Schoeb, Trenton R; Townes, Tim M

2009-05-01

We report the derivation of induced pluripotent stem (iPS) cells from adult skin fibroblasts using a single, polycistronic lentiviral vector encoding the reprogramming factors Oct4, Sox2, and Klf4. Porcine teschovirus-1 2A sequences that trigger ribosome skipping were inserted between human cDNAs for these factors, and the polycistron was subcloned downstream of the elongation factor 1 alpha promoter in a self-inactivating (SIN) lentiviral vector containing a loxP site in the truncated 3' long terminal repeat (LTR). Adult skin fibroblasts from a humanized mouse model of sickle cell disease were transduced with this single lentiviral vector, and iPS cell colonies were picked within 30 days. These cells expressed endogenous Oct4, Sox2, Nanog, alkaline phosphatase, stage-specific embryonic antigen-1, and other markers of pluripotency. The iPS cells produced teratomas containing tissue derived from all three germ layers after injection into immunocompromised mice and formed high-level chimeras after injection into murine blastocysts. iPS cell lines with as few as three lentiviral insertions were obtained. Expression of Cre recombinase in these iPS cells resulted in deletion of the lentiviral vector, and sequencing of insertion sites demonstrated that remnant 291-bp SIN LTRs containing a single loxP site did not interrupt coding sequences, promoters, or known regulatory elements. These results suggest that a single, polycistronic "hit and run" vector can safely and effectively reprogram adult dermal fibroblasts into iPS cells.
Live births after simultaneous avoidance of monogenic diseases and chromosome abnormality by next-generation sequencing with linkage analyses.

PubMed

Yan, Liying; Huang, Lei; Xu, Liya; Huang, Jin; Ma, Fei; Zhu, Xiaohui; Tang, Yaqiong; Liu, Mingshan; Lian, Ying; Liu, Ping; Li, Rong; Lu, Sijia; Tang, Fuchou; Qiao, Jie; Xie, X Sunney

2015-12-29

In vitro fertilization (IVF), preimplantation genetic diagnosis (PGD), and preimplantation genetic screening (PGS) help patients to select embryos free of monogenic diseases and aneuploidy (chromosome abnormality). Next-generation sequencing (NGS) methods, while experiencing a rapid cost reduction, have improved the precision of PGD/PGS. However, the precision of PGD has been limited by the false-positive and false-negative single-nucleotide variations (SNVs), which are not acceptable in IVF and can be circumvented by linkage analyses, such as short tandem repeats or karyomapping. It is noteworthy that existing methods of detecting SNV/copy number variation (CNV) and linkage analysis often require separate procedures for the same embryo. Here we report an NGS-based PGD/PGS procedure that can simultaneously detect a single-gene disorder and aneuploidy and is capable of linkage analysis in a cost-effective way. This method, called "mutated allele revealed by sequencing with aneuploidy and linkage analyses" (MARSALA), involves multiple annealing and looping-based amplification cycles (MALBAC) for single-cell whole-genome amplification. Aneuploidy is determined by CNVs, whereas SNVs associated with the monogenic diseases are detected by PCR amplification of the MALBAC product. The false-positive and -negative SNVs are avoided by an NGS-based linkage analysis. Two healthy babies, free of the monogenic diseases of their parents, were born after such embryo selection. The monogenic diseases originated from a single base mutation on the autosome and the X-chromosome of the disease-carrying father and mother, respectively.
Spectroscopic insights into quadruplexes of five-repeat telomere DNA sequences upon G-block damage.

PubMed

Dvořáková, Zuzana; Vorlíčková, Michaela; Renčiuk, Daniel

2017-11-01

The DNA lesions, resulting from oxidative damage, were shown to destabilize human telomere four-repeat quadruplex and to alter its structure. Long telomere DNA, as a repetitive sequence, offers, however, other mechanisms of dealing with the lesion: extrusion of the damaged repeat into loop or shifting the quadruplex position by one repeat. Using circular dichroism and UV absorption spectroscopy and polyacrylamide electrophoresis, we studied consequences of lesions at different positions of the model five-repeat human telomere DNA sequences on the structure and stability of their quadruplexes in sodium and in potassium. The repeats affected by lesion are preferentially positioned as terminal overhangs of the core quadruplex structurally similar to the four-repeat one. Forced affecting of the inner repeats leads to presence of variety of more parallel folds in potassium. In sodium the designed models form mixture of two dominant antiparallel quadruplexes whose population varies with the position of the affected repeat. The shapes of quadruplex CD spectra, namely the height of dominant peaks, significantly correlate with melting temperatures. Lesion in one guanine tract of a more than four repeats long human telomere DNA sequence may cause re-positioning of its quadruplex arrangement associated with a shift of the structure to less common quadruplex conformations. The type of the quadruplex depends on the loop position and external conditions. The telomere DNA quadruplexes are quite resistant to the effect of point mutations due to the telomere DNA repetitive nature, although their structure and, consequently, function might be altered. Copyright © 2017. Published by Elsevier B.V.
Microsatellite analysis in the genome of Acanthaceae: An in silico approach

PubMed Central

Kaliswamy, Priyadharsini; Vellingiri, Srividhya; Nathan, Bharathi; Selvaraj, Saravanakumar

2015-01-01

Background: Acanthaceae is one of the advanced and specialized families with conventionally used medicinal plants. Simple sequence repeats (SSRs) play a major role as molecular markers for genome analysis and plant breeding. The microsatellites existing in the complete genome sequences would help to attain a direct role in the genome organization, recombination, gene regulation, quantitative genetic variation, and evolution of genes. Objective: The current study reports the frequency of microsatellites and appropriate markers for the Acanthaceae family genome sequences. Materials and Methods: The whole nucleotide sequences of Acanthaceae species were obtained from National Center for Biotechnology Information database and screened for the presence of SSRs. SSR Locator tool was used to predict the microsatellites and inbuilt Primer3 module was used for primer designing. Results: Totally 110 repeats from 108 sequences of Acanthaceae family plant genomes were identified, and the occurrence of dinucleotide repeats was found to be abundant in the genome sequences. The essential amino acid isoleucine was found rich in all the sequences. We also designed the SSR-based primers/markers for 59 sequences of this family that contains microsatellite repeats in their genome. Conclusion: The identified microsatellites and primers might be useful for breeding and genetic studies of plants that belong to Acanthaceae family in the future. PMID:25709226
The repeating nucleotide sequence in the repetitive mitochondrial DNA from a "low-density" petite mutant of yeast.

PubMed Central

Van Kreijl, C F; Bos, J L

1977-01-01

The repeating nucleotide sequence of 68 base pairs in the mtDNA from an ethidium-induced cytoplasmic petite mutant of yeast has been determined. For sequence analysis specifically primed and terminated RNA copies, obtained by in vitro transcription of the separated strands, were use. The sequence consists of 66 consecutive AT base pairs flanked by two GC pairs and comprises nearly all of the mutant mitochondrial genome. The sequence, moreover, also represents the first part of wild-type mtDNA sequence so far. Images PMID:198740
A 48 SNP set for grapevine cultivar identification

PubMed Central

2011-01-01

Background Rapid and consistent genotyping is an important requirement for cultivar identification in many crop species. Among them grapevine cultivars have been the subject of multiple studies given the large number of synonyms and homonyms generated during many centuries of vegetative multiplication and exchange. Simple sequence repeat (SSR) markers have been preferred until now because of their high level of polymorphism, their codominant nature and their high profile repeatability. However, the rapid application of partial or complete genome sequencing approaches is identifying thousands of single nucleotide polymorphisms (SNP) that can be very useful for such purposes. Although SNP markers are bi-allelic, and therefore not as polymorphic as microsatellites, the high number of loci that can be multiplexed and the possibilities of automation as well as their highly repeatable results under any analytical procedure make them the future markers of choice for any type of genetic identification. Results We analyzed over 300 SNP in the genome of grapevine using a re-sequencing strategy in a selection of 11 genotypes. Among the identified polymorphisms, we selected 48 SNP spread across all grapevine chromosomes with allele frequencies balanced enough as to provide sufficient information content for genetic identification in grapevine allowing for good genotyping success rate. Marker stability was tested in repeated analyses of a selected group of cultivars obtained worldwide to demonstrate their usefulness in genetic identification. Conclusions We have selected a set of 48 stable SNP markers with a high discrimination power and a uniform genome distribution (2-3 markers/chromosome), which is proposed as a standard set for grapevine (Vitis vinifera L.) genotyping. Any previous problems derived from microsatellite allele confusion between labs or the need to run reference cultivars to identify allele sizes disappear using this type of marker. Furthermore, because SNP markers are bi-allelic, allele identification and genotype naming are extremely simple and genotypes obtained with different equipments and by different laboratories are always fully comparable. PMID:22060012
A Lossy Compression Technique Enabling Duplication-Aware Sequence Alignment

PubMed Central

Freschi, Valerio; Bogliolo, Alessandro

2012-01-01

In spite of the recognized importance of tandem duplications in genome evolution, commonly adopted sequence comparison algorithms do not take into account complex mutation events involving more than one residue at the time, since they are not compliant with the underlying assumption of statistical independence of adjacent residues. As a consequence, the presence of tandem repeats in sequences under comparison may impair the biological significance of the resulting alignment. Although solutions have been proposed, repeat-aware sequence alignment is still considered to be an open problem and new efficient and effective methods have been advocated. The present paper describes an alternative lossy compression scheme for genomic sequences which iteratively collapses repeats of increasing length. The resulting approximate representations do not contain tandem duplications, while retaining enough information for making their comparison even more significant than the edit distance between the original sequences. This allows us to exploit traditional alignment algorithms directly on the compressed sequences. Results confirm the validity of the proposed approach for the problem of duplication-aware sequence alignment. PMID:22518086
Tandemly repeated sequences in mtDNA control region of whitefish, Coregonus lavaretus.

PubMed

Brzuzan, P

2000-06-01

Length variation of the mitochondrial DNA control region was observed with PCR amplification of a sample of 138 whitefish (Coregonus lavaretus). Nucleotide sequences of representative PCR products showed that the variation was due to the presence of an approximately 100-bp motif tandemly repeated two, three, or five times in the region between the conserved sequence block-3 (CSB-3) and the gene for phenylalanine tRNA. This is the first report on the tandem array composed of long repeat units in mitochondrial DNA of salmonids.
The complete chloroplast genome sequence of Aconitum coreanum and Aconitum carmichaelii and comparative analysis with other Aconitum species

PubMed Central

Park, Inkyu; Kim, Wook-jin; Yang, Sungyu; Yeo, Sang-Min; Li, Hulin

2017-01-01

Aconitum species (belonging to the Ranunculaceae) are well known herbaceous medicinal ingredients and have great economic value in Asian countries. However, there are still limited genomic resources available for Aconitum species. In this study, we sequenced the chloroplast (cp) genomes of two Aconitum species, A. coreanum and A. carmichaelii, using the MiSeq platform. The two Aconitum chloroplast genomes were 155,880 and 157,040 bp in length, respectively, and exhibited LSC and SSC regions separated by a pair of inverted repeat regions. Both cp genomes had 38% GC content and contained 131 unique functional genes including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. The gene order, content, and orientation of the two Aconitum cp genomes exhibited the general structure of angiosperms, and were similar to those of other Aconitum species. Comparison of the cp genome structure and gene order with that of other Aconitum species revealed general contraction and expansion of the inverted repeat regions and single copy boundary regions. Divergent regions were also identified. In phylogenetic analysis, Aconitum species positon among the Ranunculaceae was determined with other family cp genomes in the Ranunculales. We obtained a barcoding target sequence in a divergent region, ndhC–trnV, and successfully developed a SCAR (sequence characterized amplified region) marker for discrimination of A. coreanum. Our results provide useful genetic information and a specific barcode for discrimination of Aconitum species. PMID:28863163
The complete chloroplast genome sequence of Aconitum coreanum and Aconitum carmichaelii and comparative analysis with other Aconitum species.

PubMed

Park, Inkyu; Kim, Wook-Jin; Yang, Sungyu; Yeo, Sang-Min; Li, Hulin; Moon, Byeong Cheol

2017-01-01

Aconitum species (belonging to the Ranunculaceae) are well known herbaceous medicinal ingredients and have great economic value in Asian countries. However, there are still limited genomic resources available for Aconitum species. In this study, we sequenced the chloroplast (cp) genomes of two Aconitum species, A. coreanum and A. carmichaelii, using the MiSeq platform. The two Aconitum chloroplast genomes were 155,880 and 157,040 bp in length, respectively, and exhibited LSC and SSC regions separated by a pair of inverted repeat regions. Both cp genomes had 38% GC content and contained 131 unique functional genes including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. The gene order, content, and orientation of the two Aconitum cp genomes exhibited the general structure of angiosperms, and were similar to those of other Aconitum species. Comparison of the cp genome structure and gene order with that of other Aconitum species revealed general contraction and expansion of the inverted repeat regions and single copy boundary regions. Divergent regions were also identified. In phylogenetic analysis, Aconitum species positon among the Ranunculaceae was determined with other family cp genomes in the Ranunculales. We obtained a barcoding target sequence in a divergent region, ndhC-trnV, and successfully developed a SCAR (sequence characterized amplified region) marker for discrimination of A. coreanum. Our results provide useful genetic information and a specific barcode for discrimination of Aconitum species.

Reprint of: Early Behavioural Facilitation by Temporal Expectations in Complex Visual-motor Sequences.

PubMed

Heideman, Simone G; van Ede, Freek; Nobre, Anna C

2018-05-24

In daily life, temporal expectations may derive from incidental learning of recurring patterns of intervals. We investigated the incidental acquisition and utilisation of combined temporal-ordinal (spatial/effector) structure in complex visual-motor sequences using a modified version of a serial reaction time (SRT) task. In this task, not only the series of targets/responses, but also the series of intervals between subsequent targets was repeated across multiple presentations of the same sequence. Each participant completed three sessions. In the first session, only the repeating sequence was presented. During the second and third session, occasional probe blocks were presented, where a new (unlearned) spatial-temporal sequence was introduced. We first confirm that participants not only got faster over time, but that they were slower and less accurate during probe blocks, indicating that they incidentally learned the sequence structure. Having established a robust behavioural benefit induced by the repeating spatial-temporal sequence, we next addressed our central hypothesis that implicit temporal orienting (evoked by the learned temporal structure) would have the largest influence on performance for targets following short (as opposed to longer) intervals between temporally structured sequence elements, paralleling classical observations in tasks using explicit temporal cues. We found that indeed, reaction time differences between new and repeated sequences were largest for the short interval, compared to the medium and long intervals, and that this was the case, even when comparing late blocks (where the repeated sequence had been incidentally learned), to early blocks (where this sequence was still unfamiliar). We conclude that incidentally acquired temporal expectations that follow a sequential structure can have a robust facilitatory influence on visually-guided behavioural responses and that, like more explicit forms of temporal orienting, this effect is most pronounced for sequence elements that are expected at short inter-element intervals. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.
Draft Sequences of the Radish (Raphanus sativus L.) Genome

PubMed Central

Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

2014-01-01

Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified. PMID:24848699
Evolutionary conservation of sequence and secondary structures inCRISPR repeats

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kunin, Victor; Sorek, Rotem; Hugenholtz, Philip

Clustered Regularly Interspaced Palindromic Repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in {approx}40% of bacterial and all archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CAS), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been proposed that the CRISPR/CAS system samples, maintains a record of, and inactivates invasive DNA that the cell has encountered, and therefore constitutes a prokaryotic analog of an immune system. Here we analyze CRISPR repeatsmore » identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. All individual repeats in any given cluster were inferred to form characteristic RNA secondary structure, ranging from non-existent to pronounced. Stable secondary structures included G:U base pairs and exhibited multiple compensatory base changes in the stem region, indicating evolutionary conservation and functional importance. We also show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification including specific relationships between CRISPR and CAS subtypes.« less
Improved PCR-Based Detection of Soil Transmitted Helminth Infections Using a Next-Generation Sequencing Approach to Assay Design.

PubMed

Pilotte, Nils; Papaiakovou, Marina; Grant, Jessica R; Bierwert, Lou Ann; Llewellyn, Stacey; McCarthy, James S; Williams, Steven A

2016-03-01

The soil transmitted helminths are a group of parasitic worms responsible for extensive morbidity in many of the world's most economically depressed locations. With growing emphasis on disease mapping and eradication, the availability of accurate and cost-effective diagnostic measures is of paramount importance to global control and elimination efforts. While real-time PCR-based molecular detection assays have shown great promise, to date, these assays have utilized sub-optimal targets. By performing next-generation sequencing-based repeat analyses, we have identified high copy-number, non-coding DNA sequences from a series of soil transmitted pathogens. We have used these repetitive DNA elements as targets in the development of novel, multi-parallel, PCR-based diagnostic assays. Utilizing next-generation sequencing and the Galaxy-based RepeatExplorer web server, we performed repeat DNA analysis on five species of soil transmitted helminths (Necator americanus, Ancylostoma duodenale, Trichuris trichiura, Ascaris lumbricoides, and Strongyloides stercoralis). Employing high copy-number, non-coding repeat DNA sequences as targets, novel real-time PCR assays were designed, and assays were tested against established molecular detection methods. Each assay provided consistent detection of genomic DNA at quantities of 2 fg or less, demonstrated species-specificity, and showed an improved limit of detection over the existing, proven PCR-based assay. The utilization of next-generation sequencing-based repeat DNA analysis methodologies for the identification of molecular diagnostic targets has the ability to improve assay species-specificity and limits of detection. By exploiting such high copy-number repeat sequences, the assays described here will facilitate soil transmitted helminth diagnostic efforts. We recommend similar analyses when designing PCR-based diagnostic tests for the detection of other eukaryotic pathogens.
Evolution Analysis of Simple Sequence Repeats in Plant Genome.

PubMed

Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

2015-01-01

Simple sequence repeats (SSRs) are widespread units on genome sequences, and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genome sequences. First, in the twelve studied plant genomes, the main SSRs were those which contain repeats of 1-3 nucleotides combination. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSRs repeat number the percentage of A/T in C. reinhardtii had no significant change, while the percentage of A/T in terrestrial plants species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of plant kingdom and the repeat number increased in terrestrial plants species. This trend was more obvious in dicotyledon than monocotyledon. The percentage of CG/GC showed the opposite pattern to the AT/TA. Forth, in trinucleotide SSRs, the percentages of combinations including two or three A/T were in a rising trend along with the evolution of plant kingdom; meanwhile with the increase of SSRs repeat number in plants species, different species chose different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed their specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that, SSRs not only had the general pattern in the evolution of plant kingdom, but also were associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provided new insights for the analysis of the plant genome evolution.
Perceived empty duration between sounds of different lengths: Possible relation with repetition and rhythmic grouping.

PubMed

Kuroda, Tsuyoshi; Tomimatsu, Erika; Grondin, Simon; Miyazaki, Makoto

2016-11-01

We investigated how perceived duration of empty time intervals would be modulated by the length of sounds marking those intervals. Three sounds were successively presented in Experiment 1. Each sound was short (S) or long (L), and the temporal position of the middle sound's onset was varied. The lengthening of each sound resulted in delayed perception of the onset; thus, the middle sound's onset had to be presented earlier in the SLS than in the LSL sequence so that participants perceived the three sounds as presented at equal interonset intervals. In Experiment 2, a short sound and a long sound were alternated repeatedly, and the relative duration of the SL interval to the LS interval was varied. This repeated sequence was perceived as consisting of equal interonset intervals when the onsets of all sounds were aligned at physically equal intervals. If the same onset delay as in the preceding experiment had occurred, participants should have perceived equality between the interonset intervals in the repeated sequence when the SL interval was physically shortened relative to the LS interval. The effects of sound length seemed to be canceled out when the presentation of intervals was repeated. Finally, the perceived duration of the interonset intervals in the repeated sequence was not influenced by whether the participant's native language was French or Japanese, or by how the repeated sequence was perceptually segmented into rhythmic groups.
Evolution of Transcription Activator-Like Effectors in Xanthomonas oryzae

PubMed Central

Erkes, Annett; Reschke, Maik; Boch, Jens

2017-01-01

Abstract Transcription activator-like effectors (TALEs) are secreted by plant–pathogenic Xanthomonas bacteria into plant cells where they act as transcriptional activators and, hence, are major drivers in reprogramming the plant for the benefit of the pathogen. TALEs possess a highly repetitive DNA-binding domain of typically 34 amino acid (AA) tandem repeats, where AA 12 and 13, termed repeat variable di-residue (RVD), determine target specificity. Different Xanthomonas strains possess different repertoires of TALEs. Here, we study the evolution of TALEs from the level of RVDs determining target specificity down to the level of DNA sequence with focus on rice-pathogenic Xanthomonas oryzae pv. oryzae (Xoo) and Xanthomonas oryzae pv. oryzicola (Xoc) strains. We observe that codon pairs coding for individual RVDs are conserved to a similar degree as the flanking repeat sequence. We find strong indications that TALEs may evolve 1) by base substitutions in codon pairs coding for RVDs, 2) by recombination of N-terminal or C-terminal regions of existing TALEs, or 3) by deletion of individual TALE repeats, and we propose possible mechanisms. We find indications that the reassortment of TALE genes in clusters is mediated by an integron-like mechanism in Xoc. We finally study the effect of the presence/absence and evolutionary modifications of TALEs on transcriptional activation of putative target genes in rice, and find that even single RVD swaps may lead to considerable differences in activation. This correlation allowed a refined prediction of TALE targets, which is the crucial step to decipher their virulence activity. PMID:28637323
Ordered mapping of 3 alphoid DNA subsets on human chromosome 22

DOE Office of Scientific and Technical Information (OSTI.GOV)

Antonacci, R.; Baldini, A.; Archidiacono, N.

1994-09-01

Alpha satellite DNA consists of tandemly repeated monomers of 171 bp clustered in the centromeric region of primate chromosomes. Sequence divergence between subsets located in different human chromosomes is usually high enough to ensure chromosome-specific hybridization. Alphoid probes specific for almost every human chromosome have been reported. A single chromosome can carry different subsets of alphoid DNA and some alphoid subsets can be shared by different chromosomes. We report the physical order of three alphoid DNA subsets on human chromosome 22 determined by a combination of low and high resolution cytological mapping methods. Results visually demonstrate the presence of threemore » distinct alphoid DNA domains at the centromeric region of chromosome 22. We have measured the interphase distances between the three probes in three-color FISH experiments. Statistical analysis of the results indicated the order of the subsets. Two color experiments on prometaphase chromosomes established the order of the three domains relative to the arms of chromosome 22 and confirmed the results obtained using interphase mapping. This demonstrates the applicability of interphase mapping for alpha satellite DNA orderering. However, in our experiments, interphase mapping did not provide any information about the relationship between extremities of the repeat arrays. This information was gained from extended chromatin hybridization. The extremities of two of the repeat arrays were seen to be almost overlapping whereas the third repeat array was clearly separated from the other two. Our data show the value of extended chromatin hybridization as a complement of other cytological techniques for high resolution mapping of repetitive DNA sequences.« less
RNA regulatory networks diversified through curvature of the PUF protein scaffold

DOE PAGES

Wilinski, Daniel; Qiu, Chen; Lapointe, Christopher P.; ...

2015-09-14

Proteins bind and control mRNAs, directing their localization, translation and stability. Members of the PUF family of RNA-binding proteins control multiple mRNAs in a single cell, and play key roles in development, stem cell maintenance and memory formation. Here we identified the mRNA targets of a S. cerevisiae PUF protein, Puf5p, by ultraviolet-crosslinking-affinity purification and high-throughput sequencing (HITS-CLIP). The binding sites recognized by Puf5p are diverse, with variable spacer lengths between two specific sequences. Each length of site correlates with a distinct biological function. Crystal structures of Puf5p–RNA complexes reveal that the protein scaffold presents an exceptionally flat and extendedmore » interaction surface relative to other PUF proteins. In complexes with RNAs of different lengths, the protein is unchanged. A single PUF protein repeat is sufficient to induce broadening of specificity. Changes in protein architecture, such as alterations in curvature, may lead to evolution of mRNA regulatory networks.« less
RNA regulatory networks diversified through curvature of the PUF protein scaffold

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wilinski, Daniel; Qiu, Chen; Lapointe, Christopher P.

Proteins bind and control mRNAs, directing their localization, translation and stability. Members of the PUF family of RNA-binding proteins control multiple mRNAs in a single cell, and play key roles in development, stem cell maintenance and memory formation. Here we identified the mRNA targets of a S. cerevisiae PUF protein, Puf5p, by ultraviolet-crosslinking-affinity purification and high-throughput sequencing (HITS-CLIP). The binding sites recognized by Puf5p are diverse, with variable spacer lengths between two specific sequences. Each length of site correlates with a distinct biological function. Crystal structures of Puf5p–RNA complexes reveal that the protein scaffold presents an exceptionally flat and extendedmore » interaction surface relative to other PUF proteins. In complexes with RNAs of different lengths, the protein is unchanged. A single PUF protein repeat is sufficient to induce broadening of specificity. Changes in protein architecture, such as alterations in curvature, may lead to evolution of mRNA regulatory networks.« less
Complete chloroplast genome sequence of green foxtail (Setaria viridis), a promising model system for C4 photosynthesis.

PubMed

Wang, Shuo; Gao, Li-Zhi

2016-09-01

The complete chloroplast genome of green foxtail (Setaria viridis), a promising model system for C4 photosynthesis, is first reported in this study. The genome harbors a large single copy (LSC) region of 81 016 bp and a small single copy (SSC) region of 12 456 bp separated by a pair of inverted repeat (IRa and IRb) regions of 22 315 bp. GC content is 38.92%. The proportion of coding sequence is 57.97%, comprising of 111 (19 duplicated in IR regions) unique genes, 71 of which are protein-coding genes, four are rRNA genes, and 36 are tRNA genes. Phylogenetic analysis indicated that S. viridis was clustered with its cultivated species S. italica in the tribe Paniceae of the family Poaceae. This newly determined chloroplast genome will provide valuable genetic resources to assist future studies on C4 photosynthesis in grasses.
Sunflower centromeres consist of a centromere-specific LINE and a chromosome-specific tandem repeat.

PubMed

Nagaki, Kiyotaka; Tanaka, Keisuke; Yamaji, Naoki; Kobayashi, Hisato; Murata, Minoru

2015-01-01

The kinetochore is a protein complex including kinetochore-specific proteins that plays a role in chromatid segregation during mitosis and meiosis. The complex associates with centromeric DNA sequences that are usually species-specific. In plant species, tandem repeats including satellite DNA sequences and retrotransposons have been reported as centromeric DNA sequences. In this study on sunflowers, a cDNA-encoding centromere-specific histone H3 (CENH3) was isolated from a cDNA pool from a seedling, and an antibody was raised against a peptide synthesized from the deduced cDNA. The antibody specifically recognized the sunflower CENH3 (HaCENH3) and showed centromeric signals by immunostaining and immunohistochemical staining analysis. The antibody was also applied in chromatin immunoprecipitation (ChIP)-Seq to isolate centromeric DNA sequences and two different types of repetitive DNA sequences were identified. One was a long interspersed nuclear element (LINE)-like sequence, which showed centromere-specific signals on almost all chromosomes in sunflowers. This is the first report of a centromeric LINE sequence, suggesting possible centromere targeting ability. Another type of identified repetitive DNA was a tandem repeat sequence with a 187-bp unit that was found only on a pair of chromosomes. The HaCENH3 content of the tandem repeats was estimated to be much higher than that of the LINE, which implies centromere evolution from LINE-based centromeres to more stable tandem-repeat-based centromeres. In addition, the epigenetic status of the sunflower centromeres was investigated by immunohistochemical staining and ChIP, and it was found that centromeres were heterochromatic.
Changing practice: red blood cell typing by molecular methods for patients with sickle cell disease.

PubMed

Casas, Jessica; Friedman, David F; Jackson, Tannoa; Vege, Sunitha; Westhoff, Connie M; Chou, Stella T

2015-06-01

Extended red blood cell (RBC) antigen matching is recommended to limit alloimmunization in patients with sickle cell disease (SCD). DNA-based testing to predict blood group phenotypes has enhanced availability of antigen-negative donor units and improved typing of transfused patients, but replacement of routine serologic typing for non-ABO antigens with molecular typing for patients has not been reported. This study compared the historical RBC antigen phenotypes obtained by hemagglutination methods with genotype predictions in 494 patients with SCD. For discrepant results, repeat serologic testing was performed and/or investigated by gene sequencing for silent or variant alleles. Seventy-one typing discrepancies were identified among 6360 antigen comparisons (1.1%). New specimens for repeat serologic testing were obtained for 66 discrepancies and retyping agreed with the genotype in 64 cases. One repeat Jk(b-) serologic phenotype, predicted Jk(b+) by genotype, was found by direct sequencing of JK to be a silenced allele, and one N typing discrepancy remains under investigation. Fifteen false-negative serologic results were associated with alleles encoding weak antigens or single-dose Fy(b) expression. DNA-based RBC typing provided improved accuracy and expanded information on RBC antigens compared to hemagglutination methods, leading to its implementation as the primary method for extended RBC typing for patients with SCD at our institution. © 2015 AABB.
Distribution of FMR1 and FMR2 Repeats in Argentinean Patients with Primary Ovarian Insufficiency

PubMed Central

Espeche, Lucía Daniela; Chiauzzi, Violeta; Ferder, Ianina; Arrar, Mehrnoosh; Solari, Andrea Paula; Bruque, Carlos David; Delea, Marisol; Belli, Susana; Fernández, Cecilia Soledad; Buzzalino, Noemí Delia; Charreau, Eduardo Hernán; Dain, Liliana Beatriz

2017-01-01

The premutation state of FMR1 (Fragile X Mental Retardation 1) has been associated with primary ovarian insufficiency (POI), and is the most common known genetic cause for 46,XX patients. Nevertheless, very few studies have analyzed its frequency in Latin American populations. Additionally, a relationship between alleles carrying a cryptic microdeletion in the 5’UTR of FMR2 and the onset of POI has only been studied in one population. Our aim was to analyze the incidence of FMR1 premutations and putative microdeletions in exon 1 of FMR2 in a cohort of Argentinean women with POI. We studied 133 patients and 84 controls. Fluorescent PCR was performed, and the FMR2 exon 1 was further sequenced in samples presenting less than 11 repeats. We found the frequency of FMR1 premutations to be 6.7% and 2.9% for familial and sporadic patients, respectively. Among controls, 1/84 women presented a premutation. In addition, although we did not find microdeletions in FMR2, we observed a change (T >C) adjacent to the repeats in two sisters with POI. Given the repetitive nature of the sequence involved, we could not ascertain whether this represents a single nucleotide polymorphism (SNP) or a deletion. Therefore, a relationship between FMR2 and POI could not be established for our population. PMID:28812997
Distribution of FMR1 and FMR2 Repeats in Argentinean Patients with Primary Ovarian Insufficiency.

PubMed

Espeche, Lucía Daniela; Chiauzzi, Violeta; Ferder, Ianina; Arrar, Mehrnoosh; Solari, Andrea Paula; Bruque, Carlos David; Delea, Marisol; Belli, Susana; Fernández, Cecilia Soledad; Buzzalino, Noemí Delia; Charreau, Eduardo Hernán; Dain, Liliana Beatriz

2017-08-16

The premutation state of FMR1 (Fragile X Mental Retardation 1) has been associated with primary ovarian insufficiency (POI), and is the most common known genetic cause for 46,XX patients. Nevertheless, very few studies have analyzed its frequency in Latin American populations. Additionally, a relationship between alleles carrying a cryptic microdeletion in the 5'UTR of FMR2 and the onset of POI has only been studied in one population. Our aim was to analyze the incidence of FMR1 premutations and putative microdeletions in exon 1 of FMR2 in a cohort of Argentinean women with POI. We studied 133 patients and 84 controls. Fluorescent PCR was performed, and the FMR2 exon 1 was further sequenced in samples presenting less than 11 repeats. We found the frequency of FMR1 premutations to be 6.7% and 2.9% for familial and sporadic patients, respectively. Among controls, 1/84 women presented a premutation. In addition, although we did not find microdeletions in FMR2 , we observed a change (T >C) adjacent to the repeats in two sisters with POI. Given the repetitive nature of the sequence involved, we could not ascertain whether this represents a single nucleotide polymorphism (SNP) or a deletion. Therefore, a relationship between FMR2 and POI could not be established for our population.
Entropic fluctuations in DNA sequences

NASA Astrophysics Data System (ADS)

Thanos, Dimitrios; Li, Wentian; Provata, Astero

2018-03-01

The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that despite this reduction of information, LSE allows to extract meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.
New development and validation of 50 SSR markers in breadfruit (Artocarpus altilis, Moraceae) by next-generation sequencing1

PubMed Central

De Bellis, Fabien; Malapa, Roger; Kagy, Valérie; Lebegin, Stéphane; Billot, Claire; Labouisse, Jean-Pierre

2016-01-01

Premise of the study: Using next-generation sequencing technology, new microsatellite loci were characterized in Artocarpus altilis (Moraceae) and two congeners to increase the number of available markers for genotyping breadfruit cultivars. Methods and Results: A total of 47,607 simple sequence repeat loci were obtained by sequencing a library of breadfruit genomic DNA with an Illumina MiSeq system. Among them, 50 single-locus markers were selected and assessed using 41 samples (39 A. altilis, one A. camansi, and one A. heterophyllus). All loci were polymorphic in A. altilis, 44 in A. camansi, and 21 in A. heterophyllus. The number of alleles per locus ranged from two to 19. Conclusions: The new markers will be useful for assessing the identity and genetic diversity of breadfruit cultivars on a small geographical scale, gaining a better understanding of farmer management practices, and will help to optimize breadfruit genebank management. PMID:27610273
Whole transcriptome analysis using next-generation sequencing of model species Setaria viridis to support C4 photosynthesis research.

PubMed

Xu, Jiajia; Li, Yuanyuan; Ma, Xiuling; Ding, Jianfeng; Wang, Kai; Wang, Sisi; Tian, Ye; Zhang, Hui; Zhu, Xin-Guang

2013-09-01

Setaria viridis is an emerging model species for genetic studies of C4 photosynthesis. Many basic molecular resources need to be developed to support for this species. In this paper, we performed a comprehensive transcriptome analysis from multiple developmental stages and tissues of S. viridis using next-generation sequencing technologies. Sequencing of the transcriptome from multiple tissues across three developmental stages (seed germination, vegetative growth, and reproduction) yielded a total of 71 million single end 100 bp long reads. Reference-based assembly using Setaria italica genome as a reference generated 42,754 transcripts. De novo assembly generated 60,751 transcripts. In addition, 9,576 and 7,056 potential simple sequence repeats (SSRs) covering S. viridis genome were identified when using the reference based assembled transcripts and the de novo assembled transcripts, respectively. This identified transcripts and SSR provided by this study can be used for both reverse and forward genetic studies based on S. viridis.
Origin of the CMS gene locus in rapeseed cybrid mitochondria: active and inactive recombination produces the complex CMS gene region in the mitochondrial genomes of Brassicaceae.

PubMed

Oshima, Masao; Kikuchi, Rie; Imamura, Jun; Handa, Hirokazu

2010-01-01

CMS (cytoplasmic male sterile) rapeseed is produced by asymmetrical somatic cell fusion between the Brassica napus cv. Westar and the Raphanus sativus Kosena CMS line (Kosena radish). The CMS rapeseed contains a CMS gene, orf125, which is derived from Kosena radish. Our sequence analyses revealed that the orf125 region in CMS rapeseed originated from recombination between the orf125/orfB region and the nad1C/ccmFN1 region by way of a 63 bp repeat. A precise sequence comparison among the related sequences in CMS rapeseed, Kosena radish and normal rapeseed showed that the orf125 region in CMS rapeseed consisted of the Kosena orf125/orfB region and the rapeseed nad1C/ccmFN1 region, even though Kosena radish had both the orf125/orfB region and the nad1C/ccmFN1 region in its mitochondrial genome. We also identified three tandem repeat sequences in the regions surrounding orf125, including a 63 bp repeat, which were involved in several recombination events. Interestingly, differences in the recombination activity for each repeat sequence were observed, even though these sequences were located adjacent to each other in the mitochondrial genome. We report results indicating that recombination events within the mitochondrial genomes are regulated at the level of specific repeat sequences depending on the cellular environment.
Identification and characterisation of Short Interspersed Nuclear Elements in the olive tree (Olea europaea L.) genome.

PubMed

Barghini, Elena; Mascagni, Flavia; Natali, Lucia; Giordani, Tommaso; Cavallini, Andrea

2017-02-01

Short Interspersed Nuclear Elements (SINEs) are nonautonomous retrotransposons in the genome of most eukaryotic species. While SINEs have been intensively investigated in humans and other animal systems, SINE identification has been carried out only in a limited number of plant species. This lack of information is apparent especially in non-model plants whose genome has not been sequenced yet. The aim of this work was to produce a specific bioinformatics pipeline for analysing second generation sequence reads of a non-model species and identifying SINEs. We have identified, for the first time, 227 putative SINEs of the olive tree (Olea europaea), that constitute one of the few sets of such sequences in dicotyledonous species. The identified SINEs ranged from 140 to 362 bp in length and were characterised with regard to the occurrence of the tRNA domain in their sequence. The majority of identified elements resulted in single copy or very lowly repeated, often in association with genic sequences. Analysis of sequence similarity allowed us to identify two major groups of SINEs showing different abundances in the olive tree genome, the former with sequence similarity to SINEs of Scrophulariaceae and Solanaceae and the latter to SINEs of Salicaceae. A comparison of sequence conservation between olive SINEs and LTR retrotransposon families suggested that SINE expansion in the genome occurred especially in very ancient times, before LTR retrotransposon expansion, and presumably before the separation of the rosids (to which Oleaceae belong) from the Asterids. Besides providing data on olive SINEs, our results demonstrate the suitability of the pipeline employed for SINE identification. Applying this pipeline will favour further structural and functional analyses on these relatively unknown elements to be performed also in other plant species, even in the absence of a reference genome, and will allow establishing general evolutionary patterns for this kind of repeats in plants.

Analysis of Two Cosmid Clones from Chromosome 4 of Drosophila melanogaster Reveals Two New Genes Amid an Unusual Arrangement of Repeated Sequences

PubMed Central

Locke, John; Podemski, Lynn; Roy, Ken; Pilgrim, David; Hodgetts, Ross

1999-01-01

Chromosome 4 from Drosophila melanogaster has several unusual features that distinguish it from the other chromosomes. These include a diffuse appearance in salivary gland polytene chromosomes, an absence of recombination, and the variegated expression of P-element transgenes. As part of a larger project to understand these properties, we are assembling a physical map of this chromosome. Here we report the sequence of two cosmids representing ∼5% of the polytenized region. Both cosmid clones contain numerous repeated DNA sequences, as identified by cross hybridization with labeled genomic DNA, BLAST searches, and dot matrix analysis, which are positioned between and within the transcribed sequences. The repetitive sequences include three copies of the mobile element Hoppel, one copy of the mobile element HB, and 18 DINE repeats. DINE is a novel, short repeated sequence dispersed throughout both cosmid sequences. One cosmid includes the previously described cubitus interruptus (ci) gene and two new genes: that a gene with a predicted amino acid sequence similar to ribosomal protein S3a which is consistent with the Minute(4)101 locus thought to be in the region, and a novel member of the protein family that includes plexin and met–hepatocyte growth factor receptor. The other cosmid contains only the two short 5′-most exons from the zinc-finger-homolog-2 (zfh-2) gene. This is the first extensive sequence analysis of noncoding DNA from chromosome 4. The distribution of the various repeats suggests its organization is similar to the β-heterochromatic regions near the base of the major chromosome arms. Such a pattern may account for the diffuse banding of the polytene chromosome 4 and the variegation of many P-element transgenes on the chromosome. PMID:10022978
Comprehensive mutation analysis of 17 Y-chromosomal short tandem repeat polymorphisms included in the AmpFlSTR Yfiler PCR amplification kit.

PubMed

Goedbloed, Miriam; Vermeulen, Mark; Fang, Rixun N; Lembring, Maria; Wollstein, Andreas; Ballantyne, Kaye; Lao, Oscar; Brauer, Silke; Krüger, Carmen; Roewer, Lutz; Lessig, Rüdiger; Ploski, Rafal; Dobosz, Tadeusz; Henke, Lotte; Henke, Jürgen; Furtado, Manohar R; Kayser, Manfred

2009-11-01

The Y-chromosomal short tandem repeat (Y-STR) polymorphisms included in the AmpFlSTR Yfiler polymerase chain reaction amplification kit have become widely used for forensic and evolutionary applications where a reliable knowledge on mutation properties is necessary for correct data interpretation. Therefore, we investigated the 17 Yfiler Y-STRs in 1,730-1,764 DNA-confirmed father-son pairs per locus and found 84 sequence-confirmed mutations among the 29,792 meiotic transfers covered. Of the 84 mutations, 83 (98.8%) were single-repeat changes and one (1.2%) was a double-repeat change (ratio, 1:0.01), as well as 43 (51.2%) were repeat gains and 41 (48.8%) repeat losses (ratio, 1:0.95). Medians from Bayesian estimation of locus-specific mutation rates ranged from 0.0003 for DYS448 to 0.0074 for DYS458, with a median rate across all 17 Y-STRs of 0.0025. The mean age (at the time of son's birth) of fathers with mutations was with 34.40 (+/-11.63) years higher than that of fathers without ones at 30.32 (+/-10.22) years, a difference that is highly statistically significant (p < 0.001). A Poisson-based modeling revealed that the Y-STR mutation rate increased with increasing father's age on a statistically significant level (alpha = 0.0294, 2.5% quantile = 0.0001). From combining our data with those previously published, considering all together 135,212 meiotic events and 331 mutations, we conclude for the Yfiler Y-STRs that (1) none had a mutation rate of >1%, 12 had mutation rates of >0.1% and four of <0.1%, (2) single-repeat changes were strongly favored over multiple-repeat ones for all loci but 1 and (3) considerable variation existed among loci in the ratio of repeat gains versus losses. Our finding of three Y-STR mutations in one father-son pair (and two pairs with two mutations each) has consequences for determining the threshold of allelic differences to conclude exclusion constellations in future applications of Y-STRs in paternity testing and pedigree analyses.
Comparative genomics of Lupinus angustifolius gene-rich regions: BAC library exploration, genetic mapping and cytogenetics

PubMed Central

2013-01-01

Background The narrow-leafed lupin, Lupinus angustifolius L., is a grain legume species with a relatively compact genome. The species has 2n = 40 chromosomes and its genome size is 960 Mbp/1C. During the last decade, L. angustifolius genomic studies have achieved several milestones, such as molecular-marker development, linkage maps, and bacterial artificial chromosome (BAC) libraries. Here, these resources were integratively used to identify and sequence two gene-rich regions (GRRs) of the genome. Results The genome was screened with a probe representing the sequence of a microsatellite fragment length polymorphism (MFLP) marker linked to Phomopsis stem blight resistance. BAC clones selected by hybridization were subjected to restriction fingerprinting and contig assembly, and 232 BAC-ends were sequenced and annotated. BAC fluorescence in situ hybridization (BAC-FISH) identified eight single-locus clones. Based on physical mapping, cytogenetic localization, and BAC-end annotation, five clones were chosen for sequencing. Within the sequences of clones that hybridized in FISH to a single-locus, two large GRRs were identified. The GRRs showed strong and conserved synteny to Glycine max duplicated genome regions, illustrated by both identical gene order and parallel orientation. In contrast, in the clones with dispersed FISH signals, more than one-third of sequences were transposable elements. Sequenced, single-locus clones were used to develop 12 genetic markers, increasing the number of L. angustifolius chromosomes linked to appropriate linkage groups by five pairs. Conclusions In general, probes originating from MFLP sequences can assist genome screening and gene discovery. However, such probes are not useful for positional cloning, because they tend to hybridize to numerous loci. GRRs identified in L. angustifolius contained a low number of interspersed repeats and had a high level of synteny to the genome of the model legume G. max. Our results showed that not only was the gene nucleotide sequence conserved between soybean and lupin GRRs, but the order and orientation of particular genes in syntenic blocks was homologous, as well. These findings will be valuable to the forthcoming sequencing of the lupin genome. PMID:23379841
Experimental definition of a clustered regularly interspaced short palindromic duplicon in Escherichia coli.

PubMed

Goren, Moran G; Yosef, Ido; Auster, Oren; Qimron, Udi

2012-10-12

We analyzed sequences of newly inserted repeats in an Escherichia coli CRISPR (clustered regularly interspaced short palindromic repeats) array in vivo and showed that a base previously thought to belong to the repeat is actually derived from a protospacer. Based on further experimental results, we propose to use the term "duplicon" for a repeated sequence in a CRISPR array that serves as a template for a new duplicon. Our findings suggest the possibility of redrawing the borders between repeats, spacers, and protospacer adjacent motifs. Copyright © 2012 Elsevier Ltd. All rights reserved.
Graphene Nanopores for Protein Sequencing.

PubMed

Wilson, James; Sloman, Leila; He, Zhiren; Aksimentiev, Aleksei

2016-07-19

An inexpensive, reliable method for protein sequencing is essential to unraveling the biological mechanisms governing cellular behavior and disease. Current protein sequencing methods suffer from limitations associated with the size of proteins that can be sequenced, the time, and the cost of the sequencing procedures. Here, we report the results of all-atom molecular dynamics simulations that investigated the feasibility of using graphene nanopores for protein sequencing. We focus our study on the biologically significant phenylalanine-glycine repeat peptides (FG-nups)-parts of the nuclear pore transport machinery. Surprisingly, we found FG-nups to behave similarly to single stranded DNA: the peptides adhere to graphene and exhibit step-wise translocation when subject to a transmembrane bias or a hydrostatic pressure gradient. Reducing the peptide's charge density or increasing the peptide's hydrophobicity was found to decrease the translocation speed. Yet, unidirectional and stepwise translocation driven by a transmembrane bias was observed even when the ratio of charged to hydrophobic amino acids was as low as 1:8. The nanopore transport of the peptides was found to produce stepwise modulations of the nanopore ionic current correlated with the type of amino acids present in the nanopore, suggesting that protein sequencing by measuring ionic current blockades may be possible.
Initial sequence and comparative analysis of the cat genome

PubMed Central

Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

2007-01-01

The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172
Using the self-select paradigm to delineate the nature of speech motor programming.

PubMed

Wright, David L; Robin, Don A; Rhee, Jooyhun; Vaculin, Amber; Jacks, Adam; Guenther, Frank H; Fox, Peter T

2009-06-01

The authors examined the involvement of 2 speech motor programming processes identified by S. T. Klapp (1995, 2003) during the articulation of utterances differing in syllable and sequence complexity. According to S. T. Klapp, 1 process, INT, resolves the demands of the programmed unit, whereas a second process, SEQ, oversees the serial order demands of longer sequences. A modified reaction time paradigm was used to assess INT and SEQ demands. Specifically, syllable complexity was dependent on syllable structure, whereas sequence complexity involved either repeated or unique syllabi within an utterance. INT execution was slowed when articulating single syllables in the form CCCV compared to simpler CV syllables. Planning unique syllables within a multisyllabic utterance rather than repetitions of the same syllable slowed INT but not SEQ. The INT speech motor programming process, important for mental syllabary access, is sensitive to changes in both syllable structure and the number of unique syllables in an utterance.
Phylogeny and strain typing of Escherichia coli, inferred from variation at mononucleotide repeat loci.

PubMed

Diamant, Eran; Palti, Yniv; Gur-Arie, Riva; Cohen, Helit; Hallerman, Eric M; Kashi, Yechezkel

2004-04-01

Multilocus sequencing of housekeeping genes has been used previously for bacterial strain typing and for inferring evolutionary relationships among strains of Escherichia coli. In this study, we used shorter intergenic sequences that contained simple sequence repeats (SSRs) of repeating mononucleotide motifs (mononucleotide repeats [MNRs]) to infer the phylogeny of pathogenic and commensal E. coli strains. Seven noncoding loci (four MNRs and three non-SSRs) were sequenced in 27 strains, including enterohemorrhagic (six isolates of O157:H7), enteropathogenic, enterotoxigenic, B, and K-12 strains. The four MNRs were also sequenced in 20 representative strains of the E. coli reference (ECOR) collection. Sequence polymorphism was significantly higher at the MNR loci, including the flanking sequences, indicating a higher mutation rate in the sequences flanking the MNR tracts. The four MNR loci were amplifiable by PCR in the standard ECOR A, B1, and D groups, but only one (yaiN) in the B2 group was amplified, which is consistent with previous studies that suggested that B2 is the most ancient group. High sequence compatibility was found between the four MNR loci, indicating that they are in the same clonal frame. The phylogenetic trees that were constructed from the sequence data were in good agreement with those of previous studies that used multilocus enzyme electrophoresis. The results demonstrate that MNR loci are useful for inferring phylogenetic relationships and provide much higher sequence variation than housekeeping genes. Therefore, the use of MNR loci for multilocus sequence typing should prove efficient for clinical diagnostics, epidemiology, and evolutionary study of bacteria.
Phylogeny and Strain Typing of Escherichia coli, Inferred from Variation at Mononucleotide Repeat Loci

PubMed Central

Diamant, Eran; Palti, Yniv; Gur-Arie, Riva; Cohen, Helit; Hallerman, Eric M.; Kashi, Yechezkel

2004-01-01

Multilocus sequencing of housekeeping genes has been used previously for bacterial strain typing and for inferring evolutionary relationships among strains of Escherichia coli. In this study, we used shorter intergenic sequences that contained simple sequence repeats (SSRs) of repeating mononucleotide motifs (mononucleotide repeats [MNRs]) to infer the phylogeny of pathogenic and commensal E. coli strains. Seven noncoding loci (four MNRs and three non-SSRs) were sequenced in 27 strains, including enterohemorrhagic (six isolates of O157:H7), enteropathogenic, enterotoxigenic, B, and K-12 strains. The four MNRs were also sequenced in 20 representative strains of the E. coli reference (ECOR) collection. Sequence polymorphism was significantly higher at the MNR loci, including the flanking sequences, indicating a higher mutation rate in the sequences flanking the MNR tracts. The four MNR loci were amplifiable by PCR in the standard ECOR A, B1, and D groups, but only one (yaiN) in the B2 group was amplified, which is consistent with previous studies that suggested that B2 is the most ancient group. High sequence compatibility was found between the four MNR loci, indicating that they are in the same clonal frame. The phylogenetic trees that were constructed from the sequence data were in good agreement with those of previous studies that used multilocus enzyme electrophoresis. The results demonstrate that MNR loci are useful for inferring phylogenetic relationships and provide much higher sequence variation than housekeeping genes. Therefore, the use of MNR loci for multilocus sequence typing should prove efficient for clinical diagnostics, epidemiology, and evolutionary study of bacteria. PMID:15066845
Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing

PubMed Central

2011-01-01

Background Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and compliment these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models. PMID:21542930
Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing.

PubMed

Straub, Shannon C K; Fishbein, Mark; Livshultz, Tatyana; Foster, Zachary; Parks, Matthew; Weitemier, Kevin; Cronn, Richard C; Liston, Aaron

2011-05-04

Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and compliment these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models.
Characterization and assessment of an avian repetitive DNA sequence as an icterid phylogenetic marker.

PubMed

Quinn, J S; Guglich, E; Seutin, G; Lau, R; Marsolais, J; Parna, L; Boag, P T; White, B N

1992-02-01

The first tandemly repeated sequence examined in a passerine bird, a 431-bp PstI fragment named pMAT1, has been cloned from the genome of the brown-headed cowbird (Molothrus ater). The sequence represents about 5-10% of the genome (about 4 x 10(5) copies) and yields prominent ethidium bromide stained bands when genomic DNA cut with a variety of restriction enzymes is electrophoresed in agarose gels. A particularly striking ladder of fragments is apparent when the DNA is cut with HinfI, indicative of a tandem arrangement of the monomer. The cloned PstI monomer has been sequenced, revealing no internal repeated structure. There are sequences that hybridize with pMAT1 found in related nine-primaried oscines but not in more distantly related oscines, suboscines, or nonpasserine species. Little sequence similarity to tandemly repeated PstI cut sequences from the merlin (Falco columbarius), saurus crane (Grus antigone), or Puerto Rican parrot (Amazona vittata) or to HinfI digested sequence from the Toulouse goose (Anser anser) was detected. The isolated sequence was used as a probe to examine DNA samples of eight members of the tribe Icterini. This examination revealed phylogenetically informative characters. The repeat contains cutting sites from a number of restriction enzymes, which, if sufficiently polymorphic, would provide new phylogenetic characters. Sequences like these, conserved within a species, but variable between closely related species, may be very useful for phylogenetic studies of closely related taxa.
Phonological learning in semantic dementia.

PubMed

Jefferies, Elizabeth; Bott, Samantha; Ehsan, Sheeba; Lambon Ralph, Matthew A

2011-04-01

Patients with semantic dementia (SD) have anterior temporal lobe (ATL) atrophy that gives rise to a highly selective deterioration of semantic knowledge. Despite pronounced anomia and poor comprehension of words and pictures, SD patients have well-formed, fluent speech and normal digit span. Given the intimate connection between phonological STM and word learning revealed by both neuropsychological and developmental studies, SD patients might be expected to show good acquisition of new phonological forms, even though their ability to map these onto meanings is impaired. In contradiction of these predictions, a limited amount of previous research has found poor learning of new phonological forms in SD. In a series of experiments, we examined whether SD patient, GE, could learn novel phonological sequences and, if so, under which circumstances. GE showed normal benefits of phonological knowledge in STM (i.e., normal phonotactic frequency and phonological similarity effects) but reduced support from semantic memory (i.e., poor immediate serial recall for semantically degraded words, characterised by frequent item errors). Next, we demonstrated normal learning of serial order information for repeated lists of single-digit number words using the Hebb paradigm: these items were well-understood allowing them to be repeated without frequent item errors. In contrast, patient GE showed little learning of nonsense syllable sequences using the same Hebb paradigm. Detailed analysis revealed that both GE and the controls showed a tendency to learn their own errors as opposed to the target items. Finally, we showed normal learning of phonological sequences for GE when he was prevented from repeating his errors. These findings confirm that the ATL atrophy in SD disrupts phonological processing for semantically degraded words but leaves the phonological architecture intact. Consequently, when item errors are minimised, phonological STM can support the acquisition of new phoneme sequences in patients with SD. Copyright © 2011 Elsevier Ltd. All rights reserved.
Insights from the complete chloroplast genome into the evolution of Sesamum indicum L.

PubMed

Zhang, Haiyang; Li, Chun; Miao, Hongmei; Xiong, Songjin

2013-01-01

Sesame (Sesamum indicum L.) is one of the oldest oilseed crops. In order to investigate the evolutionary characters according to the Sesame Genome Project, apart from sequencing its nuclear genome, we sequenced the complete chloroplast genome of S. indicum cv. Yuzhi 11 (white seeded) using Illumina and 454 sequencing. Comparisons of chloroplast genomes between S. indicum and the 18 other higher plants were then analyzed. The chloroplast genome of cv. Yuzhi 11 contains 153,338 bp and a total of 114 unique genes (KC569603). The number of chloroplast genes in sesame is the same as that in Nicotiana tabacum, Vitis vinifera and Platanus occidentalis. The variation in the length of the large single-copy (LSC) regions and inverted repeats (IR) in sesame compared to 18 other higher plant species was the main contributor to size variation in the cp genome in these species. The 77 functional chloroplast genes, except for ycf1 and ycf2, were highly conserved. The deletion of the cp ycf1 gene sequence in cp genomes may be due either to its transfer to the nuclear genome, as has occurred in sesame, or direct deletion, as has occurred in Panax ginseng and Cucumis sativus. The sesame ycf2 gene is only 5,721 bp in length and has lost about 1,179 bp. Nucleotides 1-585 of ycf2 when queried in BLAST had hits in the sesame draft genome. Five repeats (R10, R12, R13, R14 and R17) were unique to the sesame chloroplast genome. We also found that IR contraction/expansion in the cp genome alters its rate of evolution. Chloroplast genes and repeats display the signature of convergent evolution in sesame and other species. These findings provide a foundation for further investigation of cp genome evolution in Sesamum and other higher plants.
GTF2IRD2 is located in the Williams–Beuren syndrome critical region 7q11.23 and encodes a protein with two TFII-I-like helix–loop–helix repeats

PubMed Central

Makeyev, Aleksandr V.; Erdenechimeg, Lkhamsuren; Mungunsukh, Ognoon; Roth, Jutta J.; Enkhmandakh, Badam; Ruddle, Frank H.; Bayarsaihan, Dashzeveg

2004-01-01

Williams–Beuren syndrome (also known as Williams syndrome) is caused by a deletion of a 1.55- to 1.84-megabase region from chromosome band 7q11.23. GTF2IRD1 and GTF2I, located within this critical region, encode proteins of the TFII-I family with multiple helix–loop–helix domains known as I repeats. In the present work, we characterize a third member, GTF2IRD2, which has sequence and structural similarity to the GTF2I and GTF2IRD1 paralogs. The ORF encodes a protein with several features characteristic of regulatory factors, including two I repeats, two leucine zippers, and a single Cys-2/His-2 zinc finger. The genomic organization of human, baboon, rat, and mouse genes is well conserved. Our exon-by-exon comparison has revealed that GTF2IRD2 is more closely related to GTF2I than to GTF2IRD1 and apparently is derived from the GTF2I sequence. The comparison of GTF2I and GTF2IRD2 genes revealed two distinct regions of homology, indicating that the helix–loop–helix domain structure of the GTF2IRD2 gene has been generated by two independent genomic duplications. We speculate that GTF2I is derived from GTF2IRD1 as a result of local duplication and the further evolution of its structure was associated with its functional specialization. Comparison of genomic sequences surrounding GTF2IRD2 genes in mice and humans allows refinement of the centromeric breakpoint position of the primate-specific inversion within the Williams–Beuren syndrome critical region. PMID:15243160
GTF2IRD2 is located in the Williams-Beuren syndrome critical region 7q11.23 and encodes a protein with two TFII-I-like helix-loop-helix repeats.

PubMed

Makeyev, Aleksandr V; Erdenechimeg, Lkhamsuren; Mungunsukh, Ognoon; Roth, Jutta J; Enkhmandakh, Badam; Ruddle, Frank H; Bayarsaihan, Dashzeveg

2004-07-27

Williams-Beuren syndrome (also known as Williams syndrome) is caused by a deletion of a 1.55- to 1.84-megabase region from chromosome band 7q11.23. GTF2IRD1 and GTF2I, located within this critical region, encode proteins of the TFII-I family with multiple helix-loop-helix domains known as I repeats. In the present work, we characterize a third member, GTF2IRD2, which has sequence and structural similarity to the GTF2I and GTF2IRD1 paralogs. The ORF encodes a protein with several features characteristic of regulatory factors, including two I repeats, two leucine zippers, and a single Cys-2/His-2 zinc finger. The genomic organization of human, baboon, rat, and mouse genes is well conserved. Our exon-by-exon comparison has revealed that GTF2IRD2 is more closely related to GTF2I than to GTF2IRD1 and apparently is derived from the GTF2I sequence. The comparison of GTF2I and GTF2IRD2 genes revealed two distinct regions of homology, indicating that the helix-loop-helix domain structure of the GTF2IRD2 gene has been generated by two independent genomic duplications. We speculate that GTF2I is derived from GTF2IRD1 as a result of local duplication and the further evolution of its structure was associated with its functional specialization. Comparison of genomic sequences surrounding GTF2IRD2 genes in mice and humans allows refinement of the centromeric breakpoint position of the primate-specific inversion within the Williams-Beuren syndrome critical region.
Effects of "D"-Amphetamine and Ethanol on Variable and Repetitive Key-Peck Sequences in Pigeons

ERIC Educational Resources Information Center

Ward, Ryan D.; Bailey, Ericka M.; Odum, Amy L.

2006-01-01

This experiment assessed the effects of "d"-Amphetamine and ethanol on reinforced variable and repetitive key-peck sequences in pigeons. Pigeons responded on two keys under a multiple schedule of Repeat and Vary components. In the Repeat component, completion of a target sequence of right, right, left, left resulted in food. In the Vary component,…
The mitochondrial genome of the legume Vigna radiata and the analysis of recombination across short mitochondrial repeats.

PubMed

Alverson, Andrew J; Zhuo, Shi; Rice, Danny W; Sloan, Daniel B; Palmer, Jeffrey D

2011-01-20

The mitochondrial genomes of seed plants are exceptionally fluid in size, structure, and sequence content, with the accumulation and activity of repetitive sequences underlying much of this variation. We report the first fully sequenced mitochondrial genome of a legume, Vigna radiata (mung bean), and show that despite its unexceptional size (401,262 nt), the genome is unusually depauperate in repetitive DNA and "promiscuous" sequences from the chloroplast and nuclear genomes. Although Vigna lacks the large, recombinationally active repeats typical of most other seed plants, a PCR survey of its modest repertoire of short (38-297 nt) repeats nevertheless revealed evidence for recombination across all of them. A set of novel control assays showed, however, that these results could instead reflect, in part or entirely, artifacts of PCR-mediated recombination. Consequently, we recommend that other methods, especially high-depth genome sequencing, be used instead of PCR to infer patterns of plant mitochondrial recombination. The average-sized but repeat- and feature-poor mitochondrial genome of Vigna makes it ever more difficult to generalize about the factors shaping the size and sequence content of plant mitochondrial genomes.
Comparative Genomics and Phylogenomics of East Asian Tulips (Amana, Liliaceae)

PubMed Central

Li, Pan; Lu, Rui-Sen; Xu, Wu-Qin; Ohi-Toma, Tetsuo; Cai, Min-Qi; Qiu, Ying-Xiong; Cameron, Kenneth M.; Fu, Cheng-Xin

2017-01-01

The genus Amana Honda (Liliaceae), when it is treated as separate from Tulipa, comprises six perennial herbaceous species that are restricted to China, Japan and the Korean Peninsula. Although all six Amana species have important medicinal and horticultural uses, studies focused on species identification and molecular phylogenetics are few. Here we report the nucleotide sequences of six complete Amana chloroplast (cp) genomes. The cp genomes of Amana range from 150,613 bp to 151,136 bp in length, all including a pair of inverted repeats (25,629–25,859 bp) separated by the large single-copy (81,482–82,218 bp) and small single-copy (17,366–17,465 bp) regions. Each cp genome equivalently contains 112 unique genes consisting of 30 transfer RNA genes, four ribosomal RNA genes, and 78 protein coding genes. Gene content, gene order, AT content, and IR/SC boundary structure are nearly identical among all Amana cp genomes. However, the relative contraction and expansion of the IR/SC borders among the six Amana cp genomes results in length variation among them. Simple sequence repeat (SSR) analyses of these Amana cp genomes indicate that the richest SSRs are A/T mononucleotides. The number of repeats among the six Amana species varies from 54 (A. anhuiensis) to 69 (Amana kuocangshanica) with palindromic (28–35) and forward repeats (23–30) as the most common types. Phylogenomic analyses based on these complete cp genomes and 74 common protein-coding genes strongly support the monophyly of the genus, and a sister relationship between Amana and Erythronium, rather than a shared common ancestor with Tulipa. Nine DNA markers (rps15–ycf1, accD–psaI, petA–psbJ, rpl32–trnL, atpH–atpI, petD–rpoA, trnS–trnG, psbM–trnD, and ycf4–cemA) with number of variable sites greater than 0.9% were identified, and these may be useful for future population genetic and phylogeographic studies of Amana species. PMID:28421090
Learning of goal-relevant and -irrelevant complex visual sequences in human V1.

PubMed

Rosenthal, Clive R; Mallik, Indira; Caballero-Gaudes, Cesar; Sereno, Martin I; Soto, David

2018-06-12

Learning and memory are supported by a network involving the medial temporal lobe and linked neocortical regions. Emerging evidence indicates that primary visual cortex (i.e., V1) may contribute to recognition memory, but this has been tested only with a single visuospatial sequence as the target memorandum. The present study used functional magnetic resonance imaging to investigate whether human V1 can support the learning of multiple, concurrent complex visual sequences involving discontinous (second-order) associations. Two peripheral, goal-irrelevant but structured sequences of orientated gratings appeared simultaneously in fixed locations of the right and left visual fields alongside a central, goal-relevant sequence that was in the focus of spatial attention. Pseudorandom sequences were introduced at multiple intervals during the presentation of the three structured visual sequences to provide an online measure of sequence-specific knowledge at each retinotopic location. We found that a network involving the precuneus and V1 was involved in learning the structured sequence presented at central fixation, whereas right V1 was modulated by repeated exposure to the concurrent structured sequence presented in the left visual field. The same result was not found in left V1. These results indicate for the first time that human V1 can support the learning of multiple concurrent sequences involving complex discontinuous inter-item associations, even peripheral sequences that are goal-irrelevant. Copyright © 2018. Published by Elsevier Inc.

Highly Informative Simple Sequence Repeat (SSR) Markers for Fingerprinting Hazelnut

USDA-ARS?s Scientific Manuscript database

Simple sequence repeat (SSR) or microsatellite markers have many applications in breeding and genetic studies of plants, including fingerprinting of cultivars and investigations of genetic diversity, and therefore provide information for better management of germplasm collections. They are repeatab...
Distribution and sequence homogeneity of an abundant satellite DNA in the beetle, Tenebrio molitor.

PubMed Central

Davis, C A; Wyatt, G R

1989-01-01

The mealworm beetle, Tenebrio molitor, contains an unusually abundant and homogeneous satellite DNA which constitutes up to 60% of its genome. The satellite DNA is shown to be present in all of the chromosomes by in situ hybridization. 18 dimers of the repeat unit were cloned and sequenced. The consensus sequence is 142 nt long and lacks any internal repeat structure. Monomers of the sequence are very similar, showing on average a 2% divergence from the calculated consensus. Variant nucleotides are scattered randomly throughout the sequence although some variants are more common than others. Neighboring repeat units are no more alike than randomly chosen ones. The results suggest that some mechanism, perhaps gene conversion, is acting to maintain the homogeneity of the satellite DNA despite its abundance and distribution on all of the chromosomes. Images PMID:2762148
Amino acid sequence analysis of the annexin super-gene family of proteins.

PubMed

Barton, G J; Newman, R H; Freemont, P S; Crumpton, M J

1991-06-15

The annexins are a widespread family of calcium-dependent membrane-binding proteins. No common function has been identified for the family and, until recently, no crystallographic data existed for an annexin. In this paper we draw together 22 available annexin sequences consisting of 88 similar repeat units, and apply the techniques of multiple sequence alignment, pattern matching, secondary structure prediction and conservation analysis to the characterisation of the molecules. The analysis clearly shows that the repeats cluster into four distinct families and that greatest variation occurs within the repeat 3 units. Multiple alignment of the 88 repeats shows amino acids with conserved physicochemical properties at 22 positions, with only Gly at position 23 being absolutely conserved in all repeats. Secondary structure prediction techniques identify five conserved helices in each repeat unit and patterns of conserved hydrophobic amino acids are consistent with one face of a helix packing against the protein core in predicted helices a, c, d, e. Helix b is generally hydrophobic in all repeats, but contains a striking pattern of repeat-specific residue conservation at position 31, with Arg in repeats 4 and Glu in repeats 2, but unconserved amino acids in repeats 1 and 3. This suggests repeats 2 and 4 may interact via a buried saltbridge. The loop between predicted helices a and b of repeat 3 shows features distinct from the equivalent loop in repeats 1, 2 and 4, suggesting an important structural and/or functional role for this region. No compelling evidence emerges from this study for uteroglobin and the annexins sharing similar tertiary structures, or for uteroglobin representing a derivative of a primordial one-repeat structure that underwent duplication to give the present day annexins. The analyses performed in this paper are re-evaluated in the Appendix, in the light of the recently published X-ray structure for human annexin V. The structure confirms most of the predictions and shows the power of techniques for the determination of tertiary structural information from the amino acid sequences of an aligned protein family.
Exception to the Rule: Genomic Characterization of Naturally Occurring Unusual Vibrio cholerae Strains with a Single Chromosome

DOE PAGES

Xie, Gary; Johnson, Shannon Lyn; Davenport, Karen Walston; ...

2017-08-29

Here, the genetic make-up of most bacteria is encoded in a single chromosome while about 10% have more than one chromosome. Among these, Vibrio cholerae, with two chromosomes, has served as a model system to study various aspects of chromosome maintenance, mainly replication, and faithful partitioning of multipartite genomes. Here, we describe the genomic characterization of strains that are an exception to the two chromosome rules: naturally occurring single-chromosome V. cholerae. Whole genome sequence analyses of NSCV1 and NSCV2 (natural single-chromosome vibrio) revealed that the Chr1 and Chr2 fusion junctions contain prophages, IS elements, and direct repeats, in addition tomore » large-scale chromosomal rearrangements such as inversions, insertions, and long tandem repeats elsewhere in the chromosome compared to prototypical two chromosome V. cholerae genomes. Many of the known cholera virulence factors are absent. The two origins of replication and associated genes are generally intact with synonymous mutations in some genes, as arerecAand mismatch repair (MMR) genes dam, mutH, and mutL; MutS function is probably impaired in NSCV2. These strains are ideal tools for studying mechanistic aspects of maintenance of chromosomes with multiple origins and other rearrangements and the biological, functional, and evolutionary significance of multipartite genome architecture in general.« less
Exception to the Rule: Genomic Characterization of Naturally Occurring Unusual Vibrio cholerae Strains with a Single Chromosome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xie, Gary; Johnson, Shannon Lyn; Davenport, Karen Walston

Here, the genetic make-up of most bacteria is encoded in a single chromosome while about 10% have more than one chromosome. Among these, Vibrio cholerae, with two chromosomes, has served as a model system to study various aspects of chromosome maintenance, mainly replication, and faithful partitioning of multipartite genomes. Here, we describe the genomic characterization of strains that are an exception to the two chromosome rules: naturally occurring single-chromosome V. cholerae. Whole genome sequence analyses of NSCV1 and NSCV2 (natural single-chromosome vibrio) revealed that the Chr1 and Chr2 fusion junctions contain prophages, IS elements, and direct repeats, in addition tomore » large-scale chromosomal rearrangements such as inversions, insertions, and long tandem repeats elsewhere in the chromosome compared to prototypical two chromosome V. cholerae genomes. Many of the known cholera virulence factors are absent. The two origins of replication and associated genes are generally intact with synonymous mutations in some genes, as arerecAand mismatch repair (MMR) genes dam, mutH, and mutL; MutS function is probably impaired in NSCV2. These strains are ideal tools for studying mechanistic aspects of maintenance of chromosomes with multiple origins and other rearrangements and the biological, functional, and evolutionary significance of multipartite genome architecture in general.« less
Discovery of Escherichia coli CRISPR sequences in an undergraduate laboratory.

PubMed

Militello, Kevin T; Lazatin, Justine C

2017-05-01

Clustered regularly interspaced short palindromic repeats (CRISPRs) represent a novel type of adaptive immune system found in eubacteria and archaebacteria. CRISPRs have recently generated a lot of attention due to their unique ability to catalog foreign nucleic acids, their ability to destroy foreign nucleic acids in a mechanism that shares some similarity to RNA interference, and the ability to utilize reconstituted CRISPR systems for genome editing in numerous organisms. In order to introduce CRISPR biology into an undergraduate upper-level laboratory, a five-week set of exercises was designed to allow students to examine the CRISPR status of uncharacterized Escherichia coli strains and to allow the discovery of new repeats and spacers. Students started the project by isolating genomic DNA from E. coli and amplifying the iap CRISPR locus using the polymerase chain reaction (PCR). The PCR products were analyzed by Sanger DNA sequencing, and the sequences were examined for the presence of CRISPR repeat sequences. The regions between the repeats, the spacers, were extracted and analyzed with BLASTN searches. Overall, CRISPR loci were sequenced from several previously uncharacterized E. coli strains and one E. coli K-12 strain. Sanger DNA sequencing resulted in the discovery of 36 spacer sequences and their corresponding surrounding repeat sequences. Five of the spacers were homologous to foreign (non-E. coli) DNA. Assessment of the laboratory indicates that improvements were made in the ability of students to answer questions relating to the structure and function of CRISPRs. Future directions of the laboratory are presented and discussed. © 2016 by The International Union of Biochemistry and Molecular Biology, 45(3):262-269, 2017. © 2016 The International Union of Biochemistry and Molecular Biology.
Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor.

PubMed

Kohany, Oleksiy; Gentles, Andrew J; Hankus, Lukasz; Jurka, Jerzy

2006-10-25

Repbase is a reference database of eukaryotic repetitive DNA, which includes prototypic sequences of repeats and basic information described in annotations. Updating and maintenance of the database requires specialized tools, which we have created and made available for use with Repbase, and which may be useful as a template for other curated databases. We describe the software tools RepbaseSubmitter and Censor, which are designed to facilitate updating and screening the content of Repbase. RepbaseSubmitter is a java-based interface for formatting and annotating Repbase entries. It eliminates many common formatting errors, and automates actions such as calculation of sequence lengths and composition, thus facilitating curation of Repbase sequences. In addition, it has several features for predicting protein coding regions in sequences; searching and including Pubmed references in Repbase entries; and searching the NCBI taxonomy database for correct inclusion of species information and taxonomic position. Censor is a tool to rapidly identify repetitive elements by comparison to known repeats. It uses WU-BLAST for speed and sensitivity, and can conduct DNA-DNA, DNA-protein, or translated DNA-translated DNA searches of genomic sequence. Defragmented output includes a map of repeats present in the query sequence, with the options to report masked query sequence(s), repeat sequences found in the query, and alignments. Censor and RepbaseSubmitter are available as both web-based services and downloadable versions. They can be found at http://www.girinst.org/repbase/submission.html (RepbaseSubmitter) and http://www.girinst.org/censor/index.php (Censor).
Centromere location in Arabidopsis is unaltered by extreme divergence in CENH3 protein sequence

PubMed Central

2017-01-01

During cell division, spindle fibers attach to chromosomes at centromeres. The DNA sequence at regional centromeres is fast evolving with no conserved genetic signature for centromere identity. Instead CENH3, a centromere-specific histone H3 variant, is the epigenetic signature that specifies centromere location across both plant and animal kingdoms. Paradoxically, CENH3 is also adaptively evolving. An ongoing question is whether CENH3 evolution is driven by a functional relationship with the underlying DNA sequence. Here, we demonstrate that despite extensive protein sequence divergence, CENH3 histones from distant species assemble centromeres on the same underlying DNA sequence. We first characterized the organization and diversity of centromere repeats in wild-type Arabidopsis thaliana. We show that A. thaliana CENH3-containing nucleosomes exhibit a strong preference for a unique subset of centromeric repeats. These sequences are largely missing from the genome assemblies and represent the youngest and most homogeneous class of repeats. Next, we tested the evolutionary specificity of this interaction in a background in which the native A. thaliana CENH3 is replaced with CENH3s from distant species. Strikingly, we find that CENH3 from Lepidium oleraceum and Zea mays, although specifying epigenetically weaker centromeres that result in genome elimination upon outcrossing, show a binding pattern on A. thaliana centromere repeats that is indistinguishable from the native CENH3. Our results demonstrate positional stability of a highly diverged CENH3 on independently evolved repeats, suggesting that the sequence specificity of centromeres is determined by a mechanism independent of CENH3. PMID:28223399
Repeating aftershocks of the great 2004 Sumatra and 2005 Nias earthquakes

NASA Astrophysics Data System (ADS)

Yu, Wen-che; Song, Teh-Ru Alex; Silver, Paul G.

2013-05-01

We investigate repeating aftershocks associated with the great 2004 Sumatra-Andaman (Mw 9.2) and 2005 Nias-Simeulue (Mw 8.6) earthquakes by cross-correlating waveforms recorded by the regional seismographic station PSI and teleseismic stations. We identify 10 and 18 correlated aftershock sequences associated with the great 2004 Sumatra and 2005 Nias earthquakes, respectively. The majority of the correlated aftershock sequences are located near the down-dip end of a large afterslip patch. We determine the precise relative locations of event pairs among these sequences and estimate the source rupture areas. The correlated event pairs identified are appropriately referred to as repeating aftershocks, in that the source rupture areas are comparable and significantly overlap within a sequence. We use the repeating aftershocks to estimate afterslip based on the slip-seismic moment scaling relationship and to infer the temporal decay rate of the recurrence interval. The estimated afterslip resembles that measured from the near-field geodetic data to the first order. The decay rate of repeating aftershocks as a function of lapse time t follows a power-law decay 1/tp with the exponent p in the range 0.8-1.1. Both types of observations indicate that repeating aftershocks are governed by post-seismic afterslip.
Genome-Wide Stochastic Adaptive DNA Amplification at Direct and Inverted DNA Repeats in the Parasite Leishmania

PubMed Central

Plourde, Marie; Gingras, Hélène; Roy, Gaétan; Lapointe, Andréanne; Leprohon, Philippe; Papadopoulou, Barbara; Corbeil, Jacques; Ouellette, Marc

2014-01-01

Gene amplification of specific loci has been described in all kingdoms of life. In the protozoan parasite Leishmania, the product of amplification is usually part of extrachromosomal circular or linear amplicons that are formed at the level of direct or inverted repeated sequences. A bioinformatics screen revealed that repeated sequences are widely distributed in the Leishmania genome and the repeats are chromosome-specific, conserved among species, and generally present in low copy number. Using sensitive PCR assays, we provide evidence that the Leishmania genome is continuously being rearranged at the level of these repeated sequences, which serve as a functional platform for constitutive and stochastic amplification (and deletion) of genomic segments in the population. This process is adaptive as the copy number of advantageous extrachromosomal circular or linear elements increases upon selective pressure and is reversible when selection is removed. We also provide mechanistic insights on the formation of circular and linear amplicons through RAD51 recombinase-dependent and -independent mechanisms, respectively. The whole genome of Leishmania is thus stochastically rearranged at the level of repeated sequences, and the selection of parasite subpopulations with changes in the copy number of specific loci is used as a strategy to respond to a changing environment. PMID:24844805
Population-scale whole genome sequencing identifies 271 highly polymorphic short tandem repeats from Japanese population.

PubMed

Hirata, Satoshi; Kojima, Kaname; Misawa, Kazuharu; Gervais, Olivier; Kawai, Yosuke; Nagasaki, Masao

2018-05-01

Forensic DNA typing is widely used to identify missing persons and plays a central role in forensic profiling. DNA typing usually uses capillary electrophoresis fragment analysis of PCR amplification products to detect the length of short tandem repeat (STR) markers. Here, we analyzed whole genome data from 1,070 Japanese individuals generated using massively parallel short-read sequencing of 162 paired-end bases. We have analyzed 843,473 STR loci with two to six basepair repeat units and cataloged highly polymorphic STR loci in the Japanese population. To evaluate the performance of the cataloged STR loci, we compared 23 STR loci, widely used in forensic DNA typing, with capillary electrophoresis based STR genotyping results in the Japanese population. Seventeen loci had high correlations and high call rates. The other six loci had low call rates or low correlations due to either the limitations of short-read sequencing technology, the bioinformatics tool used, or the complexity of repeat patterns. With these analyses, we have also purified the suitable 218 STR loci with four basepair repeat units and 53 loci with five basepair repeat units both for short read sequencing and PCR based technologies, which would be candidates to the actual forensic DNA typing in Japanese population.
Target sites for the transposition of rat long interspersed repeated DNA elements (LINEs) are not random.

PubMed Central

Furano, A V; Somerville, C C; Tsichlis, P N; D'Ambrosio, E

1986-01-01

The long interspersed repeated DNA family of rats (LINE or L1Rn family) contains about 40,000 6.7-kilobase (kb) long members (1). LINE members may be currently mobile since their presence or absence causes allelic variation at three single copy loci (2, 3): insulin 1, Moloney leukemia virus integration 2 (Mlvi-2) (4), and immunoglobulin heavy chain (Igh). To characterize target sites for LINE insertion, we compared the DNA sequences of the unoccupied Mlvi-2 target site, its LINE-containing allele, and several other LINE-containing sites. Although not homologous overall, the target sites share three characteristics: First, depending on the site, they are from 68% to 86% (A+T) compared to 58% (A+T) for total rat DNA (5). Depending on the site, a 7- to 15-bp target site sequence becomes duplicated and flanks the inserted LINE member. The second is a version (0 or 1 mismatch) of the hexanucleotide, TACTCA, which is also present in the LINE member, in a highly conserved region located just before the A-rich right end of the LINE member. The third is a stretch of alternating purine/pyrimidine (PQ). The A-rich right ends of different LINE members vary in length and composition, and the sequence of a particularly long one suggests that it contains the A-rich target site from a previous transposition. PMID:3012480
Using long ssDNA polynucleotides to amplify STRs loci in degraded DNA samples

PubMed Central

Pérez Santángelo, Agustín; Corti Bielsa, Rodrigo M.; Sala, Andrea; Ginart, Santiago; Corach, Daniel

2017-01-01

Obtaining informative short tandem repeat (STR) profiles from degraded DNA samples is a challenging task usually undermined by locus or allele dropouts and peak-high imbalances observed in capillary electrophoresis (CE) electropherograms, especially for those markers with large amplicon sizes. We hereby show that the current STR assays may be greatly improved for the detection of genetic markers in degraded DNA samples by using long single stranded DNA polynucleotides (ssDNA polynucleotides) as surrogates for PCR primers. These long primers allow a closer annealing to the repeat sequences, thereby reducing the length of the template required for the amplification in fragmented DNA samples, while at the same time rendering amplicons of larger sizes suitable for multiplex assays. We also demonstrate that the annealing of long ssDNA polynucleotides does not need to be fully complementary in the 5’ region of the primers, thus allowing for the design of practically any long primer sequence for developing new multiplex assays. Furthermore, genotyping of intact DNA samples could also benefit from utilizing long primers since their close annealing to the target STR sequences may overcome wrong profiling generated by insertions/deletions present between the STR region and the annealing site of the primers. Additionally, long ssDNA polynucleotides might be utilized in multiplex PCR assays for other types of degraded or fragmented DNA, e.g. circulating, cell-free DNA (ccfDNA). PMID:29099837
White spot syndrome virus (WSSV) genome stability maintained over six passages through three different penaeid shrimp species.

PubMed

Sindhupriya, M; Saravanan, P; Otta, S K; Amarnath, C Bala; Arulraj, R; Bhuvaneswari, T; Praveena, P Ezhil; Jithendran, K P; Ponniah, A G

2014-08-21

White spot syndrome virus (WSSV) replicates rapidly, can be extremely pathogenic and is a common cause of mass mortality in cultured shrimp. Variable number tandem repeat (VNTR) sequences present in the open reading frame (ORF)94, ORF125 and ORF75 regions of the WSSV genome have been used widely as genetic markers in epidemiological studies. However, reports that VNTRs might evolve rapidly following even a single transmission through penaeid shrimp or other crustacean hosts have created confusion as to how VNTR data is interpreted. To examine VNTR stability again, 2 WSSV strains (PmTN4RU and LvAP11RU) with differing ORF94 tandem repeat numbers and slight differences in apparent virulence were passaged sequentially 6 times through black tiger shrimp Penaeus monodon, Indian white shrimp Feneropenaeus indicus or Pacific white leg shrimp Litopenaeus vannamei. PCR analyses to genotype the ORF94, ORF125 and ORF75 VNTRs did not identify any differences from either of the 2 parental WSSV strains after multiple passages through any of the shrimp species. These data were confirmed by sequence analysis and indicate that the stability of the genome regions containing these VNTRs is quite high at least for the WSSV strains, hosts and number of passages examined and that the VNTR sequences thus represent useful genetic markers for studying WSSV epidemiology.
Analysis of resistance genes of clinical Pannonibacter phragmitetus strain 31801 by complete genome sequencing.

PubMed

Ming, De-Song; Chen, Qing-Qing; Chen, Xiao-Tin

2018-05-14

To clarify the resistance mechanisms of Pannonibacter phragmitetus 31801, isolated from the blood of a liver abscess patient, at the genomic level, we performed whole genomic sequencing using a PacBio RS II single-molecule real-time long-read sequencer. Bioinformatic analysis of the resulting sequence was then carried out to identify any possible resistance genes. Analyses included Basic Local Alignment Search Tool searches against the Antibiotic Resistance Genes Database, ResFinder analysis of the genome sequence, and Resistance Gene Identifier analysis within the Comprehensive Antibiotic Resistance Database. Prophages, clustered regularly interspaced short palindromic repeats (CRISPR), and other putative virulence factors were also identified using PHAST, CRISPRfinder, and the Virulence Factors Database, respectively. The circular chromosome and single plasmid of P. phragmitetus 31801 contained multiple antibiotic resistance genes, including those coding for three different types of β-lactamase [NPS β-lactamase (EC 3.5.2.6), β-lactamase class C, and a metal-dependent hydrolase of β-lactamase superfamily I]. In addition, genes coding for subunits of several multidrug-resistance efflux pumps were identified, including those targeting macrolides (adeJ, cmeB), tetracycline (acrB, adeAB), fluoroquinolones (acrF, ceoB), and aminoglycosides (acrD, amrB, ceoB, mexY, smeB). However, apart from the tripartite macrolide efflux pump macAB-tolC, the genome did not appear to contain the complete complement of subunit genes required for production of most of the major multidrug-resistance efflux pumps.
Effective DNA Inhibitors of Cathepsin G by In Vitro Selection

PubMed Central

Gatto, Barbara; Vianini, Elena; Lucatello, Lorena; Sissi, Claudia; Moltrasio, Danilo; Pescador, Rodolfo; Porta, Roberto; Palumbo, Manlio

2008-01-01

Cathepsin G (CatG) is a chymotrypsin-like protease released upon degranulation of neutrophils. In several inflammatory and ischaemic diseases the impaired balance between CatG and its physiological inhibitors leads to tissue destruction and platelet aggregation. Inhibitors of CatG are suitable for the treatment of inflammatory diseases and procoagulant conditions. DNA released upon the death of neutrophils at injury sites binds CatG. Moreover, short DNA fragments are more inhibitory than genomic DNA. Defibrotide, a single stranded polydeoxyribonucleotide with antithrombotic effect is also a potent CatG inhibitor. Given the above experimental evidences we employed a selection protocol to assess whether DNA inhibition of CatG may be ascribed to specific sequences present in defibrotide DNA. A Selex protocol was applied to identify the single-stranded DNA sequences exhibiting the highest affinity for CatG, the diversity of a combinatorial pool of oligodeoxyribonucleotides being a good representation of the complexity found in defibrotide. Biophysical and biochemical studies confirmed that the selected sequences bind tightly to the target enzyme and also efficiently inhibit its catalytic activity. Sequence analysis carried out to unveil a motif responsible for CatG recognition showed a recurrence of alternating TG repeats in the selected CatG binders, adopting an extended conformation that grants maximal interaction with the highly charged protein surface. This unprecedented finding is validated by our results showing high affinity and inhibition of CatG by specific DNA sequences of variable length designed to maximally reduce pairing/folding interactions. PMID:19325843
A palindrome-mediated mechanism distinguishes translocations involving LCR-B of chromosome 22q11.2.

PubMed

Gotter, Anthony L; Shaikh, Tamim H; Budarf, Marcia L; Rhodes, C Harker; Emanuel, Beverly S

2004-01-01

Two known recurrent constitutional translocations, t(11;22) and t(17;22), as well as a non-recurrent t(4;22), display derivative chromosomes that have joined to a common site within the low copy repeat B (LCR-B) region of 22q11.2. This breakpoint is located between two AT-rich inverted repeats that form a nearly perfect palindrome. Breakpoints within the 11q23, 17q11 and 4q35 partner chromosomes also fall near the center of palindromic sequences. In the present work the breakpoints of a fourth translocation involving LCR-B, a balanced ependymoma-associated t(1;22), were characterized not only to localize this junction relative to known genes, but also to further understand the mechanism underlying these rearrangements. FISH mapping was used to localize the 22q11.2 breakpoint to LCR-B and the 1p21 breakpoint to single BAC clones. STS mapping narrowed the 1p21.2 breakpoint to a 1990 bp AT-rich region, and junction fragments were amplified by nested PCR. Junction fragment-derived sequence indicates that the 1p21.2 breakpoint splits a 278 nt palindrome capable of forming stem-loop secondary structure. In contrast, the 1p21.2 reference genomic sequence from clones in the database does not exhibit this configuration, suggesting a predisposition for regional genomic instability perhaps etiologic for this rearrangement. Given its similarity to known chromosomal fragile site (FRA) sequences, this polymorphic 1p21.2 sequence may represent one of the FRA1 loci. Comparative analysis of the secondary structure of sequences surrounding translocation breakpoints that involve LCR-B with those not involving this region indicate a unique ability of the former to form stem-loop structures. The relative likelihood of forming these configurations appears to be related to the rate of translocation occurrence. Further analysis suggests that constitutional translocations in general occur between sequences of similar melting temperature and propensity for secondary structure.
A palindrome-mediated mechanism distinguishes translocations involving LCR-B of chromosome 22q11.2

PubMed Central

Gotter, Anthony L.; Shaikh, Tamim H.; Budarf, Marcia L.; Rhodes, C. Harker; Emanuel, Beverly S.

2010-01-01

Two known recurrent constitutional translocations, t(11;22) and t(17;22), as well as a non-recurrent t(4;22), display derivative chromosomes that have joined to a common site within the low copy repeat B (LCR-B) region of 22q11.2. This breakpoint is located between two AT-rich inverted repeats that form a nearly perfect palindrome. Breakpoints within the 11q23, 17q11 and 4q35 partner chromosomes also fall near the center of palindromic sequences. In the present work the breakpoints of a fourth translocation involving LCR-B, a balanced ependymoma-associated t(1;22), were characterized not only to localize this junction relative to known genes, but also to further understand the mechanism underlying these rearrangements. FISH mapping was used to localize the 22q11.2 breakpoint to LCR-B and the 1p21 breakpoint to single BAC clones. STS mapping narrowed the 1p21.2 breakpoint to a 1990 bp AT-rich region, and junction fragments were amplified by nested PCR. Junction fragment-derived sequence indicates that the 1p21.2 breakpoint splits a 278 nt palindrome capable of forming stem–loop secondary structure. In contrast, the 1p21.2 reference genomic sequence from clones in the database does not exhibit this configuration, suggesting a predisposition for regional genomic instability perhaps etiologic for this rearrangement. Given its similarity to known chromosomal fragile site (FRA) sequences, this polymorphic 1p21.2 sequence may represent one of the FRA1 loci. Comparative analysis of the secondary structure of sequences surrounding translocation breakpoints that involve LCR-B with those not involving this region indicate a unique ability of the former to form stem–loop structures. The relative likelihood of forming these configurations appears to be related to the rate of translocation occurrence. Further analysis suggests that constitutional translocations in general occur between sequences of similar melting temperature and propensity for secondary structure. PMID:14613967
Implications of the plastid genome sequence of typha (typhaceae, poales) for understanding genome evolution in poaceae.

PubMed

Guisinger, Mary M; Chumley, Timothy W; Kuehl, Jennifer V; Boore, Jeffrey L; Jansen, Robert K

2010-02-01

Plastid genomes of the grasses (Poaceae) are unusual in their organization and rates of sequence evolution. There has been a recent surge in the availability of grass plastid genome sequences, but a comprehensive comparative analysis of genome evolution has not been performed that includes any related families in the Poales. We report on the plastid genome of Typha latifolia, the first non-grass Poales sequenced to date, and we present comparisons of genome organization and sequence evolution within Poales. Our results confirm that grass plastid genomes exhibit acceleration in both genomic rearrangements and nucleotide substitutions. Poaceae have multiple structural rearrangements, including three inversions, three genes losses (accD, ycf1, ycf2), intron losses in two genes (clpP, rpoC1), and expansion of the inverted repeat (IR) into both large and small single-copy regions. These rearrangements are restricted to the Poaceae, and IR expansion into the small single-copy region correlates with the phylogeny of the family. Comparisons of 73 protein-coding genes for 47 angiosperms including nine Poaceae genera confirm that the branch leading to Poaceae has significantly accelerated rates of change relative to other monocots and angiosperms. Furthermore, rates of sequence evolution within grasses are lower, indicating a deceleration during diversification of the family. Overall there is a strong correlation between accelerated rates of genomic rearrangements and nucleotide substitutions in Poaceae, a phenomenon that has been noted recently throughout angiosperms. The cause of the correlation is unknown, but faulty DNA repair has been suggested in other systems including bacterial and animal mitochondrial genomes.
Comparative molecular cytogenetics of major repetitive sequence families of three Dendrobium species (Orchidaceae) from Bangladesh

PubMed Central

Begum, Rabeya; Alam, Sheikh Shamimul; Menzel, Gerhard; Schmidt, Thomas

2009-01-01

Background and Aims Dendrobium species show tremendous morphological diversity and have broad geographical distribution. As repetitive sequence analysis is a useful tool to investigate the evolution of chromosomes and genomes, the aim of the present study was the characterization of repetitive sequences from Dendrobium moschatum for comparative molecular and cytogenetic studies in the related species Dendrobium aphyllum, Dendrobium aggregatum and representatives from other orchid genera. Methods In order to isolate highly repetitive sequences, a c0t-1 DNA plasmid library was established. Repeats were sequenced and used as probes for Southern hybridization. Sequence divergence was analysed using bioinformatic tools. Repetitive sequences were localized along orchid chromosomes by fluorescence in situ hybridization (FISH). Key Results Characterization of the c0t-1 library resulted in the detection of repetitive sequences including the (GA)n dinucleotide DmoO11, numerous Arabidopsis-like telomeric repeats and the highly amplified dispersed repeat DmoF14. The DmoF14 repeat is conserved in six Dendrobium species but diversified in representative species of three other orchid genera. FISH analyses showed the genome-wide distribution of DmoF14 in D. moschatum, D. aphyllum and D. aggregatum. Hybridization with the telomeric repeats demonstrated Arabidopsis-like telomeres at the chromosome ends of Dendrobium species. However, FISH using the telomeric probe revealed two pairs of chromosomes with strong intercalary signals in D. aphyllum. FISH showed the terminal position of 5S and 18S–5·8S–25S rRNA genes and a characteristic number of rDNA sites in the three Dendrobium species. Conclusions The repeated sequences isolated from D. moschatum c0t-1 DNA constitute major DNA families of the D. moschatum, D. aphyllum and D. aggregatum genomes with DmoF14 representing an ancient component of orchid genomes. Large intercalary telomere-like arrays suggest chromosomal rearrangements in D. aphyllum while the number and localization of rRNA genes as well as the species-specific distribution pattern of an abundant microsatellite reflect the genomic diversity of the three Dendrobium species. PMID:19635741

Some links on this page may take you to non-federal websites. Their policies may differ from this site.