Science.gov

Sample records for highly conserved sequences

  1. Highly conserved repetitive DNA sequences are present at human centromeres.

    PubMed Central

    Grady, D L; Ratliff, R L; Robinson, D L; McCanlies, E C; Meyne, J; Moyzis, R K

    1992-01-01

    Highly conserved repetitive DNA sequence clones, largely consisting of (GGAAT)n repeats, have been isolated from a human recombinant repetitive DNA library by high-stringency hybridization with rodent repetitive DNA. This sequence, the predominant repetitive sequence in human satellites II and III, is similar to the essential core DNA of the Saccharomyces cerevisiae centromere, centromere DNA element (CDE) III. In situ hybridization to human telophase and Drosophila polytene chromosomes shows localization of the (GGAAT)n sequence to centromeric regions. Hyperchromicity studies indicate that the (GGAAT)n sequence exhibits unusual hydrogen bonding properties. The purine-rich strand alone has the same thermal stability as the duplex. Hyperchromicity studies of synthetic DNA variants indicate that all sequences with the composition (AATGN)n exhibit this unusual thermal stability. DNA-mobility-shift assays indicate that specific HeLa-cell nuclear proteins recognize this sequence with a relative affinity greater than 10^5. The extreme evolutionary conservation of this DNA sequence, its centromeric location, its unusual hydrogen bonding properties, its high affinity for specific nuclear proteins, and its similarity to functional centromeres isolated from yeast suggest that this sequence may be a component of the functional human centromere. PMID:1542662
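
    As an illustration of what a (GGAAT)n tract looks like computationally, the following minimal sketch scans a sequence for runs of tandem GGAAT copies. It is not the hybridization-based method of the record above; the function name `find_tandem_repeats`, the `min_copies` threshold and the toy sequence are assumptions made for this example.

```python
import re

def find_tandem_repeats(seq, unit="GGAAT", min_copies=3):
    """Return (start, end, copies) for every run of at least
    `min_copies` back-to-back copies of `unit` in `seq`.
    Coordinates are 0-based, end-exclusive."""
    # (?:UNIT){n,} matches n or more consecutive copies of the unit.
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(unit), min_copies))
    hits = []
    for m in pattern.finditer(seq.upper()):
        copies = (m.end() - m.start()) // len(unit)
        hits.append((m.start(), m.end(), copies))
    return hits

# Toy sequence (hypothetical): one run of 4 copies, one run of 2 copies.
toy = "ACGT" + "GGAAT" * 4 + "TTT" + "GGAAT" * 2
print(find_tandem_repeats(toy))  # [(4, 24, 4)]
```

    Only the first run is reported because the second has fewer copies than the chosen threshold.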

  2. Evolutionary growth process of highly conserved sequences in vertebrate genomes.

    PubMed

    Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

    2012-08-01

    Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bp. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfectly conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions led to the discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary history, each having been frozen since a different evolutionary era along the vertebrate lineage. Copyright © 2012 Elsevier B.V. All rights reserved.
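
    The criterion quoted above (100% identity over 100 bp) can be sketched as a search for maximal exact matches shared by two sequences. The code below is a simple seed-and-extend illustration, not the pipeline used in the study; the function name `shared_exact_blocks` and the toy inputs are assumptions, and a real genome-scale search would need a far more memory-efficient index.

```python
def shared_exact_blocks(a, b, min_len=100):
    """Return (start_a, start_b, length) for every maximal stretch of
    at least `min_len` bases that is identical in `a` and `b`."""
    # Index every min_len-mer of `a` by its start position.
    index = {}
    for i in range(len(a) - min_len + 1):
        index.setdefault(a[i:i + min_len], []).append(i)

    blocks = []
    for j in range(len(b) - min_len + 1):
        for i in index.get(b[j:j + min_len], []):
            # Skip seeds that can be extended to the left: the same
            # block was already reported from its leftmost seed.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = min_len
            # Extend to the right while the bases keep matching.
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            blocks.append((i, j, length))
    return blocks

# Toy example with a lowered threshold (8 bp) so it fits on a line.
a = "TTTT" + "ACGTACGGTCA" + "CCCC"
b = "GG" + "ACGTACGGTCA" + "AA"
print(shared_exact_blocks(a, b, min_len=8))  # [(4, 2, 11)]
```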

  3. High sequence conservation among cucumber mosaic virus isolates from lily.

    PubMed

    Chen, Y K; Derks, A F; Langeveld, S; Goldbach, R; Prins, M

    2001-08-01

    For classification of Cucumber mosaic virus (CMV) isolates from ornamental crops of different geographical areas, these were characterized by comparing the nucleotide sequences of RNAs 4 and the encoded coat proteins. Within the ornamental-infecting CMV viruses both subgroups were represented. CMV isolates of Alstroemeria and crocus were classified as subgroup II isolates, whereas 8 other isolates, from lily, gladiolus, amaranthus, larkspur, and lisianthus, were identified as subgroup I members. In general, nucleotide sequence comparisons correlated well with geographic distribution, with one notable exception: the analyzed nucleotide sequences of 5 lily isolates showed remarkably high homology despite different origins.

  4. Highly conserved d-loop sequences in woolly mouse opossums Marmosa (Micoureus).

    PubMed

    Rocha, Rita Gomes; Leite, Yuri Luiz Reis; Ferreira, Eduardo; Justino, Juliana; Costa, Leonora Pires

    2012-04-01

    This study reports the occurrence of highly conserved d-loop sequences in the mitochondrial genome of the woolly mouse opossum genus Marmosa subgenus Micoureus (Mammalia, Didelphimorphia, Didelphidae). Sixty-six sequences of Marmosa (Micoureus) demerarae, Marmosa (Micoureus) constantiae, and Marmosa (Micoureus) paraguayanus were amplified using universal d-loop primers and virtually no genetic differences were detected within and among species. These sequences matched the control region of the mitochondrial marsupial genome. Analyses of qualitative aspects of these sequences revealed that their structural composition is very similar to the d-loop region of other didelphid species. However, the total lack of variability has not been reported from other closely related species. The data analyzed here support the occurrence of highly conserved d-loop sequences, and we found no support for the hypothesis that these sequences are d-loop-like nuclear pseudogenes. Furthermore, the control and flanking regions obtained with different primers corroborate the lack of variability of the d-loop sequences in the mitochondrial genome of Marmosa (Micoureus).

  5. High-Throughput Sequencing, Characterization and Detection of New and Conserved Cucumber miRNAs

    PubMed Central

    Martínez, Germán; Forment, Javier; Llave, Cesar; Pallás, Vicente; Gómez, Gustavo

    2011-01-01

    MicroRNAs (miRNAs) are a class of endogenous small non-coding RNAs involved in the post-transcriptional regulation of gene expression. In plants, a great number of conserved and specific miRNAs, mainly arising from model species, have been identified to date. However, less is known about the diversity of these regulatory RNAs in plant species of agricultural and/or horticultural importance. Here we report a combined approach of bioinformatics prediction, high-throughput sequencing data and molecular methods to analyze miRNA populations in cucumber (Cucumis sativus) plants. A set of 19 conserved and 6 known but non-conserved miRNA families were found in our cucumber small RNA dataset. We also identified 7 previously undescribed miRNAs (3 with their miRNA* strand), candidates to be cucumber-specific. To validate their description, these new C. sativus miRNAs were detected by northern blot hybridization. Additionally, potential targets for most conserved and new miRNAs were identified in the cucumber genome. In summary, in this study we have identified, for the first time, conserved, known non-conserved and new miRNAs arising from an agronomically important species such as C. sativus. The detection of this complex population of regulatory small RNAs suggests that, similar to what is observed in other plant species, cucumber miRNAs may play an important role in diverse biological and metabolic processes. PMID:21603611

  6. Highly conserved D-loop-like nuclear mitochondrial sequences (Numts) in tiger (Panthera tigris).

    PubMed

    Zhang, Wenping; Zhang, Zhihe; Shen, Fujun; Hou, Rong; Lv, Xiaoping; Yue, Bisong

    2006-08-01

    Using oligonucleotide primers designed to match hypervariable segments I (HVS-1) of Panthera tigris mitochondrial DNA (mtDNA), we amplified two different PCR products (500 bp and 287 bp) in the tiger (Panthera tigris), but got only one PCR product (287 bp) in the leopard (Panthera pardus). Sequence analyses indicated that the sequence of 287 bp was a D-loop-like nuclear mitochondrial sequence (Numts), indicating a nuclear transfer that occurred approximately 4.8-17 million years ago in the tiger and 4.6-16 million years ago in the leopard. Although the mtDNA D-loop sequence has a rapid rate of evolution, the 287-bp Numts are highly conserved; they are nearly identical in tiger subspecies and only 1.742% different between tiger and leopard. Thus, such sequences represent molecular 'fossils' that can shed light on evolution of the mitochondrial genome and may be the most appropriate outgroup for phylogenetic analysis. This is also proved by comparing the phylogenetic trees reconstructed using the D-loop sequence of snow leopard and the 287-bp Numts as outgroup.
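
    The divergence figure above is a percent difference between aligned sequences. The sketch below shows one common way to compute such a value (mismatches over compared, ungapped positions); the function name `percent_difference` is an assumption, and the record does not state exactly how its 1.742% was derived. As a plausibility check only, 5 mismatches over 287 positions would give the same rounded value.

```python
def percent_difference(x, y):
    """Percent of aligned, ungapped positions at which x and y differ."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap character.
    compared = [(p, q) for p, q in zip(x.upper(), y.upper())
                if p != "-" and q != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    mismatches = sum(p != q for p, q in compared)
    return 100.0 * mismatches / len(compared)

# Toy alignment (hypothetical): 1 mismatch in 10 positions.
print(percent_difference("ACGTACGTAC", "ACGTACGTAT"))  # 10.0

# Arithmetic check under the stated assumption of 5 mismatches in 287 bp.
print(round(100.0 * 5 / 287, 3))  # 1.742
```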

  7. A highly conserved repeated chromosomal sequence in the radioresistant bacterium Deinococcus radiodurans SARK

    SciTech Connect

    Lennon, E.; Gutman, P.D.; Hanlong Yao; Minton, K.W.

    1991-03-01

    A DNA fragment containing a portion of a DNA damage-inducible gene from Deinococcus radiodurans SARK hybridized to numerous fragments of SARK genomic DNA because of a highly conserved repetitive chromosomal element. The element is of variable length, ranging from 150 to 192 bp, depending on the absence or presence of one or two 21-bp sequences located internally. A putative translational start site of the damage-inducible gene is within the reiterated element. The element contains dyad symmetries that suggest modes of transcriptional and/or translational control.

  8. Identification of the Conserved and Novel miRNAs in Mulberry by High-Throughput Sequencing

    PubMed Central

    Jia, Ling; Zhang, Dayan; Qi, Xiwu; Ma, Bi; Xiang, Zhonghuia; He, Ningjia

    2014-01-01

    miRNAs are a class of non-coding endogenous small RNAs. They play vital roles in plant growth, development, and response to biotic and abiotic stress by negatively regulating genes. Mulberry trees are economically important species with multiple uses. However, to date, little is known about mulberry miRNAs and their target genes. In the present study, three small mulberry RNA libraries were constructed and sequenced using high-throughput sequencing technology. Results showed 85 conserved miRNAs belonging to 31 miRNA families and 262 novel miRNAs at 371 loci. Quantitative real-time PCR (qRT-PCR) analysis confirmed the expression pattern of 9 conserved and 5 novel miRNAs in leaves, bark, and male flowers. A total of 332 potential target genes were predicted to be associated with these 113 novel miRNAs. These results provide a basis for further understanding of mulberry miRNAs and the biological processes in which they are involved. PMID:25118991

  9. Silencing Effect of Hominoid Highly Conserved Noncoding Sequences on Embryonic Brain Development

    PubMed Central

    Mahmoudi Saber, Morteza

    2017-01-01

    Superfamily Hominoidea, which consists of Hominidae (humans and great apes) and Hylobatidae (gibbons), is well-known for sharing human-like characteristics; however, the genomic origins of these shared unique phenotypes have largely remained elusive. To decipher the underlying genomic basis of Hominoidea-restricted phenotypes, we identified and characterized Hominoidea-restricted highly conserved noncoding sequences (HCNSs) that are a class of potential regulatory elements which may be involved in evolution of lineage-specific phenotypes. We discovered 679 such HCNSs from human, chimpanzee, gorilla, orangutan and gibbon genomes. These HCNSs were demonstrated to be under purifying selection but with lineage-restricted characteristics different from old CNSs. A significant proportion of their ancestral sequences had accelerated rates of nucleotide substitutions, insertions and deletions during the evolution of the common ancestor of Hominoidea, suggesting the intervention of positive Darwinian selection for creating those HCNSs. In contrast to enhancer elements and similar to silencer sequences, these Hominoidea-restricted HCNSs are located in close proximity to transcription start sites. Their target genes are enriched in the nervous system, development and transcription, and they tend to be remotely located from the nearest coding gene. ChIP-seq signals and gene expression patterns suggest that Hominoidea-restricted HCNSs are likely to be functional regulatory elements by imposing silencing effects on their target genes in a tissue-restricted manner during fetal brain development. These HCNSs, which emerged through adaptive evolution and were conserved through purifying selection, represent a set of promising targets for future functional studies of the evolution of Hominoidea-restricted phenotypes. PMID:28633494

  10. High Throughput Sequencing of T Cell Antigen Receptors Reveals a Conserved TCR Repertoire.

    PubMed

    Hou, Xianliang; Lu, Chong; Chen, Sisi; Xie, Qian; Cui, Guangying; Chen, Jianing; Chen, Zhi; Wu, Zhongwen; Ding, Yulong; Ye, Ping; Dai, Yong; Diao, Hongyan

    2016-03-01

    The T-cell receptor (TCR) repertoire is a mirror of the human immune system that reflects processes caused by infections, cancer, autoimmunity, and aging. Next-generation sequencing has become a powerful tool for deep TCR profiling. Herein, we used this technology to study the repertoire features of TCR beta chain in the blood of healthy individuals. Peripheral blood samples were collected from 10 healthy donors. T cells were isolated with anti-human CD3 magnetic beads according to the manufacturer's protocol. We then combined multiplex-PCR, Illumina sequencing, and IMGT/High V-QUEST to analyze the characteristics and polymorphisms of the TCR. Most of the individual T cell clones were present at very low frequencies, suggesting that they had not undergone clonal expansion. The usage frequencies of the TCR beta variable, beta joining, and beta diversity gene segments were similar among T cells from different individuals. Notably, the usage frequency of individual nucleotides and amino acids within complementarity-determining region (CDR3) intervals was remarkably consistent between individuals. Moreover, our data show that terminal deoxynucleotidyl transferase activity was biased toward the insertion of G (31.92%) and C (27.14%) over A (21.82%) and T (19.12%) nucleotides. Some conserved features could be observed in the composition of CDR3, which may inform future studies of human TCR gene recombination.
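
    Nucleotide usage percentages like those reported for the inserted bases can be tallied with a simple counter. This is a minimal sketch assuming the inserted (N-region) nucleotides have already been extracted as plain strings; the function name `base_usage` and the toy input are hypothetical, and the extraction itself (done with IMGT/High V-QUEST in the study) is not reproduced here.

```python
from collections import Counter

def base_usage(insertions):
    """Return the percentage of A, C, G and T across all given
    insertion strings. Characters other than A/C/G/T are ignored."""
    counts = Counter()
    for s in insertions:
        counts.update(ch for ch in s.upper() if ch in "ACGT")
    total = sum(counts.values())
    return {base: (100.0 * counts[base] / total if total else 0.0)
            for base in "ACGT"}

# Toy input (hypothetical): four inserted bases in total.
print(base_usage(["GGC", "A"]))
# {'A': 25.0, 'C': 25.0, 'G': 50.0, 'T': 0.0}
```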

  11. High Throughput Sequencing of T Cell Antigen Receptors Reveals a Conserved TCR Repertoire

    PubMed Central

    Hou, Xianliang; Lu, Chong; Chen, Sisi; Xie, Qian; Cui, Guangying; Chen, Jianing; Chen, Zhi; Wu, Zhongwen; Ding, Yulong; Ye, Ping; Dai, Yong; Diao, Hongyan

    2016-01-01

    The T-cell receptor (TCR) repertoire is a mirror of the human immune system that reflects processes caused by infections, cancer, autoimmunity, and aging. Next-generation sequencing has become a powerful tool for deep TCR profiling. Herein, we used this technology to study the repertoire features of TCR beta chain in the blood of healthy individuals. Peripheral blood samples were collected from 10 healthy donors. T cells were isolated with anti-human CD3 magnetic beads according to the manufacturer's protocol. We then combined multiplex-PCR, Illumina sequencing, and IMGT/High V-QUEST to analyze the characteristics and polymorphisms of the TCR. Most of the individual T cell clones were present at very low frequencies, suggesting that they had not undergone clonal expansion. The usage frequencies of the TCR beta variable, beta joining, and beta diversity gene segments were similar among T cells from different individuals. Notably, the usage frequency of individual nucleotides and amino acids within complementarity-determining region (CDR3) intervals was remarkably consistent between individuals. Moreover, our data show that terminal deoxynucleotidyl transferase activity was biased toward the insertion of G (31.92%) and C (27.14%) over A (21.82%) and T (19.12%) nucleotides. Some conserved features could be observed in the composition of CDR3, which may inform future studies of human TCR gene recombination. PMID:26962778

  12. Characterization and comparative analyses of zebrafish intelectins: highly conserved sequences, diversified structures and functions.

    PubMed

    Lin, Bin; Cao, Zhen; Su, Peng; Zhang, Haibo; Li, Mengzhen; Lin, Yiqun; Zhao, Dezhi; Shen, Yang; Jing, Chenfeng; Chen, Shangwu; Xu, Anlong

    2009-03-01

    The intelectin family, also called the X-lectin family, is a newly discovered gene family involved in development and innate immunity. However, no research has been carried out on this gene family in the model organism zebrafish. Here we present the first characterization of seven zebrafish intelectins (zINTLs) and the first systematic comparative analysis of intelectins from various species in order to provide some clues to the function and evolution of this gene family. We examined the expression patterns of zINTLs in various developmental stages, normal adults, and Aeromonas salmonicida infected adults. Results showed that zINTL1-3 were highly expressed in one or several adult tissues. zINTL4-7, however, were expressed at quite low levels both in adults and various developmental stages, and some of them showed relaxation of functional constraints as revealed by K(a)/K(s) calculation. Of the seven zINTLs, zINTL3 was expressed predominantly in the liver and highly up-regulated upon infection, suggesting its important roles in immunity. Based on the characterization of zebrafish intelectins, we then conducted a systematic survey of intelectin members in various species and made comparative analyses. We found that the intelectin family may be a deuterostome-specific gene family, and that their expression patterns, quaternary structures and glycosylations vary considerably among various species, though their sequences are highly conserved. Moreover, these varied features have evolved multiple times independently in different species, resulting in species-specific protein structures and expression patterns.

  13. Characterization of an Unusually Conserved AluI Highly Reiterated DNA Sequence Family from the Honeybee, Apis mellifera

    PubMed Central

    Tares, S.; Cornuet, J. M.; Abad, P.

    1993-01-01

    An AluI family of highly reiterated nontranscribed sequences has been found in the genome of the honeybee Apis mellifera. This repeated sequence is shown to be present at approximately 23,000 copies per haploid genome constituting about 2% of the total genomic DNA. The nucleotide sequence of 10 monomers was determined. The consensus sequence is 176 nucleotides long and has an A + T content of 58%. There are clusters of both direct and inverted repeats. Internal subrepeating units ranging from 11 to 17 nucleotides are observed, suggesting that it could have evolved from a shorter sequence. DNA sequence data reveal that this repeat class is unusually homogeneous compared to the other class of invertebrate highly reiterated DNA sequences. The average pairwise sequence divergence between the repeats is 2.5%. In spite of this unusual homogeneity, divergence has been found in the repeated sequence hybridization ladder between four different honeybee subspecies. Therefore, the AluI highly reiterated sequences provide a new probe for fingerprinting in A. m. mellifera. PMID:8104160
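
    An average pairwise sequence divergence such as the 2.5% quoted above is typically the mean of the pairwise mismatch fractions over all pairs of aligned repeats. The sketch below illustrates that calculation on toy data; the function names and the gap handling are assumptions, and the record does not specify how its own figure was computed.

```python
from itertools import combinations

def p_distance(x, y):
    """Fraction of aligned, ungapped positions at which x and y differ."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(p != q for p, q in pairs) / len(pairs)

def mean_pairwise_divergence(aligned):
    """Mean p-distance over all unordered pairs of aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    dists = [p_distance(x, y) for x, y in combinations(aligned, 2)]
    return sum(dists) / len(dists)

# Toy monomers (hypothetical): distances are 0.25, 0.0 and 0.25.
monomers = ["ACGT", "ACGA", "ACGT"]
print(mean_pairwise_divergence(monomers))
```

    With 10 monomers, as in the record, the mean would be taken over 45 pairs.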

  14. Low molecular weight serine protease inhibitors from insects are proteins with highly conserved sequences.

    PubMed

    Boigegrain, R A; Pugnière, M; Paroutaud, P; Castro, B; Brehélin, M

    2000-02-01

    A low molecular weight protease inhibitor peptide found in ovaries of the desert locust Schistocerca gregaria (SGPI-2), was purified from plasma of the same locust and sequenced. It was named SGCI. It was found active towards chymotrypsin and human leukocyte elastase. SGCI was synthesized using a solid-phase procedure and the sequence of its reactive site for chymotrypsin was determined. Compared with an inhibitor purified earlier from another locust species, the total sequence of SGCI showed 88% identity. In particular, the sequence of the reactive site of these inhibitors was identical. Our search for a closely related peptide in an insect species far removed from locusts, the lepidopteran Spodoptera littoralis, was unfruitful but a different chymotrypsin inhibitor, belonging to the Kazal family, was found whose mass is greater than that of SGCI (20 vs 3.6 kDa). Its N-terminal sequence shares 80% identity with that of a chymotrypsin inhibitor purified earlier from the haemolymph of another lepidopteran. Conservation of the amino acid sequence in the reactive site seems to be an exception among protease inhibitors.

  15. Evolutionarily conserved sequences on human chromosome 21

    SciTech Connect

    Frazer, Kelly A.; Sheehan, John B.; Stokowski, Renee P.; Chen, Xiyin; Hosseini, Roya; Cheng, Jan-Fang; Fodor, Stephen P.A.; Cox, David R.; Patil, Nila

    2001-09-01

    Comparison of human sequences with the DNA of other mammals is an excellent means of identifying functional elements in the human genome. Here we describe the utility of high-density oligonucleotide arrays as a rapid approach for comparing human sequences with the DNA of multiple species whose sequences are not presently available. High-density arrays representing approximately 22.5 Mb of nonrepetitive human chromosome 21 sequence were synthesized and then hybridized with mouse and dog DNA to identify sequences conserved between humans and mice (human-mouse elements) and between humans and dogs (human-dog elements). Our data show that sequence comparison of multiple species provides a powerful empiric method for identifying actively conserved elements in the human genome. A large fraction of these evolutionarily conserved elements are present in regions on chromosome 21 that do not encode known genes.

  16. Touchdown digital polymerase chain reaction for quantification of highly conserved sequences in the HIV-1 genome.

    PubMed

    De Spiegelaere, Ward; Malatinkova, Eva; Kiselinova, Maja; Bonczkowski, Pawel; Verhofstede, Chris; Vogelaers, Dirk; Vandekerckhove, Linos

    2013-08-15

    Digital polymerase chain reaction (PCR) is an emerging absolute quantification method based on the limiting dilution principle and end-point PCR. This methodology provides high flexibility in assay design without influencing quantitative accuracy. This article describes an assay to quantify HIV DNA that targets a highly conserved region of the HIV-1 genome that hampers optimal probe design. To maintain high specificity and allow probe binding and hydrolysis of a probe with low melting temperature, a two-stage touchdown PCR was designed with a first round of amplification at high temperature and a subsequent round at low temperature to allow accumulation of fluorescence.

  17. High-throughput genomic sequencing of cassava bacterial blight strains identifies conserved effectors to target for durable resistance.

    PubMed

    Bart, Rebecca; Cohn, Megan; Kassen, Andrew; McCallum, Emily J; Shybut, Mikel; Petriello, Annalise; Krasileva, Ksenia; Dahlbeck, Douglas; Medina, Cesar; Alicai, Titus; Kumar, Lava; Moreira, Leandro M; Rodrigues Neto, Júlio; Verdier, Valerie; Santana, María Angélica; Kositcharoenkul, Nuttima; Vanderschuren, Hervé; Gruissem, Wilhelm; Bernal, Adriana; Staskawicz, Brian J

    2012-07-10

    Cassava bacterial blight (CBB), incited by Xanthomonas axonopodis pv. manihotis (Xam), is the most important bacterial disease of cassava, a staple food source for millions of people in developing countries. Here we present a widely applicable strategy for elucidating the virulence components of a pathogen population. We report Illumina-based draft genomes for 65 Xam strains and deduce the phylogenetic relatedness of Xam across the areas where cassava is grown. Using an extensive database of effector proteins from animal and plant pathogens, we identify the effector repertoire for each sequenced strain and use a comparative sequence analysis to deduce the least polymorphic of the conserved effectors. These highly conserved effectors have been maintained over 11 countries, three continents, and 70 y of evolution and as such represent ideal targets for developing resistance strategies.

  18. Automatic identification of highly conserved family regions and relationships in genome wide datasets including remote protein sequences.

    PubMed

    Doğan, Tunca; Karaçalı, Bilge

    2013-01-01

    Identifying shared sequence segments along amino acid sequences generally requires a collection of closely related proteins, most often curated manually from the sequence datasets to suit the purpose at hand. Currently developed statistical methods are strained, however, when the collection contains remote sequences with poor alignment to the rest, or sequences containing multiple domains. In this paper, we propose a completely unsupervised and automated method to identify the shared sequence segments observed in a diverse collection of protein sequences including those present in a smaller fraction of the sequences in the collection, using a combination of sequence alignment, residue conservation scoring and graph-theoretical approaches. Since shared sequence fragments often imply conserved functional or structural attributes, the method produces a table of associations between the sequences and the identified conserved regions that can reveal previously unknown protein families as well as new members to existing ones. We evaluated the biological relevance of the method by clustering the proteins in gold standard datasets and assessing the clustering performance in comparison with previous methods from the literature. We have then applied the proposed method to a genome wide dataset of 17793 human proteins and generated a global association map to each of the 4753 identified conserved regions. Investigations on the major conserved regions revealed that they corresponded strongly to annotated structural domains. This suggests that the method can be useful in predicting novel domains on protein sequences.
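
    Residue conservation scoring, one ingredient named above, can be illustrated with a per-column entropy score over an existing alignment. This is a generic sketch, not the scoring scheme of the paper (which the abstract does not specify); the entropy-based score, the `threshold` and `min_len` parameters and both function names are assumptions chosen for the example.

```python
import math
from collections import Counter

def column_conservation(alignment):
    """Score each alignment column as 1 - H / log2(20), where H is the
    Shannon entropy of the residues in that column (gaps ignored).
    1.0 means a fully conserved column."""
    scores = []
    for c in range(len(alignment[0])):
        column = [s[c] for s in alignment if s[c] != "-"]
        if not column:
            scores.append(0.0)  # all-gap column: treat as unconserved
            continue
        total = len(column)
        entropy = -sum((k / total) * math.log2(k / total)
                       for k in Counter(column).values())
        scores.append(1.0 - entropy / math.log2(20))
    return scores

def conserved_regions(scores, threshold=0.8, min_len=3):
    """Return (start, end) half-open intervals where at least `min_len`
    consecutive columns score >= `threshold`."""
    regions, start = [], None
    # A sentinel below any threshold closes a region at the last column.
    for i, s in enumerate(scores + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions

# Toy alignment (hypothetical): the first three columns are invariant.
aln = ["MKVLA", "MKVIA", "MKVLG"]
print(conserved_regions(column_conservation(aln)))  # [(0, 3)]
```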

  19. Sequence Fingerprints of MicroRNA Conservation

    PubMed Central

    Shi, Bing; Gao, Wei; Wang, Juan

    2012-01-01

    It is known that the conservation of protein-coding genes is associated with their sequences in various species, such as animals and plants. However, the association between microRNA (miRNA) conservation and their sequences in various species remains unexplored. Here we report the association of miRNA conservation with its sequence features, such as base content and cleavage sites, suggesting that miRNA sequences contain the fingerprints for miRNA conservation. More interestingly, different species show different and even opposite patterns between miRNA conservation and sequence features. For example, mammalian miRNAs show a positive/negative correlation between conservation and AU/GC content, whereas plant miRNAs show a negative/positive correlation between conservation and AU/GC content. Further analysis puts forward the hypothesis that the introns of protein-coding genes may be a main driving force for the origin and evolution of mammalian miRNAs. At the 5′ end, conserved miRNAs have a preference for base U, while less-conserved miRNAs have a preference for a non-U base in mammals. This difference does not exist in insects and plants, in which both conserved miRNAs and less-conserved miRNAs have a preference for base U at the 5′ end. We further revealed that the non-U preference at the 5′ end of less-conserved mammalian miRNAs is associated with miRNA function diversity, which may have evolved from the pressure of a highly sophisticated environmental stimulus the mammals encountered during evolution. These results indicated that miRNA sequences contain the fingerprints for conservation, and these fingerprints vary according to species. More importantly, the results suggest that although species share common mechanisms by which miRNAs originate and evolve, mammals may develop a novel mechanism for miRNA origin and evolution. In addition, the fingerprint found in this study can be a predictor of miRNA conservation, and the findings are helpful in achieving a

  20. High-throughput sequencing discovery of conserved and novel microRNAs in Chinese cabbage (Brassica rapa L. ssp. pekinensis).

    PubMed

    Wang, Fengde; Li, Libin; Liu, Lifeng; Li, Huayin; Zhang, Yihui; Yao, Yingyin; Ni, Zhongfu; Gao, Jianwei

    2012-07-01

    MicroRNAs (miRNAs) are a class of 21-24 nucleotide non-coding RNAs that down-regulate gene expression by cleaving or inhibiting the translation of target gene transcripts. miRNAs have been extensively analyzed in a few model plant species such as Arabidopsis, rice and Populus, and partially investigated in other non-model plant species. However, only a few conserved miRNAs have been identified in Chinese cabbage, a common and economically important crop in Asia. To identify novel and conserved miRNAs in Chinese cabbage (Brassica rapa L. ssp. pekinensis), we constructed a small RNA library. Using high-throughput Solexa sequencing to identify microRNAs, we found 11,210 unique sequences belonging to 321 conserved miRNA families and 228 novel miRNAs. We ran a Blast search with these sequences against the Chinese cabbage mRNA database and found 2,308 and 736 potential target genes for 221 conserved and 125 novel miRNAs, respectively. The BlastX search against the Arabidopsis genome and GO analysis suggested most of the targets were involved in plant growth, metabolism, development and stress response. This study provides the first large-scale cloning and characterization of Chinese cabbage miRNAs and their potential targets. These miRNAs add to the growing database of new miRNAs, prompt further study on Chinese cabbage miRNA regulation mechanisms, and help toward a greater understanding of the important roles of miRNAs in Chinese cabbage.

  1. Characterization of the dead ringer gene identifies a novel, highly conserved family of sequence-specific DNA-binding proteins.

    PubMed Central

    Gregory, S L; Kortschak, R D; Kalionis, B; Saint, R

    1996-01-01

    We report the identification of a new family of DNA-binding proteins from our characterization of the dead ringer (dri) gene of Drosophila melanogaster. We show that dri encodes a nuclear protein that contains a sequence-specific DNA-binding domain that bears no similarity to known DNA-binding domains. A number of proteins were found to contain sequences homologous to this domain. Other proteins containing the conserved motif include yeast SWI1, two human retinoblastoma binding proteins, and other mammalian regulatory proteins. A mouse B-cell-specific regulator exhibits 75% identity with DRI over the 137-amino-acid DNA-binding domains of these proteins, indicating a high degree of conservation of this domain. Gel retardation and optimal binding site screens revealed that the in vitro sequence specificity of DRI is strikingly similar to that of many homeodomain proteins, although the sequence and predicted secondary structure do not resemble a homeodomain. The early general expression of dri and the similarity of DRI and homeodomain in vitro DNA-binding specificity compound the problem of understanding the in vivo specificity of action of these proteins. Maternally derived dri product is found throughout the embryo until germ band extension, when dri is expressed in a developmentally regulated set of tissues, including salivary gland ducts, parts of the gut, and a subset of neural cells. The discovery of this new, conserved DNA-binding domain offers an explanation for the regulatory activity of several important members of this class and predicts significant regulatory roles for the others. PMID:8622680

  2. High Sequence Conservation of Human Immunodeficiency Virus Type 1 Reverse Transcriptase under Drug Pressure despite the Continuous Appearance of Mutations

    PubMed Central

    Ceccherini-Silberstein, Francesca; Gago, Federico; Santoro, Maria; Gori, Caterina; Svicher, Valentina; Rodríguez-Barrios, Fátima; d'Arrigo, Roberta; Ciccozzi, Massimo; Bertoli, Ada; Monforte, Antonella d'Arminio; Balzarini, Jan; Antinori, Andrea; Perno, Carlo-Federico

    2005-01-01

    To define the extent of sequence conservation in human immunodeficiency virus type 1 (HIV-1) reverse transcriptase (RT) in vivo, the first 320 amino acids of RT obtained from 2,236 plasma-derived samples from a well-defined cohort of 1,704 HIV-1-infected individuals (457 drug naïve and 1,247 drug treated) were analyzed and examined in structural terms. In naïve patients, 233 out of these 320 residues (73%) were conserved (<1% variability). The majority of invariant amino acids clustered into defined regions comprising between 5 and 29 consecutive residues. Of the nine longest invariant regions identified, some contained residues and domains critical for enzyme stability and function. In patients treated with RT inhibitors, despite profound drug pressure and the appearance of mutations primarily associated with resistance, 202 amino acids (63%) remained highly conserved and appeared mostly distributed in regions of variable length. This finding suggests that participation of consecutive residues in structural domains is strictly required for cooperative functions and sustainability of HIV-1 RT activity. Besides confirming the conservation of amino acids that are already known to be important for catalytic activity, stability of the heterodimer interface, and/or primer/template binding, the other 62 new invariable residues are now identified and mapped onto the three-dimensional structure of the enzyme. This new knowledge could be of help in the structure-based design of novel resistance-evading drugs. PMID:16051864
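
    The percentages in the record above follow from simple arithmetic (233/320 is about 73%, 202/320 about 63%). As an illustrative sketch only, the Python snippet below shows one way a per-position conservation tally could be computed from aligned sequences. The definition of variability used here (the fraction of sequences that differ from the most common residue at a position), the function names, and the toy alignment are assumptions made for illustration; they are not taken from the study.

```python
from collections import Counter


def column_variability(column):
    """Fraction of residues in one alignment column that differ from
    the most common residue (an assumed definition of variability)."""
    counts = Counter(column)
    most_common_count = counts.most_common(1)[0][1]
    return 1.0 - most_common_count / len(column)


def conserved_positions(aligned_seqs, threshold=0.01):
    """Return 0-based indices of columns whose variability is below threshold."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [
        i for i in range(length)
        if column_variability([seq[i] for seq in aligned_seqs]) < threshold
    ]


# Toy alignment (hypothetical data): 200 sequences of 4 residues each.
# Column 1 varies in 5/200 sequences (2.5%), column 3 in 1/200 (0.5%).
toy = ["MKVL"] * 194 + ["MRVL"] * 5 + ["MKVI"] * 1
print(conserved_positions(toy))

# Percentages reported in the abstract, recomputed from the stated counts.
print(round(100 * 233 / 320), round(100 * 202 / 320))
```

    With a 1% threshold, only the column that varies in 2.5% of the toy sequences is excluded; a different variability definition (for example, one based on entropy) would need a different threshold.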

  3. Identification and characterization of novel and conserved microRNAs in radish (Raphanus sativus L.) using high-throughput sequencing.

    PubMed

    Xu, Liang; Wang, Yan; Xu, Yuanyuan; Wang, Liangju; Zhai, Lulu; Zhu, Xianwen; Gong, Yiqin; Ye, Shan; Liu, Liwang

    2013-03-01

    MicroRNAs (miRNAs) are endogenous, non-coding, small RNAs that play significant regulatory roles in plant growth, development, and biotic and abiotic stress responses. To date, a great number of conserved and species-specific miRNAs have been identified in many important plant species such as Arabidopsis, rice and poplar. However, little is known about identification of miRNAs and their target genes in radish (Raphanus sativus L.). In the present study, a small RNA library from radish root was constructed and sequenced using the high-throughput Solexa sequencing. Through sequence alignment and secondary structure prediction, a total of 545 conserved miRNA families as well as 15 novel (with their miRNA* strand) and 64 potentially novel miRNAs were identified. Quantitative real-time PCR (qRT-PCR) analysis confirmed that both conserved and novel miRNAs were expressed in radish, and some of them were preferentially expressed in certain tissues. A total of 196 potential target genes were predicted for 42 novel radish miRNAs. Gene ontology (GO) analysis showed that most of the targets were involved in plant growth, development, metabolism and stress responses. This study represents a first large-scale identification and characterization of radish miRNAs and their potential target genes. These results could lead to the further identification of radish miRNAs and enhance our understanding of radish miRNA regulatory mechanisms in diverse biological and metabolic processes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  4. Highly conserved influenza A virus epitope sequences as candidates of H3N2 flu vaccine targets.

    PubMed

    Wu, Ko-Wen; Chien, Chih-Yi; Li, Shiao-Wen; King, Chwan-Chuen; Chang, Chuan-Hsiung

    2012-08-01

    This study focused on identifying the conserved epitopes in a single subtype, A (H3N2), as candidates for vaccine targets. We identified a total of 32 conserved epitopes in four viral proteins [22 HA, 4 PB1, 3 NA, 3 NP]. Evaluation of conserved epitopes in coverage during 1968-2010 revealed that (1) 12 HA conserved epitopes were highly present in the circulating viruses; (2) the remaining 10 HA conserved epitopes appeared with lower percentage but a significantly increasing trend after 1989 [p<0.001]; and (3) the conserved epitopes in NA, NP and PB1 are also highly frequent in wild-type viruses. These conserved epitopes also covered an extremely high percentage of the 16 vaccine strains during the 42 year period. The identification of highly conserved epitopes using our approach can also be applied to develop broad-spectrum vaccines. Copyright © 2012 Elsevier Inc. All rights reserved.

  5. Comparative Mitogenomics of the Genus Odontobutis (Perciformes: Gobioidei: Odontobutidae) Revealed Conserved Gene Rearrangement and High Sequence Variations.

    PubMed

    Ma, Zhihong; Yang, Xuefen; Bercsenyi, Miklos; Wu, Junjie; Yu, Yongyao; Wei, Kaijian; Fan, Qixue; Yang, Ruibin

    2015-10-20

    To understand the molecular evolution of mitochondrial genomes (mitogenomes) in the genus Odontobutis, the mitogenome of Odontobutis yaluensis was sequenced and compared with those of another four Odontobutis species. Our results displayed similar mitogenome features among species in genome organization, base composition, codon usage, and gene rearrangement. The identical gene rearrangement of trnS-trnL-trnH tRNA cluster observed in mitogenomes of these five closely related freshwater sleepers suggests that this unique gene order is conserved within Odontobutis. Additionally, the present gene order and the positions of associated intergenic spacers of these Odontobutis mitogenomes indicate that this unusual gene rearrangement results from tandem duplication and random loss of large-scale gene regions. Moreover, these mitogenomes exhibit a high level of sequence variation, mainly due to the differences of corresponding intergenic sequences in gene rearrangement regions and the heterogeneity of tandem repeats in the control regions. Phylogenetic analyses support Odontobutis species with shared gene rearrangement forming a monophyletic group, and the interspecific phylogenetic relationships are associated with structural differences among their mitogenomes. The present study contributes to understanding the evolutionary patterns of Odontobutidae species.

  6. Comparative Mitogenomics of the Genus Odontobutis (Perciformes: Gobioidei: Odontobutidae) Revealed Conserved Gene Rearrangement and High Sequence Variations

    PubMed Central

    Ma, Zhihong; Yang, Xuefen; Bercsenyi, Miklos; Wu, Junjie; Yu, Yongyao; Wei, Kaijian; Fan, Qixue; Yang, Ruibin

    2015-01-01

    To understand the molecular evolution of mitochondrial genomes (mitogenomes) in the genus Odontobutis, the mitogenome of Odontobutis yaluensis was sequenced and compared with those of another four Odontobutis species. Our results displayed similar mitogenome features among species in genome organization, base composition, codon usage, and gene rearrangement. The identical gene rearrangement of trnS-trnL-trnH tRNA cluster observed in mitogenomes of these five closely related freshwater sleepers suggests that this unique gene order is conserved within Odontobutis. Additionally, the present gene order and the positions of associated intergenic spacers of these Odontobutis mitogenomes indicate that this unusual gene rearrangement results from tandem duplication and random loss of large-scale gene regions. Moreover, these mitogenomes exhibit a high level of sequence variation, mainly due to the differences of corresponding intergenic sequences in gene rearrangement regions and the heterogeneity of tandem repeats in the control regions. Phylogenetic analyses support Odontobutis species with shared gene rearrangement forming a monophyletic group, and the interspecific phylogenetic relationships are associated with structural differences among their mitogenomes. The present study contributes to understanding the evolutionary patterns of Odontobutidae species. PMID:26492246

  7. The Chinese hamster Alu-equivalent sequence: a conserved highly repetitious, interspersed deoxyribonucleic acid sequence in mammals has a structure suggestive of a transposable element.

    PubMed Central

    Haynes, S R; Toomey, T P; Leinwand, L; Jelinek, W R

    1981-01-01

    A consensus sequence has been determined for a major interspersed deoxyribonucleic acid repeat in the genome of Chinese hamster ovary cells (CHO cells). This sequence is extensively homologous to (i) the human Alu sequence (P. L. Deininger et al., J. Mol. Biol., in press), (ii) the mouse B1 interspersed repetitious sequence (Krayev et al., Nucleic Acids Res. 8:1201-1215, 1980) (iii) an interspersed repetitious sequence from African green monkey deoxyribonucleic acid (Dhruva et al., Proc. Natl. Acad. Sci. U.S.A. 77:4514-4518, 1980) and (iv) the CHO and mouse 4.5S ribonucleic acid (this report; F. Harada and N. Kato, Nucleic Acids Res. 8:1273-1285, 1980). Because the CHO consensus sequence shows significant homology to the human Alu sequence it is termed the CHO Alu-equivalent sequence. A conserved structure surrounding CHO Alu-equivalent family members can be recognized. It is similar to that surrounding the human Alu and the mouse B1 sequences, and is represented as follows: direct repeat-CHO-Alu-A-rich sequence-direct repeat. A composite interspersed repetitious sequence has been identified. Its structure is represented as follows: direct repeat-residue 47 to 107 of CHO-Alu-non-Alu repetitious sequence-A-rich sequence-direct repeat. Because the Alu flanking sequences resemble those that flank known transposable elements, we think it likely that the Alu sequence dispersed throughout the mammalian genome by transposition. Images PMID:9279371

  8. Identification of conserved hepatic transcriptomic responses to 17β-estradiol using high-throughput sequencing in brown trout

    PubMed Central

    Uren Webster, Tamsyn M.; Shears, Janice A.; Moore, Karen

    2015-01-01

    Estrogenic chemicals are major contaminants of surface waters and can threaten the sustainability of natural fish populations. Characterization of the global molecular mechanisms of toxicity of environmental contaminants has been conducted primarily in model species rather than species with limited existing transcriptomic or genomic sequence information. We aimed to investigate the global mechanisms of toxicity of an endocrine disrupting chemical of environmental concern [17β-estradiol (E2)] using high-throughput RNA sequencing (RNA-Seq) in an environmentally relevant species, brown trout (Salmo trutta). We exposed mature males to measured concentrations of 1.94, 18.06, and 34.38 ng E2/L for 4 days and sequenced three individual liver samples per treatment using an Illumina HiSeq 2500 platform. Exposure to 34.4 ng E2/L resulted in 2,113 differentially regulated transcripts (FDR < 0.05). Functional analysis revealed upregulation of processes associated with vitellogenesis, including lipid metabolism, cellular proliferation, and ribosome biogenesis, together with a downregulation of carbohydrate metabolism. Using real-time quantitative PCR, we validated the expression of eight target genes and identified significant differences in the regulation of several known estrogen-responsive transcripts in fish exposed to the lower treatment concentrations (including esr1 and zp2.5). We successfully used RNA-Seq to identify highly conserved responses to estrogen and also identified some estrogen-responsive transcripts that have been less well characterized, including nots and tgm2l. These results demonstrate the potential application of RNA-Seq as a valuable tool for assessing mechanistic effects of pollutants in ecologically relevant species for which little genomic information is available. PMID:26082144

  9. The highly conserved codon following the slippery sequence supports -1 frameshift efficiency at the HIV-1 frameshift site.

    PubMed

    Mathew, Suneeth F; Crowe-McAuliffe, Caillan; Graves, Ryan; Cardno, Tony S; McKinney, Cushla; Poole, Elizabeth S; Tate, Warren P

    2015-01-01

    HIV-1 utilises -1 programmed ribosomal frameshifting to translate structural and enzymatic domains in a defined proportion required for replication. A slippery sequence, U UUU UUA, and a stem-loop are well-defined RNA features modulating -1 frameshifting in HIV-1. The GGG glycine codon immediately following the slippery sequence (the 'intercodon') contributes structurally to the start of the stem-loop but has no defined role in current models of the frameshift mechanism, as slippage is inferred to occur before the intercodon has reached the ribosomal decoding site. This GGG codon is highly conserved in natural isolates of HIV. When the natural intercodon was replaced with a stop codon, two different decoding molecules (eRF1 protein or a cognate suppressor tRNA) were able to access and decode the intercodon prior to -1 frameshifting. This implies significant slippage occurs when the intercodon is in the (perhaps distorted) ribosomal A site. We accommodate the influence of the intercodon in a model of frame maintenance versus frameshifting in HIV-1.
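
    To make the sequence layout described in the record above concrete, the following minimal Python sketch locates the slippery sequence in an RNA string and reports the codon that immediately follows it (the 'intercodon'). The function name and the toy RNA strings are hypothetical and chosen for illustration; only the slippery sequence (U UUU UUA, written without spaces) and the GGG intercodon come from the abstract.

```python
SLIPPERY = "UUUUUUA"  # slippery sequence from the abstract, spaces removed
STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard stop codons


def find_intercodon(rna, slippery=SLIPPERY):
    """Return (start_index, codon) for the three nucleotides directly after
    the first slippery sequence, or None if no full codon follows it."""
    start = rna.find(slippery)
    if start == -1:
        return None
    codon_start = start + len(slippery)
    codon = rna[codon_start:codon_start + 3]
    if len(codon) < 3:
        return None
    return codon_start, codon


# Toy RNA (hypothetical flanking bases): slippery sequence, then GGG.
wild_type = "GCAC" + SLIPPERY + "GGG" + "AAGAUC"
# Same toy RNA with the intercodon replaced by a stop codon.
stop_variant = "GCAC" + SLIPPERY + "UAG" + "AAGAUC"

for name, rna in [("wild type", wild_type), ("stop variant", stop_variant)]:
    index, codon = find_intercodon(rna)
    print(name, index, codon, "stop" if codon in STOP_CODONS else "sense")
```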

  10. High-throughput sequencing identification of novel and conserved miRNAs in the Brassica oleracea leaves.

    PubMed

    Lukasik, Anna; Pietrykowska, Halina; Paczek, Leszek; Szweykowska-Kulinska, Zofia; Zielenkiewicz, Piotr

    2013-11-19

    Plant microRNAs are short (~21 nt) non-coding molecules that regulate gene expression by targeting the mRNA cleavage or protein translation inhibition. In this manner, they play many important roles in the cells of living organisms. One of the plant species in which the entire set of miRNAs has not been yet completely identified is Brassica oleracea var. capitata (cabbage). For this reason and for the economic and nutritional importance of this food crop, high-throughput small RNAs sequencing has been performed to discover the novel and conserved miRNAs in mature cabbage leaves. In this study, raw reads generated from three small RNA libraries were bioinformatically processed and further analyzed to select sequences homologous to known B. oleracea and other plant miRNAs. As a result of this analysis, 261 conserved miRNAs (belonging to 62 families) have been discovered. MIR169, MIR167 and MIR166 were the largest miRNA families, while the highest abundance molecules were miR167, miR166, miR168c and miR157a. Among the generated sequencing reads, miRNAs* were also found, such as the miR162c*, miR160a* and miR157a*. The unannotated tags were used in the prediction and evaluation of novel miRNAs, which resulted in the 26 potential miRNAs proposal. The expressions of 13 selected miRNAs were analyzed by northern blot hybridization. The target prediction and annotation for identified miRNAs were performed, according to which discovered molecules may target mRNAs encoding several potential proteins - e.g., transcription factors, polypeptides that regulate hormone stimuli and abiotic stress response, and molecules participating in transport and cell communication. Additionally, KEGG maps analysis suggested that the miRNAs in cabbage are involved in important processing pathways, including glycolysis, glycerolipid metabolism, flavonoid biosynthesis and oxidative phosphorylation. Conclusively, for the first time, the large set of miRNAs was identified in mature cabbage leaves.

  11. High-throughput sequencing identification of novel and conserved miRNAs in the Brassica oleracea leaves

    PubMed Central

    2013-01-01

    Background: Plant microRNAs are short (~21 nt) non-coding molecules that regulate gene expression by targeting the mRNA cleavage or protein translation inhibition. In this manner, they play many important roles in the cells of living organisms. One of the plant species in which the entire set of miRNAs has not been yet completely identified is Brassica oleracea var. capitata (cabbage). For this reason and for the economic and nutritional importance of this food crop, high-throughput small RNAs sequencing has been performed to discover the novel and conserved miRNAs in mature cabbage leaves. Results: In this study, raw reads generated from three small RNA libraries were bioinformatically processed and further analyzed to select sequences homologous to known B. oleracea and other plant miRNAs. As a result of this analysis, 261 conserved miRNAs (belonging to 62 families) have been discovered. MIR169, MIR167 and MIR166 were the largest miRNA families, while the highest abundance molecules were miR167, miR166, miR168c and miR157a. Among the generated sequencing reads, miRNAs* were also found, such as the miR162c*, miR160a* and miR157a*. The unannotated tags were used in the prediction and evaluation of novel miRNAs, which resulted in the 26 potential miRNAs proposal. The expressions of 13 selected miRNAs were analyzed by northern blot hybridization. The target prediction and annotation for identified miRNAs were performed, according to which discovered molecules may target mRNAs encoding several potential proteins – e.g., transcription factors, polypeptides that regulate hormone stimuli and abiotic stress response, and molecules participating in transport and cell communication. Additionally, KEGG maps analysis suggested that the miRNAs in cabbage are involved in important processing pathways, including glycolysis, glycerolipid metabolism, flavonoid biosynthesis and oxidative phosphorylation. Conclusions: Conclusively, for the first time, the large set of miRNAs was

  12. The sequence organization of Yp/proximal Xq homologous regions of the human sex chromosomes is highly conserved.

    PubMed

    Sargent, C A; Briggs, H; Chalmers, I J; Lambson, B; Walker, E; Affara, N A

    1996-03-01

    Detailed deletion analysis of patients with breakpoints in Yp has allowed the definition of two distinct intervals on the Y chromosome short arm outside the pseudoautosomal region that are homologous to Xq21.3. Detailed YAC contigs have been developed over these regions on both the X and Y chromosomes, and the relative order of markers has been compared to assess whether rearrangements on either sex chromosome have occurred since the transposition events creating these patterns of homology. On the X chromosome, the region forms almost one contiguous block of homology, whereas on the Y chromosome, there has been one major rearrangement leading to the two separate Yp-Xq21 blocks of homology. The rearrangement breakpoint has been mapped. Within these separate X-Y homologous blocks on Yp, the order of loci homologous to X has been conserved to a high degree between the sex chromosomes. With the exception of the amelogenin gene (proximal Yp block), all the X-Y homologous sequences in the two Yp blocks have homologues in Xq21.3, with the former having its X counterpart in Xp22.2. This suggests an independent evolutionary event leading to the formation of the amelogenin X-Y homology.

  13. The sequence organization of Yp/proximal Xq homologous regions of the human sex chromosomes is highly conserved

    SciTech Connect

    Sargent, C.A.; Briggs, H.; Chalmers, I.J.

    1996-03-01

    Detailed deletion analysis of patients with breakpoints in Yp has allowed the definition of two distinct intervals on the Y chromosome short arm outside the pseudoautosomal region that are homologous to Xq21.3. Detailed YAC contigs have been developed over these regions on both the X and Y chromosomes, and the relative order of markers has been compared to assess whether rearrangements on either sex chromosome have occurred since the transposition events creating these patterns of homology. On the X chromosome, the region forms almost one contiguous block of homology, whereas on the Y chromosome, there has been one major rearrangement leading to the two separate Yp-Xq21 blocks of homology. The rearrangement breakpoint has been mapped. Within these separate X-Y homologous blocks on Yp, the order of loci homologous to X has been conserved to a high degree between the sex chromosomes. With the exception of the amelogenin gene (proximal Yp block), all the X-Y homologous sequences in the two Yp blocks have homologues in Xq21.3, with the former having its X counterpart in Xp22.2. This suggests an independent evolutionary event leading to the formation of the amelogenin X-Y homology. 45 refs., 4 figs., 1 tab.

  14. SNPs occur in regions with less genomic sequence conservation.

    PubMed

    Castle, John C

    2011-01-01

    Rates of SNPs (single nucleotide polymorphisms) and cross-species genomic sequence conservation reflect intra- and inter-species variation, respectively. Here, I report SNP rates and genomic sequence conservation adjacent to mRNA processing regions and show that, as expected, more SNPs occur in less conserved regions and that functional regions have fewer SNPs. Results are confirmed using both mouse and human data. Regions include protein start codons, 3' splice sites, 5' splice sites, protein stop codons, predicted miRNA binding sites, and polyadenylation sites. Throughout, SNP rates are lower and conservation is higher at regulatory sites. Within coding regions, SNP rates are highest and conservation is lowest at codon position three and the fewest SNPs are found at codon position two, reflecting codon degeneracy for amino acid encoding. Exon splice sites show high conservation and very low SNP rates, reflecting both splicing signals and protein coding. Relaxed constraint on the codon third position is dramatically seen when separating exonic SNP rates based on intron phase. At polyadenylation sites, a peak of conservation and low SNP rate occurs from 30 to 17 nt preceding the site. This region is highly enriched for the sequence AAUAAA, reflecting the location of the conserved polyA signal. miRNA 3' UTR target sites are predicted incorporating interspecies genomic sequence conservation; SNP rates are low in these sites, again showing fewer SNPs in conserved regions. Together, these results confirm that SNPs, reflecting recent genetic variation, occur more frequently in regions with less evolutionary conservation.

  15. Comprehensive Sequence Analysis of the Human IL23A Gene Defines New Variation Content and High Rate of Evolutionary Conservation

    PubMed Central

    Tindall, Elizabeth A.; Hayes, Vanessa M.

    2010-01-01

    A newly described heterodimeric cytokine, interleukin-23 (IL-23) is emerging as a key player in both the innate and the adaptive T helper (Th)17 driven immune response as well as an initiator of several autoimmune diseases. The rate-limiting element of IL-23 production is believed to be driven by expression of the unique p19 subunit encoded by IL23A. We set out to perform comprehensive DNA sequencing of this previously under-studied gene in 96 individuals from two evolutionarily distinct human population groups, Southern African Bantu and European. We observed a total of 33 different DNA variants within these two groups, 22 (67%) of which are currently not reported in any available database. We further demonstrate both inter-population and intra-species sequence conservation within the coding and known regulatory regions of IL23A, supporting a critical physiological role for IL-23. We conclude that IL23A may have undergone positive selection pressure directed towards conservation, suggesting that functional genetic variants within IL23A will have a significant impact on the host immune response. PMID:20154336

  16. Spider minor ampullate silk proteins contain new repetitive sequences and highly conserved non-silk-like "spacer regions".

    PubMed

    Colgin, M A; Lewis, R V

    1998-03-01

    Spider minor ampullate silk is a strong non-elastic deformably stretchable silk used in web formation. This silk from Nephila clavipes is composed of two proteins, MiSp 1 and 2, whose transcripts are 9.5 and 7.5 kb, respectively, as determined by Northern blots. Both MiSp proteins are organized into a predominantly repetitive region and a small nonrepetitive carboxy terminal region. These highly repetitive regions are composed mainly of glycine and alanine, but also contain tyrosine, glutamine, and arginine. The sequences are mainly GGX and GA repeats. The repetitive regions are interrupted by nonrepetitive serine-rich spacer regions. Although the sequences of the spacer regions differ from the repetitive regions, sequences of the spacers from different regions of the proteins are nearly identical. The sequence differences between major and minor ampullate silks may explain the differing mechanical properties of the fibers.

  17. Alignment of U3 region sequences of mammalian type C viruses: identification of highly conserved motifs and implications for enhancer design.

    PubMed Central

    Golemis, E A; Speck, N A; Hopkins, N

    1990-01-01

    We aligned published sequences for the U3 region of 35 type C mammalian retroviruses. The alignment reveals that certain sequence motifs within the U3 region are strikingly conserved. A number of these motifs correspond to previously identified sites. In particular, we found that the enhancer region of most of the viruses examined contains a binding site for leukemia virus factor b, a viral corelike element, the consensus motif for nuclear factor 1, and the glucocorticoid response element. Most viruses containing more than one copy of enhancer sequences include these binding sites in both copies of the repeat. We consider this set of binding sites to constitute a framework for the enhancers of this set of viruses. Other highly conserved motifs in the U3 region include the retrovirus inverted repeat sequence, a negative regulatory element, and the CCAAT and TATA boxes. In addition, we identified two novel motifs in the promoter region that were exceptionally highly conserved but have not been previously described. PMID:2153223

  18. High-Throughput Sequencing Reveals Diverse Sets of Conserved, Nonconserved, and Species-Specific miRNAs in Jute.

    PubMed

    Islam, Md Tariqul; Ferdous, Ahlan Sabah; Najnin, Rifat Ara; Sarker, Suprovath Kumar; Khan, Haseena

    2015-01-01

    MicroRNAs play a pivotal role in regulating a broad range of biological processes, acting by cleaving mRNAs or by translational repression. A group of plant microRNAs are evolutionarily conserved; however, others are expressed in a species-specific manner. Jute is an agroeconomically important fibre crop; nonetheless, no practical information is available for microRNAs in jute to date. In this study, Illumina sequencing revealed a total of 227 known microRNAs and 17 potential novel microRNA candidates in jute, of which 164 belong to 23 conserved families and the remaining 63 belong to 58 nonconserved families. Among a total of 81 identified microRNA families, 116 potential target genes were predicted for 39 families and 11 targets were predicted for 4 among the 17 identified novel microRNAs. For understanding better the functions of microRNAs, target genes were analyzed by Gene Ontology and their pathways illustrated by KEGG pathway analyses. The presence of microRNAs identified in jute was validated by stem-loop RT-PCR followed by end point PCR and qPCR for randomly selected 20 known and novel microRNAs. This study exhaustively identifies microRNAs and their target genes in jute which will ultimately pave the way for understanding their role in this crop and other crops.

  19. High-Throughput Sequencing Reveals Diverse Sets of Conserved, Nonconserved, and Species-Specific miRNAs in Jute

    PubMed Central

    Islam, Md. Tariqul; Ferdous, Ahlan Sabah; Najnin, Rifat Ara; Sarker, Suprovath Kumar; Khan, Haseena

    2015-01-01

    MicroRNAs play a pivotal role in regulating a broad range of biological processes, acting by cleaving mRNAs or by translational repression. A group of plant microRNAs are evolutionarily conserved; however, others are expressed in a species-specific manner. Jute is an agroeconomically important fibre crop; nonetheless, no practical information is available for microRNAs in jute to date. In this study, Illumina sequencing revealed a total of 227 known microRNAs and 17 potential novel microRNA candidates in jute, of which 164 belong to 23 conserved families and the remaining 63 belong to 58 nonconserved families. Among a total of 81 identified microRNA families, 116 potential target genes were predicted for 39 families and 11 targets were predicted for 4 among the 17 identified novel microRNAs. For understanding better the functions of microRNAs, target genes were analyzed by Gene Ontology and their pathways illustrated by KEGG pathway analyses. The presence of microRNAs identified in jute was validated by stem-loop RT-PCR followed by end point PCR and qPCR for randomly selected 20 known and novel microRNAs. This study exhaustively identifies microRNAs and their target genes in jute which will ultimately pave the way for understanding their role in this crop and other crops. PMID:25861616

  20. Conservation of sequence in recombination signal sequence spacers.

    PubMed Central

    Ramsden, D A; Baetz, K; Wu, G E

    1994-01-01

    The variable domains of immunoglobulins and T cell receptors are assembled through the somatic, site specific recombination of multiple germline segments (V, D, and J segments) or V(D)J rearrangement. The recombination signal sequence (RSS) is necessary and sufficient for cell type specific targeting of the V(D)J rearrangement machinery to these germline segments. Previously, the RSS has been described as possessing both a conserved heptamer and a conserved nonamer motif. The heptamer and nonamer motifs are separated by a 'spacer' that was not thought to possess significant sequence conservation, however the length of the spacer could be either 12 +/- 1 bp or 23 +/- 1 bp long. In this report we have assembled and analyzed an extensive data base of published RSS. We have derived, through extensive consensus comparison, a more detailed description of the RSS than has previously been reported. Our analysis indicates that RSS spacers possess significant conservation of sequence, and that the conserved sequence in 12 bp spacers is similar to the conserved sequence in the first half of 23 bp spacers. PMID:8208601
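
    The heptamer-spacer-nonamer layout described in the record above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the spacer in the example is a placeholder of N characters rather than a real sequence, and the heptamer and nonamer shown (CACAGTG and ACAAAAACC) are the commonly cited consensus motifs, which the abstract itself does not spell out. The spacer length classes (12 +/- 1 bp and 23 +/- 1 bp) are taken from the abstract.

```python
HEPTAMER_LEN = 7
NONAMER_LEN = 9


def split_rss(rss):
    """Split an RSS string into (heptamer, spacer, nonamer), assuming the
    heptamer comes first and the nonamer last."""
    if len(rss) < HEPTAMER_LEN + NONAMER_LEN:
        raise ValueError("sequence too short to hold a heptamer and a nonamer")
    heptamer = rss[:HEPTAMER_LEN]
    nonamer = rss[-NONAMER_LEN:]
    spacer = rss[HEPTAMER_LEN:len(rss) - NONAMER_LEN]
    return heptamer, spacer, nonamer


def classify_spacer(spacer_length):
    """Label a spacer as 12 +/- 1 bp or 23 +/- 1 bp; return None otherwise."""
    if 11 <= spacer_length <= 13:
        return "12-RSS"
    if 22 <= spacer_length <= 24:
        return "23-RSS"
    return None


# Placeholder example: consensus heptamer and nonamer around a 12-nt spacer of Ns.
example = "CACAGTG" + "N" * 12 + "ACAAAAACC"
heptamer, spacer, nonamer = split_rss(example)
print(heptamer, len(spacer), nonamer, classify_spacer(len(spacer)))
```

    Comparing the conserved positions of 12 bp spacers with the first half of 23 bp spacers, as the abstract reports, would additionally require a collection of real RSS sequences, which is not reproduced here.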

  1. In vitro homology search array comprehensively reveals highly conserved genes and their functional characteristics in non-sequenced species

    PubMed Central

    2010-01-01

    Background: With the increase in genomic and transcriptomic data produced by the recent advancements in next generation sequencers and microarrays, it is now easier than ever to conduct large-scale comparative genomic studies for familiar species. However, there are more than ten million species on earth, and the study of all remaining species is not realistic in terms of cost and time. There have been a number of attempts at using microarrays for cross-species hybridization; however, those approaches only utilized the same probes for each species or different probes designed from orthologous genes. To establish easier and cheaper methods for the large-scale comparative genomic study of non-sequenced species, we developed an in vitro homology search array with the aid of a bioinformatic approach to probe design. Results: To perform large-scale genomic comparisons of non-sequenced species, we chose squid, one of the most intelligent species among Protostomes, for comparison with human genes. We designed a microarray using human single copy genes and conducted microarray experiments with mRNAs extracted from the squid. Multi-copy genes could not be detected using the microarray in this study because their sequence similarity caused cross-hybridization. A search for squid homologous genes among human genes revealed that 68% of the human probes tested showed the expression of squid homolog genes and 95 genes were confirmed to be expressed highly in squid. Functional classification analysis showed that these highly expressed genes comprise DNA binding proteins, which are under pressure of DNA level mutation and, consequently, show high similarity at the nucleotide level. Conclusions: Our array could detect homologous genes in squids and humans in spite of the distant phylogenic relationships between the species. This experimental method will be useful for identifying homologs in non-sequenced species, for the development of genetic resources and for the collection of

  2. In vitro homology search array comprehensively reveals highly conserved genes and their functional characteristics in non-sequenced species.

    PubMed

    Ogura, Atsushi; Yoshida, Masa-aki; Fukuzaki, Mutsumi; Sese, Jun

    2010-12-02

    With the increase in genomic and transcriptomic data produced by the recent advancements in next generation sequencers and microarrays, it is now easier than ever to conduct large-scale comparative genomic studies for familiar species. However, there are more than ten million species on earth, and the study of all remaining species is not realistic in terms of cost and time. There have been a number of attempts at using microarrays for cross-species hybridization; however, those approaches only utilized the same probes for each species or different probes designed from orthologous genes. To establish easier and cheaper methods for the large-scale comparative genomic study of non-sequenced species, we developed an in vitro homology search array with the aid of a bioinformatic approach to probe design. To perform large-scale genomic comparisons of non-sequenced species, we chose squid, one of the most intelligent species among Protostomes, for comparison with human genes. We designed a microarray using human single copy genes and conducted microarray experiments with mRNAs extracted from the squid. Multi-copy genes could not be detected using the microarray in this study because their sequence similarity caused cross-hybridization. A search for squid homologous genes among human genes revealed that 68% of the human probes tested showed the expression of squid homolog genes and 95 genes were confirmed to be expressed highly in squid. Functional classification analysis showed that these highly expressed genes comprise DNA binding proteins, which are under pressure of DNA level mutation and, consequently, show high similarity at the nucleotide level. Our array could detect homologous genes in squids and humans in spite of the distant phylogenic relationships between the species. This experimental method will be useful for identifying homologs in non-sequenced species, for the development of genetic resources and for the collection of information on biodiversity

  3. Sequence conservation on the Y chromosome

    SciTech Connect

    Gibson, L.H.; Yang-Feng, L.; Lau, C.

    1994-09-01

    The Y chromosome is present in all mammals and is considered to be essential to sex determination. Despite intense genomic research, only a few genes have been identified and mapped to this chromosome in humans. Several of them, such as SRY and ZFY, have been demonstrated to be conserved and Y-located in other mammals. In order to address the issue of sequence conservation on the Y chromosome, we performed fluorescence in situ hybridization (FISH) with DNA from a human Y cosmid library as a probe to study the Y chromosomes from other mammalian species. Total DNA from 3,000-4,500 cosmid pools was labeled with biotinylated-dUTP and hybridized to metaphase chromosomes. For human and primate preparations, human cot1 DNA was included in the hybridization mixture to suppress the hybridization from repeat sequences. FISH signals were detected on the Y chromosomes of human, gorilla, orangutan and baboon (Old World monkey) and were absent on those of squirrel monkey (New World monkey), Indian muntjac, wood lemming, Chinese hamster, rat and mouse. Since sequence analysis suggested that specific genes, e.g. SRY and ZFY, are conserved between these two groups, the lack of detectable hybridization in the latter group implies either that conservation of the human Y sequences is limited to the Y chromosomes of the great apes and Old World monkeys, or that the size of the syntenic segment is too small to be detected under the resolution of FISH, or that homologous sequences have undergone considerable divergence. Further studies with reduced hybridization stringency are currently being conducted. Our results provide some clues as to Y-sequence conservation across species and demonstrate the limitations of FISH across species with total DNA sequences from a particular chromosome.

  4. Comparison of orthologous and paralogous DNA flanking the wheat high molecular weight glutenin genes: sequence conservation and divergence, transposon distribution, and matrix-attachment regions.

    PubMed

    Anderson, O D; Larka, L; Christoffers, M J; McCue, K F; Gustafson, J P

    2002-04-01

    Extended flanking DNA sequences were characterized for five members of the wheat high molecular weight (HMW) glutenin gene family to understand more of the structure, control, and evolution of these genes. Analysis revealed more sequence conservation among orthologous regions than between paralogous regions, with differences mainly owing to transposition events involving putative retrotransposons and several miniature inverted transposable elements (MITEs). Both gypsy-like long terminal repeat (LTR) and non-LTR retrotransposon sequences are represented in the flanking DNAs. One of the MITEs is a novel class, but another MITE is related to the maize Stowaway family and is widely represented in Triticeae expressed sequence tags (ESTs). Flanking DNA of the longest sequence, a 20 425-bp fragment including and surrounding the HMW-glutenin Bx7 gene, showed additional cereal gene-like sequences both immediately 5' and 3' to the HMW-glutenin coding region. The transcriptional activities of sequences related to these flanking putative genes and the retrotransposon-related regions were indicated by matches to wheat and other Triticeae ESTs. Predictive analysis of matrix-attachment regions (MARs) of the HMW glutenin and several alpha-, gamma-, and omega-gliadin flanking DNAs indicates potential MARs immediately flanking each of the genes. Matrix binding activity in the predicted regions was confirmed for two of the HMW-glutenin genes.

  5. Conserved Sequence Processing in Primate Frontal Cortex.

    PubMed

    Wilson, Benjamin; Marslen-Wilson, William D; Petkov, Christopher I

    2017-02-01

    An important aspect of animal perception and cognition is learning to recognize relationships between environmental events that predict others in time, a form of relational knowledge that can be assessed using sequence-learning paradigms. Humans are exquisitely sensitive to sequencing relationships, and their combinatorial capacities, most saliently in the domain of language, are unparalleled. Recent comparative research in human and nonhuman primates has obtained behavioral and neuroimaging evidence for evolutionarily conserved substrates involved in sequence processing. The findings carry implications for the origins of domain-general capacities underlying core language functions in humans. Here, we synthesize this research into a 'ventrodorsal gradient' model, where frontal cortex engagement along this axis depends on sequencing complexity, mapping onto the sequencing capacities of different species. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. SNPs Occur in Regions with Less Genomic Sequence Conservation

    PubMed Central

    Castle, John C.

    2011-01-01

    Rates of SNPs (single nucleotide polymorphisms) and cross-species genomic sequence conservation reflect intra- and inter-species variation, respectively. Here, I report SNP rates and genomic sequence conservation adjacent to mRNA processing regions and show that, as expected, more SNPs occur in less conserved regions and that functional regions have fewer SNPs. Results are confirmed using both mouse and human data. Regions include protein start codons, 3′ splice sites, 5′ splice sites, protein stop codons, predicted miRNA binding sites, and polyadenylation sites. Throughout, SNP rates are lower and conservation is higher at regulatory sites. Within coding regions, SNP rates are highest and conservation is lowest at codon position three and the fewest SNPs are found at codon position two, reflecting codon degeneracy for amino acid encoding. Exon splice sites show high conservation and very low SNP rates, reflecting both splicing signals and protein coding. Relaxed constraint on the codon third position is dramatically seen when separating exonic SNP rates based on intron phase. At polyadenylation sites, a peak of conservation and low SNP rate occurs from 30 to 17 nt preceding the site. This region is highly enriched for the sequence AAUAAA, reflecting the location of the conserved polyA signal. miRNA 3′ UTR target sites are predicted incorporating interspecies genomic sequence conservation; SNP rates are low in these sites, again showing fewer SNPs in conserved regions. Together, these results confirm that SNPs, reflecting recent genetic variation, occur more frequently in regions with less evolutionarily conservation. PMID:21674007
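
The polyadenylation-site observation above (a conserved window 30 to 17 nt upstream of the site, enriched for AAUAAA) can be illustrated with a minimal Python sketch. This is not the author's pipeline: the function name `find_polya_signal`, the 0-based `site` convention, and the toy sequences are hypothetical, and the window is assumed to be inclusive at both ends.

```python
def find_polya_signal(rna, site, signal="AAUAAA", far=30, near=17):
    """Look for `signal` inside the window `far`..`near` nt upstream of `site`.

    `site` is the 0-based index of the polyadenylation site (assumed convention),
    so the base k nt upstream is rna[site - k]. Returns the upstream distance of
    the first base of the signal, or None if the window does not contain it.
    """
    seq = rna.upper().replace("T", "U")   # accept DNA or RNA alphabets
    start = max(site - far, 0)            # index of the base `far` nt upstream
    end = max(site - near + 1, 0)         # one past the base `near` nt upstream
    hit = seq[start:end].find(signal)
    return None if hit < 0 else site - (start + hit)


# Toy transcripts (made up for illustration only).
with_signal = "G" * 20 + "AAUAAA" + "C" * 19   # signal starts 25 nt upstream of site 45
too_close = "G" * 39 + "AAUAAA"                # signal starts only 6 nt upstream

distance = find_polya_signal(with_signal, site=45)
missing = find_polya_signal(too_close, site=45)
```

Counting how often such a hit occurs across many annotated sites would give the kind of enrichment the abstract describes; the sketch only checks a single sequence.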

  7. High-Throughput Sequencing and Characterization of the Small RNA Transcriptome Reveal Features of Novel and Conserved MicroRNAs in Panax ginseng

    PubMed Central

    Ma, Yimian; Yuan, Lichai; Lu, Shanfa

    2012-01-01

    microRNAs (miRNAs) play vital regulatory roles in many organisms through direct cleavage of transcripts, translational repression, or chromatin modification. Identification of miRNAs has been carried out in various plant species. However, no information is available for miRNAs from Panax ginseng, an economically significant medicinal plant species. Using the next generation high-throughput sequencing technology, we obtained 13,326,328 small RNA reads from the roots, stems, leaves and flowers of P. ginseng. Analysis of these small RNAs revealed the existence of a large, diverse and highly complicated small RNA population in P. ginseng. We identified 73 conserved miRNAs, which could be grouped into 33 families, and 28 non-conserved ones belonging to 9 families. Characterization of P. ginseng miRNA precursors revealed many features, such as production of two miRNAs from distinct regions of a precursor, clusters of two precursors in a transcript, and generation of miRNAs from both sense and antisense transcripts. It suggests the complexity of miRNA production in P. ginseng. Using a computational approach, we predicted for the conserved and non-conserved miRNA families 99 and 31 target genes, respectively, of which eight were experimentally validated. Among all predicted targets, only about 20% are conserved among various plant species, whereas the others appear to be non-conserved, indicating the diversity of miRNA functions. Consistently, many miRNAs exhibited tissue-specific expression patterns. Moreover, we identified five dehydration- and ten heat-responsive miRNAs and found the existence of a crosstalk among some of the stress-responsive miRNAs. Our results provide the first clue to the elucidation of miRNA functions in P. ginseng. PMID:22962612

  8. High-throughput sequencing and characterization of the small RNA transcriptome reveal features of novel and conserved microRNAs in Panax ginseng.

    PubMed

    Wu, Bin; Wang, Meizhen; Ma, Yimian; Yuan, Lichai; Lu, Shanfa

    2012-01-01

    microRNAs (miRNAs) play vital regulatory roles in many organisms through direct cleavage of transcripts, translational repression, or chromatin modification. Identification of miRNAs has been carried out in various plant species. However, no information is available for miRNAs from Panax ginseng, an economically significant medicinal plant species. Using the next generation high-throughput sequencing technology, we obtained 13,326,328 small RNA reads from the roots, stems, leaves and flowers of P. ginseng. Analysis of these small RNAs revealed the existence of a large, diverse and highly complicated small RNA population in P. ginseng. We identified 73 conserved miRNAs, which could be grouped into 33 families, and 28 non-conserved ones belonging to 9 families. Characterization of P. ginseng miRNA precursors revealed many features, such as production of two miRNAs from distinct regions of a precursor, clusters of two precursors in a transcript, and generation of miRNAs from both sense and antisense transcripts. It suggests the complexity of miRNA production in P. ginseng. Using a computational approach, we predicted for the conserved and non-conserved miRNA families 99 and 31 target genes, respectively, of which eight were experimentally validated. Among all predicted targets, only about 20% are conserved among various plant species, whereas the others appear to be non-conserved, indicating the diversity of miRNA functions. Consistently, many miRNAs exhibited tissue-specific expression patterns. Moreover, we identified five dehydration- and ten heat-responsive miRNAs and found the existence of a crosstalk among some of the stress-responsive miRNAs. Our results provide the first clue to the elucidation of miRNA functions in P. ginseng.

  9. A highly conserved G-rich consensus sequence in hepatitis C virus core gene represents a new anti-hepatitis C target.

    PubMed

    Wang, Shao-Ru; Min, Yuan-Qin; Wang, Jia-Qi; Liu, Chao-Xing; Fu, Bo-Shi; Wu, Fan; Wu, Ling-Yu; Qiao, Zhi-Xian; Song, Yan-Yan; Xu, Guo-Hua; Wu, Zhi-Guo; Huang, Gai; Peng, Nan-Fang; Huang, Rong; Mao, Wu-Xiang; Peng, Shuang; Chen, Yu-Qi; Zhu, Ying; Tian, Tian; Zhang, Xiao-Lian; Zhou, Xiang

    2016-04-01

    G-quadruplex (G4) is one of the most important secondary structures in nucleic acids. Until recently, G4 RNAs have not been reported in any ribovirus, such as the hepatitis C virus. Our bioinformatics analysis reveals highly conserved guanine-rich consensus sequences within the core gene of hepatitis C despite the high genetic variability of this ribovirus; we further show using various methods that such consensus sequences can fold into unimolecular G4 RNA structures, both in vitro and under physiological conditions. Furthermore, we provide direct evidence that small molecules specifically targeting G4 can stabilize this structure to reduce RNA replication and inhibit protein translation of intracellular hepatitis C. Ultimately, the stabilization of G4 RNA in the genome of hepatitis C represents a promising new strategy for anti-hepatitis C drug development.

  10. Hepatitis B virus depicts a high degree of conservation during the immune-tolerant phase in familiarly transmitted chronic hepatitis B infection: deep-sequencing and phylogenetic analysis.

    PubMed

    Sede, M; Lopez-Ledesma, M; Frider, B; Pozzati, M; Campos, R H; Flichman, D; Quarleri, J

    2014-01-01

    When intrafamilial transmission of hepatitis B virus (HBV) occurs, a virus with the same characteristics interacts with diverse hosts' immune systems and may thus result in different mutations to escape immune pressure. In this study, the HBV genomic characterization was assessed longitudinally after intrafamilial transmission using nucleotide sequence data of phylogenetic and mutational analyses, including those obtained by deep-sequencing for the first time. Furthermore, HBeAg-anti-HBe profile and variability of HBV core-derived epitopes were also evaluated. Strong evidence was obtained from intrafamilial transmission of HBV genotype D1 by phylogenetic inferences. HBV isolates exhibited a high degree (~99%) of genomic conservation for almost 20 years, when patients were persistently HBeAg positive with normal aminotransferase levels. This identity remained high among immune-tolerant siblings. In contrast, it diminished significantly (P = 0.02) when the mother cleared HBeAg (immune clearance phase). By deep-sequencing, the quantitative analysis of the dynamics of basal core promoter (BCP) (A1762T, G1764A; A1766C; T1773C; 8-bp deletion; and other) and precore (G1896A) variants among HBV isolates from family members exhibited differences during the follow-up. However, only those from the mother showed amino acid variations in the core protein that would impair their MHC-II binding. Hence, when intrafamilial transmission occurs, HBV was highly conserved under the immune-tolerant phase, but it exhibited mutations more frequently during the immune clearance phase. The analysis of the HBV BCP and precore mutants after intrafamilial HBV transmission contributes to a better understanding of how they evolve over time. © 2013 John Wiley & Sons Ltd.

  11. A highly conserved sequence associated with the HIV gp41 loop region is an immunomodulator of antigen-specific T cells in mice.

    PubMed

    Ashkenazi, Avraham; Faingold, Omri; Kaushansky, Nathali; Ben-Nun, Avraham; Shai, Yechiel

    2013-03-21

    Modulation of T-cell responses by HIV occurs via distinct mechanisms, 1 of which involves inactivation of T cells already at the stage of virus-cell fusion. Hydrophobic portions of the gp41 protein of the viral envelope that contributes to membrane fusion may modulate T-cell responsiveness. Here we found a highly conserved sequence (termed "ISLAD") that is associated with the membranotropic gp41 loop region. We showed that ISLAD has the ability to bind the T-cell membrane and to interact with the T-cell receptor (TCR) complex. Furthermore, ISLAD inhibited T-cell proliferation and interferon-γ secretion that resulted from TCR engagement through antigen-presenting cells. Moreover, administering ISLAD (10 µg per mouse) to an experimental autoimmune encephalomyelitis (EAE) model of multiple sclerosis reduced the severity of the disease. This was related to the inhibition of pathogenic T-cell proliferation and to reduced pro-inflammatory cytokine secretion in the lymph nodes of ISLAD-treated EAE mice. The data suggest that T-cell inactivation by HIV during membrane fusion may lie in part in this conserved sequence associated with the gp41 loop region.

  12. Conserved Noncoding Sequences in the Grasses

    PubMed Central

    Inada, Dan Choffnes; Bashir, Ali; Lee, Chunghau; Thomas, Brian C.; Ko, Cynthia; Goff, Stephen A.; Freeling, Michael

    2003-01-01

    As orthologous genes from related species diverge over time, some sequences are conserved in noncoding regions. In mammals, large phylogenetic footprints, or conserved noncoding sequences (CNSs), are known to be common features of genes. Here we present the first large-scale analysis of plant genes for CNSs. We used maize and rice, maximally diverged members of the grass family of monocots. Using a local sequence alignment set to deliver only significant alignments, we found one or more CNSs in the noncoding regions of the majority of genes studied. Grass genes have dramatically fewer and much smaller CNSs than mammalian genes. Twenty-seven percent of grass gene comparisons revealed no CNSs. Genes functioning in upstream regulatory roles, such as transcription factors, are greatly enriched for CNSs relative to genes encoding enzymes or structural proteins. Further, we show that a CNS cluster in an intron of the knotted1 homeobox gene serves as a site of negative regulation. We show that CNSs in the adh1 gene do not correlate with known cis-acting sites. We discuss the potential meanings of CNSs and their value as analytical tools and evolutionary characters. We advance the idea that many CNSs function to lock-in gene regulatory decisions. PMID:12952874

  13. Large-scale nucleotide sequence alignment and sequence variability assessment to identify the evolutionarily highly conserved regions for universal screening PCR assay design: an example of influenza A virus.

    PubMed

    Nagy, Alexander; Jiřinec, Tomáš; Černíková, Lenka; Jiřincová, Helena; Havlíčková, Martina

    2015-01-01

    The development of a diagnostic polymerase chain reaction (PCR) or quantitative PCR (qPCR) assay for universal detection of highly variable viral genomes is always a difficult task. The purpose of this chapter is to provide a guideline on how to align, process, and evaluate a huge set of homologous nucleotide sequences in order to reveal the evolutionarily most conserved positions suitable for universal qPCR primer and hybridization probe design. Attention is paid to the quantification and clear graphical visualization of the sequence variability at each position of the alignment. In addition, specific problems related to the processing of the extremely large sequence pool are highlighted. All of these steps are performed using an ordinary desktop computer without the need for extensive mathematical or computational skills.
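
One common way to quantify per-position variability in an alignment, as the chapter above describes in general terms, is the Shannon entropy of each column. The abstract does not say which measure the authors use, so the following Python sketch is an assumption throughout: the helper names `column_entropy` and `conserved_windows`, the toy alignment, and the choice to ignore gaps are all hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of one alignment column; non-ACGT symbols (gaps, N) are ignored."""
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    total = len(bases)
    counts = Counter(bases)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_windows(alignment, width, max_entropy):
    """Start indices of windows of `width` columns in which every column has entropy <= max_entropy."""
    n_cols = len(alignment[0])
    entropies = [column_entropy("".join(seq[i] for seq in alignment)) for i in range(n_cols)]
    return [
        start
        for start in range(n_cols - width + 1)
        if all(e <= max_entropy for e in entropies[start:start + width])
    ]


# Toy alignment (made up): column 8 is fully variable, all other columns are invariant.
toy_alignment = [
    "ACGTACGTAA",
    "ACGTACGTCA",
    "ACGTACGTGA",
    "ACGTACG-TA",
]

variable_column = column_entropy("".join(seq[8] for seq in toy_alignment))
primer_candidates = conserved_windows(toy_alignment, width=4, max_entropy=0.0)
```

Windows returned by `conserved_windows` would be candidate primer or probe positions; a real data set would need a non-zero entropy threshold and some handling of ambiguous bases, which this sketch leaves out.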

  14. Human IgE-binding protein: A soluble lectin exhibiting a highly conserved interspecies sequence and differential recognition of IgE glycoforms

    SciTech Connect

    Robertson, M.W.; Albrandt, K.; Keller, D.; Liu, Fu-Tong

    1990-09-04

    IgE-binding protein (εBP) refers to a protein originally identified in rat basophilic leukemia cells by virtue of its affinity for IgE. It is now known to be a β-galactoside-binding lectin equivalent to carbohydrate-binding protein 35 (CBP 35). More recently, its identity to Mac-2, a macrophage cell-surface protein, has been established. cDNA coding for human εBP has been cloned from a human HeLa cell cDNA library and contains an open reading frame of 750 base pairs encoding a 250 amino acid protein. Like the rat and murine counterparts, the human εBP amino acid sequence can be divided into two domains with the amino-terminal domain consisting of a highly conserved repetitive sequence (YPGXXXPGA) and the carboxyl-terminal domain containing sequences shared by other S-type lectins. The human εBP sequence exhibits extensive homology to murine and rat εBP with 84% and 82% identity, respectively. The homology is particularly striking in the carboxyl-terminal domain where 95% identity is found between human and murine sequences in a stretch of over 70 amino acids. A survey of εBP mRNA expression from several lymphocyte cell lines revealed that the level of εBP transcription may reflect a relationship between cell differentiation and εBP expression. Finally, human εBP was purified from several human cell lines and shown to possess lactose-binding characteristics and cross-species reactivity to murine IgE. Surprisingly, three different human myeloma IgE proteins did not show reactivity to human εBP. However, after neuraminidase treatment of each human IgE, pronounced binding to εBP was observed, thereby indicating that only specific IgE glycoforms can be recognized by εBP.

  15. Genome-Wide Analysis of Stowaway-Like MITEs in Wheat Reveals High Sequence Conservation, Gene Association, and Genomic Diversification

    PubMed Central

    Yaakov, Beery; Ben-David, Smadar; Kashkush, Khalil

    2013-01-01

    The diversity and evolution of wheat (Triticum-Aegilops group) genomes is determined, in part, by the activity of transposable elements that constitute a large fraction of the genome (up to 90%). In this study, we retrieved sequences from publicly available wheat databases, including a 454-pyrosequencing database, and analyzed 18,217 insertions of 18 Stowaway-like miniature inverted-repeat transposable element (MITE) families previously characterized in wheat that together account for approximately 1.3 Mb of sequence. All 18 families showed high conservation in length, sequence, and target site preference. Furthermore, approximately 55% of the elements were inserted in transcribed regions, into or near known wheat genes. Notably, we observed significant correlation between the mean length of the MITEs and their copy number. In addition, the genomic composition of nine MITE families was studied by real-time quantitative polymerase chain reaction analysis in 40 accessions of Triticum spp. and Aegilops spp., including diploids, tetraploids, and hexaploids. The quantitative polymerase chain reaction data showed massive and significant intraspecific and interspecific variation as well as genome-specific proliferation and nonadditive quantities in the polyploids. We also observed significant differences in the methylation status of the insertion sites among MITE families. Our data thus suggest a possible role for MITEs in generating genome diversification and in the establishment of nascent polyploid species in wheat. PMID:23104862

  16. Sequence of cDNAs for mammalian H2A.Z, an evolutionarily diverged but highly conserved basal histone H2A isoprotein species.

    PubMed Central

    Hatch, C L; Bonner, W M

    1988-01-01

    The nucleotide sequences of cDNAs for the evolutionarily diverged but highly conserved basal H2A isoprotein, H2A.Z, have been determined for the rat, cow, and human. As a basal histone, H2A.Z is synthesized throughout the cell cycle at a constant rate, unlinked to DNA replication, and at a much lower rate in quiescent cells. Each of the cDNA isolates encodes the entire H2A.Z polypeptide. The human isolate is about 1.0 kilobases long. It contains a coding region of 387 nucleotides flanked by 106 nucleotides of 5'UTR and 376 nucleotides of 3'UTR, which contains a polyadenylation signal followed by a poly A tail. The bovine and rat cDNAs have 97 and 94% nucleotide positional identity to the human cDNA in the coding region and 98% in the proximal 376 nucleotides of the 3'UTR which includes the polyadenylation signal. A potential stem-forming sequence imbedded in a direct repeat is found centered at 261 nucleotides into the 3'UTR. Each of the cDNA clones could be transcribed and translated in vitro to yield H2A.Z protein. The mammalian H2A.Z cDNA coding sequences are approximately 80% similar to those in chicken and 75% to those in sea urchin. PMID:3344202
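
The length and identity figures in the abstract above lend themselves to a quick arithmetic check. The Python sketch below is illustrative only: `percent_identity` is a hypothetical helper that compares two pre-aligned sequences position by position (skipping gapped positions), and the two short sequences are made up rather than taken from the H2A.Z cDNAs.

```python
def percent_identity(a, b):
    """Percent of ungapped aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# Segment lengths reported for the human isolate, in nucleotides.
five_prime_utr, coding, three_prime_utr = 106, 387, 376
annotated_length = five_prime_utr + coding + three_prime_utr   # 869 nt before the poly A tail
codons = coding // 3                                           # 129 triplets, stop codon included

# Made-up aligned fragments differing at one of ten positions.
toy_identity = percent_identity("ATGGCTGGCG", "ATGGCAGGCG")
```

The three annotated segments sum to 869 nt, so the remainder of the reported length of about 1.0 kilobases would presumably be the poly A tail; that reading is an inference from the abstract, not a stated result.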

  17. Low sequence identity but high structural and functional conservation: The case of Hsp70/Hsp90 organizing protein (Hop/Sti1) of Leishmania braziliensis.

    PubMed

    Batista, Fernanda A H; Seraphim, Thiago V; Santos, Clelton A; Gonzaga, Marisvanda R; Barbosa, Leandro R S; Ramos, Carlos H I; Borges, Júlio C

    2016-06-15

    Parasites belonging to the genus Leishmania are subjected to extensive environmental changes during their life cycle; molecular chaperones/co-chaperones act as protagonists in this scenario to maintain cellular homeostasis. Hop/Sti1 is a co-chaperone that connects the Hsp90 and Hsp70 systems, modulating their ATPase activities and affecting the fate of client proteins because it facilitates their transfer from the Hsp70 to the Hsp90 chaperone. Hop/Sti1 is one of the most prevalent co-chaperones, highlighting its importance despite the relatively low sequence identity among orthologue proteins. This multi-domain protein comprises three tetratricopeptide repeat domains (TPR1, TPR2A and TPR2B) and two Asp/Pro-rich domains. Given the importance of Hop/Sti1 for the chaperone system and for Leishmania protozoa viability, the Leishmania braziliensis Hop (LbHop) and a truncated mutant (LbHop(TPR2AB)) were characterized. Structurally, both proteins are α-helix-rich and highly elongated monomeric proteins. Functionally, they inhibited the ATPase activity of Leishmania braziliensis Hsp90 (LbHsp90) to a similar extent, and the thermodynamic parameters of their interactions with LbHsp90 were similar, indicating that TPR2A-TPR2B forms the functional center for the LbHop interaction with LbHsp90. These results highlight the structural and functional similarity of Hop/Sti1 proteins, despite their low sequence conservation compared to the Hsp70 and Hsp90 systems, which are phylogenetically highly conserved. Copyright © 2016 Elsevier Inc. All rights reserved.

  18. Highly conserved intragenic HSV-2 sequences: Results from next-generation sequencing of HSV-2 UL and US regions from genital swabs collected from 3 continents.

    PubMed

    Johnston, Christine; Magaret, Amalia; Roychoudhury, Pavitra; Greninger, Alexander L; Cheng, Anqi; Diem, Kurt; Fitzgibbon, Matthew P; Huang, Meei-Li; Selke, Stacy; Lingappa, Jairam R; Celum, Connie; Jerome, Keith R; Wald, Anna; Koelle, David M

    2017-10-01

    Understanding the variability in circulating herpes simplex virus type 2 (HSV-2) genomic sequences is critical to the development of HSV-2 vaccines. Genital lesion swabs containing ≥ 10(7)log10 copies HSV DNA collected from Africa, the USA, and South America underwent next-generation sequencing, followed by K-mer based filtering and de novo genomic assembly. Sites of heterogeneity within coding regions in unique long and unique short (UL_US) regions were identified. Phylogenetic trees were created using maximum likelihood reconstruction. Among 46 samples from 38 persons, 1468 intragenic base-pair substitutions were identified. The maximum nucleotide distance between strains for concatenated UL_US segments was 0.4%. Phylogeny did not reveal geographic clustering. The most variable proteins had non-synonymous mutations in < 3% of amino acids. Unenriched HSV-2 DNA can undergo next-generation sequencing to identify intragenic variability. The use of clinical swabs for sequencing expands the information that can be gathered directly from these specimens. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. Whole-genome sequences of Odocoileus hemionus deer adenovirus isolates from deer, moose and elk are highly conserved and support a new species in the genus Atadenovirus.

    PubMed

    Miller, Myrna M; Cornish, Todd E; Creekmore, Terry E; Fox, Karen; Laegreid, Will; McKenna, Jennifer; Vasquez, Marce; Woods, Leslie W

    2017-09-01

    We present the first complete genome sequence of Odocoileus hemionus deer adenovirus 1 (OdAdV-1). This virus can cause sporadic haemorrhagic disease in cervids, although epizootics with high mortality have occurred in California. OdAdV-1 has been placed in the genus Atadenovirus, based on partial hexon, pVIII and fibre genes. Ten field isolates recovered from naturally infected mule deer (Odocoileus hemionus), white-tailed deer (Odocoileus virginianus) and moose (Alces alces) from Wyoming, black-tailed deer (Odocoileus hemionus columbianus) from California, and Rocky Mountain elk (Cervus elaphus nelsoni) from Colorado and Washington state were sequenced. The genome lengths ranged from 30 620 to 30 699 bp, contained the predicted proteins and gene organization typical of members of genus Atadenovirus, and had a high percentage of A/T nucleotides (66.7 %). Phylogenic analysis found that the closest ancestry was with ruminant atadenoviruses, while a divergence of the hexon, polymerase and penton base proteins of more than 15 % supports classification as a new species. Genetic global comparison between the 10 isolates found an overall 99 % identity, but greater divergence was found between those recovered from moose and elk as compared to deer, and a single variable region contained most of these differences. Our findings demonstrate that OdAdV-1 is highly conserved between 10 isolates recovered from multiple related cervid species, but genotypic differences, largely localized to a variable region, define two strains. We propose that the virus type name be changed to cervid adenovirus 1, with the species name Cervid atadenovirus A. Sequence data were used to develop molecular assays for improved detection and genotyping.

  20. Use of genotyping by sequencing data to develop a high-throughput and multifunctional SNP panel for conservation applications in Pacific lamprey.

    PubMed

    Hess, Jon E; Campbell, Nathan R; Docker, Margaret F; Baker, Cyndi; Jackson, Aaron; Lampman, Ralph; McIlraith, Brian; Moser, Mary L; Statler, David P; Young, William P; Wildbill, Andrew J; Narum, Shawn R

    2015-01-01

    Next-generation sequencing data can be mined for highly informative single nucleotide polymorphisms (SNPs) to develop high-throughput genomic assays for nonmodel organisms. However, choosing a set of SNPs to address a variety of objectives can be difficult because SNPs are often not equally informative. We developed an optimal combination of 96 high-throughput SNP assays from a total of 4439 SNPs identified in a previous study of Pacific lamprey (Entosphenus tridentatus) and used them to address four disparate objectives: parentage analysis, species identification and characterization of neutral and adaptive variation. Nine of these SNPs are FST outliers, and five of these outliers are localized within genes and significantly associated with geography, run-timing and dwarf life history. Two of the 96 SNPs were diagnostic for two other lamprey species that were morphologically indistinguishable at early larval stages and were sympatric in the Pacific Northwest. The majority (85) of SNPs in the panel were highly informative for parentage analysis, that is, putatively neutral with high minor allele frequency across the species' range. Results from three case studies are presented to demonstrate the broad utility of this panel of SNP markers in this species. As Pacific lamprey populations are undergoing rapid decline, these SNPs provide an important resource to address critical uncertainties associated with the conservation and recovery of this imperiled species. © 2014 John Wiley & Sons Ltd.

  1. Properties of Sequence Conservation in Upstream Regulatory and Protein Coding Sequences among Paralogs in Arabidopsis thaliana

    NASA Astrophysics Data System (ADS)

    Richardson, Dale N.; Wiehe, Thomas

    Whole genome duplication (WGD) has catalyzed the formation of new species, genes with novel functions, altered expression patterns, complexified signaling pathways and has provided organisms a level of genetic robustness. We studied the long-term evolution and interrelationships of 5’ upstream regulatory sequences (URSs), protein coding sequences (CDSs) and expression correlations (EC) of duplicated gene pairs in Arabidopsis. Three distinct methods revealed significant evolutionary conservation between paralogous URSs and were highly correlated with microarray-based expression correlation of the respective gene pairs. Positional information on exact matches between sequences unveiled the contribution of micro-chromosomal rearrangements on expression divergence. A three-way rank analysis of URS similarity, CDS divergence and EC uncovered specific gene functional biases. Transcription factor activity was associated with gene pairs exhibiting conserved URSs and divergent CDSs, whereas a broad array of metabolic enzymes was found to be associated with gene pairs showing diverged URSs but conserved CDSs.

  2. HIV-1 conserved-element vaccines: relationship between sequence conservation and replicative capacity.

    PubMed

    Rolland, Morgane; Manocheewa, Siriphan; Swain, J Victor; Lanxon-Cookson, Erinn C; Kim, Moon; Westfall, Dylan H; Larsen, Brendan B; Gilbert, Peter B; Mullins, James I

    2013-05-01

    To overcome the problem of HIV-1 variability, candidate vaccine antigens have been designed to be composed of conserved elements of the HIV-1 proteome. Such candidate vaccines could be improved with a better understanding of both HIV-1 evolutionary constraints and the fitness cost of specific mutations. We evaluated the in vitro fitness cost of 23 mutations engineered in the HIV-1 subtype B Gag-p24 Center-of-Tree (COT) protein through fitness competition assays. While some mutations at conserved sites exacted a high fitness cost, as expected under the assumption that the most conserved residue confers the highest fitness, there was no overall strong relationship between sequence conservation and replicative capacity. By comparing sites that have evolved since the beginning of the epidemic to those that have remained unchanged, we found that sites that have evolved over time were more likely to correspond to HLA-associated sites and that their mutation had limited fitness costs. Our data showed no transcendent link between high conservation and high fitness cost, indicating that merely focusing on conserved segments of HIV-1 would not be sufficient for a successful vaccine strategy. Nonetheless, a subset of sites exacted a high fitness cost upon mutation; these sites have been under selective pressure to change since the beginning of the epidemic but have proved virtually nonmutable and could constitute preferred targets for vaccine design.

  3. A highly conserved N-terminal sequence for teleost vitellogenin with potential value to the biochemistry, molecular biology and pathology of vitellogenesis

    USGS Publications Warehouse

    Folmar, L.D.; Denslow, N.D.; Wallace, R.A.; LaFleur, G.; Gross, T.S.; Bonomelli, S.; Sullivan, C.V.

    1995-01-01

    N-terminal amino acid sequences for vitellogenin (Vtg) from six species of teleost fish (striped bass, mummichog, pinfish, brown bullhead, medaka and yellow perch) and the sturgeon are compared with published N-terminal Vtg sequences for the lamprey, clawed frog and domestic chicken. Striped bass and mummichog had 100% identical amino acids between positions 7 and 21, while pinfish, brown bullhead, sturgeon, lamprey, Xenopus and chicken had 87%, 93%, 60%, 47%, 47-60% (for four transcripts) and 40% identical amino acids, respectively, with striped bass for the same positions. Partial sequences obtained for medaka and yellow perch were 100% identical between positions 5 and 10. The potential utility of this conserved sequence for studies on the biochemistry, molecular biology and pathology of vitellogenesis is discussed.
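
    The percentages above are identities over a fixed window of aligned positions. A minimal sketch of that calculation follows, assuming the two sequences are already aligned without gaps in the window. The function name and both peptides are hypothetical and are not real vitellogenin sequences.

```python
def percent_identity(seq_a: str, seq_b: str, start: int, end: int) -> float:
    """Percent of identical residues between 1-based positions start..end (inclusive).

    Assumes the two sequences are already aligned and ungapped over the window.
    """
    if start < 1 or end < start:
        raise ValueError("window must satisfy 1 <= start <= end")
    if len(seq_a) < end or len(seq_b) < end:
        raise ValueError("both sequences must cover the requested window")
    window_a = seq_a[start - 1:end]
    window_b = seq_b[start - 1:end]
    matches = sum(a == b for a, b in zip(window_a, window_b))
    return 100.0 * matches / len(window_a)


# Made-up peptides for illustration only.
reference = "MKAVLLALTLAFVAGQKFDFAPG"
other = "MRAVLLALTLAFVAGQNFDLAPG"
print(round(percent_identity(reference, other, 7, 21), 1))
```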

  4. High-Throughput Sequencing Identifies Novel and Conserved Cucumber (Cucumis sativus L.) microRNAs in Response to Cucumber Green Mottle Mosaic Virus Infection

    PubMed Central

    Liang, C. Q.; Jiang, N.; Liu, P. F.; Li, J. Q.

    2015-01-01

    Seedlings of Cucumis sativus L. (cv. 'Zhongnong 16') were artificially inoculated with Cucumber green mottle mosaic virus (CGMMV) at the three-true-leaf stage. Leaf and flower samples were collected at different time points post-inoculation (10, 30 and 50 d), and processed by high throughput sequencing analysis to identify candidate miRNA sequences. Bioinformatic analysis using screening criteria, and secondary structure prediction, indicated that 8 novel and 23 known miRNAs (including 15 miRNAs described for the first time in vivo) were produced by cucumber plants in response to CGMMV infection. Moreover, gene expression profiles (p-value <0.01) validated the expression of 3 of the novel miRNAs and 3 of the putative candidate miRNAs and identified a further 82 conserved miRNAs in CGMMV-infected cucumbers. Gene ontology (GO) analysis revealed that the predicted target genes of these 88 miRNAs, which were screened using the psRNATarget and miRanda algorithms, were involved in three functional categories: 2265 in molecular function, 1362 as cellular components and 276 in biological process. The subsequent Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed that the predicted target genes were frequently involved in metabolic processes (166 pathways) and genetic information processes (40 pathways) and to a lesser degree the biosynthesis of secondary metabolites (12 pathways). These results could provide useful clues to help elucidate host-pathogen interactions in CGMMV and cucumber, as well as for the screening of resistance genes. PMID:26076360

  5. High-Throughput Sequencing Identifies Novel and Conserved Cucumber (Cucumis sativus L.) microRNAs in Response to Cucumber Green Mottle Mosaic Virus Infection.

    PubMed

    Liu, H W; Luo, L X; Liang, C Q; Jiang, N; Liu, P F; Li, J Q

    2015-01-01

    Seedlings of Cucumis sativus L. (cv. 'Zhongnong 16') were artificially inoculated with Cucumber green mottle mosaic virus (CGMMV) at the three-true-leaf stage. Leaf and flower samples were collected at different time points post-inoculation (10, 30 and 50 d), and processed by high throughput sequencing analysis to identify candidate miRNA sequences. Bioinformatic analysis using screening criteria, and secondary structure prediction, indicated that 8 novel and 23 known miRNAs (including 15 miRNAs described for the first time in vivo) were produced by cucumber plants in response to CGMMV infection. Moreover, gene expression profiles (p-value <0.01) validated the expression of 3 of the novel miRNAs and 3 of the putative candidate miRNAs and identified a further 82 conserved miRNAs in CGMMV-infected cucumbers. Gene ontology (GO) analysis revealed that the predicted target genes of these 88 miRNAs, which were screened using the psRNATarget and miRanda algorithms, were involved in three functional categories: 2265 in molecular function, 1362 as cellular components and 276 in biological process. The subsequent Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed that the predicted target genes were frequently involved in metabolic processes (166 pathways) and genetic information processes (40 pathways) and to a lesser degree the biosynthesis of secondary metabolites (12 pathways). These results could provide useful clues to help elucidate host-pathogen interactions in CGMMV and cucumber, as well as for the screening of resistance genes.

  6. Functionally conserved enhancers with divergent sequences in distant vertebrates

    DOE PAGES

    Yang, Song; Oksenberg, Nir; Takayama, Sachiko; ...

    2015-10-30

    To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.

  7. Functionally conserved enhancers with divergent sequences in distant vertebrates

    SciTech Connect

    Yang, Song; Oksenberg, Nir; Takayama, Sachiko; Heo, Seok -Jin; Poliakov, Alexander; Ahituv, Nadav; Dubchak, Inna; Boffelli, Dario

    2015-10-30

    To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.

  8. The complete sequence of a Spanish isolate of Broad bean wilt virus 1 (BBWV-1) reveals a high variability and conserved motifs in the genus Fabavirus.

    PubMed

    Ferrer, R M; Guerri, J; Luis-Arteaga, M S; Moreno, P; Rubio, L

    2005-10-01

    The genome of a Spanish isolate of Broad bean wilt virus-1 (BBWV-1) was completely sequenced and compared with available sequences of other isolates of the genus Fabavirus (BBWV-1 and BBWV-2). The genome consisted of two RNAs of 5814 and 3431 nucleotides, respectively, and their organization was similar to that of other members of the family Comoviridae. Its mean nucleotide identity with a BBWV-1 American isolate was 81.5%, and between 59.8 and 63.5% with seven BBWV-2 isolates. Our analysis showed sequence stretches in the 5' non-coding regions which are conserved in both genomic RNAs and in BBWV-1 and BBWV-2 isolates.

  9. Novel low abundance and transient RNAs in yeast revealed by tiling microarrays and ultra high-throughput sequencing are not conserved across closely related yeast species.

    PubMed

    Lee, Albert; Hansen, Kasper Daniel; Bullard, James; Dudoit, Sandrine; Sherlock, Gavin

    2008-12-01

    A complete description of the transcriptome of an organism is crucial for a comprehensive understanding of how it functions and how its transcriptional networks are controlled, and may provide insights into the organism's evolution. Despite the status of Saccharomyces cerevisiae as arguably the most well-studied model eukaryote, we still do not have a full catalog or understanding of all its genes. In order to interrogate the transcriptome of S. cerevisiae for low abundance or rapidly turned over transcripts, we deleted elements of the RNA degradation machinery with the goal of preferentially increasing the relative abundance of such transcripts. We then used high-resolution tiling microarrays and ultra high-throughput sequencing (UHTS) to identify, map, and validate unannotated transcripts that are more abundant in the RNA degradation mutants relative to wild-type cells. We identified 365 currently unannotated transcripts, the majority presumably representing low abundance or short-lived RNAs, of which 185 are previously unknown and unique to this study. It is likely that many of these are cryptic unstable transcripts (CUTs), which are rapidly degraded and whose function(s) within the cell are still unclear, while others may be novel functional transcripts. Of the 185 transcripts we identified as novel to our study, greater than 80 percent come from regions of the genome that have lower conservation scores amongst closely related yeast species than 85 percent of the verified ORFs in S. cerevisiae. Such regions of the genome have typically been less well-studied, and by definition transcripts from these regions will distinguish S. cerevisiae from these closely related species.

  10. Identification of novel and conserved miRNAs involved in pollen development in Brassica campestris ssp. chinensis by high-throughput sequencing and degradome analysis

    PubMed Central

    2014-01-01

    Background: MicroRNAs (miRNAs) are endogenous, noncoding, small RNAs that have essential regulatory functions in plant growth, development, and stress response processes. However, limited information is available about their functions in sexual reproduction of flowering plants. Pollen development is an important process in the life cycle of a flowering plant and is a major factor that affects the yield and quality of crop seeds. Results: This study aims to identify miRNAs involved in pollen development. Two independent small RNA libraries were constructed from the flower buds of the male sterile line (Bcajh97-01A) and male fertile line (Bcajh97-01B) of Brassica campestris ssp. chinensis. The libraries were subjected to high-throughput sequencing by using the Illumina Solexa system. Eight novel miRNAs on the other arm of known pre-miRNAs, 54 new conserved miRNAs, and 8 novel miRNA members were identified. Twenty-five pairs of novel miRNA/miRNA* were found. Among all the identified miRNAs, 18 differentially expressed miRNAs with over two-fold change between flower buds of male sterile line (Bcajh97-01A) and male fertile line (Bcajh97-01B) were identified. qRT-PCR analysis revealed that most of the differentially expressed miRNAs were preferentially expressed in flower buds of the male fertile line (Bcajh97-01B). Degradome analysis showed that a total of 15 genes were predicted to be the targets of seven miRNAs. Conclusions: Our findings provide an overview of potential miRNAs involved in pollen development and interactions between miRNAs and their corresponding targets, which may provide important clues on the function of miRNAs in pollen development. PMID:24559317

  11. A Developmental Sequence of Skills Leading to Conservation

    ERIC Educational Resources Information Center

    Walker, Alice A.

    1978-01-01

    Examines the developmental sequence of skills involved in the understanding of relational concepts and in the development of conservation. Fifty kindergarten children participated in the study. (BD/BR)

  13. Coupling DNA-binding and ATP hydrolysis in Escherichia coli RecQ: role of a highly conserved aromatic-rich sequence.

    PubMed

    Zittel, Morgan C; Keck, James L

    2005-01-01

    RecQ enzymes are broadly conserved Superfamily-2 (SF-2) DNA helicases that play critical roles in DNA metabolism. RecQ proteins use the energy of ATP hydrolysis to drive DNA unwinding; however, the mechanisms by which RecQ links ATPase activity to DNA-binding/unwinding are unknown. In many Superfamily-1 (SF-1) DNA helicases, helicase sequence motif III links these activities by binding both single-stranded (ss) DNA and ATP. However, the ssDNA-binding aromatic-rich element in motif III present in these enzymes is missing from SF-2 helicases, raising the question of how these enzymes link ATP hydrolysis to DNA-binding/unwinding. We show that Escherichia coli RecQ contains a conserved aromatic-rich loop in its helicase domain between motifs II and III. Although placement of the RecQ aromatic-rich loop is topologically distinct relative to the SF-1 enzymes, both loops map to similar tertiary structural positions. We examined the functions of the E. coli RecQ aromatic-rich loop using RecQ variants with single amino acid substitutions within the segment. Our results indicate that the aromatic-rich loop in RecQ is critical for coupling ATPase and DNA-binding/unwinding activities. Our studies also suggest that RecQ's aromatic-rich loop might couple ATP hydrolysis to DNA-binding in a mechanistically distinct manner from SF-1 helicases.

  14. Nucleotide sequence conservation in paramyxoviruses; the concept of codon constellation.

    PubMed

    Rima, Bert K

    2015-05-01

    The stability and conservation of the sequences of RNA viruses in the field and the high error rates measured in vitro are paradoxical. The field stability indicates that there are very strong selective constraints on sequence diversity. The nature of these constraints is discussed. Apart from constraints on variation in cis-acting RNA and the amino acid sequences of viral proteins, there are other ones relating to the presence of specific dinucleotides such as CpG and UpA as well as the importance of RNA secondary structures and RNA degradation rates. Other constraints recently identified in other RNA viruses, such as effects of secondary RNA structure on protein folding or modification of cellular tRNA complements, are also discussed. Using the family Paramyxoviridae, I show that the codon usage pattern (CUP) is (i) specific for each virus species and (ii) that it is markedly different from the host - it does not vary even in vaccine viruses that have been derived by passage in a number of inappropriate host cells. The CUP might thus be an additional constraint on variation, and I propose the concept of codon constellation to indicate the informational content of the sequences of RNA molecules relating not only to stability and structure but also to the efficiency of translation of a viral mRNA resulting from the CUP and the numbers and position of rare codons.
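
    Two of the quantities discussed above, codon usage and dinucleotide content, are simple to tabulate. The sketch below is one possible way to do so for an in-frame coding sequence; it is not the author's method, and the function names and the toy sequence are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict


def codon_usage(cds: str) -> Dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence.

    RNA input is accepted (U is read as T); trailing bases that do not
    complete a codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def dinucleotide_count(seq: str, dinuc: str) -> int:
    """Count occurrences of a dinucleotide, e.g. "CG" for CpG or "TA" for UpA."""
    seq = seq.upper().replace("U", "T")
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


# Toy sequence, not a real paramyxovirus gene.
toy_cds = "AUGGCCGCCUAA"
print(codon_usage(toy_cds))
print(dinucleotide_count(toy_cds, "CG"), dinucleotide_count(toy_cds, "TA"))
```

    Comparing such tables between a virus and its host is one simple way to see whether the two codon usage patterns differ; the abstract does not say which statistic was used.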

  15. Maize peroxidase Px5 has a highly conserved sequence in inbreds resistant to mycotoxin producing fungi which enhances fungal and insect resistance.

    PubMed

    Dowd, Patrick F; Johnson, Eric T

    2016-01-01

    Mycotoxin presence in maize causes health and economic issues for humans and animals. Although many studies have investigated expression differences of genes putatively governing resistance to producing fungi, few have confirmed a resistance role, or examined putative resistance gene structure in more than a couple of inbreds. The pericarp expression of maize Px5 has previously been associated with resistance to Aspergillus flavus growth and insects in a set of inbreds. Genes from 14 different inbreds that included ones with resistance and susceptibility to A. flavus, Fusarium proliferatum, F. verticillioides and F. graminearum and/or mycotoxin production were cloned using high fidelity enzymes, and sequenced. The sequence of Px5 from all resistant inbreds was identical, except for a single base change in two inbreds, only one of which affected the amino acid sequence. Conversely, the Px5 sequence from several susceptible inbreds had several base variations, some of which affected amino acid sequence that would potentially alter secondary structure, and thus enzyme function. The sequence of the maize peroxidase Px5 common to inbreds resistant to mycotoxigenic fungi was overexpressed in maize callus. Callus transformants overexpressing the gene caused significant reductions in growth for fall armyworms, corn earworms, and F. graminearum compared to transformant callus with a β-glucuronidase gene. This study demonstrates rarer transcripts of potential resistance genes overlooked by expression screens can be identified by sequence comparisons. A role in pest resistance can be verified by callus expression of the candidate genes, which can thereby justify larger scale transformation and regeneration of transgenic plants expressing the resistance gene for further evaluation.

  16. Polymorphism, monomorphism, and sequences in conserved microsatellites in primate species.

    PubMed

    Blanquer-Maumont, A; Crouau-Roy, B

    1995-10-01

    Dimeric short tandem repeats are a source of highly polymorphic markers in the mammalian genome. Genetic variation at these hypervariable loci is extensively used for linkage analysis, for the identification of individuals, and may be useful for interpopulation and interspecies studies. In this paper, we analyze the variability and the sequences of a segment including three microsatellites, first described in man, in several species of primates (chimpanzee, orangutan, gibbon, and macaque) using the heterologous primers (human primers). This region is located on the human chromosome 6p, near the tumor necrosis factor genes, in the major histocompatibility complex. The fact that these primers work in all species studied indicates that they are conserved throughout the different lineages of the two superfamilies, the Hominoidea and the Cercopithecoidea, represented by the macaques. However, the intervening sequence displays intraspecific and interspecific variability. The sites of base substitutions and the insertion/deletion events are not evenly distributed within this region. The data suggest that it is necessary to have a minimal number of repeats to increase the rate of mutation sufficiently to allow the development of polymorphism. In some species, the microsatellites present single base variations which reduce the number of contiguous repeats, thus apparently slowing the rate of additional slippage events. Species with such variations or a low number of repeats are monomorphic. These microsatellite sequences are informative in the comparison of closely related species and reflect the phylogeny of the Old World monkeys, apes, and man.
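
    The point about single base variations reducing the number of contiguous repeats can be made concrete with a small sketch. The code below measures the longest uninterrupted run of a repeat unit; the function name, the choice of "CA" as the unit and both toy sequences are assumptions for illustration and do not come from the paper.

```python
import re


def longest_repeat_run(seq: str, unit: str) -> int:
    """Largest number of uninterrupted tandem copies of `unit` found in `seq`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(seq)]
    return max(runs, default=0)


# Toy loci: same overall length, but one carries a single base change (CA -> CG).
perfect = "GG" + "CA" * 12 + "TT"
interrupted = "GG" + "CA" * 6 + "CG" + "CA" * 5 + "TT"

print(longest_repeat_run(perfect, "CA"))
print(longest_repeat_run(interrupted, "CA"))
```

    In the toy pair, one substitution splits a 12-copy tract into runs of 6 and 5, which is the kind of shortening the abstract associates with monomorphism.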

  17. Localization of the labile disulfide bond between SU and TM of the murine leukemia virus envelope protein complex to a highly conserved CWLC motif in SU that resembles the active-site sequence of thiol-disulfide exchange enzymes.

    PubMed Central

    Pinter, A; Kopelman, R; Li, Z; Kayman, S C; Sanders, D A

    1997-01-01

    Previous studies have indicated that the surface (SU) and transmembrane (TM) subunits of the envelope protein (Env) of murine leukemia viruses (MuLVs) are joined by a labile disulfide bond that can be stabilized by treatment of virions with thiol-specific reagents. In the present study this observation was extended to the Envs of additional classes of MuLV, and the cysteines of SU involved in this linkage were mapped by proteolytic fragmentation analyses to the CWLC sequence present at the beginning of the C-terminal domain of SU. This sequence is highly conserved across a broad range of distantly related retroviruses and resembles the CXXC motif present at the active site of thiol-disulfide exchange enzymes. A model is proposed in which rearrangements of the SU-TM intersubunit disulfide linkage, mediated by the CWLC sequence, play roles in the assembly and function of the Env complex. PMID:9311907
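
    CXXC denotes two cysteines separated by any two residues, and CWLC is one instance of it. A minimal sketch of scanning a protein sequence for such sites follows; the function name and the toy sequence are hypothetical, and the sequence is not a real envelope protein.

```python
import re
from typing import List, Tuple


def find_cxxc(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based position, tetrapeptide) for every C-X-X-C site.

    A lookahead is used so that overlapping sites are all reported.
    """
    return [
        (match.start() + 1, match.group(1))
        for match in re.finditer(r"(?=(C..C))", protein.upper())
    ]


# Made-up sequence for illustration only.
toy_protein = "MKTCWLCAAGHCPYCS"
print(find_cxxc(toy_protein))
```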

  18. Conserved Sequence Preferences Contribute to Substrate Recognition by the Proteasome

    PubMed Central

    Yu, Houqing; Singh Gautam, Amit K.; Wilmington, Shameika R.; Wylie, Dennis; Martinez-Fonts, Kirby; Kago, Grace; Warburton, Marie; Chavali, Sreenivas; Inobe, Tomonao; Finkelstein, Ilya J.; Babu, M. Madan

    2016-01-01

    The proteasome has pronounced preferences for the amino acid sequence of its substrates at the site where it initiates degradation. Here, we report that modulating these sequences can tune the steady-state abundance of proteins over 2 orders of magnitude in cells. This is the same dynamic range as seen for inducing ubiquitination through a classic N-end rule degron. The stability and abundance of His3 constructs dictated by the initiation site affect survival of yeast cells and show that variation in proteasomal initiation can affect fitness. The proteasome's sequence preferences are linked directly to the affinity of the initiation sites to their receptor on the proteasome and are conserved between Saccharomyces cerevisiae, Schizosaccharomyces pombe, and human cells. These findings establish that the sequence composition of unstructured initiation sites influences protein abundance in vivo in an evolutionarily conserved manner and can affect phenotype and fitness. PMID:27226608

  19. Refining multiple sequence alignments with conserved core regions

    PubMed Central

    Chakrabarti, Saikat; Lanczycki, Christopher J.; Panchenko, Anna R.; Przytycka, Teresa M.; Thiessen, Paul A.; Bryant, Stephen H.

    2006-01-01

    Accurate multiple sequence alignments of proteins are very important to several areas of computational biology and provide an understanding of phylogenetic history of domain families, their identification and classification. This article presents a new algorithm, REFINER, that refines a multiple sequence alignment by iterative realignment of its individual sequences with the predetermined conserved core (block) model of a protein family. Realignment of each sequence can correct misalignments between a given sequence and the rest of the profile and at the same time preserves the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement of alignment after refinement. This can be inferred from the increased alignment score and enhanced sensitivity for database searching using the sequence profiles derived from refined alignments compared with the original alignments. A standalone version of the program is available by ftp distribution and will be incorporated into the next release of the Cn3D structure/alignment viewer. PMID:16707662

  20. Sequence and structure conservation in a protein core.

    PubMed

    Rodionov, M A; Blundell, T L

    1998-11-15

    In order to study structural aspects of sequence conservation in families of homologous proteins, we have analyzed structurally aligned sequences of 585 proteins grouped into 128 homologous families. The conservation of a residue in a family is defined as the average residue similarity in a given position of aligned sequences. The residue similarities were expressed in the form of log-odds substitution tables that take into account the environments of amino acids in three-dimensional structures. The protein core is defined as those residues that have less than 7% solvent accessibility. The density of a protein core is described in terms of atom packing, which is investigated as a criterion for residue substitution and conservation. Although there is no significant correlation between sequence conservation and average atom packing around nonpolar residues such as leucine, valine and isoleucine, a significant correlation is observed for polar residues in the protein core. This may be explained by the hydrogen bonds in which polar residues are involved; the better their protection from water access the more stable should be the structure in that position.
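
    The definition of conservation as the average residue similarity at an aligned position can be sketched as follows. This is an interpretation, not the authors' code: it assumes the average is taken over all residue pairs in a column, and it substitutes a crude toy similarity function for the environment-specific log-odds tables used in the paper. All names and the toy alignment are hypothetical.

```python
from itertools import combinations
from typing import Callable, Sequence


def column_conservation(column: Sequence[str],
                        similarity: Callable[[str, str], float]) -> float:
    """Average similarity over all residue pairs in one alignment column."""
    pairs = list(combinations(column, 2))
    if not pairs:
        raise ValueError("need at least two aligned residues")
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def toy_similarity(a: str, b: str) -> float:
    """Made-up scores: 1.0 for identity, 0.5 within a small hydrophobic set, else 0.0."""
    if a == b:
        return 1.0
    hydrophobic = set("LIVMA")
    if a in hydrophobic and b in hydrophobic:
        return 0.5
    return 0.0


# Toy ungapped alignment of four sequences, three columns.
alignment = ["LKD", "IKE", "LKD", "VRD"]
for index, column in enumerate(zip(*alignment), start=1):
    print(index, round(column_conservation(column, toy_similarity), 3))
```

    Swapping `toy_similarity` for a lookup into a real substitution table would bring the sketch closer to the description above, though the paper's tables also depend on each residue's structural environment.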

  1. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

    PubMed

    Nagar, Anurag; Hahsler, Michael

    2013-01-01

    Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to
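
    The core idea of comparing segments by p-mer frequency counts instead of aligning them can be illustrated with a short sketch. The abstract does not specify the distance measure, so the Manhattan distance between count vectors used here is an assumption, as are the function names, the choice p=3 and the toy segments.

```python
from collections import Counter


def pmer_counts(segment: str, p: int = 3) -> Counter:
    """Count every overlapping substring of length p in a segment."""
    return Counter(segment[i:i + p] for i in range(len(segment) - p + 1))


def pmer_distance(seg_a: str, seg_b: str, p: int = 3) -> int:
    """Manhattan distance between the p-mer count vectors of two segments.

    Used here as a cheap stand-in for edit distance; the choice of metric
    is an assumption of this sketch.
    """
    counts_a, counts_b = pmer_counts(seg_a, p), pmer_counts(seg_b, p)
    return sum(abs(counts_a[k] - counts_b[k]) for k in counts_a.keys() | counts_b.keys())


# Toy segments: the second differs from the first by one substitution.
print(pmer_distance("ACGTACGT", "ACGTACGT"))
print(pmer_distance("ACGTACGT", "ACGTACGA"))
```

    Each count vector is built in a single pass over the segment, which is consistent with the linear run time claimed above; the clustering step that groups segments into quasi-alignments is not shown.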

  2. Unique sequence features of the Human Adenovirus 31 complete genomic sequence are conserved in clinical isolates

    PubMed Central

    2009-01-01

    Background: Human adenoviruses (HAdV) cause a broad spectrum of diseases. One of the most severe forms of adenovirus infection is a disseminated disease resulting in significant morbidity and mortality. Several reports in recent years have identified HAdV-31 from species A (HAdV-A31) as a cause of disseminated disease in children following haematopoietic stem cell transplantation (hSCT) and liver transplantation. We sequenced and analyzed the complete genome of the HAdV-A31 prototype strain to uncover unique sequence motifs associated with its high virulence. Moreover, we sequenced coding regions known to be essential for tropism and virulence (early transcription units E1A, E3, E4, the fiber knob and the penton base) of HAdV-A31 clinical isolates from patients with disseminated disease. Results: The HAdV-A31 genome is 33763 base pairs (bp) long, with a GC content of 46.36%. Nucleotide alignment to the closely related HAdV-A12 revealed an overall homology of 84.2%. The genome organization into early, intermediate and late regions is similar to HAdV-A12. Sequence analysis of the prototype strain showed unique sequence features such as an immunoglobulin-like domain in the species A specific gene product E3 CR1 beta and a potentially integrin binding RGD motif in the C-terminal region of the protein IX. These features were conserved in all analyzed clinical isolates. Overall, amino acid sequences of clinical isolates were highly conserved compared to the prototype (99.2 to 100%), but a synonymous/non synonymous ratio (S/N) of 2.36 in E3 CR1 beta suggested positive selection. Conclusion: Unique sequence features of HAdV-A31 may enhance its ability to escape the host's immune surveillance and may facilitate a promiscuous tropism for various tissues. Moderate evolution of clinical isolates did not indicate the emergence of new HAdV-A31 subtypes in recent years. PMID:19939241

  3. The highly conserved amino acid sequence motif Tyr-Gly-Asp-Thr-Asp-Ser in alpha-like DNA polymerases is required by phage phi 29 DNA polymerase for protein-primed initiation and polymerization.

    PubMed Central

    Bernad, A; Lázaro, J M; Salas, M; Blanco, L

    1990-01-01

    The alpha-like DNA polymerases from bacteriophage phi 29 and other viruses, prokaryotes and eukaryotes contain an amino acid consensus sequence that has been proposed to form part of the dNTP binding site. We have used site-directed mutants to study five of the six highly conserved consecutive amino acids corresponding to the most conserved C-terminal segment (Tyr-Gly-Asp-Thr-Asp-Ser). Our results indicate that in phi 29 DNA polymerase this consensus sequence, although irrelevant for the 3'→5' exonuclease activity, is essential for initiation and elongation. Based on these results and on its homology with known or putative metal-binding amino acid sequences, we propose that in phi 29 DNA polymerase the Tyr-Gly-Asp-Thr-Asp-Ser consensus motif is part of the dNTP binding site, involved in the synthetic activities of the polymerase (i.e., initiation and polymerization), and that it is involved particularly in the metal binding associated with the dNTP site. PMID:2191296

  4. Conserved intergenic sequences revealed by CTAG-profiling in Salmonella: thermodynamic modeling for function prediction

    PubMed Central

    Tang, Le; Zhu, Songling; Mastriani, Emilio; Fang, Xin; Zhou, Yu-Jie; Li, Yong-Guo; Johnston, Randal N.; Guo, Zheng; Liu, Gui-Rong; Liu, Shu-Lin

    2017-01-01

    Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions, and substitution of a nucleotide of the CTAG sequence would radically rearrange the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process to remove CTAG taking place inward from both terminals of the horizontally acquired segment. PMID:28262684
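
    The tetranucleotide profiling described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `count_motif` and `expected_count` are hypothetical, and the expectation assumes independent bases drawn from the sequence's own composition, which is a simplification chosen here for illustration. Because CTAG is its own reverse complement, counting on one strand is enough.

```python
from collections import Counter


def count_motif(seq: str, motif: str = "CTAG") -> int:
    """Count overlapping occurrences of `motif` in `seq` (case-insensitive)."""
    seq = seq.upper()
    motif = motif.upper()
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def expected_count(seq: str, motif: str = "CTAG") -> float:
    """Expected motif count if bases were independent (assumed null model).

    The probability of the motif at one position is the product of the
    observed frequencies of its bases; multiply by the number of windows.
    """
    seq = seq.upper()
    motif = motif.upper()
    n, k = len(seq), len(motif)
    if n < k:
        return 0.0
    freqs = Counter(seq)
    p = 1.0
    for base in motif:
        p *= freqs[base] / n
    return p * (n - k + 1)


if __name__ == "__main__":
    toy = "CTAGCTAG"  # toy sequence, not real Salmonella data
    print(count_motif(toy), expected_count(toy))
```

    Comparing the observed count with the expected count for each intergenic region gives a rough over- or under-representation ratio; any threshold for calling a region "conserved" would be a further assumption.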

  5. Conserved intergenic sequences revealed by CTAG-profiling in Salmonella: thermodynamic modeling for function prediction.

    PubMed

    Tang, Le; Zhu, Songling; Mastriani, Emilio; Fang, Xin; Zhou, Yu-Jie; Li, Yong-Guo; Johnston, Randal N; Guo, Zheng; Liu, Gui-Rong; Liu, Shu-Lin

    2017-03-06

    Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions, and substitution of a nucleotide of the CTAG sequence would radically rearrange the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process to remove CTAG taking place inward from both terminals of the horizontally acquired segment.

  6. Conserved intergenic sequences revealed by CTAG-profiling in Salmonella: thermodynamic modeling for function prediction

    NASA Astrophysics Data System (ADS)

    Tang, Le; Zhu, Songling; Mastriani, Emilio; Fang, Xin; Zhou, Yu-Jie; Li, Yong-Guo; Johnston, Randal N.; Guo, Zheng; Liu, Gui-Rong; Liu, Shu-Lin

    2017-03-01

    Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions, and substitution of a nucleotide of the CTAG sequence would radically rearrange the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process to remove CTAG taking place inward from both terminals of the horizontally acquired segment.

  7. Protein sequence conservation and stable molecular evolution reveals influenza virus nucleoprotein as a universal druggable target.

    PubMed

    Babar, Mustafeez Mujtaba; Zaidi, Najam-us-Sahar Sadaf

    2015-08-01

    The high mutation rate in influenza virus genome and appearance of drug resistance calls for a constant effort to identify alternate drug targets and develop new antiviral strategies. The internal proteins of the virus can be exploited as a potential target for therapeutic interventions. Among these, the nucleoprotein (NP) is the most abundant protein that provides structural and functional support to the viral replication machinery. The current study aims at analysis of protein sequence polymorphism patterns, degree of molecular evolution and sequence conservation as a function of potential druggability of nucleoprotein. We analyzed a universal set of amino acid sequences, (n=22,000) and, in order to identify and correlate the functionally conserved, druggable regions across different parameters, classified them on the basis of host organism, strain type and continental region of sample isolation. The results indicated that around 95% of the sequence length was conserved, with at least 7 regions conserved across the protein among various classes. Moreover, the highly variable regions, though very limited in number, were found to be positively selected indicating, thereby, the high degree of protein stability against various hosts and spatio-temporal references. Furthermore, on mapping the conserved regions on the protein, 7 drug binding pockets in the functionally important regions of the protein were revealed. The results, therefore, collectively indicate that nucleoprotein is a highly conserved and stable viral protein that can potentially be exploited for development of broadly effective antiviral strategies.

  8. Patterns of sequence conservation in the S-Layer proteins and related sequences in Clostridium difficile.

    PubMed

    Calabi, Emanuela; Fairweather, Neil

    2002-07-01

    Clostridium difficile is the etiological agent of antibiotic-associated diarrhea. Among the factors that may play a role in infection are S-layer proteins (SLPs). Previous work has shown these to consist mainly of two components, resulting from the cleavage of a precursor encoded by the slpA gene. The high-molecular-weight (MW) subunit is related both to amidases from B. subtilis and to at least another 28 gene products in C. difficile strain 630. To gain insight into the functions of the SLPs and related proteins, we have further investigated the pattern of variability both at the slpA locus and at six nearby paralogs. Sequencing of the slpA gene from an S-layer group II strain and a variant S-layer group strain confirms a high degree of divergence in the low-MW SLP, which may result from diversifying selection. A highly conserved motif, however, is found at the C terminus in all low-MW subunits and may be essential for SlpA precursor cleavage. In strain 167, a variant cleavage product is present, suggesting a secondary processing site. Southern blotting analysis shows slpA-like open reading frames (ORFs) 2 to 7 to be conserved in all nine strains tested, with one exception: ORF2, which encodes a 66-kDa polypeptide coextracted at low pH with the main SLPs in strain 630, may be partially deleted in strain 167. Polymorphism within the slpA-ORF7 cluster may be more pronounced in the region proximal to the slpA gene. Unexpectedly, a high-MW subunit probe cross hybridizes to sequences outside the slpA locus, which appear to vary in number in different strains.

  9. Sequencing Conservation Actions Through Threat Assessments in the Southeastern United States

    Treesearch

    Robert D. Sutter; Christopher C. Szell

    2006-01-01

    The identification of conservation priorities is one of the leading issues in conservation biology. We present a project of The Nature Conservancy, called Sequencing Conservation Actions, which prioritizes conservation areas and identifies foci for crosscutting strategies at various geographic scales. We use the term “Sequencing” to mean an ordering of actions over...

  10. Identification of a conserved sequence in the non-coding regions of many human genes.

    PubMed Central

    Donehower, L A; Slagle, B L; Wilde, M; Darlington, G; Butel, J S

    1989-01-01

    We have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared to nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. The sequence element was usually found once or twice in a gene, either in an intron or in the 5' or 3' flanking regions. It did not share any similarities with known short interspersed nucleotide elements (SINEs) or presently known gene regulatory elements. This element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicates that it is distinct from other types of short interspersed repetitive elements. It is possible that the element may have a cis-acting functional role in the genome. Images PMID:2536922

  11. Conservation of MHC class II DOA sequences among carnivores.

    PubMed

    Soll, S J; Stewart, B S; Lehman, N

    2005-03-01

    We obtained the nucleotide sequence for most of the major histocompatibility complex (MHC) class II DOA locus for Weddell, leopard, northern elephant, and southern elephant seals and from the coyote and compared them to all known DOA data available to date. We found generally low levels of interspecific polymorphisms, providing further support for stabilizing selection acting on the DOA locus. This suggests that DO gene products play a substantial functional role in the regulation of antigen presentation. A seven-amino-acid motif of VWRLPEF was found to be conserved across all DOA sequences and may be a DO-specific recognition element.

  12. Sequence and structural conservation in RNA ribose zippers

    SciTech Connect

    Tamura, Makio; Holbrook, Stephen R.

    2002-03-01

    The ribose zipper, an important element of RNA tertiary structure, is characterized by consecutive hydrogen-bonding interactions between ribose 2'-hydroxyls from different regions of an RNA chain or between RNA chains. These tertiary contacts have previously been observed to also involve base-backbone and base-base interactions (A-minor type). We searched for ribose zipper tertiary interactions in the crystal structures of the large ribosomal subunit RNAs of Haloarcula marismortui and Deinococcus radiodurans, and the small ribosomal subunit RNA of Thermus thermophilus and identified a total of 97 ribose zippers. Of these, 20 were found in T. thermophilus 16S rRNA, 44 in H. marismortui 23S rRNA (plus 2 bridging 5S and 23S rRNAs) and 30 in D. radiodurans 23S rRNA (plus 1 bridging 5S and 23S rRNAs). These were analyzed in terms of sequence conservation, structural conservation and stability, location in secondary structure, and phylogenetic conservation. Eleven types of ribose zippers were defined based on ribose-base interactions. Of these 11, seven were observed in the ribosomal RNAs. The most common of these is the canonical ribose zipper, originally observed in the P4-P6 group I intron fragment. All ribose zippers were formed by antiparallel chain interactions and only a single example extended beyond two residues, forming an overlapping ribose zipper of three consecutive residues near the small subunit A-site. Almost all ribose zippers link stem (Watson-Crick duplex) or stem-like (base-paired), with loop (external, internal, or junction) chain segments. About two-thirds of the observed ribose zippers interact with ribosomal proteins. Most of these ribosomal proteins bridge the ribose zipper chain segments with basic amino acid residues hydrogen bonding to the RNA backbone. Proteins involved in crucial ribosome function and in early stages of ribosomal assembly also stabilize ribose zipper interactions. All ribose zippers show strong sequence conservation

  13. Local function conservation in sequence and structure space.

    PubMed

    Weinhold, Nils; Sander, Oliver; Domingues, Francisco S; Lengauer, Thomas; Sommer, Ingolf

    2008-07-04

    We assess the variability of protein function in protein sequence and structure space. Various regions in this space exhibit considerable difference in the local conservation of molecular function. We analyze and capture local function conservation by means of logistic curves. Based on this analysis, we propose a method for predicting molecular function of a query protein with known structure but unknown function. The prediction method is rigorously assessed and compared with a previously published function predictor. Furthermore, we apply the method to 500 functionally unannotated PDB structures and discuss selected examples. The proposed approach provides a simple yet consistent statistical model for the complex relations between protein sequence, structure, and function. The GOdot method is available online (http://godot.bioinf.mpi-inf.mpg.de).

  14. Internal epitope tagging informed by relative lack of sequence conservation

    PubMed Central

    Burg, Leonard; Zhang, Karen; Bonawitz, Tristan; Grajevskaja, Viktorija; Bellipanni, Gianfranco; Waring, Richard; Balciunas, Darius

    2016-01-01

    Many experimental techniques rely on specific recognition and stringent binding of proteins by antibodies. This can readily be achieved by introducing an epitope tag. We employed an approach that uses a relative lack of evolutionary conservation to inform epitope tag site selection, followed by integration of the tag-coding sequence into the endogenous locus in zebrafish. We demonstrate that an internal epitope tag is accessible for antibody binding, and that tagged proteins retain wild type function. PMID:27892520

  15. Conservation patterns in different functional sequence categories of divergent Drosophila species

    SciTech Connect

    Papatsenko, Dmitri; Kislyuk, Andrey; Levine, Michael; Dubchak, Inna

    2005-10-01

    We have explored the distributions of fully conserved ungapped blocks in genome-wide pairwise alignments of recently completed species of Drosophila: D. yakuba, D. ananassae, D. pseudoobscura, D. virilis and D. mojavensis. Based on these distributions we have found that nearly every functional sequence category possesses its own distinctive conservation pattern, sometimes independent of the overall sequence conservation level. In the coding and regulatory regions, the ungapped blocks were longer than in introns, UTRs and non-functional sequences. At the same time, the blocks in the coding regions carried a 3N+2 signature characteristic of synonymous substitutions in the 3rd codon positions. Larger block sizes in transcription regulatory regions can be explained by the presence of conserved arrays of binding sites for transcription factors. We also have shown that the longest ungapped blocks, or 'ultraconserved' sequences, are associated with specific gene groups, including those encoding ion channels and components of the cytoskeleton. We discussed how restrained conservation patterns may help in mapping functional sequence categories and improving genome annotation.
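
    The notion of a fully conserved ungapped block in a pairwise alignment can be made concrete with a short sketch. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the two inputs are assumed to be already-aligned strings of equal length, and `-` is assumed to mark a gap. A block is a maximal run of columns in which both sequences carry the same non-gap character.

```python
from collections import Counter


def conserved_block_lengths(a: str, b: str, gap: str = "-") -> list[int]:
    """Return the lengths of maximal runs of identical, ungapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    lengths = []
    run = 0
    for x, y in zip(a, b):
        if x == y and x != gap:
            run += 1          # column is identical and ungapped: extend block
        else:
            if run:
                lengths.append(run)  # mismatch or gap closes the block
            run = 0
    if run:
        lengths.append(run)
    return lengths


def length_mod3_profile(lengths: list[int]) -> Counter:
    """Tally block lengths modulo 3; an excess of remainder 2 would match
    the 3N+2 pattern the abstract attributes to coding regions."""
    return Counter(n % 3 for n in lengths)


if __name__ == "__main__":
    # Toy alignment, not real Drosophila data.
    blocks = conserved_block_lengths("ACGTAC-GTT", "ACCTACAGTT")
    print(blocks, length_mod3_profile(blocks))
```

    The 3N+2 pattern arises because two mismatches at third codon positions that are N+1 codons apart enclose 3N+2 conserved bases; the modulo-3 tally is one simple way to look for it.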

  16. Identification of a conserved sequence in the non-coding regions of many human genes

    SciTech Connect

    Donehower, L.A.; Slagle, B.L.; Wilde, M.; Darlington, G.; Butel, J.S.

    1989-01-25

    The authors have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared to nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. This element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicates that it is distinct from other types of short interspersed repetitive elements. It is possible that the element may have a cis-acting functional role in the genome.

  17. Conservative Patch Algorithm and Mesh Sequencing for PAB3D

    NASA Technical Reports Server (NTRS)

    Pao, S. P.; Abdol-Hamid, K. S.

    2005-01-01

    A mesh-sequencing algorithm and a conservative patched-grid-interface algorithm (hereafter the Patch Algorithm) have been incorporated into the PAB3D code, which is a computer program that solves the Navier-Stokes equations for the simulation of subsonic, transonic, or supersonic flows surrounding an aircraft or other complex aerodynamic shapes. These algorithms are efficient and flexible, and have added tremendously to the capabilities of PAB3D. The mesh-sequencing algorithm makes it possible to perform preliminary computations using only a fraction of the grid cells (provided the original cell count is divisible by an integer) along any grid coordinate axis, independently of the other axes. The patch algorithm addresses another critical need in multi-block grid situations where the cell faces of adjacent grid blocks may not coincide, leading to errors in calculating fluxes of conserved physical quantities across interfaces between the blocks. The patch algorithm, based on the Stokes integral formulation of the applicable conservation laws, effectively matches each of the interfacial cells on one side of the block interface to the corresponding fractional cell area pieces on the other side. This approach is comprehensive and unified such that all interface topology is automatically processed without user intervention. This algorithm is implemented in a preprocessing code that creates a cell-by-cell database that will maintain flux conservation at any level of full or reduced grid density as the user may choose by way of the mesh-sequencing algorithm. These two algorithms have enhanced the numerical accuracy of the code, reduced the time and effort for grid preprocessing, and provided users with the flexibility of performing computations at any desired full or reduced grid resolution to suit their specific computational requirements.
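
    The divisibility condition mentioned for mesh sequencing can be shown with a small sketch. This is not PAB3D code: the function name `coarse_node_indices` and the node-skipping scheme are assumptions made for illustration. The idea sketched is that a structured block with `n` cells along an axis has `n + 1` grid nodes, and keeping every `f`-th node yields `n / f` coarser cells, which only works when `f` divides `n`. Each axis gets its own factor, independently of the others.

```python
def coarse_node_indices(cells_per_axis, factors):
    """Return, per axis, the node indices kept when coarsening by `factors`.

    cells_per_axis: number of cells along each grid axis, e.g. (8, 6, 4)
    factors: integer coarsening factor for each axis, e.g. (2, 3, 1)
    """
    if len(cells_per_axis) != len(factors):
        raise ValueError("need one coarsening factor per axis")
    kept = []
    for n_cells, f in zip(cells_per_axis, factors):
        if f < 1 or n_cells % f != 0:
            # The cell count must be divisible by the factor so that the
            # last retained node coincides with the block boundary.
            raise ValueError(f"{n_cells} cells not divisible by factor {f}")
        kept.append(list(range(0, n_cells + 1, f)))
    return kept


if __name__ == "__main__":
    for axis, nodes in enumerate(coarse_node_indices((8, 6, 4), (2, 3, 1))):
        print(axis, nodes)
```

    With the hypothetical block above, the first axis keeps 5 of 9 nodes (4 cells instead of 8), the second keeps 3 of 7, and the third is left at full resolution.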

  18. Sequence conservation predicts T cell reactivity against ragweed allergens

    PubMed Central

    Pham, John; Oseroff, Carla; Hinz, Denise; Sidney, John; Paul, Sinu; Greenbaum, Jason; Vita, Randi; Phillips, Elizabeth; Mallal, Simon; Peters, Bjoern; Sette, Alessandro

    2016-01-01

    Background Ragweed is a major cause of seasonal allergy, affecting millions of people worldwide. Several allergens have been defined based on IgE reactivity, but their relative immunogenicity in terms of T cell responses has not been studied. Objective We comprehensively characterized T cell responses from atopic, ragweed-allergic subjects to Amb a 1, Amb a 3, Amb a 4, Amb a 5, Amb a 6, Amb a 8, Amb a 9, Amb a 10, Amb a 11, and Amb p 5, and examined their correlation with serological reactivity and sequence conservation in other allergens. Methods Peripheral blood mononuclear cells (PBMCs) from donors positive for IgE toward ragweed extracts were tested, after in vitro expansion, for secretion of IL-5 (a representative Th2 cytokine) and IFNγ (Th1) in response to a panel of overlapping peptides spanning the above listed allergens. Results Three previously identified dominant T cell epitopes (Amb a 1 176–191, 200–215, and 344–359) were confirmed and three novel dominant epitopes (Amb a 1 280–295, 304–319, and 320–335) were identified. Amb a 1, the dominant IgE allergen, was also the dominant T cell allergen, but dominance patterns for T cell and IgE responses for the other ragweed allergens did not correlate. Dominance for T cell responses correlated with conservation of ragweed epitopes with sequences of other well-known allergens. Conclusion and clinical relevance These results provide the first assessment of the hierarchy of T cell reactivity in ragweed allergens, which is distinct from that observed for IgE reactivity and influenced by T cell epitope sequence conservation. The results suggest that ragweed allergens associated with lesser IgE reactivity and significant T cell reactivity may be targeted for T cell immunotherapy, and further support the development of immunotherapies against epitopes conserved across species to generate broad reactivity against many common allergens. PMID:27359111

  19. Conservation patterns in angiosperm rDNA ITS2 sequences.

    PubMed Central

    Hershkovitz, M A; Zimmer, E A

    1996-01-01

    The two internal transcribed spacers (ITS1 and ITS2) of nuclear ribosomal DNA have become commonly exploited sources of informative variation for interspecific-/intergeneric-level phylogenetic analyses among angiosperms and other eukaryotes. We present an alignment in which one-third to one-half of the ITS2 sequence is alignable above the family level in angiosperms and a phenetic analysis showing that ITS2 contains information sufficient to diagnose lineages at several hierarchical levels. Base compositional analysis shows that angiosperm ITS2 is inherently GC-rich, and that the proportion of T is much more variable than that for other bases. We propose a general model of angiosperm ITS2 secondary structure that shows common pairing relationships for most of the conserved sequence tracts. Variations in our secondary structure predictions for sequences from different taxa indicate that compensatory mutation is not limited to paired positions. PMID:8760866

  20. High-Throughput Sequencing Technologies

    PubMed Central

    Reuter, Jason A.; Spacek, Damek; Snyder, Michael P.

    2015-01-01

    Summary The human genome sequence has profoundly altered our understanding of biology, human diversity and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past ten years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them as well as the challenges facing current sequencing platforms and their clinical application. PMID:26000844

  1. In Vivo Enhancer Analysis Chromosome 16 Conserved Noncoding Sequences

    SciTech Connect

    Pennacchio, Len A.; Ahituv, Nadav; Moses, Alan M.; Nobrega, Marcelo; Prabhakar, Shyam; Shoukry, Malak; Minovitsky, Simon; Visel, Axel; Dubchak, Inna; Holt, Amy; Lewis, Keith D.; Plajzer-Frick, Ingrid; Akiyama, Jennifer; De Val, Sarah; Afzal, Veena; Black, Brian L.; Couronne, Olivier; Eisen, Michael B.; Rubin, Edward M.

    2006-02-01

    The identification of enhancers with predicted specificities in vertebrate genomes remains a significant challenge that is hampered by a lack of experimentally validated training sets. In this study, we leveraged extreme evolutionary sequence conservation as a filter to identify putative gene regulatory elements and characterized the in vivo enhancer activity of human-fish conserved and ultraconserved noncoding elements on human chromosome 16 as well as such elements from elsewhere in the genome. We initially tested 165 of these extremely conserved sequences in a transgenic mouse enhancer assay and observed that 48 percent (79/165) functioned reproducibly as tissue-specific enhancers of gene expression at embryonic day 11.5. While driving expression in a broad range of anatomical structures in the embryo, the majority of the 79 enhancers drove expression in various regions of the developing nervous system. Studying a set of DNA elements that specifically drove forebrain expression, we identified DNA signatures specifically enriched in these elements and used these parameters to rank all ~3,400 human-fugu conserved noncoding elements in the human genome. The testing of the top predictions in transgenic mice resulted in a three-fold enrichment for sequences with forebrain enhancer activity. These data dramatically expand the catalogue of in vivo-characterized human gene enhancers and illustrate the future utility of such training sets for a variety of biological applications including decoding the regulatory vocabulary of the human genome.

  2. Conservation of sequence and function in fertilization of the cortical granule serine protease in echinoderms

    PubMed Central

    Oulhen, Nathalie; Xu, Dongdong; Wessel, Gary M.

    2014-01-01

    Conservation of the cortical granule serine protease during fertilization in echinoderms was tested both functionally in sea stars, and computationally throughout the echinoderm phylum. We find that the inhibitor of serine protease (soybean trypsin inhibitor) effectively blocks proper transition of the sea star fertilization envelope into a protective sperm repellent, whereas inhibitors of the other main types of proteases had no effect. Scanning the transcriptomes of 15 different echinoderm ovaries revealed sequences of high conservation to the originally identified sea urchin cortical serine protease, CGSP1. These conserved sequences contained the catalytic triad necessary for enzymatic activity, and the tandemly repeated LDLr-like repeats. We conclude that the protease involved in the slow block to polyspermy is an essential and conserved element of fertilization in echinoderms, and may provide an important reagent for identification and testing of the cell surface proteins in eggs necessary for sperm binding. PMID:24878526

  3. Conservation of sequence and function in fertilization of the cortical granule serine protease in echinoderms.

    PubMed

    Oulhen, Nathalie; Xu, Dongdong; Wessel, Gary M

    2014-08-01

    Conservation of the cortical granule serine protease during fertilization in echinoderms was tested both functionally in sea stars, and computationally throughout the echinoderm phylum. We find that the inhibitor of serine protease (soybean trypsin inhibitor) effectively blocks proper transition of the sea star fertilization envelope into a protective sperm repellent, whereas inhibitors of the other main types of proteases had no effect. Scanning the transcriptomes of 15 different echinoderm ovaries revealed sequences of high conservation to the originally identified sea urchin cortical serine protease, CGSP1. These conserved sequences contained the catalytic triad necessary for enzymatic activity, and the tandemly repeated LDLr-like repeats. We conclude that the protease involved in the slow block to polyspermy is an essential and conserved element of fertilization in echinoderms, and may provide an important reagent for identification and testing of the cell surface proteins in eggs necessary for sperm binding.

  4. 77 FR 74167 - Information Collection Request: Highly Erodible Land Conservation and Wetland Conservation

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-12-13

    ... Farm Service Agency Information Collection Request: Highly Erodible Land Conservation and Wetland Conservation AGENCIES: Farm Service Agency, USDA. ACTION: Notice; request for comments. SUMMARY: In accordance... associated with Highly Erodible Land Conservation and Wetland Conservation certification requirements....

  5. Genetic mapping of legume orthologs reveals high conservation of synteny between lentil species and the sequenced genomes of Medicago and chickpea

    PubMed Central

    Gujaria-Verma, Neha; Vail, Sally L.; Carrasquilla-Garcia, Noelia; Penmetsa, R. Varma; Cook, Douglas R.; Farmer, Andrew D.; Vandenberg, Albert; Bett, Kirstin E.

    2014-01-01

    Lentil (Lens culinaris Medik.) is a global food crop with increasing importance for food security in south Asia and other regions. Lens ervoides, a wild relative of cultivated lentil, is an important source of agronomic trait variation. Lens is a member of the galegoid clade of the Papilionoideae family, which includes other important dietary legumes such as chickpea (Cicer arietinum) and pea (Pisum sativum), and the sequenced model legume Medicago truncatula. Understanding the genetic structure of Lens spp. in relation to more fully sequenced legumes would allow leveraging of genomic resources. A set of 1107 TOG-based amplicons were identified in L. ervoides and a subset thereof used to design SNP markers for mapping. A map of L. ervoides consisting of 377 SNP markers spread across seven linkage groups was developed using a GoldenGate genotyping array and single SNP marker assays. Comparison with maps of M. truncatula and L. culinaris documented considerable shared synteny and led to the identification of a few major translocations and a major inversion that distinguish Lens from M. truncatula, as well as a translocation that distinguishes L. culinaris from L. ervoides. The identification of chromosome-level differences among Lens spp. will aid in the understanding of introgression of genes from L. ervoides into cultivated L. culinaris, furthering genetic research and breeding applications in lentil. PMID:25538716

  6. Genome-Wide Identification and Comparative Analysis of Conserved and Novel MicroRNAs in Grafted Watermelon by High-Throughput Sequencing

    PubMed Central

    Liu, Na; Yang, Jinghua; Guo, Shaogui; Xu, Yong; Zhang, Mingfang

    2013-01-01

    MicroRNAs (miRNAs) are a class of endogenous small non-coding RNAs involved in post-transcriptional gene regulation and play a critical role in plant growth, development and stress response. However, less is known about miRNA involvement in grafting behaviors, especially in watermelon (Citrullus lanatus L.), which is one of the most important agricultural crops worldwide. Grafting is commonly used in watermelon production to improve adaptation to abiotic and biotic stresses, in particular to the soil-borne fusarium wilt disease. In this study, Solexa sequencing was used to discover small RNA populations and compare miRNAs on a genome-wide scale in a watermelon grafting system. A total of 11,458,476, 11,614,094 and 9,339,089 raw reads representing 2,957,751, 2,880,328 and 2,964,990 unique sequences were obtained from the scions of self-grafted watermelon and watermelon grafted onto bottle gourd and squash at the two true-leaf stage, respectively. 39 known miRNAs belonging to 30 miRNA families and 80 novel miRNAs were identified in our small RNA dataset. Compared with self-grafted watermelon, 20 (5 known miRNA families and 15 novel miRNAs) and 47 (17 known miRNA families and 30 novel miRNAs) miRNAs were expressed at significantly different levels in watermelon grafted onto bottle gourd and squash, respectively. MiRNAs were expressed differentially when watermelon was grafted onto different rootstocks, suggesting that miRNAs might play an important role in diverse biological and metabolic processes in watermelon and that grafting may act by changing miRNA expression to regulate plant growth and development as well as adaptation to stresses. The small RNA transcriptomes obtained in this study provided insights into molecular aspects of miRNA-mediated regulation in grafted watermelon. Obviously, this result would provide a basis for further unravelling the mechanism on how miRNA information is exchanged between scion and rootstock in grafted

  7. AN OUTLINE FOR TEACHING CONSERVATION IN HIGH SCHOOLS.

    ERIC Educational Resources Information Center

    Department of Agriculture, Washington, DC.

    THIS OUTLINE HAS BEEN ORGANIZED IN A FORM WHICH PERMITS THE TEACHING OF CONSERVATION TO THE GREATEST NUMBER OF STUDENTS, BY INTERWEAVING THE SUBJECT WITH THE PHYSICAL AND SOCIAL SCIENCES COMMONLY TAUGHT IN HIGH SCHOOLS. THE CONSERVATION OF NATURAL RESOURCES IS AN INTEGRAL PART OF THESE SCIENCES AND BECOMES MORE MEANINGFUL TO STUDENTS WHEN THE…

  8. Plasmodium vivax Cell Traversal Protein for Ookinetes and Sporozoites (PvCelTOS) gene sequence and potential epitopes are highly conserved among isolates from different regions of Brazilian Amazon

    PubMed Central

    Bitencourt Chaves, Lana; Perce-da-Silva, Daiana de Souza; Rodrigues-da-Silva, Rodrigo Nunes; Martins da Silva, João Hermínio; Cassiano, Gustavo Capatti; Machado, Ricardo Luiz Dantas; Pratt-Riccio, Lilian Rose; Banic, Dalma Maria

    2017-01-01

    The Plasmodium vivax Cell-traversal protein for ookinetes and sporozoites (PvCelTOS) plays an important role in the traversal of host cells. Although essential to PvCelTOS progress as a vaccine candidate, its genetic diversity remains uncharted. Therefore, we investigated the PvCelTOS genetic polymorphism in 119 field isolates from five different regions of Brazilian Amazon (Manaus, Novo Repartimento, Porto Velho, Plácido de Castro and Oiapoque). Moreover, we also evaluated the potential impact of non-synonymous mutations found in the predicted structure and epitopes of PvCelTOS. The field isolates showed high similarity (99.3% of bp) with the reference Sal-1 strain, presenting only four Single-Nucleotide Polymorphisms (SNP) at positions 24A, 28A, 109A and 352C. The frequency of synonymous C109A (82%) was higher than all others (p<0.0001). However, the non-synonymous G28A and G352C were observed in 9.2% and 11.7% isolates. The great majority of the isolates (79.8%) revealed complete amino acid sequence homology with Sal-1, 10.9% presented complete homology with Brazil I and two undescribed PvCelTOS sequences were observed in 9.2% field isolates. Concerning the prediction analysis, the N-terminal substitution (Gly10Ser) was predicted to be within a B-cell epitope (PvCelTOS Accession Nos. AB194053.1) and exposed at the protein surface, while the Val118Leu substitution was not a predicted epitope. Therefore, our data suggest that although G28A SNP might interfere in potential B-cell epitopes at PvCelTOS N-terminal region the gene sequence is highly conserved among the isolates from different geographic regions, which is an important feature to be taken into account when evaluating its potential as a vaccine candidate. PMID:28158176
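    The SNP notation used in this abstract (for example C109A: reference base, 1-based position, observed base) and the percent-identity figure can be illustrated with a minimal sketch. It assumes the isolate and reference sequences are already aligned and of equal length; the function names `call_snps` and `percent_identity` are hypothetical and are not taken from the study.

```python
def call_snps(reference, isolate):
    """List substitutions as <ref base><1-based position><isolate base>."""
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, isolate), start=1)
        if ref != alt
    ]


def percent_identity(reference, isolate):
    """Percentage of positions at which the isolate matches the reference."""
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    same = sum(ref == alt for ref, alt in zip(reference, isolate))
    return 100.0 * same / len(reference)
```

    Deciding whether a substitution is synonymous would additionally require translating the affected codon, which this sketch does not attempt.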

  9. The identification of conserved interactions within the SH3 domain by alignment of sequences and structures.

    PubMed Central

    Larson, S. M.; Davidson, A. R.

    2000-01-01

    The SH3 domain, comprised of approximately 60 residues, is found within a wide variety of proteins, and is a mediator of protein-protein interactions. Due to the large number of SH3 domain sequences and structures in the databases, this domain provides one of the best available systems for the examination of sequence and structural conservation within a protein family. In this study, a large and diverse alignment of SH3 domain sequences was constructed, and the pattern of conservation within this alignment was compared to conserved structural features, as deduced from analysis of eighteen different SH3 domain structures. Seventeen SH3 domain structures solved in the presence of bound peptide were also examined to identify positions that are consistently most important in mediating the peptide-binding function of this domain. Although residues at the two most conserved positions in the alignment are directly involved in peptide binding, residues at most other conserved positions play structural roles, such as stabilizing turns or comprising the hydrophobic core. Surprisingly, several highly conserved side-chain to main-chain hydrogen bonds were observed in the functionally crucial RT-Src loop between residues with little direct involvement in peptide binding. These hydrogen bonds may be important for maintaining this region in the precise conformation necessary for specific peptide recognition. In addition, a previously unrecognized yet highly conserved beta-bulge was identified in the second beta-strand of the domain, which appears to provide a necessary kink in this strand, allowing it to hydrogen bond to both sheets comprising the fold. PMID:11152127

  10. Significance of satellite DNA revealed by conservation of a widespread repeat DNA sequence among angiosperms.

    PubMed

    Mehrotra, Shweta; Goel, Shailendra; Raina, Soom Nath; Rajpal, Vijay Rani

    2014-08-01

    The analysis of plant genome structure and evolution requires comprehensive characterization of the repetitive sequences that make up the majority of plant nuclear DNA. In the present study, we analyzed the nature of the pCtKpnI-I and pCtKpnI-II tandemly repeated sequences reported earlier in Carthamus tinctorius. Interestingly, a homolog of the pCtKpnI-I repeat sequence was also found in widely divergent families of angiosperms. pCtKpnI-I showed high sequence similarity but low copy number among the various taxa of different angiosperm families analyzed. In comparison, pCtKpnI-II was specific to the genus Carthamus and was not present in any other taxa analyzed. The molecular structure of pCtKpnI-I was analyzed in various unrelated taxa of angiosperms to decipher the evolutionarily conserved nature of the sequence and its possible functional role.

  11. Studying RNA homology and conservation with Infernal: from single sequences to RNA families

    PubMed Central

    Barquist, Lars; Burge, Sarah W.; Gardner, Paul P.

    2016-01-01

    Emerging high-throughput technologies have led to a deluge of putative non-coding RNA (ncRNA) sequences identified in a wide variety of organisms. Systematic characterization of these transcripts will be a tremendous challenge. Homology detection is critical to making maximal use of functional information gathered about ncRNAs: identifying homologous sequence allows us to transfer information gathered in one organism to another quickly and with a high degree of confidence. ncRNA presents a challenge for homology detection, as the primary sequence is often poorly conserved and de novo secondary structure prediction and search remains difficult. This protocol introduces methods developed by the Rfam database for identifying “families” of homologous ncRNAs starting from single “seed” sequences using manually curated sequence alignments to build powerful statistical models of sequence and structure conservation known as covariance models (CMs), implemented in the Infernal software package. We provide a step-by-step iterative protocol for identifying ncRNA homologs, then constructing an alignment and corresponding CM. We also work through an example for the bacterial small RNA MicA, discovering a previously unreported family of divergent MicA homologs in genus Xenorhabdus in the process. PMID:27322404

  12. Studying RNA Homology and Conservation with Infernal: From Single Sequences to RNA Families.

    PubMed

    Barquist, Lars; Burge, Sarah W; Gardner, Paul P

    2016-06-20

    Emerging high-throughput technologies have led to a deluge of putative non-coding RNA (ncRNA) sequences identified in a wide variety of organisms. Systematic characterization of these transcripts will be a tremendous challenge. Homology detection is critical to making maximal use of functional information gathered about ncRNAs: identifying homologous sequence allows us to transfer information gathered in one organism to another quickly and with a high degree of confidence. ncRNA presents a challenge for homology detection, as the primary sequence is often poorly conserved and de novo secondary structure prediction and search remain difficult. This unit introduces methods developed by the Rfam database for identifying "families" of homologous ncRNAs starting from single "seed" sequences, using manually curated sequence alignments to build powerful statistical models of sequence and structure conservation known as covariance models (CMs), implemented in the Infernal software package. We provide a step-by-step iterative protocol for identifying ncRNA homologs and then constructing an alignment and corresponding CM. We also work through an example for the bacterial small RNA MicA, discovering a previously unreported family of divergent MicA homologs in genus Xenorhabdus in the process. © 2016 by John Wiley & Sons, Inc.

  13. Sense-antisense gene pairs: sequence, transcription, and structure are not conserved between human and mouse

    PubMed Central

    Wood, Emily J.; Chin-Inmanu, Kwanrutai; Jia, Hui; Lipovich, Leonard

    2013-01-01

    Previous efforts to characterize conservation between the human and mouse genomes focused largely on sequence comparisons. These studies are inherently limited because they don't account for gene structure differences, which may exist despite genomic sequence conservation. Recent high-throughput transcriptome studies have revealed widespread and extensive overlaps between genes, and transcripts, encoded on both strands of the genomic sequence. This overlapping gene organization, which produces sense-antisense (SAS) gene pairs, is capable of effecting regulatory cascades through established mechanisms. We present an evolutionary conservation assessment of SAS pairs, on three levels: genomic, transcriptomic, and structural. From a genome-wide dataset of human SAS pairs, we first identified orthologous loci in the mouse genome, then assessed their transcription in the mouse, and finally compared the genomic structures of SAS pairs expressed in both species. We found that approximately half of human SAS loci have single orthologous locations in the mouse genome; however, only half of those orthologous locations have SAS transcriptional activity in the mouse. This suggests that high human-mouse gene conservation overlooks widespread distinctions in SAS pair incidence and expression. We compared gene structures at orthologous SAS loci, finding frequent differences in gene structure between human and orthologous mouse SAS pair members. Our categorization of human SAS pairs with respect to mouse conservation of expression as well as structure points to limitations of mouse models. Gene structure differences, including at SAS loci, may account for some of the phenotypic distinctions between primates and rodents. Genes in non-conserved SAS pairs may contribute to evolutionary lineage-specific regulatory outcomes. PMID:24133500

  14. A highly conserved pericentromeric domain in human and gorilla chromosomes.

    PubMed

    Pita, M; Gosálvez, J; Gosálvez, A; Nieddu, M; López-Fernández, C; Mezzanotte, R

    2009-01-01

    Significant similarity between human and gorilla genomes has been found in all chromosome arms, but not in centromeres, using whole-comparative genomic hybridization (W-CGH). In human chromosomes, centromeric regions, generally containing highly repetitive DNAs, are characterized by the presence of specific human DNA sequences and an absence of homology with gorilla DNA sequences. The only exception is the pericentromeric area of human chromosome 9, which, in addition to a large block of human DNA, also contains a region of homology with gorilla DNA sequences; the localization of these sequences coincides with that of human satellite III. Since highly repetitive DNAs are known for their high mutation frequency, we hypothesized that the chromosome 9 pericentromeric DNA conserved in human chromosomes and deriving from the gorilla genome may thus play some important functional role.

  15. Tissue-specific DNA methylation is conserved across human, mouse, and rat, and driven by primary sequence conservation.

    PubMed

    Zhou, Jia; Sears, Renee L; Xing, Xiaoyun; Zhang, Bo; Li, Daofeng; Rockweiler, Nicole B; Jang, Hyo Sik; Choudhary, Mayank N K; Lee, Hyung Joo; Lowdon, Rebecca F; Arand, Jason; Tabers, Brianne; Gu, C Charles; Cicero, Theodore J; Wang, Ting

    2017-09-12

    Uncovering mechanisms of epigenome evolution is an essential step towards understanding the evolution of different cellular phenotypes. While studies have confirmed DNA methylation as a conserved epigenetic mechanism in mammalian development, little is known about the conservation of tissue-specific genome-wide DNA methylation patterns. Using a comparative epigenomics approach, we identified and compared the tissue-specific DNA methylation patterns of rat against those of mouse and human across three shared tissue types. We confirmed that tissue-specific differentially methylated regions are strongly associated with tissue-specific regulatory elements. Comparisons between species revealed that at a minimum 11-37% of tissue-specific DNA methylation patterns are conserved, a phenomenon that we define as epigenetic conservation. Conserved DNA methylation is accompanied by conservation of other epigenetic marks including histone modifications. Although a significant amount of locus-specific methylation is epigenetically conserved, the majority of tissue-specific DNA methylation is not conserved across the species and tissue types that we investigated. Examination of the genetic underpinning of epigenetic conservation suggests that primary sequence conservation is a driving force behind epigenetic conservation. In contrast, evolutionary dynamics of tissue-specific DNA methylation are best explained by the maintenance or turnover of binding sites for important transcription factors. Our study extends the limited literature of comparative epigenomics and suggests a new paradigm for epigenetic conservation without genetic conservation through analysis of transcription factor binding sites.

  16. Conservation among HSP60 sequences in relation to structure, function, and evolution.

    PubMed Central

    Brocchieri, L.; Karlin, S.

    2000-01-01

    The chaperonin HSP60 (GroEL) proteins are essential in eubacterial genomes and in eukaryotic organelles. Functional regions inferred from mutation studies and the Escherichia coli GroEL 3D crystal complexes are evaluated in a multiple alignment across 43 diverse HSP60 sequences, centering on ATP/ADP and Mg2+ binding sites, on residues interacting with substrate, on GroES contact positions, on interface regions between monomers and domains, and on residues important in allosteric conformational changes. The most evolutionarily conserved residues relate to the ATP/ADP and Mg2+ binding sites. Hydrophobic residues that contribute to substrate binding are also significantly conserved. A large number of charged residues line the central cavity of the GroEL-GroES complex in the substrate-releasing conformation. These span statistically significant intra- and inter-monomer three-dimensional (3D) charge clusters that are highly conserved among sequences and presumably play an important role in interacting with the substrate. Unaligned short segments between blocks of alignment are generally exposed at the outside wall of the Anfinsen cage complex. The multiple alignment reveals regions of divergence common to specific evolutionary groups. For example, rickettsial sequences diverge in the ATP/ADP binding domain and gram-positive sequences diverge in the allosteric transition domain. The evolutionary information of the multiple alignment proffers attractive sites for mutational studies. PMID:10752609

  17. Energy Conservation Featured in Illinois High School

    ERIC Educational Resources Information Center

    Modern Schools, 1976

    1976-01-01

    The William Fremd High School in Palatine, Illinois, scheduled to open in 1977, is being built with energy conservation uppermost in mind. In this system, 70 heat pumps will heat and cool 300,000 square feet of educational facilities. (Author/MLF)

  18. Sequence-related human proteins cluster by degree of evolutionary conservation

    NASA Astrophysics Data System (ADS)

    Mrowka, Ralf; Patzak, Andreas; Herzel, Hanspeter; Holste, Dirk

    2004-11-01

    Gene duplication followed by adaptive evolution is thought to be a central mechanism for the emergence of novel genes. To illuminate the contribution of duplicated protein-coding sequences to the complexity of the human genome, we study the connectivity of pairwise sequence-related human proteins and construct a network (N) of linked protein sequences with shared similarities. We find that (i) the connectivity distribution P(k) for k sequence-related proteins decays as a power law $P(k) \sim k^{-\gamma}$ with $\gamma \approx 1.2$, (ii) the top rank of N consists of a single large cluster of proteins (≈70%), while bottom ranks consist of multiple isolated clusters, and (iii) structural characteristics of N show both a high degree of clustering and an intermediate connectivity (“small-world” features). We gain further insight into structural properties of N by studying the relationship between the connectivity distribution and the phylogenetic conservation of proteins in bacteria, plants, invertebrates, and vertebrates. We find that (iv) the proportion of sequence-related proteins increases with increasing extent of evolutionary conservation. Our results support that small-world network properties constitute a footprint of an evolutionary mechanism and extend the traditional interpretation of protein families.
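    The power-law decay of the connectivity distribution reported in this abstract can be illustrated with a rough sketch. It assumes the network is given as an undirected edge list of protein pairs and estimates the exponent by ordinary least squares on the log-log distribution; this is a crude estimator chosen for brevity and is not claimed to be the authors' fitting procedure. The function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)} for an undirected edge list of (node_a, node_b) pairs."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())   # how many nodes have each degree k
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}


def fit_exponent(pk):
    """Least-squares slope of log P(k) against log k; returns gamma in P(k) ~ k^-gamma."""
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope
```

    On real networks the tail of P(k) is noisy, so a binned or maximum-likelihood fit would usually be preferred over this simple regression.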

  19. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

    PubMed Central

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-01-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  20. Discovering conserved insect microRNAs from expressed sequence tags.

    PubMed

    Jia, Qidong; Lin, Kejian; Liang, Jingdong; Yu, Lun; Li, Fei

    2010-12-01

    MicroRNAs (miRNAs) participate in regulating diverse biological pathways by translational repression in animals. They have attracted increasing attention recently. However, little work has been done on the miRNA genes of agriculturally important pests. Because the transcripts of most miRNA genes are the products of type-II RNA polymerase, pri-miRNA has a poly(A) tail and appears in expressed sequence tags (ESTs). We developed a computational pipeline to identify miRNA genes from insect ESTs. First, 980,697 ESTs from 63 insects were collected and used to search the nr database. The ESTs that did not share significant similarity with any known protein-coding gene were treated as non-coding ESTs. Next, known mature miRNAs were aligned against the non-coding ESTs. The ESTs that contained the sequence of a mature miRNA were treated as candidate ESTs. Finally, putative precursors flanking the mature miRNA region were extracted from candidate ESTs and evaluated by the Triplet-SVM algorithm. As a result, 86 miRNAs from 30 insect species were found under a strict criterion, while 330 miRNAs from 51 species were found under a loose criterion. Evolutionary analysis indicated that mir-467, mir-297 and mir-466 were the most highly conserved miRNA families in insects. To confirm the reliability of the putative insect miRNAs, the expression profiles of nine predicted miRNAs in Locusta migratoria were investigated. Eight miRNAs were successfully detected by RT-PCR. Most miRNAs were expressed ubiquitously in all examined tissues and developmental stages, whereas Lmi-mir-509 was specifically expressed in the thorax of the 2nd, 4th and 5th instars and the adult locust. In all, our work reports an efficient computational strategy for predicting miRNA genes from insect ESTs and presents tens of miRNAs in diverse insect species which are expected to participate in many important physiological processes.
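    The candidate-EST and precursor-extraction steps of the pipeline described in this abstract can be sketched as follows. This is only an illustration under simplifying assumptions: the ESTs are taken to be already filtered as non-coding, matching is exact and forward-strand only (the abstract speaks of alignment), and the flank length of 70 nt is an arbitrary placeholder. The Triplet-SVM evaluation is not reproduced, and `find_candidate_precursors` is a hypothetical name.

```python
def find_candidate_precursors(noncoding_ests, mature_mirnas, flank=70):
    """Locate mature miRNAs inside ESTs and cut out the flanking region.

    noncoding_ests: {est_id: DNA sequence}
    mature_mirnas:  {mirna_name: RNA sequence}
    Returns (est_id, mirna_name, 0-based match position, putative precursor).
    """
    hits = []
    for est_id, est in noncoding_ests.items():
        est = est.upper()
        for name, mature in mature_mirnas.items():
            # ESTs are cDNA, so express the mature miRNA in the DNA alphabet.
            query = mature.upper().replace("U", "T")
            pos = est.find(query)
            if pos == -1:
                continue
            start = max(0, pos - flank)
            end = min(len(est), pos + len(query) + flank)
            hits.append((est_id, name, pos, est[start:end]))
    return hits
```

    Each returned precursor would then need a hairpin-structure check by a classifier such as the one named in the abstract before being reported as a putative miRNA gene.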

  1. High resolution schemes for hyperbolic conservation laws

    NASA Technical Reports Server (NTRS)

    Harten, A.

    1983-01-01

    A class of new explicit second order accurate finite difference schemes for the computation of weak solutions of hyperbolic conservation laws is presented. These highly nonlinear schemes are obtained by applying a nonoscillatory first order accurate scheme to an appropriately modified flux function. The so-derived second order accurate schemes achieve high resolution while preserving the robustness of the original nonoscillatory first order accurate scheme. Numerical experiments are presented to demonstrate the performance of these new schemes.

  2. High-bay Lighting Energy Conservation Measures

    SciTech Connect

    Ian Metzger, Jesse Dean

    2010-12-31

    This software requires inputs of simple high-bay lighting system inventory information and calculates the energy and cost benefits of various retrofit opportunities. This tool includes energy conservation measures for: 1000 Watt to 750 Watt High-pressure Sodium lighting retrofit, 400 Watt to 360 Watt High Pressure Sodium lighting retrofit, High Intensity Discharge to T5 lighting retrofit, High Intensity Discharge to T8 lighting retrofit, and Daylighting. This tool calculates energy savings, demand reduction, cost savings, building life cycle costs including: simple payback, discounted payback, net-present value, and savings to investment ratio. In addition this tool also displays the environmental benefits of a project.
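    The economic measures listed in this record (energy savings, simple payback, net-present value, savings-to-investment ratio) follow standard definitions, sketched below. The sketch assumes a constant annual saving, a fixed discount rate, and no ballast losses or maintenance effects; the function names and all input values are hypothetical and are not taken from the tool itself.

```python
def annual_savings_kwh(fixtures, old_watts, new_watts, hours_per_year):
    """Energy saved per year when each fixture drops from old_watts to new_watts."""
    return fixtures * (old_watts - new_watts) * hours_per_year / 1000.0


def simple_payback_years(cost, annual_cost_savings):
    """Years needed for undiscounted savings to repay the retrofit cost."""
    return cost / annual_cost_savings


def present_value(annual_cost_savings, discount_rate, years):
    """Discounted sum of a constant annual saving over the study period."""
    return sum(
        annual_cost_savings / (1 + discount_rate) ** year
        for year in range(1, years + 1)
    )


def net_present_value(cost, annual_cost_savings, discount_rate, years):
    """Present value of the savings minus the up-front investment."""
    return present_value(annual_cost_savings, discount_rate, years) - cost


def savings_to_investment_ratio(cost, annual_cost_savings, discount_rate, years):
    """Present value of the savings divided by the investment (above 1 is favorable)."""
    return present_value(annual_cost_savings, discount_rate, years) / cost
```

    For instance, a 1000 W to 750 W retrofit of 100 fixtures running 4000 hours per year would be passed as `annual_savings_kwh(100, 1000, 750, 4000)`.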

  3. Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks.

    PubMed Central

    Tatusov, R L; Altschul, S F; Koonin, E V

    1994-01-01

    We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or group of related sequences, generates blocks of conserved segments. The procedure involves iterative database scans with an evolving position-dependent weight matrix constructed from a coevolving set of aligned conserved segments. For each iteration, the expected distribution of matrix scores under a random model is used to set a cutoff score for the inclusion of a segment in the next iteration. This cutoff may be calculated to allow the chance inclusion of either a fixed number or a fixed proportion of false positive segments. With sufficiently high cutoff scores, the procedure converged for all alignment blocks studied, with varying numbers of iterations required. Different methods for calculating weight matrices from alignment blocks were compared. The most effective of those tested was a logarithm-of-odds, Bayesian-based approach that used prior residue probabilities calculated from a mixture of Dirichlet distributions. The procedure described was used to detect novel conserved motifs of potential biological importance. Images PMID:7991589
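    One round of the scan described in this abstract can be sketched as: build a position-dependent log-odds matrix from an alignment block, then keep every database segment whose score reaches a cutoff. The sketch makes two loud simplifications: it uses a uniform background with a single flat pseudocount instead of the Dirichlet-mixture priors the abstract reports as most effective, and the cutoff is passed in by hand rather than derived from the expected score distribution. All names are hypothetical, and only the 20 standard residues are handled.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_matrix(block, pseudocount=1.0):
    """Build per-column log2-odds scores from equal-length aligned segments."""
    background = 1.0 / len(AMINO_ACIDS)   # assumption: uniform residue background
    n = len(block)
    matrix = []
    for column in zip(*block):
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount * background) / (n + pseudocount)
            scores[aa] = math.log2(freq / background)
        matrix.append(scores)
    return matrix


def score_segment(matrix, segment):
    """Sum the column scores for a segment as wide as the matrix."""
    return sum(column[residue] for column, residue in zip(matrix, segment))


def scan(matrix, sequence, cutoff):
    """Return (position, segment) for every window scoring at or above cutoff."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        segment = sequence[i:i + width]
        if score_segment(matrix, segment) >= cutoff:
            hits.append((i, segment))
    return hits
```

    In an iterative version, the segments returned by `scan` would be added to the block, the matrix rebuilt, and the scan repeated until the set of hits stops changing.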

  4. Divergence of conserved non-coding sequences: rate estimates and relative rate tests.

    PubMed

    Wagner, Günter P; Fried, Claudia; Prohaska, Sonja J; Stadler, Peter F

    2004-11-01

    In many eukaryotic genomes only a small fraction of the DNA codes for proteins, but the non-protein coding DNA harbors important genetic elements directing the development and the physiology of the organisms, like promoters, enhancers, insulators, and micro-RNA genes. The molecular evolution of these genetic elements is difficult to study because their functional significance is hard to deduce from sequence information alone. Here we propose an approach to the study of the rate of evolution of functional non-coding sequences at a macro-evolutionary scale. We identify functionally important non-coding sequences as Conserved Non-Coding Nucleotide (CNCN) sequences from the comparison of two outgroup species. The CNCN sequences so identified are then compared to their homologous sequences in a pair of ingroup species, and we monitor the degree of modification these sequences suffered in the two ingroup lineages. We propose a method to test for rate differences in the modification of CNCN sequences among the two ingroup lineages, as well as a method to estimate their rate of modification. We apply this method to the full sequences of the HoxA clusters from six gnathostome species: a shark, Heterodontus francisci; a basal ray finned fish, Polypterus senegalus; the amphibian, Xenopus tropicalis; as well as three mammalian species, human, rat and mouse. The results show that the evolutionary rate of CNCN sequences is not distinguishable among the three mammalian lineages, while the Xenopus lineage has a significantly increased rate of evolution. Furthermore the estimates of the rate parameters suggest that in the stem lineage of mammals the rate of CNCN sequence evolution was more than twice the rate observed within the placental amniotes clade, suggesting a high rate of evolution of cis-regulatory elements during the origin of amniotes and mammals. 
We conclude that the proposed methods can be used for testing hypotheses about the rate and pattern of evolution of putative

  5. Identification of conserved and novel microRNAs in Aquilaria sinensis based on small RNA sequencing and transcriptome sequence data.

    PubMed

    Gao, Zhi-Hui; Wei, Jian-He; Yang, Yun; Zhang, Zheng; Xiong, Huan-Ying; Zhao, Wen-Ting

    2012-08-15

    Agarwood is in great demand for its high value in medicine, incense, and perfume across Asia, the Middle East, and Europe. As agarwood is formed only when Aquilaria trees are wounded or infected by certain microbes, overharvesting and habitat loss are threatening some populations of agarwood-producing species. Aquilaria sinensis is one such economically significant tree species. To improve production efficiency and protect the A. sinensis resource, it is critical to reveal the regulatory mechanisms of stress-induced agarwood formation. MicroRNAs (miRNAs), key gene expression regulators involved in various plant stress responses and metabolic processes, might function in agarwood formation, but no report concerning miRNAs in Aquilaria is available. In this study, small RNA high-throughput sequencing and 454 transcriptome data were used to identify both conserved and novel miRNAs in A. sinensis. Deep sequencing showed that the small RNA (sRNA) population of A. sinensis was complex and the length of sRNAs varied. By in silico analysis of the small RNA deep sequencing data and transcriptome data, we discovered 27 novel miRNAs in A. sinensis. Based on mature miRNA sequence conservation, we identified 74 putative conserved miRNAs from A. sinensis, and 10 of them were confirmed with hairpin-forming precursors. Interestingly, a novel miRNA sequence was determined to be the miRNA* of asi-miR408, but with accumulation much higher than asi-miR408. The expression levels of ten stress-responsive miRNAs were examined during the time course after wound treatment. Eight were shown to be wound-responsive. This not only shows the existence of miRNAs in this economically significant Asian tree species but also indicates their critical role in stress-induced agarwood formation. The highly accumulated miRNA* of asi-miR408 implies that miRNA*s can be functional, as miRNAs are, in plants.

  6. Deletion of conserved sequences in IG-DMR at Dlk1-Gtl2 locus suggests their involvement in expression of paternally expressed genes in mice.

    PubMed

    Saito, Takeshi; Hara, Satoshi; Tamano, Moe; Asahara, Hiroshi; Takada, Shuji

    2017-02-16

    Expression regulation of the Dlk1-Dio3 imprinted domain by the intergenic differentially methylated region (IG-DMR) is essential for normal embryonic development in mammals. In this study, we investigated conserved IG-DMR genomic sequences in eutherians to elucidate their role in genomic imprinting of the Dlk1-Dio3 domain. Using a comparative genomics approach, we identified three highly conserved sequences in IG-DMR. To elucidate the functions of these sequences in vivo, we generated mutant mice lacking each of the identified highly conserved sequences using the CRISPR/Cas9 system. Although mutant mice did not exhibit the gross phenotype, deletions of the conserved sequences altered the expression levels of paternally expressed imprinted genes in the mutant embryos without skewing imprinting status. These results suggest that the conserved sequences in IG-DMR are involved in the expression regulation of some of the imprinted genes in the Dlk1-Dio3 domain.

  7. Deletion of conserved sequences in IG-DMR at Dlk1-Gtl2 locus suggests their involvement in expression of paternally expressed genes in mice

    PubMed Central

    SAITO, Takeshi; HARA, Satoshi; TAMANO, Moe; ASAHARA, Hiroshi; TAKADA, Shuji

    2016-01-01

    Expression regulation of the Dlk1-Dio3 imprinted domain by the intergenic differentially methylated region (IG-DMR) is essential for normal embryonic development in mammals. In this study, we investigated conserved IG-DMR genomic sequences in eutherians to elucidate their role in genomic imprinting of the Dlk1-Dio3 domain. Using a comparative genomics approach, we identified three highly conserved sequences in IG-DMR. To elucidate the functions of these sequences in vivo, we generated mutant mice lacking each of the identified highly conserved sequences using the CRISPR/Cas9 system. Although mutant mice did not exhibit the gross phenotype, deletions of the conserved sequences altered the expression levels of paternally expressed imprinted genes in the mutant embryos without skewing imprinting status. These results suggest that the conserved sequences in IG-DMR are involved in the expression regulation of some of the imprinted genes in the Dlk1-Dio3 domain. PMID:27904015

  8. CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison.

    PubMed

    Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

    2004-07-01

    The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features.

  9. Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

    SciTech Connect

    Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng; Kurz, Thorsten; Dubchak, Inna; Frazer, Kelly A.; Ober, Carole

    2005-09-10

    Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70 percent identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P<0.05), there were highly significant associations in European Americans between asthma and haplotypes comprised of SNPs in the IL4 gene (P<0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overall little variation in the conserved non-coding elements on 5q31, but variation in IL4 and IL13, including possibly one SNP in a conserved element, influences asthma and atopic phenotypes in diverse populations.

  10. Forest conservation delivers highly variable coral reef conservation outcomes.

    PubMed

    Klein, Carissa J; Jupiter, Stacy D; Selig, Elizabeth R; Watts, Matthew E; Halpern, Benjamin S; Kamal, Muhammad; Roelfsema, Chris; Possingham, Hugh P

    2012-06-01

    Coral reefs are threatened by human activities on both the land (e.g., deforestation) and the sea (e.g., overfishing). Most conservation planning for coral reefs focuses on removing threats in the sea, neglecting management actions on the land. A more integrated approach to coral reef conservation, inclusive of land-sea connections, requires an understanding of how and where terrestrial conservation actions influence reefs. We address this by developing a land-sea planning approach to inform fine-scale spatial management decisions and test it in Fiji. Our aim is to determine where the protection of forest can deliver the greatest return on investment for coral reef ecosystems. To assess the benefits of conservation to coral reefs, we estimate their relative condition as influenced by watershed-based pollution and fishing. We calculate the cost-effectiveness of protecting forest and find that investments deliver rapidly diminishing returns for improvements to relative reef condition. For example, protecting 2% of forest in one area is almost 500 times more beneficial than protecting 2% in another area, making prioritization essential. For the scenarios evaluated, relative coral reef condition could be improved by 8-58% if all remnant forest in Fiji were protected rather than deforested. Finally, we determine the priority of each coral reef for implementing a marine protected area when all remnant forest is protected for conservation. The general results will support decisions made by the Fiji Protected Area Committee as they establish a national protected area network that aims to protect 20% of the land and 30% of the inshore waters by 2020. Although challenges remain, we can inform conservation decisions around the globe by tackling the complex issues relevant to integrated land-sea planning.

  11. Sequence Conservation, Radial Distance and Packing Density in Spherical Viral Capsids

    PubMed Central

    Lee, Chi-Wen; Huang, Tsun-Tsao; Shih, Chung-Shiuan; Hwang, Jenn-Kang

    2015-01-01

    The conservation level of a residue is a useful measure about the importance of that residue in protein structure and function. Much information about sequence conservation comes from aligning homologous sequences. Profiles showing the variation of the conservation level along the sequence are usually interpreted in evolutionary terms and dictated by site similarities of a proper set of homologous sequences. Here, we report that, of the viral icosahedral capsids, the sequence conservation profile can be determined by variations in the distances between residues and the centroid of the capsid – with a direct inverse proportionality between the conservation level and the centroid distance – as well as by the spatial variations in local packing density. Examining both the centroid and the packing density models against a dataset of 51 crystal structures of nonhomologous icosahedral capsids, we found that many global patterns and minor features derived from the viral structures are consistent with those present in the sequence conservation profiles. The quantitative link between the level of conservation and structural features like centroid-distance or packing density allows us to look at residue conservation from a structural viewpoint as well as from an evolutionary viewpoint. PMID:26132081

  12. High Throughput Sequencing: An Overview of Sequencing Chemistry.

    PubMed

    Ambardar, Sheetal; Gupta, Rikita; Trakroo, Deepika; Lal, Rup; Vakhlu, Jyoti

    2016-12-01

    In the present century, sequencing is to DNA science what gel electrophoresis was in the last century. From 1977 to 2016, three generations of sequencing technologies of various types have been developed. Second and third generation sequencing technologies, commonly referred to as next generation sequencing technology, have evolved significantly, with increases in sequencing speed and decreases in sequencing cost, since their inception in 2004. GS FLX by 454 Life Sciences/Roche diagnostics, Genome Analyzer, HiSeq, MiSeq and NextSeq by Illumina, Inc., SOLiD by ABI, and Ion Torrent by Life Technologies are the various types of sequencing platforms available for second generation sequencing. The platforms available for third generation sequencing are the Helicos™ Genetic Analysis System by SeqLL, LLC, SMRT Sequencing by Pacific Biosciences, Nanopore sequencing by Oxford Nanopore, Complete Genomics by Beijing Genomics Institute and GnuBIO by BioRad, to name a few. The present article is an overview of the principle and the sequencing chemistry of these high throughput sequencing technologies, along with a brief comparison of the various types of sequencing platforms available.

  13. Polyclonal antibody against conserved sequences of mce1A protein blocks MTB infection in macrophages.

    PubMed

    Sivagnanam, Sasikala; Namasivayam, Nalini; Chellam, Rajamanickam

    2012-03-01

    The pathogenesis of Mycobacterium tuberculosis is largely due to its ability to enter and survive within human macrophages. It is suggested that a specific protein, namely mammalian cell entry protein, is involved in the pathogenesis, and the specific gene for this protein, mce1A, has been identified in several pathogenic organisms such as Rickettsia, Shigella, Escherichia coli, Helicobacter, Streptomyces, Klebsiella, Vibrio, Neisseria, Rhodococcus, Nocardioides, Saccharopolyspora erythraea, and Pseudomonas. Analysis of mce1 operons in the above mentioned organisms through bioinformatics tools has revealed the presence of unique sequences (conserved regions), suggesting that these sequences may be involved in the process of infection. Presently, the mce1A full-length (1,365 bp) region from Mycobacterium bovis and its conserved regions (303 bp) were cloned into an expression vector, and the purified expressed proteins of molecular weight ~47 and ~11 kDa, respectively, were injected into rabbits to raise the polyclonal antibodies. The purified polyclonal antibodies were checked for their ability to inhibit the Mycobacterium infection in cultured human macrophages. In the macrophage invasion assay, when antibody was added at high concentration, a decrease in viable counts was observed in all cell cultures within the first 5 days after infection, whereas the intracellular bacterial CFU obtained from the infected MTB increased by the 3rd day at low concentration of antibody. The macrophage invasion assay has indicated that the purified antibodies of the mce1A conserved region can inhibit the infection of Mycobacterium.

  14. Mitochondrial genome sequences illuminate maternal lineages of conservation concern in a rare carnivore

    PubMed Central

    2011-01-01

    Background Science-based wildlife management relies on genetic information to infer population connectivity and identify conservation units. The most commonly used genetic marker for characterizing animal biodiversity and identifying maternal lineages is the mitochondrial genome. Mitochondrial genotyping figures prominently in conservation and management plans, with much of the attention focused on the non-coding displacement ("D") loop. We used massively parallel multiplexed sequencing to sequence complete mitochondrial genomes from 40 fishers, a threatened carnivore that possesses low mitogenomic diversity. This allowed us to test a key assumption of conservation genetics, specifically, that the D-loop accurately reflects genealogical relationships and variation of the larger mitochondrial genome. Results Overall mitogenomic divergence in fishers is exceedingly low, with 66 segregating sites and an average pairwise distance between genomes of 0.00088 across their aligned length (16,290 bp). Estimates of variation and genealogical relationships from the displacement (D) loop region (299 bp) are contradicted by the complete mitochondrial genome, as well as the protein coding fraction of the mitochondrial genome. The sources of this contradiction trace primarily to the near-absence of mutations marking the D-loop region of one of the most divergent lineages, and secondarily to independent (recurrent) mutations at two nucleotide positions in the D-loop amplicon. Conclusions Our study has two important implications. First, inferred genealogical reconstructions based on the fisher D-loop region contradict inferences based on the entire mitogenome to the point that the populations of greatest conservation concern cannot be accurately resolved. Whole-genome analysis identifies Californian haplotypes from the northern-most populations as highly distinctive, with a significant excess of amino acid changes that may be indicative of molecular adaptation; D-loop sequences fail
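
    The two summary statistics quoted in this record (segregating sites and average pairwise distance) can be illustrated with a minimal sketch. It assumes the genomes are already aligned to equal length, treats every differing character as a difference (gaps and ambiguous bases get no special handling), and uses made-up toy sequences rather than any fisher data. The function names are hypothetical and are not taken from the study.

```python
from itertools import combinations


def segregating_sites(aligned):
    """Count alignment columns in which more than one character is observed."""
    return sum(1 for column in zip(*aligned) if len(set(column)) > 1)


def mean_pairwise_distance(aligned):
    """Average proportion of differing sites over all pairs of sequences."""
    length = len(aligned[0])
    pairs = list(combinations(aligned, 2))
    total = sum(
        sum(a != b for a, b in zip(s1, s2)) / length
        for s1, s2 in pairs
    )
    return total / len(pairs)


# Toy alignment (illustrative only, not real mitogenome data).
genomes = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]

print(segregating_sites(genomes))       # number of variable columns
print(mean_pairwise_distance(genomes))  # mean per-site difference
```

    In the toy alignment two columns vary, and the three pairwise comparisons differ at 1, 1 and 2 of 10 sites, so the mean distance is 0.4/3.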

  15. Inference of transcriptional networks in Arabidopsis through conserved noncoding sequence analysis.

    PubMed

    Van de Velde, Jan; Heyndrickx, Ken S; Vandepoele, Klaas

    2014-07-01

    Transcriptional regulation plays an important role in establishing gene expression profiles during development or in response to (a)biotic stimuli. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity, and the identification of individual TFBS in genome sequences is a major goal to inferring regulatory networks. We have developed a phylogenetic footprinting approach for the identification of conserved noncoding sequences (CNSs) across 12 dicot plants. Whereas both alignment and non-alignment-based techniques were applied to identify functional motifs in a multispecies context, our method accounts for incomplete motif conservation as well as high sequence divergence between related species. We identified 69,361 footprints associated with 17,895 genes. Through the integration of known TFBS obtained from the literature and experimental studies, we used the CNSs to compile a gene regulatory network in Arabidopsis thaliana containing 40,758 interactions, of which two-thirds act through binding events located in DNase I hypersensitive sites. This network shows significant enrichment toward in vivo targets of known regulators, and its overall quality was confirmed using five different biological validation metrics. Finally, through the integration of detailed expression and function information, we demonstrate how static CNSs can be converted into condition-dependent regulatory networks, offering opportunities for regulatory gene annotation.

  16. New insights into SRY regulation through identification of 5' conserved sequences

    PubMed Central

    Ross, Diana GF; Bowles, Josephine; Koopman, Peter; Lehnert, Sigrid

    2008-01-01

    Background SRY is the pivotal gene initiating male sex determination in most mammals, but how its expression is regulated is still not understood. In this study we derived novel SRY 5' flanking genomic sequence data from bovine and caprine genomic BAC clones. Results We identified four intervals of high homology upstream of SRY by comparison of human, bovine, pig, goat and mouse genomic sequences. These conserved regions contain putative binding sites for a large number of known transcription factor families, including several that have been implicated previously in sex determination and early gonadal development. Conclusion Our results reveal potentially important SRY regulatory elements, mutations in which might underlie cases of idiopathic human XY sex reversal. PMID:18851760

  17. Global geno-proteomic analysis reveals cross-continental sequence conservation and druggable sites among influenza virus polymerases.

    PubMed

    Babar, Mustafeez Mujtaba; Zaidi, Najam-us-Sahar Sadaf; Tahir, Muhammad

    2014-12-01

    Influenza virus is one of the major causes of mortality and morbidity associated with respiratory diseases. The high rate of mutation in the viral proteome provides it with the ability to survive in a variety of host species. This property helps it in maintaining and developing its pathogenicity, transmission and drug resistance. Alternate drug targets, particularly the internal proteins, can potentially be exploited for addressing the resistance issues. In the current analysis, the degree of conservation of influenza virus polymerases has been studied as one of the essential elements for establishing its candidature as a potential target of antiviral therapy. We analyzed more than 130,000 nucleotide and amino acid sequences by classifying them on the basis of continental presence of host organisms. Computational analyses including genetic polymorphism study, mutation pattern determination, molecular evolution and geophylogenetic analysis were performed to establish the high degree of conservation among the sequences. These studies lead to establishing the polymerases, in particular PB1, as highly conserved proteins. Moreover, we mapped the conservation percentage on the tertiary structures of proteins to identify the conserved, druggable sites. The research study, hence, revealed that the influenza virus polymerases are highly conserved (95-99%) proteins with a very slow mutation rate. Potential drug binding sites on various polymerases have also been reported. A scheme for drug target candidate development that can be employed to rapidly mutating proteins has been presented. Moreover, the research output can help in designing new therapeutic molecules against the identified targets.

  18. Conservation of repetitive DNA sequences in deer species studied by southern blot transfer.

    PubMed

    Lima-de-Faria, A; Arnason, U; Widegren, B; Essen-Möller, J; Isaksson, M; Olsson, E; Jaworska, H

    1984-01-01

    The Cervidae show one of the largest variations in chromosome number found within a mammalian family. The five species of the deer family which are the subject of this study vary in chromosome number from 2n = 70 to 2n = 6. Digestion with the restriction enzymes EcoRI, HpaII, HaeIII and MspI reveals that there is a series of highly repetitive sequences forming similar band patterns in the different species. To obtain information on the degree of homology among these conserved sequences we isolated a HpaII restriction fragment of approximately 990 base pairs from reindeer DNA. This DNA sequence was 32P-labelled and hybridized by the Southern blot technique to DNAs cleaved with HpaII and HaeIII from the reindeer and four other Cervidae species. Hybridization to specific restriction fragments was recorded in all species. The patterns of hybridization showed a higher degree of similarity between reindeer, elk and roe deer than between reindeer and the Asiatic species (fallow deer and muntjac). Homologies are still present between the highly repetitive sequences of the five species despite the drastic reorganization that led to a change in chromosome number from 6 to 70.
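
    The restriction digests described in this record can be mimicked in silico. The sketch below is an illustration only: it assumes a linear, single-stranded representation of the DNA, uses the standard recognition sites of the four enzymes named above, ignores methylation sensitivity (which in practice distinguishes HpaII from MspI), and runs on a made-up toy sequence. The function names are hypothetical.

```python
# Recognition sequence and cut offset on the top strand for each enzyme.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # G^AATTC
    "HpaII": ("CCGG", 1),     # C^CGG
    "MspI": ("CCGG", 1),      # C^CGG (same site as HpaII)
    "HaeIII": ("GGCC", 2),    # GG^CC
}


def cut_positions(seq, enzyme):
    """Return the cut coordinates for every occurrence of the enzyme's site."""
    site, offset = ENZYMES[enzyme]
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, enzyme):
    """Return the fragment lengths from a complete digest of a linear sequence."""
    cuts = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [end - begin for begin, end in zip(cuts, cuts[1:])]


# Toy sequence containing one site for each enzyme (illustrative only).
toy = "AAGAATTCTTCCGGAAGGCCTT"

for name in ENZYMES:
    print(name, fragment_lengths(toy, name))
```

    Comparing fragment-length lists between sequences is a rough computational analogue of comparing band patterns across species.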

  19. DG-CST (Disease Gene Conserved Sequence Tags), a database of human–mouse conserved elements associated to disease genes

    PubMed Central

    Boccia, Angelo; Petrillo, Mauro; di Bernardo, Diego; Guffanti, Alessandro; Mignone, Flavio; Confalonieri, Stefano; Luzi, Lucilla; Pesole, Graziano; Paolella, Giovanni; Ballabio, Andrea; Banfi, Sandro

    2005-01-01

    The identification and study of evolutionarily conserved genomic sequences that surround disease-related genes is a valuable tool to gain insight into the functional role of these genes and to better elucidate the pathogenetic mechanisms of disease. We created the DG-CST (Disease Gene Conserved Sequence Tags) database for the identification and detailed annotation of human–mouse conserved genomic sequences that are localized within or in the vicinity of human disease-related genes. CSTs are defined as sequences that show at least 70% identity between human and mouse over a length of at least 100 bp. The database contains CST data relative to over 1088 genes responsible for monogenetic human genetic diseases or involved in the susceptibility to multifactorial/polygenic diseases. DG-CST is accessible via the internet at http://dgcst.ceinge.unina.it/ and may be searched using both simple and complex queries. A graphic browser allows direct visualization of the CSTs and related annotations within the context of the relative gene and its transcripts. PMID:15608249
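
    The CST criterion stated above (at least 70% identity over at least 100 bp) can be sketched as a sliding-window scan. This is a simplified illustration, not the DG-CST pipeline: it assumes the human and mouse sequences are already aligned, ungapped and of equal length, and it merges every qualifying window into a covered interval. The function name `find_csts` and its parameters are hypothetical.

```python
def find_csts(human, mouse, min_len=100, min_identity=0.70):
    """Return merged (start, end) intervals, end exclusive, that are covered by
    at least one window of min_len positions with identity >= min_identity."""
    if len(human) != len(mouse):
        raise ValueError("sequences must be pre-aligned to equal length")
    n = len(human)
    if n < min_len:
        return []

    matches = [int(a == b) for a, b in zip(human, mouse)]
    window = sum(matches[:min_len])  # matches in the first window
    intervals = []
    for start in range(n - min_len + 1):
        if start > 0:
            # Slide the window one position to the right.
            window += matches[start + min_len - 1] - matches[start - 1]
        if window / min_len >= min_identity:
            end = start + min_len
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], end)  # extend last interval
            else:
                intervals.append((start, end))
    return intervals


# Toy example with a short window so the behaviour is easy to follow.
human = "A" * 10 + "C" * 10
mouse = "A" * 10 + "G" * 10
print(find_csts(human, mouse, min_len=10, min_identity=0.70))
```

    Because intervals are unions of qualifying windows, a merged interval can extend a few mismatching positions past the perfectly conserved core, as the toy example shows.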

  20. Variation in the genomic locations and sequence conservation of STAR elements among staphylococcal species provides insight into DNA repeat evolution

    PubMed Central

    2012-01-01

    Background Staphylococcus aureus Repeat (STAR) elements are a type of interspersed intergenic direct repeat. In this study the conservation and variation in these elements was explored by bioinformatic analyses of published staphylococcal genome sequences and through sequencing of specific STAR element loci from a large set of S. aureus isolates. Results Using bioinformatic analyses, we found that the STAR elements were located in different genomic loci within each staphylococcal species. There was no correlation between the number of STAR elements in each genome and the evolutionary relatedness of staphylococcal species, however higher levels of repeats were observed in both S. aureus and S. lugdunensis compared to other staphylococcal species. Unexpectedly, sequencing of the internal spacer sequences of individual repeat elements from multiple isolates showed conservation at the sequence level within deep evolutionary lineages of S. aureus. Whilst individual STAR element loci were demonstrated to expand and contract, the sequences associated with each locus were stable and distinct from one another. Conclusions The high degree of lineage and locus-specific conservation of these intergenic repeat regions suggests that STAR elements are maintained due to selective or molecular forces with some of these elements having an important role in cell physiology. The high prevalence in two of the more virulent staphylococcal species is indicative of a potential role for STAR elements in pathogenesis. PMID:23020678

  1. Variation in the genomic locations and sequence conservation of STAR elements among staphylococcal species provides insight into DNA repeat evolution.

    PubMed

    Purves, Joanne; Blades, Matthew; Arafat, Yasrab; Malik, Salman A; Bayliss, Christopher D; Morrissey, Julie A

    2012-09-28

    Staphylococcus aureus Repeat (STAR) elements are a type of interspersed intergenic direct repeat. In this study the conservation and variation in these elements was explored by bioinformatic analyses of published staphylococcal genome sequences and through sequencing of specific STAR element loci from a large set of S. aureus isolates. Using bioinformatic analyses, we found that the STAR elements were located in different genomic loci within each staphylococcal species. There was no correlation between the number of STAR elements in each genome and the evolutionary relatedness of staphylococcal species, however higher levels of repeats were observed in both S. aureus and S. lugdunensis compared to other staphylococcal species. Unexpectedly, sequencing of the internal spacer sequences of individual repeat elements from multiple isolates showed conservation at the sequence level within deep evolutionary lineages of S. aureus. Whilst individual STAR element loci were demonstrated to expand and contract, the sequences associated with each locus were stable and distinct from one another. The high degree of lineage and locus-specific conservation of these intergenic repeat regions suggests that STAR elements are maintained due to selective or molecular forces with some of these elements having an important role in cell physiology. The high prevalence in two of the more virulent staphylococcal species is indicative of a potential role for STAR elements in pathogenesis.

  2. Conserved Plasmid Hydrogen-Uptake (hup)-Specific Sequences within Hup+Rhizobium leguminosarum Strains

    PubMed Central

    Leyva, Antonio; Palacios, José M.; Ruiz-Argüeso, Tomás

    1987-01-01

    Thirteen Rhizobium leguminosarum strains previously reported as H2-uptake hydrogenase positive (Hup+) or negative (Hup−) were analyzed for the presence and conservation of DNA sequences homologous to cloned Bradyrhizobium japonicum hup-specific DNA from cosmid pHU1 (M. A. Cantrell, R. A. Haugland, and H. J. Evans, Proc. Natl. Acad. Sci. USA 80:181-185, 1983). The Hup phenotype of these strains was reexamined by determining hydrogenase activity induced in bacteroids from pea nodules. Five strains, including H2 oxidation-ATP synthesis-coupled and -uncoupled strains, induced significant rates of H2-uptake hydrogenase activity and contained DNA sequences homologous to three probe DNA fragments (5.9-kilobase [kb] HindIII, 2.9-kb EcoRI, and 5.0-kb EcoRI) from pHU1. The pattern of genomic DNA HindIII and EcoRI fragments with significant homology to each of the three probes was identical in all five strains regardless of the H2-dependent ATP generation trait. The restriction fragments containing the homology totalled about 22 kb of DNA common to the five strains. In all instances the putative hup sequences were located on a plasmid that also contained nif genes. The molecular sizes of the identified hup-sym plasmids ranged between 184 and 212 megadaltons. No common DNA sequences homologous to B. japonicum hup DNA were found in genomic DNA from any of the eight remaining strains showing no significant hydrogenase activity in pea bacteroids. These results suggest that the identified DNA region contains genes essential for hydrogenase activity in R. leguminosarum and that its organization is highly conserved within Hup+ strains in this symbiotic species. Images PMID:16347471

  3. Multilign: an algorithm to predict secondary structures conserved in multiple RNA sequences

    PubMed Central

    Xu, Zhenjiang; Mathews, David H.

    2011-01-01

    Motivation: With recent advances in sequencing, structural and functional studies of RNA lag behind the discovery of sequences. Computational analysis of RNA is increasingly important to reveal structure–function relationships with low cost and speed. The purpose of this study is to use multiple homologous sequences to infer a conserved RNA structure. Results: A new algorithm, called Multilign, is presented to find the lowest free energy RNA secondary structure common to multiple sequences. Multilign is based on Dynalign, which is a program that simultaneously aligns and folds two sequences to find the lowest free energy conserved structure. For Multilign, Dynalign is used to progressively construct a conserved structure from multiple pairwise calculations, with one sequence used in all pairwise calculations. A base pair is predicted only if it is contained in the set of low free energy structures predicted by all Dynalign calculations. In this way, Multilign improves prediction accuracy by keeping the genuine base pairs and excluding competing false base pairs. Multilign has computational complexity that scales linearly in the number of sequences. Multilign was tested on extensive datasets of sequences with known structure and its prediction accuracy is among the best of available algorithms. Multilign can run on long sequences (> 1500 nt) and an arbitrarily large number of sequences. Availability: The algorithm is implemented in ANSI C++ and can be downloaded as part of the RNAstructure package at: http://rna.urmc.rochester.edu Contact: david_mathews@urmc.rochester.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21193521
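
    The filtering rule described above, where a base pair is kept only if every pairwise calculation supports it, amounts to a set intersection. The sketch below illustrates only that rule. It assumes each pairwise Dynalign calculation has already been reduced to a set of (i, j) base pairs in the coordinates of the shared sequence, and the input values are made up. It does not reproduce the Multilign algorithm itself, and the function name is hypothetical.

```python
def consensus_base_pairs(pair_sets):
    """Keep only the base pairs (i, j) present in every pairwise prediction."""
    pair_sets = [set(pairs) for pairs in pair_sets]
    if not pair_sets:
        return set()
    return set.intersection(*pair_sets)


# Hypothetical base pairs for the shared sequence from three pairwise
# calculations (shared sequence vs. sequence 2, 3 and 4).
predictions = [
    {(1, 20), (2, 19), (3, 18), (7, 12)},
    {(1, 20), (2, 19), (3, 18), (8, 13)},
    {(1, 20), (2, 19), (4, 17)},
]

print(sorted(consensus_base_pairs(predictions)))
```

    With one sequence shared across all calculations, N sequences need only N - 1 pairwise calculations, which is consistent with the linear scaling stated in the abstract.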

  4. Conservation analysis predicts in vivo occupancy of glucocorticoid receptor-binding sequences at glucocorticoid-induced genes.

    PubMed

    So, Alex Yick-Lun; Cooper, Samantha B; Feldman, Brian J; Manuchehri, Mitra; Yamamoto, Keith R

    2008-04-15

    The glucocorticoid receptor (GR) interacts with specific GR-binding sequences (GBSs) at glucocorticoid response elements (GREs) to orchestrate transcriptional networks. Although the sequences of the GBSs are highly variable among different GREs, the precise sequence within an individual GRE is highly conserved. In this study, we examined whether sequence conservation of sites resembling GBSs is sufficient to predict GR occupancy of GREs at genes responsive to glucocorticoids. Indeed, we found that the level of conservation of these sites at genes up-regulated by glucocorticoids in mouse C3H10T1/2 mesenchymal stem-like cells correlated directly with the extent of occupancy by GR. In striking contrast, we failed to observe GR occupancy of GBSs at genes repressed by glucocorticoids, despite the occurrence of these sites at a frequency similar to that of the induced genes. Thus, GR occupancy of the GBS motif correlates with induction but not repression, and GBS conservation alone is sufficient to predict GR occupancy and GRE function at induced genes.

  5. Widespread position-specific conservation of synonymous rare codons within coding sequences

    PubMed Central

    Steele, Aaron; Carmichael, Rory; Rodriguez, Anabel; Specht, Alicia T.; Ngo, Kim; Emrich, Scott

    2017-01-01

    Synonymous rare codons are considered to be sub-optimal for gene expression because they are translated more slowly than common codons. Yet surprisingly, many protein coding sequences include large clusters of synonymous rare codons. Rare codons at the 5’ terminus of coding sequences have been shown to increase translational efficiency. Although a general functional role for synonymous rare codons farther within coding sequences has not yet been established, several recent reports have identified rare-to-common synonymous codon substitutions that impair folding of the encoded protein. Here we test the hypothesis that although the usage frequencies of synonymous codons change from organism to organism, codon rarity will be conserved at specific positions in a set of homologous coding sequences, for example to tune translation rate without altering a protein sequence. Such conservation of rarity–rather than specific codon identity–could coordinate co-translational folding of the encoded protein. We demonstrate that many rare codon cluster positions are indeed conserved within homologous coding sequences across diverse eukaryotic, bacterial, and archaeal species, suggesting they result from positive selection and have a functional role. Most conserved rare codon clusters occur within rather than between conserved protein domains, challenging the view that their primary function is to facilitate co-translational folding after synthesis of an autonomous structural unit. Instead, many conserved rare codon clusters separate smaller protein structural motifs within structural domains. These smaller motifs typically fold faster than an entire domain, on a time scale more consistent with translation rate modulation by synonymous codon usage. While proteins with conserved rare codon clusters are structurally and functionally diverse, they are enriched in functions associated with organism growth and development, suggesting an important role for synonymous codon usage in

  6. Evaluation of conserved and ultra-conserved non-genic sequences in chromosome 15q15-linked periodic catatonia.

    PubMed

    Schanze, Denny; Ekici, Arif B; Pfuhlmann, Bruno; Reis, André; Stöber, Gerald

    2012-01-01

    Conserved and ultra-conserved non-genic sequence elements (CNGs, UCEs) between human and other mammalian genomes seem to constitute a heterogeneous group of functional sequences which likely have important biological function. To determine whether variation in CNGs and UCEs contributes to risk for the schizophrenic subphenotype of periodic catatonia (according to K. Leonhard; OMIM 605419), we evaluated non-coding elements at a critical 7.35 Mb interval on chromosome 15q15 in 8 unrelated cases with periodic catatonia (derived from pedigrees compatible with linkage to chromosome 15q15) and 8 controls, followed by association studies in a cohort of 510 cases and controls. Among 65 CNGs (≥100 bp, 100% identity; human-mouse comparison), 7 CNGs matched criteria for UCE (≥200 bp, 100% identity). A hot spot of 62/65 CNGs (95%) appeared at the MEIS2 locus, which implicates functional importance of associated (ultra-)conserved elements to this early developmental gene, which is present in the human fetal neocortex and associated with metabolic side effects to antipsychotic drugs. Further CNGs were identified at the PLCB2 and DLL4 loci or located intergenically between TYRO3 and MAPKBP1. Automated sequencing revealed genetic variation in 12.3% of CNGs, but frequencies were low (MAF: 0.06-0.4) in cases. Three variants located inside CNGs/UCEs were found in cases only. In a case-control association study we could not confirm a significant association of these three CNG-variants with periodic catatonia. Our results suggest genetic variation in (ultra-)conserved non-genic sequence elements which might alter functional properties. The identified variants are genetically not associated with the phenotype of periodic catatonia. Copyright © 2011 Wiley Periodicals, Inc.

  7. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY]; Webb, Watt W [Ithaca, NY]; Levene, Michael [Ithaca, NY]; Turner, Stephen [Ithaca, NY]; Craighead, Harold G [Ithaca, NY]; Foquet, Mathieu [Ithaca, NY]

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  8. A conserved C-terminal sequence of high-risk cutaneous beta-human papillomavirus E6 proteins alters localization and signalling of β1-integrin to promote cell migration.

    PubMed

    Holloway, Amy; Storey, Alan

    2014-01-01

    Beta-human papillomaviruses (β-HPV) infect cutaneous epithelia, and accumulating evidence suggests that the virus may act as a co-factor with UV-induced DNA damage in the development and progression of non-melanoma skin cancer, although the molecular mechanisms involved are poorly understood. The E6 protein of cutaneous β-HPV types encodes functions consistent with a role in tumorigenesis, and E6 expression can result in papilloma formation in transgenic animals. The E6 proteins of high-risk α-HPV types, which are associated with the development of anogenital cancers, have a conserved 4 aa motif at their extreme C terminus that binds to specific PDZ domain-containing proteins to promote cell invasion. Likewise, the high-risk β-HPVs HPV5 and HPV8 E6 proteins also share a conserved C-terminal motif, but this is markedly different from that of α-HPV types, implying functional differences. Using binding and functional studies, we have shown that β-HPV E6 proteins target β1-integrin using this C-terminal motif. E6 expression reduced membrane localization of β1-integrin, but increased overall levels of β1-integrin protein and its downstream effector focal adhesion kinase in human keratinocytes. Altered β1-integrin localization due to E6 expression was associated with actin cytoskeleton rearrangement and increased cell migration that was abolished by point mutations in the C-terminal motif of E6. We concluded that modulation of β1-integrin signalling by E6 proteins may contribute towards the pathogenicity of these β-HPV types.

  9. Remarkable intron and exon sequence conservation in human and mouse homeobox Hox 1.3 genes

    SciTech Connect

    Tournier-Lasserve, E.; Odenwald, W.F.; Garbern, J.; Trojanowski, J.; Lazzarini, R.A.

    1989-05-01

    A high degree of conservation exists between the Hox 1.3 homeobox genes of mice and humans. The two genes occupy the same relative positions in their respective Hox 1 gene clusters, they show extensive sequence similarities in their coding and noncoding portions, and both are transcribed into multiple transcripts of similar sizes. The predicted human Hox 1.3 protein differs from its murine counterpart in only 7 of 270 amino acids. The sequence similarity in the 250 base pairs upstream of the initiation codon is 98%, the similarity between the two introns, both 960 base pairs long, is 72%, and the similarity in the 3' noncoding region from termination codon to polyadenylation signal is 90%. Both mouse and human Hox 1.3 introns contain a sequence with homology to a mating-type-controlled cis element of the yeast Ty1 transposon. DNA-binding studies with a recombinant mouse Hox 1.3 protein identified two binding sites in the intron, both of which were within the region of shared homology with this Ty1 cis element.
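
    The similarity figures in this abstract are simple ratios. As a worked illustration in Python, the helper below converts a count of differing positions into a percent identity, and a second helper does the same for two pre-aligned strings. Both function names are hypothetical; only the 7-of-270 figure comes from the abstract.

```python
def percent_identity(differences: int, length: int) -> float:
    """Percent of positions that are identical, given a difference count."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * (length - differences) / length


def percent_identity_aligned(seq_a: str, seq_b: str) -> float:
    """Percent identity of two equal-length, already aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return percent_identity(differences, len(seq_a))


# 7 differences in 270 amino acids, as reported for the Hox 1.3 proteins.
hox_protein_identity = round(percent_identity(7, 270), 1)
```

    For the protein comparison this gives roughly 97.4% identity, which sits alongside the 98%, 72% and 90% nucleotide similarities quoted for the noncoding regions.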

  10. The complete nucleotide sequence of the Crossostoma lacustre mitochondrial genome: conservation and variations among vertebrates.

    PubMed Central

    Tzeng, C S; Hui, C F; Shen, S C; Huang, P C

    1992-01-01

    The complete mitochondrial (mt) genome of Crossostoma lacustre, a freshwater loach from a mountain stream of Taiwan, has been cloned and sequenced. This fish mt genome, consisting of 16558 base-pairs, encodes genes for 13 proteins, two rRNAs, and 22 tRNAs, in addition to a regulatory sequence for replication and transcription (D-loop), and is similar to those of the other vertebrates in both the order and orientation of these genes. The protein-coding and ribosomal RNA genes are highly homologous, both in size and composition, to their counterparts in mammals, birds, amphibians, and invertebrates, and use essentially the same set of codons, including both the initiation and termination signals, and the tRNAs. Differences do exist, however, in the lengths and sequences of the D-loop regions, and in space between genes, which account for the variations in total lengths of the genomes. Our observations provide evidence for the first time for the conservation of genetic information in the fish mitochondrial genome, especially among the vertebrates. PMID:1408800

  11. Sequence conservation in avian CR1: an interspersed repetitive DNA family evolving under functional constraints.

    PubMed Central

    Chen, Z Q; Ritzel, R G; Lin, C C; Hodgetts, R B

    1991-01-01

    CR1 is a short interspersed repetitive DNA element originally identified in the domestic chicken (Gallus gallus). However, unlike virtually all other such sequences described to date, CR1 is not confined to one or a few closely related species. It is probably a ubiquitous component of the avian genome, having been detected in representatives of nine orders encompassing a wide spectrum of the class Aves. This identification was made possible by using the polymerase chain reaction (PCR), which revealed interspecific similarities not detected by conventional Southern analysis. DNA sequence comparisons between a CR1 element isolated from a sarus crane (Grus antigone) and those isolated from an emu (Dromaius novaehollandiae) showed that two short highly conserved regions are present. These are included within two regions previously characterized in the CR1 units of domestic fowl. One of these behaves as a transcriptional silencer and the other is a binding site for a nuclear protein. Our observations suggest that CR1 has evolved under functional constraints and that interspersed repetitive sequences as a class may constitute a more significant component of the eukaryotic genome than is generally acknowledged. Images PMID:1829530

  12. Accelerated Evolution of Conserved Noncoding Sequences in the Human Genome

    SciTech Connect

    Prabhakar, Shyam; Noonan, James P.; Paabo, Svante; Rubin, Edward M.

    2006-07-06

    Genomic comparisons between human and distant, non-primate mammals are commonly used to identify cis-regulatory elements based on constrained sequence evolution. However, these methods fail to detect "cryptic" functional elements, which are too weakly conserved among mammals to distinguish from nonfunctional DNA. To address this problem, we explored the potential of deep intra-primate sequence comparisons. We sequenced the orthologs of 558 kb of human genomic sequence, covering multiple loci involved in cholesterol homeostasis, in 6 nonhuman primates. Our analysis identified 6 noncoding DNA elements displaying significant conservation among primates, but undetectable in more distant comparisons. In vitro and in vivo tests revealed that at least three of these 6 elements have regulatory function. Notably, the mouse orthologs of these three functional human sequences had regulatory activity despite their lack of significant sequence conservation, indicating that they are cryptic ancestral cis-regulatory elements. These regulatory elements could still be detected in a smaller set of three primate species including human, rhesus and marmoset. Since the human and rhesus genome sequences are already available, and the marmoset genome is actively being sequenced, the primate-specific conservation analysis described here can be applied in the near future on a whole-genome scale, to complement the annotation provided by more distant species comparisons.

  13. Detection of Weakly Conserved Ancestral Mammalian Regulatory Sequences by Primate Comparisons

    SciTech Connect

    Wang, Qian-fei; Prabhakar, Shyam; Chanan, Sumita; Cheng, Jan-Fang; Rubin, Edward M.; Boffelli, Dario

    2006-06-01

    Genomic comparisons between human and distant, non-primate mammals are commonly used to identify cis-regulatory elements based on constrained sequence evolution. However, these methods fail to detect cryptic functional elements, which are too weakly conserved among mammals to distinguish from nonfunctional DNA. To address this problem, we explored the potential of deep intra-primate sequence comparisons. We sequenced the orthologs of 558 kb of human genomic sequence, covering multiple loci involved in cholesterol homeostasis, in 6 nonhuman primates. Our analysis identified 6 noncoding DNA elements displaying significant conservation among primates, but undetectable in more distant comparisons. In vitro and in vivo tests revealed that at least three of these 6 elements have regulatory function. Notably, the mouse orthologs of these three functional human sequences had regulatory activity despite their lack of significant sequence conservation, indicating that they are cryptic ancestral cis-regulatory elements. These regulatory elements could still be detected in a smaller set of three primate species including human, rhesus and marmoset. Since the human and rhesus genome sequences are already available, and the marmoset genome is actively being sequenced, the primate-specific conservation analysis described here can be applied in the near future on a whole-genome scale, to complement the annotation provided by more distant species comparisons.

  14. Spatial clustering of binding motifs and charges reveals conserved functional features in disordered nucleoporin sequences

    NASA Astrophysics Data System (ADS)

    Ando, David; Colvin, Michael; Rexach, Michael; Gopinathan, Ajay

    2013-03-01

    The Nuclear Pore Complex (NPC) gates the only channel through which cells exchange material between the nucleus and cytoplasm. Traffic is regulated by transport receptors bound to cargo which interact with numerous disordered phenylalanine-glycine (FG) repeat-containing proteins (FG nups) that line this channel. The precise physical mechanism of transport regulation has remained elusive primarily due to the difficulty in understanding the structure and dynamics of such a large assembly of interacting disordered proteins. Here we have performed a comprehensive bioinformatic analysis, specifically tailored towards disordered proteins, on thousands of nuclear pore proteins from a variety of species revealing a set of highly conserved features in the sequence structure among FG nups. Contrary to the general perception that these proteins are functionally equivalent to homogeneous polymers, we show that biophysically important features within individual nups like the separation, spatial localization and ordering along the chain of FG and charge domains are highly conserved. Our current understanding of NPC structure and function should therefore be revised to account for these common features that are functionally relevant for the underlying physical mechanism of NPC gating.

  15. Conservation genetics of high elevation five-needle white pines

    Treesearch

    Andrew D. Bower; Sierra C. McLane; Andrew Eckert; Stacy Jorgensen; Anna Schoettle; Sally Aitken

    2011-01-01

    Conservation genetics examines the biophysical factors influencing genetic processes and uses that information to conserve and maintain the evolutionary potential of species and populations. Here we review published and unpublished literature on the conservation genetics of seven North American high-elevation five-needle pines. Although these species are widely...

  16. Readings in Wildlife and Fish Conservation, High School Conservation Curriculum Project.

    ERIC Educational Resources Information Center

    Ensminger, Jack

    This publication is a tentative edition of readings on Wildlife and Fish Conservation in Louisiana, and as such it forms part of one of the four units of study designed for an experimental high school course, the "High School Conservation Curriculum Project." The other three units are concerned with Forest Conservation, Soil and Water…

  17. The words of the regulatory code are arranged in a variable manner in highly conserved enhancers.

    PubMed

    Rastegar, Sepand; Hess, Isabell; Dickmeis, Thomas; Nicod, Jean Christophe; Ertzer, Raymond; Hadzhiev, Yavor; Thies, Wolf-Gerolf; Scherer, Gerd; Strähle, Uwe

    2008-06-15

    The cis-regulatory regions of many developmental regulators and transcription factors are believed to be highly conserved in the genomes of vertebrate species, suggesting specific regulatory mechanisms for these gene classes. We functionally characterized five notochord enhancers, whose sequence is highly conserved, and systematically mutated two of them. Two subregions were identified to be essential for expression in the notochord of the zebrafish embryo. Synthetic enhancers containing the two essential regions in front of a TATA-box drive expression in the notochord while concatemerization of the subregions alone is not sufficient, indicating that the combination of the two sequence elements is required for notochord expression. Both regions are present in the five functionally characterized notochord enhancers. However, the position, the distance and relative orientation of the two sequence motifs can vary substantially within the enhancer sequences. This suggests that the regulatory grammar itself does not dictate the high evolutionary conservation between these orthologous cis-regulatory sequences. Rather, it represents a less well-conserved layer of sequence organization within these sequences.

  18. Conservation of the human telomere sequence (TTAGGG)n among vertebrates.

    PubMed Central

    Meyne, J; Ratliff, R L; Moyzis, R K

    1989-01-01

    To determine the evolutionary origin of the human telomere sequence (TTAGGG)n, biotinylated oligodeoxynucleotides of this sequence were hybridized to metaphase spreads from 91 different species, including representative orders of bony fish, reptiles, amphibians, birds, and mammals. Under stringent hybridization conditions, fluorescent signals were detected at the telomeres of all chromosomes, in all 91 species. The conservation of the (TTAGGG)n sequence and its telomeric location, in species thought to share a common ancestor over 400 million years ago, strongly suggest that this sequence is the functional vertebrate telomere. Images PMID:2780561

  19. Complete nucleotide sequence of the Actinomyces viscosus T14V sialidase gene: presence of a conserved repeating sequence among strains of Actinomyces spp.

    PubMed Central

    Yeung, M K

    1993-01-01

    The nucleotide sequence of the Actinomyces viscosus T14V sialidase gene (nanH) and flanking regions was determined. An open reading frame of 2,703 nucleotides that encodes a predominately hydrophobic protein of 901 amino acids (M(r), 92,871) was identified. The amino acid sequence at the amino terminus of the predicted protein exhibited properties characteristic of a typical leader peptide. Five 12-amino-acid units that shared between 33 and 67% sequence identity were noted within the central domain of the protein. Each unit contained the sequence Ser-X-Asp-X-Gly-X-Thr-Trp, which is conserved among other bacterial and trypanosoma sp. sialidases. Thus, the A. viscosus T14V nanH gene and the other prokaryotic and eukaryotic sialidase genes evolved from a common ancestor. Southern hybridization analyses under conditions of high stringency revealed the existence of DNA sequences homologous to A. viscosus T14V nanH in the genomes of 18 strains of five Actinomyces species that expressed various levels of sialidase activity. The data demonstrate that the sialidase genes from divergent groups of Actinomyces spp. are highly conserved. Images PMID:8418033
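
    The conserved unit Ser-X-Asp-X-Gly-X-Thr-Trp described above maps naturally onto a regular expression over one-letter amino acid codes (S.D.G.TW, with X as any residue). The Python sketch below scans a protein string for that pattern; the example peptide is invented for illustration and is not the A. viscosus sialidase sequence.

```python
import re

# Ser-X-Asp-X-Gly-X-Thr-Trp in one-letter code; X is any amino acid letter.
# The lookahead lets overlapping occurrences be reported as well.
SIALIDASE_MOTIF = re.compile(r"(?=(S[A-Z]D[A-Z]G[A-Z]TW))")


def find_motif(protein: str):
    """Return (0-based start, matched text) for each motif occurrence."""
    return [(m.start(), m.group(1)) for m in SIALIDASE_MOTIF.finditer(protein.upper())]


# Hypothetical peptide containing two copies of the motif.
example = "MKSADAGATWLLSRDQGETWPP"
hits = find_motif(example)
```

    Counting such hits along a translated open reading frame is one simple way to locate repeated units of this kind, although it says nothing about the 33 to 67% identity among the surrounding 12-residue units.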

  20. The BsaHI restriction-modification system: cloning, sequencing and analysis of conserved motifs.

    PubMed

    Neely, Robert K; Roberts, Richard J

    2008-05-14

    Restriction and modification enzymes typically recognise short DNA sequences of between two and eight bases in length. Understanding the mechanism of this recognition represents a significant challenge that we begin to address for the BsaHI restriction-modification system, which recognises the six base sequence GRCGYC. The DNA sequences of the genes for the BsaHI methyltransferase, bsaHIM, and restriction endonuclease, bsaHIR, have been determined (GenBank accession #EU386360), cloned and expressed in E. coli. Both the restriction endonuclease and methyltransferase enzymes share significant similarity with a group of 6 other enzymes comprising the restriction-modification systems HgiDI and HgiGI and the putative HindVP, NlaCORFDP, NpuORFC228P and SplZORFNP restriction-modification systems. A sequence alignment of these homologues shows that their amino acid sequences are largely conserved and highlights several motifs of interest. We target one such conserved motif, reading SPERRFD, at the C-terminal end of the bsaHIR gene. A mutational analysis of these amino acids indicates that the motif is crucial for enzymatic activity. Sequence alignment of the methyltransferase gene reveals a short motif within the target recognition domain that is conserved among enzymes recognising the same sequences. Thus, this motif may be used as a diagnostic tool to define the recognition sequences of the cytosine C5 methyltransferases. We have cloned and sequenced the BsaHI restriction and modification enzymes. We have identified a region of the R. BsaHI enzyme that is crucial for its activity. Analysis of the amino acid sequence of the BsaHI methyltransferase enzyme led us to propose two new motifs that can be used in the diagnosis of the recognition sequence of the cytosine C5-methyltransferases.
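
    The recognition sequence GRCGYC uses IUPAC degenerate codes, where R stands for A or G and Y for C or T. A minimal Python sketch, with hypothetical helper names and an invented test sequence, shows how such a site can be expanded into its concrete sequences and located in DNA.

```python
import re
from itertools import product

# Subset of the IUPAC nucleotide codes needed for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def expand_site(site: str):
    """List every concrete sequence matched by a degenerate site."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in site.upper()))]


def find_sites(dna: str, site: str):
    """Return 0-based start positions of the site, overlaps included."""
    pattern = "".join(f"[{IUPAC[base]}]" for base in site.upper())
    return [m.start() for m in re.finditer(f"(?={pattern})", dna.upper())]


variants = expand_site("GRCGYC")
positions = find_sites("TTGACGTCAAGGCGCCTT", "GRCGYC")
```

    The four expanded sequences are the only hexamers that satisfy the degenerate site, so a scan of either strand for any of them finds every candidate recognition position.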

  1. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930
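
    Because the motifs described here are identical in sequence but not conserved in orientation or position, a search for them has to consider both strands and report where each hit falls. The Python sketch below does this for an exact motif; the motif and sequence are made up for illustration and are not the elements tested in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_hits(seq: str, motif: str):
    """Return (0-based position, strand) for exact motif matches.

    A palindromic motif is reported once, on the '+' strand.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


# Hypothetical example: one forward copy and one reverse-complement copy.
hits = motif_hits("CCGATAACCTTATCGG", "GATAA")
```

    Comparing such hit lists between orthologous regions shows whether a motif is present in both even when its position and strand differ.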

  2. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence.

    PubMed

    Gordon, Kacy L; Arthur, Robert K; Ruvinsky, Ilya

    2015-05-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements.

  3. Position-specific prediction of methylation sites from sequence conservation based on information theory.

    PubMed

    Shi, Yinan; Guo, Yanzhi; Hu, Yayun; Li, Menglong

    2015-07-23

    Protein methylation plays vital roles in many biological processes and has been implicated in various human diseases. To fully understand the mechanisms underlying methylation for use in drug design and work in methylation-related diseases, an initial but crucial step is to identify methylation sites. The use of high-throughput bioinformatics methods has become imperative to predict methylation sites. In this study, we developed a novel method that is based only on sequence conservation to predict protein methylation sites. Conservation difference profiles between methylated and non-methylated peptides were constructed by the information entropy (IE) in a wider neighbor interval around the methylation sites that fully incorporated all of the environmental information. Then, the distinctive neighbor residues were identified by the importance scores of information gain (IG). The most representative model was constructed by support vector machine (SVM) for Arginine and Lysine methylation, respectively. This model yielded a promising result on both the benchmark dataset and independent test set. The model was used to screen the entire human proteome, and many unknown substrates were identified. These results indicate that our method can serve as a useful supplement to elucidate the mechanism of protein methylation and facilitate hypothesis-driven experimental design and validation.
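
    The information entropy underlying a conservation profile can be sketched in a few lines of Python. The example below computes Shannon entropy per position for two sets of equal-length peptide windows and takes their difference; the peptides are invented, and the windowing, information-gain scoring and SVM steps of the published method are not reproduced here.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (bits) of the residues observed at one position."""
    counts = Counter(column)
    n = len(column)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def entropy_profile(peptides):
    """Per-position entropy for a list of equal-length peptides."""
    return [column_entropy(col) for col in zip(*peptides)]


def difference_profile(positive, negative):
    """Entropy of the negative set minus that of the positive set, per position.

    Larger values mark positions that are more conserved in the positive set.
    """
    return [n - p for p, n in zip(entropy_profile(positive), entropy_profile(negative))]


# Hypothetical windows centred on a candidate site (illustrative only).
methylated = ["GRG", "GRG", "ARG", "GRG"]
unmethylated = ["AKL", "GRV", "SRT", "PKG"]
profile = difference_profile(methylated, unmethylated)
```

    A position with low entropy in the methylated set and high entropy in the background set is a natural candidate feature for a downstream classifier.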

  4. The nucleotide sequence of the human int-1 mammary oncogene; evolutionary conservation of coding and non-coding sequences.

    PubMed Central

    van Ooyen, A; Kwee, V; Nusse, R

    1985-01-01

    The mouse mammary tumor virus can induce mammary tumors in mice by proviral activation of an evolutionarily conserved cellular oncogene called int-1. Here we present the nucleotide sequence of the human homologue of int-1, and compare it with the mouse gene. Like the mouse gene, the human homologue contains a reading frame of 370 amino acids, with only four substitutions. The amino acid changes are all in the hydrophobic leader domain of the int-1 encoded protein, and do not significantly alter its hydropathic index. The conservation between the mouse and the human int-1 genes is not restricted to exons; extensive parts of the introns are also homologous. Thus, int-1 ranks among the most conserved genes known, a property shared with other oncogenes. PMID:2998762

  5. Two distinct nuclear factors bind the conserved regulatory sequences of a rabbit major histocompatibility complex class II gene.

    PubMed Central

    Sittisombut, N

    1988-01-01

    The constitutive coexpression of the major histocompatibility complex (MHC) class II genes in B lymphocytes requires positive, trans-acting transcriptional factors. The need for these trans-acting factors has been suggested by the reversion of the MHC class II-negative phenotype of rare B-lymphocyte mutants through somatic cell fusion with B cells or T-cell lines. The mechanism by which the trans-acting factors exert their effect on gene transcription is unknown. The possibility that two highly conserved DNA sequences, located 90 to 100 base pairs (bp) (the A sequence) and 60 to 70 bp (the B sequence) upstream of the transcription start site of the class II genes, are recognized by the trans-acting factors was investigated in this study. By using the gel electrophoresis retardation assay, a minimum of two proteins which specifically bound the conserved A or B sequence of a rabbit DP beta gene were identified in murine nuclear extracts of a B-lymphoma cell line, A20-2J. Fractionation of nuclear extract through a heparin-agarose column allowed the identification of one protein, designated NF-MHCIIB, which bound an oligonucleotide containing the B sequence and protected the entire B sequence in the DNase I protection analysis. Another protein, designated NF-MHCIIA, which bound an oligonucleotide containing the A sequence and partially protected the 3' half of this sequence, was also identified. NF-MHCIIB did not protect a CCAAT sequence located 17 bp downstream of the B sequence. The possible relationship between these DNA-binding factors and the trans-acting factors identified in the cell fusion experiments is discussed. Images PMID:3133552

  6. Inter-specific sequence conservation and intra-individual sequence variation in a spider silk gene.

    PubMed

    Tai, Pei-Ling; Hwang, Guang-Yuh; Tso, I-Min

    2004-10-01

    Currently, studies on major ampullate spidroin 1 (MaSp1) genes of non-orb weaving spiders are few, and it is not clear whether genes of these organisms exhibit the same characteristics as those of orb-weavers. In addition, many studies have proposed that MaSp1 might be a single gene with allelic variants, but supporting evidence is still lacking. In this study, we compared partial DNA and amino acid sequences of MaSp1 cloned from different spider guilds. We also cloned partial MaSp1 sequences from genomic DNA and cDNA of the same individuals of spiders using the same primer combination to see if different molecular forms existed. In the repetitive region of partial MaSp1 sequences obtained, GGX, GA and poly-A motifs were present in all Araneomorphae and Mygalomorphae species examined. An extreme similarity in MaSp1 non-repetitive portions was found in sequences of ecribellate, cribellate and Mygalomorphae web-builders and such a result suggested that this sequence might exhibit an important function. A comparison of sequences amplified from the same individual showed that substitutions in amino acids occurred in both repetitive and non-repetitive regions, with a much higher variation in the former. These results suggest that the MaSp1 of Araneomorphae spiders exhibits several forms in an individual spider and it might be either a multiple gene or a single gene with a multiple exon/intron organization.

  7. AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

    PubMed Central

    2010-01-01

    Background Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly". Conclusions AlignMiner can be used to reliably detect
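
    The idea of scoring alignment columns against a consensus and reporting divergent stretches can be sketched as follows in Python. This is a simplified stand-in rather than AlignMiner's own scoring: the score is just the fraction of sequences that disagree with the most common residue, and the threshold and minimum length are assumed parameters.

```python
from collections import Counter


def divergence_scores(alignment):
    """Per-column fraction of sequences differing from the consensus residue."""
    scores = []
    for column in zip(*alignment):
        consensus_count = Counter(column).most_common(1)[0][1]
        scores.append(1.0 - consensus_count / len(column))
    return scores


def divergent_regions(alignment, threshold: float = 0.5, min_len: int = 3):
    """Return (start, end) column spans whose scores stay at or above threshold.

    Assumes threshold > 0 so that the trailing sentinel closes any open run.
    """
    regions, start = [], None
    for i, score in enumerate(divergence_scores(alignment) + [0.0]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions


# Toy alignment (illustrative only): columns 4 to 6 differ in every sequence.
alignment = ["ACGTAAAACGT", "ACGTCCCACGT", "ACGTGGGACGT", "ACGTTTTACGT"]
regions = divergent_regions(alignment)
```

    A region found this way, flanked by conserved columns, is the kind of stretch that could be inspected further when looking for sequence-specific primers or epitopes.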

  8. FRESCO: Referential compression of highly similar sequences.

    PubMed

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
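
    Referential compression, as described above, stores an input as pointers into a known reference plus the few characters that differ. The Python sketch below is a deliberately simple greedy variant based on a k-mer index; it is not the FRESCO algorithm, and the function names, the value of k and the toy sequences are all assumptions made for illustration.

```python
def build_index(reference: str, k: int):
    """Map each k-mer to the position of its first occurrence in the reference."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], i)
    return index


def compress(target: str, reference: str, k: int = 4):
    """Encode target as (ref_pos, length) matches and single literal characters."""
    index = build_index(reference, k)
    entries, i = [], 0
    while i < len(target):
        pos = index.get(target[i:i + k])
        if pos is None:
            entries.append(target[i])          # no k-mer match: store a literal
            i += 1
            continue
        length = k
        # Extend the match for as long as target and reference keep agreeing.
        while (i + length < len(target) and pos + length < len(reference)
               and target[i + length] == reference[pos + length]):
            length += 1
        entries.append((pos, length))
        i += length
    return entries


def decompress(entries, reference: str) -> str:
    """Rebuild the target from match entries and literals."""
    parts = []
    for entry in entries:
        if isinstance(entry, tuple):
            pos, length = entry
            parts.append(reference[pos:pos + length])
        else:
            parts.append(entry)
    return "".join(parts)


# Toy sequences (illustrative only): the target differs by one substitution.
reference = "ACGTACGTTTGACCA"
target = "ACGTACGTATGACCA"
encoded = compress(target, reference)
restored = decompress(encoded, reference)
```

    On this toy input the 15-character target reduces to two reference pointers and one literal, which hints at why highly similar genomes compress so well against a shared reference.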

  9. Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

    PubMed Central

    Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

    2012-01-01

    Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally-restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings: A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086

  10. Analysis of DNA Sequence Variants Detected by High Throughput Sequencing

    PubMed Central

    Adams, David R; Sincan, Murat; Fajardo, Karin Fuentes; Mullikin, James C; Pierson, Tyler M; Toro, Camilo; Boerkoel, Cornelius F; Tifft, Cynthia J; Gahl, William A; Markello, Tom C

    2014-01-01

    The Undiagnosed Diseases Program at the National Institutes of Health uses High Throughput Sequencing (HTS) to diagnose rare and novel diseases. HTS techniques generate large numbers of DNA sequence variants, which must be analyzed and filtered to find candidates for disease causation. Despite the publication of an increasing number of successful exome-based projects, there has been little formal discussion of the analytic steps applied to HTS variant lists. We present the results of our experience with over 30 families for whom HTS sequencing was used in an attempt to find clinical diagnoses. For each family, exome sequence was augmented with high-density SNP-array data. We present a discussion of the theory and practical application of each analytic step and provide example data to illustrate our approach. The paper is designed to provide an analytic roadmap for variant analysis, thereby enabling a wide range of researchers and clinical genetics practitioners to perform direct analysis of HTS data for their patients and projects. PMID:22290882

  11. The complete mitochondrial genome sequence of the liverwort Pleurozia purpurea reveals extremely conservative mitochondrial genome evolution in liverworts.

    PubMed

    Wang, Bin; Xue, Jiayu; Li, Libo; Liu, Yang; Qiu, Yin-Long

    2009-12-01

    Plant mitochondrial genomes have been known to be highly unusual in their large sizes, frequent intra-genomic rearrangement, and generally conservative sequence evolution. Recent studies show that in early land plants the mitochondrial genomes exhibit a mixed mode of conservative yet dynamic evolution. Here, we report the completely sequenced mitochondrial genome from the liverwort Pleurozia purpurea. The circular genome has a size of 168,526 base pairs, containing 43 protein-coding genes, 3 rRNA genes, 25 tRNA genes, and 31 group I or II introns. It differs from the Marchantia polymorpha mitochondrial genome, the only other liverwort chondriome that has been sequenced, in lacking two genes (trnRucg and trnTggu) and one intron (rrn18i1065gII). The two genomes have identical gene orders and highly similar sequences in exons, introns, and intergenic spacers. Finally, a comparative analysis of duplicated trnRucu and other trnR genes from the two liverworts and several other organisms identified the recent lateral origin of trnRucg in Marchantia mtDNA through modification of a duplicated trnRucu. This study shows that the mitochondrial genomes evolve extremely slowly in liverworts, the earliest-diverging lineage of extant land plants, in stark contrast to what is known of highly dynamic evolution of mitochondrial genomes in seed plants.

  12. Nucleotide sequence conservation of novel and established cis-regulatory sites within the tyrosine hydroxylase gene promoter

    PubMed Central

    Wang, Meng; Banerjee, Kasturi; Baker, Harriet; Cave, John W.

    2015-01-01

    Tyrosine hydroxylase (TH) is the rate-limiting enzyme in catecholamine biosynthesis and its gene proximal promoter (<1 kb upstream from the transcription start site) is essential for regulating transcription in both the developing and adult nervous systems. Several putative regulatory elements within the TH proximal promoter have been reported, but evolutionary conservation of these elements has not been thoroughly investigated. Since many vertebrate species are used to model development, function and disorders of human catecholaminergic neurons, identifying evolutionarily conserved transcription regulatory mechanisms is a high priority. In this study, we align TH proximal promoter nucleotide sequences from several vertebrate species to identify evolutionarily conserved motifs. This analysis identified three elements (a TATA box, cyclic AMP response element (CRE) and a 5′-GGTGG-3′ site) that constitute the core of an ancient vertebrate TH promoter. Focusing on only eutherian mammals, two regions of high conservation within the proximal promoter were identified: a ∼250 bp region adjacent to the transcription start site and a ∼85 bp region located approximately 350 bp further upstream. Within both regions, conservation of previously reported cis-regulatory motifs and human single nucleotide variants was evaluated. Transcription reporter assays in a TH-expressing cell line demonstrated the functionality of highly conserved motifs in the proximal promoter regions and electromobility shift assays showed that brain-region specific complexes assemble on these motifs. These studies also identified a non-canonical CRE binding (CREB) protein recognition element in the proximal promoter. Together, these studies provide a detailed analysis of evolutionary conservation within the TH promoter and identify potential cis-regulatory motifs that underlie a core set of regulatory mechanisms in mammals. PMID:25774193

  13. The linked conservation of structure and function in a family of high diversity: the monomeric cupredoxins.

    PubMed

    Gough, Julian; Chothia, Cyrus

    2004-06-01

    The monomeric cupredoxins are a highly divergent family of copper binding electron transport proteins that function in photosynthesis and respiration. To determine how function and structure are conserved in the context of large sequence differences, we have carried out a detailed analysis of the cupredoxins of known structure and their sequence homologs. The common structure of the cupredoxins is formed by a sandwich of two beta sheets which support a copper binding site. The structure of the deeply buried core is intimately coupled to the binding site on the surface of the protein; in each protein the conserved regions form one continuous substructure that extends from the surface active site and through the center of the molecule. Residues around the active site are conserved for functional reasons, while those deeper in the structure will be conserved for structural reasons. Together the two sets support each other.

  14. Sequence and domain conservation of the coelacanth Hsp40 and Hsp90 chaperones suggests conservation of function.

    PubMed

    Bishop, Özlem Tastan; Edkins, Adrienne Lesley; Blatch, Gregory Lloyd

    2014-09-01

    Molecular chaperones and their associated co-chaperones play an important role in preserving and regulating the active conformational state of cellular proteins. The chaperone complement of the Indonesian Coelacanth, Latimeria menadoensis, was elucidated using transcriptomic sequences. Heat shock protein 90 (Hsp90) and heat shock protein 40 (Hsp40) chaperones, and associated co-chaperones were focused on, and homologous human sequences were used to search the sequence databases. Coelacanth homologs of the cytosolic, mitochondrial and endoplasmic reticulum (ER) homologs of human Hsp90 were identified, as well as all of the major co-chaperones of the cytosolic isoform. Most of the human Hsp40s were found to have coelacanth homologs, and the data suggested that all of the chaperone machinery for protein folding at the ribosome, protein translocation to cellular compartments such as the ER and protein degradation were conserved. Some interesting similarities and differences were identified when interrogating human, mouse, and zebrafish homologs. For example, DnaJB13 is predicted to be a non-functional Hsp40 in humans, mouse, and zebrafish due to a corrupted histidine-proline-aspartic acid (HPD) motif, while the coelacanth homolog has an intact HPD. These and other comparisons enabled important functional and evolutionary questions to be posed for future experimental studies.

  15. The most deeply conserved noncoding sequences in plants serve similar functions to those in vertebrates despite large differences in evolutionary rates.

    PubMed

    Burgess, Diane; Freeling, Michael

    2014-03-01

    In vertebrates, conserved noncoding elements (CNEs) are functionally constrained sequences that can show striking conservation over >400 million years of evolutionary distance and frequently are located megabases away from target developmental genes. Conserved noncoding sequences (CNSs) in plants are much shorter, and it has been difficult to detect conservation among distantly related genomes. In this article, we show not only that CNS sequences can be detected throughout the eudicot clade of flowering plants, but also that a subset of 37 CNSs can be found in all flowering plants (diverging ∼170 million years ago). These CNSs are functionally similar to vertebrate CNEs, being highly associated with transcription factor and development genes and enriched in transcription factor binding sites. Some of the most highly conserved sequences occur in genes encoding RNA binding proteins, particularly the RNA splicing-associated SR genes. Differences in sequence conservation between plants and animals are likely to reflect differences in the biology of the organisms, with plants being much more able to tolerate genomic deletions and whole-genome duplication events due, in part, to their far greater fecundity compared with vertebrates.

  16. Increased sequence coverage through combined targeting of variant and conserved epitopes correlates with control of HIV replication.

    PubMed

    Sunshine, Justine; Kim, Moon; Carlson, Jonathan M; Heckerman, David; Czartoski, Julie; Migueles, Stephen A; Maenza, Janine; McElrath, M Juliana; Mullins, James I; Frahm, Nicole

    2014-01-01

    A major challenge in the development of an HIV vaccine is that of contending with the extensive sequence variability found in circulating viruses. Induction of HIV-specific T-cell responses targeting conserved regions and induction of HIV-specific T-cell responses recognizing a high number of epitope variants have both been proposed as strategies to overcome this challenge. We addressed the ability of cytotoxic T lymphocytes from 30 untreated HIV-infected subjects with and without control of virus replication to recognize all clade B Gag sequence variants encoded by at least 5% of the sequences in the Los Alamos National Laboratory HIV database (1,300 peptides) using gamma interferon and interleukin-2 (IFN-γ/IL-2) FluoroSpot analysis. While targeting of conserved regions was similar in the two groups (P = 0.47), we found that subjects with control of virus replication demonstrated marginally lower recognition of Gag epitope variants than subjects with normal progression (P = 0.05). In viremic controllers and progressors, we found variant recognition to be associated with viral load (r = 0.62, P = 0.001). Interestingly, we show that increased overall sequence coverage, defined as the overall proportion of HIV database sequences targeted through the Gag-specific repertoire, is inversely associated with viral load (r = -0.38, P = 0.03). Furthermore, we found that sequence coverage, but not variant recognition, correlated with increased recognition of a panel of clade B HIV founder viruses (r = 0.50, P = 0.004). We propose sequence coverage by HIV Gag-specific immune responses as a possible correlate of protection that may contribute to control of virus replication. Additionally, sequence coverage serves as a valuable measure by which to evaluate the protective potential of future vaccination strategies.
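    The coverage measure described in the record above can be illustrated with a minimal Python sketch. It rests on a simplifying assumption that is not taken from the abstract: a database sequence counts as targeted if it contains at least one recognized peptide variant as an exact substring. The function name and the toy strings are hypothetical and are not real Gag sequences.

```python
def sequence_coverage(database: list[str], recognized: set[str]) -> float:
    """Fraction of database sequences containing at least one recognized peptide.

    Assumption for illustration only: "targeted" means an exact substring
    match to any recognized peptide variant.
    """
    if not database:
        return 0.0
    covered = sum(
        1 for seq in database if any(pep in seq for pep in recognized)
    )
    return covered / len(database)


# Toy example (made-up strings, not real viral sequences).
toy_database = ["AAKLM", "AAKLV", "GGKLM", "TTTTT"]
narrow = sequence_coverage(toy_database, {"KLM"})
broad = sequence_coverage(toy_database, {"KLM", "KLV"})
```

    In this toy case, recognizing one additional variant raises coverage from two of four sequences to three of four, which mirrors the idea that variant and conserved epitopes together determine overall coverage.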

  17. Integrated genome analysis suggests that most conserved non-coding sequences are regulatory factor binding sites

    PubMed Central

    Hemberg, Martin; Gray, Jesse M.; Cloonan, Nicole; Kuersten, Scott; Grimmond, Sean; Greenberg, Michael E.; Kreiman, Gabriel

    2012-01-01

    More than 98% of a typical vertebrate genome does not code for proteins. Although non-coding regions are sprinkled with short (<200 bp) islands of evolutionarily conserved sequences, the function of most of these unannotated conserved islands remains unknown. One possibility is that unannotated conserved islands could encode non-coding RNAs (ncRNAs); alternatively, unannotated conserved islands could serve as promoter-distal regulatory factor binding sites (RFBSs) like enhancers. Here we assess these possibilities by comparing unannotated conserved islands in the human and mouse genomes to transcribed regions and to RFBSs, relying on a detailed case study of one human and one mouse cell type. We define transcribed regions by applying a novel transcript-calling algorithm to RNA-Seq data obtained from total cellular RNA, and we define RFBSs using ChIP-Seq and DNAse-hypersensitivity assays. We find that unannotated conserved islands are four times more likely to coincide with RFBSs than with unannotated ncRNAs. Thousands of conserved RFBSs can be categorized as insulators based on the presence of CTCF or as enhancers based on the presence of p300/CBP and H3K4me1. While many unannotated conserved RFBSs are transcriptionally active to some extent, the transcripts produced tend to be unspliced, non-polyadenylated and expressed at levels 10 to 100-fold lower than annotated coding or ncRNAs. Extending these findings across multiple cell types and tissues, we propose that most conserved non-coding genomic DNA in vertebrate genomes corresponds to promoter-distal regulatory elements. PMID:22684627

  18. Conservation of Tubulin-Binding Sequences in TRPV1 throughout Evolution

    PubMed Central

    Sardar, Puspendu; Kumar, Abhishek; Bhandari, Anita; Goswami, Chandan

    2012-01-01

    Background: Transient Receptor Potential Vanilloid subtype 1 (TRPV1), commonly known as the capsaicin receptor, can detect multiple stimuli, including noxious compounds, low pH, temperature, and electromagnetic waves at different ranges. In addition, this receptor is involved in multiple physiological and sensory processes. Therefore, the functions of TRPV1 directly influence adaptation and further evolution. The availability of various eukaryotic genomic sequences in the public domain facilitates the study of the molecular evolution of the TRPV1 protein and the conservation of functionally important domains, motifs and interacting regions. Methodology and Principal Findings: Using statistical and bioinformatics tools, our analysis reveals that TRPV1 evolved ∼420 million years ago (MYA). Our analysis reveals that specific regions, domains and motifs of TRPV1 have gone through different selection pressures and thus have different levels of conservation. We found that, among all of these, the TRP box is the most conserved and thus has functional significance. Our results also indicate that the tubulin binding sequences (TBS) have evolutionary significance, as these stretches of sequence are more conserved than many other essential regions of TRPV1. The overall distribution of positively charged residues within the TBS motifs is conserved throughout evolution. In silico analysis reveals that TBS-1 and TBS-2 of TRPV1 can form helical structures and may play an important role in TRPV1 function. Conclusions and Significance: Our analysis identifies the regions of TRPV1 that are important for the structure-function relationship. This analysis indicates that tubulin binding sequence-1 (TBS-1) near the TRP box forms a potential helix and that the tubulin interactions with TRPV1 via TBS-1 have evolutionary significance. This interaction may be required for proper channel function and regulation and may also have significance in the context of Taxol

  19. Rice pseudomolecule-anchored cross-species DNA sequence alignments indicate regional genomic variation in expressed sequence conservation

    PubMed Central

    Armstead, Ian; Huang, Lin; King, Julie; Ougham, Helen; Thomas, Howard; King, Ian

    2007-01-01

    Background: Various methods have been developed to explore inter-genomic relationships among plant species. Here, we present a sequence similarity analysis based upon comparison of transcript-assembly and methylation-filtered databases from five plant species and physically anchored rice coding sequences. Results: A comparison of the frequency of sequence alignments, determined by MegaBLAST, between rice coding sequences in TIGR pseudomolecules and annotations vs 4.0 and comprehensive transcript-assembly and methylation-filtered databases from Lolium perenne (ryegrass), Zea mays (maize), Hordeum vulgare (barley), Glycine max (soybean) and Arabidopsis thaliana (thale cress) was undertaken. Each rice pseudomolecule was divided into 10 segments, each containing 10% of the functionally annotated, expressed genes. This indicated a correlation between relative segment position in the rice genome and numbers of alignments with all the queried monocot and dicot plant databases. Colour-coded moving windows of 100 functionally annotated, expressed genes along each pseudomolecule were used to generate 'heat-maps'. These revealed consistent intra- and inter-pseudomolecule variation in the relative concentrations of significant alignments with the tested plant databases. Analysis of the annotations and derived putative expression patterns of rice genes from 'hot-spots' and 'cold-spots' within the heat maps indicated possible functional differences. A similar comparison relating to ancestral duplications of the rice genome indicated that duplications were often associated with 'hot-spots'. Conclusion: Physical positions of expressed genes in the rice genome are correlated with the degree of conservation of similar sequences in the transcriptomes of other plant species. This relative conservation is associated with the distribution of different sized gene families and segmentally duplicated loci and may have functional and evolutionary implications. PMID:17708759
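    The segmenting and moving-window steps in the record above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the input is a hypothetical list of booleans, one per functionally annotated, expressed gene in chromosomal order, marking whether that gene had a significant alignment, and the window is assumed to advance one gene at a time (the abstract does not state the step size).

```python
def decile_hit_counts(hits: list[bool]) -> list[int]:
    """Split genes (in chromosomal order) into 10 near-equal segments
    and count the genes with a significant alignment in each segment."""
    n = len(hits)
    bounds = [round(i * n / 10) for i in range(11)]
    return [sum(hits[bounds[i]:bounds[i + 1]]) for i in range(10)]


def moving_window_fraction(hits: list[bool], window: int = 100) -> list[float]:
    """Fraction of genes with a hit in each window of `window` genes,
    advancing one gene at a time (assumed step size)."""
    if len(hits) < window:
        return []
    count = sum(hits[:window])
    fractions = [count / window]
    for i in range(window, len(hits)):
        # Slide the window: add the entering gene, drop the leaving gene.
        count += hits[i] - hits[i - window]
        fractions.append(count / window)
    return fractions


# Toy pseudomolecule: a conserved first half, then no alignments at all.
toy_hits = [True] * 50 + [False] * 50
segment_counts = decile_hit_counts(toy_hits)
window_fractions = moving_window_fraction(toy_hits, window=10)
```

    The list returned by moving_window_fraction is what a heat map would colour: high values correspond to 'hot-spots' and low values to 'cold-spots'.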

  20. The penicillin gene cluster is amplified in tandem repeats linked by conserved hexanucleotide sequences.

    PubMed Central

    Fierro, F; Barredo, J L; Díez, B; Gutierrez, S; Fernández, F J; Martín, J F

    1995-01-01

    The penicillin biosynthetic genes (pcbAB, pcbC, penDE) of Penicillium chrysogenum AS-P-78 were located in a 106.5-kb DNA region that is amplified in tandem repeats (five or six copies) linked by conserved TTTACA sequences. The wild-type strains P. chrysogenum NRRL 1951 and Penicillium notatum ATCC 9478 (Fleming's isolate) contain a single copy of the 106.5-kb region. This region was bordered by the same TTTACA hexanucleotide found between tandem repeats in strain AS-P-78. A penicillin overproducer strain, P. chrysogenum E1, contains a large number of copies in tandem of a 57.9-kb DNA fragment, linked by the same hexanucleotide or its reverse complementary TGTAAA sequence. The deletion mutant P. chrysogenum npe10 showed a deletion of 57.9 kb that corresponds exactly to the DNA fragment that is amplified in E1. The conserved hexanucleotide sequence was reconstituted at the deletion site. The amplification has occurred within a single chromosome (chromosome I). The tandem reiteration and deletion appear to arise by mutation-induced site-specific recombination at the conserved hexanucleotide sequences. PMID:7597101

  1. The penicillin gene cluster is amplified in tandem repeats linked by conserved hexanucleotide sequences.

    PubMed

    Fierro, F; Barredo, J L; Díez, B; Gutierrez, S; Fernández, F J; Martín, J F

    1995-06-20

    The penicillin biosynthetic genes (pcbAB, pcbC, penDE) of Penicillium chrysogenum AS-P-78 were located in a 106.5-kb DNA region that is amplified in tandem repeats (five or six copies) linked by conserved TTTACA sequences. The wild-type strains P. chrysogenum NRRL 1951 and Penicillium notatum ATCC 9478 (Fleming's isolate) contain a single copy of the 106.5-kb region. This region was bordered by the same TTTACA hexanucleotide found between tandem repeats in strain AS-P-78. A penicillin overproducer strain, P. chrysogenum E1, contains a large number of copies in tandem of a 57.9-kb DNA fragment, linked by the same hexanucleotide or its reverse complementary TGTAAA sequence. The deletion mutant P. chrysogenum npe10 showed a deletion of 57.9 kb that corresponds exactly to the DNA fragment that is amplified in E1. The conserved hexanucleotide sequence was reconstituted at the deletion site. The amplification has occurred within a single chromosome (chromosome I). The tandem reiteration and deletion appear to arise by mutation-induced site-specific recombination at the conserved hexanucleotide sequences.
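    The relationship between the TTTACA hexanucleotide and its reverse complement TGTAAA, mentioned in the record above, can be checked with a short Python sketch. The function names and the example sequence are hypothetical illustrations; the example is not taken from the Penicillium genome.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_linker_sites(seq: str, motif: str = "TTTACA") -> list[tuple[int, str]]:
    """List (0-based position, strand) for each occurrence of the motif ('+')
    or of its reverse complement ('-') in seq."""
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


# Made-up sequence containing one site in each orientation.
example = "GGTTTACAGGTGTAAACC"
sites = find_linker_sites(example)
```

    Scanning for both orientations in one pass matters here because the abstract reports repeats linked by either the hexanucleotide or its reverse complement.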

  2. Primary structure of the merozoite surface antigen 1 of Plasmodium vivax reveals sequences conserved between different Plasmodium species.

    PubMed Central

    del Portillo, H A; Longacre, S; Khouri, E; David, P H

    1991-01-01

    Merozoite surface antigen 1 (MSA1) of several species of plasmodia has been shown to be a promising candidate for a vaccine directed against the asexual blood stages of malaria. We report the cloning and characterization of the MSA1 gene of the human malaria parasite Plasmodium vivax. This gene, which we call Pv200, encodes a polypeptide of 1726 amino acids and displays features described for MSA1 genes of other species, such as signal peptide and anchoring sequences, conserved cysteine residues, number of potential N-glycosylation sites, and repeats consisting here of 23 glutamine residues in a row. When the nucleotide and deduced amino acid sequences of the MSA1 of P. vivax are compared to those of another human malaria parasite, Plasmodium falciparum, and to those of the rodent parasite Plasmodium yoelii, 10 regions of high amino acid similarity are observed despite the very different dG + dC contents of the corresponding genes. All of the interspecies conserved regions reside within the conserved or semiconserved blocks delimited by the sequences of different alleles of the MSA1 gene of P. falciparum. PMID:2023952

  3. Structure and sequence conservation of hao cluster genes of autotrophic ammonia-oxidizing bacteria: evidence for their evolutionary history.

    PubMed

    Bergmann, David J; Hooper, Alan B; Klotz, Martin G

    2005-09-01

    Comparison of the organization and sequence of the hao (hydroxylamine oxidoreductase) gene clusters from the gammaproteobacterial autotrophic ammonia-oxidizing bacterium (aAOB) Nitrosococcus oceani and the betaproteobacterial aAOB Nitrosospira multiformis and Nitrosomonas europaea revealed a highly conserved gene cluster encoding the following proteins: hao, hydroxylamine oxidoreductase; orf2, a putative protein; cycA, cytochrome c(554); and cycB, cytochrome c(m)(552). The deduced protein sequences of HAO, c(554), and c(m)(552) were highly similar in all aAOB despite their differences in species evolution and codon usage. Phylogenetic inference revealed a broad family of multi-c-heme proteins, including HAO, the pentaheme nitrite reductase, and tetrathionate reductase. The c-hemes of this group also have a nearly identical geometry of heme orientation, which has remained conserved during divergent evolution of function. High sequence similarity is also seen within a protein family, including cytochromes c(m)(552), NrfH/B, and NapC/NirT. It is proposed that the hydroxylamine oxidation pathway evolved from a nitrite reduction pathway involved in anaerobic respiration (denitrification) during the radiation of the Proteobacteria. Conservation of the hydroxylamine oxidation module was maintained by functional pressure, and the module expanded into two separate narrow taxa after a lateral gene transfer event between gamma- and betaproteobacterial ancestors of extant aAOB. HAO-encoding genes were also found in six non-aAOB, either singly or tandemly arranged with an orf2 gene, whereas a c(554) gene was lacking. The conservation of the hao gene cluster in general and the uniqueness of the c(554) gene in particular make it a suitable target for the design of primers and probes useful for molecular ecology approaches to detect aAOB.

  4. Structure and Sequence Conservation of hao Cluster Genes of Autotrophic Ammonia-Oxidizing Bacteria: Evidence for Their Evolutionary History

    PubMed Central

    Bergmann, David J.; Hooper, Alan B.; Klotz, Martin G.

    2005-01-01

    Comparison of the organization and sequence of the hao (hydroxylamine oxidoreductase) gene clusters from the gammaproteobacterial autotrophic ammonia-oxidizing bacterium (aAOB) Nitrosococcus oceani and the betaproteobacterial aAOB Nitrosospira multiformis and Nitrosomonas europaea revealed a highly conserved gene cluster encoding the following proteins: hao, hydroxylamine oxidoreductase; orf2, a putative protein; cycA, cytochrome c554; and cycB, cytochrome cm552. The deduced protein sequences of HAO, c554, and cm552 were highly similar in all aAOB despite their differences in species evolution and codon usage. Phylogenetic inference revealed a broad family of multi-c-heme proteins, including HAO, the pentaheme nitrite reductase, and tetrathionate reductase. The c-hemes of this group also have a nearly identical geometry of heme orientation, which has remained conserved during divergent evolution of function. High sequence similarity is also seen within a protein family, including cytochromes cm552, NrfH/B, and NapC/NirT. It is proposed that the hydroxylamine oxidation pathway evolved from a nitrite reduction pathway involved in anaerobic respiration (denitrification) during the radiation of the Proteobacteria. Conservation of the hydroxylamine oxidation module was maintained by functional pressure, and the module expanded into two separate narrow taxa after a lateral gene transfer event between gamma- and betaproteobacterial ancestors of extant aAOB. HAO-encoding genes were also found in six non-aAOB, either singly or tandemly arranged with an orf2 gene, whereas a c554 gene was lacking. The conservation of the hao gene cluster in general and the uniqueness of the c554 gene in particular make it a suitable target for the design of primers and probes useful for molecular ecology approaches to detect aAOB. PMID:16151127

  5. Genotyping by sequencing resolves shallow population structure to inform conservation of Chinook salmon (Oncorhynchus tshawytscha).

    PubMed

    Larson, Wesley A; Seeb, Lisa W; Everett, Meredith V; Waples, Ryan K; Templin, William D; Seeb, James E

    2014-03-01

    Recent advances in population genomics have made it possible to detect previously unidentified structure, obtain more accurate estimates of demographic parameters, and explore adaptive divergence, potentially revolutionizing the way genetic data are used to manage wild populations. Here, we identified 10 944 single-nucleotide polymorphisms using restriction-site-associated DNA (RAD) sequencing to explore population structure, demography, and adaptive divergence in five populations of Chinook salmon (Oncorhynchus tshawytscha) from western Alaska. Patterns of population structure were similar to those of past studies, but our ability to assign individuals back to their region of origin was greatly improved (>90% accuracy for all populations). We also calculated effective size with and without removing physically linked loci identified from a linkage map, a novel method for nonmodel organisms. Estimates of effective size were generally above 1000 and were biased downward when physically linked loci were not removed. Outlier tests based on genetic differentiation identified 733 loci and three genomic regions under putative selection. These markers and genomic regions are excellent candidates for future research and can be used to create high-resolution panels for genetic monitoring and population assignment. This work demonstrates the utility of genomic data to inform conservation in highly exploited species with shallow population structure.

  6. Genotyping by sequencing resolves shallow population structure to inform conservation of Chinook salmon (Oncorhynchus tshawytscha)

    PubMed Central

    Larson, Wesley A; Seeb, Lisa W; Everett, Meredith V; Waples, Ryan K; Templin, William D; Seeb, James E

    2014-01-01

    Recent advances in population genomics have made it possible to detect previously unidentified structure, obtain more accurate estimates of demographic parameters, and explore adaptive divergence, potentially revolutionizing the way genetic data are used to manage wild populations. Here, we identified 10 944 single-nucleotide polymorphisms using restriction-site-associated DNA (RAD) sequencing to explore population structure, demography, and adaptive divergence in five populations of Chinook salmon (Oncorhynchus tshawytscha) from western Alaska. Patterns of population structure were similar to those of past studies, but our ability to assign individuals back to their region of origin was greatly improved (>90% accuracy for all populations). We also calculated effective size with and without removing physically linked loci identified from a linkage map, a novel method for nonmodel organisms. Estimates of effective size were generally above 1000 and were biased downward when physically linked loci were not removed. Outlier tests based on genetic differentiation identified 733 loci and three genomic regions under putative selection. These markers and genomic regions are excellent candidates for future research and can be used to create high-resolution panels for genetic monitoring and population assignment. This work demonstrates the utility of genomic data to inform conservation in highly exploited species with shallow population structure. PMID:24665338

  7. PCR-based study of conserved and variable DNA sequences of Tritrichomonas foetus isolates from Saskatchewan, Canada.

    PubMed Central

    Riley, D E; Wagner, B; Polley, L; Krieger, J N

    1995-01-01

    The protozoan parasite Tritrichomonas foetus causes infertility and spontaneous abortion in cattle. In Saskatchewan, Canada, the culture prevalence of trichomonads was 65 of 1,048 bulls (6%) tested within a 1-year period ending in April 1994. Saskatchewan was previously thought to be free of the parasite. To confirm the culture results, possible T. foetus DNA presence was determined by the PCR. All of the 16 culture-positive isolates tested were PCR positive by a single-band test, but one PCR product was weak. DNA fingerprinting by both T17 PCR and randomly amplified polymorphic DNA PCR revealed genetic variation or polymorphism among the T. foetus isolates. T17 PCR also revealed conserved loci that distinguished these T. foetus isolates from Trichomonas vaginalis, from a variety of other protozoa, and from prokaryotes. TCO-1 PCR, a PCR test designed to sample DNA sequence homologous to the 5' flank of a highly conserved cell division control gene, detected genetic polymorphism at low stringency and a conserved, single locus at higher stringency. These findings suggested that T. foetus isolates exhibit both conserved genetic loci and polymorphic loci detectable by independent PCR methods. Both conserved and polymorphic genetic loci may prove useful for improved clinical diagnosis of T. foetus. The polymorphic loci detected by PCR suggested either a long history of infection or multiple lines of T. foetus infection in Saskatchewan. Polymorphic loci detected by PCR may provide data for epidemiologic studies of T. foetus. PMID:7615746

  8. PCR-based study of conserved and variable DNA sequences of Tritrichomonas foetus isolates from Saskatchewan, Canada.

    PubMed

    Riley, D E; Wagner, B; Polley, L; Krieger, J N

    1995-05-01

    The protozoan parasite Tritrichomonas foetus causes infertility and spontaneous abortion in cattle. In Saskatchewan, Canada, the culture prevalence of trichomonads was 65 of 1,048 bulls (6%) tested within a 1-year period ending in April 1994. Saskatchewan was previously thought to be free of the parasite. To confirm the culture results, possible T. foetus DNA presence was determined by the PCR. All of the 16 culture-positive isolates tested were PCR positive by a single-band test, but one PCR product was weak. DNA fingerprinting by both T17 PCR and randomly amplified polymorphic DNA PCR revealed genetic variation or polymorphism among the T. foetus isolates. T17 PCR also revealed conserved loci that distinguished these T. foetus isolates from Trichomonas vaginalis, from a variety of other protozoa, and from prokaryotes. TCO-1 PCR, a PCR test designed to sample DNA sequence homologous to the 5' flank of a highly conserved cell division control gene, detected genetic polymorphism at low stringency and a conserved, single locus at higher stringency. These findings suggested that T. foetus isolates exhibit both conserved genetic loci and polymorphic loci detectable by independent PCR methods. Both conserved and polymorphic genetic loci may prove useful for improved clinical diagnosis of T. foetus. The polymorphic loci detected by PCR suggested either a long history of infection or multiple lines of T. foetus infection in Saskatchewan. Polymorphic loci detected by PCR may provide data for epidemiologic studies of T. foetus.

  9. Annotation of cis-regulatory elements by identification, subclassification, and functional assessment of multispecies conserved sequences

    PubMed Central

    Hughes, Jim R.; Cheng, Jan-Fang; Ventress, Nicki; Prabhakar, Shyam; Clark, Kevin; Anguita, Eduardo; De Gobbi, Marco; de Jong, Pieter; Rubin, Eddy; Higgs, Douglas R.

    2005-01-01

    An important step toward improving the annotation of the human genome is to identify cis-acting regulatory elements from primary DNA sequence. One approach is to compare sequences from multiple, divergent species. This approach distinguishes multispecies conserved sequences (MCS) in noncoding regions from more rapidly evolving neutral DNA. Here, we have analyzed a region of ≈238 kb containing the human α globin cluster that was sequenced and/or annotated across the syntenic region in 22 species spanning 500 million years of evolution. Using a variety of bioinformatic approaches and correlating the results with many aspects of chromosome structure and function in this region, we were able to identify and evaluate the importance of 24 individual MCSs. This approach sensitively and accurately identified previously characterized regulatory elements but also discovered previously unidentified promoters, exons, splicing, and transcriptional regulatory elements. Together, these studies demonstrate an integrated approach by which to identify, subclassify, and predict the potential importance of MCSs. PMID:15998734

  10. Human U1 small nuclear RNA genes: extensive conservation of flanking sequences suggests cycles of gene amplification and transposition.

    PubMed Central

    Bernstein, L B; Manser, T; Weiner, A M

    1985-01-01

    The DNA immediately flanking the 164-base-pair U1 RNA coding region is highly conserved among the approximately 30 human U1 genes. The U1 multigene family also contains many U1 pseudogenes (designated class I) with striking although imperfect flanking homology to the true U1 genes. Using cosmid vectors, we now have cloned, characterized, and partially sequenced three 35-kilobase (kb) regions of the human genome spanning U1 homologies. Two clones contain one true U1 gene each, and the third bears two class I pseudogenes 9 kb apart in the opposite orientation. We show by genomic blotting and by direct DNA sequence determination that the conserved sequences surrounding U1 genes are much more extensive than previously estimated: nearly perfect sequence homology between many true U1 genes extends for at least 24 kb upstream and at least 20 kb downstream from the U1 coding region. In addition, the sequences of the two new pseudogenes provide evidence that class I U1 pseudogenes are more closely related to each other than to true genes. Finally, it is demonstrated elsewhere (Lindgren et al., Mol. Cell. Biol. 5:2190-2196, 1985) that both true U1 genes and class I U1 pseudogenes map to chromosome 1, but in separate clusters located far apart on opposite sides of the centromere. Taken together, these results suggest a model for the evolution of the U1 multigene family. We speculate that the contemporary family of true U1 genes was derived from a more ancient family of U1 genes (now class I U1 pseudogenes) by gene amplification and transposition. Gene amplification provides the simplest explanation for the clustering of both U1 genes and class I pseudogenes and for the conservation of at least 44 kb of DNA flanking the U1 coding region in a large fraction of the 30 true U1 genes. PMID:3837185

  11. Conservation of symbiotic nitrogen fixation gene sequences in Rhizobium japonicum and Bradyrhizobium japonicum.

    PubMed Central

    Masterson, R V; Prakash, R K; Atherly, A G

    1985-01-01

    Southern hybridization with nif (nitrogen fixation) and nod (nodulation) DNA probes from Rhizobium meliloti against intact plasmid DNA of Rhizobium japonicum and Bradyrhizobium japonicum strains indicated that both nif and nod sequences are on plasmid DNA in most R. japonicum strains. The exceptions are R. japonicum strain USDA194 and all B. japonicum strains, in which nif and nod sequences are on the chromosome. In R. japonicum strains, with the exception of strain USDA205, both nif and nod sequences are on the same plasmid. In strain USDA205, the nif genes are on a 112-megadalton plasmid, and nod genes are on a 195-megadalton plasmid. Hybridization of EcoRI digests of total DNA to nif and nod probes from R. meliloti shows that the nif and nod sequences are conserved in both R. japonicum and B. japonicum strains regardless of the plasmid or chromosomal location of these genes. In addition, nif DNA hybridization patterns were identical among all R. japonicum strains and with most of the B. japonicum strains examined. Similarly, many of the bands that hybridize to the nodulation probe isolated from R. meliloti were found to be common among R. japonicum strains. Under reduced hybridization stringency conditions, strong conservation of nodulation sequences was observed in strains of B. japonicum. We have also found that the plasmid pRjaUSDA193, which possesses nif and nod sequences, does not possess sequence homology with any plasmid of USDA194, but is homologous to parts of the chromosome of USDA194. Strain USDA194 is unique, since nif and nod sequences are present on the chromosome instead of on a plasmid as observed with all other strains examined. PMID:4008441

  12. Predicting RNA-binding residues from evolutionary information and sequence conservation

    PubMed Central

    2010-01-01

    Background: RNA-binding proteins (RBPs) play crucial roles in post-transcriptional control of RNA. RBPs efficiently recognize specific RNA sequences once the RNA has been transcribed from the DNA sequence. To satisfy diverse functional requirements, RNA-binding proteins are composed of multiple blocks of RNA-binding domains (RBDs) presented in various structural arrangements to provide versatile functions. The ability to computationally predict RNA-binding residues in an RNA-binding protein can help biologists identify important sites for site-directed mutagenesis in wet-lab experiments. Results: The proposed prediction framework, named "ProteRNA", combines an SVM-based classifier with conserved residue discovery by WildSpan to identify the residues that interact with RNA in an RNA-binding protein. Although these conserved residues can be either functionally conserved residues or structurally conserved residues, they provide clues on the important residues in a protein sequence. In the independent testing dataset, ProteRNA delivered an overall accuracy of 89.78%, MCC of 0.2628, F-score of 0.3075, and F0.5-score of 0.3546. Conclusions: This article presents the design of a sequence-based predictor aiming to identify the RNA-binding residues in an RNA-binding protein by combining machine learning and pattern mining approaches. RNA-binding proteins have diverse functions while interacting with different categories of RNAs because these proteins are composed of multiple copies of RNA-binding domains presented in various structural arrangements to expand the functional repertoire of RNA-binding proteins. Furthermore, predicting RNA-binding residues in an RNA-binding protein can help biologists identify important sites for site-directed mutagenesis in wet-lab experiments. PMID:21143803
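
    The metrics reported in the abstract above (accuracy, MCC, F-score, F0.5-score) can all be derived from a binary confusion matrix. The following is a minimal sketch, not the ProteRNA evaluation code: the function name and the residue counts are hypothetical and chosen only to illustrate how a high accuracy can coexist with a modest MCC when binding residues are rare.

```python
import math

def classification_metrics(tp, fp, tn, fn, beta=1.0):
    """Return (accuracy, MCC, F-beta) for a binary confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-beta weights recall beta times as much as precision (beta=0.5 favors precision).
    b2 = beta * beta
    fb_denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / fb_denom if fb_denom else 0.0
    return accuracy, mcc, f_beta

# Hypothetical counts (not from the paper): 1,000 residues, 80 of them RNA-binding.
acc, mcc, f1 = classification_metrics(tp=20, fp=30, tn=890, fn=60)
_, _, f05 = classification_metrics(tp=20, fp=30, tn=890, fn=60, beta=0.5)
print(f"accuracy={acc:.4f} MCC={mcc:.4f} F1={f1:.4f} F0.5={f05:.4f}")
```

    With these illustrative counts the accuracy is 0.91 while the MCC is only about 0.27, because the many correctly rejected non-binding residues dominate the accuracy.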

  13. Genomic Sequence Analysis of Fugu rubripes CFTR and Flanking Genes in a 60 kb Region Conserving Synteny with 800 kb of Human Chromosome 7

    PubMed Central

    Davidson, Heather; Taylor, Martin S.; Doherty, Ann; Boyd, A. Christopher; Porteous, David J.

    2000-01-01

    To define control elements that regulate tissue-specific expression of the cystic fibrosis transmembrane regulator (CFTR), we have sequenced 60 kb of genomic DNA from the puffer fish Fugu rubripes (Fugu) that includes the CFTR gene. This region of the Fugu genome shows conservation of synteny with 800-kb sequence of the human genome encompassing the WNT2, CFTR, Z43555, and CBP90 genes. Additionally, the genomic structure of each gene is conserved. In a multiple sequence alignment of human, mouse, and Fugu, the putative WNT2 promoter sequence is shown to contain highly conserved elements that may be transcription factor or other regulatory binding sites. We have found two putative ankyrin repeat-containing genes that flank the CFTR gene. Overall sequence analysis suggests conservation of intron/exon boundaries between Fugu and human CFTR and revealed extensive homology between functional protein domains. However, the immediate 5′ regions of human and Fugu CFTR are highly divergent with few conserved sequences apart from those resembling diminished cAMP response elements (CRE) and CAAT box elements. Interestingly, the polymorphic polyT tract located upstream of exon 9 is present in human and Fugu but absent in mouse. Similarly, an intron 1 and intron 9 element common to human and Fugu is absent in mouse. The euryhaline killifish CFTR coding sequence is highly homologous to the Fugu sequence, suggesting that upregulation of CFTR in that species in response to salinity may be regulated transcriptionally. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AJ271361, for the combined cosmids 159C9, 146H13, 6M15, and 145M20.] PMID:10958637

  14. NASP: a parallel program for identifying evolutionarily conserved nucleic acid secondary structures from nucleotide sequence alignments.

    PubMed

    Semegni, J Y; Wamalwa, M; Gaujoux, R; Harkins, G W; Gray, A; Martin, D P

    2011-09-01

    Many natural nucleic acid sequences have evolutionarily conserved secondary structures with diverse biological functions. A reliable computational tool for identifying such structures would be very useful in guiding experimental analyses of their biological functions. NASP (Nucleic Acid Structure Predictor) is a program that takes into account thermodynamic stability, Boltzmann base pair probabilities, alignment uncertainty, covarying sites and evolutionary conservation to identify biologically relevant secondary structures within multiple sequence alignments. Unique to NASP is the consideration of all this information together with a recursive permutation-based approach to progressively identify and list the most conserved probable secondary structures that are likely to have the greatest biological relevance. By focusing on identifying only evolutionarily conserved structures, NASP forgoes the prediction of complete nucleotide folds but outperforms various other secondary structure prediction methods in its ability to selectively identify actual base pairings. Downloadable and web-based versions of NASP are freely available at http://web.cbio.uct.ac.za/~yves/nasp_portal.php. Contact: yves@cbio.uct.ac.za. Supplementary data are available at Bioinformatics online.

  15. Lariat sequencing in a unicellular yeast identifies regulated alternative splicing of exons that are evolutionarily conserved with humans.

    PubMed

    Awan, Ali R; Manfredo, Amanda; Pleiss, Jeffrey A

    2013-07-30

    Alternative splicing is a potent regulator of gene expression that vastly increases proteomic diversity in multicellular eukaryotes and is associated with organismal complexity. Although alternative splicing is widespread in vertebrates, little is known about the evolutionary origins of this process, in part because of the absence of phylogenetically conserved events that cross major eukaryotic clades. Here we describe a lariat-sequencing approach, which offers high sensitivity for detecting splicing events, and its application to the unicellular fungus Schizosaccharomyces pombe, an organism that shares many of the hallmarks of alternative splicing in mammalian systems but for which no previous examples of exon-skipping had been demonstrated. Over 200 previously unannotated splicing events were identified, including examples of regulated alternative splicing. Remarkably, an evolutionary analysis of four of the exons identified here as subject to skipping in S. pombe reveals high sequence conservation and perfect length conservation with their homologs in scores of plants, animals, and fungi. Moreover, alternative splicing of two of these exons has been documented in multiple vertebrate organisms, making these the first demonstrations of identical alternative-splicing patterns in species that are separated by over 1 billion years of evolution.

  16. Evolution, homology conservation, and identification of unique sequence signatures in GH19 family chitinases.

    PubMed

    Udaya Prakash, N A; Jayanthi, M; Sabarinathan, R; Kangueane, P; Mathew, Lazar; Sekar, K

    2010-05-01

    The discovery of GH (Glycoside Hydrolase) 19 chitinases in Streptomyces sp. raises the possibility of the presence of these proteins in other bacterial species, since they were initially thought to be confined to higher plants. The present study mainly concentrates on the phylogenetic distribution and homology conservation in GH19 family chitinases. Extensive database searches are performed to identify the presence of GH19 family chitinases in the three major super kingdoms of life. Multiple sequence alignment of all the identified GH19 chitinase family members resulted in the identification of globally conserved residues. We further identified conserved sequence motifs across the major subgroups within the family. Estimation of evolutionary distance between the various bacterial and plant chitinases is carried out to better understand the pattern of evolution. Our study also supports the horizontal gene transfer theory, which states that GH19 chitinase genes are transferred from higher plants to bacteria. Further, the present study sheds light on the phylogenetic distribution and identifies unique sequence signatures that define the GH19 chitinase family of proteins. The identified motifs could be used as markers to delineate uncharacterized GH19 family chitinases. The estimation of evolutionary distance between chitinases identified in plants and bacteria shows that chitinases of flowering plants are more closely related to those of actinobacteria than to those identified in purple bacteria. We propose a model to elucidate the natural history of GH19 family chitinases.

  17. A phylogenetically conserved sequence within viral 3' untranslated RNA pseudoknots regulates translation.

    PubMed Central

    Leathers, V; Tanguay, R; Kobayashi, M; Gallie, D R

    1993-01-01

    Both the 68-base 5' leader (omega) and the 205-base 3' untranslated region (UTR) of tobacco mosaic virus (TMV) promote efficient translation. A 35-base region within omega is necessary and sufficient for the regulation. Within the 3' UTR, a 52-base region, composed of two RNA pseudoknots, is required for regulation. These pseudoknots are phylogenetically conserved among seven viruses from two different viral groups and one satellite virus. The pseudoknots contained significant conservation at the secondary and tertiary levels and at several positions at the primary sequence level. Mutational analysis of the sequences determined that the primary sequence in several conserved positions, particularly within the third pseudoknot, was essential for function. The higher-order structure of the pseudoknots was also required. Both the leader and the pseudoknot region were specifically recognized by, and competed for, the same proteins in extracts made from carrot cell suspension cells and wheat germ. Binding of the proteins is much stronger to omega than the pseudoknot region. Synergism was observed between the TMV 3' UTR and the cap and to a lesser extent between omega and the 3' UTR. The functional synergism and the protein binding data suggest that the cap, TMV 5' leader, and 3' UTR interact to establish an efficient level of translation. PMID:8355685

  18. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure.

    PubMed

    Capra, John A; Laskowski, Roman A; Thornton, Janet M; Singh, Mona; Funkhouser, Thomas A

    2009-12-01

    Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (http://compbio.cs.princeton.edu/concavity/).

  19. Conservation of uORF repressiveness and sequence features in mouse, human and zebrafish

    PubMed Central

    Chew, Guo-Liang; Pauli, Andrea; Schier, Alexander F.

    2016-01-01

    Upstream open reading frames (uORFs) are ubiquitous repressive genetic elements in vertebrate mRNAs. While much is known about the regulation of individual genes by their uORFs, the range of uORF-mediated translational repression in vertebrate genomes is largely unexplored. Moreover, it is unclear whether the repressive effects of uORFs are conserved across species. To address these questions, we analyse transcript sequences and ribosome profiling data from human, mouse and zebrafish. We find that uORFs are depleted near coding sequences (CDSes) and have initiation contexts that diminish their translation. Linear modelling reveals that sequence features at both uORFs and CDSes modulate the translation of CDSes. Moreover, the ratio of translation over 5′ leaders and CDSes is conserved between human and mouse, and correlates with the number of uORFs. These observations suggest that the prevalence of vertebrate uORFs may be explained by their conserved role in repressing CDS translation. PMID:27216465

  20. Detecting Remote Sequence Homology in Disordered Proteins: Discovery of Conserved Motifs in the N-Termini of Mononegavirales phosphoproteins

    PubMed Central

    Karlin, David; Belshaw, Robert

    2012-01-01

    Paramyxovirinae are a large group of viruses that includes measles virus and parainfluenza viruses. The viral Phosphoprotein (P) plays a central role in viral replication. It is composed of a highly variable, disordered N-terminus and a conserved C-terminus. A second viral protein alternatively expressed, the V protein, also contains the N-terminus of P, fused to a zinc finger. We suspected that, despite their high variability, the N-termini of P/V might all be homologous; however, using standard approaches, we could previously identify sequence conservation only in some Paramyxovirinae. We now compared the N-termini using sensitive sequence similarity search programs, able to detect residual similarities unnoticeable by conventional approaches. We discovered that all Paramyxovirinae share a short sequence motif in their first 40 amino acids, which we called soyuz1. Despite its short length (11–16aa), several arguments allow us to conclude that soyuz1 probably evolved by homologous descent, unlike linear motifs. Conservation across such evolutionary distances suggests that soyuz1 plays a crucial role and experimental data suggest that it binds the viral nucleoprotein to prevent its illegitimate self-assembly. In some Paramyxovirinae, the N-terminus of P/V contains a second motif, soyuz2, which might play a role in blocking interferon signaling. Finally, we discovered that the P of related Mononegavirales contain similarly overlooked motifs in their N-termini, and that their C-termini share a previously unnoticed structural similarity suggesting a common origin. Our results suggest several testable hypotheses regarding the replication of Mononegavirales and suggest that disordered regions with little overall sequence similarity, common in viral and eukaryotic proteins, might contain currently overlooked motifs (intermediate in length between linear motifs and disordered domains) that could be detected simply by comparing orthologous proteins. PMID:22403617

  1. Assembly of transmembrane helices of simple polytopic membrane proteins from sequence conservation patterns.

    PubMed

    Park, Yungki; Helms, Volkhard

    2006-09-01

    The transmembrane (TM) domains of most membrane proteins consist of helix bundles. The seemingly simple task of TM helix bundle assembly has turned out to be extremely difficult. This is true even for simple TM helix bundle proteins, i.e., those that have the simple form of compact TM helix bundles. Herein, we present a computational method that is capable of generating native-like structural models for simple TM helix bundle proteins having modest numbers of TM helices based on sequence conservation patterns. Thus, the only requirement for our method is the presence of more than 30 homologous sequences for an accurate extraction of sequence conservation patterns. The prediction method first computes a number of representative well-packed conformations for each pair of contacting TM helices, and then a library of tertiary folds is generated by overlaying overlapping TM helices of the representative conformations. This library is scored using sequence conservation patterns, and a subsequent clustering analysis yields five final models. Assuming that neighboring TM helices in the sequence contact each other (but not that TM helices A and G contact each other), the method produced structural models of Cα atom root-mean-square deviation (CA RMSD) of 3-5 Å from corresponding crystal structures for bacteriorhodopsin, halorhodopsin, sensory rhodopsin II, and rhodopsin. In blind predictions, this type of contact knowledge is not available. Mimicking this, predictions were made for the rotor of the V-type Na⁺-adenosine triphosphatase without such knowledge. The CA RMSD between the best model and its crystal structure is only 3.4 Å, and its contact accuracy reaches 55%. Furthermore, the model correctly identifies the binding pocket for sodium ion. These results demonstrate that the method can be readily applied to ab initio structure prediction of simple TM helix bundle proteins having modest numbers of TM helices.
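
    The CA RMSD quoted in the abstract above is the root-mean-square distance between matched Cα atoms of a model and a reference structure. The sketch below is illustrative only and is not the authors' code: the function name is hypothetical, and it assumes the two coordinate sets are already optimally superimposed (in practice a superposition step such as the Kabsch algorithm comes first).

```python
import math

def ca_rmsd(model, reference):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumption: the structures are already superimposed; no fitting is done here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances over all matched atom pairs.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))

# Toy coordinates in angstroms (hypothetical, two atoms only).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(round(ca_rmsd(model, reference), 3))
```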

  2. Septal localization by membrane targeting sequences and a conserved sequence essential for activity at the COOH-terminus of Bacillus subtilis cardiolipin synthase.

    PubMed

    Kusaka, Jin; Shuto, Satoshi; Imai, Yukiko; Ishikawa, Kazuki; Saito, Tomo; Natori, Kohei; Matsuoka, Satoshi; Hara, Hiroshi; Matsumoto, Kouji

    2016-04-01

    The acidic phospholipid cardiolipin (CL) is localized on polar and septal membranes and plays an important physiological role in Bacillus subtilis cells. ClsA, the enzyme responsible for CL synthesis, is also localized on septal membranes. We found that GFP fusion proteins of the enzyme with NH2-terminal and internal deletions retained septal localization. However, derivatives with deletions starting from the COOH-terminus (Leu482) ceased to localize to the septum once the deletion passed the Ile residue at 448, indicating that the sequence responsible for septal localization is confined within a short distance from the COOH-terminus. Two sequences, Ile436-Leu450 and Leu466-Leu478, are predicted to individually form an amphipathic α-helix. This configuration is known as a membrane targeting sequence (MTS) and we therefore refer to them as MTS2 and MTS1, respectively. Either one has the ability to affect septal localization, and each of these sequences by itself localizes to the septum. Membrane association of the constructs of this enzyme containing the MTSs was verified by subcellular fractionation of the cells. CL synthesis, in contrast, was abolished after deleting just the last residue, Leu482, in the COOH-terminal four amino acid residue sequence, Ser-Pro-Ile-Leu, which is highly conserved among bacterial CL synthases.

  3. The complete mitochondrial genome sequence of the tubeworm Lamellibrachia satsuma and structural conservation in the mitochondrial genome control regions of Order Sabellida.

    PubMed

    Patra, Ajit Kumar; Kwon, Yong Min; Kang, Sung Gyun; Fujiwara, Yoshihiro; Kim, Sang-Jin

    2016-04-01

    The control region of the mitochondrial genomes shows high variation in conserved sequence organizations, which follow distinct evolutionary patterns in different species or taxa. In this study, we sequenced the complete mitochondrial genome of Lamellibrachia satsuma from the cold-seep region of Kagoshima Bay, as a part of whole genome study and extensively studied the structural features and patterns of the control region sequences. We obtained 15,037 bp of mitochondrial genome using Illumina sequencing and identified the non-coding AT-rich region or control region (354 bp, AT=83.9%) located between trnH and trnR. We found 7 conserved sequence blocks (CSB), scattered throughout the control region of L. satsuma and other taxa of Annelida. The poly-TA stretches, which commonly form the stem of multiple stem-loop structures, are most conserved in the CSB-I and CSB-II regions. The mitochondrial genome of L. satsuma encodes a unique repetitive sequence in the control region, which forms a unique secondary structure in comparison to Lamellibrachia luymesi. Phylogenetic analyses of all protein-coding genes indicate that L. satsuma forms a monophyletic clade with L. luymesi along with other tubeworms found in cold-seep regions (genera: Lamellibrachia, Escarpia, and Seepiophila). In general, the control region sequences of Annelida could be aligned with certainty within each genus, and to some extent within the family, but with a higher rate of variation in conserved regions.

  4. Computational analysis of conserved coil functional residues in the mitochondrial genomic sequences of dermatophytes

    PubMed Central

    Gupta, Bulbul; Kaur, Jaspreet

    2016-01-01

    Dermatophytes are a group of closely related fungi that have the capacity to invade keratinized tissue of humans and other animals. The infection known as dermatophytosis, caused by members of the genera Microsporum, Trichophyton, and Epidermophyton, includes infection of the groin (tinea cruris), beard (tinea barbae), scalp (tinea capitis), feet (tinea pedis), glabrous skin (tinea corporis), nail (tinea unguium), and hand (tinea manuum). Identifying the evolutionary relationships among these three genera of dermatophytes is epidemiologically important for understanding their pathogenicity. Mitochondrial DNA evolves more rapidly than nuclear DNA because of its higher mutation rate but is much less affected by genetic recombination, making it an important tool for phylogenetic studies. Thus, here we present a novel scheme to identify the conserved coil functional residues of Trichophyton rubrum, Trichophyton mentagrophytes, Epidermophyton floccosum and Microsporum canis. Protein coding sequences of the mitochondrial genome were aligned for their similar sequences and homology modelling was performed for structure and pocket identification. The results obtained from comparative analysis of the protein sequences revealed the presence of functionally active sites in all the species of the genera Trichophyton and Microsporum. However, in Epidermophyton floccosum they were observed in only three of the five protein sequences studied. The absence of these conserved coil functional residues in E. floccosum may be correlated with the lower infectivity of this organism. The functional residues identified in the present study could be responsible for the disease and thus can act as putative target sites for drug design. PMID:28149055

  5. Caraparu virus (group C Orthobunyavirus): sequencing and phylogenetic analysis based on the conserved region 3 of the RNA polymerase gene.

    PubMed

    de Brito Magalhães, Cintia Lopes; Quinan, Bárbara Resende; Novaes, Renata Franco Vianna; dos Santos, João Rodrigues; Kroon, Erna Geessien; Bonjardim, Cláudio Antônio; Ferreira, Paulo César Peregrino

    2007-12-01

    Here, for the first time, we report the nucleotide sequence of the Caraparu virus (CARV) L segment and the analysis of the RNA polymerase region 3 encoded by this segment. The 1,404 bp nucleotide sequence shares the highest identity with Bunyamwera, La Crosse, Oropouche, and Akabane virus sequences. The amino acid sequence was deduced and aligned with sequences from members of the Bunyaviridae family and used for phylogenetic analysis. CARV clustered in the Orthobunyavirus genus. Premotif A and motifs A-E, which are present in region 3 of the Bunyaviridae family, were also conserved in the CARV L protein, as were other regions conserved within the Orthobunyavirus genus.

  6. Massive microRNA sequence conservation and prevalence in human and chimpanzee introns.

    PubMed

    Hill, Aubrey E; Sorscher, Eric J

    2013-06-01

    Human and chimpanzee introns contain numerous sequences strongly related to known microRNA hairpin structures. The relative frequency is precisely maintained across all chromosomes, suggesting the possible co-evolution of gene networks dependent upon microRNA regulation and with origins corresponding to the advent of primate transposable elements (TEs). While the motifs are known to be derived from transposable elements, the most common are far more numerous than expected from the number of TEs and their paralogous sequences, and exhibit striking conservation in comparison to the surrounding TE sequence context. Several of these motifs also exhibit structural complementarity to each other, suggesting a pairing function at the level of DNA or RNA. These "pseudomicroRNAs," in semblance to pseudogenes, include hundreds of thousands of vestigial paralogs of primate microRNAs, many of which may have functioned historically or remain active today.

  7. GC Content Heterogeneity Transition of Conserved Noncoding Sequences Occurred at the Emergence of Vertebrates

    PubMed Central

    Hettiarachchi, Nilmini; Saitou, Naruya

    2016-01-01

    Conserved non-coding sequences (CNSs) of Eukaryotes are known to be significantly enriched in regulatory sequences. CNSs of diverse lineages follow different patterns in abundance, sequence composition, and location. Here, we report a thorough analysis of CNSs in diverse groups of Eukaryotes with respect to GC content heterogeneity. We examined 24 fungi, 19 invertebrates, and 12 non-mammalian vertebrates so as to find lineage-specific features of CNSs. We found that fungal and invertebrate CNSs are predominantly GC rich, as we previously observed in plants, whereas vertebrate CNSs are GC poor. This result suggests that the CNS GC content transition occurred from the ancestral GC rich state of Eukaryotes to GC poor in the vertebrate lineage due to the recruitment of GC poor transcription factor binding sites that are lineage specific. CNS GC content is closely linked with the nucleosome occupancy that determines the location and structural architecture of DNAs. PMID:28040773
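
    GC content, the quantity compared across lineages in the abstract above, is the fraction of G and C bases in a sequence. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the function name and the example sequences are hypothetical, and ambiguous bases such as N are simply excluded from the denominator.

```python
def gc_content(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T) in a DNA string."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # An empty or fully ambiguous sequence has no defined GC content; return 0.0.
    return gc / total if total else 0.0

# Hypothetical conserved non-coding sequences, for illustration only.
examples = {"cns_a": "GGCGCCATGC", "cns_b": "ATTTAAGCAT"}
for name, seq in examples.items():
    label = "GC rich" if gc_content(seq) > 0.5 else "GC poor"
    print(name, f"{gc_content(seq):.2f}", label)
```

    The 0.5 cutoff used for labeling is an arbitrary illustrative threshold; a real analysis would compare each CNS against an appropriate genomic background.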

  8. Bacillus anthracis pXO1 Plasmid Sequence Conservation among Closely Related Bacterial Species

    PubMed Central

    Pannucci, James; Okinaka, Richard T.; Sabin, Robert; Kuske, Cheryl R.

    2002-01-01

    The complete sequencing and annotation of the 181.7-kb Bacillus anthracis virulence plasmid pXO1 predicted 143 genes but could only assign putative functions to 45. Hybridization assays, PCR amplification, and DNA sequencing were used to determine whether pXO1 open reading frame (ORF) sequences were present in other bacilli and more distantly related bacterial genera. Eighteen Bacillus species isolates and four other bacterial species were tested for the presence of 106 pXO1 ORFs. Three ORFs were conserved in most of the bacteria tested. Many of the pXO1 ORFs were detected in closely related Bacillus species, and some were detected only in B. anthracis isolates. Three isolates, Bacillus cereus D-17, B. cereus 43881, and Bacillus thuringiensis 33679, contained sequences that were similar to more than one-half of the pXO1 ORF sequences examined. The majority of the DNA fragments that were amplified by PCR from these organisms had DNA sequences between 80 and 98% similar to that of pXO1. Pulsed-field gel electrophoresis revealed large potential plasmids present in both B. cereus 43881 (341 kb) and B. thuringiensis ATCC 33679 (327 kb) that hybridized with a DNA probe composed of six pXO1 ORFs. PMID:11741853

  9. Sample sequencing of vascular plants demonstrates widespread conservation and divergence of microRNAs.

    PubMed

    Chávez Montes, Ricardo A; de Fátima Rosas-Cárdenas, Flor; De Paoli, Emanuele; Accerbi, Monica; Rymarquis, Linda A; Mahalingam, Gayathri; Marsch-Martínez, Nayelli; Meyers, Blake C; Green, Pamela J; de Folter, Stefan

    2014-04-23

    Small RNAs are pivotal regulators of gene expression that guide transcriptional and post-transcriptional silencing mechanisms in eukaryotes, including plants. Here we report a comprehensive atlas of sRNA and miRNA from 3 species of algae and 31 representative species across vascular plants, including non-model plants. We sequence and quantify sRNAs from 99 different tissues or treatments across species, resulting in a data set of over 132 million distinct sequences. Using miRBase mature sequences as a reference, we identify the miRNA sequences present in these libraries. We apply diverse profiling methods to examine critical sRNA and miRNA features, such as size distribution, tissue-specific regulation and sequence conservation between species, as well as to predict putative new miRNA sequences. We also develop database resources, computational analysis tools and a dedicated website, http://smallrna.udel.edu/. This study provides new insights on plant sRNAs and miRNAs, and a foundation for future studies.

  10. Reptiles and Mammals Have Differentially Retained Long Conserved Noncoding Sequences from the Amniote Ancestor

    PubMed Central

    Janes, D.E.; Chapus, C.; Gondo, Y.; Clayton, D.F.; Sinha, S.; Blatti, C.A.; Organ, C.L.; Fujita, M.K.; Balakrishnan, C.N.; Edwards, S.V.

    2011-01-01

    Many noncoding regions of genomes appear to be essential to genome function. Conservation of large numbers of noncoding sequences has been reported repeatedly among mammals but not thus far among birds and reptiles. By searching genomes of chicken (Gallus gallus), zebra finch (Taeniopygia guttata), and green anole (Anolis carolinensis), we quantified the conservation among birds and reptiles and across amniotes of long, conserved noncoding sequences (LCNS), which we define as sequences ≥500 bp in length and exhibiting ≥95% similarity between species. We found 4,294 LCNS shared between chicken and zebra finch and 574 LCNS shared by the two birds and Anolis. The percent of genomes comprised by LCNS in the two birds (0.0024%) is notably higher than the percent in mammals (<0.0003% to <0.001%), differences that we show may be explained in part by differences in genome-wide substitution rates. We reconstruct a large number of LCNS for the amniote ancestor (ca. 8,630) and hypothesize differential loss and substantial turnover of these sites in descendent lineages. By contrast, we estimated a small role for recruitment of LCNS via acquisition of novel functions over time. Across amniotes, LCNS are significantly enriched with transcription factor binding sites for many developmental genes, and 2.9% of LCNS shared between the two birds show evidence of expression in brain expressed sequence tag databases. These results show that the rate of retention of LCNS from the amniote ancestor differs between mammals and Reptilia (including birds) and that this may reflect differing roles and constraints in gene regulation. PMID:21183607
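
    The LCNS definition in this abstract (sequences ≥500 bp long with ≥95% similarity between species) can be sketched as a filter over a pairwise alignment. This is an illustrative sketch only: it assumes that "similarity" means percent identity over alignment columns and that length is counted in alignment columns. The names `percent_identity` and `is_lcns` are hypothetical and do not come from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    Assumption: the inputs are two rows of a pairwise alignment of equal
    length, with '-' marking gaps; a gap column never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def is_lcns(aligned_a: str, aligned_b: str,
            min_len: int = 500, min_identity: float = 95.0) -> bool:
    """Apply the length and similarity thresholds quoted in the abstract."""
    return (len(aligned_a) >= min_len
            and percent_identity(aligned_a, aligned_b) >= min_identity)
```

    With these thresholds, a 500-column alignment passes with up to 25 mismatched columns and fails with 26.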

  11. Reptiles and mammals have differentially retained long conserved noncoding sequences from the amniote ancestor.

    PubMed

    Janes, D E; Chapus, C; Gondo, Y; Clayton, D F; Sinha, S; Blatti, C A; Organ, C L; Fujita, M K; Balakrishnan, C N; Edwards, S V

    2011-01-01

    Many noncoding regions of genomes appear to be essential to genome function. Conservation of large numbers of noncoding sequences has been reported repeatedly among mammals but not thus far among birds and reptiles. By searching genomes of chicken (Gallus gallus), zebra finch (Taeniopygia guttata), and green anole (Anolis carolinensis), we quantified the conservation among birds and reptiles and across amniotes of long, conserved noncoding sequences (LCNS), which we define as sequences ≥500 bp in length and exhibiting ≥95% similarity between species. We found 4,294 LCNS shared between chicken and zebra finch and 574 LCNS shared by the two birds and Anolis. The percent of genomes comprised by LCNS in the two birds (0.0024%) is notably higher than the percent in mammals (<0.0003% to <0.001%), differences that we show may be explained in part by differences in genome-wide substitution rates. We reconstruct a large number of LCNS for the amniote ancestor (ca. 8,630) and hypothesize differential loss and substantial turnover of these sites in descendent lineages. By contrast, we estimated a small role for recruitment of LCNS via acquisition of novel functions over time. Across amniotes, LCNS are significantly enriched with transcription factor binding sites for many developmental genes, and 2.9% of LCNS shared between the two birds show evidence of expression in brain expressed sequence tag databases. These results show that the rate of retention of LCNS from the amniote ancestor differs between mammals and Reptilia (including birds) and that this may reflect differing roles and constraints in gene regulation.

  12. Protein engineering of selected residues from conserved sequence regions of a novel Anoxybacillus α-amylase

    PubMed Central

    Ranjani, Velayudhan; Janeček, Štefan; Chai, Kian Piaw; Shahir, Shafinaz; Rahman, Raja Noor Zaliha Raja Abdul; Chan, Kok-Gan; Goh, Kian Mau

    2014-01-01

    The α-amylases from Anoxybacillus species (ASKA and ADTA), Bacillus aquimaris (BaqA) and Geobacillus thermoleovorans (GTA, Pizzo and GtamyII) were proposed as a novel group of the α-amylase family GH13. An ASKA yielding a high percentage of maltose upon its reaction on starch was chosen as a model to study the residues responsible for the biochemical properties. Four residues from conserved sequence regions (CSRs) were thus selected, and the mutants F113V (CSR-I), Y187F and L189I (CSR-II) and A161D (CSR-V) were characterised. Few changes in the optimum reaction temperature and pH were observed for all mutants. Whereas the Y187F (t1/2 43 h) and L189I (t1/2 36 h) mutants had a lower thermostability at 65°C than the native ASKA (t1/2 48 h), the mutants F113V and A161D exhibited an improved t1/2 of 51 h and 53 h, respectively. Among the mutants, only the A161D had a specific activity, kcat and kcat/Km higher (1.23-, 1.17- and 2.88-times, respectively) than the values determined for the ASKA. The replacement of the Ala-161 in the CSR-V with an aspartic acid also caused a significant reduction in the ratio of maltose formed. This finding suggests the Ala-161 may contribute to the high maltose production of the ASKA. PMID:25069018

  13. A functional survey of the enhancer activity of conserved non-coding sequences from vertebrate Iroquois cluster gene deserts

    PubMed Central

    de la Calle-Mustienes, Elisa; Feijóo, Cármen Gloria; Manzanares, Miguel; Tena, Juan J.; Rodríguez-Seguel, Elisa; Letizia, Annalisa; Allende, Miguel L.; Gómez-Skarmeta, José Luis

    2005-01-01

    Recent studies of the genome architecture of vertebrates have uncovered two unforeseen aspects of its organization. First, large regions of the genome, called gene deserts, are devoid of protein-coding sequences and have no obvious biological role. Second, comparative genomics has highlighted the existence of an array of highly conserved non-coding regions (HCNRs) in all vertebrates. Most surprisingly, these structural features are strongly associated with genes that have essential functions during development. Among these, the vertebrate Iroquois (Irx) genes stand out on both fronts. Mammalian Irx genes are organized in two clusters (IrxA and IrxB) that span >1 Mb each with no other genes interspersed. Additionally, a large number of HCNRs exist within Irx clusters. We have systematically examined the enhancer activity of HCNRs from the IrxB cluster using transgenic Xenopus and zebrafish embryos. Most of these HCNRs are active in subdomains of endogenous Irx expression, and some are candidates to contain shared enhancers of neighboring genes, which could explain the evolutionary conservation of Irx clusters. Furthermore, HCNRs present in tetrapod IrxB but not in fish may be responsible for novel Irx expression domains that appeared after their divergence. Finally, we have performed a more detailed analysis on two IrxB ultraconserved non-coding regions (UCRs) duplicated in IrxA clusters in similar relative positions. These four regions share a core region highly conserved among all of them and drive expression in similar domains. However, inter-species conserved sequences surrounding the core, specific for each of these UCRs, are able to modulate their expression. PMID:16024824

  14. 76 FR 82075 - Highly Erodible Land and Wetland Conservation

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-12-30

    ... Secretary 7 CFR Part 12 RIN 0560-AH97 Highly Erodible Land and Wetland Conservation AGENCY: Office of the... agricultural commodities are planted on highly erodible land or a converted wetland, or the production of... ``good faith'' provisions in the USDA regulations allow violators of highly erodible land...

  15. Conservation of the sizes of 53 introns and over 100 intronic sequences for the binding of common transcription factors in the human and mouse genes for type II procollagen (COL2A1).

    PubMed Central

    Ala-Kokko, L; Kvist, A P; Metsäranta, M; Kivirikko, K I; de Crombrugghe, B; Prockop, D J; Vuorio, E

    1995-01-01

    Over 11,000 bp of previously undefined sequences of the human COL2A1 gene were defined. The results made it possible to compare the intron structures of a highly complex gene from man and mouse. Surprisingly, the sizes of the 53 introns of the two genes were highly conserved with a mean difference of 13%. After alignment of the sequences, 69% of the intron sequences were identical. The introns contained consensus sequences for the binding of over 100 different transcription factors that were conserved in the introns of the two genes. The first intron of the gene contained 80 conserved consensus sequences and the remaining 52 introns of the gene contained 106 conserved sequences for the binding of transcription factors. The 5'-end of intron 2 in both genes had a potential for forming a stem loop in RNA transcripts. PMID:8948452

  16. Lack of evidence of conserved lentiviral sequences in pigs with post weaning multisystemic wasting syndrome.

    PubMed Central

    Bratanich, A; Lairmore, M; Heneine, W; Konoby, C; Harding, J; West, K; Vasquez, G; Allan, G; Ellis, J

    1999-01-01

    In order to investigate the role of retroviruses in the recently described porcine postweaning multisystemic wasting syndrome (PMWS) serum and leukocytes were screened for reverse transcriptase (RT) activity, and tissues were examined for the presence of conserved lentiviral sequences using degenerate primers in a polymerase chain reaction (PCR). Serum and stimulated leukocytes from the blood and lymph nodes from pigs with PMWS, as well as from control pigs had RT activity that was detected by the sensitive Amp-RT assay. A 257-bp fragment was amplified from DNA from the blood and bone marrow of pigs with PMWS. This fragment was identical in size to conserved lentiviral sequences that were amplified from plasmids containing DNA from several lentiviruses. Cloning and sequencing of the fragment from affected pigs, however, did not reveal homology with the recognized lentiviruses. Together the results of these analyses suggest that the RT activity present in tissues from control and affected pigs is the result of endogenous retrovirus expression, and that a lentivirus is not a primary pathogen in PMWS. PMID:10480463

  17. Identification of evolutionarily conserved non-AUG-initiated N-terminal extensions in human coding sequences

    PubMed Central

    Ivanov, Ivaylo P.; Firth, Andrew E.; Michel, Audrey M.; Atkins, John F.; Baranov, Pavel V.

    2011-01-01

    In eukaryotes, it is generally assumed that translation initiation occurs at the AUG codon closest to the messenger RNA 5′ cap. However, in certain cases, initiation can occur at codons differing from AUG by a single nucleotide, especially the codons CUG, UUG, GUG, ACG, AUA and AUU. While non-AUG initiation has been experimentally verified for a handful of human genes, the full extent to which this phenomenon is utilized—both for increased coding capacity and potentially also for novel regulatory mechanisms—remains unclear. To address this issue, and hence to improve the quality of existing coding sequence annotations, we developed a methodology based on phylogenetic analysis of predicted 5′ untranslated regions from orthologous genes. We use evolutionary signatures of protein-coding sequences as an indicator of translation initiation upstream of annotated coding sequences. Our search identified novel conserved potential non-AUG-initiated N-terminal extensions in 42 human genes including VANGL2, FGFR1, KCNN4, TRPV6, HDGF, CITED2, EIF4G3 and NTF3, and also affirmed the conservation of known non-AUG-initiated extensions in 17 other genes. In several instances, we have been able to obtain independent experimental evidence of the expression of non-AUG-initiated products from the previously published literature and ribosome profiling data. PMID:21266472
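
    The "codons differing from AUG by a single nucleotide" mentioned in this abstract can be enumerated directly. The sketch below is only an illustration of that set; the function name `near_cognate_codons` is hypothetical, and the code says nothing about which of these codons actually initiate translation in a given gene.

```python
def near_cognate_codons(start: str = "AUG", alphabet: str = "ACGU") -> list[str]:
    """List every codon that differs from `start` at exactly one position."""
    codons = []
    for i, original in enumerate(start):
        for base in alphabet:
            if base != original:
                # Substitute one position and keep the other two unchanged.
                codons.append(start[:i] + base + start[i + 1:])
    return sorted(codons)


# Codons the abstract names as especially common non-AUG initiators.
highlighted = {"CUG", "UUG", "GUG", "ACG", "AUA", "AUU"}
candidates = near_cognate_codons()
```

    Three positions with three alternative bases each give nine candidates, of which the six codons named in the abstract are a subset.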

  18. Modular Architecture of Metabolic Pathways Revealed by Conserved Sequences of Reactions

    PubMed Central

    2013-01-01

    The metabolic network is both a network of chemical reactions and a network of enzymes that catalyze reactions. Toward better understanding of this duality in the evolution of the metabolic network, we developed a method to extract conserved sequences of reactions called reaction modules from the analysis of chemical compound structure transformation patterns in all known metabolic pathways stored in the KEGG PATHWAY database. The extracted reaction modules are repeatedly used as if they are building blocks of the metabolic network and contain chemical logic of organic reactions. Furthermore, the reaction modules often correspond to traditional pathway modules defined as sets of enzymes in the KEGG MODULE database and sometimes to operon-like gene clusters in prokaryotic genomes. We identified well-conserved, possibly ancient, reaction modules involving 2-oxocarboxylic acids. The chain extension module that appears as the tricarboxylic acid (TCA) reaction sequence in the TCA cycle is now shown to be used in other pathways together with different types of modification modules. We also identified reaction modules and their connection patterns for aromatic ring cleavages in microbial biodegradation pathways, which are most characteristic in terms of both distinct reaction sequences and distinct gene clusters. The modular architecture of biodegradation modules will have a potential for predicting degradation pathways of xenobiotic compounds. The collection of these and many other reaction modules is made available as part of the KEGG database. PMID:23384306

  19. Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information.

    PubMed

    Ma, Xin; Guo, Jing; Liu, Hong-De; Xie, Jian-Ming; Sun, Xiao

    2012-01-01

    The recognition of DNA-binding residues in proteins is critical to our understanding of the mechanisms of DNA-protein interactions, gene expression, and for guiding drug design. Therefore, a prediction method DNABR (DNA Binding Residues) is proposed for predicting DNA-binding residues in protein sequences using the random forest (RF) classifier with sequence-based features. Two types of novel sequence features are proposed in this study, which reflect the information about the conservation of physicochemical properties of the amino acids, and the correlation of amino acids between different sequence positions in terms of physicochemical properties. The first type of feature uses the evolutionary information combined with the conservation of physicochemical properties of the amino acids while the second reflects the dependency effect of amino acids with regards to polarity charge and hydrophobic properties in the protein sequences. Those two features and an orthogonal binary vector which reflect the characteristics of 20 types of amino acids are used to build the DNABR, a model to predict DNA-binding residues in proteins. The DNABR model achieves a value of 0.6586 for Matthew’s correlation coefficient (MCC) and 93.04 percent overall accuracy (ACC) with a 68.47 percent sensitivity (SE) and 98.16 percent specificity (SP), respectively. The comparisons with each feature demonstrate that these two novel features contribute most to the improvement in predictive ability. Furthermore, performance comparisons with other approaches clearly show that DNABR has an excellent prediction performance for detecting binding residues in putative DNA-binding protein. The DNABR web-server system is freely available at http://www.cbi.seu.edu.cn/DNABR/.
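
    The four measures quoted in this abstract (MCC, ACC, SE, SP) are standard functions of a binary confusion matrix. The sketch below shows how they are commonly computed; the function name `binary_metrics` is hypothetical, and no counts from the DNABR study are used because the abstract does not report them.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute ACC, SE, SP and MCC from confusion-matrix counts.

    tp/fn: binding residues predicted as binding / non-binding.
    tn/fp: non-binding residues predicted as non-binding / binding.
    """
    def ratio(num: float, den: float) -> float:
        # Return 0.0 rather than raising when a class is empty.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": ratio(tp + tn, tp + fp + tn + fn),  # overall accuracy
        "SE": ratio(tp, tp + fn),                  # sensitivity (recall)
        "SP": ratio(tn, tn + fp),                  # specificity
        "MCC": ratio(tp * tn - fp * fn, mcc_den),  # Matthews correlation
    }
```

    Because non-binding residues usually far outnumber binding ones, accuracy and specificity can be high while sensitivity stays modest, which is why MCC is reported alongside them.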

  20. Structure-Related Roles for the Conservation of the HIV-1 Fusion Peptide Sequence Revealed by Nuclear Magnetic Resonance.

    PubMed

    Serrano, Soraya; Huarte, Nerea; Rujas, Edurne; Andreu, David; Nieva, José L; Jiménez, María Angeles

    2017-09-29

    Despite extensive characterization of the human immunodeficiency virus type 1 (HIV-1) hydrophobic fusion peptide (FP), the structure-function relationships underlying its extraordinary degree of conservation remain poorly understood. Specifically, the fact that the tandem repeat of the FLGFLG tripeptide is absolutely conserved suggests that high hydrophobicity may not suffice to unleash FP function. Here, we have compared the nuclear magnetic resonance (NMR) structures adopted in nonpolar media by two FP surrogates, wtFP-tag and scrFP-tag, which had equal hydrophobicity but contained wild-type and scrambled core sequences LFLGFLG and FGLLGFL, respectively. In addition, these peptides were tagged at their C-termini with an epitope sequence that folded independently, thereby allowing Western blot detection without interfering with FP structure. We observed similar α-helical FP conformations for both specimens dissolved in the low-polarity medium 25% (v/v) 1,1,1,3,3,3-hexafluoro-2-propanol (HFIP), but important differences in contact with micelles of the membrane mimetic dodecylphosphocholine (DPC). Thus, whereas wtFP-tag preserved a helix displaying a Gly-rich ridge, the scrambled sequence lost in great part the helical structure upon being solubilized in DPC. Western blot analyses further revealed the capacity of wtFP-tag to assemble trimers in membranes, whereas membrane oligomers were not observed in the case of the scrFP-tag sequence. We conclude that, beyond hydrophobicity, preserving sequence order is an important feature for defining the secondary structures and oligomeric states adopted by the HIV FP in membranes.

  1. A Collection of Conserved Noncoding Sequences to Study Gene Regulation in Flowering Plants

    PubMed Central

    2016-01-01

    Transcription factors (TFs) regulate gene expression by binding cis-regulatory elements, of which the identification remains an ongoing challenge owing to the prevalence of large numbers of nonfunctional TF binding sites. Powerful comparative genomics methods, such as phylogenetic footprinting, can be used for the detection of conserved noncoding sequences (CNSs), which are functionally constrained and can greatly help in reducing the number of false-positive elements. In this study, we applied a phylogenetic footprinting approach for the identification of CNSs in 10 dicot plants, yielding 1,032,291 CNSs associated with 243,187 genes. To annotate CNSs with TF binding sites, we made use of binding site information for 642 TFs originating from 35 TF families in Arabidopsis (Arabidopsis thaliana). In three species, the identified CNSs were evaluated using TF chromatin immunoprecipitation sequencing data, resulting in significant overlap for the majority of data sets. To identify ultraconserved CNSs, we included genomes of additional plant families and identified 715 binding sites for 501 genes conserved in dicots, monocots, mosses, and green algae. Additionally, we found that genes that are part of conserved mini-regulons have a higher coherence in their expression profile than other divergent gene pairs. All identified CNSs were integrated in the PLAZA 3.0 Dicots comparative genomics platform (http://bioinformatics.psb.ugent.be/plaza/versions/plaza_v3_dicots/) together with new functionalities facilitating the exploration of conserved cis-regulatory elements and their associated genes. The availability of this data set in a user-friendly platform enables the exploration of functional noncoding DNA to study gene regulation in a variety of plant species, including crops. PMID:27261064

  2. Conservation.

    ERIC Educational Resources Information Center

    National Audubon Society, New York, NY.

    This set of teaching aids consists of seven Audubon Nature Bulletins, providing the teacher and student with informational reading on various topics in conservation. The bulletins have these titles: Plants as Makers of Soil, Water Pollution Control, The Ground Water Table, Conservation--To Keep This Earth Habitable, Our Threatened Air Supply,…

  3. Conservation.

    ERIC Educational Resources Information Center

    National Audubon Society, New York, NY.

    This set of teaching aids consists of seven Audubon Nature Bulletins, providing the teacher and student with informational reading on various topics in conservation. The bulletins have these titles: Plants as Makers of Soil, Water Pollution Control, The Ground Water Table, Conservation--To Keep This Earth Habitable, Our Threatened Air Supply,…

  4. Nucleotide sequence of the capsid protein gene of two serotypes of San Miguel sea lion virus: identification of conserved and non-conserved amino acid sequences among calicivirus capsid proteins.

    PubMed

    Neill, J D

    1992-07-01

    The San Miguel sea lion viruses, members of the calicivirus family, are closely related to the vesicular disease of swine viruses which can cause severe disease in swine. In order to begin the molecular characterization of these viruses, the nucleotide sequence of the capsid protein gene of two San Miguel sea lion viruses (SMSV), serotypes 1 and 4, was determined. The coding sequences for the capsid precursor protein were located within the 3' terminal 2620 bases of the genomic RNAs of both viruses. The encoded capsid precursor proteins were 79,500 and 77,634 Da for SMSV 1 and SMSV 4, respectively. The SMSV 1 protein was 47.7% and SMSV 4 was 48.6% homologous to the feline calicivirus (FCV) capsid precursor protein while the two SMSV capsid precursors were 73% homologous to each other. Six distinct regions within the capsid precursors (denoted as regions A-F) were identified based on amino acid sequence alignment analysis of the two SMSV serotypes with FCV and the rabbit hemorrhagic disease virus (RHDV) capsid protein. Three regions showed similarity among all four viruses (regions B, D and F) and one region showed a very high degree of homology between the SMSV serotypes but only limited similarity with FCV (region A). RHDV contained only a truncated region A. A fifth region, consisting of approximately 100 residues, was not conserved among any of the viruses (region E) and, in SMSV, may contain the serotype-specific determinants. Another small region (region C) contained between 15 and 27 amino acids and showed little sequence conservation. Region B showed the highest degree of conservation among the four viruses and contained the residues which had homology to the picornavirus VP3 structural protein. An open reading frame, found in the 3' terminal 514 bases of the SMSV genomes, encoded small proteins (12,575 and 12,522 Da, respectively for SMSV 1 and SMSV 4) of which 32% of the conserved amino acids were basic residues, implying a possible nucleic acid

  5. Nullomers and High Order Nullomers in Genomic Sequences

    PubMed Central

    Vergni, Davide; Santoni, Daniele

    2016-01-01

    A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers, whether their absence in genomes has just a statistical explanation or it is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally the study of mean helical rise of nullomers sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpGs islands), so that the hypermutability model, also taking into account CpG islands, seems to be not sufficient to explain nullomer phenomenon
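
    The definitions in this abstract translate directly into a small search over k-mers. The sketch below is an illustration under stated assumptions, not the authors' method: it treats "mutated sequences" as all single-base substitutions, scans one strand only (no reverse complement), and uses hypothetical function names.

```python
from itertools import product

ALPHABET = "ACGT"


def observed_kmers(sequence: str, k: int) -> set[str]:
    """Collect every length-k substring that occurs in the sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def nullomers(sequence: str, k: int) -> list[str]:
    """Return the k-mers over ACGT that are absent from the sequence."""
    seen = observed_kmers(sequence, k)
    words = ("".join(p) for p in product(ALPHABET, repeat=k))
    return sorted(w for w in words if w not in seen)


def single_mutants(word: str):
    """Yield every word that differs from `word` by one substitution."""
    for i, base in enumerate(word):
        for alt in ALPHABET:
            if alt != base:
                yield word[:i] + alt + word[i + 1:]


def high_order_nullomers(sequence: str, k: int) -> list[str]:
    """Nullomers whose single-substitution mutants are all nullomers too.

    Assumption: this reading of "mutated sequences" is ours, not the paper's.
    """
    absent = set(nullomers(sequence, k))
    return sorted(w for w in absent
                  if all(m in absent for m in single_mutants(w)))
```

    Enumerating all 4^k words is practical only for small k; for genome-scale data a more compact representation of the observed k-mers would be needed.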

  6. Protein E of Haemophilus influenzae is a ubiquitous highly conserved adhesin.

    PubMed

    Singh, Birendra; Brant, Marta; Kilian, Mogens; Hallström, Björn; Riesbeck, Kristian

    2010-02-01

    Protein E (PE) of nontypeable Haemophilus influenzae (NTHi) is involved in adhesion and activation of epithelial cells. A total of 186 clinical NTHi isolates, encapsulated H. influenzae, and culture collection strains were analyzed. PE was highly conserved in both NTHi and encapsulated H. influenzae (96.9%-100% identity without the signal peptide). PE also existed in other members of the family Pasteurellaceae. The epithelial cell binding region (amino acids 84-108) was completely conserved. Phylogenetic analysis of the pe sequence separated Haemophilus species into 2 separate clusters. Importantly, PE was expressed in 98.4% of all NTHi (126 isolates) independently of the growth phase.

  7. Structural analysis of the regulatory elements of the type-II procollagen gene. Conservation of promoter and first intron sequences between human and mouse.

    PubMed Central

    Vikkula, M; Metsäranta, M; Syvänen, A C; Ala-Kokko, L; Vuorio, E; Peltonen, L

    1992-01-01

    Transcription of the type-II procollagen gene (COL2A1) is very specifically restricted to a limited number of tissues, particularly cartilages. In order to identify transcription-control motifs we have sequenced the promoter region and the first intron of the human and mouse COL2A1 genes. With the assumption that these motifs should be well conserved during evolution, we have searched for potential elements important for the tissue-specific transcription of the COL2A1 gene by aligning the two sequences with each other and with the available rat type-II procollagen sequence for the promoter. With this approach we could identify specific evolutionarily well-conserved motifs in the promoter area. On the other hand, several suggested regulatory elements in the promoter region did not show evolutionary conservation. In the middle of the first intron we found a cluster of well-conserved transcription-control elements and we conclude that these conserved motifs most probably possess a significant function in the control of the tissue-specific transcription of the COL2A1 gene. We also describe locations of additional, highly conserved nucleotide stretches, which are good candidate regions in the search for binding sites of yet-uncharacterized cartilage-specific transcription regulators of the COL2A1 gene. PMID:1637314

  8. Expression of cassini, a murine gamma-satellite sequence conserved in evolution, is regulated in normal and malignant hematopoietic cells.

    PubMed

    Arutyunyan, Anna; Stoddart, Sonia; Yi, Sun-ju; Fei, Fei; Lim, Min; Groffen, Paula; Feldhahn, Niklas; Groffen, John; Heisterkamp, Nora

    2012-08-23

    Acute lymphoblastic leukemia (ALL) cells treated with drugs can become drug-tolerant if co-cultured with protective stromal mouse embryonic fibroblasts (MEFs). We performed transcriptional profiling on these stromal fibroblasts to investigate if they were affected by the presence of drug-treated ALL cells. These mitotically inactivated MEFs showed few changes in gene expression, but a family of sequences of which transcription is significantly increased was identified. A sequence related to this family, which we named cassini, was selected for further characterization. We found that cassini was highly upregulated in drug-treated ALL cells. Analysis of RNAs from different normal mouse tissues showed that cassini expression is highest in spleen and thymus, and can be further enhanced in these organs by exposure of mice to bacterial endotoxin. Heat shock, but not other types of stress, significantly induced the transcription of this locus in ALL cells. Transient overexpression of cassini in human 293 embryonic kidney cells did not increase the cytotoxic or cytostatic effects of chemotherapeutic drugs but provided some protection. Database searches revealed that sequences highly homologous to cassini are present in rodents, apicomplexans, flatworms and primates, indicating that they are conserved in evolution. Moreover, CASSINI RNA was induced in human ALL cells treated with vincristine. Surprisingly, cassini belongs to the previously reported murine family of γ-satellite/major satellite DNA sequences, which were not known to be present in other species. Our results show that the transcription of at least one member of these sequences is regulated, suggesting that this has a function in normal and transformed immune cells. Expression of these sequences may protect cells when they are exposed to specific stress stimuli.

  9. Expression of cassini, a murine gamma-satellite sequence conserved in evolution, is regulated in normal and malignant hematopoietic cells

    PubMed Central

    2012-01-01

    Background: Acute lymphoblastic leukemia (ALL) cells treated with drugs can become drug-tolerant if co-cultured with protective stromal mouse embryonic fibroblasts (MEFs). Results: We performed transcriptional profiling on these stromal fibroblasts to investigate if they were affected by the presence of drug-treated ALL cells. These mitotically inactivated MEFs showed few changes in gene expression, but a family of sequences of which transcription is significantly increased was identified. A sequence related to this family, which we named cassini, was selected for further characterization. We found that cassini was highly upregulated in drug-treated ALL cells. Analysis of RNAs from different normal mouse tissues showed that cassini expression is highest in spleen and thymus, and can be further enhanced in these organs by exposure of mice to bacterial endotoxin. Heat shock, but not other types of stress, significantly induced the transcription of this locus in ALL cells. Transient overexpression of cassini in human 293 embryonic kidney cells did not increase the cytotoxic or cytostatic effects of chemotherapeutic drugs but provided some protection. Database searches revealed that sequences highly homologous to cassini are present in rodents, apicomplexans, flatworms and primates, indicating that they are conserved in evolution. Moreover, CASSINI RNA was induced in human ALL cells treated with vincristine. Surprisingly, cassini belongs to the previously reported murine family of γ-satellite/major satellite DNA sequences, which were not known to be present in other species. Conclusions: Our results show that the transcription of at least one member of these sequences is regulated, suggesting that this has a function in normal and transformed immune cells. Expression of these sequences may protect cells when they are exposed to specific stress stimuli. PMID:22916712

  10. Processing of yeast mitochondrial messenger RNAs at a conserved dodecamer sequence.

    PubMed Central

    Osinga, K A; De Vries, E; Van der Horst, G; Tabak, H F

    1984-01-01

    The yeast mitochondrial genes coding for cytochrome c oxidase subunit I (COX1) and the ATPase subunits 8 and 6 are organized in one transcription unit. Precise mapping of RNA termini with S1 nuclease and primer extension analysis shows that the 3' end of the COX1 mRNA and the 5' end of the ATPase precursor RNA are juxtaposed within a conserved dodecamer sequence (5'-AAUAAUAUUCUU-3'). Sequence comparison reveals that this motif is present downstream of nearly all protein-encoding genes, including extragenic unassigned reading frames (URFs) and two URFs located within introns. Also the 3' terminus of an RNA species derived from the URF-containing intron of the large rRNA gene maps within such a dodecamer sequence. It is likely, therefore, that this motif serves as a processing point in the generation of mature mRNA. From a comparison of the various transcription units, we infer that RNAs that originate from an endonucleolytic cleavage at this sequence have stable 3' termini, while further processing of the 5' ends occurs. The efficiency of the initial cleavage varies between the different positions at which the motif is present. PMID:6327291
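
    The dodecamer quoted in this abstract is a fixed 12-nucleotide string, so locating candidate processing sites reduces to an exact substring search. The following is a minimal sketch, not the authors' mapping procedure (they used S1 nuclease and primer extension); the function name find_dodecamer and the toy sequence are hypothetical, and DNA input is simply converted to RNA letters before searching.

```python
# Illustrative only: exact-match scan for the dodecamer quoted in the abstract.
DODECAMER_RNA = "AAUAAUAUUCUU"  # 5'-AAUAAUAUUCUU-3', taken from the abstract


def find_dodecamer(seq: str) -> list[int]:
    """Return 0-based start positions of the dodecamer in an RNA or DNA string."""
    rna = seq.upper().replace("T", "U")  # accept DNA spelling as well
    hits = []
    start = rna.find(DODECAMER_RNA)
    while start != -1:
        hits.append(start)
        start = rna.find(DODECAMER_RNA, start + 1)
    return hits


# Hypothetical toy sequence: one DNA-spelled copy and one RNA-spelled copy.
toy = "GGCC" + "AATAATATTCTT" + "ACGU" + "AAUAAUAUUCUU"
print(find_dodecamer(toy))  # [4, 20]
```

    A scan like this only reports where the motif occurs; whether cleavage actually happens at a given site, and how efficiently, is an experimental question that the abstract says varies between positions.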

  11. Yeast general transcription factor GFI: sequence requirements for binding to DNA and evolutionary conservation.

    PubMed Central

    Dorsman, J C; van Heeswijk, W C; Grivell, L A

    1990-01-01

    GFI is an abundant DNA binding protein in the yeast S. cerevisiae. The protein binds to specific sequences in both ARS elements and the upstream regions of a large number of genes and is likely to play an important role in yeast cell growth. To get insight into the relative strength of the various GFI-DNA binding sites within the yeast genome, we have determined dissociation rates for several GFI-DNA complexes and found them to vary over a 70-fold range. Strong binding sites for GFI are present in the upstream activating sequences of the gene encoding the 40 kDa subunit II of the QH2:cytochrome c reductase, the gene encoding ribosomal protein S33 and in the intron of the actin gene. The binding site in the ARS1-TRP1 region is of intermediate strength. All strong binding sites conform to the sequence 5'-RTCRYYYNNNACG-3'. Modification interference experiments and studies with mutant binding sites indicate that critical bases for GFI recognition are within the two elements of the consensus DNA recognition sequence. Proteins with the DNA binding specificities of GFI and GFII can also be detected in the yeast K. lactis, suggesting evolutionary conservation of at least the respective DNA-binding domains in both yeasts. PMID:2187179
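
    The consensus RTCRYYYNNNACG uses IUPAC ambiguity codes (R = A or G, Y = C or T, N = any base), so it can be expanded into a regular expression for scanning a sequence. The sketch below is an illustration under stated assumptions: the names consensus_to_regex and find_sites are hypothetical, only the given strand is searched, and a consensus match says nothing about the measured binding strength described in the abstract.

```python
import re

# Subset of the IUPAC nucleotide codes needed for this consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any base
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Expand an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


GFI_SITE = consensus_to_regex("RTCRYYYNNNACG")  # consensus quoted in the abstract


def find_sites(dna: str) -> list[int]:
    """Return 0-based starts of non-overlapping consensus matches on the given strand."""
    return [m.start() for m in GFI_SITE.finditer(dna.upper())]


# Hypothetical example: one site embedded in flanking bases.
print(find_sites("TT" + "ATCACTTGGGACG" + "AA"))  # [2]
```

    To cover both strands, the same scan would be repeated on the reverse complement of the input.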

  12. Conserved sequence motifs among bacterial, eukaryotic, and archaeal phosphatases that define a new phosphohydrolase superfamily.

    PubMed Central

    Thaller, M. C.; Schippa, S.; Rossolini, G. M.

    1998-01-01

    Members of a new molecular family of bacterial nonspecific acid phosphatases (NSAPs), indicated as class C, were found to share significant sequence similarities to bacterial class B NSAPs and to some plant acid phosphatases, representing the first example of a family of bacterial NSAPs that has a relatively close eukaryotic counterpart. Despite the lack of an overall similarity, conserved sequence motifs were also identified among the above enzyme families (class B and class C bacterial NSAPs, and related plant phosphatases) and several other families of phosphohydrolases, including bacterial phosphoglycolate phosphatases, histidinol-phosphatase domains of the bacterial bifunctional enzymes imidazole-glycerolphosphate dehydratases, and bacterial, eukaryotic, and archaeal phosphoserine phosphatases and trehalose-6-phosphatases. These conserved motifs are clustered within two domains, separated by a variable spacer region, according to the pattern [FILMAVT]-D-[ILFRMVY]-D-[GSNDE]-[TV]-[ILVAM]-[ATSVILMC]-X-{YFWHKR}-X-{YFWHNQ}-X(102,191)-{KRHNQ}-G-D-{FYWHILVMC}-{QNH}-{FWYGP}-D-{PSNQYW}. The dephosphorylating activity common to all these proteins supports the definition of this phosphatase motif and the inclusion of these enzymes into a superfamily of phosphohydrolases that we propose to indicate as "DDDD" after the presence of the four invariant aspartate residues. Database searches retrieved various hypothetical proteins of unknown function containing this or similar motifs, for which a phosphohydrolase activity could be hypothesized. PMID:9684901
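
    The motif quoted in this abstract can be turned into a regular expression for scanning protein sequences. The sketch below assumes PROSITE-style conventions: square brackets list allowed residues, curly braces list excluded residues, X is any residue, and X(102,191) is a spacer of 102 to 191 residues. Reading the damaged delimiters in the source as curly braces is itself an assumption, and the helper prosite_to_regex and the toy sequence are hypothetical.

```python
import re

# Pattern from the abstract, written with PROSITE-style delimiters (assumed reading).
DDDD_PATTERN = (
    "[FILMAVT]-D-[ILFRMVY]-D-[GSNDE]-[TV]-[ILVAM]-[ATSVILMC]-X-{YFWHKR}-X-{YFWHNQ}"
    "-X(102,191)-{KRHNQ}-G-D-{FYWHILVMC}-{QNH}-{FWYGP}-D-{PSNQYW}"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into Python regex syntax."""
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if counted:  # e.g. X(102,191) or X(3)
            element, low, high = counted.group(1), counted.group(2), counted.group(3)
            repeat = "{%s,%s}" % (low, high) if high else "{%s}" % low
        if element.upper() == "X":
            core = "."                          # any residue
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"   # excluded residues
        else:
            core = element                      # [allowed set] or a literal residue
        parts.append(core + repeat)
    return "".join(parts)


DDDD_REGEX = re.compile(prosite_to_regex(DDDD_PATTERN))

# Hypothetical toy protein built to satisfy every position with the shortest spacer.
toy = "FDIDGTIA" + "AAAA" + "A" * 102 + "A" + "GD" + "AAA" + "D" + "A"
print(DDDD_REGEX.search(toy) is not None)  # True
```

    The translator handles only the constructs that appear in this pattern; anchors and other PROSITE features are deliberately left out.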

  13. Sequence of Radiotherapy and Chemotherapy in Breast Cancer After Breast-Conserving Surgery

    SciTech Connect

    Jobsen, Jan J.; Palen, Job van der; Brinkhuis, Marieel; Ong, Francisca; Struikmans, Henk

    2012-04-01

    Purpose: The optimal sequence of radiotherapy and chemotherapy in breast-conserving therapy is unknown. Methods and Materials: From 1983 through 2007, a total of 641 patients with 653 instances of breast-conserving therapy (BCT) received both chemotherapy and radiotherapy and form the basis of this analysis. Patients were divided into three groups. Groups A and B comprised patients treated before 2005: Group A received radiotherapy first and Group B chemotherapy first. Group C consisted of patients treated from 2005 onward, when we had a fixed sequence of radiotherapy first, followed by chemotherapy. Results: Local control did not show any differences among the three groups. For distant metastasis, no difference was shown between Groups A and B. Group C, when compared with Group A, showed, on univariate and multivariate analyses, a significantly better distant metastasis-free survival. The same was noted for disease-free survival. With respect to disease-specific survival, no differences were shown on multivariate analysis among the three groups. Conclusion: Radiotherapy, as an integral part of the primary treatment of BCT, should be administered first, followed by adjuvant chemotherapy.

  14. Heterogeneous tempo and mode of conserved noncoding sequence evolution among four mammalian orders.

    PubMed

    Babarinde, Isaac Adeyemi; Saitou, Naruya

    2013-01-01

    Conserved noncoding sequences (CNSs) of vertebrates are considered to be closely linked with protein-coding gene regulatory functions. We examined the abundance and genomic distribution of CNSs in four mammalian orders: primates, rodents, carnivores, and cetartiodactyls. We defined two thresholds for CNSs from the conservation level of coding genes: one using all three codon positions and one using only the first and second codon positions. The abundance of CNSs varied among lineages, with primates and rodents having the highest and lowest numbers of CNSs, respectively, whereas carnivores and cetartiodactyls had intermediate values. These CNSs cover 1.3-5.5% of the mammalian genomes and have signatures of selective constraints that are stronger in more ancestral CNSs than in recent ones. Evolution of new CNSs as well as retention of ancestral CNSs contribute to the differences in abundance. The genomic distribution of CNSs is dynamic, with higher proportions of rodent and primate CNSs located in introns compared with carnivores and cetartiodactyls. In fact, 19% of orthologous single-copy CNSs between human and dog are located in different genomic regions. If CNSs can be considered as candidates of gene expression regulatory sequences, heterogeneity of CNSs among the four mammalian orders may have played an important role in creating the order-specific phenotypes. Fewer CNSs in rodents suggest that rodent diversity is related to lower regulatory conservation. With CNSs shown to cluster around genes involved in nervous systems and the higher number of primate CNSs, our result suggests that CNSs may be involved in the higher complexity of the primate nervous system.

  15. Robust high-order space-time conservative schemes for solving conservation laws on hybrid meshes

    NASA Astrophysics Data System (ADS)

    Shen, Hua; Wen, Chih-Yung; Liu, Kaixin; Zhang, Deliang

    2015-01-01

    In this paper, the second-order space-time conservation element and solution element (CE/SE) method proposed by Chang (1995) [3] is implemented on hybrid meshes for solving conservation laws. In addition, the present scheme has been extended to high-order versions including third and fourth order. Most methodologies of the proposed schemes are consistent with those of the original CE/SE method, including: (i) a unified treatment of space and time (thereby ensuring good conservation in both space and time); (ii) a highly compact node stencil (the solution node is calculated using only the neighboring mesh nodes) regardless of the order of accuracy, at the cost of storing all derivatives. A staggered time marching strategy is adopted and the solutions are updated alternately between cell centers and vertexes. To construct explicit high-order schemes, second- and third-order derivatives are calculated by a modified finite-difference/weighted-average procedure which is different from that used to calculate the first-order derivatives. The present schemes can be implemented on a wide variety of meshes, including triangular, quadrilateral and hybrid (consisting of both triangular and quadrilateral elements). Beyond that, they can be easily extended to arbitrary-order schemes and arbitrarily shaped polygonal elements by using the present methodologies. A series of common benchmark examples are used to confirm the accuracy and robustness of the proposed schemes.

  16. Conservation agriculture in high tunnels: soil health and profit enhancement

    USDA-ARS's Scientific Manuscript database

    In 2013, through the USDA’s Evans-Allen capacity grant, the high tunnel became an on-farm research laboratory for conservation agriculture. Dr. Manuel R. Reyes, Professor and his research team from the North Carolina Agriculture and Technology State University (NCATSU), Greensboro, North Carolina (1...

  17. Heterochromatin protein 1, a known suppressor of position-effect variegation, is highly conserved in Drosophila.

    PubMed Central

    Clark, R F; Elgin, S C

    1992-01-01

    The Su(var)205 gene of Drosophila melanogaster encodes heterochromatin protein 1 (HP1), a protein located preferentially within beta-heterochromatin. Mutation of this gene has been associated with dominant suppression of position-effect variegation. We have cloned and sequenced the gene encoding HP1 from Drosophila virilis, a distantly related species. Comparison of the predicted amino acid sequence with Drosophila melanogaster HP1 shows two regions of strong homology, one near the N-terminus (57/61 amino acids identical) and the other near the C-terminus (62/68 amino acids identical) of the protein. Little homology is seen in the 5' and 3' untranslated portions of the gene, as well as in the intronic sequences, although intron/exon boundaries are generally conserved. A comparison of the deduced amino acid sequences of HP1-like proteins from other species shows that the cores of the N-terminal and C-terminal domains have been conserved from insects to mammals. The high degree of conservation suggests that these N- and C-terminal domains could interact with other macromolecules in the formation of the condensed structure of heterochromatin. PMID:1461737

  18. A conserved sequence in the mouse variable T cell receptor alpha recombination signal sequence 23-bp spacer can affect recombination.

    PubMed

    Probst, Jochen; Blumenthal, Sibylle G; Tenzer, Stefan; Weinschenk, Toni; Dittmer, Jürgen; Schoor, Oliver; Six, Adrien; Rammensee, Hans-Georg; Pascolo, Steve

    2004-08-01

    Although the V-gene segments coding for the TCR alpha and delta chains are mixed together in the alpha delta locus and are recombined by the same processes, some gene segments (TRAV) are rearranged only with TCR Jalpha gene segments, some (TRDV) only with TCR Ddelta gene segments and some (TRADV) with both. To date, no molecular signal is known that can characterize these three different types of gene segments. Studying the recombination signal sequences (RSS) of all mouse TCR V-gene segments we observed that 80% of the TRAV contain a palindrome sequence (CTGCAG) or its related variant CTGTAG in their 23-bp spacer. Using gel-shift assays we show that these sequences are specifically recognized by some nuclear proteins that are expressed by fresh thymocytes, fresh lymphocytes and tumor cells. Recombination assays on plasmid substrates in a pre-B cell line showed that RSS containing the CTGCAG sequence can impair recombination. From the protein fractions containing the CTGCAG-binding activity, three proteins were identified: G3BP1 (a nucleic-acid-binding protein with a proposed helicase activity) and two proteins from the high-mobility group (HMG) family--HMGB2 and HMGB3. We hypothesize that these proteins can affect recombination at the TCR alpha delta locus.
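
    The abstract calls CTGCAG a palindrome, which for DNA means the sequence equals its own reverse complement; the variant CTGTAG does not have that property. The sketch below checks this and reports which of the two motifs occur in a spacer. The function names and the 23-bp example spacer are hypothetical and are not taken from the paper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SPACER_MOTIFS = ("CTGCAG", "CTGTAG")  # motif and variant quoted in the abstract


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(dna: str) -> bool:
    """True if the sequence reads the same on both strands."""
    return dna.upper() == reverse_complement(dna)


def spacer_motifs(spacer: str) -> list[str]:
    """Return which of the two motifs occur in the given spacer."""
    return [motif for motif in SPACER_MOTIFS if motif in spacer.upper()]


print(is_palindromic("CTGCAG"))  # True
print(is_palindromic("CTGTAG"))  # False

# Hypothetical 23-bp spacer used only to exercise the function.
example_spacer = "ACGTACTGCAGTTGACCATGCAT"
print(spacer_motifs(example_spacer))  # ['CTGCAG']
```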

  19. Genome-wide analyses of Epstein-Barr virus reveal conserved RNA structures and a novel stable intronic sequence RNA

    PubMed Central

    2013-01-01

    Background: Epstein-Barr virus (EBV) is a human herpesvirus implicated in cancer and autoimmune disorders. Little is known concerning the roles of RNA structure in this important human pathogen. This study provides the first comprehensive genome-wide survey of RNA and RNA structure in EBV. Results: Novel EBV RNAs and RNA structures were identified by computational modeling and RNA-Seq analyses of EBV. Scans of the genomic sequences of four EBV strains (EBV-1, EBV-2, GD1, and GD2) and of the closely related Macacine herpesvirus 4 using the RNAz program discovered 265 regions with high probability of forming conserved RNA structures. Secondary structure models are proposed for these regions based on a combination of free energy minimization and comparative sequence analysis. The analysis of RNA-Seq data uncovered the first observation of a stable intronic sequence RNA (sisRNA) in EBV. The abundance of this sisRNA rivals that of the well-known and highly expressed EBV-encoded non-coding RNAs (EBERs). Conclusion: This work identifies regions of the EBV genome likely to generate functional RNAs and RNA structures, provides structural models for these regions, and discusses potential functions suggested by the modeled structures. Enhanced understanding of the EBV transcriptome will guide future experimental analyses of the discovered RNAs and RNA structures. PMID:23937650

  20. Comparative genomic analysis of equilibrative nucleoside transporters suggests conserved protein structure despite limited sequence identity.

    PubMed

    Sankar, Narendra; Machado, Jerry; Abdulla, Parween; Hilliker, Arthur J; Coe, Imogen R

    2002-10-15

    Equilibrative nucleoside transporters (ENTs) are a recently characterized and poorly understood group of membrane proteins that are important in the uptake of endogenous nucleosides required for nucleic acid and nucleoside triphosphate synthesis. Despite their central importance in cellular metabolism and nucleoside analog chemotherapy, no human ENT gene has been described and nothing is known about gene structure and function. To gain insight into the ENT gene family, we used experimental and in silico comparative genomic approaches to identify ENT genes in three evolutionarily diverse organisms with completely (or almost completely) sequenced genomes, Homo sapiens, Caenorhabditis elegans and Drosophila melanogaster. We describe the chromosomal location, the predicted ENT gene structure and putative structural topologies of predicted ENT proteins derived from the open reading frames. Despite variations in genomic layout and limited ortholog protein sequence identity (≤27.45%), predicted topologies of ENT proteins are strikingly similar, suggesting an evolutionary conservation of a prototypic structure. In addition, a similar distribution of protein domains on exons is apparent in all three taxa. These data demonstrate that comparative sequence analyses should be combined with other approaches (such as genomic and proteomic analyses) to fully understand structure, function and evolution of protein families.

  1. Remote homology detection of integral membrane proteins using conserved sequence features.

    PubMed

    Bernsel, Andreas; Viklund, Håkan; Elofsson, Arne

    2008-05-15

    Compared with globular proteins, transmembrane proteins are surrounded by a more intricate environment and, consequently, amino acid composition varies between the different compartments. Existing algorithms for homology detection are generally developed with globular proteins in mind and may not be optimal to detect distant homology between transmembrane proteins. Here, we introduce a new profile-profile based alignment method for remote homology detection of transmembrane proteins in a hidden Markov model framework that takes advantage of the sequence constraints placed by the hydrophobic interior of the membrane. We expect that, for distant membrane protein homologs, even if the sequences have diverged too far to be recognized, the hydrophobicity pattern and the transmembrane topology are better conserved. By using this information in parallel with sequence information, we show that both sensitivity and specificity can be substantially improved for remote homology detection in two independent test sets. In addition, we show that alignment quality can be improved for the most distant homologs in a public dataset of membrane protein structures. Applying the method to the Pfam domain database, we are able to suggest new putative evolutionary relationships for a few relatively uncharacterized protein domain families, of which several are confirmed by other methods. The method is called Searcher for Homology Relationships of Integral Membrane Proteins (SHRIMP) and is available for download at http://www.sbc.su.se/shrimp/. 2007 Wiley-Liss, Inc.

  2. Evolutionary conservation of sequence and secondary structures in CRISPR repeats

    SciTech Connect

    Kunin, Victor; Sorek, Rotem; Hugenholtz, Philip

    2006-09-01

    Clustered Regularly Interspaced Palindromic Repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in approximately 40% of bacterial and all archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CAS), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been proposed that the CRISPR/CAS system samples, maintains a record of, and inactivates invasive DNA that the cell has encountered, and therefore constitutes a prokaryotic analog of an immune system. Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. All individual repeats in any given cluster were inferred to form characteristic RNA secondary structure, ranging from non-existent to pronounced. Stable secondary structures included G:U base pairs and exhibited multiple compensatory base changes in the stem region, indicating evolutionary conservation and functional importance. We also show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification including specific relationships between CRISPR and CAS subtypes.

  3. Dinoflagellate tandem array gene transcripts are highly conserved and not polycistronic

    PubMed Central

    Beauchemin, Mathieu; Roy, Sougata; Daoust, Philippe; Dagenais-Bellefeuille, Steve; Bertomeu, Thierry; Letourneau, Louis; Lang, B. Franz; Morse, David

    2012-01-01

    Dinoflagellates are an important component of the marine biota, but a large genome with high–copy number (up to 5,000) tandem gene arrays has made genomic sequencing problematic. More importantly, little is known about the expression and conservation of these unusual gene arrays. We assembled de novo a gene catalog of 74,655 contigs for the dinoflagellate Lingulodinium polyedrum from RNA-Seq (Illumina) reads. The catalog contains 93% of a Lingulodinium EST dataset deposited in GenBank and 94% of the enzymes in 16 primary metabolic KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, indicating it is a good representation of the transcriptome. Analysis of the catalog shows a marked underrepresentation of DNA-binding proteins and DNA-binding domains compared with other algae. Despite this, we found no evidence to support the proposal of polycistronic transcription, including a marked underrepresentation of sequences corresponding to the intergenic spacers of two tandem array genes. We also have used RNA-Seq to assess the degree of sequence conservation in tandem array genes and found their transcripts to be highly conserved. Interestingly, some of the sequences in the catalog have only bacterial homologs and are potential candidates for horizontal gene transfer. These presumably were transferred as single-copy genes, and because they are now all GC-rich, any derived from AT-rich contexts must have experienced extensive mutation. Our study not only has provided the most complete dinoflagellate gene catalog known to date, it has also exploited RNA-Seq to address fundamental issues in basic transcription mechanisms and sequence conservation in these algae. PMID:23019363

  4. High compression image and image sequence coding

    NASA Technical Reports Server (NTRS)

    Kunt, Murat

    1989-01-01

    The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.

  5. The nucleotide sequence of the nitrogen-regulation gene ntrA of Klebsiella pneumoniae and comparison with conserved features in bacterial RNA polymerase sigma factors.

    PubMed Central

    Merrick, M J; Gibbins, J R

    1985-01-01

    The nucleotide sequence of the Klebsiella pneumoniae ntrA gene has been determined. NtrA encodes a 53,926 Dalton acidic polypeptide; a calculated molecular weight which is significantly lower than that determined by SDS polyacrylamide gel analysis. NtrA is followed by another open-reading frame (orf) of at least 75 amino acids. In the spacer region between ntrA and orf there are no apparent transcription termination or promoter sequences and therefore orf may be co-transcribed with ntrA. Previous authors have proposed that NtrA could act as an RNA polymerase sigma factor but the NtrA amino acid sequence does not show a high level of homology to any known sigma factor. However, analysis of sequences of five sigma factors from E. coli and B. subtilis has identified two conserved sequences at the C-terminal end of all these polypeptides. These sequences resemble those found in known site-specific DNA-binding domains and may be involved in recognition of conserved -35 and -10 promoter sequences. A similar pair of sequences is present at the C-terminus of NtrA and could play a role in recognition of ntr-activatable promoters. PMID:2999700

  6. High-resolution schemes for hyperbolic conservation laws

    NASA Technical Reports Server (NTRS)

    Harten, A.

    1982-01-01

    A class of new explicit second order accurate finite difference schemes for the computation of weak solutions of hyperbolic conservation laws is presented. These highly nonlinear schemes are obtained by applying a nonoscillatory first order accurate scheme to an appropriately modified flux function. The so derived second order accurate schemes achieve high resolution while preserving the robustness of the original nonoscillatory first order accurate scheme.

  7. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
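
    The composition-based measure mentioned in this abstract (some amino acids occurring more often than others) can be illustrated with the textbook Shannon entropy H = -sum(p_i * log2 p_i) and the redundancy R = 1 - H / log2(20). This is a minimal sketch using those standard definitions as an assumption; it is not claimed to reproduce the specific parameters derived in the paper, and it ignores the pair-frequency (second-order) measure.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """First-order Shannon entropy (bits per residue) of a sequence's composition."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(seq: str, alphabet_size: int = 20) -> float:
    """Redundancy relative to a uniform alphabet: 1 - H / log2(alphabet_size)."""
    return 1.0 - shannon_entropy(seq) / math.log2(alphabet_size)


# Hypothetical inputs: a uniform composition has no redundancy,
# a homopolymer has the maximum redundancy.
uniform = "ACDEFGHIKLMNPQRSTVWY"
print(round(redundancy(uniform), 6))  # 0.0
print(redundancy("AAAA"))             # 1.0
```

    Comparing such values for real proteins against shuffled or randomly generated sequences of the same length is the kind of baseline the abstract describes obtaining by Monte Carlo simulation.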

  8. Identification of LAG3 high affinity aptamers by HT-SELEX and Conserved Motif Accumulation (CMA).

    PubMed

    Soldevilla, Mario Martínez; Hervas, Sandra; Villanueva, Helena; Lozano, Teresa; Rabal, Obdulia; Oyarzabal, Julen; Lasarte, Juan José; Bendandi, Maurizio; Inoges, Susana; López-Díaz de Cerio, Ascensión; Pastor, Fernando

    2017-01-01

    LAG3 receptor belongs to a family of immune-checkpoints expressed in T lymphocytes and other cells of the immune system. It plays an important role as a rheostat of the immune response. Focus on this receptor as a potential therapeutic target in cancer immunotherapy has been underscored after the success of other immune-checkpoint blockade strategies in clinical trials. LAG3 showcases the interest in the field of autoimmunity as several studies show that LAG3-targeting antibodies can also be used for the treatment of autoimmune diseases. In this work we describe the identification of a high-affinity LAG3 aptamer by High Throughput Sequencing SELEX in combination with a study of potential conserved binding modes according to sequence conservation by using 2D-structure prediction and 3D-RNA modeling using Rosetta. The aptamer with the highest accumulation of these conserved sequence motifs displays the highest affinity to LAG3 recombinant soluble proteins and binds to LAG3-expressing lymphocytes. The aptamer described herein has the potential to be used as a therapeutic agent, as it enhances the threshold of T-cell activation. Nonetheless, in future applications, it could also be engineered for treatment of autoimmune diseases by target depletion of LAG3-effector T lymphocytes.

  9. Sequence conservation, HLA-E-Restricted peptide, and best-defined CTL/CD8+ epitopes in gag P24 (capsid) of HIV-1 subtype B

    NASA Astrophysics Data System (ADS)

    Prasetyo, Afiono Agung; Dharmawan, Ruben; Sari, Yulia; Sariyatun, Ratna

    2017-02-01

    Human immunodeficiency virus type 1 (HIV-1) remains a global health problem. Continuous studies of HIV-1 genetic and immunological profiles are important to find strategies against the virus. This study aimed to analyse sequence conservation, HLA-E-restricted peptide, and best-defined CTL/CD8+ epitopes in p24 (capsid) of HIV-1 subtype B worldwide. The p24-coding sequences from 3,557 HIV subtype B isolates were aligned using MUSCLE and analysed. Some highly conserved regions (sequence conservation ≥95%) were observed. Two considerably long stretches with 100% conservation were observed at bases 349-356 and 550-557 of p24 (HXB2 numbering). The consensus from all aligned isolates was precisely the same as consensus B in the Los Alamos HIV Database. The HLA-E-restricted peptide in amino acids (aa) 14-22 of HIV-1 p24 (AISPRTLNA) was found in 55.9% (1,987/3,557) of HIV-1 subtype B isolates worldwide. Forty-four best-defined CTL/CD8+ epitopes were observed, of which the VKNWMTETL epitope (aa 181-189 of p24) restricted by B*4801 was the most frequent, found in 94.9% of isolates. The results of this study contribute information about HIV-1 subtype B and may benefit further work to develop diagnostic and therapeutic strategies against the virus.
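
    One common way to quantify per-position conservation in an alignment is the frequency of the most common character in each column, with conserved regions then reported as runs of columns at or above a threshold such as 95%. The sketch below illustrates that idea only; the abstract does not state how conservation was computed, so the definition, the function names, and the toy alignment are all assumptions. Gap characters are counted like any other symbol here.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences carrying the most common character in each column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores


def conserved_runs(scores: list[float], threshold: float = 0.95) -> list[tuple[int, int]]:
    """Return (start, end) 0-based inclusive runs of columns at or above the threshold."""
    runs, start = [], None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(scores) - 1))
    return runs


# Hypothetical four-sequence alignment.
toy_alignment = ["ACGTA", "ACGTT", "ACCTA", "ACGTA"]
toy_scores = column_conservation(toy_alignment)
print(toy_scores)                  # [1.0, 1.0, 0.75, 1.0, 0.75]
print(conserved_runs(toy_scores))  # [(0, 1), (3, 3)]
```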

  10. Regulation of SHOOT MERISTEMLESS genes via an upstream-conserved noncoding sequence coordinates leaf development

    PubMed Central

    Uchida, Naoyuki; Townsley, Brad; Chung, Kook-Hyun; Sinha, Neelima

    2007-01-01

    The indeterminate shoot apical meristem of plants is characterized by the expression of the Class 1 KNOTTED1-LIKE HOMEOBOX (KNOX1) genes. KNOX1 genes have been implicated in the acquisition and/or maintenance of meristematic fate. One of the earliest indicators of a switch in fate from indeterminate meristem to determinate leaf primordium is the down-regulation of KNOX1 genes orthologous to SHOOT MERISTEMLESS (STM) in Arabidopsis (hereafter called STM genes) in the initiating primordia. In simple leafed plants, this down-regulation persists during leaf formation. In compound leafed plants, however, KNOX1 gene expression is reestablished later in the developing primordia, creating an indeterminate environment for leaflet formation. Despite this knowledge, most aspects of how STM gene expression is regulated remain largely unknown. Here, we identify two evolutionarily conserved noncoding sequences within the 5′ upstream region of STM genes in both simple and compound leafed species across monocots and dicots. We show that one of these elements is involved in the regulation of the persistent repression and/or the reestablishment of STM expression in the developing leaves but is not involved in the initial down-regulation in the initiating primordia. We also show evidence that this regulation is developmentally significant for leaf formation in the pathway involving ASYMMETRIC LEAVES1/2 (AS1/2) gene expression; these genes are known to function in leaf development. Together, these findings reveal a regulatory point of leaf development mediated through a conserved, noncoding sequence in STM genes. PMID:17898165

  11. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

    PubMed

    Catania, Francesco; Lynch, Michael

    2010-05-04

    In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  12. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    PubMed Central

    2010-01-01

    Background In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes. PMID:20441586
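
    The screening step in this record (checking 3' UTRs for previously reported 8-mers) can be sketched as a simple presence count. This is only an illustration under stated assumptions: the function name, the gene labels, the sequences, and the two 8-mers below are hypothetical, and the authors' actual pipeline is not described in the abstract.

```python
def motif_presence(utrs, kmers):
    """For each k-mer, return the fraction of UTR sequences containing it at least once.

    utrs:  dict mapping a gene label to its 3' UTR sequence (any letter case).
    kmers: iterable of upper-case k-mers to screen for.
    """
    result = {}
    for kmer in kmers:
        hits = sum(1 for seq in utrs.values() if kmer in seq.upper())
        result[kmer] = hits / len(utrs) if utrs else 0.0
    return result


# Hypothetical toy input (not data from the study).
toy_utrs = {
    "geneA": "TTTGTAAATAGCC",
    "geneB": "ccgtgtaaatag",
    "geneC": "AAAAAAAAAAAA",
}
presence = motif_presence(toy_utrs, ["TGTAAATA", "GGGGCCCC"])
print(presence)
```

    Comparing such fractions between gene classes (for example ribosomal versus other genes) is one plausible way to spot a motif that is enriched in a particular class.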

  13. The human archain gene, ARCN1, has highly conserved homologs in rice and drosophila

    SciTech Connect

    Radice, P.; Jones, C.; Perry, H.

    1995-03-01

    A novel human gene, ARCN1, has been identified in chromosome band 11q23.3. It maps approximately 50 kb telomeric to MLL, a gene that is disrupted in a number of leukemia-associated translocation chromosomes. cDNA clones representing ARCN1 hybridize to 4-kb mRNA species present in all tissues tested. Sequencing of cDNAs suggests that at least two forms of mRNA with alternative 5′ ends are present within the cell. The mRNA with the longest open reading frame gives rise to a protein of 57 kDa. Although the sequence reported is novel, remarkable similarity is observed with two predicted protein sequences from partial DNA sequences generated by rice (Oryza sativa) and fruit fly (Drosophila melanogaster) genome projects. The degree of sequence conservation is comparable to that observed for highly conserved structural proteins, such as heat shock protein HSP70, and is greater than that of γ-tubulin and heat shock protein HSP60. A more distant relationship to the group of clathrin-associated proteins suggests a possible role in vesicle structure or trafficking. In view of its ancient pedigree and a potential involvement in cellular architecture, the authors propose that the ARCN1 protein be named archain. 20 refs., 5 figs.

  14. In Silico Structure and Sequence Analysis of Bacterial Porins and Specific Diffusion Channels for Hydrophilic Molecules: Conservation, Multimericity and Multifunctionality

    PubMed Central

    Vollan, Hilde S.; Tannæs, Tone; Vriend, Gert; Bukholm, Geir

    2016-01-01

    Diffusion channels are involved in the selective uptake of nutrients and form the largest outer membrane protein (OMP) family in Gram-negative bacteria. Differences in pore size and amino acid composition contribute to the specificity. Structure-based multiple sequence alignments shed light on the structure-function relations for all eight subclasses. Entropy-variability analysis results are correlated to known structural and functional aspects, such as structural integrity, multimericity, specificity and biological niche adaptation. The high mutation rate in their surface-exposed loops is likely an important mechanism for host immune system evasion. Multiple sequence alignments for each subclass revealed conserved residue positions that are involved in substrate recognition and specificity. An analysis of monomeric protein channels revealed particular sequence patterns of amino acids that were observed in other classes at multimeric interfaces. This adds to the emerging evidence that all members of the family exist in a multimeric state. Our findings are important for understanding the role of members of this family in a wide range of bacterial processes, including bacterial food uptake, survival and adaptation mechanisms. PMID:27110766
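
    The entropy-variability analysis mentioned in this record can be illustrated with a minimal sketch. It assumes entropy is the Shannon entropy (in bits) of the residue distribution in each alignment column and variability is the number of distinct residues in that column; the exact definitions and any normalisation used by the authors are not given in the abstract. The function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def entropy_variability(aligned_seqs):
    """Return a list of (entropy_bits, variability) tuples, one per alignment column.

    entropy_bits: Shannon entropy of the residue frequencies in the column.
    variability:  number of distinct residues observed in the column.
    """
    results = []
    for col in zip(*aligned_seqs):
        counts = Counter(col)
        total = len(col)
        # sum p * log2(1/p) is the Shannon entropy and is never negative.
        entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
        results.append((entropy, len(counts)))
    return results


# Hypothetical toy protein alignment (not data from the study).
toy = ["MKLV", "MRLI", "MKLA", "MHLV"]
ev = entropy_variability(toy)
print(ev)
```

    Columns with low entropy and low variability are candidates for structurally or functionally constrained positions, while high values on both axes are what one would expect in surface-exposed loops of the kind the abstract describes.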

  15. Analysis of 90 Mb of the potato genome reveals conservation of gene structures and order with tomato but divergence in repetitive sequence composition

    PubMed Central

    Zhu, Wei; Ouyang, Shu; Iovene, Marina; O'Brien, Kimberly; Vuong, Hue; Jiang, Jiming; Buell, C Robin

    2008-01-01

    Background The Solanaceae family contains a number of important crop species including potato (Solanum tuberosum), which is grown for its underground storage organ known as a tuber. Although potato is the 4th most important food crop in the world, limited genomic sequence information, other than a collection of ~220,000 Expressed Sequence Tags, is currently available for it, and advances in potato yield and nutrition content would be greatly assisted through access to a complete genome sequence. While morphologically diverse, Solanaceae species such as potato, tomato, pepper, and eggplant share not only genes but also gene order, thereby permitting highly informative comparative genomic analyses. Results In this study, we report on analysis of 89.9 Mb of potato genomic sequence representing 10.2% of the genome generated through end sequencing of a potato bacterial artificial chromosome (BAC) clone library (87 Mb) and sequencing of 22 potato BAC clones (2.9 Mb). The GC content of potato is very similar to Solanum lycopersicon (tomato) and other dicotyledonous species yet distinct from the monocotyledonous grass species, Oryza sativa. Parallel analyses of repetitive sequences in potato and tomato revealed substantial differences in their abundance, 34.2% in potato versus 46.3% in tomato, which is consistent with the increased genome size per haploid genome of these two Solanum species. Specific classes and types of repetitive sequences were also differentially represented between these two species including a telomeric-related repetitive sequence, ribosomal DNA, and a number of unclassified repetitive sequences. Comparative analyses between tomato and potato at the gene level revealed a high level of conservation of gene content, genic feature, and gene order although discordances in synteny were observed. Conclusion Genomic level analyses of potato and tomato confirm that gene sequence and gene order are conserved between these solanaceous species and that this conservation can be

  16. Comparative sequence and structure analysis reveals the conservation and diversity of nucleotide positions and their associated tertiary interactions in the riboswitches.

    PubMed

    Appasamy, Sri D; Ramlan, Effirul Ikhwan; Firdaus-Raih, Mohd

    2013-01-01

    The tertiary motifs in complex RNA molecules play vital roles to either stabilize the formation of RNA 3D structure or to provide important biological functionality to the molecule. In order to better understand the roles of these tertiary motifs in riboswitches, we examined 11 representative riboswitch PDB structures for potential agreement of both motif occurrences and conservations. A total of 61 unique tertiary interactions were found in the reference structures. In addition to the expected common A-minor motifs and base-triples mainly involved in linking distant regions of the riboswitch structures, three highly conserved variants of A-minor interactions called G-minors were found in the SAM-I and FMN riboswitches, where they appear to be involved in the recognition of the respective ligand's functional groups. From our structural survey as well as corresponding structure and sequence alignments, the agreement between motif occurrences and conservations is very prominent across the representative riboswitches. Our analysis provides evidence that some of these tertiary interactions are essential components to form the structure where their sequence positions are conserved despite a high degree of diversity in other parts of the respective riboswitch sequences. This is indicative of a vital role for these tertiary interactions in determining the specific biological function of riboswitches.

  17. Comparative Sequence and Structure Analysis Reveals the Conservation and Diversity of Nucleotide Positions and Their Associated Tertiary Interactions in the Riboswitches

    PubMed Central

    Appasamy, Sri D.; Ramlan, Effirul Ikhwan; Firdaus-Raih, Mohd

    2013-01-01

    The tertiary motifs in complex RNA molecules play vital roles to either stabilize the formation of RNA 3D structure or to provide important biological functionality to the molecule. In order to better understand the roles of these tertiary motifs in riboswitches, we examined 11 representative riboswitch PDB structures for potential agreement of both motif occurrences and conservations. A total of 61 unique tertiary interactions were found in the reference structures. In addition to the expected common A-minor motifs and base-triples mainly involved in linking distant regions of the riboswitch structures, three highly conserved variants of A-minor interactions called G-minors were found in the SAM-I and FMN riboswitches, where they appear to be involved in the recognition of the respective ligand’s functional groups. From our structural survey as well as corresponding structure and sequence alignments, the agreement between motif occurrences and conservations is very prominent across the representative riboswitches. Our analysis provides evidence that some of these tertiary interactions are essential components to form the structure where their sequence positions are conserved despite a high degree of diversity in other parts of the respective riboswitch sequences. This is indicative of a vital role for these tertiary interactions in determining the specific biological function of riboswitches. PMID:24040136

  18. Differential Gene Expression in the Human Brain Is Associated with Conserved, but Not Accelerated, Noncoding Sequences

    PubMed Central

    Meyer, Kyle A.; Marques-Bonet, Tomas

    2017-01-01

    Previous studies have found that genes which are differentially expressed within the developing human brain disproportionately neighbor conserved noncoding sequences (CNSs) that have an elevated substitution rate in humans and in other species. One explanation for this general association of differential expression with accelerated CNSs is that genes with pre-existing patterns of differential expression have been preferentially targeted by species-specific regulatory changes. Here we provide support for an alternative explanation: genes that neighbor a greater number of CNSs have a higher probability of differential expression and a higher probability of neighboring a CNS with lineage-specific acceleration. Thus, neighboring an accelerated element from any species signals that a gene likely neighbors many CNSs. We extend the analyses beyond the prenatal time points considered in previous studies to demonstrate that this association persists across developmental and adult periods. Examining differential expression between non-neural tissues suggests that the relationship between the number of CNSs a gene neighbors and its differential expression status may be particularly strong for expression differences among brain regions. In addition, by considering this relationship, we highlight a recently defined set of putative human-specific gain-of-function sequences that, even after adjusting for the number of CNSs neighbored by genes, shows a positive relationship with upregulation in the brain compared with other tissues examined. PMID:28204568

  19. Relationship between sequence conservation and three-dimensional structure in a large family of esterases, lipases, and related proteins.

    PubMed Central

    Cygler, M.; Schrag, J. D.; Sussman, J. L.; Harel, M.; Silman, I.; Gentry, M. K.; Doctor, B. P.

    1993-01-01

    Based on the recently determined X-ray structures of Torpedo californica acetylcholinesterase and Geotrichum candidum lipase and on their three-dimensional superposition, an improved alignment of a collection of 32 related amino acid sequences of other esterases, lipases, and related proteins was obtained. On the basis of this alignment, 24 residues are found to be invariant in 29 sequences of hydrolytic enzymes, and an additional 49 are well conserved. The conservation in the three remaining sequences is somewhat lower. The conserved residues include the active site, disulfide bridges, salt bridges, and residues in the core of the proteins. Most invariant residues are located at the edges of secondary structural elements. A clear structural basis for the preservation of many of these residues can be determined from comparison of the two X-ray structures. PMID:8453375

  20. Origin and spread of photosynthesis based upon conserved sequence features in key bacteriochlorophyll biosynthesis proteins.

    PubMed

    Gupta, Radhey S

    2012-11-01

    The origin of photosynthesis and how this capability has spread to other bacterial phyla remain important unresolved questions. I describe here a number of conserved signature indels (CSIs) in key proteins involved in bacteriochlorophyll (Bchl) biosynthesis that provide important insights in these regards. The proteins BchL and BchX, which are essential for Bchl biosynthesis, are derived by gene duplication in a common ancestor of all phototrophs. More ancient gene duplication gave rise to the BchX-BchL proteins and the NifH protein of the nitrogenase complex. The sequence alignment of NifH-BchX-BchL proteins contains two CSIs that are uniquely shared by all NifH and BchX homologs, but not by any BchL homologs. These CSIs and phylogenetic analysis of NifH-BchX-BchL protein sequences strongly suggest that the BchX homologs are ancestral to BchL and that the Bchl-based anoxygenic photosynthesis originated prior to the chlorophyll (Chl)-based photosynthesis in cyanobacteria. Another CSI in the BchX-BchL sequence alignment that is uniquely shared by all BchX homologs and the BchL sequences from Heliobacteriaceae, but absent in all other BchL homologs, suggests that the BchL homologs from Heliobacteriaceae are primitive in comparison to all other photosynthetic lineages. Several other identified CSIs in the BchN homologs are commonly shared by all proteobacterial homologs and a clade consisting of the marine unicellular Cyanobacteria (Clade C). These CSIs, in conjunction with the results of phylogenetic analyses and pair-wise sequence similarity on the BchL, BchN, and BchB proteins, where the homologs from Clade C Cyanobacteria and Proteobacteria exhibited a close relationship, provide strong evidence that these two groups have incurred lateral gene transfers. Additionally, phylogenetic analyses and several CSIs in the BchL-N-B proteins that are uniquely shared by all Chlorobi and Chloroflexi homologs provide evidence that the genes for these proteins have also been

  1. Sigma: multiple alignment of weakly-conserved non-coding DNA sequence.

    PubMed

    Siddharthan, Rahul

    2006-03-16

    Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA. Comparative tests of sigma with five earlier algorithms on synthetic data generated to mimic real data show excellent performance, with Sigma balancing high "sensitivity" (more bases aligned) with effective filtering of "incorrect" alignments. With real data, while "correctness" can't be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior. By taking into account the peculiarities of non-coding DNA, Sigma fills a gap in the toolbox of bioinformatics.

  2. The expressed TCRβ CDR3 repertoire is dominated by conserved DNA sequences in channel catfish.

    PubMed

    Findly, R Craig; Niagro, Frank D; Dickerson, Harry W

    2017-03-01

    We analyzed by high-throughput sequencing T cell receptor beta CDR3 repertoires expressed by αβ T cells in outbred channel catfish before and after an immunizing infection with the parasitic protozoan Ichthyophthirius multifiliis. We compared CDR3 repertoires in caudal fin before infection and at three weeks after infection, and in skin, PBL, spleen and head kidney at seven and twenty-one weeks after infection. Public clonotypes with the same CDR3 amino acid sequence were expressed by αβ T cells that underwent clonal expansion following development of immunity. These clonally expanded αβ T cells were primarily located in spleen and skin, which is a site of infection. Although multiple DNA sequences were expected to code for each public clonotype, each public clonotype was predominately coded by an identical CDR3 DNA sequence in combination with the same J gene in all fish. The processes underlying this shared use of CDR3 DNA sequences are not clear.

  3. Discovery and profiling of novel and conserved microRNAs during flower development in Carya cathayensis via deep sequencing.

    PubMed

    Wang, Zheng Jia; Huang, Jian Qin; Huang, You Jun; Li, Zheng; Zheng, Bing Song

    2012-08-01

    Hickory (Carya cathayensis Sarg.) is an economically important woody plant in China, but its long juvenile phase delays yield. MicroRNAs (miRNAs) are critical regulators of genes and important for normal plant development and physiology, including flower development. We used Solexa technology to sequence two small RNA libraries from two floral differentiation stages in hickory to identify miRNAs related to flower development. We identified 39 conserved miRNA sequences from 114 loci belonging to 23 families as well as two novel and ten potential novel miRNAs belonging to nine families. Moreover, 35 conserved miRNA*s and two novel miRNA*s were detected. Twenty miRNA sequences from 49 loci belonging to 11 families were differentially expressed; all were up-regulated at the later stage of flower development in hickory. Quantitative real-time PCR of 12 conserved miRNA sequences, five novel miRNA families, and two novel miRNA*s validated that all were expressed during hickory flower development, and the expression patterns were similar to those detected with Solexa sequencing. Finally, a total of 146 targets of the novel and conserved miRNAs were predicted. This study identified a diverse set of miRNAs that were closely related to hickory flower development and that could help in plant floral induction.

  4. Structural and Sequence Similarities of Hydra Xeroderma Pigmentosum A Protein to Human Homolog Suggest Early Evolution and Conservation

    PubMed Central

    Ghaskadbi, Saroj

    2013-01-01

    Xeroderma pigmentosum group A (XPA) is a protein that binds to damaged DNA, verifies presence of a lesion, and recruits other proteins of the nucleotide excision repair (NER) pathway to the site. Though its homologs from yeast, Drosophila, humans, and so forth are well studied, XPA has not so far been reported from protozoa and lower animal phyla. Hydra is a fresh-water cnidarian with a remarkable capacity for regeneration and apparent lack of organismal ageing. Cnidarians are among the first metazoa with a defined body axis, tissue grade organisation, and nervous system. We report here for the first time presence of XPA gene in hydra. Putative protein sequence of hydra XPA contains nuclear localization signal and bears the zinc-finger motif. It contains two conserved Pfam domains and various characterized features of XPA proteins like regions for binding to excision repair cross-complementing protein-1 (ERCC1) and replication protein A 70 kDa subunit (RPA70) proteins. Hydra XPA shows a high degree of similarity with vertebrate homologs and clusters with deuterostomes in phylogenetic analysis. Homology modelling corroborates the very close similarity between hydra and human XPA. The protein thus most likely functions in hydra in the same manner as in other animals, indicating that it arose early in evolution and has been conserved across animal phyla. PMID:24083246

  5. Structural and sequence similarities of hydra xeroderma pigmentosum A protein to human homolog suggest early evolution and conservation.

    PubMed

    Barve, Apurva; Ghaskadbi, Saroj; Ghaskadbi, Surendra

    2013-01-01

    Xeroderma pigmentosum group A (XPA) is a protein that binds to damaged DNA, verifies presence of a lesion, and recruits other proteins of the nucleotide excision repair (NER) pathway to the site. Though its homologs from yeast, Drosophila, humans, and so forth are well studied, XPA has not so far been reported from protozoa and lower animal phyla. Hydra is a fresh-water cnidarian with a remarkable capacity for regeneration and apparent lack of organismal ageing. Cnidarians are among the first metazoa with a defined body axis, tissue grade organisation, and nervous system. We report here for the first time presence of XPA gene in hydra. Putative protein sequence of hydra XPA contains nuclear localization signal and bears the zinc-finger motif. It contains two conserved Pfam domains and various characterized features of XPA proteins like regions for binding to excision repair cross-complementing protein-1 (ERCC1) and replication protein A 70 kDa subunit (RPA70) proteins. Hydra XPA shows a high degree of similarity with vertebrate homologs and clusters with deuterostomes in phylogenetic analysis. Homology modelling corroborates the very close similarity between hydra and human XPA. The protein thus most likely functions in hydra in the same manner as in other animals, indicating that it arose early in evolution and has been conserved across animal phyla.

  6. Structural Relationships between Highly Conserved Elements and Genes in Vertebrate Genomes

    PubMed Central

    Sun, Hong; Skogerbø, Geir; Wang, Zhen; Liu, Wei; Li, Yixue

    2008-01-01

    Large numbers of sequence elements have been identified to be highly conserved among vertebrate genomes. These highly conserved elements (HCEs) are often located in or around genes that are involved in transcription regulation and early development. They have been shown to be involved in cis-regulatory activities through both in vivo and additional computational studies. We have investigated the structural relationships between such elements and genes in six vertebrate genomes (human, mouse, rat, chicken, zebrafish and tetraodon) and detected several thousand cases of conserved HCE-gene associations, and also cases of HCEs with no common target genes. A few examples underscore the potential significance of our findings about several individual genes. We found that the conserved associations between HCEs and genes are not restricted by the absolute distance between them on the genome. Notably, long-range associations were identified and the molecular functions of the associated genes do not show any particular overrepresentation of the functional categories previously reported. HCEs in close proximity are found to be linked with different sets of genes. The results reflect the highly complex correlation between HCEs and their putative target genes. PMID:19008958

  7. Characterization of a highly repeated DNA sequence family in five species of the genus Eulemur.

    PubMed

    Ventura, M; Boniotto, M; Cardone, M F; Fulizio, L; Archidiacono, N; Rocchi, M; Crovella, S

    2001-09-19

    The karyotypes of Eulemur species exhibit a high degree of variation, as a consequence of the Robertsonian fusion and/or centromere fission. Centromeric and pericentromeric heterochromatin of eulemurs is constituted by highly repeated DNA sequences (including some telomeric TTAGGG repeats) which have so far been investigated and used for the study of the systematic relationships of the different species of the genus Eulemur. In our study, we have cloned a set of repetitive pericentromeric sequences of five Eulemur species: E. fulvus fulvus (EFU), E. mongoz (EMO), E. macaco (EMA), E. rubriventer (ERU), and E. coronatus (ECO). We have characterized these clones by sequence comparison and by comparative fluorescence in situ hybridization analysis in EMA and EFU. Our results showed a high degree of sequence similarity among Eulemur species, indicating a strong conservation, within the five species, of these pericentromeric highly repeated DNA sequences.

  8. Identifying Conserved and Novel MicroRNAs in Developing Seeds of Brassica napus Using Deep Sequencing

    PubMed Central

    Körbes, Ana Paula; Machado, Ronei Dorneles; Guzman, Frank; Almerão, Mauricio Pereira; de Oliveira, Luiz Felipe Valter; Loss-Morais, Guilherme; Turchetto-Zolet, Andreia Carina; Cagliari, Alexandro; dos Santos Maraschin, Felipe; Margis-Pinheiro, Marcia; Margis, Rogerio

    2012-01-01

    MicroRNAs (miRNAs) are important post-transcriptional regulators of plant development and seed formation. In Brassica napus, an important edible oil crop, valuable lipids are synthesized and stored in specific seed tissues during embryogenesis. The miRNA transcriptome of B. napus is currently poorly characterized, especially at different seed developmental stages. This work aims to describe the miRNAome of developing seeds of B. napus by identifying plant-conserved and novel miRNAs and comparing miRNA abundance in mature versus developing seeds. Members of 59 miRNA families were detected through a computational analysis of a large number of reads obtained from deep sequencing two small RNA and two RNA-seq libraries of (i) pooled immature developing stages and (ii) mature B. napus seeds. Among these miRNA families, 17 families are currently known to exist in B. napus; additionally 29 families not reported in B. napus but conserved in other plant species were identified by alignment with known plant mature miRNAs. Assembled mRNA-seq contigs allowed for a search of putative new precursors and led to the identification of 13 novel miRNA families. Analysis of miRNA population between libraries reveals that several miRNAs and isomiRNAs have different abundance in developing stages compared to mature seeds. The predicted miRNA target genes encode a broad range of proteins related to seed development and energy storage. This work presents a comparative study of the miRNA transcriptome of mature and developing B. napus seeds and provides a basis for future research on individual miRNAs and their functions in embryogenesis, seed maturation and lipid accumulation in B. napus. PMID:23226347

  9. A conserved intronic U1 snRNP-binding sequence promotes trans-splicing in Drosophila.

    PubMed

    Gao, Jun-Li; Fan, Yu-Jie; Wang, Xiu-Ye; Zhang, Yu; Pu, Jia; Li, Liang; Shao, Wei; Zhan, Shuai; Hao, Jianjiang; Xu, Yong-Zhen

    2015-04-01

    Unlike typical cis-splicing, trans-splicing joins exons from two separate transcripts to produce chimeric mRNA and has been detected in most eukaryotes. Trans-splicing in trypanosomes and nematodes has been characterized as a spliced leader RNA-facilitated reaction; in contrast, its mechanism in higher eukaryotes remains unclear. Here we investigate mod(mdg4), a classic trans-spliced gene in Drosophila, and report that two critical RNA sequences in the middle of the last 5' intron, TSA and TSB, promote trans-splicing of mod(mdg4). In TSA, a 13-nucleotide (nt) core motif is conserved across Drosophila species and is essential and sufficient for trans-splicing, which binds U1 small nuclear RNP (snRNP) through strong base-pairing with U1 snRNA. In TSB, a conserved secondary structure acts as an enhancer. Deletions of TSA and TSB using the CRISPR/Cas9 system result in developmental defects in flies. Although it is not clear how the 5' intron finds the 3' introns, compensatory changes in U1 snRNA rescue trans-splicing of TSA mutants, demonstrating that U1 recruitment is critical to promote trans-splicing in vivo. Furthermore, TSA core-like motifs are found in many other trans-spliced Drosophila genes, including lola. These findings represent a novel mechanism of trans-splicing, in which RNA motifs in the 5' intron are sufficient to bring separate transcripts into close proximity to promote trans-splicing.

  10. A conserved intronic U1 snRNP-binding sequence promotes trans-splicing in Drosophila

    PubMed Central

    Gao, Jun-Li; Fan, Yu-Jie; Wang, Xiu-Ye; Zhang, Yu; Pu, Jia; Li, Liang; Shao, Wei; Zhan, Shuai; Hao, Jianjiang

    2015-01-01

    Unlike typical cis-splicing, trans-splicing joins exons from two separate transcripts to produce chimeric mRNA and has been detected in most eukaryotes. Trans-splicing in trypanosomes and nematodes has been characterized as a spliced leader RNA-facilitated reaction; in contrast, its mechanism in higher eukaryotes remains unclear. Here we investigate mod(mdg4), a classic trans-spliced gene in Drosophila, and report that two critical RNA sequences in the middle of the last 5′ intron, TSA and TSB, promote trans-splicing of mod(mdg4). In TSA, a 13-nucleotide (nt) core motif is conserved across Drosophila species and is essential and sufficient for trans-splicing, which binds U1 small nuclear RNP (snRNP) through strong base-pairing with U1 snRNA. In TSB, a conserved secondary structure acts as an enhancer. Deletions of TSA and TSB using the CRISPR/Cas9 system result in developmental defects in flies. Although it is not clear how the 5′ intron finds the 3′ introns, compensatory changes in U1 snRNA rescue trans-splicing of TSA mutants, demonstrating that U1 recruitment is critical to promote trans-splicing in vivo. Furthermore, TSA core-like motifs are found in many other trans-spliced Drosophila genes, including lola. These findings represent a novel mechanism of trans-splicing, in which RNA motifs in the 5′ intron are sufficient to bring separate transcripts into close proximity to promote trans-splicing. PMID:25838544

  11. High sequence turnover in the regulatory regions of the developmental gene hunchback in insects.

    PubMed

    Hancock, J M; Shaw, P J; Bonneton, F; Dover, G A

    1999-02-01

    Extensive sequence analysis of the developmental gene hunchback and its 5' and 3' regulatory regions in Drosophila melanogaster, Drosophila virilis, Musca domestica, and Tribolium castaneum, using a variety of computer algorithms, reveals regions of high sequence simplicity probably generated by slippage-like mechanisms of turnover. No regions are entirely refractory to the action of slippage, although the density and composition of simple sequence motifs vary from region to region. Interestingly, the 5' and 3' flanking regions share short repetitive motifs despite their separation by the gene itself, and the motifs are different in composition from those in the exons and introns. Furthermore, there are high levels of conservation of motifs in equivalent orthologous regions. Detailed sequence analysis of the P2 promoter and DNA footprinting assays reveal that the number, orientation, sequence, spacing, and protein-binding affinities of the BICOID-binding sites vary between species and that the 'P2' promoter, the nanos response element in the 3' untranslated region, and several conserved boxes of sequence in the gene (e.g., the two zinc-finger regions) are surrounded by cryptically-simple-sequence DNA. We argue that high sequence turnover and genetic redundancy permit both the general maintenance of promoter functions through the establishment of coevolutionary (compensatory) changes in cis- and trans-acting genetic elements and, at the same time, the possibility of subtle changes in the regulation of hunchback in the different species.

  12. High Throughput Sequencing of Extracellular RNA from Human Plasma

    PubMed Central

    Danielson, Kirsty M.; Rubio, Renee; Abderazzaq, Fieda; Das, Saumya; Wang, Yaoyu E.

    2017-01-01

    The presence and relative stability of extracellular RNAs (exRNAs) in biofluids has led to an emerging recognition of their promise as ‘liquid biopsies’ for diseases. Most prior studies on discovery of exRNAs as disease-specific biomarkers have focused on microRNAs (miRNAs) using technologies such as qRT-PCR and microarrays. The recent application of next-generation sequencing to discovery of exRNA biomarkers has revealed the presence of potential novel miRNAs as well as other RNA species such as tRNAs, snoRNAs, piRNAs and lncRNAs in biofluids. At the same time, the use of RNA sequencing for biofluids poses unique challenges, including low amounts of input RNAs, the presence of exRNAs in different compartments with varying degrees of vulnerability to isolation techniques, and the high abundance of specific RNA species (thereby limiting the sensitivity of detection of less abundant species). Moreover, discovery in human diseases often relies on archival biospecimens of varying age and limiting amounts of samples. In this study, we have tested RNA isolation methods to optimize profiling exRNAs by RNA sequencing in individuals without any known diseases. Our findings are consistent with other recent studies that detect microRNAs and ribosomal RNAs as the major exRNA species in plasma. Similar to other recent studies, we found that the landscape of biofluid microRNA transcriptome is dominated by several abundant microRNAs that appear to comprise conserved extracellular miRNAs. There is reasonable correlation of sets of conserved miRNAs across biological replicates, and even across other data sets obtained at different investigative sites. Conversely, the detection of less abundant miRNAs is far more dependent on the exact methodology of RNA isolation and profiling. This study highlights the challenges in detecting and quantifying less abundant plasma miRNAs in health and disease using RNA sequencing platforms. PMID:28060806

  13. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses.

    PubMed

    Turco, Gina; Schnable, James C; Pedersen, Brent; Freeling, Michael

    2013-01-01

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize.

  14. The first complete plastid genomes of Melastomataceae are highly structurally conserved

    PubMed Central

    Neubig, Kurt M.; Majure, Lucas C.

    2016-01-01

    Background: In the past three decades, several studies have predominantly relied on a small sample of the plastome to infer deep phylogenetic relationships in the species-rich Melastomataceae. Here, we report the first full plastid sequences of this family, compare general features of the sampled plastomes to other sequenced Myrtales, and survey the plastomes for highly informative regions for phylogenetics. Methods: Genome skimming was performed for 16 species spread across the Melastomataceae. Plastomes were assembled, annotated and compared to eight sequenced plastids in the Myrtales. Phylogenetic inference was performed using Maximum Likelihood on six different data sets, where putative biases were taken into account. Summary statistics were generated for all introns and intergenic spacers with suitable size for polymerase chain reaction (PCR) amplification and used to rank the markers by phylogenetic information. Results: The majority of the plastomes sampled are conserved in gene content and order, as well as in sequence length and GC content within plastid regions and sequence classes. Departures include the putative presence of rps16 and rpl2 pseudogenes in some plastomes. Phylogenetic analyses of the majority of the schemes analyzed resulted in the same topology with high values of bootstrap support. Although there is still uncertainty in some relationships, in the highest supported topologies only two nodes received bootstrap values lower than 95%. Discussion: Melastomataceae plastomes are no exception for the general patterns observed in the genomic structure of land plant chloroplasts, being highly conserved and structurally similar to most other Myrtales. Despite the fact that the full plastome phylogeny shares most of the clades with the previously widely used and reduced data set, some changes are still observed and bootstrap support is higher. The plastome data set presented here is a step towards phylogenomic analyses in the Melastomataceae and will be

  15. The Putative Leishmania Telomerase RNA (LeishTER) Undergoes Trans-Splicing and Contains a Conserved Template Sequence

    PubMed Central

    da Silva, Marcelo S.; Segatto, Marcela; Myler, Peter J.; Cano, Maria Isabel N.

    2014-01-01

    Telomerase RNAs (TERs) are highly divergent between species, varying in size and sequence composition. Here, we identify a candidate for the telomerase RNA component of the genus Leishmania, which includes species that cause leishmaniasis, a neglected tropical disease. Combining thorough computational screening with RNA-seq evidence, we mapped a non-coding RNA gene localized in a syntenic locus on chromosome 25 of five Leishmania species that shares partial synteny with both the Trypanosoma brucei TER locus and a putative TER candidate-containing locus of Crithidia fasciculata. Using target-driven molecular biology approaches, we detected a ∼2,100 nt transcript (LeishTER) that contains a 5′ spliced leader (SL) cap, a putative 3′ polyA tail and a predicted C/D box snoRNA domain. LeishTER is expressed at similar levels in the logarithmic and stationary growth phases of promastigote forms. A 5′SL capped LeishTER co-immunoprecipitated and co-localized with the telomerase protein component (TERT) in a cell cycle-dependent manner. Prediction of its secondary structure strongly suggests the existence of a bona fide single-stranded template sequence and a conserved C[U/C]GUCA motif-containing helix II, representing the template boundary element. This study paves the way for further investigations on the biogenesis of parasite TERT ribonucleoproteins (RNPs) and their role in parasite telomere biology. PMID:25391020

  16. The putative Leishmania telomerase RNA (LeishTER) undergoes trans-splicing and contains a conserved template sequence.

    PubMed

    Vasconcelos, Elton J R; Nunes, Vinícius S; da Silva, Marcelo S; Segatto, Marcela; Myler, Peter J; Cano, Maria Isabel N

    2014-01-01

    Telomerase RNAs (TERs) are highly divergent between species, varying in size and sequence composition. Here, we identify a candidate for the telomerase RNA component of the genus Leishmania, which includes species that cause leishmaniasis, a neglected tropical disease. Combining thorough computational screening with RNA-seq evidence, we mapped a non-coding RNA gene localized in a syntenic locus on chromosome 25 of five Leishmania species that shares partial synteny with both the Trypanosoma brucei TER locus and a putative TER candidate-containing locus of Crithidia fasciculata. Using target-driven molecular biology approaches, we detected a ∼2,100 nt transcript (LeishTER) that contains a 5' spliced leader (SL) cap, a putative 3' polyA tail and a predicted C/D box snoRNA domain. LeishTER is expressed at similar levels in the logarithmic and stationary growth phases of promastigote forms. A 5'SL capped LeishTER co-immunoprecipitated and co-localized with the telomerase protein component (TERT) in a cell cycle-dependent manner. Prediction of its secondary structure strongly suggests the existence of a bona fide single-stranded template sequence and a conserved C[U/C]GUCA motif-containing helix II, representing the template boundary element. This study paves the way for further investigations on the biogenesis of parasite TERT ribonucleoproteins (RNPs) and their role in parasite telomere biology.

  17. Conserved antigenic sites between MERS-CoV and Bat-coronavirus are revealed through sequence analysis.

    PubMed

    Sharmin, Refat; Islam, Abul B M M K

    2016-01-01

    MERS-CoV is a newly emerged human coronavirus reported to be closely related to the HKU4 and HKU5 bat coronaviruses. Bat and MERS coronaviruses are structurally related. It is therefore of interest to estimate the degree to which antigenic sites are conserved among them, and to elucidate the shared antigenic sites in order to understand the evolutionary dynamics of MERS-CoV. Multiple sequence alignment of the spike (S), membrane (M), envelope (E) and nucleocapsid (N) proteins was employed to identify sequence conservation among MERS and bat (HKU4, HKU5) coronaviruses. We used various in silico tools to predict the conserved antigenic sites. We found that MERS-CoV shared 30% of its S protein antigenic sites with HKU4 and 70% with HKU5 bat-CoV, whereas 100% of its E, M and N protein antigenic sites were conserved with those in HKU4 and HKU5. This sharing suggests that, with respect to pathogenicity, MERS-CoV is more closely related to HKU5 bat-CoV than to HKU4 bat-CoV. The conserved epitopes indicate their evolutionary relationship and the ancestry of pathogenicity.

  18. THE GRK4 SUBFAMILY OF G PROTEIN-COUPLED RECEPTOR KINASES: ALTERNATIVE SPLICING, GENE ORGANIZATION, AND SEQUENCE CONSERVATION

    EPA Science Inventory

    The GRK4 subfamily of G protein-coupled receptor kinases. Alternative splicing, gene organization, and sequence conservation.

    Premont RT, Macrae AD, Aparicio SA, Kendall HE, Welch JE, Lefkowitz RJ.

    Department of Medicine, Howard Hughes Medical Institute, Duke Univer...

  20. Conserved noncoding genomic sequences associated with a flowering-time quantitative trait locus in maize

    PubMed Central

    Salvi, Silvio; Sponza, Giorgio; Morgante, Michele; Tomes, Dwight; Niu, Xiaomu; Fengler, Kevin A.; Meeley, Robert; Ananiev, Evgueni V.; Svitashev, Sergei; Bruggemann, Edward; Li, Bailin; Hainey, Christine F.; Radovic, Slobodanka; Zaina, Giusi; Rafalski, J.-Antoni; Tingey, Scott V.; Miao, Guo-Hua; Phillips, Ronald L.; Tuberosa, Roberto

    2007-01-01

    Flowering time is a fundamental trait of maize adaptation to different agricultural environments. Although a large body of information is available on the map position of quantitative trait loci for flowering time, little is known about the molecular basis of quantitative trait loci. Through positional cloning and association mapping, we resolved the major flowering-time quantitative trait locus, Vegetative to generative transition 1 (Vgt1), to an ≈2-kb noncoding region positioned 70 kb upstream of an Ap2-like transcription factor that we have shown to be involved in flowering-time control. Vgt1 functions as a cis-acting regulatory element as indicated by the correlation of the Vgt1 alleles with the transcript expression levels of the downstream gene. Additionally, within Vgt1, we identified evolutionarily conserved noncoding sequences across the maize–sorghum–rice lineages. Our results support the notion that changes in distant cis-acting regulatory regions are a key component of plant genetic adaptation throughout breeding and evolution. PMID:17595297

  1. Prediction of protein-protein interactions by combining structure and sequence conservation in protein interfaces.

    PubMed

    Aytuna, A Selim; Gursoy, Attila; Keskin, Ozlem

    2005-06-15

    Elucidation of the full network of protein-protein interactions is crucial for understanding of the principles of biological systems and processes. Thus, there is a need for in silico methods for predicting interactions. We present a novel algorithm for automated prediction of protein-protein interactions that employs a unique bottom-up approach combining structure and sequence conservation in protein interfaces. Running the algorithm on a template dataset of 67 interfaces and a sequentially non-redundant dataset of 6170 protein structures, 62 616 potential interactions are predicted. These interactions are compared with the ones in two publicly available interaction databases (Database of Interacting Proteins and Biomolecular Interaction Network Database) and also the Protein Data Bank. A significant number of predictions are verified in these databases. The unverified ones may correspond to (1) interactions that are not covered in these databases but known in literature, (2) unknown interactions that actually occur in nature and (3) interactions that do not occur naturally but may possibly be realized synthetically in laboratory conditions. Some unverified interactions, supported significantly with studies found in the literature, are discussed. Availability: http://gordion.hpc.eng.ku.edu.tr/prism. Contact: agursoy@ku.edu.tr; okeskin@ku.edu.tr.

  2. Optimal assembly for high throughput shotgun sequencing

    PubMed Central

    2013-01-01

    We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. Building on earlier works, we design a de Bruijn graph based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes, including the GAGE datasets. The results are based on a set of necessary and sufficient conditions on the DNA sequence and the reads for reconstruction. The conditions can be viewed as the shotgun sequencing analogue of Ukkonen-Pevzner's necessary and sufficient conditions for Sequencing by Hybridization. PMID:23902516
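
    As a rough illustration of the data structure named in this abstract, the sketch below builds a de Bruijn graph from a handful of reads: each k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name, the toy reads and the choice k=3 are assumptions made for the example; this shows graph construction only and is not the assembly algorithm or the bounds described in the paper.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Return {prefix (k-1)-mer: set of suffix (k-1)-mers} over all k-mers in the reads."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical overlapping reads drawn from the sequence ACGTCAG.
reads = ["ACGTC", "CGTCA", "GTCAG"]
graph = de_bruijn_graph(reads, k=3)
```

    In this toy case every node has a single successor, so walking the edges from AC spells out the underlying sequence; repeats longer than k-1 bases would create branching nodes, which is where repeat statistics start to matter.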

  3. The sexually dimorphic on the Y-chromosome gene (sdY) is a conserved male-specific Y-chromosome sequence in many salmonids

    PubMed Central

    Yano, Ayaka; Nicol, Barbara; Jouanno, Elodie; Quillet, Edwige; Fostier, Alexis; Guyomard, René; Guiguen, Yann

    2013-01-01

    All salmonid species investigated to date have been characterized with a male heterogametic sex-determination system. However, as these species do not share any Y-chromosome conserved synteny, there remains a debate on whether they share a common master sex-determining gene. In this study, we investigated the extent of conservation and evolution of the rainbow trout (Oncorhynchus mykiss) master sex-determining gene, sdY (sexually dimorphic on the Y-chromosome), in 15 different species of salmonids. We found that the sdY sequence is highly conserved in all salmonids and that sdY is a male-specific Y-chromosome gene in the majority of these species. These findings demonstrate that most salmonids share a conserved sex-determining locus and also strongly suggest that sdY may be this conserved master sex-determining gene. However, in two whitefish species (subfamily Coregoninae), sdY was found both in males and females, suggesting that alternative sex-determination systems may have also evolved in this family. Based on the wide conservation of sdY as a male-specific Y-chromosome gene, efficient and easy molecular sexing techniques can now be developed that will be of great interest for studying these economically and environmentally important species. PMID:23745140

  4. Nucleotide sequence of a cluster of early and late genes in a conserved segment of the vaccinia virus genome.

    PubMed Central

    Plucienniczak, A; Schroeder, E; Zettlmeissl, G; Streeck, R E

    1985-01-01

    The nucleotide sequence of a 7.6 kb vaccinia DNA segment from a genomic region conserved among different orthopoxviruses has been determined. This segment contains a tight cluster of 12 partly overlapping open reading frames, most of which can be correlated with previously identified early and late proteins and mRNAs. Regulatory signals used by vaccinia virus have been studied. Presumptive promoter regions are rich in A and T and carry the consensus sequences TATA and AATAA spaced at 20-24 base pairs. Tandem repeats of a CTATTC consensus sequence are proposed to be involved in the termination of early transcription. PMID:2987815

  5. Sequence evaluation of FGF and FGFR gene conserved non-coding elements in non-syndromic cleft lip and palate cases.

    PubMed

    Riley, Bridget M; Murray, Jeffrey C

    2007-12-15

    Non-syndromic cleft lip and palate (NS CLP) is a complex birth defect resulting from multiple genetic and environmental factors. We have previously reported the sequencing of the coding region of genes in the fibroblast growth factor (FGF) signaling pathway, in which missense and non-sense mutations contribute to approximately 5%-6% of NS CLP cases. In this article we report the sequencing of conserved non-coding elements (CNEs) in and around 11 of the FGF and FGFR genes, which identified 55 novel variants. Seven of the variants are highly conserved among ≥8 species and 31 variants alter transcription factor binding sites, 8 of which are important for craniofacial development. Additionally, 15 NS CLP patients had a combination of coding mutations and CNE variants, suggesting that an accumulation of variants in the FGF signaling pathway may contribute to clefting. (c) 2007 Wiley-Liss, Inc.

  6. Cdc14: a highly conserved family of phosphatases with non-conserved functions?

    PubMed

    Mocciaro, Annamaria; Schiebel, Elmar

    2010-09-01

    CDC14 was originally identified by L. Hartwell in his famous screen for genes that regulate the budding yeast cell cycle. Subsequent work showed that Cdc14 belongs to a family of highly conserved dual-specificity phosphatases that are present in a wide range of organisms from yeast to human. Human CDC14B is even able to fulfill the essential functions of budding yeast Cdc14. In budding yeast, Cdc14 counteracts the activity of cyclin dependent kinase (Cdk1) at the end of mitosis and thus has important roles in the regulation of anaphase, mitotic exit and cytokinesis. On the basis of the functional conservation of other cell-cycle genes it seemed obvious to assume that Cdc14 phosphatases also have roles in late mitosis in mammalian cells and regulate similar targets to those found in yeast. However, analysis of the human Cdc14 proteins (CDC14A, CDC14B and CDC14C) by overexpression or by depletion using small interfering RNA (siRNA) has suggested functions that are quite different from those of ScCdc14. Recent studies in avian and human somatic cell lines in which the gene encoding either Cdc14A or Cdc14B had been deleted, have shown - surprisingly - that neither of the two phosphatases on its own is essential for viability, cell-cycle progression and checkpoint control. In this Commentary, we critically review the available data on the functions of yeast and vertebrate Cdc14 phosphatases, and discuss whether they indeed share common functions as generally assumed.

  7. CDvist: A webserver for identification and visualization of conserved domains in protein sequences

    SciTech Connect

    Adebali, Ogun; Ortega, Davi R.; Zhulin, Igor B.

    2014-12-18

    Identification of domains in protein sequences allows their assigning to biological functions. Several webservers exist for identification of protein domains using similarity searches against various databases of protein domain models. However, none of them provides comprehensive domain coverage while allowing bulk querying and their visualization schemes can be improved. To address these issues, we developed CDvist (a comprehensive domain visualization tool), which combines the best available search algorithms and databases into a user-friendly framework. First, a given protein sequence is matched to domain models using high-specificity tools and only then unmatched segments are subjected to more sensitive algorithms resulting in a best possible comprehensive coverage. In conclusion, bulk querying and rich visualization and download options provide improved functionality to domain architecture analysis.

  8. CDvist: A webserver for identification and visualization of conserved domains in protein sequences

    DOE PAGES

    Adebali, Ogun; Ortega, Davi R.; Zhulin, Igor B.

    2014-12-18

    Identification of domains in protein sequences allows their assigning to biological functions. Several webservers exist for identification of protein domains using similarity searches against various databases of protein domain models. However, none of them provides comprehensive domain coverage while allowing bulk querying and their visualization schemes can be improved. To address these issues, we developed CDvist (a comprehensive domain visualization tool), which combines the best available search algorithms and databases into a user-friendly framework. First, a given protein sequence is matched to domain models using high-specificity tools and only then unmatched segments are subjected to more sensitive algorithms resulting in a best possible comprehensive coverage. In conclusion, bulk querying and rich visualization and download options provide improved functionality to domain architecture analysis.

  9. QColors: an algorithm for conservative viral quasispecies reconstruction from short and non-contiguous next generation sequencing reads.

    PubMed

    Huang, Austin; Kantor, Rami; DeLong, Allison; Schreier, Leeann; Istrail, Sorin

    Next generation sequencing technologies have recently been applied to characterize mutational spectra of the heterogeneous population of viral genotypes (known as a quasispecies) within HIV-infected patients. Such information is clinically relevant because minority genetic subpopulations of HIV within patients enable viral escape from selection pressures such as the immune response and antiretroviral therapy. However, methods for quasispecies sequence reconstruction from next generation sequencing reads are not yet widely used and remain an emerging area of research. Furthermore, the majority of research methodology in HIV has focused on 454 sequencing, while many next-generation sequencing platforms used in practice are limited to shorter read lengths relative to 454 sequencing. Little work has been done in determining how best to address the read length limitations of other platforms. The approach described here incorporates graph representations of both read differences and read overlap to conservatively determine the regions of the sequence with sufficient variability to separate quasispecies sequences. Within these tractable regions of quasispecies inference, we use constraint programming to solve for an optimal quasispecies subsequence determination via vertex coloring of the conflict graph, a representation which also lends itself to data with non-contiguous reads such as paired-end sequencing. We demonstrate the utility of the method by applying it to simulations based on actual intra-patient clonal HIV-1 sequencing data.
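
    To make the idea of coloring a conflict graph concrete, the sketch below treats each read as a vertex, joins two reads when they overlap and disagree at a shared position, and assigns colors so that conflicting reads never share one. It uses a simple greedy heuristic rather than the constraint programming described in the abstract, so it is not guaranteed to find an optimal coloring; the read representation as (start, sequence) pairs, the function names and the toy data are all assumptions for illustration.

```python
def conflicts(a, b):
    """True if reads a and b overlap and differ at some shared position."""
    (start_a, seq_a), (start_b, seq_b) = a, b
    lo = max(start_a, start_b)
    hi = min(start_a + len(seq_a), start_b + len(seq_b))
    # An empty range (no overlap) means no conflict.
    return any(seq_a[p - start_a] != seq_b[p - start_b] for p in range(lo, hi))


def greedy_coloring(reads):
    """Give each read the smallest color not used by an earlier conflicting read."""
    colors = []
    for i, read in enumerate(reads):
        used = {colors[j] for j in range(i) if conflicts(read, reads[j])}
        color = 0
        while color in used:
            color += 1
        colors.append(color)
    return colors


# Hypothetical reads as (start position, bases); two variants differ at position 3.
reads = [(0, "ACGT"), (2, "GTAA"), (2, "GAAA"), (0, "ACGA")]
colors = greedy_coloring(reads)
```

    Reads that end up with the same color are mutually consistent and can be merged into one candidate variant sequence; here the four reads fall into two groups.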

  10. Advances in high throughput DNA sequence data compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz

    2016-06-01

    Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
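
    As a small example of what a reference-free scheme can look like, the sketch below packs each base of an A/C/G/T sequence into two bits, four bases per byte. This is a generic baseline chosen for illustration, not a method taken from the review; the function names are hypothetical, and real tools must also handle ambiguous bases such as N, quality scores and read identifiers.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte (2 bits each)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpacking reads it from the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data, length):
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


packed = pack("ACGTAC")
restored = unpack(packed, 6)
```

    The sequence length has to be stored alongside the packed bytes, because the padding bits in the last byte are indistinguishable from trailing A bases.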

  11. Effects of a Non-Conservative Sequence on the Properties of β-glucuronidase from Aspergillus terreus Li-20

    PubMed Central

    Liu, Yanli; Huangfu, Jie; Qi, Feng; Kaleem, Imdad; E, Wenwen; Li, Chun

    2012-01-01

    We cloned the β-glucuronidase gene (AtGUS) from Aspergillus terreus Li-20 encoding 657 amino acids (aa), which can transform glycyrrhizin into glycyrrhetinic acid monoglucuronide (GAMG) and glycyrrhetinic acid (GA). Based on sequence alignment, the C-terminal non-conservative sequence showed low identity with those of other species; thus, the partial sequence AtGUS(-3t) (1–592 aa) was amplified to determine the effects of the non-conservative sequence on the enzymatic properties. AtGUS and AtGUS(-3t) were expressed in E. coli BL21, producing AtGUS-E and AtGUS(-3t)-E, respectively. At similar optimum temperature (55°C) and pH (AtGUS-E, 6.6; AtGUS(-3t)-E, 7.0) conditions, the thermal stability of AtGUS(-3t)-E was enhanced at 65°C, and the metal ions Co2+, Ca2+ and Ni2+ showed opposite effects on AtGUS-E and AtGUS(-3t)-E. Furthermore, the Km of AtGUS(-3t)-E (1.95 mM) was nearly one-seventh that of AtGUS-E (12.9 mM), whereas the catalytic efficiency of AtGUS(-3t)-E was 3.2 fold higher than that of AtGUS-E (7.16 vs. 2.24 mM s−1), revealing that the truncation of the non-conservative sequence can significantly improve the catalytic efficiency of AtGUS. Conformational analysis by circular dichroism (CD) illustrated a significant difference in the secondary structure between AtGUS-E and AtGUS(-3t)-E. The results showed that the truncation of the non-conservative sequence could alter the stability and catalytic efficiency of the enzyme.
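
    The two ratios quoted in this abstract can be checked directly from the reported values. The short calculation below does only that arithmetic; the variable names are illustrative, and the numbers are the Km and catalytic-efficiency figures given in the abstract.

```python
# Values as reported in the abstract (Km in mM; catalytic efficiency as quoted).
km_full, km_truncated = 12.9, 1.95
eff_full, eff_truncated = 2.24, 7.16

# How many times larger the full-length Km is than the truncated one.
km_ratio = km_full / km_truncated

# Fold change in catalytic efficiency after truncation.
eff_fold = eff_truncated / eff_full

print(f"Km ratio: {km_ratio:.2f}")
print(f"Efficiency fold change: {eff_fold:.2f}")
```

    The Km ratio comes out at about 6.6, consistent with "nearly one-seventh", and the efficiency ratio rounds to the stated 3.2 fold.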

  12. Sequence analysis of the L protein of the Ebola 2014 outbreak: Insight into conserved regions and mutations.

    PubMed

    Ayub, Gohar; Waheed, Yasir

    2016-06-01

    The 2014 Ebola outbreak was one of the largest that have occurred; it started in Guinea and spread to Nigeria, Liberia and Sierra Leone. Phylogenetic analysis of the current virus species indicated that this outbreak is the result of a divergent lineage of the Zaire ebolavirus. The L protein of Ebola virus (EBOV) is the catalytic subunit of the RNA-dependent RNA polymerase complex, which, with VP35, is key for the replication and transcription of viral RNA. Earlier sequence analysis demonstrated that the L protein of all non-segmented negative-sense (NNS) RNA viruses consists of six domains containing conserved functional motifs. The aim of the present study was to analyze the presence of these motifs in 2014 EBOV isolates, and to highlight their function and how they may contribute to the overall pathogenicity of the isolates. For this purpose, 81 2014 EBOV L protein sequences were aligned with 475 other NNS RNA viruses, including Paramyxoviridae and Rhabdoviridae viruses. Phylogenetic analysis of all EBOV outbreak L protein sequences was also performed. Analysis of the amino acid substitutions in the 2014 EBOV outbreak was conducted using sequence analysis. The alignment demonstrated the presence of previously conserved motifs in the 2014 EBOV isolates and novel residues. Notably, all the mutations identified in the 2014 EBOV isolates were tolerant; they were pathogenic, with certain examples occurring within previously determined functional conserved motifs, possibly altering viral pathogenicity, replication and virulence. The phylogenetic analysis demonstrated that all sequences with the exception of the 2014 EBOV sequences were clustered together. The 2014 EBOV outbreak has acquired a great number of mutations, which may explain the reasons behind this unprecedented outbreak. Certain residues critical to the function of the polymerase remain conserved and may be targets for the development of antiviral therapeutic agents.

  13. Taking High Conservation Value from Forests to Freshwaters

    NASA Astrophysics Data System (ADS)

    Abell, Robin; Morgan, Siân K.; Morgan, Alexis J.

    2015-07-01

    The high conservation value (HCV) concept, originally developed by the Forest Stewardship Council, has been widely incorporated outside the forestry sector into companies' supply chain assessments and responsible purchasing policies, financial institutions' investment policies, and numerous voluntary commodity standards. Many, if not most, of these newer applications relate to production practices that are likely to affect freshwater systems directly or indirectly, yet there is little guidance as to whether or how HCV can be applied to water bodies. We focus this paper on commodity standards and begin by exploring how prominent standards currently address both HCVs and freshwaters. We then highlight freshwater features of high conservation importance and examine how well those features are captured by the existing HCV framework. We propose a new set of freshwater 'elements' for each of the six values and suggest an approach for identifying HCV Areas that takes out-of-fence line impacts into account, thereby spatially extending the scope of existing methods to define HCVs. We argue that virtually any non-marine HCV assessment, regardless of the production sector, should be expanded to include freshwater values, and we suggest how to put those recommendations into practice.

  14. Antibody Recognition of a Highly Conserved Influenza Virus Epitope

    SciTech Connect

    Ekiert, Damian C.; Bhabha, Gira; Elsliger, Marc-André; Friesen, Robert H.E.; Jongeneelen, Mandy; Throsby, Mark; Goudsmit, Jaap; Wilson, Ian A.; Scripps; Crucell

    2009-05-21

    Influenza virus presents an important and persistent threat to public health worldwide, and current vaccines provide immunity to viral isolates similar to the vaccine strain. High-affinity antibodies against a conserved epitope could provide immunity to the diverse influenza subtypes and protection against future pandemic viruses. Cocrystal structures were determined at 2.2 and 2.7 angstrom resolutions for broadly neutralizing human antibody CR6261 Fab in complexes with the major surface antigen (hemagglutinin, HA) from viruses responsible for the 1918 H1N1 influenza pandemic and a recent lethal case of H5N1 avian influenza. In contrast to other structurally characterized influenza antibodies, CR6261 recognizes a highly conserved helical region in the membrane-proximal stem of HA1 and HA2. The antibody neutralizes the virus by blocking conformational rearrangements associated with membrane fusion. The CR6261 epitope identified here should accelerate the design and implementation of improved vaccines that can elicit CR6261-like antibodies, as well as antibody-based therapies for the treatment of influenza.

  15. Taking high conservation value from forests to freshwaters.

    PubMed

    Abell, Robin; Morgan, Siân K; Morgan, Alexis J

    2015-07-01

    The high conservation value (HCV) concept, originally developed by the Forest Stewardship Council, has been widely incorporated outside the forestry sector into companies' supply chain assessments and responsible purchasing policies, financial institutions' investment policies, and numerous voluntary commodity standards. Many, if not most, of these newer applications relate to production practices that are likely to affect freshwater systems directly or indirectly, yet there is little guidance as to whether or how HCV can be applied to water bodies. We focus this paper on commodity standards and begin by exploring how prominent standards currently address both HCVs and freshwaters. We then highlight freshwater features of high conservation importance and examine how well those features are captured by the existing HCV framework. We propose a new set of freshwater 'elements' for each of the six values and suggest an approach for identifying HCV Areas that takes out-of-fence line impacts into account, thereby spatially extending the scope of existing methods to define HCVs. We argue that virtually any non-marine HCV assessment, regardless of the production sector, should be expanded to include freshwater values, and we suggest how to put those recommendations into practice.

  16. Sequences of conserved region in the A subunit of DNA gyrase from nine species of the genus Mycobacterium: phylogenetic analysis and implication for intrinsic susceptibility to quinolones.

    PubMed

    Guillemin, I; Cambau, E; Jarlier, V

    1995-09-01

    The sequences of a conserved region in the A subunit of DNA gyrase corresponding to the quinolone resistance-determining region were determined for nine mycobacterial species and were compared. Although the nucleotide sequences were highly conserved, they clearly differentiated one species from another. The results of the phylogenetic analysis based on the sequences of the quinolone resistance-determining regions were compared with those provided by the 16S rRNA sequences. Deduced amino acid sequences were identical within the nine species except for amino acid 83, which was frequently involved in acquired resistance to quinolones in many genera, including mycobacteria. The presence at position 83 of an alanine for seven mycobacterial species (M. tuberculosis, M. bovis BCG, M. leprae, M. avium, M. kansasii, M. chelonae, and M. smegmatis) and of a serine for the two remaining mycobacterial species (M. fortuitum and M. aurum) correlated well with the MICs of ofloxacin for both groups of species, suggesting the role of this residue in intrinsic susceptibility to quinolones in mycobacteria.

  17. High performance computing with a conservative spectral Boltzmann solver

    NASA Astrophysics Data System (ADS)

    Haack, Jeffrey R.; Gamba, Irene M.

    2012-11-01

    We present new results building on the conservative deterministic spectral method for the space inhomogeneous Boltzmann equation developed by Gamba and Tharkabhushanam. This approach is a two-step process that acts on the weak form of the Boltzmann equation, and uses the machinery of the Fourier transform to reformulate the collisional integral into a weighted convolution in Fourier space. A constrained optimization problem is solved to preserve the mass, momentum, and energy of the resulting distribution. We extend this method to second order accuracy in space and time, and explore how to leverage the structure of the collisional formulation for high performance computing environments. The locality in space of the collisional term provides a straightforward memory decomposition, and we perform some initial scaling tests on high performance computing resources. We also use the improved computational power of this method to investigate a boundary-layer generated shock problem that cannot be described by classical hydrodynamics.

  18. Deep small RNA sequencing from the nematode Ascaris reveals conservation, functional diversification, and novel developmental profiles.

    PubMed

    Wang, Jianbin; Czech, Benjamin; Crunk, Amanda; Wallace, Adam; Mitreva, Makedonka; Hannon, Gregory J; Davis, Richard E

    2011-09-01

    Eukaryotic cells express several classes of small RNAs that regulate gene expression and ensure genome maintenance. Endogenous siRNAs (endo-siRNAs) and Piwi-interacting RNAs (piRNAs) mainly control gene and transposon expression in the germline, while microRNAs (miRNAs) generally function in post-transcriptional gene silencing in both somatic and germline cells. To provide an evolutionary and developmental perspective on small RNA pathways in nematodes, we identified and characterized known and novel small RNA classes through gametogenesis and embryo development in the parasitic nematode Ascaris suum and compared them with known small RNAs of Caenorhabditis elegans. piRNAs, Piwi-clade Argonautes, and other proteins associated with the piRNA pathway have been lost in Ascaris. miRNAs are synthesized immediately after fertilization in utero, before pronuclear fusion, and before the first cleavage of the zygote. This is the earliest expression of small RNAs ever described at a developmental stage long thought to be transcriptionally quiescent. A comparison of the two classes of Ascaris endo-siRNAs, 22G-RNAs and 26G-RNAs, to those in C. elegans, suggests great diversification and plasticity in the use of small RNA pathways during spermatogenesis in different nematodes. Our data reveal conserved characteristics of nematode small RNAs as well as features unique to Ascaris that illustrate significant flexibility in the use of small RNA pathways, some of which are likely an adaptation to Ascaris' life cycle and parasitism. The transcriptome assembly has been submitted to NCBI Transcriptome Shotgun Assembly Sequence Database (http://www.ncbi.nlm.nih.gov/genbank/TSA.html) under accession numbers JI163767–JI182837 and JI210738–JI257410.

  19. A conserved 11 nucleotide sequence contains an essential promoter element of the maize mitochondrial atp1 gene.

    PubMed Central

    Rapp, W D; Stern, D B

    1992-01-01

    To determine the structure of a functional plant mitochondrial promoter, we have partially purified an RNA polymerase activity that correctly initiates transcription at the maize mitochondrial atp1 promoter in vitro. Using a series of 5' deletion constructs, we found that essential sequences are located within -19 nucleotides (nt) of the transcription initiation site. The region surrounding the initiation site includes conserved sequence motifs previously proposed to be maize mitochondrial promoter elements. Deletion of a conserved 11 nt sequence showed that it is critical for promoter function, but deletion or alteration of conserved upstream G(A/T)3-4 repeats had no effect. When the atp1 11 nt sequence was inserted into different plasmids lacking mitochondrial promoter activity, transcription was only observed for one of these constructs. We infer from these data that the functional promoter extends beyond this motif, most likely in the 5' direction. The maize mitochondrial cox3 and atp6 promoters also direct transcription initiation in this in vitro system, suggesting that it may be widely applicable for studies of mitochondrial transcription in this species. PMID:1372246

  20. Highly conserved gene order and numerous novel repetitive elements in genomic regions linked to wing pattern variation in Heliconius butterflies

    PubMed Central

    Papa, Riccardo; Morrison, Clayton M; Walters, James R; Counterman, Brian A; Chen, Rui; Halder, Georg; Ferguson, Laura; Chamberlain, Nicola; ffrench-Constant, Richard; Kapan, Durrell D; Jiggins, Chris D; Reed, Robert D; McMillan, William O

    2008-01-01

    Background: With over 20 parapatric races differing in their warningly colored wing patterns, the butterfly Heliconius erato provides a fascinating example of an adaptive radiation. Together with matching races of its co-mimic Heliconius melpomene, H. erato also represents a textbook case of Müllerian mimicry, a phenomenon where common warning signals are shared amongst noxious organisms. It is of great interest to identify the specific genes that control the mimetic wing patterns of H. erato and H. melpomene. To this end we have undertaken comparative mapping and targeted genomic sequencing in both species. This paper reports on a comparative analysis of genomic sequences linked to color pattern mimicry genes in Heliconius. Results: Scoring AFLP polymorphisms in H. erato broods allowed us to survey loci at approximately 362 kb intervals across the genome. With this strategy we were able to identify markers tightly linked to two color pattern genes: D and Cr, which were then used to screen H. erato BAC libraries in order to identify clones for sequencing. Gene density across 600 kb of BAC sequences appeared relatively low, although the number of predicted open reading frames was typical for an insect. We focused analyses on the D- and Cr-linked H. erato BAC sequences and on the Yb-linked H. melpomene BAC sequence. A comparative analysis between homologous regions of H. erato (Cr-linked BAC) and H. melpomene (Yb-linked BAC) revealed high levels of sequence conservation and microsynteny between the two species. We found that repeated elements constitute 26% and 20% of BAC sequences from H. erato and H. melpomene respectively. The majority of these repetitive sequences appear to be novel, as they showed no significant similarity to any other available insect sequences. We also observed signs of fine scale conservation of gene order between Heliconius and the moth Bombyx mori, suggesting that lepidopteran genome architecture may be conserved over very long evolutionary

  1. Highly conserved gene order and numerous novel repetitive elements in genomic regions linked to wing pattern variation in Heliconius butterflies.

    PubMed

    Papa, Riccardo; Morrison, Clayton M; Walters, James R; Counterman, Brian A; Chen, Rui; Halder, Georg; Ferguson, Laura; Chamberlain, Nicola; Ffrench-Constant, Richard; Kapan, Durrell D; Jiggins, Chris D; Reed, Robert D; McMillan, William O

    2008-07-22

    With over 20 parapatric races differing in their warningly colored wing patterns, the butterfly Heliconius erato provides a fascinating example of an adaptive radiation. Together with matching races of its co-mimic Heliconius melpomene, H. erato also represents a textbook case of Müllerian mimicry, a phenomenon where common warning signals are shared amongst noxious organisms. It is of great interest to identify the specific genes that control the mimetic wing patterns of H. erato and H. melpomene. To this end we have undertaken comparative mapping and targeted genomic sequencing in both species. This paper reports on a comparative analysis of genomic sequences linked to color pattern mimicry genes in Heliconius. Scoring AFLP polymorphisms in H. erato broods allowed us to survey loci at approximately 362 kb intervals across the genome. With this strategy we were able to identify markers tightly linked to two color pattern genes: D and Cr, which were then used to screen H. erato BAC libraries in order to identify clones for sequencing. Gene density across 600 kb of BAC sequences appeared relatively low, although the number of predicted open reading frames was typical for an insect. We focused analyses on the D- and Cr-linked H. erato BAC sequences and on the Yb-linked H. melpomene BAC sequence. A comparative analysis between homologous regions of H. erato (Cr-linked BAC) and H. melpomene (Yb-linked BAC) revealed high levels of sequence conservation and microsynteny between the two species. We found that repeated elements constitute 26% and 20% of BAC sequences from H. erato and H. melpomene respectively. The majority of these repetitive sequences appear to be novel, as they showed no significant similarity to any other available insect sequences. We also observed signs of fine scale conservation of gene order between Heliconius and the moth Bombyx mori, suggesting that lepidopteran genome architecture may be conserved over very long evolutionary time scales. Here

  2. The Atrazine Catabolism Genes atzABC Are Widespread and Highly Conserved

    PubMed Central

    de Souza, Mervyn L.; Seffernick, Jennifer; Martinez, Betsy; Sadowsky, Michael J.; Wackett, Lawrence P.

    1998-01-01

    Pseudomonas strain ADP metabolizes the herbicide atrazine via three enzymatic steps, encoded by the genes atzABC, to yield cyanuric acid, a nitrogen source for many bacteria. Here, we show that five geographically distinct atrazine-degrading bacteria contain genes homologous to atzA, -B, and -C. The sequence identities of the atz genes from different atrazine-degrading bacteria were greater than 99% in all pairwise comparisons. This differs from bacterial genes involved in the catabolism of other chlorinated compounds, for which the average sequence identity in pairwise comparisons of the known members of a class ranged from 25 to 56%. Our results indicate that globally distributed atrazine-catabolic genes are highly conserved in diverse genera of bacteria. PMID:9537398
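    The "all pairwise comparisons" statistic in this abstract can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and counts identity only over columns where neither sequence has a gap; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
from itertools import combinations

def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)

def pairwise_identities(seqs: dict) -> dict:
    """Percent identity for every unordered pair of named sequences."""
    return {
        (n1, n2): percent_identity(seqs[n1], seqs[n2])
        for n1, n2 in combinations(sorted(seqs), 2)
    }

# Toy, made-up sequences (not real atz gene data).
toy = {
    "strain_1": "ATGCAAACGC",
    "strain_2": "ATGCAAACGC",
    "strain_3": "ATGCATACGC",
}
result = pairwise_identities(toy)
for pair, identity in result.items():
    print(pair, f"{identity:.1f}%")
```

    Other definitions of identity (for example, counting gap columns in the denominator) give different numbers, so the convention should be stated alongside any reported value.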

  3. Concentration of specific amino acids at the catalytic/active centers of highly-conserved "housekeeping" enzymes of central metabolism in archaea, bacteria and Eukaryota: is there a widely conserved chemical signal of prebiotic assembly?

    PubMed

    Pollack, J Dennis; Pan, Xueliang; Pearl, Dennis K

    2010-06-01

    In alignments of 1969 protein sequences the amino acid glycine and others were found concentrated at most-conserved sites within approximately 15 Å of catalytic/active centers (C/AC) of highly conserved kinases, dehydrogenases or lyases of Archaea, Bacteria and Eukaryota. Lysine and glutamic acid were concentrated at least-conserved sites furthest from their C/ACs. Logistic-regression analyses corroborated the "movement" of glycine towards and lysine away from their C/ACs: the odds of a glycine occupying a site were decreased by 19%, while the odds for a lysine were increased by 53%, for every 10 Å moving away from the C/AC. Average conservation of MSA consensus sites was highest surrounding the C/AC and directly decreased in transition toward model's peripheries. Findings held with statistical confidence using sequences restricted to individual Domains or enzyme classes or to both. Our data describe variability in the rate of mutation and likelihoods for phylogenetic trees based on protein sequence data and endorse the extension of substitution models by incorporating data on conservation and distance to C/ACs rather than only using cumulative levels. The data support the view that in the most-conserved environment immediately surrounding the C/AC of taxonomically distant and highly conserved essential enzymes of central metabolism there are amino acids whose identity and degree of occupancy is similar to a proposed amino acid set and frequency associated with prebiotic evolution.
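    The reported odds changes can be turned into a short worked example. The sketch below assumes the usual logistic-regression form, in which log-odds are linear in distance, so a fixed percent change per 10 Å compounds multiplicatively; the abstract gives only the per-10 Å figures, so the extrapolation to other distances and all names in the code are illustrative assumptions.

```python
import math

# Percent changes in odds per 10 angstroms, as reported in the abstract.
GLYCINE_CHANGE_PER_10A = -0.19
LYSINE_CHANGE_PER_10A = 0.53

def odds_multiplier(change_per_step: float, distance: float,
                    step: float = 10.0) -> float:
    """Factor applied to the odds after moving `distance` angstroms away,
    assuming log-odds are linear in distance (an assumption of this sketch)."""
    return (1.0 + change_per_step) ** (distance / step)

def coefficient_per_angstrom(change_per_step: float, step: float = 10.0) -> float:
    """Implied logistic-regression slope (log-odds per angstrom)."""
    return math.log(1.0 + change_per_step) / step

glycine_at_20 = odds_multiplier(GLYCINE_CHANGE_PER_10A, 20.0)  # 0.81 ** 2
lysine_at_20 = odds_multiplier(LYSINE_CHANGE_PER_10A, 20.0)    # 1.53 ** 2
print(round(glycine_at_20, 4), round(lysine_at_20, 4))
```

    Under these assumptions, 20 Å from the C/AC the odds for glycine are about 0.66 times, and for lysine about 2.34 times, their values at the center.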

  4. High resolution mapping of Twist to DNA in Drosophila embryos: Efficient functional analysis and evolutionary conservation

    PubMed Central

    Ozdemir, Anil; Fisher-Aylor, Katherine I.; Pepke, Shirley; Samanta, Manoj; Dunipace, Leslie; McCue, Kenneth; Zeng, Lucy; Ogawa, Nobuo; Wold, Barbara J.; Stathopoulos, Angelike

    2011-01-01

    Cis-regulatory modules (CRMs) function by binding sequence specific transcription factors, but the relationship between in vivo physical binding and the regulatory capacity of factor-bound DNA elements remains uncertain. We investigate this relationship for the well-studied Twist factor in Drosophila melanogaster embryos by analyzing genome-wide factor occupancy and testing the functional significance of Twist occupied regions and motifs within regions. Twist ChIP-seq data efficiently identified previously studied Twist-dependent CRMs and robustly predicted new CRM activity in transgenesis, with newly identified Twist-occupied regions supporting diverse spatiotemporal patterns (>74% positive, n = 31). Some, but not all, candidate CRMs require Twist for proper expression in the embryo. The Twist motifs most favored in genome ChIP data (in vivo) differed from those most favored by Systematic Evolution of Ligands by EXponential enrichment (SELEX) (in vitro). Furthermore, the majority of ChIP-seq signals could be parsimoniously explained by a CABVTG motif located within 50 bp of the ChIP summit and, of these, CACATG was most prevalent. Mutagenesis experiments demonstrated that different Twist E-box motif types are not fully interchangeable, suggesting that the ChIP-derived consensus (CABVTG) includes sites having distinct regulatory outputs. Further analysis of position, frequency of occurrence, and sequence conservation revealed significant enrichment and conservation of CABVTG E-box motifs near Twist ChIP-seq signal summits, preferential conservation of ±150 bp surrounding Twist occupied summits, and enrichment of GA- and CA-repeat sequences near Twist occupied summits. Our results show that high resolution in vivo occupancy data can be used to drive efficient discovery and dissection of global and local cis-regulatory logic. PMID:21383317
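    The CABVTG consensus uses IUPAC degenerate codes (B = C, G or T; V = A, C or G). As a minimal sketch of how such a motif could be located within 50 bp of a ChIP summit, the code below expands the codes into a regular expression and scans a window around a summit coordinate. The function names, the 0-based coordinates, and the toy sequence are assumptions for illustration, not the authors' pipeline.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(motif: str) -> "re.Pattern":
    """Compile an IUPAC motif; the lookahead lets overlapping sites be found."""
    body = "".join(IUPAC[c] for c in motif.upper())
    return re.compile("(?=(" + body + "))")

def motif_hits_near_summit(seq: str, summit: int,
                           motif: str = "CABVTG", window: int = 50) -> list:
    """Return (start, site) for motif sites lying fully within
    [summit - window, summit + window] (0-based coordinates)."""
    pattern = iupac_to_regex(motif)
    lo = max(0, summit - window)
    hi = min(len(seq), summit + window + 1)
    # CABVTG is its own reverse complement at the IUPAC level, so scanning
    # one strand is enough for this particular motif.
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper(), lo, hi)]

# Toy sequence: one CACATG site and one non-matching CAAATG.
toy = "GGGG" + "CACATG" + "GGGG" + "CAAATG" + "GGGG"
hits = motif_hits_near_summit(toy, summit=7)
print(hits)
```

    For motifs that are not reverse-palindromic, the reverse complement would need to be scanned as well.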

  5. Mitochondrial genome sequences illuminate maternal lineages of conservation concern in a rare carnivore

    Treesearch

    Brian J. Knaus; Richard Cronn; Aaron Liston; Kristine Pilgrim; Michael K. Schwartz

    2011-01-01

    Science-based wildlife management relies on genetic information to infer population connectivity and identify conservation units. The most commonly used genetic marker for characterizing animal biodiversity and identifying maternal lineages is the mitochondrial genome. Mitochondrial genotyping figures prominently in conservation and management plans, with much of the...

  6. Conservation of nif- and species-specific domains within repeated promoter sequences from fast-growing Rhizobium species.

    PubMed Central

    Schofield, P R; Watson, J M

    1985-01-01

    In the fast-growing Rhizobium species, repeated DNA sequences that include the promoter region of the nifHDK operon have been described. These repeated sequences are promoters which specifically activate transcription in the endosymbiotic state. Hybridization analysis of these sequences from R. trifolii has revealed that they may be involved in the species-specific activation of the various genes whose transcription they promote. Comparative analysis of various copies of these repeated sequences, from R. trifolii (the clover symbiont) and R. meliloti (the alfalfa symbiont), reveals the presence of domains of intra- and interspecific conservation within the promoter regions. We suggest that these promoter elements represent sites which are involved in the species-specific and general, nif-specific activation of Rhizobium symbiotic genes. PMID:3892479

  7. Comparison of CD45 extracellular domain sequences from divergent vertebrate species suggests the conservation of three fibronectin type III domains.

    PubMed

    Okumura, M; Matthews, R J; Robb, B; Litman, G W; Bork, P; Thomas, M L

    1996-08-15

    Mammalian CD45 is a transmembrane protein tyrosine phosphatase expressed by all nucleated cells of hematopoietic origin. In lymphocytes, CD45 is required for Ag-induced signal transduction due to its ability to positively regulate Src family members. The mechanisms by which CD45 function is regulated are unknown. Indeed, the interactions of CD45 extracellular domains are largely undefined. To gain insight into potentially important regions of the extracellular domain, we sought to identify conserved features from divergent species. cDNAs encoding the putative CD45 homologue from Heterodontus francisci (horned shark) were isolated. The cDNA sequence predicts a protein of 1200 amino acids that contains a 452-amino acid extracellular domain, a 22-amino acid transmembrane region, and a 703-amino acid cytoplasmic domain. Alignment searches revealed that the Heterodontus cytoplasmic domain sequence was most identical to mammalian CD45 and a transmembrane protein tyrosine phosphatase sequence identified from chickens, ChPTP lambda. A dendrogram with other transmembrane protein tyrosine phosphatase sequences suggests that the Heterodontus and chicken sequences represent CD45 orthologues for their respective species. Analysis of vertebrate CD45 extracellular domain sequences indicates the conservation of three structural regions: a region containing potential O-linked carbohydrate sites, a cysteine-containing region, and a region containing three fibronectin type III domains. For each vertebrate species, multiple isoforms are generated by alternative splicing of three exons that encode a portion of the region containing potential O-linked glycosylation sites. These studies provide evidence for a conservation in CD45 extracellular domain structure between divergent species and provide a basis for understanding CD45 extracellular domain interactions.

  8. Generating barcoded libraries for multiplex high-throughput sequencing.

    PubMed

    Knapp, Michael; Stiller, Mathias; Meyer, Matthias

    2012-01-01

    Molecular barcoding is an essential tool to use the high throughput of next generation sequencing platforms optimally in studies involving more than one sample. Various barcoding strategies allow for the incorporation of short recognition sequences (barcodes) into sequencing libraries, either by ligation or polymerase chain reaction (PCR). Here, we present two approaches optimized for generating barcoded sequencing libraries from low copy number extracts and amplification products typical of ancient DNA studies.
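    Once barcoded libraries are pooled and sequenced, reads have to be assigned back to their samples. The sketch below shows one simple way this could look, assuming an inline barcode of fixed length at the start of each read and exact matching only; the barcode sequences, sample names, and function name are hypothetical, and the abstract does not describe the authors' own read-sorting procedure.

```python
def demultiplex(reads: list, barcodes: dict) -> tuple:
    """Sort reads into per-sample bins by an exact-match leading barcode.

    `barcodes` maps barcode sequence -> sample name. Returns (bins, unassigned),
    where binned reads have the barcode trimmed off.
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all barcodes share one length")
    n = lengths.pop()
    bins = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        sample = barcodes.get(read[:n].upper())
        if sample is None:
            unassigned.append(read)          # no exact barcode match
        else:
            bins[sample].append(read[n:])    # strip the barcode
    return bins, unassigned

# Hypothetical barcodes and reads for illustration only.
barcodes = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "ACGTACTTTT", "NNNNNNGGGG"]
bins, unassigned = demultiplex(reads, barcodes)
print(bins, unassigned)
```

    Real pipelines often tolerate a mismatch in the barcode or use index reads instead of inline barcodes; exact matching is used here only to keep the example short.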

  9. Comparative and genetic analysis of the four sequenced Paenibacillus polymyxa genomes reveals a diverse metabolism and conservation of genes relevant to plant-growth promotion and competitiveness.

    PubMed

    Eastman, Alexander W; Heinrichs, David E; Yuan, Ze-Chun

    2014-10-03

    Members of the genus Paenibacillus are important plant growth-promoting rhizobacteria that can serve as bio-reactors. Paenibacillus polymyxa promotes the growth of a variety of economically important crops. Our lab recently completed the genome sequence of Paenibacillus polymyxa CR1. As of January 2014, four P. polymyxa genomes have been completely sequenced but no comparative genomic analyses have been reported. Here we report the comparative and genetic analyses of four sequenced P. polymyxa genomes, which revealed a significantly conserved core genome. Complex metabolic pathways and regulatory networks were highly conserved and allow P. polymyxa to rapidly respond to dynamic environmental cues. Genes responsible for phytohormone synthesis, phosphate solubilization, iron acquisition, transcriptional regulation, σ-factors, stress responses, transporters and biomass degradation were well conserved, indicating an intimate association with plant hosts and the rhizosphere niche. In addition, genes responsible for antimicrobial resistance and non-ribosomal peptide/polyketide synthesis are present in both the core and accessory genome of each strain. Comparative analyses also reveal variations in the accessory genome, including large plasmids present in strains M1 and SC2. Furthermore, a considerable number of strain-specific genes and genomic islands are irregularly distributed throughout each genome. Although a variety of plant-growth promoting traits are encoded by all strains, only P. polymyxa CR1 encodes the unique nitrogen fixation cluster found in other Paenibacillus sp. Our study revealed that genomic loci relevant to host interaction and ecological fitness are highly conserved within the P. polymyxa genomes analysed, despite variations in the accessory genome. This work suggests that plant-growth promotion by P. polymyxa is mediated largely through phytohormone production, increased nutrient availability and bio-control mechanisms. This study provides an in

  10. Dominant sequences of human major histocompatibility complex conserved extended haplotypes from HLA-DQA2 to DAXX.

    PubMed

    Larsen, Charles E; Alford, Dennis R; Trautwein, Michael R; Jalloh, Yanoh K; Tarnacki, Jennifer L; Kunnenkeri, Sushruta K; Fici, Dolores A; Yunis, Edmond J; Awdeh, Zuheir L; Alper, Chester A

    2014-10-01

    We resequenced and phased 27 kb of DNA within 580 kb of the MHC class II region in 158 population chromosomes, most of which were conserved extended haplotypes (CEHs) of European descent or contained their centromeric fragments. We determined the single nucleotide polymorphism and deletion-insertion polymorphism alleles of the dominant sequences from HLA-DQA2 to DAXX for these CEHs. Nine of 13 CEHs remained sufficiently intact to possess a dominant sequence extending at least to DAXX, 230 kb centromeric to HLA-DPB1. We identified the regions centromeric to HLA-DQB1 within which single instances of eight "common" European MHC haplotypes previously sequenced by the MHC Haplotype Project (MHP) were representative of those dominant CEH sequences. Only two MHP haplotypes had a dominant CEH sequence throughout the centromeric and extended class II region and one MHP haplotype did not represent a known European CEH anywhere in the region. We identified the centromeric recombination transition points of other MHP sequences from CEH representation to non-representation. Several CEH pairs or groups shared sequence identity in small blocks but had significantly different (although still conserved for each separate CEH) sequences in surrounding regions. These patterns partly explain strong calculated linkage disequilibrium over only short (tens to hundreds of kilobases) distances in the context of a finite number of observed megabase-length CEHs comprising half a population's haplotypes. Our results provide a clearer picture of European CEH class II allelic structure and population haplotype architecture, improved regional CEH markers, and raise questions concerning regional recombination hotspots.

  11. Dominant Sequences of Human Major Histocompatibility Complex Conserved Extended Haplotypes from HLA-DQA2 to DAXX

    PubMed Central

    Larsen, Charles E.; Alford, Dennis R.; Trautwein, Michael R.; Jalloh, Yanoh K.; Tarnacki, Jennifer L.; Kunnenkeri, Sushruta K.; Fici, Dolores A.; Yunis, Edmond J.; Awdeh, Zuheir L.; Alper, Chester A.

    2014-01-01

    We resequenced and phased 27 kb of DNA within 580 kb of the MHC class II region in 158 population chromosomes, most of which were conserved extended haplotypes (CEHs) of European descent or contained their centromeric fragments. We determined the single nucleotide polymorphism and deletion-insertion polymorphism alleles of the dominant sequences from HLA-DQA2 to DAXX for these CEHs. Nine of 13 CEHs remained sufficiently intact to possess a dominant sequence extending at least to DAXX, 230 kb centromeric to HLA-DPB1. We identified the regions centromeric to HLA-DQB1 within which single instances of eight “common” European MHC haplotypes previously sequenced by the MHC Haplotype Project (MHP) were representative of those dominant CEH sequences. Only two MHP haplotypes had a dominant CEH sequence throughout the centromeric and extended class II region and one MHP haplotype did not represent a known European CEH anywhere in the region. We identified the centromeric recombination transition points of other MHP sequences from CEH representation to non-representation. Several CEH pairs or groups shared sequence identity in small blocks but had significantly different (although still conserved for each separate CEH) sequences in surrounding regions. These patterns partly explain strong calculated linkage disequilibrium over only short (tens to hundreds of kilobases) distances in the context of a finite number of observed megabase-length CEHs comprising half a population's haplotypes. Our results provide a clearer picture of European CEH class II allelic structure and population haplotype architecture, improved regional CEH markers, and raise questions concerning regional recombination hotspots. PMID:25299700

  12. JDet: interactive calculation and visualization of function-related conservation patterns in multiple sequence alignments and structures.

    PubMed

    Muth, Thilo; García-Martín, Juan A; Rausell, Antonio; Juan, David; Valencia, Alfonso; Pazos, Florencio

    2012-02-15

    We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions as well as those with a family-dependent conservation pattern in multiple sequence alignments. The program allows, among other things, to run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet. The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available.
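    The two kinds of position mentioned here, fully conserved and family-dependent, can be illustrated with a deliberately simple sketch. It is not the Xdet or S3Det algorithm: it only flags columns where every sequence carries the same residue, and columns where each predefined family is internally invariant but the families differ. The function names, family labels, and toy alignment are hypothetical.

```python
def fully_conserved(alignment: list) -> list:
    """Indices of columns where all sequences share one non-gap residue."""
    return [
        i for i, col in enumerate(zip(*alignment))
        if len(set(col)) == 1 and col[0] != "-"
    ]

def family_specific(families: dict) -> list:
    """Indices of columns invariant within each family but not across families."""
    names = sorted(families)
    length = len(families[names[0]][0])
    positions = []
    for i in range(length):
        per_family = [{seq[i] for seq in families[name]} for name in names]
        if any(len(residues) != 1 or "-" in residues for residues in per_family):
            continue  # some family varies (or is gapped) at this column
        if len(set.union(*per_family)) > 1:
            positions.append(i)  # invariant inside families, different between
    return positions

# Toy alignment split into two made-up families.
families = {
    "family_A": ["MKVLA", "MKVLS"],
    "family_B": ["MRVLA", "MRVIS"],
}
all_seqs = [s for name in sorted(families) for s in families[name]]
conserved = fully_conserved(all_seqs)
specific = family_specific(families)
print(conserved, specific)
```

    In this toy case columns 0 and 2 are fully conserved and column 1 is family-specific; real methods score positions statistically rather than requiring strict invariance.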

  13. High-throughput sequencing and vaccine design.

    PubMed

    Luciani, F

    2016-04-01

    Next-generation sequencing (NGS) technologies have reshaped genome research. The resulting increase in sequencing depth and resolution has led to an unprecedented level of genomic detail and thus an increasing awareness of the complexity of animal, human and pathogen genomes. This has resulted in new approaches to vaccine research. On the one hand, the increase in genome complexity challenges our ability to study and understand pathogen biology and pathogen-host interactions. On the other hand, the increase in genomic data also provides key information for developing and designing improved vaccines against pathogens that were previously extremely difficult to deal with, such as rapidly mutating RNA viruses or bacteria that have complex interactions with the host immune system. This review describes how the broad application of NGS technologies to genome research is affecting vaccine research. It focuses on implications for the field of viral genomics, and includes recent animal and human studies.

  14. Identification of conserved genomic regions and variation therein amongst Cetartiodactyla species using next generation sequencing

    USDA-ARS's Scientific Manuscript database

    Background: Next Generation Sequencing has created an opportunity to genetically characterize an individual both inexpensively and comprehensively. In earlier work produced in our collaboration [1], it was demonstrated that, for animals without a reference genome, their Next Generation Sequence data ...

  15. Genome-wide discovery and differential regulation of conserved and novel microRNAs in chickpea via deep sequencing

    PubMed Central

    Jain, Mukesh; Chevala, VVS Narayana; Garg, Rohini

    2014-01-01

    MicroRNAs (miRNAs) are essential components of complex gene regulatory networks that orchestrate plant development. Although several genomic resources have been developed for the legume crop chickpea, miRNAs have not been discovered until now. For genome-wide discovery of miRNAs in chickpea (Cicer arietinum), we sequenced the small RNA content from seven major tissues/organs employing Illumina technology. About 154 million reads were generated, which represented more than 20 million distinct small RNA sequences. We identified a total of 440 conserved miRNAs in chickpea based on sequence similarity with known miRNAs in other plants. In addition, 178 novel miRNAs were identified using a miRDeep pipeline with plant-specific scoring. Some of the conserved and novel miRNAs with significant sequence similarity were grouped into families. The chickpea miRNAs targeted a wide range of mRNAs involved in diverse cellular processes, including transcriptional regulation (transcription factors), protein modification and turnover, signal transduction, and metabolism. Our analysis revealed several miRNAs with differential spatial expression. Many of the chickpea miRNAs were expressed in a tissue-specific manner. The conserved and differential expression of members of the same miRNA family in different tissues was also observed. Some of the same family members were predicted to target different chickpea mRNAs, which suggested the specificity and complexity of miRNA-mediated developmental regulation. This study, for the first time, reveals a comprehensive set of conserved and novel miRNAs along with their expression patterns and putative targets in chickpea, and provides a framework for understanding regulation of developmental processes in legumes. PMID:25151616

  16. Genome-wide discovery and differential regulation of conserved and novel microRNAs in chickpea via deep sequencing.

    PubMed

    Jain, Mukesh; Chevala, V V S Narayana; Garg, Rohini

    2014-11-01

    MicroRNAs (miRNAs) are essential components of complex gene regulatory networks that orchestrate plant development. Although several genomic resources have been developed for the legume crop chickpea, miRNAs have not been discovered until now. For genome-wide discovery of miRNAs in chickpea (Cicer arietinum), we sequenced the small RNA content from seven major tissues/organs employing Illumina technology. About 154 million reads were generated, which represented more than 20 million distinct small RNA sequences. We identified a total of 440 conserved miRNAs in chickpea based on sequence similarity with known miRNAs in other plants. In addition, 178 novel miRNAs were identified using a miRDeep pipeline with plant-specific scoring. Some of the conserved and novel miRNAs with significant sequence similarity were grouped into families. The chickpea miRNAs targeted a wide range of mRNAs involved in diverse cellular processes, including transcriptional regulation (transcription factors), protein modification and turnover, signal transduction, and metabolism. Our analysis revealed several miRNAs with differential spatial expression. Many of the chickpea miRNAs were expressed in a tissue-specific manner. The conserved and differential expression of members of the same miRNA family in different tissues was also observed. Some of the same family members were predicted to target different chickpea mRNAs, which suggested the specificity and complexity of miRNA-mediated developmental regulation. This study, for the first time, reveals a comprehensive set of conserved and novel miRNAs along with their expression patterns and putative targets in chickpea, and provides a framework for understanding regulation of developmental processes in legumes.

  17. An evolutionary conserved pattern of 18S rRNA sequence complementarity to mRNA 5′ UTRs and its implications for eukaryotic gene translation regulation

    PubMed Central

    Pánek, Josef; Kolář, Michal; Vohradský, Jiří; Shivaya Valášek, Leoš

    2013-01-01

    There are several key mechanisms regulating eukaryotic gene expression at the level of protein synthesis. Interestingly, the least explored mechanisms of translational control are those that involve the translating ribosome per se, mediated for example via predicted interactions between the ribosomal RNAs (rRNAs) and mRNAs. Here, we took advantage of robustly growing large-scale data sets of mRNA sequences for numerous organisms, solved ribosomal structures and computational power to computationally explore the mRNA–rRNA complementarity that is statistically significant across the species. Our predictions reveal highly specific sequence complementarity of 18S rRNA sequences with mRNA 5′ untranslated regions (UTRs) forming a well-defined 3D pattern on the rRNA sequence of the 40S subunit. Broader evolutionary conservation of this pattern may imply that 5′ UTRs of eukaryotic mRNAs, which have already emerged from the mRNA-binding channel, may contact several complementary spots on 18S rRNA situated near the exit of the mRNA binding channel and on the middle-to-lower body of the solvent-exposed 40S ribosome including its left foot. We discuss physiological significance of this structurally conserved pattern and, in the context of previously published experimental results, propose that it modulates scanning of the 40S subunit through 5′ UTRs of mRNAs. PMID:23804757

  18. An evolutionary conserved pattern of 18S rRNA sequence complementarity to mRNA 5' UTRs and its implications for eukaryotic gene translation regulation.

    PubMed

    Pánek, Josef; Kolář, Michal; Vohradský, Jiří; Shivaya Valášek, Leoš

    2013-09-01

    There are several key mechanisms regulating eukaryotic gene expression at the level of protein synthesis. Interestingly, the least explored mechanisms of translational control are those that involve the translating ribosome per se, mediated for example via predicted interactions between the ribosomal RNAs (rRNAs) and mRNAs. Here, we took advantage of robustly growing large-scale data sets of mRNA sequences for numerous organisms, solved ribosomal structures and computational power to computationally explore the mRNA-rRNA complementarity that is statistically significant across the species. Our predictions reveal highly specific sequence complementarity of 18S rRNA sequences with mRNA 5' untranslated regions (UTRs) forming a well-defined 3D pattern on the rRNA sequence of the 40S subunit. Broader evolutionary conservation of this pattern may imply that 5' UTRs of eukaryotic mRNAs, which have already emerged from the mRNA-binding channel, may contact several complementary spots on 18S rRNA situated near the exit of the mRNA binding channel and on the middle-to-lower body of the solvent-exposed 40S ribosome including its left foot. We discuss physiological significance of this structurally conserved pattern and, in the context of previously published experimental results, propose that it modulates scanning of the 40S subunit through 5' UTRs of mRNAs.

  19. Parallel Tagged Next-Generation Sequencing on Pooled Samples – A New Approach for Population Genetics in Ecology and Conservation

    PubMed Central

    Zavodna, Monika; Grueber, Catherine E.; Gemmell, Neil J.

    2013-01-01

    Next-generation sequencing (NGS) on pooled samples has already been broadly applied in human medical diagnostics and plant and animal breeding. However, thus far it has been only sparingly employed in ecology and conservation, where it may serve as a useful diagnostic tool for rapid assessment of species genetic diversity and structure at the population level. Here we undertake a comprehensive evaluation of the accuracy, practicality and limitations of parallel tagged amplicon NGS on pooled population samples for estimating species population diversity and structure. We obtained 16S and Cyt b data from 20 populations of Leiopelma hochstetteri, a frog species of conservation concern in New Zealand, using two approaches – parallel tagged NGS on pooled population samples and individual Sanger sequenced samples. Data from each approach were then used to estimate two standard population genetic parameters, nucleotide diversity (π) and population differentiation (FST), that enable population genetic inference in a species conservation context. We found a positive correlation between our two approaches for population genetic estimates, showing that the pooled population NGS approach is a reliable, rapid and appropriate method for population genetic inference in an ecological and conservation context. Our experimental design also allowed us to identify both the strengths and weaknesses of the pooled population NGS approach and outline some guidelines and suggestions that might be considered when planning future projects. PMID:23637841
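
    As an illustration of one of the two parameters named above, nucleotide diversity (π) can be computed from individually sequenced, aligned haplotypes as the average number of pairwise differences per site. The Python sketch below assumes that setting only; the function name and the toy sequences are hypothetical, and it does not reproduce how the authors estimated π from pooled NGS reads.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site across aligned sequences.

    pi = (sum of differences over all sequence pairs) / (number of pairs * length)
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned, non-empty and of equal length")

    # Count mismatching sites for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Toy haplotypes invented for this example (not Leiopelma hochstetteri data).
toy_haplotypes = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
print(nucleotide_diversity(toy_haplotypes))
```

    In the toy data the three pairs differ at 1, 1 and 2 sites, so π is 4 differences over 3 pairs and 8 sites.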

  20. Parallel tagged next-generation sequencing on pooled samples - a new approach for population genetics in ecology and conservation.

    PubMed

    Zavodna, Monika; Grueber, Catherine E; Gemmell, Neil J

    2013-01-01

    Next-generation sequencing (NGS) on pooled samples has already been broadly applied in human medical diagnostics and plant and animal breeding. However, thus far it has been only sparingly employed in ecology and conservation, where it may serve as a useful diagnostic tool for rapid assessment of species genetic diversity and structure at the population level. Here we undertake a comprehensive evaluation of the accuracy, practicality and limitations of parallel tagged amplicon NGS on pooled population samples for estimating species population diversity and structure. We obtained 16S and Cyt b data from 20 populations of Leiopelma hochstetteri, a frog species of conservation concern in New Zealand, using two approaches - parallel tagged NGS on pooled population samples and individual Sanger sequenced samples. Data from each approach were then used to estimate two standard population genetic parameters, nucleotide diversity (π) and population differentiation (FST), that enable population genetic inference in a species conservation context. We found a positive correlation between our two approaches for population genetic estimates, showing that the pooled population NGS approach is a reliable, rapid and appropriate method for population genetic inference in an ecological and conservation context. Our experimental design also allowed us to identify both the strengths and weaknesses of the pooled population NGS approach and outline some guidelines and suggestions that might be considered when planning future projects.

  1. CodaChrome: a tool for the visualization of proteome conservation across all fully sequenced bacterial genomes

    PubMed Central

    2014-01-01

    Background: The relationships between bacterial genomes are complicated by rampant horizontal gene transfer, varied selection pressures, acquisition of new genes, loss of genes, and divergence of genes, even in closely related lineages. As more and more bacterial genomes are sequenced, organizing and interpreting the incredible amount of relational information that connects them becomes increasingly difficult. Results: We have developed CodaChrome (http://www.sourceforge.com/p/codachrome), a one-versus-all proteome comparison tool that allows the user to visually investigate the relationship between a bacterial proteome of interest and the proteomes encoded by every other bacterial genome recorded in GenBank in a massive interactive heat map. This tool has allowed us to rapidly identify the most highly conserved proteins encoded in the bacterial pan-genome, fast-clock genes useful for subtyping of bacterial species, the evolutionary history of an indel in the Sphingobium lineage, and an example of horizontal gene transfer from a member of the genus Enterococcus to a recent ancestor of Helicobacter pylori. Conclusion: CodaChrome is a user-friendly and powerful tool for simultaneously visualizing relationships between thousands of proteomes. PMID:24460813

  2. cDNA sequence, genomic organization, and evolutionary conservation of a novel gene from the WAGR region

    SciTech Connect

    Schwartz, F.; Eisenman, R.; Knoll, J.; Bruns, G.

    1995-09-20

    A new gene (239FB) with predominant and differential expression in fetal brain has recently been isolated from a chromosome 11p13-p14 boundary area near FSHB. The corresponding mRNA has an open reading frame of 294 amino acids, a 3' untranslated region of 1247 nucleotides, and a highly GC-rich 5' untranslated region. The coding and 3' UT sequence is specified by 6 exons within nearly 87 kb of isolated genomic locus. The 5' end region of the transcript maps adjacent to the only genomically defined CpG island in a chromosomal subregion that may be associated with part of the mental retardation of some WAGR (Wilms tumor, aniridia, genitourinary anomalies, and mental retardation) syndrome patients. In addition to nucleotide and amino acid similarity to an EST from a normalized infant brain cDNA library, the predicted protein has extensive similarity to Caenorhabditis elegans polypeptides of, as yet, unknown function. The 239FB locus is, therefore, likely part of a family of genes with two members expressed in human brain. The extensive conservation of the predicted protein suggests a fundamental function of the gene product and will enable evaluation of the role of the 239FB gene in neurogenesis in model organisms. 48 refs., 4 figs., 1 tab.

  3. Highly conserved small subunit residues influence rubisco large subunit catalysis.

    PubMed

    Genkov, Todor; Spreitzer, Robert J

    2009-10-30

    The chloroplast enzyme ribulose 1,5-bisphosphate carboxylase/oxygenase (Rubisco) catalyzes the rate-limiting step of photosynthetic CO(2) fixation. With a deeper understanding of its structure-function relationships and competitive inhibition by O(2), it may be possible to engineer an increase in agricultural productivity and renewable energy. The chloroplast-encoded large subunits form the active site, but the nuclear-encoded small subunits can also influence catalytic efficiency and CO(2)/O(2) specificity. To further define the role of the small subunit in Rubisco function, the 10 most conserved residues in all small subunits were substituted with alanine by transformation of a Chlamydomonas reinhardtii mutant that lacks the small subunit gene family. All the mutant strains were able to grow photosynthetically, indicating that none of the residues is essential for function. Three of the substitutions have little or no effect (S16A, P19A, and E92A), one primarily affects holoenzyme stability (L18A), and the remainder affect catalysis with or without some level of associated structural instability (Y32A, E43A, W73A, L78A, P79A, and F81A). Y32A and E43A cause decreases in CO(2)/O(2) specificity. Based on the x-ray crystal structure of Chlamydomonas Rubisco, all but one (Glu-92) of the conserved residues are in contact with large subunits and cluster near the amino- or carboxyl-terminal ends of large subunit alpha-helix 8, which is a structural element of the alpha/beta-barrel active site. Small subunit residues Glu-43 and Trp-73 identify a possible structural connection between active site alpha-helix 8 and the highly variable small subunit loop between beta-strands A and B, which can also influence Rubisco CO(2)/O(2) specificity.

  4. Conserved sequence-specific lincRNA-steroid receptor interactions drive transcriptional repression and direct cell fate

    SciTech Connect

    Hudson, William H.; Pickard, Mark R.; de Vera, Ian Mitchelle S.; Kuiper, Emily G.; Mourtada-Maarabouni, Mirna; Conn, Graeme L.; Kojetin, Douglas J.; Williams, Gwyn T.; Ortlund, Eric A.

    2014-12-23

    The majority of the eukaryotic genome is transcribed, generating a significant number of long intergenic noncoding RNAs (lincRNAs). Although lincRNAs represent the most poorly understood product of transcription, recent work has shown lincRNAs fulfill important cellular functions. In addition to low sequence conservation, poor understanding of structural mechanisms driving lincRNA biology hinders systematic prediction of their function. Here we report the molecular requirements for the recognition of steroid receptors (SRs) by the lincRNA growth arrest-specific 5 (Gas5), which regulates steroid-mediated transcriptional regulation, growth arrest and apoptosis. We identify the functional Gas5-SR interface and generate point mutations that ablate the SR-Gas5 lincRNA interaction, altering Gas5-driven apoptosis in cancer cell lines. Further, we find that the Gas5 SR-recognition sequence is conserved among haplorhines, with its evolutionary origin as a splice acceptor site. This study demonstrates that lincRNAs can recognize protein targets in a conserved, sequence-specific manner in order to affect critical cell functions.

  5. Sequence, structure and function relationships in flaviviruses as assessed by evolutive aspects of its conserved non-structural protein domains.

    PubMed

    da Fonseca, Néli José; Lima Afonso, Marcelo Querino; Pedersolli, Natan Gonçalves; de Oliveira, Lucas Carrijo; Andrade, Dhiego Souto; Bleicher, Lucas

    2017-01-11

    Flaviviruses are responsible for serious diseases such as dengue, yellow fever, and zika fever. Their genomes encode a polyprotein which, after cleavage, results in three structural and seven non-structural proteins. Homologous proteins can be studied by conservation and coevolution analysis as detected in multiple sequence alignments, usually reporting positions which are strictly necessary for the structure and/or function of all members in a protein family or which are involved in a specific sub-class feature requiring the coevolution of residue sets. This study provides a complete conservation and coevolution analysis on all flaviviruses non-structural proteins, with results mapped on all well-annotated available sequences. A literature review on the residues found in the analysis enabled us to compile available information on their roles and distribution among different flaviviruses. Also, we provide the mapping of conserved and coevolved residues for all sequences currently in SwissProt as a supplementary material, so that particularities in different viruses can be easily analyzed. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Sequencing strategy for the whole mitochondrial genome resulting in high quality sequences

    PubMed Central

    Fendt, Liane; Zimmermann, Bettina; Daniaux, Martin; Parson, Walther

    2009-01-01

    Background: It has been demonstrated that a reliable and fail-safe sequencing strategy is mandatory for high-quality analysis of mitochondrial (mt) DNA, as the sequencing and base-calling process is prone to error. Here, we present a high quality, reliable and easy handling manual procedure for the sequencing of full mt genomes that is also appropriate for laboratories where fully automated processes are not available. Results: We amplified whole mitochondrial genomes as two overlapping PCR-fragments comprising each about 8500 bases in length. We developed a set of 96 primers that can be applied to a (manual) 96 well-based technology, which resulted in at least double strand sequence coverage of the entire coding region (codR). Conclusion: This elaborated sequencing strategy is straightforward and allows for an unambiguous sequence analysis and interpretation including sometimes challenging phenomena such as point and length heteroplasmy that are relevant for the investigation of forensic and clinical samples. PMID:19331681

  7. Sequence of the human 40-kDa keratin reveals an unusual structure with very high sequence identity to the corresponding bovine keratin

    SciTech Connect

    Eckert, R.L.

    1988-02-01

    The complete amino acid and DNA sequences of the human 40-kDa keratin are reported. The DNA sequence encodes a protein of 44,098 Da, which is unique in that it lacks the terminal non-α-helical tail segment found in all other keratins. When the human 40-kDa keratin amino acid sequence is compared to the corresponding bovine keratin, the overall identity is 89%. The coil-forming regions are 89% identical and the head regions are 88% identical. This similarity is also evident in the DNA sequence of the coding region, the 5' upstream sequences, and the 3' noncoding sequences. The high degree of cross-species identity between bovine and human 40-kDa keratins suggests that there is strong evolutionary pressure to conserve the structure of this keratin. This in turn suggests an important and universal role for this intermediate filament subunit in all species.

  8. MLST analysis reveals a highly conserved core genome among poultry isolates of Clostridium septicum.

    PubMed

    Neumann, Anthony P; Rehberger, Thomas G

    2009-06-01

    Clostridium septicum is a highly virulent, anaerobic bacterium capable of establishing necrotizing tissue infections and forming heat resistant endospores. Disease is primarily facilitated by secretion of numerous toxic products including a lethal pore-forming cytolysin. Spontaneously occurring clostridial myonecrosis involving C. septicum has recently reemerged as a concern for many poultry producers. However, despite its increasing prevalence, the epidemiology of infection and population structure of C. septicum remains largely unknown. In this study a multilocus sequence typing (MLST) approach was utilized to examine evolutionary relationships within a diverse collection of C. septicum isolates recovered from poultry flocks experiencing episodes of gangrenous dermatitis. The 109 isolates examined represented 42 turkey flocks and 24 different flocks of broiler chickens as well as C. septicum type strain, ATCC 12464. Isolates were recovered predominantly from gangrenous lesions although isolates from livers, gastrointestinal tracts, spleens and blood were included. The loci analyzed were csa, the major lethal toxin produced by C. septicum, and the housekeeping genes gyrA, groEL, dnaK, recA, tpi, ddl, colA and glpK. These loci were included in part because of their previous use in MLST analysis of Clostridium perfringens and Clostridium difficile. Results indicated a high level of conservation present within these housekeeping gene fragments when compared to what has been previously reported for the aforementioned clostridia. Of the 5352 bp of sequence data examined for each isolate, 99.7% (5335/5352) was absolutely conserved among the 109 isolates. Only one of the ten unique sequence types, or allelic profiles, identified among the isolates was recovered from both turkeys and broiler chickens suggesting some host species preference. Phylogenetic analyses identified two unique clusters, or clonal complexes, among these poultry isolates which may have important

  9. High-throughput sequence alignment using Graphics Processing Units.

    PubMed

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-12-10

    The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
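
    The core idea of matching many queries against one indexed reference can be sketched on the CPU in a few lines of Python. This is an assumption-laden illustration, not MUMmerGPU: it swaps the suffix tree for a naively built sorted suffix list, runs serially rather than on a GPU, and reports only occurrences of the whole query rather than maximal exact matches. The names `build_index` and `find_exact_matches` and the toy sequences are hypothetical.

```python
from bisect import bisect_left


def build_index(reference):
    """Build a sorted suffix list for `reference` (quadratic memory; toy sizes only)."""
    order = sorted(range(len(reference)), key=lambda i: reference[i:])
    suffixes = [reference[i:] for i in order]
    return order, suffixes


def find_exact_matches(index, query):
    """Return sorted 0-based start positions where `query` occurs in the reference."""
    order, suffixes = index
    if not query:
        return []
    # All suffixes that start with `query` form one contiguous run in sorted
    # order, and that run begins at the leftmost insertion point of `query`.
    pos = bisect_left(suffixes, query)
    hits = []
    while pos < len(suffixes) and suffixes[pos].startswith(query):
        hits.append(order[pos])
        pos += 1
    return sorted(hits)


# Toy reference and queries invented for this example.
toy_reference = "GATTACAGATTACA"
toy_index = build_index(toy_reference)
for toy_query in ["ATTA", "ACA", "TTT"]:
    print(toy_query, find_exact_matches(toy_index, toy_query))
```

    Because the index is built once and each query is answered independently, the per-query lookups are the part that lends itself to parallel execution, which is the property the abstract attributes to the GPU implementation.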

  10. Mapping the Transcription Start Points of the Staphylococcus aureus eap, emp, and vwb Promoters Reveals a Conserved Octanucleotide Sequence That Is Essential for Expression of These Genes

    PubMed Central

    Harraghy, Niamh; Homerova, Dagmar; Herrmann, Mathias; Kormanec, Jan

    2008-01-01

    Mapping the transcription start points of the eap, emp, and vwb promoters revealed a conserved octanucleotide sequence (COS). Deleting this sequence abolished the expression of eap, emp, and vwb. However, electrophoretic mobility shift assays gave no evidence that this sequence was a binding site for SarA or SaeR, known regulators of eap and emp. PMID:17965149
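
    One simple way to look for a short motif shared by several promoter regions is to intersect their k-mer sets. The Python sketch below does this for k = 8; it is only an illustration of the idea, not the authors' mapping procedure, and the function name and the toy sequences (including the placeholder motif) are invented rather than taken from the eap, emp or vwb promoters.

```python
def shared_kmers(sequences, k=8):
    """Return the set of k-mers that occur in every input sequence."""
    if not sequences:
        return set()

    def kmers(seq):
        # All overlapping windows of length k, compared case-insensitively.
        seq = seq.upper()
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    common = kmers(sequences[0])
    for seq in sequences[1:]:
        common &= kmers(seq)
    return common


# Hypothetical promoter fragments sharing the placeholder 8-mer AGTTAATT.
toy_promoters = [
    "CCGAGTTAATTGCA",
    "TTAGTTAATTCCGG",
    "GAGTTAATTAAC",
]
print(shared_kmers(toy_promoters))
```

    Exact intersection misses motifs with even one mismatch, so it is a first pass at best; whether a shared k-mer matters functionally still has to be shown experimentally, as the deletion experiments described above do.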

  11. Mapping the transcription start points of the Staphylococcus aureus eap, emp, and vwb promoters reveals a conserved octanucleotide sequence that is essential for expression of these genes.

    PubMed

    Harraghy, Niamh; Homerova, Dagmar; Herrmann, Mathias; Kormanec, Jan

    2008-01-01

    Mapping the transcription start points of the eap, emp, and vwb promoters revealed a conserved octanucleotide sequence (COS). Deleting this sequence abolished the expression of eap, emp, and vwb. However, electrophoretic mobility shift assays gave no evidence that this sequence was a binding site for SarA or SaeR, known regulators of eap and emp.

  12. Sequence of a cDNA encoding nitrite reductase from the tree Betula pendula and identification of conserved protein regions.

    PubMed

    Friemann, A; Brinkmann, K; Hachtel, W

    1992-02-01

    The sequence of an mRNA encoding nitrite reductase (NiR, EC 1.7.7.1.) from the tree Betula pendula was determined. A cDNA library constructed from leaf poly(A)+ mRNA was screened with an oligonucleotide probe deduced from NiR sequences from spinach and maize. A 2.5 kb cDNA was isolated that hybridized to an mRNA, the steady-state level of which increased markedly upon induction with nitrate. The nucleotide sequence of the cDNA contains a reading frame encoding a protein of 583 amino acids that reveals 79% identity with NiR from spinach. The transit peptide of the NiR precursor from birch was determined to be 22 amino acids in size by sequence comparison with NiR from spinach and maize and is the shortest transit peptide reported so far. A graphical evaluation of identities found in the NiR sequence alignment revealed nine well conserved sections each exceeding ten amino acids in size. Sequence comparisons with related redox proteins identified essential residues involved in cofactor binding. A putative binding site for ferredoxin was found in the N-terminal half of the protein.

  13. B chromosomes of rye are highly conserved and accompanied the development of early agriculture

    PubMed Central

    Marques, André; Banaei-Moghaddam, Ali M.; Klemme, Sonja; Blattner, Frank R.; Niwa, Katsumasa; Guerra, Marcelo; Houben, Andreas

    2013-01-01

    Background and Aims: Supernumerary B chromosomes (Bs) represent a specific type of selfish genetic element. As Bs are dispensable for normal growth, B polymorphisms are expected among populations. To address whether Bs maintained in geographically distinct populations of cultivated and weedy rye are polymorphic, the distribution patterns and the transcriptional activity of different B-located repeats were analysed. Methods: Bs of cultivated and weedy rye from seven origins were analysed by fluorescence in situ hybridization (FISH) with probes specific for the pericentromeric and interstitial regions as well as the B-specific non-disjunction control region. The DNA replication, chromatin composition and transcription behaviour of the non-disjunction regions were determined. To address whether the B-marker repeats E3900 and D1100 have diverged genotypes of different origin at the sequence level, the genomic sequences of both repeats were compared between cultivated rye and weedy rye from five different origins. Key Results: B chromosomes in cultivated and weedy rye have maintained a similar molecular structure at the level of subspecies. The high degree of conservation of the non-disjunction control region regarding its transcription activity, histone composition and replication underlines the functional importance of this chromosome region for the maintenance of Bs. The conserved chromosome structure suggests a monophyletic origin of the rye B. As Bs were found in different countries, it is likely that Bs were frequently present in the seed material used in early agriculture. Conclusions: The surprisingly conserved chromosome structure suggests that although the rye Bs experienced rapid evolution including multiple rearrangements at the early evolutionary stages, this process has slowed significantly and may have even ceased during its recent evolution. PMID:23739836

  14. Automated cleaning and pre-processing of immunoglobulin gene sequences from high-throughput sequencing

    PubMed Central

    Michaeli, Miri; Noga, Hila; Tabibian-Keissar, Hilla; Barshack, Iris; Mehr, Ramit

    2012-01-01

    High-throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig High-Throughput Sequencing Cleaner (Ig-HTS-Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig Insertion-Deletion Identifier (Ig-Indel-Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets. PMID:23293637

  15. Domains in microbial beta-1, 4-glycanases: sequence conservation, function, and enzyme families.

    PubMed Central

    Gilkes, N R; Henrissat, B; Kilburn, D G; Miller, R C; Warren, R A

    1991-01-01

    Several types of domain occur in beta-1, 4-glycanases. The best characterized of these are the catalytic domains and the cellulose-binding domains. The domains may be joined by linker sequences rich in proline or hydroxyamino acids or both. Some of the enzymes contain repeated sequences up to 150 amino acids in length. The enzymes can be grouped into families on the basis of sequence similarities between the catalytic domains. There are sequence similarities between the cellulose-binding domains, of which two types have been identified, and also between some domains of unknown function. The beta-1, 4-glycanases appear to have arisen by the shuffling of a relatively small number of progenitor sequences. PMID:1886523

  16. Large distribution and high sequence identity of a Copia-type retrotransposon in angiosperm families.

    PubMed

    Dias, Elaine Silva; Hatt, Clémence; Hamon, Serge; Hamon, Perla; Rigoreau, Michel; Crouzillat, Dominique; Carareto, Claudia Marcia Aparecida; de Kochko, Alexandre; Guyot, Romain

    2015-09-01

    Retrotransposons are the main component of plant genomes. Recent studies have revealed the complexity of their evolutionary dynamics. Here, we have identified Copia25 in Coffea canephora, a new plant retrotransposon belonging to the Ty1-Copia superfamily. In the Coffea genomes analyzed, Copia25 is present in relatively low copy numbers and is transcribed. Sequence similarity searches and PCR analyses show that this retrotransposon with LTRs (Long Terminal Repeats) is widely distributed among the Rubiaceae family and that it is also present in other distantly related species belonging to Asterids, Rosids and monocots. A particular situation is the high sequence identity found between the Copia25 sequences of Musa, a monocot, and Ixora, a dicot species (Rubiaceae). Our results reveal the complexity of the evolutionary dynamics of the ancient element Copia25 in angiosperms, involving several processes including sequence conservation, rapid turnover, stochastic losses and horizontal transfer.

  17. Role of Escherichia coli YbeY, a highly conserved protein, in rRNA processing

    PubMed Central

    Davies, Bryan W.; Köhrer, Caroline; Jacob, Asha I.; Simmons, Lyle A.; Zhu, Jianyu; Aleman, Lourdes M.; RajBhandary, Uttam L.; Walker, Graham C.

    2010-01-01

    The UPF0054 protein family is highly conserved with homologs present in nearly every sequenced bacterium. In some bacteria, the respective gene is essential, while in others its loss results in a highly pleiotropic phenotype. Despite detailed structural studies, a cellular role for this protein family has remained unknown. We report here that deletion of the Escherichia coli homolog, YbeY, causes striking defects that affect ribosome activity, translational fidelity and ribosome assembly. Mapping of 16S, 23S and 5S rRNA termini reveals that YbeY influences the maturation of all three rRNAs, with a particularly strong effect on maturation at both the 5′- and 3′-ends of 16S rRNA as well as maturation of the 5′-termini of 23S and 5S rRNAs. Furthermore, we demonstrate strong genetic interactions between ybeY and rnc (encoding RNase III), ybeY and rnr (encoding RNase R), and ybeY and pnp (encoding PNPase), further suggesting a role for YbeY in rRNA maturation. Mutation of highly conserved amino acids in YbeY allowed the identification of two residues (H114, R59) that were found to have a significant effect in vivo. We discuss the implications of these findings for rRNA maturation and ribosome assembly in bacteria. PMID:20807199

  18. Management of High-Throughput DNA Sequencing Projects: Alpheus

    PubMed Central

    Miller, Neil A.; Kingsmore, Stephen F.; Farmer, Andrew; Langley, Raymond J.; Mudge, Joann; Crow, John A.; Gonzalez, Alvaro J.; Schilkey, Faye D.; Kim, Ryan J.; van Velkinburgh, Jennifer; May, Gregory D.; Black, C. Forrest; Myers, M. Kathy; Utsey, John P.; Frost, Nicholas S.; Sugarbaker, David J.; Bueno, Raphael; Gullans, Stephen R.; Baxter, Susan M.; Day, Steve W.; Retzel, Ernest F.

    2009-01-01

    High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with the opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analysis. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystem’s SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis. PMID:20151039

  19. High-Order Space-Time Methods for Conservation Laws

    NASA Technical Reports Server (NTRS)

    Huynh, H. T.

    2013-01-01

    Current high-order methods such as discontinuous Galerkin and/or flux reconstruction can provide effective discretization for the spatial derivatives. Together with a time discretization, such methods result in either too small a time step size in the case of an explicit scheme or a very large system in the case of an implicit one. To tackle these problems, two new high-order space-time schemes for conservation laws are introduced: the first is explicit and the second, implicit. The explicit method here, also called the moment scheme, achieves a Courant-Friedrichs-Lewy (CFL) condition of 1 for the case of one-spatial dimension regardless of the degree of the polynomial approximation. (For standard explicit methods, if the spatial approximation is of degree p, then the time step sizes are typically proportional to 1/p^2.) Fourier analyses for the one- and two-dimensional cases are carried out. The property of super accuracy (or super convergence) is discussed. The implicit method is a simplified but optimal version of the discontinuous Galerkin scheme applied to time. It reduces to a collocation implicit Runge-Kutta (RK) method for ordinary differential equations (ODE) called Radau IIA. The explicit and implicit schemes are closely related since they employ the same intermediate time levels, and the former can serve as a key building block in an iterative procedure for the latter. A limiting technique for the piecewise linear scheme is also discussed. The technique can suppress oscillations near a discontinuity while preserving accuracy near extrema. Preliminary numerical results are shown
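
    The time-step contrast described in this abstract can be illustrated with a minimal Python sketch. It assumes the usual CFL relation dt = C * dx / a and, for the standard explicit scheme, a proportionality constant of 1 in the 1/p^2 scaling. The function names and that constant are illustrative assumptions and are not taken from the report.

```python
def standard_explicit_time_step(dx: float, wave_speed: float, degree: int) -> float:
    """Time step for a standard explicit scheme.

    Assumption: dt = dx / (a * p^2), i.e. the 1/p^2 scaling quoted in the
    abstract with a hypothetical proportionality constant of 1.
    """
    if degree < 1:
        raise ValueError("polynomial degree must be at least 1")
    if dx <= 0 or wave_speed <= 0:
        raise ValueError("dx and wave_speed must be positive")
    return dx / (wave_speed * degree ** 2)


def cfl_one_time_step(dx: float, wave_speed: float) -> float:
    """Time step at a CFL number of 1 in one space dimension.

    The abstract states that the explicit moment scheme reaches this limit
    regardless of the polynomial degree.
    """
    if dx <= 0 or wave_speed <= 0:
        raise ValueError("dx and wave_speed must be positive")
    return dx / wave_speed


if __name__ == "__main__":
    dx, a = 0.1, 1.0  # hypothetical cell size and wave speed
    for p in (1, 2, 4, 8):
        ratio = cfl_one_time_step(dx, a) / standard_explicit_time_step(dx, a, p)
        print(f"degree {p}: CFL-1 step is {ratio:.0f}x the standard explicit step")
```

    Under these assumptions the gap between the two step sizes grows as p^2, which is the motivation the abstract gives for a scheme whose stability limit does not depend on the polynomial degree.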

  20. Cloning and characterization of a highly repetitive fish nucleotide sequence.

    PubMed

    Datta, U; Dutta, P; Mandal, R K

    1988-01-01

    We have cloned and sequenced a highly repetitive HindIII fragment of DNA from the common carp Cyprinus carpio. It represents a tandemly repeated sequence with a monomeric unit of 245 bp and comprises 8% of the fish genome. Higher units of this monomer appear as a ladder in Southern blots. The monomeric unit has been sequenced; it is A + T-rich with some direct and some inverse-repeat nucleotide clusters.

  1. Structure-sequence based analysis for identification of conserved regions in proteins

    DOEpatents

    Zemla, Adam T; Zhou, Carol E; Lam, Marisa W; Smith, Jason R; Pardes, Elizabeth

    2013-05-28

    Disclosed are computational methods, and associated hardware and software products for scoring conservation in a protein structure based on a computationally identified family or cluster of protein structures. A method of computationally identifying a family or cluster of protein structures is also disclosed herein.

  2. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction.

    PubMed

    Laehnemann, David; Borkhardt, Arndt; McHardy, Alice Carolyn

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.

  3. The role of evolutionary conserved germline DH sequence in B-1 cell development and natural antibody production

    PubMed Central

    Vale, Andre M.; Nobrega, Alberto; Schroeder, Harry W.

    2015-01-01

    Due to N addition and variation in the site of V–D–J joining, the third complementarity-determining region of the heavy chain (CDR-H3) is the most diverse component of the initial immunoglobulin antigen-binding site repertoire. A large component of the peritoneal cavity B-1 cell component is the product of fetal and perinatal B cell production. The CDR-H3 repertoire is thus depleted of N addition, which increases dependency on germ-line sequence. Cross-species comparisons have shown that DH gene sequence demonstrates conservation of amino acid preferences by reading frame. Preference for reading frame 1, which is enriched for tyrosine and glycine, is created both by rearrangement patterns and by pre-BCR and BCR selection. In previous studies, we have assessed the role of conserved DH sequence by examining peritoneal cavity B-1 cell numbers and antibody production in BALB/c mice with altered DH loci. Here, we review our finding that changes in the constraints normally imposed by germ line–encoded amino acids within the CDR-H3 repertoire profoundly affect B-1 cell development, especially B-1a cells, and thus natural antibody immunity. Our studies suggest that both natural and somatic selection operate to create a restricted B-1 cell CDR-H3 repertoire. PMID:26104486

  4. Long-range comparison of human and mouse Sprr loci to identify conserved noncoding sequences involved in coordinate regulation

    PubMed Central

    Martin, Natalia; Patel, Satyakam; Segre, Julia A.

    2004-01-01

    Mammalian epidermis provides a permeability barrier between an organism and its environment. Under homeostatic conditions, epidermal cells produce structural proteins, which are cross-linked in an orderly fashion to form a cornified envelope (CE). However, under genetic or environmental stress, specific genes are induced to rapidly build a temporary barrier. Small proline-rich (SPRR) proteins are the primary constituents of the CE. Under stress the entire family of 14 Sprr genes is upregulated. The Sprr genes are clustered within the larger epidermal differentiation complex on mouse chromosome 3, human chromosome 1q21. The clustering of the Sprr genes and their upregulation under stress suggest that these genes may be coordinately regulated. To identify enhancer elements that regulate this stress response activation of the Sprr locus, we utilized bioinformatic tools and classical biochemical dissection. Long-range comparative sequence analysis identified conserved noncoding sequences (CNSs). Clusters of epidermal-specific DNaseI-hypersensitive sites (HSs) mapped to specific CNSs. Increased prevalence of these HSs in barrier-deficient epidermis provides in vivo evidence of the regulation of the Sprr locus by these conserved sequences. Individual components of these HSs were cloned, and one was shown to have strong enhancer activity specific to conditions when the Sprr genes are coordinately upregulated. PMID:15574822

  5. The role of evolutionarily conserved germ-line DH sequence in B-1 cell development and natural antibody production.

    PubMed

    Vale, Andre M; Nobrega, Alberto; Schroeder, Harry W

    2015-12-01

    Because of N addition and variation in the site of VDJ joining, the third complementarity-determining region of the heavy chain (CDR-H3) is the most diverse component of the initial immunoglobulin antigen-binding site repertoire. A large component of the peritoneal cavity B-1 cell component is the product of fetal and perinatal B cell production. The CDR-H3 repertoire is thus depleted of N addition, which increases dependency on germ-line sequence. Cross-species comparisons have shown that DH gene sequence demonstrates conservation of amino acid preferences by reading frame. Preference for reading frame 1, which is enriched for tyrosine and glycine, is created both by rearrangement patterns and by pre-BCR and BCR selection. In previous studies, we have assessed the role of conserved DH sequence by examining peritoneal cavity B-1 cell numbers and antibody production in BALB/c mice with altered DH loci. Here, we review our finding that changes in the constraints normally imposed by germ-line-encoded amino acids within the CDR-H3 repertoire profoundly affect B-1 cell development, especially B-1a cells, and thus natural antibody immunity. Our studies suggest that both natural and somatic selection operate to create a restricted B-1 cell CDR-H3 repertoire.

  6. Conserved hypothetical protein Rv1977 in Mycobacterium tuberculosis strains contains sequence polymorphisms and might be involved in ongoing immune evasion.

    PubMed

    Jiang, Yi; Liu, Haican; Wang, Xuezhi; Li, Guilian; Qiu, Yan; Dou, Xiangfeng; Wan, Kanglin

    2015-01-01

    Host immune pressure and associated parasite immune evasion are key features of host-pathogen co-evolution. A previous study showed that human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved and thus it was deduced that M. tuberculosis lacks antigenic variation and immune evasion. Here, we selected 151 clinical Mycobacterium tuberculosis isolates from China, amplified the gene encoding Rv1977 and compared the sequences. The results showed that Rv1977, a conserved hypothetical protein, is not conserved in M. tuberculosis strains and that polymorphisms exist in the protein. Some mutations, especially one frameshift mutation, occurred in the antigen Rv1977, which is uncommon in M. tuberculosis strains and may alter the protein's function. The mutations and the deletion in the gene all affect one of three T cell epitopes, and the changed T cell epitope contained more than one variable position, which may suggest ongoing immune evasion.

  7. Use of ancient sedimentary DNA as a novel conservation tool for high-altitude tropical biodiversity.

    PubMed

    Boessenkool, Sanne; McGlynn, Gayle; Epp, Laura S; Taylor, David; Pimentel, Manuel; Gizaw, Abel; Nemomissa, Sileshi; Brochmann, Christian; Popp, Magnus

    2014-04-01

    Conservation of biodiversity may in the future increasingly depend upon the availability of scientific information to set suitable restoration targets. In traditional paleoecology, sediment-based pollen provides a means to define preanthropogenic impact conditions, but problems in establishing the exact provenance and ecologically meaningful levels of taxonomic resolution of the evidence are limiting. We explored the extent to which the use of sedimentary ancient DNA (sedaDNA) may complement pollen data in reconstructing past alpine environments in the tropics. We constructed a record of afro-alpine plants retrieved from DNA preserved in sediment cores from 2 volcanic crater sites in the Albertine Rift, eastern Africa. The record extended well beyond the onset of substantial anthropogenic effects on tropical mountains. To ensure high-quality taxonomic inference from the sedaDNA sequences, we built an extensive DNA reference library covering the majority of the afro-alpine flora, by sequencing DNA from taxonomically verified specimens. Comparisons with pollen records from the same sediment cores showed that plant diversity recovered with sedaDNA improved vegetation reconstructions based on pollen records by revealing both additional taxa and providing increased taxonomic resolution. Furthermore, combining the 2 measures assisted in distinguishing vegetation change at different geographic scales; sedaDNA almost exclusively reflects local vegetation, whereas pollen can potentially originate from a wide area that in highlands in particular can span several ecozones. Our results suggest that sedaDNA may provide information on restoration targets and the nature and magnitude of human-induced environmental changes, including in high conservation priority, biodiversity hotspots, where understanding of preanthropogenic impact (or reference) conditions is highly limited.

  8. Specific binding of eukaryotic ORC to DNA replication origins depends on highly conserved basic residues.

    PubMed

    Kawakami, Hironori; Ohashi, Eiji; Kanamoto, Shota; Tsurimoto, Toshiki; Katayama, Tsutomu

    2015-10-12

    In eukaryotes, the origin recognition complex (ORC) heterohexamer preferentially binds replication origins to trigger initiation of DNA replication. Crystallographic studies using eubacterial and archaeal ORC orthologs suggested that eukaryotic ORC may bind to origin DNA via putative winged-helix DNA-binding domains and AAA+ ATPase domains. However, the mechanisms by which eukaryotic ORC recognizes origin DNA remain elusive. Here, we show in budding yeast that Lys-362 and Arg-367 residues of the largest subunit (Orc1), both outside the aforementioned domains, are crucial for specific binding of ORC to origin DNA. These basic residues, which reside in a putative disordered domain, were dispensable for interaction with ATP and non-specific DNA sequences, suggesting a specific role in recognition. Consistent with this, both residues were required for origin binding of Orc1 in vivo. A truncated Orc1 polypeptide containing these residues solely recognizes ARS sequence with low affinity, and the Arg-367 residue stimulates the sequence-specific binding mode of the polypeptide. Lys-362 and Arg-367 residues of Orc1 are highly conserved among eukaryotic ORCs, but not in eubacterial and archaeal orthologs, suggesting a eukaryote-specific mechanism underlying recognition of replication origins by ORC.

  9. The human HNRPD locus maps to 4q21 and encodes a highly conserved protein.

    PubMed

    Dempsey, L A; Li, M J; DePace, A; Bray-Ward, P; Maizels, N

    1998-05-01

    The hnRNP D protein interacts with nucleic acids both in vivo and in vitro. Like many other proteins that interact with RNA, it contains RBD (or "RRM") domains and arg-gly-gly (RGG) motifs. We have examined the organization and localization of the human and murine genes that encode the hnRNP D protein. Comparison of the predicted sequences of the hnRNP D proteins in human and mouse shows that they are 96.9% identical (98.9% similar). This very high level of conservation suggests a critical function for hnRNP D. Sequence analysis of the human HNRPD gene shows that the protein is encoded by eight exons and that two additional exons specify sequences in the 3' UTR. Use of two of the coding exons is determined by alternative splicing of the HNRPD mRNA. The human HNRPD gene maps to 4q21. The mouse Hnrpd gene maps to the F region of chromosome 3, which is syntenic with the human 4q21 region.
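
    A figure such as "96.9% identical" is a percent identity computed over an alignment of the two protein sequences. The sketch below shows one common way to compute it in Python. It assumes the two sequences are already aligned to equal length with "-" marking gaps, and it skips gap columns; other conventions count gaps in the denominator, and the abstract does not say which was used. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over the gap-free columns of an alignment.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length, and '-' marks a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences for illustration only, not the real hnRNP D proteins.
    print(percent_identity("MSEEQFGG", "MSEEQYGG"))
```

    Percent similarity, the second figure quoted in the abstract, additionally counts conservative substitutions as matches and therefore needs a substitution matrix, which this sketch leaves out.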

  10. Mammalian ets-1 and ets-2 genes encode highly conserved proteins.

    PubMed Central

    Watson, D K; McWilliams, M J; Lapis, P; Lautenberger, J A; Schweinfest, C W; Papas, T S

    1988-01-01

    Cellular ets sequences homologous to v-ets of the avian leukemia virus E26 are highly conserved. In mammals the ets sequences are dispersed on two separate chromosomal loci, called ets-1 and ets-2. To determine the structure of these two genes and identify the open reading frames that code for the putative proteins, we have sequenced human ets-1 cDNAs and ets-2 cDNA clones obtained from both human and mouse. The human ETS1 gene is capable of encoding a protein of 441 amino acids. This protein is greater than 95% identical to the chicken c-ets-1 gene product. Thus, the human ETS1 gene is homologous to the chicken c-ets-1 gene, the protooncogene that the E26 virus transduced. Human and mouse ets-2 cDNA clones are closely related and contain open reading frames capable of encoding proteins of 469 and 468 residues, respectively. Direct comparison of these data with previously published findings indicates that ets is a family of genes whose members share distinct domains. PMID:2847145

  11. Specific binding of eukaryotic ORC to DNA replication origins depends on highly conserved basic residues

    PubMed Central

    Kawakami, Hironori; Ohashi, Eiji; Kanamoto, Shota; Tsurimoto, Toshiki; Katayama, Tsutomu

    2015-01-01

    In eukaryotes, the origin recognition complex (ORC) heterohexamer preferentially binds replication origins to trigger initiation of DNA replication. Crystallographic studies using eubacterial and archaeal ORC orthologs suggested that eukaryotic ORC may bind to origin DNA via putative winged-helix DNA-binding domains and AAA+ ATPase domains. However, the mechanisms by which eukaryotic ORC recognizes origin DNA remain elusive. Here, we show in budding yeast that Lys-362 and Arg-367 residues of the largest subunit (Orc1), both outside the aforementioned domains, are crucial for specific binding of ORC to origin DNA. These basic residues, which reside in a putative disordered domain, were dispensable for interaction with ATP and non-specific DNA sequences, suggesting a specific role in recognition. Consistent with this, both residues were required for origin binding of Orc1 in vivo. A truncated Orc1 polypeptide containing these residues solely recognizes ARS sequence with low affinity, and the Arg-367 residue stimulates the sequence-specific binding mode of the polypeptide. Lys-362 and Arg-367 residues of Orc1 are highly conserved among eukaryotic ORCs, but not in eubacterial and archaeal orthologs, suggesting a eukaryote-specific mechanism underlying recognition of replication origins by ORC. PMID:26456755

  12. High Throughput Sequence Analysis for Disease Resistance in Maize

    USDA-ARS's Scientific Manuscript database

    Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...

  13. The sequence of learning cycle activities in high school chemistry

    NASA Astrophysics Data System (ADS)

    Abraham, Michael R.; Renner, John W.

    The sequence of the three phases of two high school learning cycles in chemistry was altered in order to: (1) give insights into the factors which account for the success of the learning cycle, (2) serve as an indirect test of the association between Piaget's theory and the learning cycle, and (3) compare the learning cycle with traditional instruction. Each of the six sequences (one normal and five altered) was studied with content and attitude measures. The outcomes of the study supported the contention that the normal learning cycle sequence is the optimum sequence for achievement of content knowledge.

  14. Species identification using genetic tools: the value of nuclear and mitochondrial gene sequences in whale conservation.

    PubMed

    Palumbi, S R; Cipriano, F

    1998-01-01

    DNA sequence analysis is a powerful tool for identifying the source of samples thought to be derived from threatened or endangered species. Analysis of mitochondrial DNA (mtDNA) from retail whale meat markets has shown consistently that the expected baleen whale in these markets, the minke whale, makes up only about half the products analyzed. The other products are either unregulated small toothed whales like dolphins or are protected baleen whales such as humpback, Bryde's, fin, or blue whales. Independent verification of such mtDNA identifications requires analysis of nuclear genetic loci, but this is technically more difficult than standard mtDNA sequencing. In addition, evolution of species-specific sequences (i.e., fixation of sequence differences to produce reciprocally monophyletic gene trees) is slower in nuclear than in mitochondrial genes primarily because genetic drift is slower at nuclear loci. When will use of nuclear sequences allow forensic DNA identification? Comparison of neutral theories of coalescence of mitochondrial and nuclear loci suggests a simple rule of thumb. The "three-times rule" suggests that phylogenetic sorting at nuclear loci is likely to produce species-specific sequences when mitochondrial alleles are reciprocally monophyletic and the branches leading to the mtDNA sequences of a species are three times longer than the average difference observed within species. A preliminary test of the three-times rule, which depends on many assumptions about the species and genes involved, suggests that blue and fin whales should have species-specific sequences at most neutral nuclear loci, whereas humpback and fin whales should show species-specific sequences at fewer nuclear loci. Partial sequences of actin introns from these species confirm the predictions of the three-times rule and show that blue and fin whales are reciprocally monophyletic at this locus. These intron sequences are thus good tools for the identification of these species
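
    The "three-times rule" described in this abstract reduces to a simple comparison, sketched here in Python. The sketch assumes that "three times longer" can be read as "at least three times", that branch length and within-species difference are measured in the same units (for example, substitutions per site), and that reciprocal monophyly of the mtDNA alleles has been established separately. The function name and the example numbers are hypothetical and are not data from the paper.

```python
def passes_three_times_rule(
    branch_length: float,
    mean_within_species_difference: float,
    reciprocally_monophyletic: bool = True,
) -> bool:
    """Rule-of-thumb check for expecting species-specific nuclear sequences.

    Returns True when the mtDNA alleles are reciprocally monophyletic and the
    branch leading to the species' mtDNA sequences is at least three times the
    average difference observed within the species (an assumed reading of the
    rule as stated in the abstract).
    """
    if branch_length < 0 or mean_within_species_difference < 0:
        raise ValueError("distances must be non-negative")
    if not reciprocally_monophyletic:
        return False
    return branch_length >= 3.0 * mean_within_species_difference


if __name__ == "__main__":
    # Hypothetical distances in substitutions per site.
    print(passes_three_times_rule(0.06, 0.01))  # long branch, low diversity
    print(passes_three_times_rule(0.02, 0.01))  # branch too short
```

    As the abstract notes, the rule depends on many assumptions about the species and genes involved, so a check like this is a screening heuristic and not a substitute for sequencing nuclear loci.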

  15. Sex-biased gene expression and sequence conservation in Atlantic and Pacific salmon lice (Lepeophtheirus salmonis).

    PubMed

    Poley, Jordan D; Sutherland, Ben J G; Jones, Simon R M; Koop, Ben F; Fast, Mark D

    2016-07-04

    Salmon lice, Lepeophtheirus salmonis (Copepoda: Caligidae), are highly important ectoparasites of farmed and wild salmonids, and cause multi-million dollar losses to the salmon aquaculture industry annually. Salmon lice display extensive sexual dimorphism in ontogeny, morphology, physiology, behavior, and more. Therefore, the identification of transcripts with differential expression between males and females (sex-biased transcripts) may help elucidate the relationship between sexual selection and sexually dimorphic characteristics. Sex-biased transcripts were identified from transcriptome analyses of three L. salmonis populations, including both Atlantic and Pacific subspecies. A total of 35-43 % of all quality-filtered transcripts were sex-biased in L. salmonis, with male-biased transcripts exhibiting higher fold change than female-biased transcripts. For Gene Ontology and functional analyses, a consensus-based approach was used to identify concordantly differentially expressed sex-biased transcripts across the three populations. A total of 127 male-specific transcripts (i.e. those without detectable expression in any female) were identified, and were enriched with reproductive functions (e.g. seminal fluid and male accessory gland proteins). Other sex-biased transcripts involved in morphogenesis, feeding, energy generation, and sensory and immune system development and function were also identified. Interestingly, as observed in model systems, male-biased L. salmonis transcripts were more frequently without annotation compared to female-biased or unbiased transcripts, suggesting higher rates of sequence divergence in male-biased transcripts. Transcriptome differences between male and female L. salmonis described here provide key insights into the molecular mechanisms controlling sexual dimorphism in L. salmonis. This analysis offers targets for parasite control and provides a foundation for further analyses exploring critical topics such as the interaction

  16. Neurofibromatosis type 1 gene mutation analysis using sequence capture and high-throughput sequencing.

    PubMed

    Uusitalo, Elina; Hammais, Anna; Palonen, Elina; Brandt, Annika; Mäkelä, Ville-Veikko; Kallionpää, Roope; Jouhilahti, Eeva-Mari; Pöyhönen, Minna; Soini, Juhani; Peltonen, Juha; Peltonen, Sirkku

    2014-11-01

    Neurofibromatosis type 1 syndrome (NF1) is caused by mutations in the NF1 gene. Availability of new sequencing technology prompted us to search for an alternative method for NF1 mutation analysis. Genomic DNA was isolated from saliva, avoiding invasive sampling. The NF1 exons with an additional 50 bp of flanking intronic sequences were captured and enriched using the SeqCap EZ Choice Library protocol. The captured DNA was sequenced with the Roche/454 GS Junior system. The mean coverages of the targeted regions were 41x and 74x in 2 separate sets of samples. An NF1 mutation was discovered in 10 out of 16 separate patient samples. Our study provides proof of principle that the sequence capture methodology combined with high-throughput sequencing is applicable to NF1 mutation analysis. Deep intronic mutations may, however, remain undetectable, and change at the DNA level may not predict the outcome at the mRNA or protein levels.

  17. Characterization and complete genome sequence of a panicovirus from Bermuda grass by high-throughput sequencing.

    PubMed

    Tahir, Muhammad N; Lockhart, Ben; Grinstead, Samuel; Mollov, Dimitre

    2017-04-01

    Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high-throughput sequencing (HTS). The nearly full genome sequence of a panicovirus was identified from one HTS scaffold. Sanger sequencing was used to confirm the HTS results and complete the genome sequence of 4404 nt. This virus was provisionally named Bermuda grass latent virus (BGLV). Its predicted open reading frames follow the typical arrangement of the genus Panicovirus. Based on sequence comparisons and phylogenetic analyses BGLV differs from other viruses and therefore taxonomically it is a new member of the genus Panicovirus, family Tombusviridae.

  18. Highly effective sequencing whole chloroplast genomes of angiosperms by nine novel universal primer pairs.

    PubMed

    Yang, Jun-Bo; Li, De-Zhu; Li, Hong-Tao

    2014-09-01

    Chloroplast genomes supply indispensable information that helps improve phylogenetic resolution and can even serve as organelle-scale barcodes. Next-generation sequencing technologies have helped promote sequencing of complete chloroplast genomes, but compared with the number of angiosperms, relatively few chloroplast genomes have been sequenced. There are two major reasons for the paucity of completely sequenced chloroplast genomes: (i) massive amounts of fresh leaves are needed for chloroplast sequencing and (ii) there are considerable gaps in the sequenced chloroplast genomes of many plants because of the difficulty of isolating high-quality chloroplast DNA, preventing complete chloroplast genomes from being assembled. To overcome these obstacles, we analysed all angiosperm chloroplast genomes available to date and then designed nine universal primer pairs corresponding to the highly conserved regions. Using these primers, angiosperm whole chloroplast genomes can be amplified using long-range PCR and sequenced using next-generation sequencing methods. The primers showed high universality, which was tested using 24 species representing major clades of angiosperms. To validate the functionality of the primers, the complete chloroplast genomes of eight species representing major groups of angiosperms, that is, early-diverging angiosperms, magnoliids, monocots, Saxifragales, fabids, malvids and asterids, were sequenced and assembled. In our trials, only 100 mg of fresh leaves was used. The results show that the universal primer set provided an easy, effective and feasible approach for sequencing whole chloroplast genomes in angiosperms. The designed universal primer pairs make it possible to accelerate genome-scale data acquisition and will therefore improve phylogenetic resolution and species identification in angiosperms.

  19. A New DNA Binding Protein Highly Conserved in Diverse Crenarchaeal Viruses

    SciTech Connect

    Larson, E.T.; Eilers, B.J.; Reiter, D.; Ortmann, A.C.; Young, M.J.; Lawrence, C.M.; /Montana State U. /Tubingen U.

    2007-07-09

    Sulfolobus turreted icosahedral virus (STIV) infects Sulfolobus species found in the hot springs of Yellowstone National Park. Its 37 open reading frames (ORFs) generally lack sequence similarity to other genes. One exception, however, is ORF B116. While its function is unknown, orthologs are found in three additional crenarchaeal viral families. Due to the central importance of this protein family to crenarchaeal viruses, we have undertaken structural and biochemical studies of B116. The structure reveals a previously unobserved fold consisting of a five-stranded beta-sheet flanked on one side by three alpha helices. Two subunits come together to form a homodimer with a 10-stranded mixed beta-sheet, where the topology of the central strands resembles an unclosed beta-barrel. Highly conserved loops rise above the surface of the saddle-shaped protein and suggest an interaction with the major groove of DNA. The predicted B116-DNA interaction is confirmed by electrophoretic mobility shift assays.

  20. 7 CFR 760.821 - Compliance with highly erodible land and wetland conservation.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 7 2014-01-01 2014-01-01 false Compliance with highly erodible land and wetland... Disaster Program § 760.821 Compliance with highly erodible land and wetland conservation. (a) The highly erodible land and wetland conservation provisions of part 12 of this title apply to the receipt of...

  1. 7 CFR 760.821 - Compliance with highly erodible land and wetland conservation.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 7 2011-01-01 2011-01-01 false Compliance with highly erodible land and wetland... Disaster Program § 760.821 Compliance with highly erodible land and wetland conservation. (a) The highly erodible land and wetland conservation provisions of part 12 of this title apply to the receipt of...

  2. 7 CFR 760.821 - Compliance with highly erodible land and wetland conservation.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 7 2012-01-01 2012-01-01 false Compliance with highly erodible land and wetland... Disaster Program § 760.821 Compliance with highly erodible land and wetland conservation. (a) The highly erodible land and wetland conservation provisions of part 12 of this title apply to the receipt of...

  3. 7 CFR 760.821 - Compliance with highly erodible land and wetland conservation.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 7 2010-01-01 2010-01-01 false Compliance with highly erodible land and wetland... Disaster Program § 760.821 Compliance with highly erodible land and wetland conservation. (a) The highly erodible land and wetland conservation provisions of part 12 of this title apply to the receipt of...

  4. 7 CFR 760.821 - Compliance with highly erodible land and wetland conservation.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 7 2013-01-01 2013-01-01 false Compliance with highly erodible land and wetland... Disaster Program § 760.821 Compliance with highly erodible land and wetland conservation. (a) The highly erodible land and wetland conservation provisions of part 12 of this title apply to the receipt of...

  5. Conservation of nucleotide sequences for molecular diagnosis of Middle East respiratory syndrome coronavirus, 2015.

    PubMed

    Furuse, Yuki; Okamoto, Michiko; Oshitani, Hitoshi

    2015-11-01

    Infection due to the Middle East respiratory syndrome coronavirus (MERS-CoV) is widespread. The present study was performed to assess the protocols used for the molecular diagnosis of MERS-CoV by analyzing the nucleotide sequences of viruses detected between 2012 and 2015, including sequences from the large outbreak in eastern Asia in 2015. Although the diagnostic protocols were established only 2 years ago, mismatches between the sequences of primers/probes and viruses were found for several of the assays. Such mismatches could lead to a lower sensitivity of the assay, thereby leading to false-negative diagnosis. A slight modification in the primer design is suggested. Protocols for the molecular diagnosis of viral infections should be reviewed regularly after they are established, particularly for viruses that pose a great threat to public health such as MERS-CoV.
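
    The kind of primer/target mismatch check this record describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' protocol: the primer and target strings are made-up placeholders, `count_mismatches` and `flag_sites` are hypothetical helper names, and each target site is assumed to be already aligned to the primer at equal length.

```python
from typing import Dict, List


def count_mismatches(primer: str, site: str) -> int:
    """Count positions where an aligned primer and target site differ."""
    if len(primer) != len(site):
        raise ValueError("primer and site must be aligned to equal length")
    return sum(1 for p, s in zip(primer.upper(), site.upper()) if p != s)


def flag_sites(primer: str, sites: Dict[str, str], max_mismatches: int = 0) -> List[str]:
    """Return the names of target sites with more mismatches than allowed."""
    return [name for name, site in sites.items()
            if count_mismatches(primer, site) > max_mismatches]


# Hypothetical primer and target sites, for illustration only.
primer = "ACGTTGCA"
sites = {
    "strain_a": "ACGTTGCA",  # perfect match
    "strain_b": "ACGTAGCA",  # one mismatch
    "strain_c": "TCGTAGCA",  # two mismatches
}
flagged = flag_sites(primer, sites)
```

    Re-running such a check whenever new viral sequences are deposited is one simple way to act on the abstract's recommendation that diagnostic protocols be reviewed regularly.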

  6. A highly conserved NB-LRR encoding gene cluster effective against Setosphaeria turcica in sorghum

    PubMed Central

    2011-01-01

    Background: The fungal pathogen Setosphaeria turcica causes turcicum or northern leaf blight disease on maize, sorghum and related grasses, a prevalent foliar disease found worldwide where the two host crops, maize and sorghum, are grown. The aim of the present study was to find genes controlling the host defense response to this devastating plant pathogen. A cDNA-AFLP approach was taken to identify candidate sequences, whose functions were further validated via virus-induced gene silencing (VIGS) and real-time PCR analysis. Phylogenetic analysis was performed to address evolutionary events. Results: cDNA-AFLP analysis was run on susceptible and resistant sorghum and maize genotypes to identify resistance-related sequences. One CC-NB-LRR encoding gene, GRMZM2G005347, was found among the up-regulated maize transcripts after fungal challenge. The new plant resistance gene was designated as St, referring to S. turcica. Genome sequence comparison revealed that the CC-NB-LRR encoding St genes are located on chromosome 2 in maize, and on chromosome 5 in sorghum. The six St sorghum genes reside in three pairs in one locus. When the sorghum St genes were silenced via VIGS, the resistance was clearly compromised, an observation that was supported by real-time PCR. Database searches and phylogenetic analysis suggest that the St genes have a common ancestor present before the grass subfamily split 50-70 million years ago. Today, 6 genes are present in sorghum, 9 in rice and foxtail millet, respectively, 3 in maize and 4 in Brachypodium distachyon. The St gene homologs all have highly conserved sequences, and commonly reside as gene pairs in the grass genomes. Conclusions: Resistance genes to S. turcica, with a CC-NB-LRR protein domain architecture, have been found in maize and sorghum. VIGS analysis revealed their importance in the surveillance to S. turcica in sorghum. The St genes are highly conserved in sorghum, rice, foxtail millet, maize and Brachypodium, suggesting an

  7. Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement

    SciTech Connect

    Le Coq, Johanne; Ghosh, Partho

    2012-06-19

    Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (~16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented approximate 10^20 potential variability of TvpA. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.

  8. Evolution of ITS1 rDNA in the Digenea (Platyhelminthes: Trematoda): 3' end sequence conservation and its phylogenetic utility.

    PubMed

    vd Schulenburg, J H; Englisch, U; Wägele, J W

    1999-01-01

    A comparison of ribosomal internal transcribed spacer 1 (ITS1) elements of digenetic trematodes (Platyhelminthes) including unidentified digeneans isolated from Cyathura carinata (Crustacea: Isopoda) revealed DNA sequence similarities at more than half of the spacer at its 3' end. Primary sequence similarity was shown to be associated with secondary structure conservation, which suggested that similarity is due to identity by descent and not chance. Using an analysis of apomorphies, the sequence data were shown to produce a distinct phylogenetic signal. This was confirmed by the consistency of results of different tree reconstruction methods such as distance approaches, maximum parsimony, and maximum likelihood. Morphological evidence additionally supported the phylogenetic tree based on ITS1 data and the inferred phylogenetic position of the unidentified digeneans of C. carinata met the expectations from known trematode life-cycle patterns. Although ribosomal ITS1 elements are generally believed to be too variable for phylogenetic analysis above the species or genus level, the overall consistency of the results of this study strongly suggests that this is not the case in digenetic trematodes. Here, 3' end ITS1 sequence data seem to provide a valuable tool for elucidating phylogenetic relationships of a broad range of phylogenetically distinct taxa.

  9. Length heterogeneity at conserved sequence block 2 in human mitochondrial DNA acts as a rheostat for RNA polymerase POLRMT activity

    PubMed Central

    Tan, Benedict G.; Wellesley, Frederick C.; Savery, Nigel J.; Szczelkun, Mark D.

    2016-01-01

    The guanine (G)-tract of conserved sequence block 2 (CSB 2) in human mitochondrial DNA can result in transcription termination due to formation of a hybrid G-quadruplex between the nascent RNA and the nontemplate DNA strand. This structure can then influence genome replication, stability and localization. Here we surveyed the frequency of variation in sequence identity and length at CSB 2 amongst human mitochondrial genomes and used in vitro transcription to assess the effects of this length heterogeneity on the activity of the mitochondrial RNA polymerase, POLRMT. In general, increased G-tract length correlated with increased termination levels. However, variation in the population favoured CSB 2 sequences which produced efficient termination while particularly weak or strong signals were avoided. For all variants examined, the 3′ end of the transcripts mapped to the same downstream sequences and were prevented from terminating by addition of the transcription factor TEFM. We propose that CSB 2 length heterogeneity allows variation in the efficiency of transcription termination without affecting the position of the products or the capacity for regulation by TEFM. PMID:27436287

  10. Complete sequence of the mitochondrial DNA in the sea urchin Arbacia lixula: conserved features of the echinoid mitochondrial genome.

    PubMed

    De Giorgi, C; Martiradonna, A; Lanave, C; Saccone, C

    1996-04-01

    The complete nucleotide sequence (15,719 nucleotides) of the mitochondrial DNA (mtDNA) from the sea urchin Arbacia lixula is presented. The comparison of gene arrangement between different echinoderm orders of the same class provides evidence that the gene organization is conserved within the same echinoderm class. The peculiarities of sea urchin mtDNA features, already described, are confirmed by the A. lixula mtDNA sequence. The comparison of the entire sequences of mtDNA among A. lixula, Paracentrotus lividus, and Strongylocentrotus purpuratus allowed us to detect peculiar features, common to the three sea urchin species, that can represent the molecular signature of the mt genome in the sea urchin group. Analysis of the nucleotide composition indicates that A. lixula mtDNA, in contrast with the mtDNA of other sea urchins, shows a bias in the use of T and tends to avoid the use of C, most evident in the neutral part of the molecule, such as the third codon positions. This observation indicates that the three sea urchin mtDNAs evolve under different mutation pressure. Analysis of the sequence evolution allowed us to confirm the phylogenetic tree. However, the absolute divergence time, calculated on the basis of paleontological estimates, largely diverged from the expected one.

  11. A conserved unusual posttranscriptional processing mediated by short, direct repeated (SDR) sequences in plants.

    PubMed

    Niu, Xiangli; Luo, Di; Gao, Shaopei; Ren, Guangjun; Chang, Lijuan; Zhou, Yuke; Luo, Xiaoli; Li, Yuxiang; Hou, Pei; Tang, Wei; Lu, Bao-Rong; Liu, Yongsheng

    2010-01-01

    In several stress-responsive gene loci of monocot cereal crops, we have previously identified an unusual posttranscriptional processing event mediated by the paired presence of short direct repeated (SDR) sequences at 5' and 3' splicing junctions that are distinct from conventional (U2/U12-type) splicing boundaries. By using the known SDR-containing sequences as probes, 24 plant candidate genes involved in diverse functional pathways from both monocots and dicots that potentially possess SDR-mediated posttranscriptional processing were predicted in the GenBank database. The SDR-mediated posttranscriptional processing events, including cis- and trans-actions, were experimentally detected in the majority of the predicted candidates. Extensive sequence analysis demonstrates several types of SDR-associated splicing peculiarities, including partial exon deletion, exon fragment repetition, exon fragment scrambling and trans-splicing, that result in either loss of partial exon or unusual exonic sequence rearrangements within or between RNA molecules. In addition, we show that the paired presence of SDR is necessary but not sufficient for SDR-mediated splicing in transient expression and stable transformation systems. We also show that prokaryotes are incapable of SDR-mediated pre-mRNA splicing.

  12. Conservation and Recombination in the Genome Sequence of Haemophilus influenzae Type f WAPHL1.

    PubMed

    Bateman, Allen C; Perez-Osorio, Ailyn C; Li, Zhen; Tran, Michael; Greninger, Alexander L

    2017-09-21

    We report here the second draft genome sequence of a bloodstream isolate of Haemophilus influenzae serotype f. Three discrete 3.1- to 7.8-kb sites contained 80% of the variability in the genome, consistent with recombination in known virulence factors. Copyright © 2017 Bateman et al.

  13. A highly conserved nucleotide string shared by all genomes of human papillomaviruses.

    PubMed

    Campione-Piccardo, J; Montpetit, M L; Grégoire, L; Arella, M

    1991-10-01

    The nucleotide string TAAAACGAAAGT is the longest perfect homology shared by all sequenced human papillomavirus genomes. This nucleotide string, which was also found to be highly specific for human papillomavirus genomes, shares the same genomic position in all viral types (5' end of the E1 open reading frame) and putatively codes in every case for the same amino acids. One possible evolutionary model was used to estimate the probability of random occurrence of the nucleotide string in 10 human papillomavirus genomes. It assumed that the universal string had been subjected to the same mutation rate as the entire E1 open reading frame. The estimated probability was found to be very low, suggesting that the conservation of the string could not have resulted from random divergence and that its conservation among human papillomaviruses is likely to reflect the occurrence of biological constraints. It is speculated that this nucleotide string may be required to code for amino acids indispensable for the nuclear localization of E1-coded peptides or to bind cellular factors affecting viral replicative functions. Definitive evidence is expected to come from oligonucleotide-protein binding experiments and from site-directed mutagenesis of cloned HPV genomes. This motif, universal among human papillomaviruses, is being successfully used in the design of consensus primers from the early region.
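
    The notion of a "longest perfect homology shared by all" genomes can be illustrated with a brute-force search for the longest substring common to a set of sequences. This is only a sketch under stated assumptions: the toy sequences below are invented (each simply embeds the string quoted in the abstract), `longest_shared_substring` is a hypothetical helper, and the nested loops are adequate for short inputs only, not for real genome comparisons.

```python
from typing import List


def longest_shared_substring(seqs: List[str]) -> str:
    """Return the longest string present in every sequence (first found among ties)."""
    if not seqs:
        return ""
    shortest = min(seqs, key=len)
    # Try candidate lengths from longest to shortest; stop at the first hit.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


# Toy sequences, for illustration only; the flanking bases are invented.
MOTIF = "TAAAACGAAAGT"  # string quoted in the abstract
toy = ["GGC" + MOTIF + "CCA", "AT" + MOTIF + "GGTT", "C" + MOTIF + "A"]
shared = longest_shared_substring(toy)
```

    The search only finds the shared string; the abstract's argument that its conservation is non-random rests on a separate evolutionary model that this sketch does not attempt to reproduce.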

  14. Structure-function studies of nerve growth factor: functional importance of highly conserved amino acid residues.

    PubMed Central

    Ibáñez, C F; Hallböök, F; Ebendal, T; Persson, H

    1990-01-01

    Selected amino acid residues in chicken nerve growth factor (NGF) were replaced by site-directed mutagenesis. Mutated NGF sequences were transiently expressed in COS cells and the yield of NGF protein in conditioned medium was quantified by Western blotting. Binding of each mutant to NGF receptors on PC12 cells was evaluated in a competition assay. The biological activity was determined by measuring stimulation of neurite outgrowth from chick sympathetic ganglia. The residues homologous to the proposed receptor binding site of insulin (Ser18, Met19, Val21, Asp23) were substituted by Ala. Replacement of Ser18, Met19 and Asp23 did not affect NGF activity. Modification of Val21 notably reduced both receptor binding and biological activity, suggesting that this residue is important to retain a fully active NGF. The highly conserved Tyr51 and Arg99 were converted into Phe and Lys respectively, without changing the biological properties of the molecule. However, binding and biological activity were greatly impaired after the simultaneous replacement of both Arg99 and Arg102 by Gly. The three conserved Trp residues at positions 20, 75 and 98 were substituted by Phe. The Trp mutated proteins retained 15-60% of receptor binding and 40-80% of biological activity, indicating that the Trp residues are not essential for NGF activity. However, replacement of Trp20 significantly reduced the amount of NGF in the medium, suggesting that this residue may be important for protein stability. PMID:2328722

  15. A highly conserved SOX6 double binding site mediates SOX6 gene downregulation in erythroid cells

    PubMed Central

    Cantù, Claudio; Grande, Vito; Alborelli, Ilaria; Cassinelli, Letizia; Cantù, Ileana; Colzani, Maria Teresa; Ierardi, Rossella; Ronzoni, Luisa; Cappellini, Maria Domenica; Ferrari, Giuliana; Ottolenghi, Sergio; Ronchi, Antonella

    2011-01-01

    The Sox6 transcription factor plays critical roles in various cell types, including erythroid cells. Sox6-deficient mice are anemic due to impaired red cell maturation and show inappropriate globin gene expression in definitive erythrocytes. To identify new Sox6 target genes in erythroid cells, we used the known repressive double Sox6 consensus within the εy-globin promoter to perform a bioinformatic genome-wide search for similar, evolutionarily conserved motifs located within genes whose expression changes during erythropoiesis. We found a highly conserved Sox6 consensus within the Sox6 human gene promoter itself. This sequence is bound by Sox6 in vitro and in vivo, and mediates transcriptional repression in transient transfections in human erythroleukemic K562 cells and in primary erythroblasts. The binding of a lentiviral transduced Sox6FLAG protein to the endogenous Sox6 promoter is accompanied, in erythroid cells, by strong downregulation of the endogenous Sox6 transcript and by decreased in vivo chromatin accessibility of this region to the PstI restriction enzyme. These observations suggest that the negative Sox6 autoregulation, mediated by the double Sox6 binding site within its own promoter, may be relevant to control the Sox6 transcriptional downregulation that we observe in human erythroid cultures and in mouse bone marrow cells in late erythroid maturation. PMID:20852263

  16. Sequence divergence and conservation in genomes of Helicobacter cetorum strains from a dolphin and a whale.

    PubMed

    Kersulyte, Dangeruta; Rossi, Mirko; Berg, Douglas E

    2013-01-01

    Strains of Helicobacter cetorum have been cultured from several marine mammals and have been found to be closely related in 16S rDNA sequence to the human gastric pathogen H. pylori, but their genomes were not characterized further. The genomes of H. cetorum strains from a dolphin and a whale were sequenced completely using 454 technology and PCR and capillary sequencing. These genomes are 1.8 and 1.95 Mb in size, some 7-26% larger than H. pylori genomes, and differ markedly from one another in gene content, and sequences and arrangements of shared genes. However, each strain is more related overall to H. pylori and its descendant H. acinonychis than to other known species. These H. cetorum strains lack cag pathogenicity islands, but contain novel alleles of the virulence-associated vacuolating cytotoxin (vacA) gene. Of particular note are (i) an extra triplet of vacA genes with ≤50% protein-level identity to each other in the 5' two-thirds of the gene needed for host factor interaction; (ii) divergent sets of outer membrane protein genes; (iii) several metabolic genes distinct from those of H. pylori; (iv) genes for an iron-cofactored urease related to those of Helicobacter species from terrestrial carnivores, in addition to genes for a nickel-cofactored urease; and (v) members of the slr multigene family, some of which modulate host responses to infection and improve Helicobacter growth with mammalian cells. Our genome sequence data provide a glimpse into the novelty and great genetic diversity of marine helicobacters. These data should aid further analyses of microbial genome diversity and evolution and infection and disease mechanisms in vast and often fragile ocean ecosystems.

  17. High-Resolution Genuinely Multidimensional Solution of Conservation Laws by the Space-Time Conservation Element and Solution Element Method

    NASA Technical Reports Server (NTRS)

    Himansu, Ananda; Chang, Sin-Chung; Yu, Sheng-Tao; Wang, Xiao-Yen; Loh, Ching-Yuen; Jorgenson, Philip C. E.

    1999-01-01

    In this overview paper, we review the basic principles of the method of space-time conservation element and solution element for solving the conservation laws in one and two spatial dimensions. The present method is developed on the basis of local and global flux conservation in a space-time domain, in which space and time are treated in a unified manner. In contrast to the modern upwind schemes, the approach here does not use the Riemann solver and the reconstruction procedure as the building blocks. The drawbacks of the upwind approach, such as the difficulty of rationally extending the 1D scalar approach to systems of equations and particularly to multiple dimensions, are here contrasted with the uniformity and ease of generalization of the Conservation Element and Solution Element (CE/SE) 1D scalar schemes to systems of equations and to multiple spatial dimensions. The assured compatibility with the simplest type of unstructured meshes and the uniquely simple nonreflecting boundary conditions of the present method are also discussed. The present approach has yielded high-resolution shocks, rarefaction waves, acoustic waves, vortices, ZND detonation waves, and shock/acoustic waves/vortices interactions. Moreover, since no directional splitting is employed, numerical resolution of two-dimensional calculations is comparable to that of the one-dimensional calculations. Some sample applications displaying the strengths and broad applicability of the CE/SE method are reviewed.

  18. Sequence Similarity of Clostridium difficile Strains by Analysis of Conserved Genes and Genome Content Is Reflected by Their Ribotype Affiliation

    PubMed Central

    Kurka, Hedwig; Ehrenreich, Armin; Ludwig, Wolfgang; Monot, Marc; Rupnik, Maja; Barbut, Frederic; Indra, Alexander; Dupuy, Bruno; Liebl, Wolfgang

    2014-01-01

    PCR-ribotyping is a broadly used method for the classification of isolates of Clostridium difficile, an emerging intestinal pathogen, causing infections with increased disease severity and incidence in several European and North American countries. We have now carried out clustering analysis with selected genes of numerous C. difficile strains as well as gene content comparisons of their genomes in order to broaden our view of the relatedness of strains assigned to different ribotypes. We analyzed the genomic content of 48 C. difficile strains representing 21 different ribotypes. The calculation of distance matrix-based dendrograms using the neighbor joining method for 14 conserved genes (standard phylogenetic marker genes) from the genomes of the C. difficile strains demonstrated that the genes from strains with the same ribotype generally clustered together. Further, certain ribotypes always clustered together and formed ribotype groups, i.e. ribotypes 078, 033 and 126, as well as ribotypes 002 and 017, indicating their relatedness. Comparisons of the gene contents of the genomes of ribotypes that clustered according to the conserved gene analysis revealed that the number of common genes of the ribotypes belonging to each of these three ribotype groups were very similar for the 078/033/126 group (at most 69 specific genes between the different strains with the same ribotype) but less similar for the 002/017 group (86 genes difference). It appears that the ribotype is indicative not only of a specific pattern of the amplified 16S–23S rRNA intergenic spacer but also reflects specific differences in the nucleotide sequences of the conserved genes studied here. It can be anticipated that the sequence deviations of more genes of C. difficile strains are correlated with their PCR-ribotype. In conclusion, the results of this study corroborate and extend the concept of clonal C. difficile lineages, which correlate with ribotype affiliation. PMID:24482682

  19. Storage and retrieval of highly repetitive sequence collections.

    PubMed

    Mäkinen, Veli; Navarro, Gonzalo; Sirén, Jouni; Välimäki, Niko

    2010-03-01

    A repetitive sequence collection is a set of sequences which are small variations of each other. A prominent example is the genome sequences of individuals of the same or close species, where the differences can be expressed by short lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is plausible using suffix trees. However, the suffix tree occupies much space, which very soon inhibits in-memory analyses. Recent advances in full-text indexing reduce the space of the suffix tree to, essentially, that of the compressed sequences, while retaining its functionality with only a polylogarithmic slowdown. However, the underlying compression model considers only the predictability of the next sequence symbol given the k previous ones, where k is a small integer. This is unable to capture longer-term repetitiveness. For example, r identical copies of an incompressible sequence will be incompressible under this model. We develop new static and dynamic full-text indexes that are capable of capturing the fact that a collection is highly repetitive, and require space basically proportional to the length of one typical sequence plus the total number of edit operations. The new indexes can be plugged into a recent dynamic fully-compressed suffix tree, achieving full functionality for sequence analysis, while retaining the reduced space and the polylogarithmic slowdown. Our experimental results confirm the practicality of our proposal.
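
    The space bound quoted in this record (one typical sequence plus the total number of edit operations) can be made concrete with a toy accounting example. This sketch is not the authors' index: it stores a plain reference string and per-sequence substitution lists, supports substitutions only, and all names and data are hypothetical.

```python
from typing import List, Tuple

Edit = Tuple[int, str]  # (0-based position, replacement base); substitutions only


def apply_edits(reference: str, edits: List[Edit]) -> str:
    """Rebuild one sequence from the reference and its substitution list."""
    seq = list(reference)
    for pos, base in edits:
        seq[pos] = base
    return "".join(seq)


def stored_symbols(reference: str, collection: List[List[Edit]]) -> int:
    """Symbols stored: one reference plus one entry per edit operation."""
    return len(reference) + sum(len(edits) for edits in collection)


# Toy data, for illustration only.
reference = "ACGTACGTACGT"
collection = [[], [(3, "A")], [(0, "T"), (11, "C")]]

sequences = [apply_edits(reference, edits) for edits in collection]
plain_size = sum(len(s) for s in sequences)            # every sequence stored verbatim
compact_size = stored_symbols(reference, collection)   # reference + edit count
```

    With three sequences of length 12 and three edits in total, the verbatim representation stores 36 symbols and the reference-plus-edits representation stores 15; the gap widens as more near-identical sequences are added.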

  20. RNA sequence and secondary structure participate in high-affinity CsrA-RNA interaction.

    PubMed

    Dubey, Ashok K; Baker, Carol S; Romeo, Tony; Babitzke, Paul

    2005-10-01

    The global Csr regulatory system controls bacterial gene expression post-transcriptionally. CsrA of Escherichia coli is an RNA binding protein that plays a central role in repressing several stationary phase processes and activating certain exponential phase functions. CsrA regulates translation initiation of several genes by binding to the mRNA leaders and blocking ribosome binding. CsrB and CsrC are noncoding regulatory RNAs that are capable of sequestering CsrA and antagonizing its activity. Each of the known target transcripts contains multiple CsrA binding sites, although considerable sequence variation exists among these RNA targets, with GGA being the most highly conserved element. High-affinity RNA ligands containing single CsrA binding sites were identified from a combinatorial library using systematic evolution of ligands by exponential enrichment (SELEX). The SELEX-derived consensus was determined as RUACARGGAUGU, with the ACA and GGA motifs being 100% conserved and the GU sequence being present in all but one ligand. The majority (51/55) of the RNAs contained GGA in the loop of a hairpin within the most stable predicted structure, an arrangement similar to several natural CsrA binding sites. Strikingly, the identity of several nucleotides that were predicted to form base pairs in each stem were 100% conserved, suggesting that primary sequence information was embedded within the base-paired region. The affinity of CsrA for several selected ligands was measured using quantitative gel mobility shift assays. A mutational analysis of one selected ligand confirmed that the conserved ACA, GGA, and GU residues were critical for CsrA binding and that RNA secondary structure participates in CsrA-RNA recognition.
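
    The SELEX-derived consensus RUACARGGAUGU uses the IUPAC code R for a purine (A or G), so it can be expanded into a regular expression and scanned against an RNA string. The sketch below is an illustration under stated assumptions: the example RNA is invented, `consensus_to_regex` and `find_sites` are hypothetical helpers, and exact primary-sequence matching ignores the secondary-structure context that the abstract reports as also contributing to recognition.

```python
import re
from typing import List

# IUPAC codes needed here: R = purine (A or G); other letters match themselves.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "[AG]"}

CONSENSUS = "RUACARGGAUGU"  # consensus quoted in the abstract


def consensus_to_regex(consensus: str) -> str:
    """Expand an IUPAC consensus into a regular-expression string."""
    return "".join(IUPAC[c] for c in consensus)


def find_sites(rna: str, consensus: str = CONSENSUS) -> List[int]:
    """Return 0-based start positions of exact consensus matches (overlaps allowed)."""
    lookahead = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in lookahead.finditer(rna.upper())]


# Hypothetical RNA, for illustration only.
rna = "CCAUACAAGGAUGUCC"
hits = find_sites(rna)
```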

  1. High-Throughput Next-Generation Sequencing of Polioviruses.

    PubMed

    Montmayeur, Anna M; Ng, Terry Fei Fan; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A; Oberste, M Steven; Burns, Cara C

    2017-02-01

    The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance.

  2. Conserved regulatory elements of the promoter sequence of the gene rpoH of enteric bacteria

    PubMed Central

    Ramírez-Santos, Jesús; Collado-Vides, Julio; García-Varela, Martin; Gómez-Eichelmann, M. Carmen

    2001-01-01

    The rpoH regulatory region of different members of the enteric bacteria family was sequenced or downloaded from GenBank and compared. In addition, the transcriptional start sites of rpoH of Yersinia frederiksenii and Proteus mirabilis, two distant members of this family, were determined. Sequences similar to the σ70 promoters P1, P4 and P5, to the σE promoter P3 and to boxes DnaA1, DnaA2, cAMP receptor protein (CRP) boxes CRP1, CRP2 and box CytR present in Escherichia coli K12, were identified in sequences of closely related bacteria such as: E.coli, Shigella flexneri, Salmonella enterica serovar Typhimurium, Citrobacter freundii, Enterobacter cloacae and Klebsiella pneumoniae. In more distant bacteria, Y.frederiksenii and P.mirabilis, the rpoH regulatory region has a distal P1-like σ70 promoter and two proximal promoters: a heat-induced σE-like promoter and a σ70 promoter. Sequences similar to the regulatory boxes were not identified in these bacteria. This study suggests that the general pattern of transcription of the rpoH gene in enteric bacteria includes a distal σ70 promoter, >200 nt upstream of the initiation codon, and two proximal promoters: a heat-induced σE-like promoter and a σ70 promoter. A second proximal σ70 promoter under catabolite-regulation is probably present only in bacteria closely related to E.coli. PMID:11139607

  3. Co-conservation of rRNA tetraloop sequences and helix length suggests involvement of the tetraloops in higher-order interactions

    NASA Technical Reports Server (NTRS)

    Hedenstierna, K. O.; Siefert, J. L.; Fox, G. E.; Murgola, E. J.

    2000-01-01

    Terminal loops containing four nucleotides (tetraloops) are common in structural RNAs, and they frequently conform to one of three sequence motifs, GNRA, UNCG, or CUUG. Here we compare available sequences and secondary structures for rRNAs from bacteria, and we show that helices capped by phylogenetically conserved GNRA loops display a strong tendency to be of conserved length. The simplest interpretation of this correlation is that the conserved GNRA loops are involved in higher-order interactions, intramolecular or intermolecular, resulting in a selective pressure for maintaining the lengths of these helices. A small number of conserved UNCG loops were also found to be associated with conserved length helices, consistent with the possibility that this type of tetraloop also takes part in higher-order interactions.
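
    The three tetraloop motifs named in this abstract can be read as simple patterns over a four-nucleotide loop. The following is a minimal sketch, not the authors' method: it assumes the standard IUPAC meanings (N = any base, R = purine, A or G), and the names `TETRALOOP_PATTERNS` and `classify_tetraloop` are hypothetical.

```python
import re

# Motif definitions taken from the abstract; N = any base, R = A or G (IUPAC).
# The dictionary and function names are illustrative, not from the paper.
TETRALOOP_PATTERNS = {
    "GNRA": re.compile(r"^G[ACGU][AG]A$"),
    "UNCG": re.compile(r"^U[ACGU]CG$"),
    "CUUG": re.compile(r"^CUUG$"),
}


def classify_tetraloop(loop):
    """Return the motif name matched by a 4-nt loop, or None if none matches."""
    # Accept DNA-style input by converting T to U before matching.
    loop = loop.upper().replace("T", "U")
    for name, pattern in TETRALOOP_PATTERNS.items():
        if pattern.match(loop):
            return name
    return None
```

    For example, `classify_tetraloop("GAAA")` falls under GNRA, while a loop such as `AAAA` matches none of the three motifs.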

  5. The first myriapod genome sequence reveals conservative arthropod gene content and genome organisation in the centipede Strigamia maritima.

    PubMed

    Chipman, Ariel D; Ferrier, David E K; Brena, Carlo; Qu, Jiaxin; Hughes, Daniel S T; Schröder, Reinhard; Torres-Oliva, Montserrat; Znassi, Nadia; Jiang, Huaiyang; Almeida, Francisca C; Alonso, Claudio R; Apostolou, Zivkos; Aqrawi, Peshtewani; Arthur, Wallace; Barna, Jennifer C J; Blankenburg, Kerstin P; Brites, Daniela; Capella-Gutiérrez, Salvador; Coyle, Marcus; Dearden, Peter K; Du Pasquier, Louis; Duncan, Elizabeth J; Ebert, Dieter; Eibner, Cornelius; Erikson, Galina; Evans, Peter D; Extavour, Cassandra G; Francisco, Liezl; Gabaldón, Toni; Gillis, William J; Goodwin-Horn, Elizabeth A; Green, Jack E; Griffiths-Jones, Sam; Grimmelikhuijzen, Cornelis J P; Gubbala, Sai; Guigó, Roderic; Han, Yi; Hauser, Frank; Havlak, Paul; Hayden, Luke; Helbing, Sophie; Holder, Michael; Hui, Jerome H L; Hunn, Julia P; Hunnekuhl, Vera S; Jackson, LaRonda; Javaid, Mehwish; Jhangiani, Shalini N; Jiggins, Francis M; Jones, Tamsin E; Kaiser, Tobias S; Kalra, Divya; Kenny, Nathan J; Korchina, Viktoriya; Kovar, Christie L; Kraus, F Bernhard; Lapraz, François; Lee, Sandra L; Lv, Jie; Mandapat, Christigale; Manning, Gerard; Mariotti, Marco; Mata, Robert; Mathew, Tittu; Neumann, Tobias; Newsham, Irene; Ngo, Dinh N; Ninova, Maria; Okwuonu, Geoffrey; Ongeri, Fiona; Palmer, William J; Patil, Shobha; Patraquim, Pedro; Pham, Christopher; Pu, Ling-Ling; Putman, Nicholas H; Rabouille, Catherine; Ramos, Olivia Mendivil; Rhodes, Adelaide C; Robertson, Helen E; Robertson, Hugh M; Ronshaugen, Matthew; Rozas, Julio; Saada, Nehad; Sánchez-Gracia, Alejandro; Scherer, Steven E; Schurko, Andrew M; Siggens, Kenneth W; Simmons, DeNard; Stief, Anna; Stolle, Eckart; Telford, Maximilian J; Tessmar-Raible, Kristin; Thornton, Rebecca; van der Zee, Maurijn; von Haeseler, Arndt; Williams, James M; Willis, Judith H; Wu, Yuanqing; Zou, Xiaoyan; Lawson, Daniel; Muzny, Donna M; Worley, Kim C; Gibbs, Richard A; Akam, Michael; Richards, Stephen

    2014-11-01

    Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologues of genes conserved from the bilaterian ancestor that have been lost in insects. Our analysis locates many genes in conserved macro-synteny contexts, and many small-scale examples of gene clustering. We describe several examples where S. maritima shows different solutions from insects to similar problems. The insect olfactory receptor gene family is absent from S. maritima, and olfaction in air is likely effected by expansion of other receptor gene families. For some genes S. maritima has evolved paralogues to generate coding sequence diversity, where insects use alternate splicing. This is most striking for the Dscam gene, which in Drosophila generates more than 100,000 alternate splice forms, but in S. maritima is encoded by over 100 paralogues. We see an intriguing linkage between the absence of any known photosensory proteins in a blind organism and the additional absence of canonical circadian clock genes. The phylogenetic position of myriapods allows us to identify where in arthropod phylogeny several particular molecular mechanisms and traits emerged. For example, we conclude that juvenile hormone signalling evolved with the emergence of the exoskeleton in the arthropods and that RR-1 containing cuticle proteins evolved in the lineage leading to Mandibulata. We also identify when various gene expansions and losses occurred. The genome of S. maritima offers us a unique glimpse into the ancestral arthropod genome, while also displaying many adaptations to its specific

  6. Sorting out relationships among the grouse and ptarmigan using intron, mitochondrial, and ultra-conserved element sequences.

    PubMed

    Persons, Nicholas W; Hosner, Peter A; Meiklejohn, Kelly A; Braun, Edward L; Kimball, Rebecca T

    2016-05-01

    The Holarctic phasianid clade of the grouse and ptarmigan has received substantial attention in areas such as evolution of mating systems, display behavior, and population ecology related to their conservation and management as wild game species. There are multiple molecular phylogenetic studies that focus on grouse and ptarmigan. In spite of this, there is little consensus regarding historical relationships, particularly among genera, which has led to unstable and partial taxonomic revisions. We estimated the phylogeny of all currently recognized species using a combination of novel data from seven nuclear loci (largely intron sequences) and published data from one additional autosomal locus, two W-linked loci, and four mitochondrial regions. To explore relationships among genera and assess paraphyly of one genus more rigorously, we then added over 3000 ultra-conserved element (UCE) loci (over 1.7 million bp) gathered using Illumina sequencing. The UCE topology agreed with that of the combined nuclear intron and previously published sequence data with 100% bootstrap support for all relationships. These data strongly support previous studies separating Bonasa from Tetrastes and Dendragapus from Falcipennis. However, the placement of Lagopus differed from previous studies, and we found no support for Falcipennis monophyly. Biogeographic analysis suggests that the ancestors of grouse and ptarmigan were distributed in the New World and subsequently underwent at least four dispersal events between the Old and New Worlds. Divergence time estimates from maternally-inherited and autosomal markers show stark differences across this clade, with divergence time estimates from maternally-inherited markers being nearly half that of the autosomal markers at some nodes, and nearly twice that at other nodes. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. The First Myriapod Genome Sequence Reveals Conservative Arthropod Gene Content and Genome Organisation in the Centipede Strigamia maritima

    PubMed Central

    Chipman, Ariel D.; Ferrier, David E. K.; Brena, Carlo; Qu, Jiaxin; Hughes, Daniel S. T.; Schröder, Reinhard; Torres-Oliva, Montserrat; Znassi, Nadia; Jiang, Huaiyang; Almeida, Francisca C.; Alonso, Claudio R.; Apostolou, Zivkos; Aqrawi, Peshtewani; Arthur, Wallace; Barna, Jennifer C. J.; Blankenburg, Kerstin P.; Brites, Daniela; Capella-Gutiérrez, Salvador; Coyle, Marcus; Dearden, Peter K.; Du Pasquier, Louis; Duncan, Elizabeth J.; Ebert, Dieter; Eibner, Cornelius; Erikson, Galina; Evans, Peter D.; Extavour, Cassandra G.; Francisco, Liezl; Gabaldón, Toni; Gillis, William J.; Goodwin-Horn, Elizabeth A.; Green, Jack E.; Griffiths-Jones, Sam; Grimmelikhuijzen, Cornelis J. P.; Gubbala, Sai; Guigó, Roderic; Han, Yi; Hauser, Frank; Havlak, Paul; Hayden, Luke; Helbing, Sophie; Holder, Michael; Hui, Jerome H. L.; Hunn, Julia P.; Hunnekuhl, Vera S.; Jackson, LaRonda; Javaid, Mehwish; Jhangiani, Shalini N.; Jiggins, Francis M.; Jones, Tamsin E.; Kaiser, Tobias S.; Kalra, Divya; Kenny, Nathan J.; Korchina, Viktoriya; Kovar, Christie L.; Kraus, F. Bernhard; Lapraz, François; Lee, Sandra L.; Lv, Jie; Mandapat, Christigale; Manning, Gerard; Mariotti, Marco; Mata, Robert; Mathew, Tittu; Neumann, Tobias; Newsham, Irene; Ngo, Dinh N.; Ninova, Maria; Okwuonu, Geoffrey; Ongeri, Fiona; Palmer, William J.; Patil, Shobha; Patraquim, Pedro; Pham, Christopher; Pu, Ling-Ling; Putman, Nicholas H.; Rabouille, Catherine; Ramos, Olivia Mendivil; Rhodes, Adelaide C.; Robertson, Helen E.; Robertson, Hugh M.; Ronshaugen, Matthew; Rozas, Julio; Saada, Nehad; Sánchez-Gracia, Alejandro; Scherer, Steven E.; Schurko, Andrew M.; Siggens, Kenneth W.; Simmons, DeNard; Stief, Anna; Stolle, Eckart; Telford, Maximilian J.; Tessmar-Raible, Kristin; Thornton, Rebecca; van der Zee, Maurijn; von Haeseler, Arndt; Williams, James M.; Willis, Judith H.; Wu, Yuanqing; Zou, Xiaoyan; Lawson, Daniel; Muzny, Donna M.; Worley, Kim C.; Gibbs, Richard A.; Akam, Michael; Richards, Stephen

    2014-01-01

    Myriapods (e.g., centipedes and millipedes) display a simple homonomous body plan relative to other arthropods. All members of the class are terrestrial, but they attained terrestriality independently of insects. Myriapoda is the only arthropod class not represented by a sequenced genome. We present an analysis of the genome of the centipede Strigamia maritima. It retains a compact genome that has undergone less gene loss and shuffling than previously sequenced arthropods, and many orthologues of genes conserved from the bilaterian ancestor that have been lost in insects. Our analysis locates many genes in conserved macro-synteny contexts, and many small-scale examples of gene clustering. We describe several examples where S. maritima shows different solutions from insects to similar problems. The insect olfactory receptor gene family is absent from S. maritima, and olfaction in air is likely effected by expansion of other receptor gene families. For some genes S. maritima has evolved paralogues to generate coding sequence diversity, where insects use alternate splicing. This is most striking for the Dscam gene, which in Drosophila generates more than 100,000 alternate splice forms, but in S. maritima is encoded by over 100 paralogues. We see an intriguing linkage between the absence of any known photosensory proteins in a blind organism and the additional absence of canonical circadian clock genes. The phylogenetic position of myriapods allows us to identify where in arthropod phylogeny several particular molecular mechanisms and traits emerged. For example, we conclude that juvenile hormone signalling evolved with the emergence of the exoskeleton in the arthropods and that RR-1 containing cuticle proteins evolved in the lineage leading to Mandibulata. We also identify when various gene expansions and losses occurred. The genome of S. maritima offers us a unique glimpse into the ancestral arthropod genome, while also displaying many adaptations to its specific

  8. Library preparation for highly accurate population sequencing of RNA viruses

    PubMed Central

    Acevedo, Ashley; Andino, Raul

    2015-01-01

    Circular resequencing (CirSeq) is a novel technique for efficient and highly accurate next-generation sequencing (NGS) of RNA virus populations. The foundation of this approach is the circularization of fragmented viral RNAs, which are then redundantly encoded into tandem repeats by ‘rolling-circle’ reverse transcription. When sequenced, the redundant copies within each read are aligned to derive a consensus sequence of their initial RNA template. This process yields sequencing data with error rates far below the variant frequencies observed for RNA viruses, facilitating ultra-rare variant detection and accurate measurement of low-frequency variants. Although library preparation takes ~5 d, the high-quality data generated by CirSeq simplifies downstream data analysis, making this approach substantially more tractable for experimentalists. PMID:24967624
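
    The consensus step described above, in which redundant copies within one read are aligned and collapsed to a single template sequence, can be illustrated with a per-position majority vote. This is a simplified sketch under stated assumptions, not the CirSeq software: the repeat length is assumed to be known in advance, base quality scores are ignored, and the function name `repeat_consensus` is hypothetical.

```python
from collections import Counter


def repeat_consensus(read, repeat_len):
    """Collapse tandem copies inside one read into a majority-vote consensus.

    Assumes the read consists of back-to-back copies of a template of known
    length `repeat_len`; any trailing partial copy is ignored.
    """
    copies = [
        read[i:i + repeat_len]
        for i in range(0, len(read) - repeat_len + 1, repeat_len)
    ]
    if len(copies) < 2:
        raise ValueError("need at least two full copies to build a consensus")

    consensus = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        # Keep the base only if a strict majority of copies agree; else call N.
        consensus.append(base if count * 2 > len(column) else "N")
    return "".join(consensus)
```

    With three copies, a sequencing error present in only one copy is outvoted by the other two, which is the intuition behind the low error rates the abstract reports.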

  9. Variants within the yeast Ty sequence family encode a class of structurally conserved proteins.

    PubMed Central

    Fulton, A M; Mellor, J; Dobson, M J; Chester, J; Warmington, J R; Indge, K J; Oliver, S G; de la Paz, P; Wilson, W; Kingsman, A J

    1985-01-01

    The Ty transposable elements of Saccharomyces cerevisiae form a heterogeneous family within which two broad structural classes (I and II) exist. The two classes differ by two large substitutions and many restriction sites. We show that, like class I elements, a class II element, Ty1-17, also appears to contain at least two major protein coding regions, designated TYA and TYB, and the organisational relationship of these regions has been conserved. The TYA genes of both classes encode proteins, designated p1 proteins, with an approximate molecular weight of 50 Kd and, despite considerable variation between the TYA regions at the DNA level, the structures of these proteins are remarkably similar. These observations strongly suggest that the p1 proteins of Ty elements are functionally significant and that they have been subject to selection. PMID:2989787

  10. A Next-Generation Sequencing Method for Genotyping-by-Sequencing of Highly Heterozygous Autotetraploid Potato

    PubMed Central

    Uitdewilligen, Jan G. A. M. L.; Wolters, Anne-Marie A.; D’hoop, Bjorn B.; Borm, Theo J. A.; Visser, Richard G. F.; van Eck, Herman J.

    2013-01-01

    Assessment of genomic DNA sequence variation and genotype calling in autotetraploids implies the ability to distinguish among five possible alternative allele copy number states. This study demonstrates the accuracy of genotyping-by-sequencing (GBS) of a large collection of autotetraploid potato cultivars using next-generation sequencing. It is still costly to reach sufficient read depths on a genome wide scale, across the cultivated gene pool. Therefore, we enriched cultivar-specific DNA sequencing libraries using an in-solution hybridisation method (SureSelect). This complexity reduction allowed us to confine our study to 807 target genes distributed across the genomes of 83 tetraploid cultivars and one reference (DM 1–3 511). Indexed sequencing libraries were paired-end sequenced in 7 pools of 12 samples using Illumina HiSeq2000. After filtering and processing the raw sequence data, 12.4 Gigabases of high-quality sequence data was obtained, which mapped to 2.1 Mb of the potato reference genome, with a median average read depth of 63× per cultivar. We detected 129,156 sequence variants and genotyped the allele copy number of each variant for every cultivar. In this cultivar panel a variant density of 1 SNP/24 bp in exons and 1 SNP/15 bp in introns was obtained. The average minor allele frequency (MAF) of a variant was 0.14. Potato germplasm displayed a large number of relatively rare variants and/or haplotypes, with 61% of the variants having a MAF below 0.05. A very high average nucleotide diversity (π = 0.0107) was observed. Nucleotide diversity varied among potato chromosomes. Several genes under selection were identified. Genotyping-by-sequencing results, with allele copy number estimates, were validated with a KASP genotyping assay. This validation showed that read depths of ∼60–80× can be used as a lower boundary for reliable assessment of allele copy number of sequence variants in autotetraploids. Genotypic data were associated with traits, and

  11. Automated degenerate PCR primer design for high-throughput sequencing improves efficiency of viral sequencing.

    PubMed

    Li, Kelvin; Shrivastava, Susmita; Brownley, Anushka; Katzel, Dan; Bera, Jayati; Nguyen, Anh Thu; Thovarai, Vishal; Halpin, Rebecca; Stockwell, Timothy B

    2012-11-06

    In a high-throughput environment, to PCR amplify and sequence a large set of viral isolates from populations that are potentially heterogeneous and continuously evolving, the use of degenerate PCR primers is an important strategy. Degenerate primers allow for the PCR amplification of a wider range of viral isolates with only one set of pre-mixed primers, thus increasing amplification success rates and minimizing the necessity for genome finishing activities. To successfully select a large set of degenerate PCR primers necessary to tile across an entire viral genome and maximize their success, this process is best performed computationally. We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute's (JCVI) high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. The actual sequencing success rates for designed primers for measles virus, mumps virus, human parainfluenza virus 1 and 3, human respiratory syncytial virus A and B and human metapneumovirus are described, where >90% of designed primer pairs were able to consistently successfully amplify >75% of the isolates. Augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. The recommended methodology for the construction of the consensus sequence that encapsulates the allelic variation of the targeted
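
    The idea of a consensus template written in IUPAC ambiguity codes can be sketched as follows. This is an illustration only and not the JCVI pipeline: it assumes gap-free, equal-length, uppercase A/C/G/T alignments, and the names `IUPAC`, `iupac_consensus` and `degeneracy` are hypothetical. The code table itself is the standard IUPAC nucleotide nomenclature.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases covered.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

# Reverse lookup: how many concrete bases each code stands for.
CODE_SIZE = {code: len(bases) for bases, code in IUPAC.items()}


def iupac_consensus(aligned):
    """Build a consensus in which each column is the code covering all bases seen."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    return "".join(IUPAC[frozenset(column)] for column in zip(*aligned))


def degeneracy(primer):
    """Count the distinct concrete sequences a degenerate primer represents."""
    total = 1
    for code in primer:
        total *= CODE_SIZE[code]
    return total
```

    For the three aligned sequences `ACGT`, `ACAT` and `ACGC`, the consensus is `ACRY`, and a primer with that sequence stands for four concrete oligonucleotides. A real design system would also weigh allele frequencies and cap degeneracy, which this sketch does not attempt.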

  12. Molecular polymorphisms associated with host range in the highly conserved genomes of burrowing nematodes, Radopholus spp.

    PubMed

    Kaplan, D T; Vanderspool, M C; Garrett, C; Chang, S; Opperman, C H

    1996-01-01

    Six polymorphic bands of DNA were amplified from purified Radopholus citrophilus genomic DNA from one strain of each of the sibling species R. citrophilus and R. similis in random amplified polymorphic DNA analyses involving 380 single 10-base primers. Four of these polymorphic DNA fragments were successfully cloned and amplified through subsequent use of primers designed to complement the terminal sequences of the polymorphic DNA. Results of ensuing studies using mini-prepped DNA from 14 burrowing nematode strains collected from Florida, Hawaii, and Central America, characterized for their ability to parasitize citrus, indicated that a 2.4-kb fragment appeared to be associated with citrus parasitism in burrowing nematode populations from Florida. However, a fragment of comparable size was also detected in R. citrophilus from Hawaii and from burrowing nematode populations collected from Belize and Puerto Rico. Overall, findings suggest that the genome organization of the burrowing nematode sibling species R. citrophilus and R. similis is highly conserved. This remarkable genetic similarity should facilitate identification of genetic sequence related to important phenotypes such as citrus parasitism. Detection of R. citrophilus-specific DNA fragments in burrowing nematodes collected from Belize and Puerto Rico suggests that R. citrophilus is resident in some Central American countries.

  13. Evolutionarily conserved sequences of striated muscle myosin heavy chain isoforms. Epitope mapping by cDNA expression.

    PubMed

    Miller, J B; Teal, S B; Stockdale, F E

    1989-08-05

    A cDNA expression strategy was used to localize amino acid sequences which were specific for fast, as opposed to slow, isoforms of the chicken skeletal muscle myosin heavy chain (MHC) and which were conserved in vertebrate evolution. Five monoclonal antibodies (mAbs), termed F18, F27, F30, F47, and F59, were prepared that reacted with all of the known chicken fast MHC isoforms but did not react with any of the known chicken slow nor with smooth muscle MHC isoforms. The epitopes recognized by mAbs F18, F30, F47, and F59 were on the globular head fragment of the MHC, whereas the epitope recognized by mAb F27 was on the helical tail or rod fragment. Reactivity of all five mAbs also was confined to fast MHCs in the rat, with the exception of mAb F59, which also reacted with the beta-cardiac MHC, the single slow MHC isoform common to both the rat heart and skeletal muscle. None of the five epitopes was expressed on amphioxus, nematode, or Dictyostelium MHC. The F27 and F59 epitopes were found on shark, electric ray, goldfish, newt, frog, turtle, chicken, quail, rabbit, and rat MHCs. The epitopes recognized by these mAbs were conserved, therefore, to varying degrees through vertebrate evolution and differed in sequence from homologous regions of a number of invertebrate MHCs and myosin-like proteins. The sequence of those epitopes on the head were mapped using a two-part cDNA expression strategy. First, Bal31 exonuclease digestion was used to rapidly generate fragments of a chicken embryonic fast MHC cDNA that were progressively deleted from the 3' end. These cDNA fragments were expressed as beta-galactosidase/MHC fusion proteins using the pUR290 vector; the fusion proteins were tested by immunoblotting for reactivity with the mAbs; and the approximate locations of the epitopes were determined from the sizes of the cDNA fragments that encoded a particular epitope. The epitopes were then precisely mapped by expression of overlapping cDNA fragments of known sequence that

  14. Horse domestication and conservation genetics of Przewalski's horse inferred from sex chromosomal and autosomal sequences.

    PubMed

    Lau, Allison N; Peng, Lei; Goto, Hiroki; Chemnick, Leona; Ryder, Oliver A; Makova, Kateryna D

    2009-01-01

    Despite their ability to interbreed and produce fertile offspring, there is continued disagreement about the genetic relationship of the domestic horse (Equus caballus) to its endangered wild relative, Przewalski's horse (Equus przewalskii). Analyses have differed as to whether or not Przewalski's horse is placed phylogenetically as a separate sister group to domestic horses. Because Przewalski's horse and domestic horse are so closely related, genetic data can also be used to infer domestication-specific differences between the two. To investigate the genetic relationship of Przewalski's horse to the domestic horse and to address whether evolution of the domestic horse is driven by males or females, five homologous introns (a total of approximately 3 kb) were sequenced on the X and Y chromosomes in two Przewalski's horses and three breeds of domestic horses: Arabian horse, Mongolian domestic horse, and Dartmoor pony. Five autosomal introns (a total of approximately 6 kb) were sequenced for these horses as well. The sequences of sex chromosomal and autosomal introns were used to determine nucleotide diversity and the forces driving evolution in these species. As a result, X chromosomal and autosomal data do not place Przewalski's horses in a separate clade within phylogenetic trees for horses, suggesting a close relationship between domestic and Przewalski's horses. It was also found that there was a lack of nucleotide diversity on the Y chromosome and higher nucleotide diversity than expected on the X chromosome in domestic horses as compared with the Y chromosome and autosomes. This supports the hypothesis that very few male horses along with numerous female horses founded the various domestic horse breeds. Patterns of nucleotide diversity among different types of chromosomes were distinct for Przewalski's in contrast to domestic horses, supporting unique evolutionary histories of the two species.

  15. High conservation of a 5' element required for RNA editing of a C target in chloroplast psbE transcripts.

    PubMed

    Hayes, Michael L; Hanson, Maureen R

    2008-09-01

    C-to-U editing modifies 30-40 distinct nucleotides within higher-plant chloroplast transcripts. Many C targets are located at the same position in homologous genes from different plants; these either could have emerged independently or could share a common origin. The 5' sequence GCCGUU, required for editing of C214 in tobacco psbE in vitro, is one of the few identified editing cis-elements. We investigated psbE sequences from many plant species to determine in what lineage(s) editing of psbE C214 emerged and whether the cis-element identified in tobacco is conserved in plants with a C214. The GCCGUU sequence is present at a high frequency in plants that carry a C214 in psbE. However, Sciadopitys verticillata (Pinophyta) edits C214 despite the presence of nucleotide differences compared to the conserved cis-element. The C214 site in psbE genes is represented in members of four branches of spermatophytes but not in gnetophytes, resulting in the parsimonious prediction that editing of psbE C214 was present in the ancestor of spermatophytes. Extracts from chloroplasts from a species that has a difference in the motif and lacks the C target are incapable of editing tobacco psbE C214 substrates, implying that the critical trans-acting protein factors were not retained without a C target. Because noncoding sequences are less constrained than coding regions, we analyzed sequences 5' to two C editing targets located within coding regions to search for possible editing-related conserved elements. Putative editing cis-elements were uncovered in the 5' UTRs near editing sites psbL C2 and ndhD C2.

  16. Canine Polydactyl Mutations With Heterogeneous Origin in the Conserved Intronic Sequence of LMBR1

    PubMed Central

    Park, Kiyun; Kang, Joohyun; Subedi, Krishna Pd.; Ha, Ji-Hong; Park, Chankyu

    2008-01-01

    Canine preaxial polydactyly (PPD) in the hind limb is a developmental trait that restores the first digit lost during canine evolution. Using a linkage analysis, we previously demonstrated that the affected gene in a Korean breed is located on canine chromosome 16. The candidate locus was further limited to a linkage disequilibrium (LD) block of <213 kb composing the single gene, LMBR1, by LD mapping with single nucleotide polymorphisms (SNPs) for affected individuals from both Korean and Western breeds. The ZPA regulatory sequence (ZRS) in intron 5 of LMBR1 was implicated in mammalian polydactyly. An analysis of the LD haplotypes around the ZRS for various dog breeds revealed that only a subset is assigned to Western breeds. Furthermore, two distinct affected haplotypes for Asian and Western breeds were found, each containing different single-base changes in the upstream sequence (pZRS) of the ZRS. Unlike the previously characterized cases of PPD identified in the mouse and human ZRS regions, the canine mutations in pZRS lacked the ectopic expression of sonic hedgehog in the anterior limb bud, distinguishing its role in limb development from that of the ZRS. PMID:18689889

  17. Deep sequencing and microarray hybridization identify conserved and species-specific microRNAs during somatic embryogenesis in hybrid yellow poplar.

    PubMed

    Li, Tingting; Chen, Jinhui; Qiu, Shuai; Zhang, Yanjuan; Wang, Pengkai; Yang, Liwei; Lu, Ye; Shi, Jisen

    2012-01-01

    To date, several studies have indicated a major role for microRNAs (miRNAs) in regulating plant development, but miRNA-mediated regulation of the developing somatic embryo is poorly understood, especially during early stages of somatic embryogenesis in hardwood plants. In this study, Solexa sequencing and miRNA microfluidic chips were used to discover conserved and species-specific miRNAs during somatic embryogenesis of hybrid yellow poplar (Liriodendron tulipifera×L. chinense). A total of 17,214,153 reads representing 7,421,623 distinct sequences were obtained from a short RNA library generated from small RNAs extracted from all stages of somatic embryos. Through a combination of deep sequencing and bioinformatic analyses, we discovered 83 sequences with perfect matches to known miRNAs from 33 conserved miRNA families and 273 species-specific candidate miRNAs. MicroRNA microarray results demonstrated that many conserved and species-specific miRNAs were expressed in hybrid yellow poplar embryos. In addition, the microarray also detected another 149 potential miRNAs, belonging to 29 conserved families, which were not discovered by deep sequencing analysis. The biological processes and molecular functions of the targets of these miRNAs were predicted by carrying out BLAST search against Arabidopsis thaliana GenBank sequences and then analyzing the results with Gene Ontology. Solexa sequencing and microarray hybridization were used to discover 232 candidate conserved miRNAs from 61 miRNA families and 273 candidate species-specific miRNAs in hybrid yellow poplar. In these predicted miRNAs, 64 conserved miRNAs and 177 species-specific miRNAs were detected by both sequencing and microarray hybridization. Our results suggest that miRNAs have wide-ranging characteristics and important roles during all stages of somatic embryogenesis in this economically important species.

  18. Sequencing of HLA class II genes based on the conserved diversity of the non-coding regions: sequencing based typing of HLA-DRB genes.

    PubMed

    Kotsch, K; Wehling, J; Blasczyk, R

    1999-05-01

    In this paper, we present a novel sequencing based typing strategy for the HLA-DRB1, 3, 4 and 5 loci. The new approach is based on a group-specific amplification from intron 1 to intron 2 according to the serologically-defined antigens. For this purpose, we have determined the 3' 500 bp-fragment of intron 1 and the 5' 340 bp-fragment of intron 2 of all serological antigens and their most frequent subtypes. We discovered a remarkably conserved diversity characterized by lineage-specific sequence motifs. This lineage-specificity of non-coding motifs in the 1st and 2nd intron offered the possibility to establish a clear serology-related amplification strategy. The method allows the complete analysis of the 2nd exon and the definition of the cis/trans linkage of sequence motifs by intron-mediated polymerase chain reaction (PCR)-based separation of the haplotypes in nearly all serologically heterozygous samples. In particular, the non-coding variabilities between the DR52-associated DRB1 groups made their independent amplification possible. Thus, compared to the standard procedures using exon-based amplification primers, the groups DR3, DR12, some DR13 alleles (1301, 1302) and the DR14 group could be amplified by specific primer mixes. The DR8 could be amplified with an individual primer mix not co-amplifying the DR12. The DR11 and DR13 did not show any individual motif in intron 1 or intron 2. In order to achieve a separate amplification, they had to be amplified by multispecific primer mixes (DR3/11/13/14; DR3/11/13 or DR11/13/14) excluding the other haplotype. Thus, exclusively the alleles in rare DR11,13 heterozygosities without a DRB1*1301 or 1302 could not be amplified separately. Fourteen primer mixes are used to amplify the specificities DR1-14, and 6 primer mixes for the specificities DR51-53. The sequence homology of the 3' end of intron 1 facilitated the application of only three different sequencing primers for all DRB alleles.

  19. ATF-2 stimulates the human insulin promoter through the conserved CRE2 sequence.

    PubMed

    Hay, Colin W; Ferguson, Laura A; Docherty, Kevin

    2007-02-01

    The insulin promoter contains a number of dissimilar cis-acting regulatory elements that bind a range of tissue-specific and ubiquitous transcription factors. Of the regulatory elements within the insulin promoter, the cyclic AMP responsive element (CRE) binds by far the most diverse array of transcription factors. Rodent insulin promoters have a single CRE site, whereas there are four CREs within the human insulin gene, of which CRE2 is the only one conserved between species. The aim of this study was to characterise the human CRE2 site and to investigate the effects of the two principal CRE-associated transcription factors: CREB-1 and ATF-2. Co-transfection of INS-1 pancreatic beta-cells with promoter constructs containing the human insulin gene promoter placed upstream of the firefly luciferase reporter gene and expression plasmids for ATF-2 or CREB-1 showed that ATF-2 stimulated transcriptional activity while CREB-1 elicited an inhibitory effect. Mutagenesis of CRE2 diminished the effect of ATF-2 but not that of CREB-1. ATF-2 was shown to bind to the CRE2 site by electrophoretic mobility shift assay and by chromatin immunoprecipitation, while siRNA-mediated knockdown of ATF-2 diminished the stimulatory effects of cAMP-related signalling on promoter activity. These results suggest that ATF-2 may be a key regulator of the human insulin promoter, possibly stimulating activity in response to extracellular signals.

  20. High-throughput sequencing in veterinary infection biology and diagnostics.

    PubMed

    Belák, S; Karlsson, O E; Leijon, M; Granberg, F

    2013-12-01

    Sequencing methods have improved rapidly since the first versions of the Sanger techniques, facilitating the development of very powerful tools for detecting and identifying various pathogens, such as viruses, bacteria and other microbes. The ongoing development of high-throughput sequencing (HTS; also known as next-generation sequencing) technologies has resulted in a dramatic reduction in DNA sequencing costs, making the technology more accessible to the average laboratory. In this White Paper of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine (Uppsala, Sweden), several approaches and examples of HTS are summarised, and their diagnostic applicability is briefly discussed. Selected future aspects of HTS are outlined, including the need for bioinformatic resources, with a focus on improving the diagnosis and control of infectious diseases in veterinary medicine.

  1. Assessment of selected conservation measures for high-temperature process industries

    SciTech Connect

    Kusik, C L; Parameswaran, K; Nadkarni, R; O'Neill, J K; Malhotra, S; Hyde, R; Kinneberg, D; Fox, L; Rossetti, M

    1981-01-01

    Energy conservation projects involving high-temperature processes in various stages of development are assessed to quantify their energy conservation potential; to determine their present status of development; to identify their research and development needs and estimate the associated costs; and to determine the most effective role for the Federal government in developing these technologies. The program analyzed 25 energy conserving processes in the iron and steel, aluminium, copper, magnesium, cement, and glassmaking industries. A preliminary list of other potential energy conservation projects in these industries is also presented in the appendix. (MCW)

  2. Conservation of the C-type lectin fold for accommodating massive sequence variation in archaeal diversity-generating retroelements.

    PubMed

    Handa, Sumit; Paul, Blair G; Miller, Jeffery F; Valentine, David L; Ghosh, Partho

    2016-08-31

    Diversity-generating retroelements (DGRs) provide organisms with a unique means for adaptation to a dynamic environment through massive protein sequence variation. The potential scope of this variation exceeds that of the vertebrate adaptive immune system. DGRs were known to exist only in viruses and bacteria until their recent discovery in archaea belonging to the 'microbial dark matter', specifically in organisms closely related to Nanoarchaeota. However, Nanoarchaeota DGR variable proteins were unassignable to known protein folds and apparently unrelated to characterized DGR variable proteins. To address the issue of how Nanoarchaeota DGR variable proteins accommodate massive sequence variation, we determined the 2.52 Å resolution limit crystal structure of one such protein, AvpA, which revealed a C-type lectin (CLec)-fold that organizes a putative ligand-binding site that is capable of accommodating 10^13 sequences. This fold is surprisingly reminiscent of the CLec-folds of viral and bacterial DGR variable proteins, but differs sufficiently to define a new CLec-fold subclass, which is consistent with early divergence between bacterial and archaeal DGRs. The structure also enabled identification of a group of AvpA-like proteins in multiple putative DGRs from uncultivated archaea. These variable proteins may aid Nanoarchaeota and these uncultivated archaea in symbiotic relationships. Our results have uncovered the widespread conservation of the CLec-fold in viruses, bacteria, and archaea for accommodating massive sequence variation. In addition, to our knowledge, this is the first report of an archaeal CLec-fold protein.

  3. 7 CFR 1430.225 - Violations of highly erodible land and wetland conservation provisions.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 10 2011-01-01 2011-01-01 false Violations of highly erodible land and wetland conservation provisions. 1430.225 Section 1430.225 Agriculture Regulations of the Department of Agriculture... wetland conservation provisions. The provisions of part 12 of this title apply to this part....

  4. 7 CFR 1412.68 - Compliance with highly erodible land and wetland conservation provisions.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 10 2013-01-01 2013-01-01 false Compliance with highly erodible land and wetland conservation provisions. 1412.68 Section 1412.68 Agriculture Regulations of the Department of Agriculture... and wetland conservation provisions. The provisions of part 12 of this title apply to this part....

  5. 7 CFR 1412.68 - Compliance with highly erodible land and wetland conservation provisions.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 10 2014-01-01 2014-01-01 false Compliance with highly erodible land and wetland conservation provisions. 1412.68 Section 1412.68 Agriculture Regulations of the Department of Agriculture... and wetland conservation provisions. The provisions of part 12 of this title apply to this part....

  6. 7 CFR 1430.225 - Violations of highly erodible land and wetland conservation provisions.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 7 Agriculture 10 2014-01-01 2014-01-01 false Violations of highly erodible land and wetland conservation provisions. 1430.225 Section 1430.225 Agriculture Regulations of the Department of Agriculture... wetland conservation provisions. The provisions of part 12 of this title apply to this part....

  7. 7 CFR 1412.68 - Compliance with highly erodible land and wetland conservation provisions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 10 2010-01-01 2010-01-01 false Compliance with highly erodible land and wetland conservation provisions. 1412.68 Section 1412.68 Agriculture Regulations of the Department of Agriculture... and wetland conservation provisions. The provisions of part 12 of this title apply to this part....

  8. 7 CFR 1412.68 - Compliance with highly erodible land and wetland conservation provisions.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 10 2011-01-01 2011-01-01 false Compliance with highly erodible land and wetland conservation provisions. 1412.68 Section 1412.68 Agriculture Regulations of the Department of Agriculture... and wetland conservation provisions. The provisions of part 12 of this title apply to this part....

  9. 7 CFR 1412.68 - Compliance with highly erodible land and wetland conservation provisions.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 10 2012-01-01 2012-01-01 false Compliance with highly erodible land and wetland conservation provisions. 1412.68 Section 1412.68 Agriculture Regulations of the Department of Agriculture... and wetland conservation provisions. The provisions of part 12 of this title apply to this part....

  10. 7 CFR 1430.225 - Violations of highly erodible land and wetland conservation provisions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 10 2010-01-01 2010-01-01 false Violations of highly erodible land and wetland conservation provisions. 1430.225 Section 1430.225 Agriculture Regulations of the Department of Agriculture... wetland conservation provisions. The provisions of part 12 of this title apply to this part....

  11. 7 CFR 1430.225 - Violations of highly erodible land and wetland conservation provisions.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 7 Agriculture 10 2012-01-01 2012-01-01 false Violations of highly erodible land and wetland conservation provisions. 1430.225 Section 1430.225 Agriculture Regulations of the Department of Agriculture... wetland conservation provisions. The provisions of part 12 of this title apply to this part....

  12. 7 CFR 1430.225 - Violations of highly erodible land and wetland conservation provisions.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 7 Agriculture 10 2013-01-01 2013-01-01 false Violations of highly erodible land and wetland conservation provisions. 1430.225 Section 1430.225 Agriculture Regulations of the Department of Agriculture... wetland conservation provisions. The provisions of part 12 of this title apply to this part....

  13. Mitochondrial genome sequences of Artemia tibetiana and Artemia urmiana: assessing molecular changes for high plateau adaptation.

    PubMed

    Zhang, Hangxiao; Luo, Qibin; Sun, Jing; Liu, Fei; Wu, Gang; Yu, Jun; Wang, Weiwei

    2013-05-01

    Brine shrimps, Artemia (Crustacea, Anostraca), inhabit hypersaline environments and have a broad geographical distribution from sea level to high plateaus. Artemia therefore possess significant genetic diversity, which gives them their outstanding adaptability. To understand this remarkable plasticity, we sequenced the mitochondrial genomes of two Artemia tibetiana isolates from the Tibetan Plateau in China and one Artemia urmiana isolate from Lake Urmia in Iran and compared them with the genome of a low-altitude Artemia, A. franciscana. We compared the ratio of the rate of nonsynonymous (Ka) and synonymous (Ks) substitutions (Ka/Ks ratio) in the mitochondrial protein-coding gene sequences and found that atp8 had the highest Ka/Ks ratios in comparisons of A. franciscana with either A. tibetiana or A. urmiana and that atp6 had the highest Ka/Ks ratio between A. tibetiana and A. urmiana. Atp6 may have experienced strong selective pressure for high-altitude adaptation because although A. tibetiana and A. urmiana are closely related they live at different altitudes. We identified two extended termination-associated sequences and three conserved sequence blocks in the D-loop region of the mitochondrial genomes. We propose that sequence variations in the D-loop region and in the subunits of the respiratory chain complexes independently or collectively contribute to the adaptation of Artemia to different altitudes.
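    The Ka/Ks comparison described above can be illustrated with a short sketch. It assumes that per-gene Ka and Ks estimates have already been obtained with some substitution-rate method (not shown); the function names, gene labels and numbers below are placeholders invented for illustration and are not values from the study.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero or negative (ratio undefined)."""
    if ks <= 0:
        return None
    return ka / ks


def highest_ratio_gene(estimates):
    """Given {gene: (Ka, Ks)}, return (gene with highest Ka/Ks, all ratios).

    Genes with an undefined ratio are skipped when picking the maximum.
    """
    ratios = {gene: ka_ks_ratio(ka, ks) for gene, (ka, ks) in estimates.items()}
    defined = {gene: r for gene, r in ratios.items() if r is not None}
    if not defined:
        return None, ratios
    return max(defined, key=defined.get), ratios


# Placeholder estimates for one pairwise comparison (hypothetical values).
example = {
    "gene_a": (0.02, 0.40),
    "gene_b": (0.10, 0.50),
    "gene_c": (0.01, 0.0),  # Ks of zero: ratio left undefined
}

best, ratios = highest_ratio_gene(example)
```

    A ratio well below 1 is usually read as purifying selection and a ratio above 1 as a possible sign of positive selection, so ranking genes by this value is one simple way to flag candidates for closer study.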

  14. Structural sequences are conserved in the genes coding for the alpha, alpha' and beta-subunits of the soybean 7S seed storage protein.

    PubMed Central

    Schuler, M A; Ladin, B F; Pollaco, J C; Freyer, G; Beachy, R N

    1982-01-01

    Cloned DNAs encoding four different proteins have been isolated from recombinant cDNA libraries constructed with Glycine max seed mRNAs. Two cloned DNAs code for the alpha and alpha'-subunits of the 7S seed storage protein (conglycinin). The other cloned cDNAs code for proteins which are synthesized in vitro as 68,000 d., 60,000 d. or 53,000 d. polypeptides. Hybrid selection experiments indicate that, under low stringency hybridization conditions, all four cDNAs hybridize with mRNAs for the alpha and alpha'-subunits and the 68,000 d., 60,000 d. and 53,000 d. in vitro translation products. Within three of the mRNAs, there is a conserved sequence of 155 nucleotides which is responsible for this hybridization. The conserved nucleotides in the alpha and alpha'-subunit cDNAs and the 68,000 d. polypeptide cDNAs span both coding and noncoding sequences. The differences in the coding nucleotides outside the conserved region are extensive. This suggests that selective pressure to maintain the 155 conserved nucleotides has been influenced by the structure of the seed mRNA. RNA blot hybridizations demonstrate that mRNA encoding the other major subunit (beta) of the 7S seed storage protein also shares sequence homology with the conserved 155 nucleotide sequence of the alpha and alpha'-subunit mRNAs, but not with other coding sequences. PMID:6897678

  15. The Number, Organization, and Size of Polymorphic Membrane Protein Coding Sequences as well as the Most Conserved Pmp Protein Differ within and across Chlamydia Species.

    PubMed

    Van Lent, Sarah; Creasy, Heather Huot; Myers, Garry S A; Vanrompay, Daisy

    2016-01-01

    Variation is a central trait of the polymorphic membrane protein (Pmp) family. The number of pmp coding sequences differs between Chlamydia species, but it is unknown whether the number of pmp coding sequences is constant within a Chlamydia species. The level of conservation of the Pmp proteins has previously only been determined for Chlamydia trachomatis. As different Pmp proteins might be indispensable for the pathogenesis of different Chlamydia species, this study investigated the conservation of Pmp proteins both within and across C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci. The pmp coding sequences were annotated in 16 C. trachomatis, 6 C. pneumoniae, 2 C. abortus, and 16 C. psittaci genomes. The number and organization of polymorphic membrane coding sequences differed within and across the analyzed Chlamydia species. The length of coding sequences of pmpA, pmpB, and pmpH was conserved among all analyzed genomes, while the length of pmpE/F and pmpG, and remarkably also of the subtype pmpD, differed among the analyzed genomes. PmpD, PmpA, PmpH, and PmpA were the most conserved Pmp in C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci, respectively. PmpB was the most conserved Pmp across the 4 analyzed Chlamydia species.

  16. CpG methylation differences between neurons and glia are highly conserved from mouse to human.

    PubMed

    Kessler, Noah J; Van Baak, Timothy E; Baker, Maria S; Laritsky, Eleonora; Coarfa, Cristian; Waterland, Robert A

    2016-01-15

    Understanding epigenetic differences that distinguish neurons and glia is of fundamental importance to the nascent field of neuroepigenetics. A recent study used genome-wide bisulfite sequencing to survey differences in DNA methylation between these two cell types, in both humans and mice. That study minimized the importance of cell type-specific differences in CpG methylation, claiming these are restricted to localized genomic regions, and instead emphasized that widespread and highly conserved differences in non-CpG methylation distinguish neurons and glia. We reanalyzed the data from that study and came to markedly different conclusions. In particular, we found widespread cell type-specific differences in CpG methylation, with a genome-wide tendency for neuronal CpG-hypermethylation punctuated by regions of glia-specific hypermethylation. Alarmingly, our analysis indicated that the majority of genes identified by the primary study as exhibiting cell type-specific CpG methylation differences were misclassified. To verify the accuracy of our analysis, we isolated neuronal and glial DNA from mouse cortex and performed quantitative bisulfite pyrosequencing at nine loci. The pyrosequencing results corroborated our analysis, without exception. Most interestingly, we found that gene-associated neuron vs. glia CpG methylation differences are highly conserved across human and mouse, and are very likely to be functional. In addition to underscoring the importance of independent verification to confirm the conclusions of genome-wide epigenetic analyses, our data indicate that CpG methylation plays a major role in neuroepigenetics, and that the mouse is likely an excellent model in which to study the role of DNA methylation in human neurodevelopment and disease.

  17. Highly conserved Z and molecularly diverged W chromosomes in the fish genus Triportheus (Characiformes, Triportheidae).

    PubMed

    Yano, C F; Bertollo, L A C; Ezaz, T; Trifonov, V; Sember, A; Liehr, T; Cioffi, M B

    2017-03-01

    The main objectives of this study were to test: (1) whether W-chromosome differentiation matches the species' evolutionary divergence (phylogenetic concordance) and (2) whether sex chromosomes share a common ancestor within a congeneric group. The monophyletic genus Triportheus (Characiformes, Triportheidae) was the model group for this study. All species in this genus so far analyzed have a ZW sex chromosome system, where the Z is always the largest chromosome of the karyotype, whereas the W chromosome is highly variable, ranging from almost homomorphic to highly heteromorphic. We applied conventional and molecular cytogenetic approaches including C-banding, ribosomal DNA mapping, comparative genomic hybridization (CGH) and cross-species whole chromosome painting (WCP) to test our questions. We developed Z- and W-chromosome paints from T. auritus for cross-species WCP and performed CGH in a representative species (T. signatus) to decipher the level of homology and rates of differentiation of W chromosomes. Our study revealed that the ZW sex chromosome system had a common origin, showing highly conserved Z chromosomes and remarkably divergent W chromosomes. Notably, the W chromosomes have evolved to different shapes and sequence contents within ~15-25 Myr of divergence time. Such differentiation highlights a dynamic process of W-chromosome evolution within congeneric species of Triportheus.

  18. Complete mitochondrial DNA sequence of the endangered giant sable antelope (Hippotragus niger variani): insights into conservation and taxonomy.

    PubMed

    Espregueira Themudo, Gonçalo; Rufino, Ana C; Campos, Paula F

    2015-02-01

    The giant sable antelope is one of the most endangered African bovids. Populations of this iconic animal, the national symbol of Angola, were recently rediscovered, after many decades of presumed extinction. Even so, their numbers are scarce and hence conservation plans are essential. However, fundamental information such as its taxonomic position, time of divergence and degree of genetic variation are still lacking. Here, we used a museum preserved horn as a source of DNA to describe, for the first time, the complete mitochondrial genome of the giant sable antelope, and provide insights into its evolutionary history. Reads generated by shotgun sequencing were mapped against the mitochondrial genome of common sable antelope and the nuclear genomes of cow and sheep. Phylogenetic reconstruction and divergence time estimate give support to the monophyly of the giant sable and a maximum divergence time of 170 thousand years to the closest subspecies. About 7% of the nuclear genome was mapped against the reference. The genetic resources reported here are now available for future work in the field of conservation genetics and phylogeny, in this and related species.

  19. Identification and Characterization of miRNA Transcriptome in Potato by High-Throughput Sequencing

    PubMed Central

    Zhang, Runxuan; Marshall, David; Bryan, Glenn J.; Hornyik, Csaba

    2013-01-01

    Micro RNAs (miRNAs) represent a class of short, non-coding, endogenous RNAs which play important roles in post-transcriptional regulation of gene expression. While the diverse functions of miRNAs in model plants have been well studied, the impact of miRNAs in crop plant biology is poorly understood. Here we used high-throughput sequencing and bioinformatics analysis to analyze miRNAs in the tuber-bearing crop potato (Solanum tuberosum). Small RNAs were analysed from leaf and stolon tissues. 28 conserved miRNA families were found and potato-specific miRNAs were identified and validated by RNA gel blot hybridization. The size, origin and predicted targets of conserved and potato-specific miRNAs are described. The large number of miRNAs and complex population of small RNAs in potato suggest important roles for these non-coding RNAs in diverse physiological and metabolic pathways. PMID:23437348

  20. Exome Sequence Analysis of 14 Families With High Myopia

    PubMed Central

    Kloss, Bethany A.; Tompson, Stuart W.; Whisenhunt, Kristina N.; Quow, Krystina L.; Huang, Samuel J.; Pavelec, Derek M.; Rosenberg, Thomas; Young, Terri L.

    2017-01-01

    Purpose: To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Methods: Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sanger sequencing was used to confirm variants in original DNA, and to test for disease cosegregation in additional family members. Candidate genes and chromosomal loci previously associated with myopic refractive error and its endophenotypes were comprehensively screened. Results: In 14 high myopia families, we identified 73 rare and 31 novel gene variants as candidates for pathogenicity. In seven of these families, two of the novel and eight of the rare variants were within known myopia loci. A total of 104 heterozygous nonsynonymous rare variants in 104 genes were identified in 10 out of 14 probands. Each variant cosegregated with affection status. No rare variants were identified in genes known to cause myopia or in genes closest to published genome-wide association study association signals for refractive error or its endophenotypes. Conclusions: Whole exome sequencing was performed to determine gene variants implicated in the pathogenesis of AD high myopia. This study provides new genes for consideration in the pathogenesis of high myopia, and may aid in the development of genetic profiling of those at greatest risk for attendant ocular morbidities of this disorder. PMID:28384719

  1. Exome Sequence Analysis of 14 Families With High Myopia.

    PubMed

    Kloss, Bethany A; Tompson, Stuart W; Whisenhunt, Kristina N; Quow, Krystina L; Huang, Samuel J; Pavelec, Derek M; Rosenberg, Thomas; Young, Terri L

    2017-04-01

    To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sanger sequencing was used to confirm variants in original DNA, and to test for disease cosegregation in additional family members. Candidate genes and chromosomal loci previously associated with myopic refractive error and its endophenotypes were comprehensively screened. In 14 high myopia families, we identified 73 rare and 31 novel gene variants as candidates for pathogenicity. In seven of these families, two of the novel and eight of the rare variants were within known myopia loci. A total of 104 heterozygous nonsynonymous rare variants in 104 genes were identified in 10 out of 14 probands. Each variant cosegregated with affection status. No rare variants were identified in genes known to cause myopia or in genes closest to published genome-wide association study association signals for refractive error or its endophenotypes. Whole exome sequencing was performed to determine gene variants implicated in the pathogenesis of AD high myopia. This study provides new genes for consideration in the pathogenesis of high myopia, and may aid in the development of genetic profiling of those at greatest risk for attendant ocular morbidities of this disorder.

  2. High-utility conserved avian microsatellite markers enable parentage and population studies across a wide range of species

    PubMed Central

    2013-01-01

    Background: Microsatellites are widely used for many genetic studies. In contrast to single nucleotide polymorphism (SNP) and genotyping-by-sequencing methods, they are readily typed in samples of low DNA quality/concentration (e.g. museum/non-invasive samples), and enable the quick, cheap identification of species, hybrids, clones and ploidy. Microsatellites also have the highest cross-species utility of all types of markers used for genotyping, but, despite this, when isolated from a single species, only a relatively small proportion will be of utility. Marker development of any type requires skill and time. The availability of sufficient “off-the-shelf” markers that are suitable for genotyping a wide range of species would not only save resources but also uniquely enable new comparisons of diversity among taxa at the same set of loci. No other marker types are capable of enabling this. We therefore developed a set of avian microsatellite markers with enhanced cross-species utility. Results: We selected highly-conserved sequences with a high number of repeat units in both of two genetically distant species. Twenty-four primer sets were designed from homologous sequences that possessed at least eight repeat units in both the zebra finch (Taeniopygia guttata) and chicken (Gallus gallus). Each primer sequence was a complete match to zebra finch and, after accounting for degenerate bases, at least 86% similar to chicken. We assessed primer-set utility by genotyping individuals belonging to eight passerine and four non-passerine species. The majority of the new Conserved Avian Microsatellite (CAM) markers amplified in all 12 species tested (on average, 94% in passerines and 95% in non-passerines). This new marker set is of especially high utility in passerines, with a mean 68% of loci polymorphic per species, compared with 42% in non-passerine species. Conclusions: When combined with previously described conserved loci, this new set of conserved markers will not only
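    The phrase "after accounting for degenerate bases, at least 86% similar" can be made concrete with a small sketch. This is one plausible way to score a degenerate primer against a target sequence, not necessarily the calculation used by the authors: a primer position counts as a match when the target base belongs to the set encoded by the IUPAC code at that position. The function name and the example sequences are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def primer_identity(primer, target):
    """Percent of primer positions compatible with an equal-length target.

    The primer may contain degenerate bases; the target is assumed to
    contain only A, C, G and T. No gaps or alignment are handled here.
    """
    if not primer or len(primer) != len(target):
        raise ValueError("primer and target must be non-empty and of equal length")
    matches = sum(
        t in IUPAC[p] for p, t in zip(primer.upper(), target.upper())
    )
    return 100.0 * matches / len(primer)


# Hypothetical sequences for illustration only.
full_match = primer_identity("ACGTRY", "ACGTAC")  # R covers A, Y covers C
one_mismatch = primer_identity("ACGTRY", "ACGTCC")  # R does not cover C
```

    A primer set would pass a threshold like the one quoted above if `primer_identity(...) >= 86.0` for each primer against the second species.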

  3. Applications of high-throughput DNA sequencing to benign hematology

    PubMed Central

    Gallagher, Patrick G.

    2013-01-01

    The development of novel technologies for high-throughput DNA sequencing is having a major impact on our ability to measure and define normal and pathologic variation in humans. This review discusses advances in DNA sequencing that have been applied to benign hematologic disorders, including those affecting the red blood cell, the neutrophil, and other white blood cell lineages. Relevant examples of how these approaches have been used for disease diagnosis, gene discovery, and studying complex traits are provided. High-throughput DNA sequencing technology holds significant promise for impacting clinical care. This includes development of improved disease detection and diagnosis, better understanding of disease progression and stratification of risk of disease-specific complications, and development of improved therapeutic strategies, particularly patient-specific pharmacogenomics-based therapy, with monitoring of therapy by genomic biomarkers. PMID:24021670

  4. Identification and characterization of flowering genes in kiwifruit: sequence conservation and role in kiwifruit flower development

    PubMed Central

    2011-01-01

    Background: Flower development in kiwifruit (Actinidia spp.) is initiated in the first growing season, when undifferentiated primordia are established in latent shoot buds. These primordia can differentiate into flowers in the second growing season, after the winter dormancy period and upon accumulation of adequate winter chilling. Kiwifruit is an important horticultural crop, yet little is known about the molecular regulation of flower development. Results: To study kiwifruit flower development, nine MADS-box genes were identified and functionally characterized. Protein sequence alignment, phenotypes obtained upon overexpression in Arabidopsis and expression patterns suggest that the identified genes are required for floral meristem and floral organ specification. Their role during budbreak and flower development was studied. A spontaneous kiwifruit mutant was utilized to correlate the extended expression domains of these flowering genes with abnormal floral development. Conclusions: This study provides a description of flower development in kiwifruit at the molecular level. It has identified markers for flower development, and candidates for manipulation of kiwifruit growth, phase change and time of flowering. The expression in normal and aberrant flowers provided a model for kiwifruit flower development. PMID:21521532

  5. An improved high throughput sequencing method for studying oomycete communities.

    PubMed

    Sapkota, Rumakanta; Nicolaisen, Mogens

    2015-03-01

    Culture-independent studies using next generation sequencing have revolutionized microbial ecology; however, oomycete ecology in soils is severely lagging behind. The aim of this study was to improve and validate standard techniques for using high throughput sequencing as a tool for studying oomycete communities. The well-known primer sets ITS4, ITS6 and ITS7 were used in the study in a semi-nested PCR approach to target the internal transcribed spacer (ITS) 1 of ribosomal DNA in a next generation sequencing protocol. These primers have been used in similar studies before, but with limited success. We were able to increase the proportion of retrieved oomycete sequences dramatically, mainly by increasing the annealing temperature during PCR. The optimized protocol was validated using three mock communities and the method was further evaluated using total DNA from 26 soil samples collected from different agricultural fields in Denmark, and 11 samples from carrot tissue with symptoms of Pythium infection. Sequence data from the Pythium and Phytophthora mock communities showed that our strategy successfully detected all included species. Taxonomic assignments of OTUs from the 26 soil samples showed that 95% of the sequences could be assigned to oomycetes including Pythium, Aphanomyces, Peronospora, Saprolegnia and Phytophthora. A high proportion of oomycete reads was consistently present in all 26 soil samples, showing the versatility of the strategy. A large diversity of Pythium species, including pathogenic and saprophytic species, dominated in cultivated soil. Finally, we analyzed amplicons from carrots with symptoms of cavity spot. This resulted in 94% of the reads belonging to oomycetes with a dominance of species of Pythium that are known to be involved in causing cavity spot, thus demonstrating the usefulness of the method not only in soil DNA but also in a plant DNA background. In conclusion, we demonstrate a successful approach for pyrosequencing of oomycete
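    The headline figures in this abstract (95% and 94% of reads assigned to oomycetes) are proportions of taxonomically assigned reads. A minimal sketch of that calculation is shown below. It assumes a table of per-genus read counts is already available; the genus set contains only the genera named in the abstract and is therefore not an exhaustive oomycete list, and the read counts are invented for illustration.

```python
# Oomycete genera named in the abstract; a real analysis would use a
# complete taxonomy rather than this short, illustrative set.
OOMYCETE_GENERA = {
    "Pythium", "Aphanomyces", "Peronospora", "Saprolegnia", "Phytophthora",
}


def oomycete_fraction(read_counts):
    """Fraction of reads whose assigned genus is in OOMYCETE_GENERA.

    `read_counts` maps genus name to number of reads (hypothetical input).
    Returns 0.0 for an empty sample to avoid division by zero.
    """
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for genus, n in read_counts.items() if genus in OOMYCETE_GENERA)
    return hits / total


# Made-up counts for a single sample; not data from the study.
sample = {"Pythium": 60, "Phytophthora": 30, "Fusarium": 10}
fraction = oomycete_fraction(sample)
```

    Computing this per sample, rather than on pooled reads, is what would allow a statement such as "a high proportion of oomycete reads was consistently present in all samples".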

  6. Sequence-Based Screening for Rare Enzymes: New Insights into the World of AMDases Reveal a Conserved Motif and 58 Novel Enzymes Clustering in Eight Distinct Families

    PubMed Central

    Maimanakos, Janine; Chow, Jennifer; Gaßmeyer, Sarah K.; Güllert, Simon; Busch, Florian; Kourist, Robert; Streit, Wolfgang R.

    2016-01-01

    Arylmalonate Decarboxylases (AMDases, EC 4.1.1.76) are very rare and mostly underexplored enzymes. Currently only four known and biochemically characterized representatives exist. However, their ability to decarboxylate α-disubstituted malonic acid derivatives to optically pure products without cofactors makes them attractive and promising candidates for the use as biocatalysts in industrial processes. Until now, AMDases could not be separated from other members of the aspartate/glutamate racemase superfamily based on their gene sequences. Within this work, a search algorithm was developed that enables a reliable prediction of AMDase activity for potential candidates. Based on specific sequence patterns and screening methods 58 novel AMDase candidate genes could be identified in this work. Thereby, AMDases with the conserved sequence pattern of Bordetella bronchiseptica’s prototype appeared to be limited to the classes of Alpha-, Beta-, and Gamma-proteobacteria. Amino acid homologies and comparison of gene surrounding sequences enabled the classification of eight enzyme clusters. Particularly striking is the accumulation of genes coding for different transporters of the tripartite tricarboxylate transporters family, TRAP transporters and ABC transporters as well as genes coding for mandelate racemases/muconate lactonizing enzymes that might be involved in substrate uptake or degradation of AMDase products. Further, three novel AMDases were characterized which showed a high enantiomeric excess (>99%) of the (R)-enantiomer of flurbiprofen. These are the recombinant AmdA and AmdV from Variovorax sp. strains HH01 and HH02, originated from soil, and AmdP from Polymorphum gilvum found by a data base search. Altogether our findings give new insights into the class of AMDases and reveal many previously unknown enzyme candidates with high potential for bioindustrial processes. PMID:27610105

  7. cDNA cloning and sequencing of human fibrillarin, a conserved nucleolar protein recognized by autoimmune antisera

    SciTech Connect

    Aris, J.P.; Blobel, G.

    1991-02-01

    The authors have isolated a 1.1-kilobase cDNA clone that encodes human fibrillarin by screening a hepatoma library in parallel with DNA probes derived from the fibrillarin genes of Saccharomyces cerevisiae (NOP1) and Xenopus laevis. RNA blot analysis indicates that the corresponding mRNA is approximately 1,300 nucleotides in length. Human fibrillarin expressed in vitro migrates on SDS gels as a 36-kDa protein that is specifically immunoprecipitated by antisera from humans with scleroderma autoimmune disease. Human fibrillarin contains an amino-terminal repetitive domain approximately 75-80 amino acids in length that is rich in glycine and arginine residues and is similar to amino-terminal domains in the yeast and Xenopus fibrillarins. The occurrence of a putative RNA-binding domain and an RNP consensus sequence within the protein is consistent with the association of fibrillarin with small nucleolar RNAs. Protein sequence alignments show that 67% of amino acids from human fibrillarin are identical to those in yeast fibrillarin and that 81% are identical to those in Xenopus fibrillarin. This identity suggests the evolutionary conservation of an important function early in the pathway for ribosome biosynthesis.
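
    The identity figures quoted in this record are the kind of value that can be read off a pairwise alignment. The sketch below is only an illustration, not the authors' procedure: the function name percent_identity is hypothetical, the inputs are assumed to be two already-aligned strings with "-" marking gaps, and the denominator is assumed to be the number of residues in the query sequence (other conventions divide by alignment length).

```python
def percent_identity(query_aln: str, subject_aln: str) -> float:
    """Percent of query residues that are identical to the aligned subject residue.

    Assumption: both strings come from the same pairwise alignment, so they
    have equal length, and '-' marks a gap.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    # Count only real residues of the query (gaps in the query are skipped).
    query_residues = sum(1 for q in query_aln if q != "-")
    if query_residues == 0:
        raise ValueError("query contains no residues")
    # A match is a non-gap query residue equal to the subject residue.
    matches = sum(
        1 for q, s in zip(query_aln, subject_aln) if q != "-" and q == s
    )
    return 100.0 * matches / query_residues
```

    For the toy alignment "MKT-LV" against "MRTALV", four of the five query residues match. Because the result depends on the denominator chosen, any comparison between reported identities should use the same convention.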

  8. Sequence variation and structural conservation allows development of novel function and immune evasion in parasite surface protein families.

    PubMed

    Higgins, Matthew K; Carrington, Mark

    2014-04-01

    Trypanosoma and Plasmodium species are unicellular, eukaryotic pathogens that have evolved the capacity to survive and proliferate within a human host, causing sleeping sickness and malaria, respectively. They have very different survival strategies. African trypanosomes divide in blood and extracellular spaces, whereas Plasmodium species invade and proliferate within host cells. Interaction with host macromolecules is central to establishment and maintenance of an infection by both parasites. Proteins that mediate these interactions are under selection pressure to bind host ligands without compromising immune avoidance strategies. In both parasites, the expansion of genes encoding a small number of protein folds has established large protein families. This has permitted both diversification to form novel ligand binding sites and variation in sequence that contributes to avoidance of immune recognition. In this review we consider two such parasite surface protein families, one from each species. In each case, known structures demonstrate how extensive sequence variation around a conserved molecular architecture provides an adaptable protein scaffold that the parasites can mobilise to mediate interactions with their hosts. © 2014 The Protein Society.

  9. Binary interactions with high accretion rates onto main sequence stars

    NASA Astrophysics Data System (ADS)

    Shiber, Sagiv; Schreier, Ron; Soker, Noam

    2016-07-01

    Energetic outflows from main sequence stars accreting mass at very high rates might account for the powering of some eruptive objects, such as merging main sequence stars, major eruptions of luminous blue variables, e.g., the Great Eruption of Eta Carinae, and other intermediate luminosity optical transients (ILOTs; red novae; red transients). These powerful outflows could potentially also supply the extra energy required in the common envelope process and in the grazing envelope evolution of binary systems. We propose that a massive outflow/jets mediated by magnetic fields might remove energy and angular momentum from the accretion disk to allow such high accretion rate flows. By examining the possible activity of the magnetic fields of accretion disks, we conclude that indeed main sequence stars might accrete mass at very high rates, up to ≈ 10⁻² M⊙ yr⁻¹ for solar type stars, and up to ≈ 1 M⊙ yr⁻¹ for very massive stars. We speculate that magnetic fields amplified in such extreme conditions might lead to the formation of massive bipolar outflows that can remove most of the disk's energy and angular momentum. It is this energy and angular momentum removal that allows the very high mass accretion rate onto main sequence stars.

  10. Molecular signatures (conserved indels) in protein sequences that are specific for the order Pasteurellales and distinguish two of its main clades.

    PubMed

    Naushad, Hafiz Sohail; Gupta, Radhey S

    2012-01-01

    The members of the order Pasteurellales are currently distinguished primarily on the basis of their branching in the rRNA trees and no convincing biochemical or molecular markers are known that distinguish them from all other bacteria. The genome sequences for 20 Pasteurellaceae species/strains are now publicly available. We report here detailed analyses of protein sequences from these genomes to identify conserved signature indels (CSIs) that are specific for either all Pasteurellales or its major clades. We describe more than 23 CSIs in widely distributed genes/proteins that are uniquely shared by all sequenced Pasteurellaceae species/strains but are not found in any other bacteria. Twenty-one additional CSIs are also specific for the Pasteurellales except in some of these cases homologues were not detected in a few species or the CSI was also present in an isolated non-Pasteurellaceae species. The sequenced Pasteurellaceae species formed two distinct clades in a phylogenetic tree based upon concatenated sequences for 10 conserved proteins. The first of these clades consisting of Aggregatibacter, Pasteurella, Actinobacillus succinogenes, Mannheimia succiniciproducens, Haemophilus influenzae and Haemophilus somnus was also independently supported by 13 uniquely shared CSIs that are not present in other Pasteurellaceae species or other bacteria. Another clade consisting of the remaining Pasteurellaceae species (viz. Actinobacillus pleuropneumoniae, Actinobacillus minor, Haemophilus ducreyi, Mannheimia haemolytica and Haemophilus parasuis) was also strongly and independently supported by nine CSIs that are uniquely present in these bacteria. The order Pasteurellales is presently made up of a single family, Pasteurellaceae, that encompasses all of its genera. In this context, our identification of two distinct clades within the Pasteurellales, which are supported by both phylogenetic analyses and by multiple highly specific molecular markers, strongly argues for and

  11. Genome-wide analyses reveal a highly conserved Dengue virus envelope peptide which is critical for virus viability and antigenic in humans

    PubMed Central

    Fleith, Renata C.; Lobo, Francisco P.; dos Santos, Paula F.; Rocha, Mariana M.; Bordignon, Juliano; Strottmann, Daisy M.; Patricio, Daniel O.; Pavanelli, Wander R.; Lo Sarzi, Maria; Santos, Claudia N. D.; Ferguson, Brian J.; Mansur, Daniel S.

    2016-01-01

    Targeting regions of proteins that show a high degree of structural conservation has been proposed as a method of developing immunotherapies and vaccines that may bypass the wide genetic variability of RNA viruses. Despite several attempts, a vaccine that protects evenly against the four circulating Dengue virus (DV) serotypes remains elusive. To find critical conserved amino acids in dengue viruses, 120 complete genomes of each serotype were selected at random and used to calculate conservation scores for nucleotide and amino acid sequences. The identified peptide sequences were analysed for their structural conservation and localisation using crystallographic data. The longest, surface exposed, highly conserved peptide of Envelope protein was found to correspond to amino acid residues 250 to 270. Mutation of this peptide in DV1 was lethal, since no replication of the mutant virus was detected in human cells. Antibodies against this peptide were detected in DV naturally infected patients indicating its potential antigenicity. Hence, this study has identified a highly conserved, critical peptide in DV that is a target of antibodies in infected humans. PMID:27805018
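
    The abstract does not say how the conservation scores were computed. As a hedged illustration of one common approach, the sketch below scores each column of a multiple sequence alignment as 1 minus its normalized Shannon entropy. The function names, the gap handling, and the alphabet size of 20 (amino acids; use 4 for nucleotides) are assumptions made for this example and are not taken from the study.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Return 1 - (Shannon entropy / max entropy) for one alignment column.

    1.0 means every non-gap residue is identical; lower values mean more
    variability. Gap characters ('-') are ignored (an assumption).
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0  # a column of only gaps carries no conservation signal
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)


def conservation_scores(alignment, alphabet_size=20):
    """Score every column of a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [
        column_conservation([seq[i] for seq in alignment], alphabet_size)
        for i in range(length)
    ]
```

    With such per-column scores, a conserved peptide could be located by scanning for the longest run of columns above a chosen threshold; the threshold itself would be another assumption to state explicitly.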

  12. Streamlining and core genome conservation among highly divergent members of the SAR11 clade.

    PubMed

    Grote, Jana; Thrash, J Cameron; Huggett, Megan J; Landry, Zachary C; Carini, Paul; Giovannoni, Stephen J; Rappé, Michael S

    2012-01-01

    SAR11 is an ancient and diverse clade of heterotrophic bacteria that are abundant throughout the world's oceans, where they play a major role in the ocean carbon cycle. Correlations between the phylogenetic branching order and spatiotemporal patterns in cell distributions from planktonic ocean environments indicate that SAR11 has evolved into perhaps a dozen or more specialized ecotypes that span evolutionary distances equivalent to a bacterial order. We isolated and sequenced genomes from diverse SAR11 cultures that represent three major lineages and encompass the full breadth of the clade. The new data expand observations about genome evolution and gene content that previously had been restricted to the SAR11 Ia subclade, providing a much broader perspective on the clade's origins, evolution, and ecology. We found small genomes throughout the clade and a very high proportion of core genome genes (48 to 56%), indicating that small genome size is probably an ancestral characteristic. In their level of core genome conservation, the members of SAR11 are outliers, the most conserved free-living bacteria known. Shared features of the clade include low GC content, high gene synteny, a large hypervariable region bounded by rRNA genes, and low numbers of paralogs. Variation among the genomes included genes for phosphorus metabolism, glycolysis, and C1 metabolism, suggesting that adaptive specialization in nutrient resource utilization is important to niche partitioning and ecotype divergence within the clade. These data provide support for the conclusion that streamlining selection for efficient cell replication in the planktonic habitat has occurred throughout the evolution and diversification of this clade. IMPORTANCE The SAR11 clade is the most abundant group of marine microorganisms worldwide, making them key players in the global carbon cycle. Growing knowledge about their biochemistry and metabolism is leading to a more mechanistic understanding of organic carbon

  13. The Highly Conserved MraZ Protein Is a Transcriptional Regulator in Escherichia coli

    PubMed Central

    Eraso, Jesus M.; Markillie, Lye M.; Mitchell, Hugh D.; Taylor, Ronald C.; Orr, Galya

    2014-01-01

    The mraZ and mraW genes are highly conserved in bacteria, both in sequence and in their position at the head of the division and cell wall (dcw) gene cluster. Located directly upstream of the mraZ gene, the Pmra promoter drives the transcription of mraZ and mraW, as well as many essential cell division and cell wall genes, but no regulator of Pmra has been found to date. Although MraZ has structural similarity to the AbrB transition state regulator and the MazE antitoxin and MraW is known to methylate the 16S rRNA, mraZ and mraW null mutants have no detectable phenotypes. Here we show that overproduction of Escherichia coli MraZ inhibited cell division and was lethal in rich medium at high induction levels and in minimal medium at low induction levels. Co-overproduction of MraW suppressed MraZ toxicity, and loss of MraW enhanced MraZ toxicity, suggesting that MraZ and MraW have antagonistic functions. MraZ-green fluorescent protein localized to the nucleoid, suggesting that it binds DNA. Consistent with this idea, purified MraZ directly bound a region of DNA containing three direct repeats between Pmra and the mraZ gene. Excess MraZ reduced the expression of an mraZ-lacZ reporter, suggesting that MraZ acts as a repressor of Pmra, whereas a DNA-binding mutant form of MraZ failed to repress expression. Transcriptome sequencing (RNA-seq) analysis suggested that MraZ also regulates the expression of genes outside the dcw cluster. In support of this, purified MraZ could directly bind to a putative operator site upstream of mioC, one of the repressed genes identified by RNA-seq. PMID:24659771

  14. The highly conserved MraZ protein is a transcriptional regulator in Escherichia coli

    SciTech Connect

    Eraso, Jesus M.; Markillie, Lye Meng; Mitchell, Hugh D.; Taylor, Ronald C.; Orr, Galya; Margolin, William

    2014-05-05

    The mraZ and mraW genes are highly conserved in bacteria, both in sequence and location at the head of the division and cell wall (dcw) gene cluster. Although MraZ has structural similarity to the AbrB transition state regulator and the MazE antitoxin, and MraW is known to methylate ribosomal RNA, mraZ and mraW null mutants have no detectable growth phenotype in any species tested to date, hampering progress in understanding their physiological role. Here we show that overproduction of Escherichia coli MraZ perturbs cell division and the cell envelope, is more lethal at high levels or in minimal growth medium, and that MraW antagonizes these effects. MraZ-GFP localizes to the nucleoid, suggesting that it binds DNA. Indeed, purified MraZ directly binds a region upstream from its own promoter containing three direct repeats to regulate its own expression and that of downstream cell division and cell wall genes. MraZ-LacZ fusions are repressed by excess MraZ but not when DNA binding by MraZ is inhibited. RNA-seq analysis indicates that MraZ is a global transcriptional regulator with numerous targets in addition to dcw genes. One of these targets, mioC, is directly bound by MraZ in a region with three direct repeats.

  15. High-Throughput Sequencing of Three Lemnoideae (Duckweeds) Chloroplast Genomes from Total DNA

    PubMed Central

    Wang, Wenqin; Messing, Joachim

    2011-01-01

    Background: Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaptation of aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that, because of photosynthesis, represents one of the fastest growing plant species on earth. Methods: We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana, by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. Conclusions: This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-fold coverage of the chloroplast genome from total DNA was reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution and abundant deletions and insertions occurred in non-coding regions of these genomes, indicating greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power. PMID:21931804

  16. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Athavale, Ajay

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  17. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay [Monsanto]

    2016-07-12

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  18. Genome sequence of ground tit Pseudopodoces humilis and its adaptation to high altitude

    PubMed Central

    2013-01-01

    Background: The mechanism of high-altitude adaptation has been studied in certain mammals. However, in avian species like the ground tit Pseudopodoces humilis, the adaptation mechanism remains unclear. The phylogeny of the ground tit is also controversial. Results: Using next generation sequencing technology, we generated and assembled a draft genome sequence of the ground tit. The assembly contained 1.04 Gb of sequence that covered 95.4% of the whole genome and had higher N50 values, at the level of both scaffolds and contigs, than other sequenced avian genomes. About 1.7 million SNPs were detected, 16,998 protein-coding genes were predicted and 7% of the genome was identified as repeat sequences. Comparisons between the ground tit genome and other avian genomes revealed a conserved genome structure and confirmed the phylogeny of ground tit as not belonging to the Corvidae family. Gene family expansion and positively selected gene analysis revealed genes that were related to cardiac function. Our findings contribute to our understanding of the adaptation of this species to extreme environmental living conditions. Conclusions: Our data and analysis contribute to the study of avian evolutionary history and provide new insights into the adaptation mechanisms to extreme conditions in animals. PMID:23537097
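
    The N50 statistic mentioned in this record is a standard assembly contiguity measure: the length L such that contigs (or scaffolds) of length at least L together hold at least half of the assembled bases. The sketch below is a minimal illustration; the function name n50 and the example lengths are hypothetical and are not data from the ground tit assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sort lengths from longest to shortest and accumulate them; the N50 is
    the length of the sequence at which the running total first reaches
    half of the total assembled length.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)  # total 54; 10 + 9 + 8 = 27 reaches half
```

    A higher N50 indicates that more of the assembly sits in long sequences, but the value is only comparable between assemblies of similar total size.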

  19. Savant: genome browser for high-throughput sequencing data.

    PubMed

    Fiume, Marc; Williams, Vanessa; Brook, Andrew; Brudno, Michael

    2010-08-15

    The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. Savant is freely available at http://compbio.cs.toronto.edu/savant.

  20. De novo sequencing of highly modified therapeutic oligonucleotides by hydrophobic tag sequencing coupled with LC-MS.

    PubMed

    Goto, R; Miyakawa, S; Inomata, E; Takami, T; Yamaura, J; Nakamura, Y

    2017-02-01

    Correct sequences are prerequisite for quality control of therapeutic oligonucleotides. However, there is no definitive method available for determining sequences of highly modified therapeutic RNAs, and thereby, most of the oligonucleotides have been used clinically without direct sequence determination. In this study, we developed a novel sequencing method called 'hydrophobic tag sequencing'. Highly modified oligonucleotides are sequenced by partially digesting oligonucleotides conjugated with a 5'-hydrophobic tag, followed by liquid chromatography-mass spectrometry analysis. 5'-Hydrophobic tag-printed fragments (5'-tag degradates) can be separated in order of their molecular masses from tag-free oligonucleotides by reversed-phase liquid chromatography. As models for the sequencing, the anti-VEGF aptamer (Macugen) and the highly modified 38-mer RNA sequences were analyzed under blind conditions. Most nucleotides were identified from the molecular weight of hydrophobic 5'-tag degradates calculated from monoisotopic mass in simple full mass data. When monoisotopic mass could not be assigned, the nucleotide was estimated using the molecular weight of the most abundant mass. The sequences of Macugen and 38-mer RNA perfectly matched the theoretical sequences. The hydrophobic tag sequencing worked well to obtain simple full mass data, resulting in accurate and clear sequencing. The present study provides for the first time a de novo sequencing technology for highly modified RNAs and contributes to quality control of therapeutic oligonucleotides. Copyright © 2016 John Wiley & Sons, Ltd.

  1. Integrating nested PCR with high-throughput sequencing to characterize mutations of HBV genome in low viral load samples.

    PubMed

    Wang, Xianjun; Xu, Lihui; Chen, Yueming; Liu, Anbing; Wang, Liqian; Xu, Peisong; Liu, Yunhui; Li, Lei; Meng, Fei

    2017-07-01

    Due to the low viral load of hepatitis B virus (HBV) in plasma samples, conventional techniques are limited in their ability to detect antiviral resistance mutations. To solve the problem, we developed a fast, highly sensitive, and accurate method to sequence the whole HBV genome in plasma samples with viral loads ranging from very low to high. Twenty-one plasma samples were collected from patients who were carriers of HBV from the Hangzhou First People's Hospital. Two pairs of conserved, overlapping, nested primers were used to amplify and sequence the whole HBV genome in 8 plasma samples with different viral loads. High-throughput sequencing was performed on the Illumina MiSeq platform. Concomitantly, 3 samples were directly sequenced without PCR amplification. We compared amplicon-sequencing with direct sequencing to develop a method for amplifying and characterizing the whole genome of HBV. The HBV genome was amplified from all samples and verified by Sanger sequencing, regardless of the viral loads. Sequencing results revealed that only a few reads were mapped to the HBV genome following direct sequencing, while the amplicon-sequencing reads had good coverage and depth. We identified 50 intrahost single nucleotide variations (iSNVs), 14 of which were low frequency mutations. Interestingly, iSNVs were more common in low viral load samples than in high viral load samples, and mutations in the reverse transcriptase (RT) region were most prevalent. We conclude that amplicon-sequencing is not only a practical method to detect HBV infection with high sensitivity and accuracy but also enables detection of mutations in the HBV genome in low viral load samples from HBV-infected patients. Thus, our findings provide a new method for diagnosing HBV infection that is capable of detecting low-frequency mutations in low viral load samples.

  2. Compression of Structured High-Throughput Sequencing Data

    PubMed Central

    Campagne, Fabien; Dorff, Kevin C.; Chambwe, Nyasha; Robinson, James T.; Mesirov, Jill P.

    2013-01-01

    Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state of the art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays. PMID:24260313

  3. Human T-cell recognition of synthetic peptides representing conserved and variant sequences from the merozoite surface protein 2 of Plasmodium falciparum.

    PubMed

    Theander, T G; Hviid, L; Dodoo, D; Afari, E A; Jensen, J B; Rzepczyk, C M

    1997-06-01

    Merozoite surface protein 2 (MSP2) is a malaria vaccine candidate currently undergoing clinical trials. We analyzed the peripheral blood mononuclear cell (PBMC) response to synthetic peptides corresponding to conserved and variant regions of the FCQ-27 allelic form of MSP2 in Ghanaian individuals from an area of hyperendemic malaria transmission and in Danes without exposure to malaria. PBMC from 20-39% of Ghanaians responded to each of the peptides by proliferation and 29-36% had PBMC which produced interferon-gamma (IFN-gamma) in response to peptide stimulation. In Danes, there was no proliferation to two of the peptides and only PBMC from 5% of the individuals proliferated to the other three peptides. IFN-gamma production was not detected to any peptide. In both Danes and Ghanaians in only a few instances was IL-4 detected in the PBMC cultures. Overall PBMC from 79% of the Ghanaians responded by proliferation and/or cytokine secretion to at least one of three peptides tested, whereas responses were only observed in 14% of Danes (P = 0.002). These data suggest that the Ghanaians had expanded peripheral blood T-cell populations recognizing the peptides as a result of natural infection. The findings are encouraging for the development of a vaccine based on these T-epitope containing regions of MSP2, as the peptides were broadly recognized suggesting that they can bind to diverse HLA alleles and also because they include conserved MSP2 sequences. Immunisation with a vaccine construct incorporating the sequences present in these peptides could thus be expected to be immunogenic in a high percentage of individuals and lead to the establishment of memory T-cells, which can be boosted through natural infection.

  4. High-throughput sequencing of small RNAs and anatomical characteristics associated with leaf development in celery.

    PubMed

    Jia, Xiao-Ling; Li, Meng-Yao; Jiang, Qian; Xu, Zhi-Sheng; Wang, Feng; Xiong, Ai-Sheng

    2015-06-09

    MicroRNAs (miRNAs) exhibit diverse and important roles in plant growth, development, and stress responses and regulate gene expression at the post-transcriptional level. The diversity of miRNAs and their roles in leaf development in celery remain unknown. To elucidate the roles of miRNAs in celery leaf development, we identified leaf development-related miRNAs through high-throughput sequencing. Small RNA libraries were constructed using leaves from three stages (10, 20, and 30 cm) of celery cv. 'Ventura' and then subjected to high-throughput sequencing and bioinformatics analysis. At Stage 1, Stage 2, and Stage 3 of 'Ventura', a total of 333, 329, and 344 conserved miRNAs (belonging to 35, 35, and 32 families, respectively) were identified. A total of 131 miRNAs were identified as novel in 'Ventura'. Potential miRNA target genes were predicted and annotated using the eggNOG, GO, and KEGG databases to explore gene functions. The abundance of five conserved miRNAs and their corresponding potential target genes was validated. Expression profiles of novel potential miRNAs were also detected. Anatomical characteristics of the leaf blades and petioles at three leaf stages were further analyzed. This study contributes to our understanding of the functions and molecular regulatory mechanisms of miRNAs in celery leaf development.

  5. Compressive biological sequence analysis and archival in the era of high-throughput sequencing technologies.

    PubMed

    Giancarlo, Raffaele; Rombo, Simona E; Utro, Filippo

    2014-05-01

    High-throughput sequencing technologies produce large collections of data, mainly DNA sequences with additional information, requiring the design of efficient and effective methodologies for both their compression and storage. In this context, we first provide a classification of the main techniques that have been proposed, according to three specific research directions that have emerged from the literature and, for each, we provide an overview of the current techniques. Finally, to make this review useful to researchers and technicians applying the existing software and tools, we include a synopsis of the main characteristics of the described approaches, including details on their implementation and availability. Performance of the various methods is also highlighted, although the state of the art does not lend itself to a consistent and coherent comparison among all the methods presented here.

  6. Population Genomic Analysis Reveals Highly Conserved Mitochondrial Genomes in the Yeast Species Lachancea thermotolerans

    PubMed Central

    Freel, Kelle C.; Friedrich, Anne; Hou, Jing; Schacherer, Joseph

    2014-01-01

    The increasing availability of mitochondrial (mt) sequence data from various yeasts provides a tool to study genomic evolution within and between different species. While the genomes from a range of lineages are available, there is a lack of information concerning intraspecific mtDNA diversity. Here, we analyzed the mt genomes of 50 strains from Lachancea thermotolerans, a protoploid yeast species that has been isolated from several locations (Europe, Asia, Australia, South Africa, and North / South America) and ecological sources (fruit, tree exudate, plant material, and grape and agave fermentations). Protein-coding genes from the mtDNA were used to construct a phylogeny, which reflected a similar, yet less resolved topology than the phylogenetic tree of 50 nuclear genes. In comparison to its sister species Lachancea kluyveri, L. thermotolerans has a smaller mt genome. This is due to shorter intergenic regions and fewer introns, of which the latter are only found in COX1. We revealed that L. kluyveri and L. thermotolerans share similar levels of intraspecific divergence concerning the nuclear genomes. However, L. thermotolerans has a more highly conserved mt genome with the coding regions characterized by low rates of nonsynonymous substitution. Thus, in the mt genomes of L. thermotolerans, stronger purifying selection and lower mutation rates potentially shape genome diversity in contrast to what was found for L. kluyveri, demonstrating that the factors driving mt genome evolution are different even between closely related species. PMID:25212859

  7. MEGARes: an antimicrobial resistance database for high throughput sequencing

    PubMed Central

    Lakin, Steven M.; Dean, Chris; Noyes, Noelle R.; Dettenwanger, Adam; Ross, Anne Spencer; Doster, Enrique; Rovira, Pablo; Abdo, Zaid; Jones, Kenneth L.; Ruiz, Jaime; Belk, Keith E.; Morley, Paul S.; Boucher, Christina

    2017-01-01

    Antimicrobial resistance has become an imminent concern for public health. As methods for detection and characterization of antimicrobial resistance move from targeted culture and polymerase chain reaction to high throughput metagenomics, appropriate resources for the analysis of large-scale data are required. Currently, antimicrobial resistance databases are tailored to smaller-scale, functional profiling of genes using highly descriptive annotations. Such characteristics do not facilitate the analysis of large-scale, ecological sequence datasets such as those produced with the use of metagenomics for surveillance. In order to overcome these limitations, we present MEGARes (https://megares.meglab.org), a hand-curated antimicrobial resistance database and annotation structure that provides a foundation for the development of high throughput acyclical classifiers and hierarchical statistical analysis of big data. MEGARes can be browsed as a stand-alone resource through the website or can be easily integrated into sequence analysis pipelines through download. Also via the website, we provide documentation for AmrPlusPlus, a user-friendly Galaxy pipeline for the analysis of high throughput sequencing data that is pre-packaged for use with the MEGARes database. PMID:27899569

  8. MEGARes: an antimicrobial resistance database for high throughput sequencing.

    PubMed

    Lakin, Steven M; Dean, Chris; Noyes, Noelle R; Dettenwanger, Adam; Ross, Anne Spencer; Doster, Enrique; Rovira, Pablo; Abdo, Zaid; Jones, Kenneth L; Ruiz, Jaime; Belk, Keith E; Morley, Paul S; Boucher, Christina

    2017-01-04

    Antimicrobial resistance has become an imminent concern for public health. As methods for detection and characterization of antimicrobial resistance move from targeted culture and polymerase chain reaction to high throughput metagenomics, appropriate resources for the analysis of large-scale data are required. Currently, antimicrobial resistance databases are tailored to smaller-scale, functional profiling of genes using highly descriptive annotations. Such characteristics do not facilitate the analysis of large-scale, ecological sequence datasets such as those produced with the use of metagenomics for surveillance. In order to overcome these limitations, we present MEGARes (https://megares.meglab.org), a hand-curated antimicrobial resistance database and annotation structure that provides a foundation for the development of high throughput acyclical classifiers and hierarchical statistical analysis of big data. MEGARes can be browsed as a stand-alone resource through the website or can be easily integrated into sequence analysis pipelines through download. Also via the website, we provide documentation for AmrPlusPlus, a user-friendly Galaxy pipeline for the analysis of high throughput sequencing data that is pre-packaged for use with the MEGARes database.

  9. 7 CFR Exhibit M to Subpart G of... - Implementation Pr