sites sequence analysis: Topics by Science.gov

Sample records for sites sequence analysis

VISA--Vector Integration Site Analysis server: a web-based server to rapidly identify retroviral integration sites from next-generation sequencing.

PubMed

Hocum, Jonah D; Battrell, Logan R; Maynard, Ryan; Adair, Jennifer E; Beard, Brian C; Rawlings, David J; Kiem, Hans-Peter; Miller, Daniel G; Trobridge, Grant D

2015-07-07

Analyzing the integration profile of retroviral vectors is a vital step in determining their potential genotoxic effects and developing safer vectors for therapeutic use. Identifying retroviral vector integration sites is also important for retroviral mutagenesis screens. We developed VISA, a vector integration site analysis server, to analyze next-generation sequencing data for retroviral vector integration sites. Sequence reads that contain a provirus are mapped to the human genome, sequence reads that cannot be localized to a unique location in the genome are filtered out, and then unique retroviral vector integration sites are determined based on the alignment scores of the remaining sequence reads. VISA offers a simple web interface to upload sequence files and results are returned in a concise tabular format to allow rapid analysis of retroviral vector integration sites.
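The filtering steps described in this abstract (discard reads that do not map to a unique location, then collapse the remaining reads into unique sites using their alignment scores) can be illustrated with a minimal Python sketch. This is not the VISA implementation; the `Alignment` record, its field names, and the `n_hits` convention are assumptions made for illustration only.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    # Hypothetical record layout; a real aligner's output would need parsing.
    read_id: str
    chrom: str
    pos: int       # genomic coordinate of the vector-genome junction
    strand: str
    score: float   # alignment score reported by the aligner
    n_hits: int    # number of equally good genomic locations for this read


def unique_integration_sites(alignments):
    """Drop multi-mapped reads, then collapse reads into unique sites."""
    sites = {}
    for aln in alignments:
        if aln.n_hits != 1:
            # The read cannot be localized to a unique genomic location.
            continue
        key = (aln.chrom, aln.pos, aln.strand)
        entry = sites.setdefault(key, {"reads": 0, "best_score": float("-inf")})
        entry["reads"] += 1
        entry["best_score"] = max(entry["best_score"], aln.score)
    return sites
```

Each key of the returned dictionary is one candidate integration site, with the number of supporting reads and the best alignment score, which maps naturally onto a tabular report.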
Molecular Analysis of Dehalococcoides 16S Ribosomal DNA from Chloroethene-Contaminated Sites throughout North America and Europe

PubMed Central

Hendrickson, Edwin R.; Payne, Jo Ann; Young, Roslyn M.; Starr, Mark G.; Perry, Michael P.; Fahnestock, Stephen; Ellis, David E.; Ebersole, Richard C.

2002-01-01

The environmental distribution of Dehalococcoides group organisms and their association with chloroethene-contaminated sites were examined. Samples from 24 chloroethene-dechlorinating sites scattered throughout North America and Europe were tested for the presence of members of the Dehalococcoides group by using a PCR assay developed to detect Dehalococcoides 16S rRNA gene (rDNA) sequences. Sequences identified by sequence analysis as sequences of members of the Dehalococcoides group were detected at 21 sites. Full dechlorination of chloroethenes to ethene occurred at these sites. Dehalococcoides sequences were not detected in samples from three sites at which partial dechlorination of chloroethenes occurred, where dechlorination appeared to stop at 1,2-cis-dichloroethene. Phylogenetic analysis of the 16S rDNA amplicons confirmed that Dehalococcoides sequences formed a unique 16S rDNA group. These 16S rDNA sequences were divided into three subgroups based on specific base substitution patterns in variable regions 2 and 6 of the Dehalococcoides 16S rDNA sequence. Analyses also demonstrated that specific base substitution patterns were signature patterns. The specific base substitutions distinguished the three sequence subgroups phylogenetically. These results demonstrated that members of the Dehalococcoides group are widely distributed in nature and can be found in a variety of geological formations and in different climatic zones. Furthermore, the association of these organisms with full dechlorination of chloroethenes suggests that they are promising candidates for engineered bioremediation and may be important contributors to natural attenuation of chloroethenes. PMID:11823182
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

PubMed

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

2015-05-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
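The two predictors examined in this abstract, window length and the number of variable and informative sites, can be computed from an alignment with a short Python sketch. It assumes that "informative" means parsimony-informative (at least two states, each present in at least two sequences); the abstract does not state the window step size, so `step` is a free parameter here rather than the authors' setting.

```python
from collections import Counter


def site_counts(alignment):
    """Return (variable, parsimony_informative) column counts.

    `alignment` is a list of equal-length aligned sequences; gaps and
    ambiguity codes are ignored when counting states.
    """
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each seen in >= 2 sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


def sliding_windows(alignment, size, step):
    """Yield (start, (variable, informative)) for each full-length window."""
    length = len(alignment[0])
    for start in range(0, length - size + 1, step):
        window = [seq[start:start + size] for seq in alignment]
        yield start, site_counts(window)
```

Running `sliding_windows` with `size=1000` and `size=2000` on the same alignment gives per-window site counts that could then be related to the proportion of sequences found in clusters.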
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

PubMed Central

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

2015-01-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
A simple procedure for parallel sequence analysis of both strands of 5'-labeled DNA.

PubMed

Razvi, F; Gargiulo, G; Worcel, A

1983-08-01

Ligation of a 5'-labeled DNA restriction fragment results in a circular DNA molecule carrying the two 32Ps at the reformed restriction site. Double digestions of the circular DNA with the original enzyme and a second restriction enzyme cleavage near the labeled site allows direct chemical sequencing of one 5'-labeled DNA strand. Similar double digestions, using an isoschizomer that cleaves differently at the 32P-labeled site, allows direct sequencing of the now 3'-labeled complementary DNA strand. It is possible to directly sequence both strands of cloned DNA inserts by using the above protocol and a multiple cloning site vector that provides the necessary restriction sites. The simultaneous and parallel visualization of both DNA strands eliminates sequence ambiguities. In addition, the labeled circular molecules are particularly useful for single-hit DNA cleavage studies and DNA footprint analysis. As an example, we show here an analysis of the micrococcal nuclease-induced breaks on the two strands of the somatic 5S RNA gene of Xenopus borealis, which suggests that the enzyme may recognize and cleave small AT-containing palindromes along the DNA helix.
Sequence analysis of serum albumins reveals the molecular evolution of ligand recognition properties.

PubMed

Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro

2012-01-01

Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. The multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, thus showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching conserved domains revealed that most SA sequences are made up by three repeated domains (about 600 residues), as extensively characterized for human SA. On the contrary, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar of higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.
Active Site Characterization of Proteases Sequences from Different Species of Aspergillus.

PubMed

Morya, V K; Yadav, Virendra K; Yadav, Sangeeta; Yadav, Dinesh

2016-09-01

A total of 129 protease sequences comprising 43 serine proteases, 36 aspartic proteases, 24 cysteine proteases, 21 metalloproteases, and 5 neutral proteases from different Aspergillus species were analyzed for catalytically active site residues using the MEROPS database and various bioinformatics tools. Different proteases show a predominance of variable active site residues. For the 24 cysteine proteases of Aspergilli, the predominant active site residues observed were Gln193, Cys199, His364 and Asn384, while for the 43 serine proteases, the active site residues Asp164, His193, Asn284, Ser349 and Asp325, His357, Asn454, Ser519 were frequently observed. The analysis of the 21 metalloproteases of Aspergilli revealed Glu298 and Glu388, Tyr476 as predominant active site residues. In general, Aspergilli species-specific active site residues were observed for the different types of protease sequences analyzed. The phylogenetic analysis of these 129 protease sequences revealed 14 different clans representing different types of proteases with diverse active site residues.
Purification and sequencing of the active site tryptic peptide from penicillin-binding protein 1b of Escherichia coli

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nicholas, R.A.; Suzuki, H.; Hirota, Y.

This paper reports the sequence of the active site peptide of penicillin-binding protein 1b from Escherichia coli. Purified penicillin-binding protein 1b was labeled with [14C]penicillin G, digested with trypsin, and partially purified by gel filtration. Upon further purification by high-pressure liquid chromatography, two radioactive peaks were observed, and the major peak, representing over 75% of the applied radioactivity, was submitted to amino acid analysis and sequencing. The sequence Ser-Ile-Gly-Ser-Leu-Ala-Lys was obtained. The active site nucleophile was identified by digesting the purified peptide with aminopeptidase M and separating the radioactive products on high-pressure liquid chromatography. Amino acid analysis confirmed that the serine residue in the middle of the sequence was covalently bonded to the [14C]penicilloyl moiety. A comparison of this sequence to active site sequences of other penicillin-binding proteins and beta-lactamases is presented.
RSAT 2015: Regulatory Sequence Analysis Tools

PubMed Central

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-01-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632
Molecular Characterization of Transgene Integration by Next-Generation Sequencing in Transgenic Cattle

PubMed Central

Zhang, Ran; Yin, Yinliang; Zhang, Yujun; Li, Kexin; Zhu, Hongxia; Gong, Qin; Wang, Jianwu; Hu, Xiaoxiang; Li, Ning

2012-01-01

As the number of transgenic livestock increases, reliable detection and molecular characterization of transgene integration sites and copy number are crucial not only for interpreting the relationship between the integration site and the specific phenotype but also for commercial and economic demands. However, the ability of conventional PCR techniques to detect incomplete and multiple integration events is limited, making it technically challenging to characterize transgenes. Next-generation sequencing has enabled cost-effective, routine and widespread high-throughput genomic analysis. Here, we demonstrate the use of next-generation sequencing to extensively characterize cattle harboring a 150-kb human lactoferrin transgene that was initially analyzed by chromosome walking without success. Using this approach, the sites upstream and downstream of the target gene integration site in the host genome were identified at the single nucleotide level. The sequencing result was verified by event-specific PCR for the integration sites and FISH for the chromosomal location. Sequencing depth analysis revealed that multiple copies of the incomplete target gene and the vector backbone were present in the host genome. Upon integration, complex recombination was also observed between the target gene and the vector backbone. These findings indicate that next-generation sequencing is a reliable and accurate approach for the molecular characterization of the transgene sequence, integration sites and copy number in transgenic species. PMID:23185606
Identification of human-to-human transmissibility factors in PB2 proteins of influenza A by large-scale mutual information analysis

PubMed Central

Miotto, Olivo; Heiny, AT; Tan, Tin Wee; August, J Thomas; Brusic, Vladimir

2008-01-01

Background: The identification of mutations that confer unique properties to a pathogen, such as host range, is of fundamental importance in the fight against disease. This paper describes a novel method for identifying amino acid sites that distinguish specific sets of protein sequences, by comparative analysis of matched alignments. The use of mutual information to identify distinctive residues responsible for functional variants makes this approach highly suitable for analyzing large sets of sequences. To support mutual information analysis, we developed the AVANA software, which utilizes sequence annotations to select sets for comparison, according to user-specified criteria. The method presented was applied to an analysis of influenza A PB2 protein sequences, with the objective of identifying the components of adaptation to human-to-human transmission, and reconstructing the mutation history of these components. Results: We compared over 3,000 PB2 protein sequences of human-transmissible and avian isolates, to produce a catalogue of sites involved in adaptation to human-to-human transmission. This analysis identified 17 characteristic sites, five of which have been present in human-transmissible strains since the 1918 Spanish flu pandemic. Sixteen of these sites are located in functional domains, suggesting they may play functional roles in host-range specificity. The catalogue of characteristic sites was used to derive sequence signatures from historical isolates. These signatures, arranged in chronological order, reveal an evolutionary timeline for the adaptation of the PB2 protein to human hosts. Conclusion: By providing the most complete elucidation to date of the functional components participating in PB2 protein adaptation to humans, this study demonstrates that mutual information is a powerful tool for comparative characterization of sequence sets. In addition to confirming previously reported findings, several novel characteristic sites within PB2 are reported. Sequence signatures generated using the characteristic sites catalogue characterize concisely the adaptation characteristics of individual isolates. Evolutionary timelines derived from signatures of early human influenza isolates suggest that characteristic variants emerged rapidly, and remained remarkably stable through subsequent pandemics. In addition, the signatures of human-infecting H5N1 isolates suggest that this avian subtype has low pandemic potential at present, although it presents more human adaptation components than most avian subtypes. PMID:18315849
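The core idea of this abstract, scoring each alignment position by the mutual information between the residue observed there and the set a sequence belongs to, can be sketched in Python. This is a generic mutual information calculation, not the AVANA software; the function names and the "H"/"A" set labels are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information(residues, labels):
    """Mutual information (bits) between residue and set label at one column."""
    n = len(residues)
    joint = Counter(zip(residues, labels))
    residue_counts = Counter(residues)
    label_counts = Counter(labels)
    mi = 0.0
    for (res, lab), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written with counts to avoid extra divisions.
        ratio = count * n / (residue_counts[res] * label_counts[lab])
        mi += p_xy * math.log2(ratio)
    return mi


def rank_sites(human_seqs, avian_seqs):
    """Return (MI, column_index) pairs, highest MI first.

    Both inputs are lists of aligned sequences of equal length.
    """
    labels = ["H"] * len(human_seqs) + ["A"] * len(avian_seqs)
    columns = zip(*(human_seqs + avian_seqs))
    scores = [(mutual_information(col, labels), i) for i, col in enumerate(columns)]
    return sorted(scores, reverse=True)
```

A column where the two sets carry different residues scores close to the entropy of the set labels, while a fully conserved column scores zero, so the top of the ranking approximates a list of characteristic sites.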
EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences.

PubMed

Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

2003-07-01

EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl.
Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.

2004-08-06

The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Measuring conservation of sequence features closely linked to function--such as binding-site clustering--makes better use of comparative sequence data than commonly used methods that examine only sequence identity.
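Searching for regions with a high density of predicted binding sites, as described in this abstract, can be sketched as a simple window scan in Python. The function name, the `window` width, and the `min_sites` threshold are hypothetical parameters; the abstract does not give the density criteria the authors used, and this sketch does not model the conservation step.

```python
def find_clusters(site_positions, window, min_sites):
    """Return merged (start, end) intervals holding >= min_sites sites.

    A cluster is reported whenever at least `min_sites` predicted binding
    sites fall within `window` bp; overlapping clusters are merged.
    """
    positions = sorted(site_positions)
    clusters = []
    left = 0
    for right, pos in enumerate(positions):
        # Shrink the window from the left until it spans at most `window` bp.
        while pos - positions[left] > window:
            left += 1
        if right - left + 1 >= min_sites:
            start, end = positions[left], pos
            if clusters and start <= clusters[-1][1]:
                # Extend the previous cluster instead of adding an overlap.
                clusters[-1] = (clusters[-1][0], end)
            else:
                clusters.append((start, end))
    return clusters
```

Running the same scan on the orthologous region of a second genome and checking whether a cluster is also found there would be one way to approximate the "conservation of binding-site clustering" criterion.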
RSAT 2015: Regulatory Sequence Analysis Tools.

PubMed

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-07-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites

PubMed Central

Chen, Yue; Sanchez, Ana M.; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N.; Busch, Michael P.; Gao, Feng

2016-01-01

HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 10^7 copies/ml and compiled into blinded panels. Subtypes and DRMs were determined with partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%-100% of group M viruses at six sites. Subtyping results for the majority of the viruses (83%-97.9%) were correctly determined for the partial pol sequences. All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site. However, this method missed six group M viruses and sequences contained host chromosome fragments. Three group O viruses were only characterized with additional group O-specific RT-PCR primers employed by one site. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance. Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs. PMID:27314585
Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites.

PubMed

Hora, Bhavna; Keating, Sheila M; Chen, Yue; Sanchez, Ana M; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N; Busch, Michael P; Gao, Feng

2016-01-01

HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 10^7 copies/ml and compiled into blinded panels. Subtypes and DRMs were determined with partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%-100% of group M viruses at six sites. Subtyping results for the majority of the viruses (83%-97.9%) were correctly determined for the partial pol sequences. All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site. However, this method missed six group M viruses and sequences contained host chromosome fragments. Three group O viruses were only characterized with additional group O-specific RT-PCR primers employed by one site. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance. Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs.
Ub-ISAP: a streamlined UNIX pipeline for mining unique viral vector integration sites from next generation sequencing data.

PubMed

Kamboj, Atul; Hallwirth, Claus V; Alexander, Ian E; McCowage, Geoffrey B; Kramer, Belinda

2017-06-17

The analysis of viral vector genomic integration sites is an important component in assessing the safety and efficiency of patient treatment using gene therapy. Alongside this clinical application, integration site identification is a key step in the genetic mapping of viral elements in mutagenesis screens that aim to elucidate gene function. We have developed a UNIX-based vector integration site analysis pipeline (Ub-ISAP) that utilises a UNIX-based workflow for automated integration site identification and annotation of both single and paired-end sequencing reads. Reads that contain viral sequences of interest are selected and aligned to the host genome, and unique integration sites are then classified as transcription start site-proximal, intragenic or intergenic. Ub-ISAP provides a reliable and efficient pipeline to generate large datasets for assessing the safety and efficiency of integrating vectors in clinical settings, with broader applications in cancer research. Ub-ISAP is available as an open source software package at https://sourceforge.net/projects/ub-isap/ .
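The final classification step mentioned in this abstract (transcription start site-proximal, intragenic or intergenic) can be illustrated with a small Python sketch. This is not Ub-ISAP code: the gene tuple layout, the `tss_window` default of 2500 bp, and the rule that TSS proximity takes precedence over the intragenic label are all assumptions for illustration.

```python
def classify_site(chrom, pos, genes, tss_window=2500):
    """Classify one integration site against a gene annotation.

    `genes` is an iterable of (chrom, start, end, strand) tuples with
    start <= end. The TSS is taken as `start` on the "+" strand and
    `end` on the "-" strand.
    """
    inside_gene = False
    for g_chrom, start, end, strand in genes:
        if g_chrom != chrom:
            continue
        tss = start if strand == "+" else end
        if abs(pos - tss) <= tss_window:
            return "TSS-proximal"
        if start <= pos <= end:
            inside_gene = True
    return "intragenic" if inside_gene else "intergenic"
```

The linear scan over all genes is adequate for a handful of sites; for large datasets an interval index over the annotation would be the more usual choice.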
Sirius PSB: a generic system for analysis of biological sequences.

PubMed

Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

2009-12-01

Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.
Bidirectional Retroviral Integration Site PCR Methodology and Quantitative Data Analysis Workflow.

PubMed

Suryawanshi, Gajendra W; Xu, Song; Xie, Yiming; Chou, Tom; Kim, Namshin; Chen, Irvin S Y; Kim, Sanggu

2017-06-14

Integration Site (IS) assays are a critical component of the study of retroviral integration sites and their biological significance. In recent retroviral gene therapy studies, IS assays, in combination with next-generation sequencing, have been used as a cell-tracking tool to characterize clonal stem cell populations sharing the same IS. For the accurate comparison of repopulating stem cell clones within and across different samples, the detection sensitivity, data reproducibility, and high-throughput capacity of the assay are among the most important assay qualities. This work provides a detailed protocol and data analysis workflow for bidirectional IS analysis. The bidirectional assay can simultaneously sequence both upstream and downstream vector-host junctions. Compared to conventional unidirectional IS sequencing approaches, the bidirectional approach significantly improves IS detection rates and the characterization of integration events at both ends of the target DNA. The data analysis pipeline described here accurately identifies and enumerates identical IS sequences through multiple steps of comparison that map IS sequences onto the reference genome and determine sequencing errors. Using an optimized assay procedure, we have recently published the detailed repopulation patterns of thousands of Hematopoietic Stem Cell (HSC) clones following transplant in rhesus macaques, demonstrating for the first time the precise time point of HSC repopulation and the functional heterogeneity of HSCs in the primate system. The following protocol describes the step-by-step experimental procedure and data analysis workflow that accurately identifies and quantifies identical IS sequences.
EUGÈNE'HOM: a generic similarity-based gene finder using multiple homologous sequences

PubMed Central

Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

2003-01-01

EUGÈNE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGÈNE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGÈNE'HOM to handle sequences from a variety of organisms. The current target of EUGÈNE'HOM is plant sequences. The EUGÈNE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408

The genome-wide DNA sequence specificity of the anti-tumour drug bleomycin in human cells.

PubMed

Murray, Vincent; Chen, Jon K; Tanaka, Mark M

2016-07-01

The cancer chemotherapeutic agent, bleomycin, cleaves DNA at specific sites. For the first time, the genome-wide DNA sequence specificity of bleomycin breakage was determined in human cells. Utilising Illumina next-generation DNA sequencing techniques, over 200 million bleomycin cleavage sites were examined to elucidate the bleomycin genome-wide DNA selectivity. The genome-wide bleomycin cleavage data were analysed by four different methods to determine the cellular DNA sequence specificity of bleomycin strand breakage. For the most highly cleaved DNA sequences, the preferred site of bleomycin breakage was at 5'-GT* dinucleotide sequences (where the asterisk indicates the bleomycin cleavage site), with lesser cleavage at 5'-GC* dinucleotides. This investigation also determined longer bleomycin cleavage sequences, with preferred cleavage at 5'-GT*A and 5'-TGT* trinucleotide sequences, and 5'-TGT*A tetranucleotides. For cellular DNA, the hexanucleotide DNA sequence 5'-RTGT*AY (where R is a purine and Y is a pyrimidine) was the most highly cleaved DNA sequence. It was striking that alternating purine-pyrimidine sequences were highly cleaved by bleomycin. The highest intensity cleavage sites in cellular and purified DNA were very similar although there were some minor differences. Statistical nucleotide frequency analysis indicated a G nucleotide was present at the -3 position (relative to the cleavage site) in cellular DNA but was absent in purified DNA.
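As an illustration of the 5'-RTGT*AY motif described above, the following minimal sketch scans a DNA string for the hexanucleotide and reports the position of the starred T. It is not the authors' analysis pipeline; the function names, the 0-based coordinates, and the choice to report the starred nucleotide are assumptions made for this example.

```python
import re

# IUPAC codes needed for the motif in the abstract:
# R = purine (A or G), Y = pyrimidine (C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def motif_to_regex(motif):
    """Translate an IUPAC motif such as 'RTGTAY' into a compiled regex.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    pattern = "".join(IUPAC[base] for base in motif)
    return re.compile("(?=(" + pattern + "))")


def find_starred_sites(seq, motif="RTGTAY", star_index=3):
    """Return 0-based positions of the starred nucleotide in each motif match.

    star_index=3 points at the second T of RTGT*AY (hypothetical convention
    chosen for this sketch).
    """
    regex = motif_to_regex(motif)
    return [m.start(1) + star_index for m in regex.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_starred_sites("GTGTATGTGTAT"))
```

Only one strand is scanned here; a fuller treatment would also search the reverse complement.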
The Ensembl genome database project.

PubMed

Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M

2002-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.
Statistical analysis of nucleotide sequences of the hemagglutinin gene of human influenza A viruses.

PubMed Central

Ina, Y; Gojobori, T

1994-01-01

To examine whether positive selection operates on the hemagglutinin 1 (HA1) gene of human influenza A viruses (H1 subtype), 21 nucleotide sequences of the HA1 gene were statistically analyzed. The nucleotide sequences were divided into antigenic and nonantigenic sites. The nucleotide diversities for antigenic and nonantigenic sites of the HA1 gene were computed at synonymous and nonsynonymous sites separately. For nonantigenic sites, the nucleotide diversities were larger at synonymous sites than at nonsynonymous sites. This is consistent with the neutral theory of molecular evolution. For antigenic sites, however, the nucleotide diversities at nonsynonymous sites were larger than those at synonymous sites. These results suggest that positive selection operates on antigenic sites of the HA1 gene of human influenza A viruses (H1 subtype). PMID:8078892
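The nucleotide diversity referred to above can be sketched as the mean proportion of differing sites over all pairs of aligned sequences. The example below is a simplified illustration under stated assumptions: it computes plain pairwise diversity over a chosen set of alignment columns and does not implement the partition into synonymous and nonsynonymous sites used in the study. The function name and the `columns` parameter are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs, columns=None):
    """Mean proportion of differing sites over all pairs of aligned sequences.

    seqs    -- list of equal-length aligned sequences
    columns -- optional iterable of 0-based column indices to restrict the
               comparison to (e.g. a hypothetical list of antigenic-site columns)
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    cols = list(range(length)) if columns is None else list(columns)
    if not cols:
        raise ValueError("no columns selected")

    total = 0.0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        diffs = sum(1 for i in cols if a[i] != b[i])
        total += diffs / len(cols)
        n_pairs += 1
    return total / n_pairs


if __name__ == "__main__":
    alignment = ["ACGT", "ACGA", "TCGA"]
    print(nucleotide_diversity(alignment))
    print(nucleotide_diversity(alignment, columns=[0]))
```

Comparing the value for two column sets (for example, antigenic versus nonantigenic columns) mirrors the kind of contrast the abstract describes, although the published analysis works at synonymous and nonsynonymous sites separately.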
Using information content and base frequencies to distinguish mutations from genetic polymorphisms in splice junction recognition sites.

PubMed

Rogan, P K; Schneider, T D

1995-01-01

Predicting the effects of nucleotide substitutions in human splice sites has been based on analysis of consensus sequences. We used a graphic representation of sequence conservation and base frequency, the sequence logo, to demonstrate that a change in a splice acceptor of hMSH2 (a gene associated with familial nonpolyposis colon cancer) probably does not reduce splicing efficiency. This confirms a population genetic study that suggested that this substitution is a genetic polymorphism. The information theory-based sequence logo is quantitative and more sensitive than the corresponding splice acceptor consensus sequence for detection of true mutations. Information analysis may potentially be used to distinguish polymorphisms from mutations in other types of transcriptional, translational, or protein-coding motifs.
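The per-position information that underlies a sequence logo can be sketched as 2 bits minus the Shannon entropy of the base frequencies at each position of a set of aligned sites. The following is a minimal sketch, assuming a four-letter DNA alphabet and omitting the small-sample correction usually applied in information-theory analyses; the function name is hypothetical.

```python
import math
from collections import Counter


def position_information(aligned_sites):
    """Information content (bits) at each position of aligned DNA sites.

    Computed as 2 - H, where H is the Shannon entropy of the observed base
    frequencies at that position. No small-sample correction is applied.
    """
    n = len(aligned_sites)
    if n == 0:
        raise ValueError("no sites given")
    length = len(aligned_sites[0])
    if any(len(s) != length for s in aligned_sites):
        raise ValueError("sites must have equal length")

    info = []
    for i in range(length):
        counts = Counter(site[i] for site in aligned_sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - entropy)
    return info


if __name__ == "__main__":
    print(position_information(["AG", "AG", "AC", "AT"]))
```

A fully conserved position scores 2 bits, while a position with equal frequencies of all four bases scores 0 bits, which is why such a measure is more graded than a consensus sequence.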
Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.

2004-08-06

Background: The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. Results: We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Conclusions: Measuring conservation of sequence features closely linked to function - such as binding-site clustering - makes better use of comparative sequence data than commonly used methods that examine only sequence identity.
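The idea of searching for regions with a high density of predicted binding sites can be sketched as a sliding-window scan over site coordinates. This is a simplified illustration, not the authors' screen: the function name, the window width, and the minimum site count are hypothetical parameters chosen for the example, and the conservation step described in the abstract is not modelled.

```python
def find_site_clusters(positions, window=700, min_sites=13):
    """Return merged (start, end) intervals containing dense runs of sites.

    positions -- coordinates of predicted binding sites on one sequence
    window    -- maximum span (in bp) of a candidate cluster (assumed value)
    min_sites -- minimum number of sites inside the window (assumed value)
    """
    pos = sorted(positions)
    clusters = []
    j = 0
    for i in range(len(pos)):
        j = max(j, i)
        # Extend the right edge while the next site still fits in the window.
        while j + 1 < len(pos) and pos[j + 1] - pos[i] < window:
            j += 1
        if j - i + 1 >= min_sites:
            start, end = pos[i], pos[j]
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous cluster: merge the two intervals.
                clusters[-1] = (clusters[-1][0], max(end, clusters[-1][1]))
            else:
                clusters.append((start, end))
    return clusters


if __name__ == "__main__":
    sites = [10, 20, 30, 500, 1000, 1010, 1020, 1030]
    print(find_site_clusters(sites, window=50, min_sites=3))
```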
Genome-Wide Analysis of A-to-I RNA Editing.

PubMed

Savva, Yiannis A; Laurent, Georges St; Reenan, Robert A

2016-01-01

Adenosine (A)-to-inosine (I) RNA editing is a fundamental posttranscriptional modification that ensures the deamination of A-to-I in double-stranded (ds) RNA molecules. Intriguingly, the A-to-I RNA editing system is particularly active in the nervous system of higher eukaryotes, altering a plethora of noncoding and coding sequences. Abnormal RNA editing is highly associated with many neurological phenotypes and neurodevelopmental disorders. However, the molecular mechanisms underlying RNA editing-mediated pathogenesis still remain enigmatic and have attracted increasing attention from researchers. Over the last decade, methods available to perform genome-wide transcriptome analysis have evolved rapidly. Within the RNA editing field, researchers have adopted next-generation sequencing technologies to identify RNA-editing sites within genomes and to elucidate the underlying process. However, technical challenges associated with editing site discovery have hindered efforts to uncover comprehensive editing site datasets, resulting in the general perception that the collections of annotated editing sites represent only a small minority of the total number of sites in a given organism, tissue, or cell type of interest. In addition to doubts about sensitivity, existing RNA-editing site lists often contain high percentages of false positives, leading to uncertainty about their validity and usefulness in downstream studies. An accurate investigation of A-to-I editing requires properly validated datasets of editing sites with demonstrated and transparent levels of sensitivity and specificity. Here, we describe a high signal-to-noise method for RNA-editing site detection using single-molecule sequencing (SMS). With this method, authentic RNA-editing sites may be differentiated from artifacts. Machine learning approaches provide a procedure to improve upon and experimentally validate sequencing outcomes through use of computationally predicted, iterative feedback loops. Subsequent use of extensive Sanger sequencing validations can generate accurate editing site lists. This approach has broad application and accurate genome-wide editing analysis of various tissues from clinical specimens or various experimental organisms is now a possibility.
Molecular cloning and analysis of Schizosaccharomyces pombe Reb1p: sequence-specific recognition of two sites in the far upstream rDNA intergenic spacer.

PubMed Central

Zhao, A; Guo, A; Liu, Z; Pape, L

1997-01-01

The coding sequences for a Schizosaccharomyces pombe sequence-specific DNA binding protein, Reb1p, have been cloned. The predicted S. pombe Reb1p is 24-29% identical to mouse TTF-1 (transcription termination factor-1) and Saccharomyces cerevisiae REB1 protein, both of which direct termination of RNA polymerase I catalyzed transcripts. The S. pombe Reb1 cDNA encodes a predicted polypeptide of 504 amino acids with a predicted molecular weight of 58.4 kDa. The S. pombe Reb1p is unusual in that the bipartite DNA binding motif identified originally in S. cerevisiae and Kluyveromyces lactis REB1 proteins is uninterrupted and thus S. pombe Reb1p may contain the smallest natural REB1 homologous DNA binding domain. Its genomic coding sequences were shown to be interrupted by two introns. A recombinant histidine-tagged Reb1 protein bearing the rDNA binding domain has two homologous, sequence-specific binding sites in the S. pombe rDNA intergenic spacer, located between 289 and 480 nt downstream of the end of the approximately 25S rRNA coding sequences. Each binding site is 13-14 bp downstream of two of the three proposed in vivo termination sites. The core of this 17 bp site, AGGTAAGGGTAATGCAC, is specifically protected by Reb1p in footprinting analysis. PMID:9016645
Automated frame selection process for high-resolution microendoscopy

NASA Astrophysics Data System (ADS)

Ishijima, Ayumu; Schwarz, Richard A.; Shin, Dongsuk; Mondrik, Sharon; Vigneswaran, Nadarajah; Gillenwater, Ann M.; Anandasabapathy, Sharmila; Richards-Kortum, Rebecca

2015-04-01

We developed an automated frame selection algorithm for high-resolution microendoscopy video sequences. The algorithm rapidly selects a representative frame with minimal motion artifact from a short video sequence, enabling fully automated image analysis at the point-of-care. The algorithm was evaluated by quantitative comparison of diagnostically relevant image features and diagnostic classification results obtained using automated frame selection versus manual frame selection. A data set consisting of video sequences collected in vivo from 100 oral sites and 167 esophageal sites was used in the analysis. The area under the receiver operating characteristic curve was 0.78 (automated selection) versus 0.82 (manual selection) for oral sites, and 0.93 (automated selection) versus 0.92 (manual selection) for esophageal sites. The implementation of fully automated high-resolution microendoscopy at the point-of-care has the potential to reduce the number of biopsies needed for accurate diagnosis of precancer and cancer in low-resource settings where there may be limited infrastructure and personnel for standard histologic analysis.
Precise assignment of the heavy-strand promoter of mouse mitochondrial DNA: cognate start sites are not required for transcriptional initiation.

PubMed Central

Chang, D D; Clayton, D A

1986-01-01

Transcription of the heavy strand of mouse mitochondrial DNA starts from two closely spaced, distinct sites located in the displacement loop region of the genome. We report here an analysis of regulatory sequences required for faithful transcription from these two sites. Data obtained from in vitro assays demonstrated that a 51-base-pair region, encompassing nucleotides -40 to +11 of the downstream start site, contains sufficient information for accurate transcription from both start sites. Deletion of the 3' flanking sequences, including one or both start sites to -17, resulted in the initiation of transcription by the mitochondrial RNA polymerase from alternative sites within vector DNA sequences. This feature places the mouse heavy-strand promoter uniquely among other known mitochondrial promoters, all of which absolutely require cognate start sites for transcription. Comparison of the heavy-strand promoter with those of other vertebrate mitochondrial DNAs revealed a remarkably high rate of sequence divergence among species. PMID:3785226
Characterization of the genetic elements required for site-specific integration of plasmid pSE211 in Saccharopolyspora erythraea.

PubMed Central

Brown, D P; Idler, K B; Katz, L

1990-01-01

The 18.1-kilobase plasmid pSE211 integrates into the chromosome of Saccharopolyspora erythraea at a specific attB site. Restriction analysis of the integrated plasmid, pSE211int, and adjacent chromosomal sequences allowed identification of attP, the plasmid attachment site. Nucleotide sequencing of attP, attB, attL, and attR revealed a 57-base-pair sequence common to all sites with no duplications of adjacent plasmid or chromosomal sequences in the integrated state, indicating that integration takes place through conservative, reciprocal strand exchange. An analysis of the sequences indicated the presence of a putative gene for Phe-tRNA at attB which is preserved at attL after integration has occurred. A comparison of the attB site for a number of actinomycete plasmids is presented. Integration at attB was also observed when a 2.4-kilobase segment of pSE211 containing attP and the adjacent plasmid sequence was used to transform a pSE211- host. Nucleotide sequencing of this segment revealed the presence of two complete open reading frames (ORFs) and a segment of a third ORF. The ORF adjacent to attP encodes a putative polypeptide 437 amino acids in length that shows similarity, at its C-terminal domain, to sequences of site-specific recombinases of the integrase family. The adjacent ORF encodes a putative 98-amino-acid basic polypeptide that contains a helix-turn-helix motif at its N terminus which corresponds to domains in the Xis proteins of a number of bacteriophages. A proposal for the function of this polypeptide is presented. The deduced amino acid sequence of the third ORF did not reveal similarities to polypeptide sequences in the current data banks. PMID:2180909
A simple algorithm for quantifying DNA methylation levels on multiple independent CpG sites in bisulfite genomic sequencing electropherograms.

PubMed

Leakey, Tatiana I; Zielinski, Jerzy; Siegfried, Rachel N; Siegel, Eric R; Fan, Chun-Yang; Cooney, Craig A

2008-06-01

DNA methylation at cytosines is a widely studied epigenetic modification. Methylation is commonly detected using bisulfite modification of DNA followed by PCR and additional techniques such as restriction digestion or sequencing. These additional techniques are either laborious, require specialized equipment, or are not quantitative. Here we describe a simple algorithm that yields quantitative results from analysis of conventional four-dye-trace sequencing. We call this method Mquant and we compare it with the established laboratory method of combined bisulfite restriction assay (COBRA). This analysis of sequencing electropherograms provides a simple, easily applied method to quantify DNA methylation at specific CpG sites.
Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome.

PubMed

Singh, Vinod Kumar; Krishnamachari, Annangarachari

2016-09-01

Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS like patterns in the genome. However, only a few hundreds of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that binds to origin-recognition complex (ORC) denoted as ORC-ACS and non-replicating ACS sequences (nrACS), that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in its vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within its nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS which are found far away from the adjacent gene start position compared to nrACS thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.
Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data.

PubMed

Waszak, Sebastian M; Kilpinen, Helena; Gschwind, Andreas R; Orioli, Andrea; Raghav, Sunil K; Witwicki, Robert M; Migliavacca, Eugenia; Yurovsky, Alisa; Lappalainen, Tuuli; Hernandez, Nouria; Reymond, Alexandre; Dermitzakis, Emmanouil T; Deplancke, Bart

2014-01-15

High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. The R package absfilter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter
Deciphering the genomic targets of alkylating polyamide conjugates using high-throughput sequencing

PubMed Central

Chandran, Anandhakumar; Syed, Junetha; Taylor, Rhys D.; Kashiwazaki, Gengo; Sato, Shinsuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi

2016-01-01

Chemically engineered small molecules targeting specific genomic sequences play an important role in drug development research. Pyrrole-imidazole polyamides (PIPs) are a group of molecules that can bind to the DNA minor-groove and can be engineered to target specific sequences. Their biological effects rely primarily on their selective DNA binding. However, the binding mechanism of PIPs at the chromatinized genome level is poorly understood. Herein, we report a method using high-throughput sequencing to identify the DNA-alkylating sites of PIP-indole-seco-CBI conjugates. High-throughput sequencing analysis of conjugate 2 showed highly similar DNA-alkylating sites on synthetic oligos (histone-free DNA) and on human genomes (chromatinized DNA context). To our knowledge, this is the first report identifying alkylation sites across genomic DNA by alkylating PIP conjugates using high-throughput sequencing. PMID:27098039
The low information content of Neurospora splicing signals: implications for RNA splicing and intron origin.

PubMed

Collins, Richard A; Stajich, Jason E; Field, Deborah J; Olive, Joan E; DeAbreu, Diane M

2015-05-01

When we expressed a small (0.9 kb) nonprotein-coding transcript derived from the mitochondrial VS plasmid in the nucleus of Neurospora we found that it was efficiently spliced at one or more of eight 5' splice sites and ten 3' splice sites, which are present apparently by chance in the sequence. Further experimental and bioinformatic analyses of other mitochondrial plasmids, random sequences, and natural nuclear genes in Neurospora and other fungi indicate that fungal spliceosomes recognize a wide range of 5' splice site and branchpoint sequences and predict introns to be present at high frequency in random sequence. In contrast, analysis of intronless fungal nuclear genes indicates that branchpoint, 5' splice site and 3' splice site consensus sequences are underrepresented compared with random sequences. This underrepresentation of splicing signals is sufficient to deplete the nuclear genome of splice sites at locations that do not comprise biologically relevant introns. Thus, the splicing machinery can recognize a wide range of splicing signal sequences, but splicing still occurs with great accuracy, not because the splicing machinery distinguishes correct from incorrect introns, but because incorrect introns are substantially depleted from the genome. © 2015 Collins et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Application of the MIDAS approach for analysis of lysine acetylation sites.

PubMed

Evans, Caroline A; Griffiths, John R; Unwin, Richard D; Whetton, Anthony D; Corfe, Bernard M

2013-01-01

Multiple Reaction Monitoring Initiated Detection and Sequencing (MIDAS™) is a mass spectrometry-based technique for the detection and characterization of specific post-translational modifications (Unwin et al. 4:1134-1144, 2005), for example acetylated lysine residues (Griffiths et al. 18:1423-1428, 2007). The MIDAS™ technique has application for discovery and analysis of acetylation sites. It is a hypothesis-driven approach that requires a priori knowledge of the primary sequence of the target protein and a proteolytic digest of this protein. MIDAS essentially performs a targeted search for the presence of modified, for example acetylated, peptides. The detection is based on the combination of the predicted molecular weight (measured as mass-charge ratio) of the acetylated proteolytic peptide and a diagnostic fragment (product ion of m/z 126.1), which is generated by specific fragmentation of acetylated peptides during collision induced dissociation performed in tandem mass spectrometry (MS) analysis. Sequence information is subsequently obtained which enables acetylation site assignment. The technique of MIDAS was later trademarked by ABSciex for targeted protein analysis where an MRM scan is combined with full MS/MS product ion scan to enable sequence confirmation.
The Human Transcript Database: A Catalogue of Full Length cDNA Inserts

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bouck, John; McLeod, Michael; Worley, Kim

1999-09-10

The BCM Search Launcher provided improved access to web-based sequence analysis services during the granting period and beyond. The Search Launcher web site grouped analysis procedures by function and provided default parameters that gave reasonable search results for most applications. For instance, most queries were automatically masked for repeat sequences prior to sequence database searches to avoid spurious matches. In addition to the web-based access and arrangements that made using the functions easier, the BCM Search Launcher provided unique value-added applications like the BEAUTY sequence database search tool that combined information about protein domains and sequence database search results to give an enhanced, more complete picture of the reliability and relative value of the information reported. This enhanced search tool made evaluating search results more straightforward and consistent. Some of the favorite features of the web site are the sequence utilities and the batch client functionality that allows processing of multiple samples from the command line interface. One measure of the success of the BCM Search Launcher is the number of sites that have adopted the models first developed on the site. The graphic display on the BLAST search from the NCBI web site is one such outgrowth, as is the display of protein domain search results within BLAST search results, and the design of the Biology Workbench application. The logs of usage and comments from users confirm the great utility of this resource.
In silico Analysis of 3′-End-Processing Signals in Aspergillus oryzae Using Expressed Sequence Tags and Genomic Sequencing Data

PubMed Central

Tanaka, Mizuki; Sakai, Yoshifumi; Yamada, Osamu; Shintani, Takahiro; Gomi, Katsuya

2011-01-01

To investigate 3′-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3′-untranslated region (3′ UTR) plus 100 nucleotides (nt) sequence downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3′ UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3′ UTR and 100 nt sequence downstream of the poly(A) site is notably U-rich, while the region located 15–30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We identified that putative 3′-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprised four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3′-end-processing signals are similar to those in yeast and plants, some notable differences exist between them. PMID:21586533
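The search for the most frequent hexanucleotide in the region 15–30 nt upstream of the poly(A) site can be sketched as a k-mer tally over a fixed window. This is a minimal sketch under stated assumptions, not the authors' procedure: the function name, the record format (sequence plus 0-based index of the poly(A) site), and the rule that a hexamer must lie entirely inside the window are all choices made for this example.

```python
from collections import Counter


def window_kmer_counts(records, upstream_far=30, upstream_near=15, k=6):
    """Count, for each k-mer, how many sequences contain it in the window.

    records -- iterable of (sequence, polya_index) pairs, where polya_index
               is the 0-based position of the poly(A) site (assumed format)
    The window covers positions polya_index - upstream_far up to, but not
    including, polya_index - upstream_near.
    """
    counts = Counter()
    for seq, site in records:
        start = max(0, site - upstream_far)
        end = max(start, site - upstream_near)
        window = seq[start:end]
        # A set ensures each sequence contributes at most once per k-mer.
        kmers = {window[i:i + k] for i in range(len(window) - k + 1)}
        counts.update(kmers)
    return counts


if __name__ == "__main__":
    seq = "C" * 20 + "AAUGAA" + "C" * 20
    records = [(seq, 45), (seq, 45), ("C" * 60, 45)]
    counts = window_kmer_counts(records)
    # Fraction of sequences carrying AAUGAA in the window.
    print(counts["AAUGAA"] / len(records))
```

Dividing each count by the number of sequences gives the kind of per-transcript percentage quoted in the abstract for AAUGAA.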
PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context

PubMed Central

Zhou, Jiyun; Xu, Ruifeng; He, Yulan; Lu, Qin; Wang, Hongpeng; Kong, Bing

2016-01-01

Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most of the existing computational approaches employed only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of the support vector machines (SVM) classifier and ensemble learning. Results on the PDNA-62 and the PDNA-224 datasets demonstrate that features extracted from spatial context provide more information than those from sequence context and that combining them gives a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that the interactions between binding sites next to each other are important for protein-DNA recognition and their binding ability. The comparison between our proposed PDNAsite method and the existing methods indicates that PDNAsite outperforms most of the existing methods and is a useful tool for DNA-binding site identification. A web server for our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely available to the biological research community. PMID:27282833
Molecular Analysis of Methanogen Richness in Landfill and Marshland Targeting 16S rDNA Sequences

PubMed Central

Yadav, Shailendra; Kundu, Sharbadeb; Ghosh, Sankar K.; Maitra, S. S.

2015-01-01

Methanogens, key contributors to global carbon cycling, methane emission, and alternative energy production, generate methane gas via anaerobic digestion of organic matter. The methane emission potential depends upon methanogenic diversity and activity. Since they are anaerobes and difficult to isolate and culture, their diversity in the landfill sites of Delhi and marshlands of Southern Assam, India, was analyzed using molecular techniques such as 16S rDNA sequencing, DGGE, and qPCR. The sequencing results indicated the presence of methanogens belonging to the seventh order and also the order Methanomicrobiales in the Ghazipur and Bhalsawa landfill sites of Delhi. Sequences related to the phyla Crenarchaeota (thermophilic) and Thaumarchaeota (mesophilic) were detected from marshland sites of Southern Assam, India. Jaccard analysis of the DGGE gel using Gel2K showed three main clusters depending on the number and similarity of band patterns. The copy number analysis of hydrogenotrophic methanogens using qPCR indicates higher abundance in landfill sites of Delhi as compared to the marshlands of Southern Assam. Knowledge of methanogenic archaea composition and abundance in contrasting ecosystems such as landfill and marshland may reorient our understanding of the Archaea inhabitants. This study could shed light on the relationship between methane dynamics and the global warming process. PMID:26568700
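The Jaccard analysis of band patterns mentioned in this abstract rests on a simple set similarity. The sketch below is an illustration only and does not reproduce Gel2K; the function name and the band positions are hypothetical. Each lane is represented as the set of band positions detected in it.

```python
def jaccard_similarity(bands_a, bands_b):
    """Jaccard index: shared bands divided by bands present in either lane."""
    a, b = set(bands_a), set(bands_b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here for two empty lanes
    return len(a & b) / len(union)


# Hypothetical band positions (arbitrary units) for two DGGE lanes.
landfill_lane = {12, 27, 33, 41}
marshland_lane = {12, 33, 58}
similarity = jaccard_similarity(landfill_lane, marshland_lane)
```

A matrix of such pairwise values is what a clustering step would take as input to group lanes by similarity.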

The Function of Neuroendocrine Cells in Prostate Cancer

DTIC Science & Technology

2013-04-01

integration site. We then performed deep sequencing and aligned reads to the genome. Our analysis revealed that both histological phenotypes are derived from...lentiviral integration site analysis. (B) Laser capture microdissection was performed on individual glands containing both squamous and...lentiviral integration site analysis. LTR: long terminal repeat (viral DNA), PCR: polymerase chain reaction. (D) Venn diagrams depict shared lentiviral
Characterization and functional analysis of hypoxia-inducible factor HIF1α and its inhibitor HIF1αn in tilapia.

PubMed

Li, Hong Lian; Gu, Xiao Hui; Li, Bi Jun; Chen, Xiao; Lin, Hao Ran; Xia, Jun Hong

2017-01-01

Hypoxia is a major cause of fish morbidity and mortality in the aquatic environment. Hypoxia-inducible factors are very important modulators in the transcriptional response to hypoxic stress. In this study, we characterized and conducted functional analysis of hypoxia-inducible factor HIF1α and its inhibitor HIF1αn in Nile tilapia (Oreochromis niloticus). By cloning and Sanger sequencing, we obtained the full-length cDNA sequences for HIF1α (2686 bp) and HIF1αn (1308 bp). The CDS of HIF1α includes 15 exons encoding 768 amino acid residues and the CDS of HIF1αn contains 8 exons encoding 354 amino acid residues. The complete CDS sequences of HIF1α and HIF1αn cloned from tilapia shared very high homology with known genes from other fishes. HIF1α shows differential expression in different tissues (brain, heart, gill, spleen, liver) and at different hypoxia exposure times (6 h, 12 h, 24 h). HIF1αn expression under hypoxia is generally increased (6 h, 12 h, 24 h) and shows extremely high upregulation in brain tissue. A functional determination site analysis in the protein sequences between fish and land animals identified 21 amino acid sites in HIF1α and 2 sites in HIF1αn as significantly associated sites (α = 0.05). Phylogenetic tree-based positive selection analysis suggested 22 sites in HIF1α as positively selected sites with a p-value of at least 95% for fish lineages compared to the land animals. Our study could be important for clarifying the mechanism of fish adaptation to hypoxic aquatic environments. PMID:28278251
Regulatory sequence analysis tools.

PubMed

van Helden, Jacques

2003-07-01

The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.
'DNA Strider': a 'C' program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers.

PubMed Central

Marck, C

1988-01-01

DNA Strider is a new integrated DNA and protein sequence analysis program written in C for the Macintosh Plus, SE and II computers. It has been designed to be easy to learn and use as well as a fast and efficient tool for day-to-day sequence analysis work. The program consists of a multi-window sequence editor and various DNA and protein analysis functions. The editor may use 4 different types of sequences (DNA, degenerate DNA, RNA and one-letter coded protein) and can simultaneously handle 6 sequences of any type up to 32.5 kB each. Negative numbering of the bases is allowed for DNA sequences. All classical restriction and translation analysis functions are present and can be performed in any order on any open sequence or part of a sequence. The main feature of the program is that the same analysis function can be repeated several times on different sequences, thus generating multiple windows on the screen. Many graphic capabilities have been incorporated, such as a graphic restriction map, a hydrophobicity profile and the CAI plot (codon adaptation index according to Sharp and Li). The restriction site search uses a newly designed fast hexamer look-ahead algorithm. Typical runtime for the search of all sites with a library of 130 restriction endonucleases is 1 second per 10,000 bases. The circular graphic restriction map of the pBR322 plasmid can therefore be computed from its sequence and displayed on the Macintosh Plus screen within 2 seconds and its multiline restriction map obtained in a scrolling window within 5 seconds. PMID:2832831
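To make the restriction site search concrete, here is a minimal sketch of a naive scan. It is explicitly not the hexamer look-ahead algorithm the abstract mentions, whose details are not given here; it simply reports every occurrence of each recognition sequence on the given strand of a linear sequence. The function name and the test sequence are hypothetical; the three recognition sequences are the standard ones for EcoRI, BamHI and HindIII.

```python
# Standard recognition sequences for three common enzymes.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_restriction_sites(sequence, enzymes=ENZYMES):
    """Return 1-based start positions of each recognition site.

    Naive scan of a linear sequence, forward strand only; overlapping
    occurrences are reported. A circular molecule would also need sites
    spanning the origin, which this sketch does not handle.
    """
    seq = sequence.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)  # convert to 1-based coordinates
            start = seq.find(site, start + 1)
        hits[name] = positions
    return hits


sites = find_restriction_sites("ttGAATTCaaGGATCCgaattc")
```

Because the three sites used here are palindromic, scanning one strand is sufficient for them; non-palindromic sites would also require scanning the reverse complement.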
Vaginal microbial flora analysis by next generation sequencing and microarrays; can microbes indicate vaginal origin in a forensic context?

PubMed

Benschop, Corina C G; Quaak, Frederike C A; Boon, Mathilde E; Sijen, Titia; Kuiper, Irene

2012-03-01

Forensic analysis of biological traces generally encompasses the investigation of both the person who contributed to the trace and the body site(s) from which the trace originates. For instance, for sexual assault cases, it can be beneficial to distinguish vaginal samples from skin or saliva samples. In this study, we explored the use of microbial flora to indicate vaginal origin. First, we explored the vaginal microbiome for a large set of clinical vaginal samples (n = 240) by next generation sequencing (n = 338,184 sequence reads) and found 1,619 different sequences. Next, we selected 389 candidate probes targeting genera or species and designed a microarray, with which we analysed a diverse set of samples; 43 DNA extracts from vaginal samples and 25 DNA extracts from samples from other body sites, including sites in close proximity of or in contact with the vagina. Finally, we used the microarray results and next generation sequencing dataset to assess the potential for a future approach that uses microbial markers to indicate vaginal origin. Since no candidate genera/species were found to positively identify all vaginal DNA extracts on their own, while excluding all non-vaginal DNA extracts, we deduce that a reliable statement about the cellular origin of a biological trace should be based on the detection of multiple species within various genera. Microarray analysis of a sample will then render a microbial flora pattern that is probably best analysed in a probabilistic approach.
Identification of the regulatory autophosphorylation site of autophosphorylation-dependent protein kinase (auto-kinase). Evidence that auto-kinase belongs to a member of the p21-activated kinase family.

PubMed

Yu, J S; Chen, W J; Ni, M H; Chan, W H; Yang, S D

1998-08-15

Autophosphorylation-dependent protein kinase (auto-kinase) was identified from pig brain and liver on the basis of its unique autophosphorylation/activation property [Yang, Fong, Yu and Liu (1987) J. Biol. Chem. 262, 7034-7040; Yang, Chang and Soderling (1987) J. Biol. Chem. 262, 9421-9427]. Its substrate consensus sequence motif was determined as being -R-X-(X)-S*/T*-X3-S/T-. To characterize auto-kinase further, we partly sequenced the kinase purified from pig liver. The N-terminal sequence (VDGGAKTSDKQKKKAXMTDE) and two internal peptide sequences (EKLRTIV and LQNPEK/ILTP/FI) of auto-kinase were obtained. These sequences identify auto-kinase as a C-terminal catalytic fragment of p21-activated protein kinase 2 (PAK2 or gamma-PAK) lacking its N-terminal regulatory region. Auto-kinase can be recognized by an antibody raised against the C-terminal peptide of human PAK2 by immunoblotting. Furthermore, the autophosphorylation site sequence of auto-kinase was successfully predicted on the basis of its substrate consensus sequence motif and the known PAK2 sequence, and was further demonstrated to be RST(P)MVGTPYWMAPEVVTR by phosphoamino acid analysis, manual Edman degradation and phosphopeptide mapping with the help of phosphorylation site analysis of a synthetic peptide corresponding to the sequence of PAK2 from residues 396 to 418. During the activation process, auto-kinase autophosphorylates mainly on a single threonine residue, Thr402 (according to the sequence numbering of human PAK2). In addition, a phospho-specific antibody against a synthetic phosphopeptide containing this identified sequence was generated and shown to differentially recognize the activated auto-kinase autophosphorylated at Thr402 but not the non-phosphorylated/inactive auto-kinase. Immunoblot analysis with this phospho-specific antibody further revealed that the change in phosphorylation level of Thr402 of auto-kinase was well correlated with the activity change of the kinase during both autophosphorylation/activation and protein phosphatase-mediated dephosphorylation/inactivation processes. Taken together, our results identify Thr402 as the regulatory autophosphorylation site of auto-kinase, which is a C-terminal catalytic fragment of PAK2. PMID:9693111
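The consensus-based site prediction described in this abstract can be sketched with a regular expression. This is an illustrative reading of the motif -R-X-(X)-S*/T*-X3-S/T- (arginine, one or two arbitrary residues, the phosphoacceptor Ser/Thr, three arbitrary residues, then Ser/Thr), not the authors' procedure. The function name is hypothetical, and the example input is the peptide quoted in the abstract with the "(P)" phosphate annotation removed.

```python
import re

# Lookahead so that candidate motifs starting at different arginines can
# overlap. Group 2 marks the phosphoacceptor Ser/Thr.
MOTIF = re.compile(r"(?=(R.{1,2}([ST]).{3}[ST]))")


def find_candidate_sites(protein):
    """Return (1-based position of the phosphoacceptor, matched motif) pairs."""
    return [(m.start(2) + 1, m.group(1)) for m in MOTIF.finditer(protein)]


# Peptide from the abstract, written without the "(P)" annotation.
peptide = "RSTMVGTPYWMAPEVVTR"
candidates = find_candidate_sites(peptide)
```

Positions are relative to the peptide supplied, not to the human PAK2 numbering used in the abstract; mapping between the two would need the peptide's offset within the full-length protein.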
GUIDEseq: a bioconductor package to analyze GUIDE-Seq datasets for CRISPR-Cas nucleases.

PubMed

Zhu, Lihua Julie; Lawrence, Michael; Gupta, Ankit; Pagès, Hervé; Kucukural, Alper; Garber, Manuel; Wolfe, Scot A

2017-05-15

Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased genome-wide detection of nuclease cleavage sites within the genome. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed. Here, we describe an open source, open development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to different nuclease platforms with different length and complexity in their guide and PAM recognition sequences or their DNA cleavage position. They also enable users to customize sequence aggregation criteria, and vary peak calling thresholds that can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets for a rapid comparison of different nuclease configurations or experimental conditions. For each identified off-target, the GUIDEseq package outputs the mapped GUIDE-seq read count as well as the cleavage score from a user-specified off-target cleavage score prediction algorithm, permitting the identification of genomic sequences with unexpected cleavage activity. The GUIDEseq package enables analysis of GUIDE-seq data from various nuclease platforms for any species with a defined genomic sequence. This software package has been used successfully to analyze several GUIDE-seq datasets. The software, source code and documentation are freely available at http://www.bioconductor.org/packages/release/bioc/html/GUIDEseq.html .
Ensembl 2002: accommodating comparative genomics.

PubMed

Clamp, M; Andrews, D; Barker, D; Bevan, P; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Hubbard, T; Kasprzyk, A; Keefe, D; Lehvaslaiho, H; Iyer, V; Melsopp, C; Mongin, E; Pettett, R; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Birney, E

2003-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation, and installations exist around the world both in companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focused on developing automatic comparative genome analysis and visualisation.
Regulation of the alpha-glucuronidase-encoding gene ( aguA) from Aspergillus niger.

PubMed

de Vries, R P; van de Vondervoort, P J I; Hendriks, L; van de Belt, M; Visser, J

2002-09-01

The alpha-glucuronidase gene aguA from Aspergillus niger was cloned and characterised. Analysis of the promoter region of aguA revealed the presence of four putative binding sites for the major carbon catabolite repressor protein CREA and one putative binding site for the transcriptional activator XLNR. In addition, a sequence motif was detected which differed only in the last nucleotide from the XLNR consensus site. A construct in which part of the aguA coding region was deleted still resulted in production of a stable mRNA upon transformation of A. niger. The putative XLNR binding sites and two of the putative CREA binding sites were mutated individually in this construct and the effects on expression were examined in A. niger transformants. Northern analysis of the transformants revealed that the consensus XLNR site is not actually functional in the aguA promoter, whereas the sequence that diverges from the consensus at a single position is functional. This indicates that XLNR is also able to bind to the sequence GGCTAG, and the XLNR binding site consensus should therefore be changed to GGCTAR. Both CREA sites are functional, indicating that CREA has a strong influence on aguA expression. A detailed expression analysis of aguA in four genetic backgrounds revealed a second regulatory system involved in activation of aguA gene expression. This system responds to the presence of glucuronic and galacturonic acids, and is not dependent on XLNR.
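The revised consensus GGCTAR uses the IUPAC ambiguity code R, which stands for A or G, so it covers both GGCTAA and GGCTAG. A minimal sketch of scanning a sequence with such a degenerate consensus follows. The function names and the test sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus):
    """Compile a degenerate consensus such as GGCTAR into a regex."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def scan_consensus(sequence, consensus):
    """Return (1-based start, matched sequence) for each forward-strand hit."""
    pattern = iupac_to_regex(consensus)
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence.upper())]


# Hypothetical promoter fragment with one GGCTAA, one GGCTAG and one GGCTAC.
hits = scan_consensus("TTGGCTAATTGGCTAGTTGGCTAC", "GGCTAR")
```

A full promoter analysis would also scan the reverse complement, since binding sites can occur on either strand.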
Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules.

PubMed

Turatsinze, Jean-Valery; Thomas-Chollier, Morgane; Defrance, Matthieu; van Helden, Jacques

2008-01-01

This protocol shows how to detect putative cis-regulatory elements and regions enriched in such elements with the regulatory sequence analysis tools (RSAT) web server (http://rsat.ulb.ac.be/rsat/). The approach applies to known transcription factors, whose binding specificity is represented by position-specific scoring matrices, using the program matrix-scan. The detection of individual binding sites is known to return many false predictions. However, results can be strongly improved by estimating P values, and by searching for combinations of sites (homotypic and heterotypic models). We illustrate the detection of sites and enriched regions with a case study, the upstream sequence of the Drosophila melanogaster gene even-skipped. This protocol is also tested on random control sequences to evaluate the reliability of the predictions. Each task requires a few minutes of computation time on the server. The complete protocol can be executed in about one hour.
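Scanning a sequence with a position-specific scoring matrix can be sketched as follows. This is a generic log-odds scan, not the implementation of RSAT's matrix-scan, and it omits the P-value estimation the protocol recommends. The function names, the pseudocount scheme, the uniform background and the toy binding sites are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def build_pssm(sites, background=None, pseudocount=1.0):
    """Build a log2-odds matrix from aligned, equal-length binding sites."""
    background = background or {b: 0.25 for b in BASES}
    total = len(sites) + pseudocount
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pssm.append({
            # The pseudocount is distributed according to the background.
            b: math.log2(
                ((column.count(b) + pseudocount * background[b]) / total)
                / background[b]
            )
            for b in BASES
        })
    return pssm


def scan(sequence, pssm, threshold):
    """Return (0-based start, score) for windows scoring at or above threshold.

    The sequence must contain only A, C, G and T.
    """
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical aligned sites and target sequence.
matrix = build_pssm(["TATA", "TATA", "TACA"])
matches = scan("GGTATAGG", matrix, threshold=0.0)
```

The threshold here is an arbitrary score cutoff; choosing it from a P-value computed against a background model is what reduces the false predictions that the abstract warns about.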
Sequence signatures of allosteric proteins towards rational design.

PubMed

Namboodiri, Saritha; Verma, Chandra; Dhar, Pawan K; Giuliani, Alessandro; Nair, Achuthsankar S

2010-12-01

Allostery is the phenomenon of changes in the structure and activity of proteins that appear as a consequence of ligand binding at sites other than the active site. Studying the mechanistic basis of allostery, leading to protein design with predetermined functional endpoints, is an important unmet need of synthetic biology. Here, we screened the amino acid sequence landscape in search of sequence signatures of allostery using the Recurrence Quantitative Analysis (RQA) method. A characteristic vector comprising 10 features extracted from RQA was defined for amino acid sequences. Using Principal Component Analysis, four factors were found to be important determinants of allosteric behavior. Our sequence-based predictor method shows 82.6% accuracy, 85.7% sensitivity and 77.9% specificity with the current dataset. Further, we show that Laminarity-Mean-hydrophobicity, representing repeated hydrophobic patches, is the most crucial indicator of allostery. To the best of our knowledge, this is the first report that describes sequence determinants of allostery based on hydrophobicity. As an outcome of these findings, we plan to explore the possibility of inducing allostery in proteins.
The Complete Nucleotide Sequence of the Mitochondrial Genome of Bactrocera minax (Diptera: Tephritidae)

PubMed Central

Zhang, Bin; Nardi, Francesco; Hull-Sanders, Helen; Wan, Xuanwu; Liu, Yinghong

2014-01-01

The complete 16,043 bp mitochondrial genome (mitogenome) of Bactrocera minax (Diptera: Tephritidae) has been sequenced. The genome encodes the 37 genes usually found in insect mitogenomes. The mitogenome information for B. minax was compared to the homologous sequences of Bactrocera oleae, Bactrocera tryoni, Bactrocera philippinensis, Bactrocera carambolae, Bactrocera papayae, Bactrocera dorsalis, Bactrocera correcta, Bactrocera cucurbitae and Ceratitis capitata. The analysis indicated the structure and organization are typical of, and similar to, the nine closely related species mentioned above, although it contains the lowest genome-wide A+T content (67.3%). Four short intergenic spacers with a high degree of conservation among the nine tephritid species mentioned above and B. minax were observed, which also have clear counterparts in the control regions (CRs). Correlation analysis among these ten tephritid species revealed close positive correlations among the A+T content of zero-fold degenerate sites (P0FD), the ratio of nucleotide substitution frequency at P0FD sites to that at all degenerate sites (zero-fold, two-fold and four-fold degenerate sites), and amino acid sequence distance (ASD). Further, a significant positive correlation was observed between the A+T content of four-fold degenerate sites (P4FD) and the ratio of nucleotide substitution frequency at P4FD sites to that at all degenerate sites; however, we found significant negative correlations between ASD and both the A+T content of P4FD and the ratio of nucleotide substitution frequency at P4FD sites to that at all degenerate sites. A higher nucleotide substitution frequency at non-synonymous sites compared to synonymous sites was observed in nad4, the first time this has been observed in an insect mitogenome. A poly(T) stretch at the 5′ end of the CR followed by a [TA(A)]n-like stretch was also found. In addition, a highly conserved G+A-rich sequence block was observed in front of the poly(T) stretch among the ten tephritid species and two tandem repeats were present in the CR. PMID:24964138
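The A+T content figures in this abstract, both genome-wide and at four-fold degenerate sites, can be illustrated with a short sketch. It is not the authors' method. The function names and the toy coding sequence are hypothetical, and the default set of codon prefixes lists only the eight families that are four-fold degenerate in the standard genetic code; the set should be adapted to the genetic code in use (for example, AGN also encodes a single amino acid in the invertebrate mitochondrial code).

```python
# Codon prefixes whose third position is four-fold degenerate in the
# standard genetic code. Assumption: adapt this set for other codes.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def at_content(seq):
    """Fraction of A and T among the unambiguous bases of a sequence."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        return 0.0
    return sum(b in "AT" for b in counted) / len(counted)


def fourfold_third_positions(cds, prefixes=FOURFOLD_PREFIXES):
    """Collect third-position bases of codons in four-fold degenerate families.

    Assumes `cds` is an in-frame coding sequence starting at codon position 1.
    """
    cds = cds.upper()
    return "".join(
        cds[i + 2] for i in range(0, len(cds) - 2, 3) if cds[i:i + 2] in prefixes
    )


# Hypothetical coding fragment: codons CTA, GGT, ATG, ACC.
toy_cds = "CTAGGTATGACC"
p4fd_bases = fourfold_third_positions(toy_cds)
p4fd_at = at_content(p4fd_bases)
```

Computing the same quantity per species gives one value per genome, which is the kind of variable the correlation analysis in the abstract relates to substitution ratios and amino acid sequence distance.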
Comparative Genomics and Transcriptional Analysis of Prophages Identified in the Genomes of Lactobacillus gasseri, Lactobacillus salivarius, and Lactobacillus casei†

PubMed Central

Ventura, Marco; Canchaya, Carlos; Bernini, Valentina; Altermann, Eric; Barrangou, Rodolphe; McGrath, Stephen; Claesson, Marcus J.; Li, Yin; Leahy, Sinead; Walker, Carey D.; Zink, Ralf; Neviani, Erasmo; Steele, Jim; Broadbent, Jeff; Klaenhammer, Todd R.; Fitzgerald, Gerald F.; O'Toole, Paul W.; van Sinderen, Douwe

2006-01-01

Lactobacillus gasseri ATCC 33323, Lactobacillus salivarius subsp. salivarius UCC 118, and Lactobacillus casei ATCC 334 contain one (LgaI), four (Sal1, Sal2, Sal3, Sal4), and one (Lca1) distinguishable prophage sequences, respectively. Sequence analysis revealed that LgaI, Lca1, Sal1, and Sal2 prophages belong to the group of Sfi11-like pac site and cos site Siphoviridae, respectively. Phylogenetic investigation of these newly described prophage sequences revealed that they have not followed an evolutionary development similar to that of their bacterial hosts and that they show a high degree of diversity, even within a species. The attachment sites were determined for all these prophage elements; LgaI as well as Sal1 integrates in tRNA genes, while prophage Sal2 integrates in a predicted arginino-succinate lyase-encoding gene. In contrast, Lca1 and the Sal3 and Sal4 prophage remnants are integrated in noncoding regions in the L. casei ATCC 334 and L. salivarius UCC 118 genomes. Northern analysis showed that large parts of the prophage genomes are transcriptionally silent and that transcription is limited to genome segments located near the attachment site. Finally, pulsed-field gel electrophoresis followed by Southern blot hybridization with specific prophage probes indicates that these prophage sequences are narrowly distributed within lactobacilli. PMID:16672450
Phylogenetic and Functional Analysis of Metagenome Sequence from High-Temperature Archaeal Habitats Demonstrate Linkages between Metabolic Potential and Geochemistry

PubMed Central

Inskeep, William P.; Jay, Zackary J.; Herrgard, Markus J.; Kozubal, Mark A.; Rusch, Douglas B.; Tringe, Susannah G.; Macur, Richard E.; Jennings, Ryan deM.; Boyd, Eric S.; Spear, John R.; Roberto, Francisco F.

2013-01-01

Geothermal habitats in Yellowstone National Park (YNP) provide an unparalleled opportunity to understand the environmental factors that control the distribution of archaea in thermal habitats. Here we describe, analyze, and synthesize metagenomic and geochemical data collected from seven high-temperature sites that contain microbial communities dominated by archaea relative to bacteria. The specific objectives of the study were to use metagenome sequencing to determine the structure and functional capacity of thermophilic archaeal-dominated microbial communities across a pH range from 2.5 to 6.4 and to discuss specific examples where the metabolic potential correlated with measured environmental parameters and geochemical processes occurring in situ. Random shotgun metagenome sequence (∼40–45 Mb Sanger sequencing per site) was obtained from environmental DNA extracted from high-temperature sediments and/or microbial mats and subjected to numerous phylogenetic and functional analyses. Analysis of individual sequences (e.g., MEGAN and G + C content) and assemblies from each habitat type revealed the presence of dominant archaeal populations in all environments, 10 of whose genomes were largely reconstructed from the sequence data. Analysis of protein family occurrence, particularly of those involved in energy conservation, electron transport, and autotrophic metabolism, revealed significant differences in metabolic strategies across sites consistent with differences in major geochemical attributes (e.g., sulfide, oxygen, pH). These observations provide an ecological basis for understanding the distribution of indigenous archaeal lineages across high-temperature systems of YNP. PMID:23720654
In and out of the rRNA genes: characterization of Pokey elements in the sequenced Daphnia genome

PubMed Central

2013-01-01

Background: Only a few transposable elements are known to exhibit site-specific insertion patterns, including the well-studied R-element retrotransposons that insert into specific sites within the multigene rDNA. The only known rDNA-specific DNA transposon, Pokey (superfamily: piggyBac) is found in the freshwater microcrustacean, Daphnia pulex. Here, we present a genome-wide analysis of Pokey based on the recently completed whole genome sequencing project for D. pulex. Results: Phylogenetic analysis of Pokey elements recovered from the genome sequence revealed the presence of four lineages corresponding to two divergent autonomous families and two related lineages of non-autonomous miniature inverted repeat transposable elements (MITEs). The MITEs are also found at the same 28S rRNA gene insertion site as the Pokey elements, and appear to have arisen as deletion derivatives of autonomous elements. Several copies of the full-length Pokey elements may be capable of producing an active transposase. Surprisingly, both families of Pokey possess a series of 200 bp repeats upstream of the transposase that is derived from the rDNA intergenic spacer (IGS). The IGS sequences within the Pokey elements appear to be evolving in concert with the rDNA units. Finally, analysis of the insertion sites of Pokey elements outside of rDNA showed a target preference for sites similar to the specific sequence that is targeted within rDNA. Conclusions: Based on the target site preference of Pokey elements and the concerted evolution of a segment of the element with the rDNA unit, we propose an evolutionary path by which the ancestors of Pokey elements have invaded the rDNA niche. We discuss how specificity for the rDNA unit may have evolved and how this specificity has played a role in the long-term survival of these elements in the subgenus Daphnia. PMID:24059783
Application of the High Resolution Melting analysis for genetic mapping of Sequence Tagged Site markers in narrow-leafed lupin (Lupinus angustifolius L.).

PubMed

Kamel, Katarzyna A; Kroc, Magdalena; Święcicki, Wojciech

2015-01-01

Sequence tagged site (STS) markers are valuable tools for genetic and physical mapping that can be successfully used in comparative analyses among related species. Current challenges for molecular marker genotyping in plants include the lack of fast, sensitive and inexpensive methods suitable for sequence variant detection. In contrast, high resolution melting (HRM) is a simple and high-throughput assay, which has been widely applied in sequence polymorphism identification as well as in studies of genetic variability and genotyping. The present study is the first attempt to use HRM analysis to genotype STS markers in narrow-leafed lupin (Lupinus angustifolius L.). The sensitivity and utility of this method were confirmed by sequence polymorphism detection based on melting curve profiles in the parental genotypes and progeny of the narrow-leafed lupin mapping population. Application of different approaches, including amplicon size and a simulated heterozygote analysis, allowed successful genetic mapping of 16 new STS markers in the narrow-leafed lupin genome.
Comparative analysis and molecular characterization of a gene BANF1 encoded a DNA-binding protein during mitosis from the Giant Panda and Black Bear.

PubMed

Zeng, Yichun; Hou, Yi-Ling; Ding, Xiang; Hou, Wan-Ru; Li, Jian

2014-01-01

Barrier to autointegration factor 1 (BANF1) is a DNA-binding protein found in the nucleus and cytoplasm of eukaryotic cells that functions to establish nuclear architecture during mitosis. The cDNA and the genomic sequence of BANF1 were cloned from the Giant Panda (Ailuropoda melanoleuca) and Black Bear (Ursus thibetanus mupinensis) using RT-PCR technology and Touchdown-PCR, respectively. The cDNA of the BANF1 cloned from Giant Panda and Black Bear is 297 bp in size, containing an open reading frame of 270 bp encoding 89 amino acids. The length of the genomic sequence from Giant Panda is 521 bp, from Black Bear is 536 bp, which were found both to possess 2 exons. Alignment analysis indicated that the nucleotide sequence and the deduced amino acid sequence are highly conserved to some mammalian species studied. Topology prediction showed there is one Protein kinase C phosphorylation site, one Casein kinase II phosphorylation site, one Tyrosine kinase phosphorylation site, one N-myristoylation site, and one Amidation site in the BANF1 protein of the Giant Panda, and there is one Protein kinase C phosphorylation site, one Tyrosine kinase phosphorylation site, one N-myristoylation site, and one Amidation site in the BANF1 protein of the Black Bear. The BANF1 gene can be readily expressed in E. coli. Results showed that the protein BANF1 fusion with the N-terminally His-tagged form gave rise to the accumulation of an expected 14 kD polypeptide that formed inclusion bodies. The expression products obtained could be used to purify the proteins and study their function further.
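The relationship between the reported open reading frame length (270 bp) and protein length (89 amino acids) can be checked with simple codon arithmetic. The sketch below assumes the stated ORF length includes the stop codon; the function name is illustrative and not taken from the paper.

```python
def encoded_residues(orf_bp: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon is part of the ORF but does not encode an amino acid.
    return codons - 1 if includes_stop else codons


# 270 bp = 90 codons; one of them is the stop codon.
print(encoded_residues(270))
```

Under that assumption, the 270 bp ORF is consistent with the 89-residue protein reported in the abstract.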
VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites.

PubMed

Spinozzi, Giulio; Calabria, Andrea; Brasca, Stefano; Beretta, Stefano; Merelli, Ivan; Milanesi, Luciano; Montini, Eugenio

2017-11-25

Bioinformatics tools designed to identify lentiviral or retroviral vector insertion sites in the genome of host cells are used to address the safety and long-term efficacy of hematopoietic stem cell gene therapy applications and to study the clonal dynamics of hematopoietic reconstitution. The increasing number of gene therapy clinical trials combined with the increasing amount of Next Generation Sequencing data, aimed at identifying integration sites, require both highly accurate and efficient computational software able to correctly process "big data" in a reasonable computational time. Here we present VISPA2 (Vector Integration Site Parallel Analysis, version 2), the latest optimized computational pipeline for integration site identification and analysis with the following features: (1) the sequence analysis for the integration site processing is fully compliant with paired-end reads and includes a sequence quality filter before and after the alignment on the target genome; (2) an heuristic algorithm to reduce false positive integration sites at nucleotide level to reduce the impact of Polymerase Chain Reaction or trimming/alignment artifacts; (3) a classification and annotation module for integration sites; (4) a user friendly web interface as researcher front-end to perform integration site analyses without computational skills; (5) the time speedup of all steps through parallelization (Hadoop free). We tested VISPA2 performances using simulated and real datasets of lentiviral vector integration sites, previously obtained from patients enrolled in a hematopoietic stem cell gene therapy clinical trial and compared the results with other preexisting tools for integration site analysis. On the computational side, VISPA2 showed a > 6-fold speedup and improved precision and recall metrics (1 and 0.97 respectively) compared to previously developed computational pipelines. 
These performances indicate that VISPA2 is a fast, reliable and user-friendly tool for integration site analysis, which allows gene therapy integration data to be handled in a cost- and time-effective fashion. Moreover, the web access of VISPA2 ( http://openserver.itb.cnr.it/vispa/ ) makes a complex analytical tool accessible and easy to use for researchers. We released the source code of VISPA2 in a public repository ( https://bitbucket.org/andreacalabria/vispa2 ).
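Precision and recall, the two metrics quoted above, are ratios of true-positive, false-positive and false-negative counts. The following is a minimal sketch of how they are computed; the counts are hypothetical numbers chosen only to reproduce values of 1 and 0.97, and are not data from the VISPA2 evaluation.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from integration-site call counts.

    tp: called sites that are true integration sites
    fp: called sites that are not real (false positives)
    fn: true sites that were missed (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 97 of 100 simulated sites recovered, no false calls.
print(precision_recall(tp=97, fp=0, fn=3))
```

A precision of 1 means no spurious sites were reported, while a recall of 0.97 means 3% of the true sites were missed.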

Diversity and Characterization of Sulfate-Reducing Bacteria in Groundwater at a Uranium Mill Tailings Site

PubMed Central

Chang, Yun-Juan; Peacock, Aaron D.; Long, Philip E.; Stephen, John R.; McKinley, James P.; Macnaughton, Sarah J.; Hussain, A. K. M. Anwar; Saxton, Arnold M.; White, David C.

2001-01-01

Microbially mediated reduction and immobilization of U(VI) to U(IV) plays a role in both natural attenuation and accelerated bioremediation of uranium-contaminated sites. To realize bioremediation potential and accurately predict natural attenuation, it is important to first understand the microbial diversity of such sites. In this paper, the distribution of sulfate-reducing bacteria (SRB) in contaminated groundwater associated with a uranium mill tailings disposal site at Shiprock, N.Mex., was investigated. Two culture-independent analyses were employed: sequencing of clone libraries of PCR-amplified dissimilatory sulfite reductase (DSR) gene fragments and phospholipid fatty acid (PLFA) biomarker analysis. A remarkable diversity among the DSR sequences was revealed, including sequences from δ-Proteobacteria, gram-positive organisms, and the Nitrospira division. PLFA analysis detected at least 52 different mid-chain-branched saturate PLFA and included a high proportion of 10me16:0. Desulfotomaculum and Desulfotomaculum-like sequences were the most dominant DSR genes detected. Those belonging to SRB within δ-Proteobacteria were mainly recovered from low-uranium (≤302 ppb) samples. One Desulfotomaculum-like sequence cluster overwhelmingly dominated high-U (>1,500 ppb) sites. Logistic regression showed a significant influence of uranium concentration over the dominance of this cluster of sequences (P = 0.0001). This strong association indicates that Desulfotomaculum has remarkable tolerance and adaptation to high levels of uranium and suggests the organism's possible involvement in natural attenuation of uranium. The in situ activity level of Desulfotomaculum in uranium-contaminated environments and its comparison to the activities of other SRB and other functional groups should be an important area for future research. PMID:11425735
Tn5401, a new class II transposable element from Bacillus thuringiensis.

PubMed Central

Baum, J A

1994-01-01

A new class II (Tn3-like) transposable element, designated Tn5401, was recovered from a sporulation-deficient variant of Bacillus thuringiensis subsp. morrisoni EG2158 following its insertion into a recombinant plasmid. Sequence analysis of the insert revealed a 4,837-bp transposon with two large open reading frames, in the same orientation, encoding proteins of 36 kDa (306 residues) and 116 kDa (1,005 residues) and 53-bp terminal inverted repeats. The deduced amino acid sequence for the 36-kDa protein shows 24% sequence identity with the TnpI recombinase of the B. thuringiensis transposon Tn4430, a member of the phage integrase family of site-specific recombinases. The deduced amino acid sequence for the 116-kDa protein shows 42% sequence identity with the transposase of Tn3 but only 28% identity with the TnpA transposase of Tn4430. Two small open reading frames of unknown function, designated orf1 (85 residues) and orf2 (74 residues), were also identified. Southern blot analysis indicated that Tn5401, in contrast to Tn4430, is not commonly found among different subspecies of B. thuringiensis and is not typically associated with known insecticidal crystal protein genes. Transposition was studied with B. thuringiensis by using plasmid pEG922, a temperature-sensitive shuttle vector containing Tn5401. Tn5401 transposed to both chromosomal and plasmid target sites but displayed an apparent preference for plasmid sites. Transposition was replicative and resulted in the generation of a 5-bp duplication at the target site. Transcriptional start sites within Tn5401 were mapped by primer extension analysis. Two promoters, designated PL and PR, direct the transcription of orf1-orf2 and tnpI-tnpA, respectively, and are negatively regulated by TnpI. Sequence comparison of the promoter regions of Tn5401 and Tn4430 suggests that the conserved sequence element ATGTCCRCTAAY mediates TnpI binding and cointegrate resolution. 
The same element is contained within the 53-bp terminal inverted repeats, thus accounting for their unusual lengths and suggesting an additional role for TnpI in regulating Tn5401 transposition. PMID:7514590
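The conserved element ATGTCCRCTAAY is written with IUPAC ambiguity codes (R = A or G, Y = C or T). A minimal sketch of how such a degenerate motif can be located in a sequence is given below. The test sequence is invented for illustration, and the search covers the forward strand only.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate a degenerate IUPAC motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif: str, seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every occurrence, including overlaps."""
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Hypothetical sequence containing two variants of the element.
toy = "GGATGTCCGCTAACTTATGTCCACTAATCC"
print(find_motif("ATGTCCRCTAAY", toy))
```

To scan both strands, the same function could be applied to the reverse complement of the sequence.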
Identification of Genomic Insertion and Flanking Sequence of G2-EPSPS and GAT Transgenes in Soybean Using Whole Genome Sequencing Method.

PubMed

Guo, Bingfu; Guo, Yong; Hong, Huilong; Qiu, Li-Juan

2016-01-01

Molecular characterization of sequence flanking exogenous fragment insertion is essential for safety assessment and labeling of genetically modified organism (GMO). In this study, the T-DNA insertion sites and flanking sequences were identified in two newly developed transgenic glyphosate-tolerant soybeans GE-J16 and ZH10-6 based on whole genome sequencing (WGS) method. More than 22.4 Gb sequence data (∼21 × coverage) for each line was generated on Illumina HiSeq 2500 platform. The junction reads mapped to boundaries of T-DNA and flanking sequences in these two events were identified by comparing all sequencing reads with soybean reference genome and sequence of transgenic vector. The putative insertion loci and flanking sequences were further confirmed by PCR amplification, Sanger sequencing, and co-segregation analysis. All these analyses supported that exogenous T-DNA fragments were integrated in positions of Chr19: 50543767-50543792 and Chr17: 7980527-7980541 in these two transgenic lines. Identification of genomic insertion sites of G2-EPSPS and GAT transgenes will facilitate the utilization of their glyphosate-tolerant traits in soybean breeding program. These results also demonstrated that WGS was a cost-effective and rapid method for identifying sites of T-DNA insertions and flanking sequences in soybean.
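The sequencing depth quoted above follows from dividing the total sequenced bases by the genome size. The sketch below only rearranges the two figures given in the abstract (about 22.4 Gb and about 21× coverage) to show the genome size they imply. It is a consistency check under those approximate values, not a statement of the reference assembly size.

```python
def mean_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_genome_size(total_bases: float, coverage: float) -> float:
    """Genome size implied by a given data volume and depth."""
    return total_bases / coverage


total = 22.4e9  # bases per line, as reported (approximate)
size = implied_genome_size(total, 21)
print(f"implied genome size: {size / 1e9:.2f} Gb")
print(f"coverage at that size: {mean_coverage(total, size):.1f}x")
```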
Molecular Cloning and Sequence Analysis of a Phenylalanine Ammonia-Lyase Gene from Dendrobium

PubMed Central

Cai, Yongping; Lin, Yi

2013-01-01

In this study, a phenylalanine ammonia-lyase (PAL) gene was cloned from Dendrobium candidum using homology cloning and RACE. The PAL cDNA of D. candidum (designated Dc-PAL1, GenBank No. JQ765748) is 2,458 bp long and contains a complete open reading frame (ORF) of 2,142 bp, which encodes 713 amino acid residues. Multiple alignments indicate that the amino acid sequence of DcPAL1 has more than 80% sequence identity with the PAL genes of other plants. Dominant sites and catalytic active sites similar to those in the PAL proteins of Arabidopsis thaliana and Nicotiana tabacum are also found in DcPAL1. Phylogenetic tree analysis revealed that DcPAL is more closely related to PALs from Orchidaceae plants than to those of other plants. The differential expression patterns of PAL in protocorm-like body, leaf, stem, and root suggest that the PAL gene performs multiple physiological functions in Dendrobium candidum. PMID:23638048
Identification of succinimide sites in proteins by N-terminal sequence analysis after alkaline hydroxylamine cleavage.

PubMed Central

Kwong, M. Y.; Harris, R. J.

1994-01-01

Under favorable conditions, Asp or Asn residues can undergo rearrangement to a succinimide (cyclic imide), which may also serve as an intermediate for deamidation and/or isoaspartate formation. Direct identification of such succinimides by peptide mapping is hampered by their lability at neutral and alkaline pH. We determined that incubation in 2 M hydroxylamine, 0.2 M Tris buffer, pH 9, for 2 h at 45 degrees C will specifically cleave on the C-terminal side of succinimides without cleavage at Asn-Gly bonds; yields are typically approximately 50%. N-terminal sequence analysis can then be used to identify an internal sequence generated by cleavage of the succinimide, hence identifying the succinimide site. PMID:8142891
Structure of the human gene encoding the protein repair L-isoaspartyl (D-aspartyl) O-methyltransferase.

PubMed

DeVry, C G; Tsai, W; Clarke, S

1996-11-15

The protein L-isoaspartyl/D-aspartyl O-methyltransferase (EC 2.1.1.77) catalyzes the first step in the repair of proteins damaged in the aging process by isomerization or racemization reactions at aspartyl and asparaginyl residues. A single gene has been localized to human chromosome 6 and multiple transcripts arising through alternative splicing have been identified. Restriction enzyme mapping, subcloning, and DNA sequence analysis of three overlapping clones from a human genomic library in bacteriophage P1 indicate that the gene spans approximately 60 kb and is composed of 8 exons interrupted by 7 introns. Analysis of intron/exon splice junctions reveals that all of the donor and acceptor splice sites are in agreement with the mammalian consensus splicing sequence. Determination of transcription initiation sites by primer extension analysis of poly(A)+ mRNA from human brain identifies multiple start sites, with a major site 159 nucleotides upstream from the ATG start codon. Sequence analysis of the 5'-untranslated region demonstrates several potential cis-acting DNA elements including SP1, ETF, AP1, AP2, ARE, XRE, CREB, MED-1, and half-palindromic ERE motifs. The promoter of this methyltransferase gene lacks an identifiable TATA box but is characterized by a CpG island which begins approximately 723 nucleotides upstream of the major transcriptional start site and extends through exon 1 and into the first intron. These features are characteristic of housekeeping genes and are consistent with the wide tissue distribution observed for this methyltransferase activity.
Novel and canine genotypes of Giardia duodenalis in harbor seals (Phoca vitulina richardsi).

PubMed

Gaydos, J K; Miller, W A; Johnson, C; Zornetzer, H; Melli, A; Packham, A; Jeffries, S J; Lance, M M; Conrad, P A

2008-12-01

Feces of harbor seals (Phoca vitulina richardsi) and hybrid glaucous-winged/western gulls (Larus glaucescens / occidentalis) from Washington State's inland marine waters were examined for Giardia and Cryptosporidium spp. to determine if genotypes carried by these wildlife species were the same genotypes that commonly infect humans and domestic animals. Using immunomagnetic separation followed by direct fluorescent antibody detection, Giardia spp. cysts were detected in 42% of seal fecal samples (41/97). Giardia-positive samples came from 90% of the sites (9/10) and the prevalence of positive seal fecal samples differed significantly among study sites. Fecal samples collected from seal haulout sites with over 400 animals were 4.7 times more likely to have Giardia spp. cysts than samples collected at smaller haulout sites. In gulls, a single Giardia sp. cyst was detected in 4% of fecal samples (3/78). Cryptosporidium spp. oocysts were not detected in any of the seals or gulls tested. Sequence analysis of a 398 bp segment of G. duodenalis DNA at the glutamate dehydrogenase locus suggested that 11 isolates originating from seals throughout the region were a novel genotype and 3 isolates obtained from a single site in south Puget Sound were the G. duodenalis canine genotype D. Real-time TaqMan PCR amplification and subsequent sequencing of a 52 bp small subunit ribosomal DNA region from novel harbor seal genotype isolates showed sequence homology to canine genotypes C and D. Sequence analysis of the 52 bp small subunit ribosomal DNA products from the 3 canine genotype isolates from seals produced mixed sequences that could not be evaluated.
Signal peptide discrimination and cleavage site identification using SVM and NN.

PubMed

Kazemian, H B; Yusuf, S A; White, K

2014-02-01

About 15% of all proteins in a genome contain a signal peptide (SP) sequence, at the N-terminus, that targets the protein to intracellular secretory pathways. Once the protein is targeted correctly in the cell, the SP is cleaved, releasing the mature protein. Accurate prediction of the presence of these short amino-acid SP chains is crucial for modelling the topology of membrane proteins, since SP sequences can be confused with transmembrane domains due to similar composition of hydrophobic amino acids. This paper presents a cascaded Support Vector Machine (SVM)-Neural Network (NN) classification methodology for SP discrimination and cleavage site identification. The proposed method utilises a dual phase classification approach using SVM as a primary classifier to discriminate SP sequences from Non-SP. The methodology further employs NNs to predict the most suitable cleavage site candidates. In phase one, a SVM classification utilises hydrophobic propensities as a primary feature vector extraction using symmetric sliding window amino-acid sequence analysis for discrimination of SP and Non-SP. In phase two, a NN classification uses asymmetric sliding window sequence analysis for prediction of cleavage site identification. The proposed SVM-NN method was tested using Uni-Prot non-redundant datasets of eukaryotic and prokaryotic proteins with SP and Non-SP N-termini. Computer simulation results demonstrate an overall accuracy of 0.90 for SP and Non-SP discrimination based on Matthews Correlation Coefficient (MCC) tests using SVM. For SP cleavage site prediction, the overall accuracy is 91.5% based on cross-validation tests using the novel SVM-NN model. © 2013 Published by Elsevier Ltd.
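The first phase described above builds features from hydrophobic propensities in a symmetric sliding window. The sketch below illustrates that idea only. It assumes the Kyte-Doolittle hydropathy scale and a window truncated at the sequence ends, neither of which is stated in the abstract, and it stops before any SVM or NN training.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydrophobicity(seq: str, half_width: int = 3) -> list[float]:
    """Mean hydropathy in a symmetric window centred on each residue.

    The window spans half_width residues on each side and is truncated
    at the N- and C-terminal ends of the sequence.
    """
    scores = [KD[aa] for aa in seq.upper()]
    profile = []
    for i in range(len(scores)):
        lo = max(0, i - half_width)
        hi = min(len(scores), i + half_width + 1)
        profile.append(sum(scores[lo:hi]) / (hi - lo))
    return profile


# Hypothetical N-terminal fragment with a hydrophobic core.
print([round(x, 2) for x in window_hydrophobicity("MKKLLLLLAVAS", half_width=2)])
```

Each position yields one feature value, so a fixed-length N-terminal segment produces a fixed-length vector that a classifier could consume.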
Identification of a sequence element on the 3' side of AAUAAA which is necessary for simian virus 40 late mRNA 3'-end processing.

PubMed Central

Sadofsky, M; Connelly, S; Manley, J L; Alwine, J C

1985-01-01

Our previous studies of the 3'-end processing of simian virus 40 late mRNAs indicated the existence of an essential element (or elements) downstream of the AAUAAA signal. We report here the use of transient expression analysis to study a functional element which we located within the sequence AGGUUUUUU, beginning 59 nucleotides downstream of the recognized signal AAUAAA. Deletion of this element resulted in (i) at least a 75% drop in 3'-end processing at the normal site and (ii) appearance of readthrough transcripts with alternate 3' ends. Some flexibility in the downstream position of this element relative to the AAUAAA was noted by deletion analysis. Using computer sequence comparison, we located homologous regions within downstream sequences of other genes, suggesting a generalized sequence element. In addition, specific complementarity is noted between the downstream element and U4 RNA. The possibility that this complementarity could participate in 3'-end site selection is discussed. PMID:3016512
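A computer sequence comparison of the kind mentioned above can be approximated by scanning for the AAUAAA signal and then for the downstream element within a limited distance. The sketch below makes several assumptions: exact string matching, a gap measured from the end of AAUAAA to the start of the element, and an arbitrary 100-nucleotide search limit. The test sequence is synthetic.

```python
def downstream_elements(rna: str,
                        signal: str = "AAUAAA",
                        element: str = "AGGUUUUUU",
                        max_gap: int = 100) -> list[tuple[int, int, int]]:
    """Find (signal_start, element_start, gap) for each signal followed by the element.

    gap is the number of nucleotides between the end of the signal and the
    start of the element (an assumed convention). Positions are 0-based.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    hits = []
    start = rna.find(signal)
    while start != -1:
        signal_end = start + len(signal)
        pos = rna.find(element, signal_end)
        if pos != -1 and pos - signal_end <= max_gap:
            hits.append((start, pos, pos - signal_end))
        start = rna.find(signal, start + 1)
    return hits


# Synthetic transcript: signal, a 59-nt spacer, then the downstream element.
toy = "CC" + "AAUAAA" + "G" * 59 + "AGGUUUUUU" + "CC"
print(downstream_elements(toy))
```

Because the abstract notes some flexibility in the position of the element, a distance window is used here instead of a fixed offset.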
Genetic diversity of the captive Asian tapir population in Thailand, based on mitochondrial control region sequence data and the comparison of its nucleotide structure with Brazilian tapir.

PubMed

Muangkram, Yuttamol; Amano, Akira; Wajjwalku, Worawidh; Pinyopummintr, Tanu; Thongtip, Nikorn; Kaolim, Nongnid; Sukmak, Manakorn; Kamolnorranath, Sumate; Siriaroonrat, Boripat; Tipkantha, Wanlaya; Maikaew, Umaporn; Thomas, Warisara; Polsrila, Kanda; Dongsaard, Kwanreaun; Sanannu, Saowaphang; Wattananorrasate, Anuwat

2017-07-01

The Asian tapir (Tapirus indicus) has been classified as Endangered on the IUCN Red List of Threatened Species (2008). Genetic diversity data provide important information for the management of captive breeding and conservation of this species. We analyzed mitochondrial control region (CR) sequences from 37 captive Asian tapirs in Thailand. Multiple alignments of the full-length CR sequences sized 1268 bp comprised three domains as described in other mammal species. Analysis of 16 parsimony-informative variable sites revealed 11 haplotypes. Furthermore, the phylogenetic analysis using median-joining network clearly showed three clades correlated with our earlier cytochrome b gene study in this endangered species. The repetitive motif is located between first and second conserved sequence blocks, similar to the Brazilian tapir. The highest polymorphic site was located in the extended termination associated sequences domain. The results could be applied for future genetic management based in captivity and wild that shows stable populations.
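Counting haplotypes and parsimony-informative sites, as reported above, can be illustrated on a small alignment. The sketch below uses an invented five-sequence alignment and ignores gaps and ambiguous bases. A site is treated as parsimony-informative when at least two different bases each occur in at least two sequences.

```python
from collections import Counter


def variable_sites(aln: list[str]) -> list[int]:
    """Columns (0-based) where more than one base is observed."""
    return [i for i in range(len(aln[0])) if len({s[i] for s in aln}) > 1]


def parsimony_informative_sites(aln: list[str]) -> list[int]:
    """Columns where at least two bases each occur in two or more sequences."""
    sites = []
    for i in range(len(aln[0])):
        counts = Counter(s[i] for s in aln)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append(i)
    return sites


def haplotypes(aln: list[str]) -> Counter:
    """Distinct sequences and how many individuals carry each."""
    return Counter(aln)


# Invented alignment, not tapir data.
aln = ["ACGTAC", "ACGTAC", "ATGTAC", "ATGTAA", "ACGTAC"]
print(variable_sites(aln))               # columns that vary
print(parsimony_informative_sites(aln))  # subset that is informative
print(len(haplotypes(aln)))              # number of haplotypes
```

In this toy case the last column varies only through a singleton, so it is variable but not parsimony-informative.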
Mapping Argonaute and conventional RNA-binding protein interactions with RNA at single-nucleotide resolution using HITS-CLIP and CIMS analysis

PubMed Central

Moore, Michael; Zhang, Chaolin; Gantman, Emily Conn; Mele, Aldo; Darnell, Jennifer C.; Darnell, Robert B.

2014-01-01

Summary Identifying sites where RNA binding proteins (RNABPs) interact with target RNAs opens the door to understanding the vast complexity of RNA regulation. UV-crosslinking and immunoprecipitation (CLIP) is a transformative technology in which RNAs purified from in vivo cross-linked RNA-protein complexes are sequenced to reveal footprints of RNABP:RNA contacts. CLIP combined with high throughput sequencing (HITS-CLIP) is a generalizable strategy to produce transcriptome-wide RNA binding maps with higher accuracy and resolution than standard RNA immunoprecipitation (RIP) profiling or purely computational approaches. Applying CLIP to Argonaute proteins has expanded the utility of this approach to mapping binding sites for microRNAs and other small regulatory RNAs. Finally, recent advances in data analysis take advantage of crosslinked-induced mutation sites (CIMS) to refine RNA-binding maps to single-nucleotide resolution. Once IP conditions are established, HITS-CLIP takes approximately eight days to prepare RNA for sequencing. Established pipelines for data analysis, including for CIMS, take 3-4 days. PMID:24407355
Cytogenetic Analysis of Populus trichocarpa - Ribosomal DNA, Telomere Repeat Sequence, and Marker-selected BACs

Treesearch

M.N. Islam-Faridi; C.D. Nelson; S.P. DiFazio; L.E. Gunter; G.A. Tuskan

2009-01-01

The 18S-28S rDNA and 5S rDNA loci in Populus trichocarpa were localized using fluorescent in situ hybridization (FISH). Two 18S-28S rDNA sites and one 5S rDNA site were identified and located at the ends of 3 different chromosomes. FISH signals from the Arabidopsis-type telomere repeat sequence were observed at the distal ends of each chromosome. Six BAC clones...
Structural analysis of DNA binding by C.Csp231I, a member of a novel class of R-M controller proteins regulating gene expression

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shevtsov, M. B.; Streeter, S. D.; Thresh, S.-J.

2015-02-01

The structure of the new class of controller proteins (exemplified by C.Csp231I) in complex with its 21 bp DNA-recognition sequence is presented, and the molecular basis of sequence recognition in this class of proteins is discussed. An unusual extended spacer between the dimer binding sites suggests a novel interaction between the two C-protein dimers. In a wide variety of bacterial restriction–modification systems, a regulatory ‘controller’ protein (or C-protein) is required for effective transcription of its own gene and for transcription of the endonuclease gene found on the same operon. We have recently turned our attention to a new class of controller proteins (exemplified by C.Csp231I) that have quite novel features, including a much larger DNA-binding site with an 18 bp (∼60 Å) spacer between the two palindromic DNA-binding sequences and a very different recognition sequence from the canonical GACT/AGTC. Using X-ray crystallography, the structure of the protein in complex with its 21 bp DNA-recognition sequence was solved to 1.8 Å resolution, and the molecular basis of sequence recognition in this class of proteins was elucidated. An unusual aspect of the promoter sequence is the extended spacer between the dimer binding sites, suggesting a novel interaction between the two C-protein dimers when bound to both recognition sites correctly spaced on the DNA. A U-bend model is proposed for this tetrameric complex, based on the results of gel-mobility assays, hydrodynamic analysis and the observation of key contacts at the interface between dimers in the crystal.
Fold independent structural comparisons of protein-ligand binding sites for exploring functional relationships.

PubMed

Gold, Nicola D; Jackson, Richard M

2006-02-03

The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that given a known protein-ligand binding site allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structure and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition including structure-based ligand design and in understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.
Pattern similarity study of functional sites in protein sequences: lysozymes and cystatins

PubMed Central

Nakai, Shuryo; Li-Chan, Eunice CY; Dou, Jinglie

2005-01-01

Background Although it is generally agreed that topography is more conserved than sequences, proteins sharing the same fold can have different functions, while there are protein families with low sequence similarity. An alternative method for profile analysis of characteristic conserved positions of the motifs within the 3D structures may be needed for functional annotation of protein sequences. Using the approach of quantitative structure-activity relationships (QSAR), we have proposed a new algorithm for postulating functional mechanisms on the basis of pattern similarity and average of property values of side-chains in segments within sequences. This approach was used to search for functional sites of proteins belonging to the lysozyme and cystatin families. Results Hydrophobicity and β-turn propensity of reference segments with 3–7 residues were used for the homology similarity search (HSS) for active sites. Hydrogen bonding was used as the side-chain property for searching the binding sites of lysozymes. The profiles of similarity constants and average values of these parameters as functions of their positions in the sequences could identify both active and substrate binding sites of the lysozyme of Streptomyces coelicolor, which has been reported as a new fold enzyme (Cellosyl). The same approach was successfully applied to cystatins, especially for postulating the mechanisms of amyloidosis of human cystatin C as well as human lysozyme. Conclusion Pattern similarity and average index values of structure-related properties of side chains in short segments of three residues or longer were, for the first time, successfully applied for predicting functional sites in sequences. This new approach may be applicable to studying functional sites in un-annotated proteins, for which complete 3D structures are not yet available. PMID:15904486
Analysis of Facultative Lithotroph Distribution and Diversity on Volcanic Deposits by Use of the Large Subunit of Ribulose 1,5-Bisphosphate Carboxylase/Oxygenase

PubMed Central

Nanba, K.; King, G. M.; Dunfield, K.

2004-01-01

A 492- to 495-bp fragment of the gene coding for the large subunit of the form I ribulose 1,5-bisphosphate carboxylase/oxygenase (RubisCO) (rbcL) was amplified by PCR from facultatively lithotrophic aerobic CO-oxidizing bacteria, colorless and purple sulfide-oxidizing microbial mats, and genomic DNA extracts from tephra and ash deposits from Kilauea volcano, for which atmospheric CO and hydrogen have been previously documented as important substrates. PCR products from the mats and volcanic sites were used to construct rbcL clone libraries. Phylogenetic analyses showed that the rbcL sequences from all isolates clustered with form IC rbcL sequences derived from facultative lithotrophs. In contrast, the microbial mat clone sequences clustered with sequences from obligate lithotrophs representative of form IA rbcL. Clone sequences from volcanic sites fell within the form IC clade, suggesting that these sites were dominated by facultative lithotrophs, an observation consistent with biogeochemical patterns at the sites. Based on phylogenetic and statistical analyses, clone libraries differed significantly among volcanic sites, indicating that they support distinct lithotrophic assemblages. Although some of the clone sequences were similar to known rbcL sequences, most were novel. Based on nucleotide diversity and average pairwise difference, a forested site and an 1894 lava flow were found to support the most diverse and least diverse lithotrophic populations, respectively. These indices of diversity were not correlated with rates of atmospheric CO and hydrogen uptake but were correlated with estimates of respiration and microbial biomass. PMID:15066819
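The two diversity indices named above are closely related: the average pairwise difference is the mean number of mismatches over all sequence pairs, and nucleotide diversity is that value divided by the alignment length. The sketch below shows the calculation on an invented alignment of equal-length sequences. It ignores gaps and applies no sample-size or substitution-model correction, so it should not be read as the authors' exact procedure.

```python
from itertools import combinations


def pairwise_differences(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def diversity(aln: list[str]) -> tuple[float, float]:
    """Return (average pairwise difference, nucleotide diversity per site)."""
    pairs = list(combinations(aln, 2))
    mean_diff = sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
    return mean_diff, mean_diff / len(aln[0])


# Invented clone sequences, not rbcL data.
clones = ["ACGT", "ACGA", "TCGA"]
k, pi = diversity(clones)
print(f"average pairwise difference = {k:.3f}, nucleotide diversity = {pi:.3f}")
```

Comparing these values between clone libraries is one way to rank sites from most to least diverse.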
Analysis of facultative lithotroph distribution and diversity on volcanic deposits by use of the large subunit of ribulose 1,5-bisphosphate carboxylase/oxygenase.

PubMed

Nanba, K; King, G M; Dunfield, K

2004-04-01

A 492- to 495-bp fragment of the gene coding for the large subunit of the form I ribulose 1,5-bisphosphate carboxylase/oxygenase (RubisCO) (rbcL) was amplified by PCR from facultatively lithotrophic aerobic CO-oxidizing bacteria, colorless and purple sulfide-oxidizing microbial mats, and genomic DNA extracts from tephra and ash deposits from Kilauea volcano, for which atmospheric CO and hydrogen have been previously documented as important substrates. PCR products from the mats and volcanic sites were used to construct rbcL clone libraries. Phylogenetic analyses showed that the rbcL sequences from all isolates clustered with form IC rbcL sequences derived from facultative lithotrophs. In contrast, the microbial mat clone sequences clustered with sequences from obligate lithotrophs representative of form IA rbcL. Clone sequences from volcanic sites fell within the form IC clade, suggesting that these sites were dominated by facultative lithotrophs, an observation consistent with biogeochemical patterns at the sites. Based on phylogenetic and statistical analyses, clone libraries differed significantly among volcanic sites, indicating that they support distinct lithotrophic assemblages. Although some of the clone sequences were similar to known rbcL sequences, most were novel. Based on nucleotide diversity and average pairwise difference, a forested site and an 1894 lava flow were found to support the most diverse and least diverse lithotrophic populations, respectively. These indices of diversity were not correlated with rates of atmospheric CO and hydrogen uptake but were correlated with estimates of respiration and microbial biomass.
Deep Sequencing of Random Mutant Libraries Reveals the Active Site of the Narrow Specificity CphA Metallo-β-Lactamase is Fragile to Mutations.

PubMed

Sun, Zhizeng; Mehta, Shrenik C; Adamski, Carolyn J; Gibbs, Richard A; Palzkill, Timothy

2016-09-12

CphA is a Zn(2+)-dependent metallo-β-lactamase that efficiently hydrolyzes only carbapenem antibiotics. To understand the sequence requirements for CphA function, single codon random mutant libraries were constructed for residues in and near the active site and mutants were selected for E. coli growth on increasing concentrations of imipenem, a carbapenem antibiotic. At high concentrations of imipenem that select for phenotypically wild-type mutants, the active-site residues exhibit stringent sequence requirements in that nearly all residues in positions that contact zinc, the substrate, or the catalytic water do not tolerate amino acid substitutions. In addition, at high imipenem concentrations a number of residues that do not directly contact zinc or substrate are also essential and do not tolerate substitutions. Biochemical analysis confirmed that amino acid substitutions at essential positions decreased the stability or catalytic activity of the CphA enzyme. Therefore, the CphA active site is fragile to substitutions, suggesting active-site residues are optimized for imipenem hydrolysis. These results also suggest that resistance to inhibitors targeted to the CphA active site would be slow to develop because of the strong sequence constraints on function.
msgbsR: An R package for analysing methylation-sensitive restriction enzyme sequencing data.

PubMed

Mayne, Benjamin T; Leemaqz, Shalem Y; Buckberry, Sam; Rodriguez Lopez, Carlos M; Roberts, Claire T; Bianco-Miotto, Tina; Breen, James

2018-02-01

Genotyping-by-sequencing (GBS) or restriction-site associated DNA marker sequencing (RAD-seq) is a practical and cost-effective method for analysing large genomes from high diversity species. This method of sequencing, coupled with methylation-sensitive enzymes (often referred to as methylation-sensitive restriction enzyme sequencing or MRE-seq), is an effective tool to study DNA methylation in parts of the genome that are inaccessible in other sequencing techniques or are not annotated in microarray technologies. Current software tools do not fulfil all methylation-sensitive restriction sequencing assays for determining differences in DNA methylation between samples. To fill this computational need, we present msgbsR, an R package that contains tools for the analysis of methylation-sensitive restriction enzyme sequencing experiments. msgbsR can be used to identify and quantify read counts at methylated sites directly from alignment files (BAM files) and enables verification of restriction enzyme cut sites with the correct recognition sequence of the individual enzyme. In addition, msgbsR assesses DNA methylation based on read coverage, similar to RNA sequencing experiments, rather than methylation proportion and is a useful tool in analysing differential methylation on large populations. The package is fully documented and available freely online as a Bioconductor package ( https://bioconductor.org/packages/release/bioc/html/msgbsR.html ).
Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

PubMed

Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

2001-08-15

This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
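The idea of ordered dipeptide frequency and bias can be sketched as follows. This is not AAPAIR.TAB itself, whose internals the abstract does not describe. It is a minimal illustration that assumes "bias" means the observed dipeptide frequency divided by the frequency expected from the single-residue composition. All names and the toy sequences are hypothetical.

```python
from collections import Counter


def dipeptide_counts(seqs: list[str]) -> Counter:
    """Count ordered dipeptides (overlapping residue pairs) across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - 1):
            counts[s[i:i + 2]] += 1
    return counts


def dipeptide_bias(seqs: list[str]) -> dict[str, float]:
    """Observed dipeptide frequency divided by the product of residue frequencies.

    Assumption: this observed/expected ratio stands in for the paper's bias measure.
    """
    pair_counts = dipeptide_counts(seqs)
    residue_counts = Counter("".join(seqs))
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    bias = {}
    for pair, n in pair_counts.items():
        expected = (residue_counts[pair[0]] / total_residues) * (
            residue_counts[pair[1]] / total_residues
        )
        bias[pair] = (n / total_pairs) / expected
    return bias


# Hypothetical toy motif fragments (not sequences from the study).
toy_motifs = ["CGYC", "CGFC"]
counts = dipeptide_counts(toy_motifs)
bias = dipeptide_bias(toy_motifs)
print(counts.most_common(3))
print(bias["GY"], bias["GF"])
```

Because the dipeptides are ordered, "GY" and "YG" are counted separately, which is what allows a preference for Gly-Tyr over Tyr-Gly to show up.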

Genomic sequences of murine gamma B- and gamma C-crystallin-encoding genes: promoter analysis and complete evolutionary pattern of mouse, rat and human gamma-crystallins.

PubMed

Graw, J; Liebstein, A; Pietrowski, D; Schmitt-John, T; Werner, T

1993-12-22

The murine genes, gamma B-cry and gamma C-cry, encoding the gamma B- and gamma C-crystallins, were isolated from a genomic DNA library. The complete nucleotide (nt) sequences of both genes were determined from 661 and 711 bp, respectively, upstream from the first exon to the corresponding polyadenylation sites, comprising more than 2650 and 2890 bp, respectively. The new sequences were compared to the partial cDNA sequences available for the murine gamma B-cry and gamma C-cry, as well as to the corresponding genomic sequences from rat and man, at both the nt and predicted amino acid (aa) sequence levels. In the gamma B-cry promoter region, a canonical CCAAT-box, a TATA-box, putative NF-I and C/EBP sites were detected. An R-repeat is inserted 366 bp upstream from the transcription start point. In contrast, the gamma C-cry promoter does not contain a CCAAT-box, but some other putative binding sites for transcription factors (AP-2, UBP-1, LBP-1) were located by computer analysis. The promoter regions of all six gamma-cry from mouse, rat and human, except human psi gamma F-cry, were analyzed for common sequence elements. A complex sequence element of about 70-80 bp was found in the proximal promoter, which contains a gamma-cry-specific and almost invariant sequence (crygpel) of 14 nt, and ends with the also invariant TATA-box. Within the complex sequence element, a minimum of three further features specific for the gamma A-, gamma B- and gamma D/E/F-cry genes can be defined, at least two of which were recently shown to be functional. In addition to these four sequence elements, a subtype-specific structure of inverted repeats with different-sized spacers can be deduced from the multiple sequence alignment. A phylogenetic analysis based on the promoter region, as well as the complete exon 3 of all gamma-cry from mouse, rat and man, suggests separation of only five gamma-cry subtypes (gamma A-, gamma B-, gamma C-, gamma D- and gamma E/F-cry) prior to species separation.
Architecture of a Fur Binding Site: a Comparative Analysis

PubMed Central

Lavrrar, Jennifer L.; McIntosh, Mark A.

2003-01-01

Fur is an iron-binding transcriptional repressor that recognizes a 19-bp consensus site of the sequence 5′-GATAATGATAATCATTATC-3′. This site can be defined as three adjacent hexamers of the sequence 5′-GATAAT-3′, with the third being slightly imperfect (an F-F-F configuration), or as two hexamers in the forward orientation separated by one base pair from a third hexamer in the reverse orientation (an F-F-x-R configuration). Although Fur can bind synthetic DNA sequences containing the F-F-F arrangement, most natural binding sites are variations of the F-F-x-R arrangement. The studies presented here compared the ability of Fur to recognize synthetic DNA sequences containing two to four adjacent hexamers with binding to sequences containing variations of the F-F-x-R arrangement (including natural operator sequences from the entS and fepB promoter regions of Escherichia coli). Gel retardation assays showed that the F-F-x-R architecture was necessary for high-affinity Fur-DNA interactions and that contiguous hexamers were not recognized as effectively. In addition, the stoichiometry of Fur at each binding site was determined, showing that Fur interacted with its minimal 19-bp binding site as two overlapping dimers. These data confirm the proposed overlapping-dimer binding model, where the unit of interaction with a single Fur dimer is two inverted hexamers separated by a C:G base pair, with two overlapping units comprising the 19-bp consensus binding site required for the high-affinity interaction with two Fur dimers. PMID:12644489
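The F-F-x-R reading of the 19-bp consensus can be checked directly from the sequence given in the abstract. The sketch below splits the site into two forward hexamers, a single spacer base, and a final hexamer, then tests whether the final hexamer is the reverse complement of 5′-GATAAT-3′. The function names are illustrative, not taken from the paper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

FUR_CONSENSUS = "GATAATGATAATCATTATC"  # 19-bp consensus quoted in the abstract
HEXAMER = "GATAAT"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_ffxr(site: str) -> tuple[str, str, str, str]:
    """Split a 19-bp site into hexamer, hexamer, spacer base, hexamer."""
    if len(site) != 19:
        raise ValueError("expected a 19-bp site")
    return site[0:6], site[6:12], site[12], site[13:19]


f1, f2, spacer, r = split_ffxr(FUR_CONSENSUS)
# True when the site is forward, forward, spacer, reverse-oriented hexamer.
is_ffxr = f1 == HEXAMER and f2 == HEXAMER and reverse_complement(r) == HEXAMER
print(f1, f2, spacer, r, is_ffxr)
```

The same split could be applied to a candidate operator sequence to count how many of the three hexamer positions match the consensus in the expected orientation.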
Sequencing of the amylopullulanase (apu) gene of Thermoanaerobacter ethanolicus 39E, and identification of the active site by site-directed mutagenesis.

PubMed

Mathupala, S P; Lowe, S E; Podkovyrov, S M; Zeikus, J G

1993-08-05

The complete nucleotide sequence of the gene encoding the dual active amylopullulanase of Thermoanaerobacter ethanolicus 39E (formerly Clostridium thermohydrosulfuricum) was determined. The structural gene (apu) contained a single open reading frame 4443 base pairs in length, corresponding to 1481 amino acids, with an estimated molecular weight of 162,780. Analysis of the deduced sequence of apu with sequences of alpha-amylases and alpha-1,6 debranching enzymes enabled the identification of four conserved regions putatively involved in substrate binding and in catalysis. The conserved regions were localized within a 2.9-kilobase pair gene fragment, which encoded a M(r) 100,000 protein that maintained the dual activities and thermostability of the native enzyme. The catalytic residues of amylopullulanase were tentatively identified by using hydrophobic cluster analysis for comparison of amino acid sequences of amylopullulanase and other amylolytic enzymes. Asp597, Glu626, and Asp703 were individually modified to their respective amide form, or the alternate acid form, and in all cases both alpha-amylase and pullulanase activities were lost, suggesting the possible involvement of 3 residues in a catalytic triad, and the presence of a putative single catalytic site within the enzyme. These findings substantiate amylopullulanase as a new type of amylosaccharidase.
Cloning and sequence analysis of a cDNA encoding the alpha-subunit of mouse beta-N-acetylhexosaminidase and comparison with the human enzyme.

PubMed Central

Beccari, T; Hoade, J; Orlacchio, A; Stirling, J L

1992-01-01

cDNAs encoding the mouse beta-N-acetylhexosaminidase alpha-subunit were isolated from a mouse testis library. The longest of these (1.7 kb) was sequenced and showed 83% similarity with the human alpha-subunit cDNA sequence. The 5' end of the coding sequence was obtained from a genomic DNA clone. Alignment of the human and mouse sequences showed that all three putative N-glycosylation sites are conserved, but that the mouse alpha-subunit has an additional site towards the C-terminus. All eight cysteines in the human sequence are conserved in the mouse. There are an additional two cysteines in the mouse alpha-subunit signal peptide. All amino acids affected in Tay-Sachs-disease mutations are conserved in the mouse. PMID:1379046
Direct repeat sequences in the Streptomyces chitinase-63 promoter direct both glucose repression and chitin induction

PubMed Central

Ni, Xiangyang; Westpheling, Janet

1997-01-01

The chi63 promoter directs glucose-sensitive, chitin-dependent transcription of a gene involved in the utilization of chitin as carbon source. Analysis of 5′ and 3′ deletions of the promoter region revealed that a 350-bp segment is sufficient for wild-type levels of expression and regulation. The analysis of single base changes throughout the promoter region, introduced by random and site-directed mutagenesis, identified several sequences to be important for activity and regulation. Single base changes at −10, −12, −32, −33, −35, and −37 upstream of the transcription start site resulted in loss of activity from the promoter, suggesting that bases in these positions are important for RNA polymerase interaction. The sequences centered around −10 (TATTCT) and −35 (TTGACC) in this promoter are, in fact, prototypical of eubacterial promoters. Overlapping the RNA polymerase binding site is a perfect 12-bp direct repeat sequence. Some base changes within this direct repeat resulted in constitutive expression, suggesting that this sequence is an operator for negative regulation. Other base changes resulted in loss of glucose repression while retaining the requirement for chitin induction, suggesting that this sequence is also involved in glucose repression. The fact that cis-acting mutations resulted in glucose resistance but not inducer independence rules out the possibility that glucose repression acts exclusively by inducer exclusion. The fact that mutations that affect glucose repression and chitin induction fall within the same direct repeat sequence module suggests that the direct repeat sequence facilitates both chitin induction and glucose repression. PMID:9371809
[Study on ITS sequences of Aconitum vilmorinianum and its medicinal adulterant].

PubMed

Zhang, Xiao-nan; Du, Chun-hua; Fu, De-huan; Gao, Li; Zhou, Pei-jun; Wang, Li

2012-09-01

The objective was to analyze and compare the ITS sequences of Aconitum vilmorinianum and its medicinal adulterant Aconitum austroyunnanense. Total genomic DNA was extracted from sample materials by an improved CTAB method; the ITS sequences were amplified by PCR, directly sequenced, and analyzed using the software DNAStar, ClustalX 1.81 and MEGA 4.0. In the ITS1 sequences, 299 consistent sites, 19 variable sites and 13 informative sites were found; in the 5.8S sequences, 162 consistent sites, 2 variable sites and 1 informative site; and in the ITS2 sequences, 217 consistent sites, 3 variable sites and 1 informative site. Comparison of the ITS sequence data matrix showed that the 5.8S sequences were the only region without base transitions or transversions; 2 transition sites and 1 transversion site were found in the ITS1 sequences, and only 1 transversion site was found in the ITS2 sequences. By analyzing the ITS sequence data matrix from 2 populations of Aconitum vilmorinianum and 3 populations of Aconitum austroyunnanense, we found a stable informative site at the 596th base, in the ITS2 sequences: in all samples of Aconitum vilmorinianum the base was C, and in all samples of Aconitum austroyunnanense the base was A. Aconitum vilmorinianum and Aconitum austroyunnanense can be identified by the characters of their ITS sequences, and the ITS1 sequences contain more variable sites than the ITS2 sequences.
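Counting consistent, variable and informative sites in an alignment can be sketched as below. The sketch assumes the usual parsimony definition of an informative site (at least two different characters, each present in at least two sequences) and treats informative sites as a subset of variable sites; the abstract does not state the exact definitions used. Gap characters are treated like any other symbol here, and the toy alignment is hypothetical.

```python
from collections import Counter


def classify_columns(alignment: list[str]) -> dict[str, int]:
    """Count conserved, variable and parsimony-informative alignment columns."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    conserved = variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(column)
        if len(counts) == 1:
            conserved += 1
            continue
        variable += 1
        # Informative: at least two characters that each occur twice or more.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return {"conserved": conserved, "variable": variable, "informative": informative}


# Hypothetical toy alignment (not data from the study).
toy_alignment = ["ACGTC", "ACGTC", "ACTTA", "ACTTC"]
site_counts = classify_columns(toy_alignment)
print(site_counts)
```

A column in which every sample of one species carries one base and every sample of the other species carries a different base, such as the C/A site reported above, would be counted as informative by this rule.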
[Sequencing and analysis of the complete genome of a rabies virus isolate from Sika deer].

PubMed

Zhao, Yun-Jiao; Guo, Li; Huang, Ying; Zhang, Li-Shi; Qian, Ai-Dong

2008-05-01

One DRV strain was isolated from Sika deer brain and sequenced. Nine overlapping gene fragments were amplified by RT-PCR with 3'-RACE and 5'-RACE, and the complete DRV genome sequence was assembled. The complete genome is 11,863 bp long. The DRV genome organization was similar to that of other rabies viruses: it was composed of five genes, and the initiation and termination sites were highly conserved. There were mutated amino acids in important antigenic sites of the nucleoprotein and glycoprotein. The nucleotide and amino acid homologies of the N, P, M, G and L genes were compared among strains with completely sequenced genomes. A phylogenetic tree was constructed by comparison with the N gene sequences of other typical rabies viruses. These results indicated that DRV belonged to genotype 1. The highest homology, 94%, was with the Chinese vaccine strain 3aG, and the lowest, 71%, was with WCBV. These findings provide a theoretical reference for further research on rabies virus.
In vivo binding of PRDM9 reveals interactions with noncanonical genomic sites

PubMed Central

Grey, Corinne; Clément, Julie A.J.; Buard, Jérôme; Leblanc, Benjamin; Gut, Ivo; Gut, Marta; Duret, Laurent

2017-01-01

In mouse and human meiosis, DNA double-strand breaks (DSBs) initiate homologous recombination and occur at specific sites called hotspots. The localization of these sites is determined by the sequence-specific DNA binding domain of the PRDM9 histone methyl transferase. Here, we performed an extensive analysis of PRDM9 binding in mouse spermatocytes. Unexpectedly, we identified a noncanonical recruitment of PRDM9 to sites that lack recombination activity and the PRDM9 binding consensus motif. These sites include gene promoters, where PRDM9 is recruited in a DSB-dependent manner. Another subset reveals DSB-independent interactions between PRDM9 and genomic sites, such as the binding sites for the insulator protein CTCF. We propose that these DSB-independent sites result from interactions between hotspot-bound PRDM9 and genomic sequences located on the chromosome axis. PMID:28336543
Interactive computer programs for the graphic analysis of nucleotide sequence data.

PubMed Central

Luckow, V A; Littlewood, R K; Rownd, R H

1984-01-01

A group of interactive computer programs have been developed which aid in the collection and graphical analysis of nucleotide and protein sequence data. The programs perform the following basic functions: a) enter, edit, list, and rearrange sequence data; b) permit automatic entry of nucleotide sequence data directly from an autoradiograph into the computer; c) search for restriction sites or other specified patterns and plot a linear or circular restriction map, or print their locations; d) plot base composition; e) analyze homology between sequences by plotting a two-dimensional graphic matrix; and f) aid in plotting predicted secondary structures of RNA molecules. PMID:6546437
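Three of the listed functions (pattern search, base composition, and the two-dimensional homology matrix) are simple enough to sketch in a few lines. The following is a modern illustration, not the original programs. The function names are hypothetical, and the EcoRI recognition sequence GAATTC is used only as a familiar example pattern.

```python
import re
from collections import Counter

# IUPAC nucleotide ambiguity codes mapped to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(seq: str, pattern: str) -> list[int]:
    """Return 1-based start positions of a pattern, including overlapping hits."""
    body = "".join(IUPAC[c] for c in pattern)
    regex = re.compile("(?=(" + body + "))")  # lookahead keeps overlapping matches
    return [m.start() + 1 for m in regex.finditer(seq)]


def base_composition(seq: str) -> dict[str, float]:
    """Fraction of each base in the sequence."""
    counts = Counter(seq)
    return {base: n / len(seq) for base, n in counts.items()}


def dot_matrix(a: str, b: str, window: int) -> list[tuple[int, int]]:
    """0-based (i, j) pairs where a window of `a` exactly matches a window of `b`."""
    return [
        (i, j)
        for i in range(len(a) - window + 1)
        for j in range(len(b) - window + 1)
        if a[i:i + window] == b[j:j + window]
    ]


# Hypothetical toy sequence.
toy_seq = "TTGAATTCAAGAATTC"
print(find_sites(toy_seq, "GAATTC"))
print(base_composition(toy_seq))
print(dot_matrix("ACGT", "ACGT", 2))
```

The positions returned by `find_sites` are what a linear or circular restriction map would plot, and the pairs returned by `dot_matrix` are the dots of a homology plot.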
Theory on the mechanism of site-specific DNA-protein interactions in the presence of traps

NASA Astrophysics Data System (ADS)

Niranjani, G.; Murugan, R.

2016-08-01

The speed of site-specific binding of transcription factor (TF) proteins with genomic DNA seems to be strongly retarded by the randomly occurring sequence traps. Traps are those DNA sequences sharing significant similarity with the original specific binding sites (SBSs). It is an intriguing question how the naturally occurring TFs and their SBSs are designed to manage the retarding effects of such randomly occurring traps. We develop a simple random walk model on the site-specific binding of TFs with genomic DNA in the presence of sequence traps. Our dynamical model predicts that (a) the retarding effects of traps will be minimum when the traps are arranged around the SBS such that there is a negative correlation between the binding strength of TFs with traps and the distance of traps from the SBS and (b) the retarding effects of sequence traps can be appeased by the condensed conformational state of DNA. Our computational analysis results on the distribution of sequence traps around the putative binding sites of various TFs in the mouse and human genomes agree well with the theoretical predictions. We propose that the distribution of traps can be used as an additional metric to efficiently identify the SBSs of TFs on genomic DNA.
Diazotrophic Community Structure and Function in Two Successional Stages of Biological Soil Crusts from the Colorado Plateau and Chihuahuan Desert

USGS Publications Warehouse

Yeager, C.M.; Kornosky, J.L.; Housman, D.C.; Grote, E.E.; Belnap, J.; Kuske, C.R.

2004-01-01

The objective of this study was to characterize the community structure and activity of N2-fixing microorganisms in mature and poorly developed biological soil crusts from both the Colorado Plateau and Chihuahuan Desert. Nitrogenase activity was approximately 10 and 2.5 times higher in mature crusts than in poorly developed crusts at the Colorado Plateau site and Chihuahuan Desert site, respectively. Analysis of nifH sequences by clone sequencing and the terminal restriction fragment length polymorphism technique indicated that the crust diazotrophic community was 80 to 90% heterocystous cyanobacteria most closely related to Nostoc spp. and that the composition of N2-fixing species did not vary significantly between the poorly developed and mature crusts at either site. In contrast, the abundance of nifH sequences was approximately 7.5 times greater (per microgram of total DNA) in mature crusts than in poorly developed crusts at a given site as measured by quantitative PCR. 16S rRNA gene clone sequencing and microscopic analysis of the cyanobacterial community within both crust types demonstrated a transition from a Microcoleus vaginatus-dominated, poorly developed crust to mature crusts harboring a greater percentage of Nostoc and Scytonema spp. We hypothesize that ecological factors, such as soil instability and water stress, may constrain the growth of N2-fixing microorganisms at our study sites and that the transition to a mature, nitrogen-producing crust initially requires bioengineering of the surface microenvironment by Microcoleus vaginatus.
Microbial Community Structure and Diversity in an Integrated System of Anaerobic-Aerobic Reactors and a Constructed Wetland for the Treatment of Tannery Wastewater in Modjo, Ethiopia

PubMed Central

Desta, Adey Feleke; Assefa, Fassil; Leta, Seyoum; Stomeo, Francesca; Wamalwa, Mark; Njahira, Moses; Appolinaire, Djikeng

2014-01-01

A culture-independent approach was used to elucidate the microbial diversity and structure in the anaerobic-aerobic reactors integrated with a constructed wetland for the treatment of tannery wastewater in Modjo town, Ethiopia. The system has been running with removal efficiencies ranging from 94%–96% for COD, 91%–100% for SO4(2-) and S(2-), 92%–94% for BOD, 56%–82% for total nitrogen and 2%–90% for NH3-N. 16S rRNA gene clone libraries were constructed and microbial community assemblies were determined by analysis of a total of 801 unique clone sequences from all the sites. Operational Taxonomic Unit (OTU)-based analysis of the sequences revealed highly diverse communities in each of the reactors and the constructed wetland. A total of 32 phylotypes were identified with the dominant members affiliated to Clostridia (33%), Betaproteobacteria (10%), Bacteroidia (10%), Deltaproteobacteria (9%) and Gammaproteobacteria (6%). Sequences affiliated to the class Clostridia were the most abundant across all sites. The 801 sequences were assigned to 255 OTUs, of which 3 OTUs were shared among the clone libraries from all sites. The shared OTUs comprised 80 sequences belonging to Clostridiales Family XIII Incertae Sedis, Bacteroidetes and an unclassified bacterial group. Significantly different communities were harbored by the anaerobic, aerobic and rhizosphere sites of the constructed wetland. Numerous representative genera of the dominant bacterial classes obtained from the different sample sites of the integrated system have been implicated in the removal of various carbon-containing pollutants of natural and synthetic origins. To our knowledge, this is the first report of microbial community structure in a tannery wastewater treatment plant in Ethiopia. PMID:25541981
CENP-B binds a novel centromeric sequence in the Asian mouse Mus caroli.

PubMed Central

Kipling, D; Mitchell, A R; Masumoto, H; Wilson, H E; Nicol, L; Cooke, H J

1995-01-01

Minor satellite DNA, found at Mus musculus centromeres, is not present in the genome of the Asian mouse Mus caroli. This repetitive sequence family is speculated to have a role in centromere function by providing an array of binding sites for the centromere-associated protein CENP-B. The apparent absence of CENP-B binding sites in the M. caroli genome poses a major challenge to this hypothesis. Here we describe two abundant satellite DNA sequences present at M. caroli centromeres. These satellites are organized as tandem repeat arrays, over 1 Mb in size, of either 60- or 79-bp monomers. All autosomes carry both satellites and small amounts of a sequence related to the M. musculus major satellite. The Y chromosome contains small amounts of both major satellite and the 60-bp satellite, whereas the X chromosome carries only major satellite sequences. M. caroli chromosomes segregate in M. caroli x M. musculus interspecific hybrid cell lines, indicating that the two sets of chromosomes can interact with the same mitotic spindle. Using a polyclonal CENP-B antiserum, we demonstrate that M. caroli centromeres can bind murine CENP-B in such an interspecific cell line, despite the absence of canonical 17-bp CENP-B binding sites in the M. caroli genome. Sequence analysis of the 79-bp M. caroli satellite reveals a 17-bp motif that contains all nine bases previously shown to be necessary for in vitro binding of CENP-B. This M. caroli motif binds CENP-B from HeLa cell nuclear extract in vitro, as indicated by gel mobility shift analysis. We therefore suggest that this motif also causes CENP-B to associate with M. caroli centromeres in vivo. Despite the sequence differences, M. caroli presents a third, novel mammalian centromeric sequence producing an array of binding sites for CENP-B. PMID:7623797
Characterization of genetic elements required for site-specific integration of Lactobacillus delbrueckii subsp. bulgaricus bacteriophage mv4 and construction of an integration-proficient vector for Lactobacillus plantarum.

PubMed Central

Dupont, L; Boizet-Bonhoure, B; Coddeville, M; Auvray, F; Ritzenthaler, P

1995-01-01

Temperate phage mv4 integrates its DNA into the chromosome of Lactobacillus delbrueckii subsp. bulgaricus strains via site-specific recombination. Nucleotide sequencing of a 2.2-kb attP-containing phage fragment revealed the presence of four open reading frames. The larger open reading frame, close to the attP site, encoded a 427-amino-acid polypeptide with similarity in its C-terminal domain to site-specific recombinases of the integrase family. Comparison of the sequences of attP, bacterial attachment site attB, and host-phage junctions attL and attR identified a 17-bp common core sequence, where strand exchange occurs during recombination. Analysis of the attB sequence indicated that the core region overlaps the 3' end of a tRNA(Ser) gene. Phage mv4 DNA integration into the tRNA(Ser) gene preserved an intact tRNA(Ser) gene at the attL site. An integration vector based on the mv4 attP site and int gene was constructed. This vector transforms a heterologous host, L. plantarum, through site-specific integration into the tRNA(Ser) gene of the genome and will be useful for development of an efficient integration system for a number of additional bacterial species in which an identical tRNA gene is present. PMID:7836291
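The comparison of attP, attB, attL and attR to find a shared core can be sketched as a longest-common-substring search. The sketch below uses a brute-force search that is adequate for short attachment-site sequences. The sequences are invented toy data with a 6-base core, not the mv4 sequences, and the function name is hypothetical. The toy attL and attR are built as the junctions that site-specific integration would produce from the toy attP and attB.

```python
def longest_common_substring(seqs: list[str]) -> str:
    """Return the longest substring present in every sequence (first one found)."""
    shortest = min(seqs, key=len)
    # Try candidate lengths from longest to shortest; stop at the first shared one.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


# Hypothetical toy attachment sites sharing the core GGATCC.
core = "GGATCC"
att_p = "AAAC" + core + "TTTA"   # phage site: P arm, core, P' arm
att_b = "CCGT" + core + "ACGA"   # bacterial site: B arm, core, B' arm
att_l = "CCGT" + core + "TTTA"   # left junction: B arm, core, P' arm
att_r = "AAAC" + core + "ACGA"   # right junction: P arm, core, B' arm

shared_core = longest_common_substring([att_p, att_b, att_l, att_r])
print(shared_core, len(shared_core))
```

With real sequences the flanking arms may share a few bases with the core by chance, so the result is best read as an upper bound on the core that still needs experimental confirmation of where strand exchange occurs.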
Protein Information Resource: a community resource for expert annotation of protein data

PubMed Central

Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

2001-01-01

The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041
SfiI genomic cleavage map of Escherichia coli K-12 strain MG1655.

PubMed Central

Perkins, J D; Heath, J D; Sharma, B R; Weinstock, G M

1992-01-01

An SfiI restriction map of Escherichia coli K-12 strain MG1655 is presented. The map contains thirty-one cleavage sites separating fragments ranging in size from 407 kb to 3.7 kb. Several techniques were used in the construction of this map, including CHEF pulsed field gel electrophoresis; physical analysis of a set of twenty-six auxotrophic transposon insertions; correlation with the restriction map of Kohara and coworkers using the commercially available E. coli Gene Mapping Membranes; analysis of publicly available sequence information; and correlation of the above data with the combined genetic and physical map developed by Rudd, et al. The combination of these techniques has yielded a map in which all but one site can be localized within a range of +/- 2 kb, and over half the sites can be localized precisely by sequence data. Two sites present in the EcoSeq5 sequence database are not cleaved in MG1655 and four sites are noted to be sensitive to methylation by the dcm methylase. This map, combined with the NotI physical map of MG1655, can aid in the rapid, precise mapping of several different types of genetic alterations, including transposon mediated mutations and other insertions, inversions, deletions and duplications. PMID:1312707
Molecular characterization of diazotrophic and denitrifying bacteria associated with mangrove roots.

PubMed

Flores-Mireles, Ana L; Winans, Stephen C; Holguin, Gina

2007-11-01

An analysis of the molecular diversity of N(2) fixers and denitrifiers associated with mangrove roots was performed using terminal restriction length polymorphism (T-RFLP) of nifH (N(2) fixation) and nirS and nirK (denitrification), and the compositions and structures of these communities among three sites were compared. The number of operational taxonomic units (OTU) for nifH was higher than that for nirK or nirS at all three sites. Site 3, which had the highest organic matter and sand content in the rhizosphere sediment, as well as the lowest pore water oxygen concentration, had the highest nifH diversity. Principal component analysis of biogeochemical parameters identified soil texture, organic matter content, pore water oxygen concentration, and salinity as the main variables that differentiated the sites. Nonmetric multidimensional scaling (MDS) analyses of the T-RFLP data using the Bray-Curtis coefficient, group analyses, and pairwise comparisons between the sites clearly separated the OTU of site 3 from those of sites 1 and 2. For nirS, there were statistically significant differences in the composition of OTU among the sites, but the variability was less than for nifH. OTU defined on the basis of nirK were highly similar, and the three sites were not clearly separated on the basis of these sequences. The phylogenetic trees of nifH, nirK, and nirS showed that most of the cloned sequences were more similar to sequences from the rhizosphere isolates than to those from known strains or from other environments.
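The Bray-Curtis coefficient that underlies the MDS ordination can be illustrated with a short sketch. It assumes each site is represented by a vector of OTU abundances (for T-RFLP data, typically peak heights or areas per terminal fragment) and computes the standard Bray-Curtis dissimilarity. The abundance values are hypothetical, and the abstract does not say how the profiles were standardized before comparison.

```python
def bray_curtis(u: list[float], v: list[float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared OTUs."""
    if len(u) != len(v):
        raise ValueError("profiles must list the same OTUs in the same order")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("at least one profile must be non-empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical OTU abundance profiles for three sites.
site1 = [10, 0, 5]
site2 = [5, 5, 5]
site3 = [0, 12, 0]

print(bray_curtis(site1, site2))
print(bray_curtis(site1, site3))
```

A matrix of these pairwise values between all sites is the input that a nonmetric MDS routine would then place in two dimensions; the corresponding similarity is one minus the dissimilarity.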
Molecular Characterization of Diazotrophic and Denitrifying Bacteria Associated with Mangrove Roots

PubMed Central

Flores-Mireles, Ana L.; Winans, Stephen C.; Holguin, Gina

2007-01-01

An analysis of the molecular diversity of N2 fixers and denitrifiers associated with mangrove roots was performed using terminal restriction fragment length polymorphism (T-RFLP) of nifH (N2 fixation) and nirS and nirK (denitrification), and the compositions and structures of these communities among three sites were compared. The number of operational taxonomic units (OTU) for nifH was higher than that for nirK or nirS at all three sites. Site 3, which had the highest organic matter and sand content in the rhizosphere sediment, as well as the lowest pore water oxygen concentration, had the highest nifH diversity. Principal component analysis of biogeochemical parameters identified soil texture, organic matter content, pore water oxygen concentration, and salinity as the main variables that differentiated the sites. Nonmetric multidimensional scaling (MDS) analyses of the T-RFLP data using the Bray-Curtis coefficient, group analyses, and pairwise comparisons between the sites clearly separated the OTU of site 3 from those of sites 1 and 2. For nirS, there were statistically significant differences in the composition of OTU among the sites, but the variability was less than for nifH. OTU defined on the basis of nirK were highly similar, and the three sites were not clearly separated on the basis of these sequences. The phylogenetic trees of nifH, nirK, and nirS showed that most of the cloned sequences were more similar to sequences from the rhizosphere isolates than to those from known strains or from other environments. PMID:17827324
Molecular genetic analysis of ancient cattle bones excavated from archaeological sites in Jeju, Korea.

PubMed

Kim, Jae-Hwan; Oh, Ju-Hyung; Song, Ji-Hoon; Jeon, Jin-Tae; Han, Sang-Hyun; Jung, Yong-Hwan; Oh, Moon-You

2005-12-31

Ancient cattle bones were excavated from archaeological sites in Jeju, Korea. We used molecular genetic techniques to identify the species and establish its relationship to extant cattle breeds. Ancient DNA was extracted from four sources: a humerus (Gonae site, A.D. 700-800), two fragments of radius, and a tooth (Kwakji site, A.D. 0-900). The mitochondrial DNA (mtDNA) D-loop regions were cloned, sequenced, and compared with previously reported sequences of various cattle breeds (9 Asian, 8 European, and 3 African). The results revealed that these bones were of the breed, Bos taurus, and a phylogenetic tree indicated that the four cattle bones formed a monophyletic group with Jeju native black cattle. However, the patterns of sequence variation and reports from archaeological sites suggest that a few wild cattle, with a different maternal lineage, may have existed on Jeju Island. Our results will contribute to further studies of the origin of Jeju native cattle and the possible existence of local wild cattle.
Genetic analysis of tumorigenesis: XXXII. Localization of constitutionally amplified KRAS sequences to Chinese hamster chromosomes X and Y by in situ hybridization.

PubMed

Stenman, G; Anisowicz, A; Sager, R

1988-11-01

The KRAS gene is constitutionally amplified in the Chinese hamster. We have mapped the amplified sequences by in situ hybridization to two major sites on the X and Y chromosomes, Xq4 and Yp2. No autosomal site was detected despite a search under relaxed hybridization conditions. KRAS DNA is amplified about 50-fold compared to a human cell line known to have a diploid number of KRAS sequences, whereas mRNA expression is 5- to 10-fold lower than in normal human cells. While mRNA expression levels do not necessarily parallel gene copy number, the low expression level strongly suggests that the amplified sequences are transcriptionally silent. It is suggested that the amplified sequences arose from the original KRAS gene on chromosome 8 and that the KRAS sequences on the Y chromosome arose by X-Y recombination.

Cluster analysis of S. Cerevisiae nucleosome binding sites

NASA Astrophysics Data System (ADS)

Suvorova, Y.; Korotkov, E.

2017-12-01

It is well known that a major part of a eukaryotic genome is wrapped around histone proteins, forming nucleosomes. It has also been demonstrated that the DNA sequence itself plays an important role in the nucleosome positioning process. In this work, a cluster analysis of 67 517 nucleosome binding sites from the S. cerevisiae genome was carried out. The classification method is based on a self-adjusting dinucleotide position weight matrix. As a result, 135 significant clusters were discovered that contain 43 225 sequences (64% of the initial set). The meaning of the classes found is discussed, as well as the possibility of their further use.
Sequences required for induction of neurotensin receptor gene expression during neuronal differentiation of N1E-115 neuroblastoma cells.

PubMed

Tavares, D; Tully, K; Dobner, P R

1999-10-15

The promoter region of the mouse high affinity neurotensin receptor (Ntr-1) gene was characterized, and sequences required for expression in neuroblastoma cell lines that express high affinity NT-binding sites were identified. Me2SO-induced neuronal differentiation of N1E-115 neuroblastoma cells increased both the expression of the endogenous Ntr-1 gene and reporter genes driven by NTR-1 promoter sequences by 3-4-fold. Deletion analysis revealed that an 83-base pair promoter region containing the transcriptional start site is required for Me2SO activation. Detailed mutational analysis of this region revealed that a CACCC box and the central region of a large GC-rich palindrome are the crucial cis-regulatory elements required for Me2SO induction. The CACCC box is bound by at least one factor that is induced upon Me2SO treatment of N1E-115 cells. The Me2SO effect was found to be both selective and cell type-restricted. Basal expression in the neuroblastoma cell lines required a distinct set of sequences, including an Sp1-like sequence, and a sequence resembling an NGFI-A-binding site; however, a more distal 5' sequence was found to repress basal activity in N1E-115 cells. These results provide evidence that Ntr-1 gene regulation involves both positive and negative regulatory elements located in the 5'-flanking region and that Ntr-1 gene activation involves the coordinate activation or induction of several factors, including a CACCC box binding complex.
GATA: A graphic alignment tool for comparative sequence analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nix, David A.; Eisen, Michael B.

2005-01-01

Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.

PubMed Central

Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T

1993-01-01

A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829
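
Both glycosylation sites named in the abstract above (Asn-Phe-Thr and Asn-Val-Thr) fit the general N-glycosylation sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below shows one way such sequons might be located in a protein sequence. The function name and the toy peptide are hypothetical; the peptide is not the GM2 activator sequence.

```python
import re

# N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return (1-based Asn position, tripeptide) for each sequon found."""
    protein = protein.upper()
    return [(m.start() + 1, protein[m.start():m.start() + 3])
            for m in SEQUON.finditer(protein)]


# Hypothetical peptide: NFT and NVT are sequons, NPS is excluded by the Pro.
toy_peptide = "MKNFTAANPSGNVTL"
sites = find_sequons(toy_peptide)
```

A sequon match only marks a potential site; whether it is actually glycosylated has to be established experimentally.
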
Identification of a p53-response element in the promoter of the proline oxidase gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Maxwell, Steve A.; Kochevar, Gerald J.

2008-05-02

Proline oxidase (POX) is a p53-induced proapoptotic gene. We investigated whether p53 could bind directly to the POX gene promoter. Chromatin immunoprecipitation (ChIP) assays detected p53 bound to POX upstream gene sequences. In support of the ChIP results, sequence analysis of the POX gene and its 5' flanking sequences revealed a potential p53-binding site, GGGCTTGTCTTCGTGTGACTTCTGTCT, located at 1161 base pairs (bp) upstream of the transcriptional start site. A 711-bp DNA fragment containing the candidate p53-binding site exhibited reporter gene activity that was induced by p53. In contrast, the same DNA region lacking the candidate p53-binding site did not show significant p53-response activity. Electrophoretic mobility shift assay (EMSA) in ACHN renal carcinoma cell nuclear lysates confirmed that p53 could bind to the 711-bp POX DNA fragment. We concluded from these experiments that a p53-binding site is positioned at -1161 to -1188 bp upstream of the POX transcriptional start site.
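
The abstract above does not say how the candidate site was recognized. One common approach, assumed here purely for illustration, is to scan for the widely cited p53 half-site consensus RRRCWWGYYY (R = A/G, W = A/T, Y = C/T). The sketch below applies that pattern to the 27-nt sequence quoted in the abstract; the function name is hypothetical, and the consensus is an assumption brought in from outside the abstract.

```python
import re

# Assumed p53 half-site consensus RRRCWWGYYY written as a regular expression.
# The consensus is its own reverse complement, so scanning one strand suffices.
HALF_SITE = re.compile(r"(?=[AG]{3}C[AT]{2}G[CT]{3})")


def find_half_sites(seq):
    """Return 0-based start offsets of every exact half-site match."""
    seq = seq.upper()
    return [m.start() for m in HALF_SITE.finditer(seq)]


# Candidate site as quoted in the abstract.
candidate = "GGGCTTGTCTTCGTGTGACTTCTGTCT"
hits = find_half_sites(candidate)
first_half_site = candidate[hits[0]:hits[0] + 10] if hits else None
```

An exact-match scan like this misses half-sites that deviate from the consensus at one or more positions, so a mismatch-tolerant or matrix-based scan would be needed to look for a second, imperfect half-site.
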
Cloning, characterization and sequence comparison of the gene coding for IMP dehydrogenase from Pyrococcus furiosus.

PubMed

Collart, F R; Osipiuk, J; Trent, J; Olsen, G J; Huberman, E

1996-10-03

We have cloned and characterized the gene encoding inosine monophosphate dehydrogenase (IMPDH) from Pyrococcus furiosus (Pf), a hyperthermophilic archaeon. Sequence analysis of the Pf gene indicated an open reading frame specifying a protein of 485 amino acids (aa) with a calculated M(r) of 52900. Canonical Archaea promoter elements, Box A and Box B, are located -49 and -17 nucleotides (nt), respectively, upstream of the putative start codon. The sequence of the putative active-site region conforms to the IMPDH signature motif and contains a putative active-site cysteine. Phylogenetic relationships derived by using all available IMPDH sequences are consistent with trees developed for other molecules; they do not precisely resolve the history of Pf IMPDH but indicate a close similarity to bacterial IMPDH proteins. The phylogenetic analysis indicates that a gene duplication occurred prior to the division between rodents and humans, accounting for the Type I and II isoforms identified in mice and humans.
Genome-wide identification and analysis of A-to-I RNA editing events in bovine by transcriptome sequencing

PubMed Central

Salehi, Abdolreza; Rivera, Rocío Melissa

2018-01-01

RNA editing increases the diversity of the transcriptome and proteome. Adenosine-to-inosine (A-to-I) editing is the predominant type of RNA editing in mammals and it is catalyzed by the adenosine deaminases acting on RNA (ADARs) family. Here, we used a large-scale computational analysis of transcriptomic data from brain, heart, colon, lung, spleen, kidney, testes, skeletal muscle and liver from three adult animals to identify RNA editing sites in bovine. We developed a computational pipeline and used a rigorous strategy to identify novel editing sites from RNA-Seq data in the absence of corresponding DNA sequence information. Our methods take into account sequencing errors, mapping bias, as well as biological replication to reduce the probability of obtaining a false-positive result. We conducted a detailed characterization of sequence and structural features related to novel candidate sites and found 1,600 novel canonical A-to-I editing sites in the nine bovine tissues analyzed. Results show that these sites 1) occur frequently in clusters and short interspersed nuclear elements (SINE) repeats, 2) have a preference for guanine depletion/enrichment in the flanking 5′/3′ nucleotide, 3) occur less often in coding sequences than other regions of the genome, and 4) have low evolutionary conservation. Further, we found that a positive correlation exists between expression of ADAR family members and tissue-specific RNA editing. Most of the genes with predicted A-to-I editing in each tissue were significantly enriched in biological terms relevant to the function of the corresponding tissue. Lastly, the results highlight the importance of the RNA editome in nervous system regulation. The present study extends the list of RNA editing sites in bovine and provides pipelines that may be used to investigate the editome in other organisms. PMID:29470549
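
Because inosine is read as guanosine during sequencing, A-to-I editing appears in RNA-Seq alignments as A-to-G mismatches on plus-strand transcripts and as T-to-C mismatches on minus-strand transcripts. The sketch below illustrates only the two simplest ideas mentioned in the abstract, mismatch type and biological replication. It is not the authors' pipeline: the record layout, function names, and the two-animal threshold are all assumptions made for illustration.

```python
def is_a_to_i_candidate(ref, alt, strand):
    """True if a mismatch has the signature expected of A-to-I editing."""
    return (strand, ref.upper(), alt.upper()) in {("+", "A", "G"), ("-", "T", "C")}


def replicated_sites(calls, min_animals=2):
    """Keep candidate sites seen in at least `min_animals` distinct animals.

    `calls` is an iterable of (animal, chrom, pos, ref, alt, strand) tuples;
    this layout is hypothetical.
    """
    seen = {}
    for animal, chrom, pos, ref, alt, strand in calls:
        if is_a_to_i_candidate(ref, alt, strand):
            seen.setdefault((chrom, pos, strand), set()).add(animal)
    return sorted(site for site, animals in seen.items()
                  if len(animals) >= min_animals)


# Hypothetical mismatch calls from three animals.
calls = [
    ("animal1", "chr1", 100, "A", "G", "+"),
    ("animal2", "chr1", 100, "A", "G", "+"),
    ("animal1", "chr2", 250, "T", "C", "-"),  # seen in one animal only
    ("animal3", "chr3", 40, "C", "T", "+"),   # not an A-to-I signature
]
kept = replicated_sites(calls)
```

A real pipeline would also need the filters the abstract mentions for sequencing errors and mapping bias, which this sketch omits.
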
Human germline and pan-cancer variomes and their distinct functional profiles

PubMed Central

Pan, Yang; Karagiannis, Konstantinos; Zhang, Haichen; Dingerdissen, Hayley; Shamsaddini, Amirhossein; Wan, Quan; Simonyan, Vahan; Mazumder, Raja

2014-01-01

Identification of non-synonymous single nucleotide variations (nsSNVs) has exponentially increased due to advances in Next-Generation Sequencing technologies. The functional impacts of these variations have been difficult to ascertain because the corresponding knowledge about sequence functional sites is quite fragmented. It is clear that mapping of variations to sequence functional features can help us better understand the pathophysiological role of variations. In this study, we investigated the effect of nsSNVs on more than 17 common types of post-translational modification (PTM) sites, active sites and binding sites. Out of 1 705 285 distinct nsSNVs on 259 216 functional sites we identified 38 549 variations that significantly affect 10 major functional sites. Furthermore, we found distinct patterns of site disruptions due to germline and somatic nsSNVs. Pan-cancer analysis across 12 different cancer types led to the identification of 51 genes with 106 nsSNV affected functional sites found in 3 or more cancer types. 13 of the 51 genes overlap with previously identified Significantly Mutated Genes (Nature. 2013 Oct 17;502(7471)). 62 mutations in these 13 genes affecting functional sites such as DNA, ATP binding and various PTM sites occur across several cancers and can be prioritized for additional validation and investigations. PMID:25232094
Leaf margin phenotype-specific restriction-site-associated DNA-derived markers for pineapple (Ananas comosus L.)

PubMed Central

Urasaki, Naoya; Goeku, Satoko; Kaneshima, Risa; Takamine, Tomonori; Tarora, Kazuhiko; Takeuchi, Makoto; Moromizato, Chie; Yonamine, Kaname; Hosaka, Fumiko; Terakami, Shingo; Matsumura, Hideo; Yamamoto, Toshiya; Shoda, Moriyuki

2015-01-01

To explore genome-wide DNA polymorphisms and identify DNA markers for leaf margin phenotypes, a restriction-site-associated DNA sequencing analysis was employed to analyze three bulked DNAs of F1 progeny from a cross between a ‘piping-leaf-type’ cultivar, ‘Yugafu’, and a ‘spiny-tip-leaf-type’ variety, ‘Yonekura’. The parents were both Ananas comosus var. comosus. From the analysis, piping-leaf and spiny-tip-leaf gene-specific restriction-site-associated DNA sequencing tags were obtained and designated as PLSTs and STLSTs, respectively. The five PLSTs and two STLSTs were successfully converted to cleaved amplified polymorphic sequence (CAPS) or simple sequence repeat (SSR) markers using the sequence differences between alleles. Based on the genotyping of the F1 with two SSR and three CAPS markers, the five PLST markers were mapped in the vicinity of the P locus, with the closest marker, PLST1_SSR, being located 1.5 cM from the P locus. The two CAPS markers from STLST1 and STLST3 perfectly assessed the ‘spiny-leaf type’ as homozygotes of the recessive s allele of the S gene. The recombination value between the S locus and STLST loci was 2.4, and STLSTs were located 2.2 cM from the S locus. SSR and CAPS markers are applicable to marker-assisted selection of leaf margin phenotypes in pineapple breeding. PMID:26175625
Leaf margin phenotype-specific restriction-site-associated DNA-derived markers for pineapple (Ananas comosus L.).

PubMed

Urasaki, Naoya; Goeku, Satoko; Kaneshima, Risa; Takamine, Tomonori; Tarora, Kazuhiko; Takeuchi, Makoto; Moromizato, Chie; Yonamine, Kaname; Hosaka, Fumiko; Terakami, Shingo; Matsumura, Hideo; Yamamoto, Toshiya; Shoda, Moriyuki

2015-06-01

To explore genome-wide DNA polymorphisms and identify DNA markers for leaf margin phenotypes, a restriction-site-associated DNA sequencing analysis was employed to analyze three bulked DNAs of F1 progeny from a cross between a 'piping-leaf-type' cultivar, 'Yugafu', and a 'spiny-tip-leaf-type' variety, 'Yonekura'. The parents were both Ananas comosus var. comosus. From the analysis, piping-leaf and spiny-tip-leaf gene-specific restriction-site-associated DNA sequencing tags were obtained and designated as PLSTs and STLSTs, respectively. The five PLSTs and two STLSTs were successfully converted to cleaved amplified polymorphic sequence (CAPS) or simple sequence repeat (SSR) markers using the sequence differences between alleles. Based on the genotyping of the F1 with two SSR and three CAPS markers, the five PLST markers were mapped in the vicinity of the P locus, with the closest marker, PLST1_SSR, being located 1.5 cM from the P locus. The two CAPS markers from STLST1 and STLST3 perfectly assessed the 'spiny-leaf type' as homozygotes of the recessive s allele of the S gene. The recombination value between the S locus and STLST loci was 2.4, and STLSTs were located 2.2 cM from the S locus. SSR and CAPS markers are applicable to marker-assisted selection of leaf margin phenotypes in pineapple breeding.
Genome-wide analysis of Tol2 transposon reintegration in zebrafish.

PubMed

Kondrychyn, Igor; Garcia-Lecea, Marta; Emelyanov, Alexander; Parinov, Sergey; Korzh, Vladimir

2009-09-08

Tol2, a member of the hAT family of transposons, has become a useful tool for genetic manipulation of model animals, but information about its interactions with vertebrate genomes is still limited. Furthermore, published reports on Tol2 have mainly been based on random integration of the transposon system after co-injection of a plasmid DNA harboring the transposon and a transposase mRNA. It is important to understand how Tol2 would behave upon activation after integration into the genome. We performed a large-scale enhancer trap (ET) screen and generated 338 insertions of the Tol2 transposon-based ET cassette into the zebrafish genome. These insertions were generated by remobilizing the transposon from two different donor sites in two transgenic lines. We found that 39% of Tol2 insertions occurred in transcription units, mostly into introns. Analysis of the transposon target sites revealed no strict specificity at the DNA sequence level. However, Tol2 was prone to target AT-rich regions with weak palindromic consensus sequences centered at the insertion site. Our systematic analysis of sequential remobilizations of the Tol2 transposon from two independent sites within a vertebrate genome has revealed properties such as a tendency to integrate into transcription units and into AT-rich palindrome-like sequences. This information will influence the development of various applications involving DNA transposons and Tol2 in particular.
The DNA Methylome of Human Peripheral Blood Mononuclear Cells

PubMed Central

Ye, Mingzhi; Zheng, Hancheng; Yu, Jian; Wu, Honglong; Sun, Jihua; Zhang, Hongyu; Chen, Quan; Luo, Ruibang; Chen, Minfeng; He, Yinghua; Jin, Xin; Zhang, Qinghui; Yu, Chang; Zhou, Guangyu; Sun, Jinfeng; Huang, Yebo; Zheng, Huisong; Cao, Hongzhi; Zhou, Xiaoyu; Guo, Shicheng; Hu, Xueda; Li, Xin; Kristiansen, Karsten; Bolund, Lars; Xu, Jiujin; Wang, Wen; Yang, Huanming; Wang, Jian; Li, Ruiqiang; Beck, Stephan; Wang, Jun; Zhang, Xiuqing

2010-01-01

DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies. PMID:21085693
Terminal Restriction Fragment Length Polymorphism Analysis Program, a Web-Based Research Tool for Microbial Community Analysis

PubMed Central

Marsh, Terence L.; Saxman, Paul; Cole, James; Tiedje, James

2000-01-01

Rapid analysis of microbial communities has proven to be a difficult task. This is due, in part, to both the tremendous diversity of the microbial world and the high complexity of many microbial communities. Several techniques for community analysis have emerged over the past decade, and most take advantage of the molecular phylogeny derived from 16S rRNA comparative sequence analysis. We describe a web-based research tool located at the Ribosomal Database Project web site (http://www.cme.msu.edu/RDP/html/analyses.html) that facilitates microbial community analysis using terminal restriction fragment length polymorphism of 16S ribosomal DNA. The analysis function (designated TAP T-RFLP) permits the user to perform in silico restriction digestions of the entire 16S sequence database and derive terminal restriction fragment sizes, measured in base pairs, from the 5′ terminus of the user-specified primer to the 3′ terminus of the restriction endonuclease target site. The output can be sorted and viewed either phylogenetically or by size. It is anticipated that the site will guide experimental design as well as provide insight into interpreting results of community analysis with terminal restriction fragment length polymorphisms. PMID:10919828
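
The in silico digestion described above can be sketched in a few lines. The example below follows the abstract's definition of fragment size (from the 5′ terminus of the primer to the 3′ terminus of the restriction target site) but is not the TAP T-RFLP implementation. The function name, the toy sequence, and the primer are hypothetical, and only exact, non-degenerate matches on one strand are handled.

```python
def terminal_fragment_size(seq, primer, site):
    """Size in bp from the 5' end of `primer` to the 3' end of the first
    downstream occurrence of the recognition sequence `site`.

    Returns None if the primer or a downstream site is not found.
    """
    seq, primer, site = seq.upper(), primer.upper(), site.upper()
    start = seq.find(primer)
    if start == -1:
        return None
    hit = seq.find(site, start + len(primer))
    if hit == -1:
        return None
    return hit + len(site) - start


# Hypothetical toy sequence: 2-nt leader, 8-nt primer, 10-nt spacer, then CCGG.
toy_seq = "TT" + "AGAGTTTG" + "ACGTACGTAC" + "CCGG" + "TTTTCCGGAA"
size = terminal_fragment_size(toy_seq, "AGAGTTTG", "CCGG")
```

Applying the same function to every sequence in a database and sorting the results by size gives a table of predicted terminal fragments for a chosen primer and enzyme.
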
Sequence information gain based motif analysis.

PubMed

Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre

2015-11-09

The detection of regulatory regions in candidate sequences is essential for the understanding of the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information theoretic metrics for finding regulatory sequences in promoter regions. This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Qresiduals projections or information theoretic based detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector performs better and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in the modelling of the non-linear co-variability in the binding motif positions. Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of the cis-regulatory sequences detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance disregarding the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
The LAM-PCR Method to Sequence LV Integration Sites.

PubMed

Wang, Wei; Bartholomae, Cynthia C; Gabriel, Richard; Deichmann, Annette; Schmidt, Manfred

2016-01-01

Integrating viral gene transfer vectors are commonly used gene delivery tools in clinical gene therapy trials providing stable integration and continuous gene expression of the transgene in the treated host cell. However, integration of the reverse-transcribed vector DNA into the host genome is a potentially mutagenic event that may directly contribute to unwanted side effects. A comprehensive and accurate analysis of the integration site (IS) repertoire is indispensable to study clonality in transduced cells obtained from patients undergoing gene therapy and to identify potential in vivo selection of affected cell clones. To date, next-generation sequencing (NGS) of vector-genome junctions allows sophisticated studies on the integration repertoire in vitro and in vivo. We have explored the use of the Illumina MiSeq Personal Sequencer platform to sequence vector ISs amplified by non-restrictive linear amplification-mediated PCR (nrLAM-PCR) and LAM-PCR. MiSeq-based high-quality IS sequence retrieval is accomplished by the introduction of a double-barcode strategy that substantially minimizes the frequency of IS sequence collisions compared to the conventionally used single-barcode protocol. Here, we present an updated protocol of (nr)LAM-PCR for the analysis of lentiviral IS using a double-barcode system and followed by deep sequencing using the MiSeq device.
Microbial community analysis of a coastal hot spring in Kagoshima, Japan, using molecular- and culture-based approaches.

PubMed

Nishiyama, Minako; Yamamoto, Shuichi; Kurosawa, Norio

2013-08-01

Ibusuki hot spring is located on the coastline of Kagoshima Bay, Japan. The hot spring water is characterized by high salinity, high temperature, and neutral pH. The hot spring is covered by the sea during high tide, which leads to severe fluctuations in several environmental variables. A combination of molecular- and culture-based techniques was used to determine the bacterial and archaeal diversity of the hot spring. A total of 48 thermophilic bacterial strains were isolated from two sites (Site 1: 55.6°C; Site 2: 83.1°C) and they were categorized into six groups based on their 16S rRNA gene sequence similarity. Two groups (including 32 isolates) demonstrated low sequence similarity with published species, suggesting that they might represent novel taxa. The 148 clones from the Site 1 bacterial library included 76 operational taxonomic units (OTUs; 97% threshold), while 132 clones from the Site 2 bacterial library included 31 OTUs. Proteobacteria, Bacteroidetes, and Firmicutes were frequently detected in both clone libraries. The clones were related to thermophilic, mesophilic and psychrophilic bacteria. Approximately half of the sequences in bacterial clone libraries shared <92% sequence similarity with their closest sequences in a public database, suggesting that the Ibusuki hot spring may harbor a unique and novel bacterial community. By contrast, 77 clones from the Site 2 archaeal library contained only three OTUs, most of which were affiliated with Thaumarchaeota.
Polymorphism at codon 36 of the p53 gene.

PubMed

Felix, C A; Brown, D L; Mitsudomi, T; Ikagaki, N; Wong, A; Wasserman, R; Womer, R B; Biegel, J A

1994-01-01

A polymorphism at codon 36 in exon 4 of the p53 gene was identified by single strand conformation polymorphism (SSCP) analysis and direct sequencing of genomic DNA PCR products. The polymorphic allele, present in the heterozygous state in genomic DNAs of four of 100 individuals (4%), changes the codon 36 CCG to CCA, eliminates a FinI restriction site and creates a BccI site. Including this polymorphism there are four known polymorphisms in the p53 coding sequence.
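
The abstract above gives the codon change (CCG to CCA) but does not state its effect on the protein. Under the standard genetic code both codons specify proline, which the short sketch below checks by building the codon table and comparing the two translations. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def is_synonymous(codon_a, codon_b):
    """True if two DNA codons encode the same amino acid (or both are stops)."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


reference = CODON_TABLE["CCG"]            # amino acid for the common allele
variant = CODON_TABLE["CCA"]              # amino acid for the polymorphic allele
silent = is_synonymous("CCG", "CCA")
```

A silent change of this kind can still alter restriction enzyme recognition sequences, which is why the abstract describes it in terms of the sites it eliminates and creates.
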
Monitoring and source tracking of tetracycline resistance genes in lagoons and groundwater adjacent to swine production facilities over a 3-year period

USGS Publications Warehouse

Koike, S.; Krapac, I.G.; Oliver, H.D.; Yannarell, A.C.; Chee-Sanford, J. C.; Aminov, R.I.; Mackie, R.I.

2007-01-01

To monitor the dissemination of resistance genes into the environment, we determined the occurrence of tetracycline resistance (Tcr) genes in groundwater underlying two swine confinement operations. Monitoring well networks (16 wells at site A and 6 wells at site C) were established around the lagoons at each facility. Groundwater (n = 124) and lagoon (n = 12) samples were collected from the two sites at six sampling times from 2000 through 2003. Total DNA was extracted, and PCR was used to detect seven Tcr genes [tet(M), tet(O), tet(Q), tet(W), tet(C), tet(H), and tet(Z)]. The concentration of Tcr genes was quantified by real-time quantitative PCR. To confirm the Tcr gene source in groundwater, comparative analysis of tet(W) gene sequences was performed on groundwater and lagoon samples. All seven Tcr genes were continually detected in groundwater during the 3-year monitoring period at both sites. At site A, elevated detection frequency and concentration of Tcr genes were observed in the wells located down-gradient of the lagoon. Comparative analysis of tet(W) sequences revealed that the impacted groundwater contained gene sequences almost identical (99.8% identity) to those in the lagoon, but these genes were not found in background libraries. Novel sequence clusters and unique indigenous resistance gene pools were also found in the groundwater. Thus, antibiotic resistance genes in groundwater are affected by swine manure, but they are also part of the indigenous gene pool. Copyright © 2007, American Society for Microbiology. All Rights Reserved.
Monitoring and Source Tracking of Tetracycline Resistance Genes in Lagoons and Groundwater Adjacent to Swine Production Facilities over a 3-Year Period

PubMed Central

Koike, S.; Krapac, I. G.; Oliver, H. D.; Yannarell, A. C.; Chee-Sanford, J. C.; Aminov, R. I.; Mackie, R. I.

2007-01-01

To monitor the dissemination of resistance genes into the environment, we determined the occurrence of tetracycline resistance (Tcr) genes in groundwater underlying two swine confinement operations. Monitoring well networks (16 wells at site A and 6 wells at site C) were established around the lagoons at each facility. Groundwater (n = 124) and lagoon (n = 12) samples were collected from the two sites at six sampling times from 2000 through 2003. Total DNA was extracted, and PCR was used to detect seven Tcr genes [tet(M), tet(O), tet(Q), tet(W), tet(C), tet(H), and tet(Z)]. The concentration of Tcr genes was quantified by real-time quantitative PCR. To confirm the Tcr gene source in groundwater, comparative analysis of tet(W) gene sequences was performed on groundwater and lagoon samples. All seven Tcr genes were continually detected in groundwater during the 3-year monitoring period at both sites. At site A, elevated detection frequency and concentration of Tcr genes were observed in the wells located down-gradient of the lagoon. Comparative analysis of tet(W) sequences revealed that the impacted groundwater contained gene sequences almost identical (99.8% identity) to those in the lagoon, but these genes were not found in background libraries. Novel sequence clusters and unique indigenous resistance gene pools were also found in the groundwater. Thus, antibiotic resistance genes in groundwater are affected by swine manure, but they are also part of the indigenous gene pool. PMID:17545324
Exome Sequencing Identified a Splice Site Mutation in FHL1 that Causes Uruguay Syndrome, an X-Linked Disorder With Skeletal Muscle Hypertrophy and Premature Cardiac Death.

PubMed

Xue, Yuan; Schoser, Benedikt; Rao, Aliz R; Quadrelli, Roberto; Vaglio, Alicia; Rupp, Verena; Beichler, Christine; Nelson, Stanley F; Schapacher-Tilp, Gudrun; Windpassinger, Christian; Wilcox, William R

2016-04-01

Previously, we reported a rare X-linked disorder, Uruguay syndrome, in a single family. The main features are pugilistic facies, skeletal deformities, and muscular hypertrophy despite a lack of exercise and cardiac ventricular hypertrophy leading to premature death. An ≈19 Mb critical region on the X chromosome was identified through identity-by-descent analysis of 3 affected males. Exome sequencing was conducted on one affected male to identify the disease-causing gene and variant. A splice site variant (c.502-2A>G) in the FHL1 gene was highly suspicious among other candidate genes and variants. FHL1A is the predominant isoform of FHL1 in cardiac and skeletal muscle. Sequencing cDNA showed the splice site variant led to skipping of exon 6 of the FHL1A isoform, equivalent to the FHL1C isoform. Targeted analysis showed that this splice site variant cosegregated with disease in the family. Western blot and immunohistochemical analysis of muscle from the proband showed a significant decrease in protein expression of FHL1A. Real-time polymerase chain reaction analysis of different isoforms of FHL1 demonstrated that FHL1C is markedly increased. Mutations in the FHL1 gene have been reported in disorders with skeletal and cardiac myopathy but none has the skeletal or facial phenotype seen in patients with Uruguay syndrome. Our data suggest that a novel FHL1 splice site variant results in the absence of FHL1A and the abundance of FHL1C, which may contribute to the complex and severe phenotype. Mutation screening of the FHL1 gene should be considered for patients with uncharacterized myopathies and cardiomyopathies. © 2016 American Heart Association, Inc.

Comparative analysis and molecular characterization of genomic sequences and proteins of FABP4 and FABP5 from the giant panda (Ailuropoda melanoleuca).

PubMed

Song, B; Hou, Y L; Ding, X; Wang, T; Wang, F; Zhong, J C; Xu, T; Zhong, J; Hou, W R; Shuai, S R

2014-02-20

Fatty acid binding proteins (FABPs) are a family of small, highly conserved cytoplasmic proteins that bind long-chain fatty acids and other hydrophobic ligands. In this study, cDNA and genomic sequences of FABP4 and FABP5 were cloned successfully from the giant panda (Ailuropoda melanoleuca) using reverse transcription polymerase chain reaction (RT-PCR) technology and touchdown-PCR. The cDNAs of FABP4 and FABP5 cloned from the giant panda were 400 and 413 bp in length, containing an open reading frame of 399 and 408 bp, encoding 132 and 135 amino acids, respectively. The genomic sequences of FABP4 and FABP5 were 3976 and 3962 bp, respectively, which each contained four exons and three introns. Sequence alignment indicated a high degree of homology with reported FABP sequences of other mammals at both the amino acid and DNA levels. Topology prediction revealed seven protein kinase C phosphorylation sites, two casein kinase II phosphorylation sites, two N-myristoylation sites, and one cytosolic fatty acid-binding protein signature in the FABP4 protein, and three N-glycosylation sites, three protein kinase C phosphorylation sites, one casein kinase II phosphorylation site, one N-myristoylation site, one amidation site, and one cytosolic fatty acid-binding protein signature in the FABP5 protein. The FABP4 and FABP5 genes were overexpressed in Escherichia coli BL21 and they produced the expected 16.8- and 17.0-kDa polypeptides. The results obtained in this study provide information for further in-depth research of this system, which has great value of both theoretical and practical significance.
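The relationship between the open reading frame lengths and the protein lengths quoted above can be checked with a short calculation. This is a minimal sketch that assumes the reported ORF lengths (399 and 408 bp) include the stop codon; the function name is hypothetical.

```python
def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in bp."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but encodes no amino acid.
    return codons - 1 if includes_stop else codons


for gene, orf_bp in (("FABP4", 399), ("FABP5", 408)):
    print(gene, orf_bp, "bp ->", protein_length(orf_bp), "amino acids")
```

With that assumption, 399 bp gives 133 codons and 132 residues, and 408 bp gives 136 codons and 135 residues, matching the abstract.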
Analysis and Prediction of Myristoylation Sites Using the mRMR Method, the IFS Method and an Extreme Learning Machine Algorithm.

PubMed

Wang, ShaoPeng; Zhang, Yu-Hang; Huang, GuoHua; Chen, Lei; Cai, Yu-Dong

2017-01-01

Myristoylation is an important hydrophobic post-translational modification that is covalently bound to the amino group of Gly residues on the N-terminus of proteins. The many diverse functions of myristoylation on proteins, such as membrane targeting, signal pathway regulation and apoptosis, are largely due to the lipid modification, whereas abnormal or irregular myristoylation on proteins can lead to several pathological changes in the cell. To better understand the function of myristoylated sites and to correctly identify them in protein sequences, this study conducted a novel computational investigation on identifying myristoylation sites in protein sequences. A training dataset with 196 positive and 84 negative peptide segments was obtained. Four types of features derived from the peptide segments following the myristoylation sites were used to specify myristoylated and non-myristoylated sites. Then, feature selection methods including maximum relevance and minimum redundancy (mRMR), incremental feature selection (IFS), and a machine learning algorithm (extreme learning machine method) were adopted to extract optimal features for the algorithm to identify myristoylation sites in protein sequences, thereby building an optimal prediction model. As a result, 41 key features were extracted and used to build an optimal prediction model. The effectiveness of the optimal prediction model was further validated by its performance on a test dataset. Furthermore, detailed analyses were also performed on the extracted 41 features to gain insight into the mechanism of myristoylation modification. This study provided a new computational method for identifying myristoylation sites in protein sequences. We believe that it can be a useful tool to predict myristoylation sites from protein sequences. Copyright © Bentham Science Publishers.
Transcription initiation from the dihydrofolate reductase promoter is positioned by HIP1 binding at the initiation site.

PubMed

Means, A L; Farnham, P J

1990-02-01

We have identified a sequence element that specifies the position of transcription initiation for the dihydrofolate reductase gene. Unlike the functionally analogous TATA box that directs RNA polymerase II to initiate transcription 30 nucleotides downstream, the positioning element of the dihydrofolate reductase promoter is located directly at the site of transcription initiation. By using DNase I footprint analysis, we have shown that a protein binds to this initiator element. Transcription initiated at the dihydrofolate reductase initiator element when 28 nucleotides were inserted between it and all other upstream sequences, or when it was placed on either side of the DNA helix, suggesting that there is no strict spatial requirement between the initiator and an upstream element. Although neither a single Sp1-binding site nor a single initiator element was sufficient for transcriptional activity, the combination of one Sp1-binding site and the dihydrofolate reductase initiator element cloned into a plasmid vector resulted in transcription starting at the initiator element. We have also shown that the simian virus 40 late major initiation site has striking sequence homology to the dihydrofolate reductase initiation site and that the same, or a similar, protein binds to both sites. Examination of the sequences at other RNA polymerase II initiation sites suggests that we have identified an element that is important in the transcription of other housekeeping genes. We have thus named the protein that binds to the initiator element HIP1 (Housekeeping Initiator Protein 1).
Detection of canonical A-to-G editing events at 3′ UTRs and microRNA target sites in human lungs using next-generation sequencing

PubMed Central

Soundararajan, Ramani; Stearns, Timothy M.; Griswold, Anthony J.; Mehta, Arpit; Czachor, Alexander; Fukumoto, Jutaro; Lockey, Richard F.; King, Benjamin L.; Kolliputi, Narasaiah

2015-01-01

RNA editing is a post-transcriptional modification of RNA. The majority of these changes result from adenosine deaminase acting on RNA (ADARs) catalyzing the conversion of adenosine residues to inosine in double-stranded RNAs (dsRNAs). Massively parallel sequencing has enabled the identification of RNA editing sites in human transcriptomes. In this study, we sequenced DNA and RNA from human lungs and identified RNA editing sites with high confidence via a computational pipeline utilizing stringent analysis thresholds. We identified a total of 3,447 editing sites that overlapped in three human lung samples, and with 50% of these sites having canonical A-to-G base changes. Approximately 27% of the edited sites overlapped with Alu repeats, and showed A-to-G clustering (>3 clusters in 100 bp). The majority of edited sites mapped to either 3′ untranslated regions (UTRs) or introns close to splice sites; whereas, only few sites were in exons resulting in non-synonymous amino acid changes. Interestingly, we identified 652 A-to-G editing events in the 3′ UTR of 205 target genes that mapped to 932 potential miRNA target binding sites. Several of these miRNA edited sites were validated in silico. Additionally, we validated several A-to-G edited sites by Sanger sequencing. Altogether, our study suggests a role for RNA editing in miRNA-mediated gene regulation and splicing in human lungs. In this study, we have generated a RNA editome of human lung tissue that can be compared with other RNA editomes across different lung tissues to delineate a role for RNA editing in normal and diseased states. PMID:26486088
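The idea of calling a "canonical A-to-G" event from matched DNA and RNA data can be sketched as follows. This is an illustrative Python fragment, not the authors' pipeline: the function name and the toy calls are hypothetical, and it assumes each site is given as a genomic base, an RNA base, and the strand of the transcript.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify(dna_base: str, rna_base: str, strand: str = "+"):
    """Return the mismatch type in transcript orientation (e.g. 'A>G'),
    or None when the DNA and RNA bases agree."""
    if strand == "-":
        # A T>C mismatch on the reference strand is A>G on a minus-strand transcript.
        dna_base, rna_base = COMPLEMENT[dna_base], COMPLEMENT[rna_base]
    return None if dna_base == rna_base else f"{dna_base}>{rna_base}"


# Hypothetical calls: (genomic base, RNA base, transcript strand)
calls = [("A", "G", "+"), ("T", "C", "-"), ("C", "T", "+"), ("A", "A", "+")]
counts = Counter(kind for kind in (classify(*c) for c in calls) if kind)
print(counts)
```

Inosine is read as guanosine by sequencers, which is why adenosine-to-inosine editing shows up as an A-to-G difference between DNA and RNA.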
Detection of canonical A-to-G editing events at 3' UTRs and microRNA target sites in human lungs using next-generation sequencing.

PubMed

Soundararajan, Ramani; Stearns, Timothy M; Griswold, Anthony L; Mehta, Arpit; Czachor, Alexander; Fukumoto, Jutaro; Lockey, Richard F; King, Benjamin L; Kolliputi, Narasaiah

2015-11-03

RNA editing is a post-transcriptional modification of RNA. The majority of these changes result from adenosine deaminase acting on RNA (ADARs) catalyzing the conversion of adenosine residues to inosine in double-stranded RNAs (dsRNAs). Massively parallel sequencing has enabled the identification of RNA editing sites in human transcriptomes. In this study, we sequenced DNA and RNA from human lungs and identified RNA editing sites with high confidence via a computational pipeline utilizing stringent analysis thresholds. We identified a total of 3,447 editing sites that overlapped in three human lung samples, and with 50% of these sites having canonical A-to-G base changes. Approximately 27% of the edited sites overlapped with Alu repeats, and showed A-to-G clustering (>3 clusters in 100 bp). The majority of edited sites mapped to either 3' untranslated regions (UTRs) or introns close to splice sites; whereas, only few sites were in exons resulting in non-synonymous amino acid changes. Interestingly, we identified 652 A-to-G editing events in the 3' UTR of 205 target genes that mapped to 932 potential miRNA target binding sites. Several of these miRNA edited sites were validated in silico. Additionally, we validated several A-to-G edited sites by Sanger sequencing. Altogether, our study suggests a role for RNA editing in miRNA-mediated gene regulation and splicing in human lungs. In this study, we have generated a RNA editome of human lung tissue that can be compared with other RNA editomes across different lung tissues to delineate a role for RNA editing in normal and diseased states.
Molecular Targeting of Prostate Cancer During Androgen Ablation: Inhibition of CHES1/FOXN3

DTIC Science & Technology

2013-05-01

the DNA sequences (~25^6 reads/sample) were mapped to the human genome reference sequence (hg19...tumor the AR has a genomic abnormality, placing the novel sequence 3’ of the transcriptional start site. However, it is unclear if a genomic alteration...exon/intron organization of the CHES1 gene was determined by BLAST analysis of the human genome using the 1,473-bp CHES1 cDNA sequence
Immune response to synthetic peptides representing antigenic sites on the glycoprotein of infectious hematopoietic necrosis virus

USGS Publications Warehouse

Emmenegger, Eveline J.; Huang, C.; LaPatra, S.; Winton, James R.

1995-01-01

Summary ― Monoclonal antibodies against infectious hematopoietic necrosis virus have been used to react with recombinant expression products in immunoblots and to select neutralization-resistant mutants for sequence analysis. These strategies identified neutralizing and non-neutralizing antigenic sites on the viral glycoprotein. Synthetic peptides based upon the amino acid sequences of these antigenic sites were synthesized and were injected together with an adjuvant into rainbow trout. The constructs generally failed to stimulate neutralizing antibodies in the fish. These results indicate that we need to understand more about the ability of peptide antigens to stimulate fish immune systems.
Identification of 16S Ribosomal DNA-Defined Bacterial Populations at a Shallow Submarine Hydrothermal Vent near Milos Island (Greece)

PubMed Central

Sievert, Stefan M.; Kuever, Jan; Muyzer, Gerard

2000-01-01

In a recent publication (S. M. Sievert, T. Brinkhoff, G. Muyzer, W. Ziebis, and J. Kuever, Appl. Environ. Microbiol. 65:3834–3842, 1999) we described spatiotemporal changes in the bacterial community structure at a shallow-water hydrothermal vent in the Aegean Sea near the isle of Milos (Greece). Here we describe identification and phylogenetic analysis of the predominant bacterial populations at the vent site and their distribution at the vent site as determined by sequencing of DNA molecules (bands) excised from denaturing gradient gels. A total of 36 bands could be sequenced, and there were representatives of eight major lineages of the domain Bacteria. Cytophaga-Flavobacterium and Acidobacterium were the most frequently retrieved bacterial groups. Less than 33% of the sequences exhibited 90% or more identity with cultivated organisms. The predominance of putative heterotrophic populations in the sequences retrieved is explained by the input of allochthonous organic matter at the vent site. PMID:10877814
Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

NASA Astrophysics Data System (ADS)

Ferreira, Pedro G.; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R.; Rivas, Manuel A.; Esteve-Codina, Anna; Estivill, Xavier; Guigó, Roderic; Dermitzakis, Emmanouil; Antonarakis, Stylianos; Meitinger, Thomas; Strom, Tim M.; Palotie, Aarno; François Deleuze, Jean; Sudbrak, Ralf; Lerach, Hans; Gut, Ivo; Syvänen, Ann-Christine; Gyllensten, Ulf; Schreiber, Stefan; Rosenstiel, Philip; Brunner, Han; Veltman, Joris; Hoen, Peter A. C. T.; Jan van Ommen, Gert; Carracedo, Angel; Brazma, Alvis; Flicek, Paul; Cambon-Thomsen, Anne; Mangion, Jonathan; Bentley, David; Hamosh, Ada; Rosenstiel, Philip; Strom, Tim M.; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

2016-09-01

Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.
DNA sequence analysis of simian virus 40 mutants with deletions mapping in the leader region of the late viral mRNA's: mutants with deletions similar in size and position exhibit varied phenotypes.

PubMed

Barkan, A; Mertz, J E

1981-02-01

The nucleotide sequences of 10 viable yet partially defective deletion mutants of simian virus 40 were determined. The deletions mapped within, and, in many cases, 5' to, the predominant leader sequence of the late viral mRNA's. They ranged from 74 to 187 nucleotide pairs in length. Six of the mutants had lost the sequence that corresponds to the "cap" site (5' terminus) of the most abundant class of 16S mRNA's. One of these mutants had a deletion that extended 103 nucleotide pairs into the region preceding this primary cap site and, therefore, was missing many secondary cap sites as well. A seventh mutant lacked the entire major 16S leader sequence except for the first six nucleotides at its 5' end and the last nine at its 3' end. Although these mutants differed in the size and position of their deletions, we were unable to discover any simple correlations between their growth characteristics and their DNA sequences. This finding indicates that the secondary structures of the RNA transcripts may play a more important role than the exact nucleotide sequence of the RNAs in determining how they function within the cell.
Contacts between the factor TUF and RPG sequences.

PubMed

Vignais, M L; Huet, J; Buhler, J M; Sentenac, A

1990-08-25

The yeast TUF factor binds specifically to RPG-like sequences involved in multiple functions at enhancers, silencers, and telomeres. We have characterized the interaction of TUF with its optimal binding sequence, rpg-1 (1-ACACCCATACATTT-14), using a gel DNA-binding assay in combination with methylation protection and mutagenesis experiments. As many as 10 base pairs appear to be engaged in factor binding. Analysis of a collection of 30 different RPG mutants demonstrated the importance of 8 base pairs at positions 2, 3, 4, 5, 6, 7, 10, and 12 and the critical role of the central GC pair at position 5. Methylation protection data on four different natural sites confirmed a close contact at positions 4, 5, 6, and 10 and suggested additional contacts at base pairs 8, 12, and 13. The derived consensus sequence was RCAAYCCRYNCAYY. A quantitative band shift analysis was used to determine the equilibrium dissociation constant for the complex of TUF and its optimal binding site rpg-1. The specific dissociation constant (Ks) was found to be 1.3 x 10^-11 M. The comparison of the Ks value with the dissociation constant obtained for nonspecific DNA sites (Kns = 8.7 x 10^-6 M) shows the high binding selectivity of TUF for its specific RPG target.
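The binding selectivity implied by the two dissociation constants quoted in the abstract (1.3 x 10^-11 M for the specific site, 8.7 x 10^-6 M for nonspecific DNA) can be worked out directly. The sketch below is illustrative; the temperature used for the free-energy conversion is an assumption, since the abstract does not state the assay temperature.

```python
import math

K_SPECIFIC = 1.3e-11     # M, complex with rpg-1 (value from the abstract)
K_NONSPECIFIC = 8.7e-6   # M, nonspecific DNA sites (value from the abstract)

# Ratio of dissociation constants: how much more tightly the specific site is bound.
selectivity = K_NONSPECIFIC / K_SPECIFIC

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
TEMP_K = 298.15     # ASSUMED temperature (25 degrees C); not given in the abstract

# Difference in binding free energy between specific and nonspecific sites.
ddg_kcal = R_KCAL * TEMP_K * math.log(selectivity)

print(f"selectivity ratio: {selectivity:.2e}")
print(f"free-energy difference: {ddg_kcal:.1f} kcal/mol")
```

The ratio is roughly 6.7 x 10^5, which at the assumed temperature corresponds to a free-energy difference of about 8 kcal/mol.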
Prediction of glutathionylation sites in proteins using minimal sequence information and their experimental validation.

PubMed

Pal, Debojyoti; Sharma, Deepak; Kumar, Mukesh; Sandur, Santosh K

2016-09-01

S-glutathionylation of proteins plays an important role in various biological processes and is known to be a protective modification during oxidative stress. Since experimental detection of S-glutathionylation is labor intensive and time consuming, a bioinformatics-based approach is a viable alternative. Available methods require relatively longer sequence information, which may prevent prediction if sequence information is incomplete. Here, we present a model to predict glutathionylation sites from pentapeptide sequences. It is based upon differential association of amino acids with glutathionylated and non-glutathionylated cysteines from a database of experimentally verified sequences. This data was used to calculate position-dependent F-scores, which measure how a particular amino acid at a particular position may affect the likelihood of a glutathionylation event. Glutathionylation-score (G-score), indicating propensity of a sequence to undergo glutathionylation, was calculated using position-dependent F-scores for each amino acid. Cut-off values were used for prediction. Our model returned an accuracy of 58% with a Matthews correlation coefficient (MCC) value of 0.165. On an independent dataset, our model outperformed the currently available model, in spite of needing much less sequence information. Pentapeptide motifs having high abundance among glutathionylated proteins were identified. A list of potential glutathionylation hotspot sequences was obtained by assigning G-scores, and subsequent Protein-BLAST analysis revealed a total of 254 putative glutathionable proteins, a number of which were already known to be glutathionylated. Our model predicted glutathionylation sites in 93.93% of experimentally verified glutathionylated proteins. Outcome of this study may assist in discovering novel glutathionylation sites and finding candidate proteins for glutathionylation.
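For readers unfamiliar with the two performance measures quoted above, the sketch below shows how accuracy and the Matthews correlation coefficient are computed from a binary confusion matrix. The counts are made up purely for illustration and are not the study's confusion matrix, which the abstract does not report.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# HYPOTHETICAL counts, chosen only to show the calculation.
tp, tn, fp, fn = 60, 56, 44, 40
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.3f}")
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, so it is a stricter summary than accuracy when the classes are unbalanced.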
Discovery and information-theoretic characterization of transcription factor binding sites that act cooperatively.

PubMed

Clifford, Jacob; Adami, Christoph

2015-09-02

Transcription factor binding to the surface of DNA regulatory regions is one of the primary causes of regulating gene expression levels. A probabilistic approach to model protein-DNA interactions at the sequence level is through position weight matrices (PWMs) that estimate the joint probability of a DNA binding site sequence by assuming positional independence within the DNA sequence. Here we construct conditional PWMs that depend on the motif signatures in the flanking DNA sequence, by conditioning known binding site loci on the presence or absence of additional binding sites in the flanking sequence of each site's locus. Pooling known sites with similar flanking sequence patterns allows for the estimation of the conditional distribution function over the binding site sequences. We apply our model to the Dorsal transcription factor binding sites active in patterning the Dorsal-Ventral axis of Drosophila development. We find that those binding sites that cooperate with nearby Twist sites on average contain about 0.5 bits of information about the presence of Twist transcription factor binding sites in the flanking sequence. We also find that Dorsal binding site detectors conditioned on flanking sequence information make better predictions about what is a Dorsal site relative to background DNA than detection without information about flanking sequence features.
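A position weight matrix of the kind described above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions (uniform background of 0.25 per base, a pseudocount of 1, and made-up binding sites); it is not the authors' code, and the sequences are not real Dorsal sites.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Column-wise base probabilities, treating positions as independent."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def information_content(pwm, background=0.25):
    """Total information (bits) of the motif relative to the background."""
    return sum(p * math.log2(p / background) for col in pwm for p in col.values())


def log_odds(pwm, seq, background=0.25):
    """Log2 odds that seq was drawn from the motif rather than background."""
    return sum(math.log2(col[b] / background) for col, b in zip(pwm, seq))


# HYPOTHETICAL aligned sites, for illustration only.
sites = ["GGGAAAACC", "GGGAATTCC", "GGGAAATCC", "GGGTTTTCC"]
pwm = build_pwm(sites)
print(f"information content: {information_content(pwm):.2f} bits")
print(f"score of GGGAAATCC: {log_odds(pwm, 'GGGAAATCC'):.2f}")
print(f"score of ACGTACGTA: {log_odds(pwm, 'ACGTACGTA'):.2f}")
```

A conditional matrix in the spirit of the abstract could be built by calling `build_pwm` only on the subset of sites that share a flanking feature, for example those with a nearby site for a second factor, and comparing it with the matrix built from the remaining sites.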
Expression of the Caulobacter heat shock gene dnaK is developmentally controlled during growth at normal temperatures.

PubMed Central

Gomes, S L; Gober, J W; Shapiro, L

1990-01-01

Caulobacter crescentus has a single dnaK gene that is highly homologous to the hsp70 family of heat shock genes. Analysis of the cloned and sequenced dnaK gene has shown that the deduced amino acid sequence could encode a protein of 67.6 kilodaltons that is 68% identical to the DnaK protein of Escherichia coli and 49% identical to the Drosophila and human hsp70 protein family. A partial open reading frame 165 base pairs 3' to the end of dnaK encodes a peptide of 190 amino acids that is 59% identical to DnaJ of E. coli. Northern blot analysis revealed a single 4.0-kilobase mRNA homologous to the cloned fragment. Since the dnaK coding region is 1.89 kilobases, dnaK and dnaJ may be transcribed as a polycistronic message. S1 mapping and primer extension experiments showed that transcription initiated at two sites 5' to the dnaK coding sequence. A single start site of transcription was identified during heat shock at 42 degrees C, and the predicted promoter sequence conformed to the consensus heat shock promoters of E. coli. At normal growth temperature (30 degrees C), a different start site was identified 3' to the heat shock start site that conformed to the E. coli sigma 70 promoter consensus sequence. S1 protection assays and analysis of expression of the dnaK gene fused to the lux transcription reporter gene showed that expression of dnaK is temporally controlled under normal physiological conditions and that transcription occurs just before the initiation of DNA replication. Thus, in both human cells (I. K. L. Milarski and R. I. Morimoto, Proc. Natl. Acad. Sci. USA 83:9517-9521, 1986) and in a simple bacterium, the transcription of a hsp70 gene is temporally controlled as a function of the cell cycle under normal growth conditions. PMID:2345134
Isolation and in silico analysis of a novel H+-pyrophosphatase gene orthologue from the halophytic grass Leptochloa fusca

NASA Astrophysics Data System (ADS)

Rauf, Muhammad; Saeed, Nasir A.; Habib, Imran; Ahmed, Moddassir; Shahzad, Khurram; Mansoor, Shahid; Ali, Rashid

2017-02-01

Structure prediction can provide information about the function and active sites of a protein, which helps in designing new functional proteins. H+-pyrophosphatase is a transmembrane protein involved in establishing the proton motive force for active transport of Na+ across the membrane by Na+/H+ antiporters. A full-length novel H+-pyrophosphatase gene was isolated from the halophytic grass Leptochloa fusca using RT-PCR and the RACE method. The full-length LfVP1 gene sequence of 2292 nucleotides encodes a protein of 764 amino acids. DNA and protein sequences were used for characterization using bioinformatics tools. Various important potential sites were predicted by the PROSITE webserver. Primary structural analysis showed LfVP1 to be a stable protein, and the grand average of hydropathy (GRAVY) indicated that the LfVP1 protein has good hydrosolubility. Secondary structure analysis showed that the LfVP1 protein sequence contains a significant proportion of alpha helix and random coil. Protein membrane topology suggested the presence of 14 transmembrane domains and of a catalytic domain in TM3. The three-dimensional structure predicted from the LfVP1 protein sequence also indicated the presence of 14 transmembrane domains, and a hydrophobicity surface model showed amino acid hydrophobicity. A Ramachandran plot showed that 98% of amino acid residues were predicted in the favored region.
DROMPA: easy-to-handle peak calling and visualization software for the computational analysis and validation of ChIP-seq data.

PubMed

Nakato, Ryuichiro; Itoh, Tahehiko; Shirahige, Katsuhiko

2013-07-01

Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) can identify genomic regions that bind proteins involved in various chromosomal functions. Although the development of next-generation sequencers offers the technology needed to identify these protein-binding sites, the analysis can be computationally challenging because sequencing data sometimes consist of >100 million reads/sample. Herein, we describe a cost-effective and time-efficient protocol that is generally applicable to ChIP-seq analysis; this protocol uses a novel peak-calling program termed DROMPA to identify peaks and an additional program, parse2wig, to preprocess read-map files. This two-step procedure drastically reduces computational time and memory requirements compared with other programs. DROMPA enables the identification of protein localization sites in repetitive sequences and efficiently identifies both broad and sharp protein localization peaks. Specifically, DROMPA outputs a protein-binding profile map in pdf or png format, which can be easily manipulated by users who have a limited background in bioinformatics. © 2013 The Authors Genes to Cells © 2013 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.
Site directed recombination

DOEpatents

Jurka, Jerzy W.

1997-01-01

Enhanced homologous recombination is obtained by employing a consensus sequence which has been found to be associated with integration of repeat sequences, such as Alu and ID. The consensus sequence or sequence having a single transition mutation determines one site of a double break which allows for high efficiency of integration at the site. By introducing single or double stranded DNA having the consensus sequence flanking region joined to a sequence of interest, one can reproducibly direct integration of the sequence of interest at one or a limited number of sites. In this way, specific sites can be identified and homologous recombination achieved at the site by employing a second flanking sequence associated with a sequence proximal to the 3'-nick.
ChIP-seq analysis of the σ E regulon of Salmonella enterica serovar typhimurium reveals new genes implicated in heat shock and oxidative stress response

DOE PAGES

Li, Jie; Overall, Christopher C.; Johnson, Rudd C.; ...

2015-09-21

The alternative sigma factor σ E functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly. The transcriptional regulatory network governed by σ E in Salmonella and E. coli has been examined using microarrays; however, a genome-wide analysis of σ E–binding sites in Salmonella has not yet been reported. We infected macrophages with Salmonella Typhimurium over a select time course. Using chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq), 31 σ E–binding sites were identified. Seventeen sites were new, which included outer membrane proteins, a quorum-sensing protein, a cell division factor, and a signal transduction modulator. The consensus sequence identified for σ E in vivo binding was similar to the one previously reported, except for a conserved G and A between the -35 and -10 regions. One third of the σ E–binding sites did not contain the consensus sequence, suggesting there may be alternative mechanisms by which σ E modulates transcription. By dissecting direct and indirect modes of σ E-mediated regulation, we found that σ E activates gene expression through recognition of both canonical and reversed consensus sequence. Lastly, new σ E regulated genes (greA, luxS, ompA and ompX) are shown to be involved in heat shock and oxidative stress responses.
ChIP-seq analysis of the σ E regulon of Salmonella enterica serovar typhimurium reveals new genes implicated in heat shock and oxidative stress response

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Jie; Overall, Christopher C.; Johnson, Rudd C.

The alternative sigma factor σ E functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly. The transcriptional regulatory network governed by σ E in Salmonella and E. coli has been examined using microarrays; however, a genome-wide analysis of σ E–binding sites in Salmonella has not yet been reported. We infected macrophages with Salmonella Typhimurium over a select time course. Using chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq), 31 σ E–binding sites were identified. Seventeen sites were new, which included outer membrane proteins, a quorum-sensing protein, a cell division factor, and a signal transduction modulator. The consensus sequence identified for σ E in vivo binding was similar to the one previously reported, except for a conserved G and A between the -35 and -10 regions. One third of the σ E–binding sites did not contain the consensus sequence, suggesting there may be alternative mechanisms by which σ E modulates transcription. By dissecting direct and indirect modes of σ E-mediated regulation, we found that σ E activates gene expression through recognition of both canonical and reversed consensus sequence. Lastly, new σ E regulated genes (greA, luxS, ompA and ompX) are shown to be involved in heat shock and oxidative stress responses.
Kinetic analysis of bypass of abasic site by the catalytic core of yeast DNA polymerase eta.

PubMed

Yang, Juntang; Wang, Rong; Liu, Binyan; Xue, Qizhen; Zhong, Mengyu; Zeng, Hao; Zhang, Huidong

2015-09-01

Abasic sites (apurinic/apyrimidinic (AP) sites), produced ∼50,000 times/cell/day, are strongly blocking and miscoding lesions. To better understand the miscoding mechanisms of abasic sites for yeast DNA polymerase η, pre-steady-state nucleotide incorporation and LC-MS/MS sequence analysis of extension products were studied using pol η(core) (catalytic core, residues 1-513), which completely eliminates the potential effects of the C-terminal C2H2 motif of pol η on dNTP incorporation. The extension beyond the abasic site was very inefficient. Compared with incorporation of dCTP opposite G, the incorporation efficiencies opposite the abasic site were greatly reduced, in the order dGTP > dATP > dCTP and dTTP. Pol η(core) showed no fast burst phase for any incorporation opposite G or the abasic site, suggesting that the catalytic step is not faster than the dissociation of polymerase from DNA. LC-MS/MS sequence analysis of extension products showed that 53% of products were dGTP misincorporation, 33% were dATP and 14% were -1 frameshift, indicating that pol η(core) bypasses the abasic site by a combination of the G-rule, the A-rule and -1 frameshift deletions. Compared with full-length pol η, pol η(core) relatively reduced the efficiency of incorporation of dCTP opposite G, increased the efficiencies of dNTP incorporation opposite the abasic site and the exclusive incorporation of dGTP opposite the abasic site, but inhibited the extension beyond the abasic site, and increased the priority in extension of A:abasic site relative to G:abasic site. This study provides further understanding of the mutation mechanism of abasic sites for yeast DNA polymerase η. Copyright © 2015 Elsevier B.V. All rights reserved.

Nucleotide sequence analysis of the L gene of Newcastle disease virus: homologies with Sendai and vesicular stomatitis viruses.

PubMed Central

Yusoff, K; Millar, N S; Chambers, P; Emmerson, P T

1987-01-01

The nucleotide sequence of the L gene of the Beaudette C strain of Newcastle disease virus (NDV) has been determined. The L gene is 6704 nucleotides long and encodes a protein of 2204 amino acids with a calculated molecular weight of 248822. Mung bean nuclease mapping of the 5' terminus of the L gene mRNA indicates that the transcription of the L gene is initiated 11 nucleotides upstream of the translational start site. Comparison with the amino acid sequences of the L genes of Sendai virus and vesicular stomatitis virus (VSV) suggests that there are several regions of homology between the sequences. These data provide further evidence for an evolutionary relationship between the Paramyxoviridae and the Rhabdoviridae. A non-coding sequence of 46 nucleotides downstream of the presumed polyadenylation site of the L gene may be part of a negative strand leader RNA. PMID:3035486
Streptococcal phosphoenolpyruvate-sugar phosphotransferase system: amino acid sequence and site of ATP-dependent phosphorylation of HPr

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deutscher, J.; Pevec, B.; Beyreuther, K.

1986-10-21

The amino acid sequence of histidine-containing protein (HPr) from Streptococcus faecalis has been determined by direct Edman degradation of intact HPr and by amino acid sequence analysis of tryptic peptides, V8 proteolytic peptides, thermolytic peptides, and cyanogen bromide cleavage products. HPr from S. faecalis was found to contain 89 amino acid residues, corresponding to a molecular weight of 9438. The amino acid sequence of HPr from S. faecalis shows extended homology to the primary structure of HPr proteins from other bacteria. Besides the phosphoenolpyruvate-dependent phosphorylation of a histidyl residue in HPr, catalyzed by enzyme I of the bacterial phosphotransferase system, HPr was also found to be phosphorylated at a seryl residue in an ATP-dependent protein kinase catalyzed reaction. The site of ATP-dependent phosphorylation in HPr of S. faecalis has now been determined. (³²P)P-Ser-HPr was digested with three different proteases, and in each case, a single labeled peptide was isolated. Following digestion with subtilisin, they obtained a peptide with the sequence -(P)Ser-Ile-Met-. Using chymotrypsin, they isolated a peptide with the sequence -Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-Gly-Val-Met-. The longest labeled peptide was obtained with V8 staphylococcal protease. According to amino acid analysis, this peptide contained 36 out of the 89 amino acid residues of HPr. The following sequence of 12 amino acid residues of the V8 peptide was determined: -Tyr-Lys-Gly-Lys-Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-. Thus, the site of ATP-dependent phosphorylation was determined to be Ser-46 within the primary structure of HPr.
Land utilization and water resource inventories over extended test sites

NASA Technical Reports Server (NTRS)

Hoffer, R. M.

1972-01-01

In addition to the work on the corn blight this year, several other analysis tests were completed which resulted in significant findings. These aspects are discussed as follows: (1) field spectral measurements of soil conditions; (2) analysis of extended test site data; this discussion involves three different sets of data analysis sequences; (3) urban land use analysis, for studying water runoff potentials; and (4) thermal data quality study, as an expansion of our water resources studies involving temperature calibration techniques.
Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

PubMed

Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

2017-01-01

Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable.
Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.
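
The CCGG sites named in the title are the recognition sequence of the methylation-sensitive enzymes used in MSAP. As a minimal illustration of what "a CCGG site" means computationally, the sketch below lists every CCGG position in a DNA string. The function name and the toy sequence are hypothetical, and this is not the MSAP-Seq pipeline itself, which works from sequencing reads rather than from a reference scan.

```python
import re


def find_ccgg_sites(sequence: str) -> list[int]:
    """Return 0-based start positions of every CCGG motif in the sequence.

    CCGG is its own reverse complement, so scanning one strand is enough.
    """
    seq = sequence.upper()
    # A zero-width lookahead is used so overlapping matches would also be found.
    return [match.start() for match in re.finditer(r"(?=CCGG)", seq)]


# Toy sequence (hypothetical, for illustration only).
toy_sequence = "ATCCGGTTACCGGCCGGA"
print(find_ccgg_sites(toy_sequence))
```

Each reported position is a candidate site whose methylation state a restriction-based assay could interrogate.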
Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)—A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes

PubMed Central

Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

2017-01-01

Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. 
Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096
Seroprevalence of feline immunodeficiency virus (FIV) and feline leukemia virus (FeLV) in shelter cats on the island of Newfoundland, Canada.

PubMed

Munro, Hannah J; Berghuis, Lesley; Lang, Andrew S; Rogers, Laura; Whitney, Hugh

2014-04-01

Feline immunodeficiency virus (FIV) and feline leukemia virus (FeLV) are retroviruses found within domestic and wild cat populations. These viruses cause severe illnesses that eventually lead to death. Housing cats communally for long periods of time puts shelters at high risk of virus transmission among cats. We tested 548 cats from 5 different sites across the island of Newfoundland for FIV and FeLV. The overall seroprevalence was 2.2% and 6.2% for FIV and FeLV, respectively. Two sites had significantly higher seroprevalence of FeLV infection than the other 3 sites. Analysis of sequences from the FeLV env gene (envelope gene) from 6 positive cats showed that 4 fell within FeLV subtype A, while 2 sequences were most closely related to FeLV subtype B and endogenous feline leukemia virus (enFeLV). Varying seroprevalence and the variation in sequences at different sites demonstrate that some shelters are at greater risk of FeLV infections and that recombination can occur at sites of high seroprevalence.
Analysis of sequencing data for probing RNA secondary structures and protein-RNA binding in studying posttranscriptional regulations.

PubMed

Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y

2016-11-01

High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which in turn benefits from sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alters the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Characterization of the uncertainty of divergence time estimation under relaxed molecular clock models using multiple loci.

PubMed

Zhu, Tianqi; Dos Reis, Mario; Yang, Ziheng

2015-03-01

Genetic sequence data provide information about the distances between species or branch lengths in a phylogeny, but not about the absolute divergence times or the evolutionary rates directly. Bayesian methods for dating species divergences estimate times and rates by assigning priors on them. In particular, the prior on times (node ages on the phylogeny) incorporates information in the fossil record to calibrate the molecular tree. Because times and rates are confounded, our posterior time estimates will not approach point values even if an infinite amount of sequence data are used in the analysis. In a previous study we developed a finite-sites theory to characterize the uncertainty in Bayesian divergence time estimation in analysis of large but finite sequence data sets under a strict molecular clock. As most modern clock dating analyses use more than one locus and are conducted under relaxed clock models, here we extend the theory to the case of relaxed clock analysis of data from multiple loci (site partitions). Uncertainty in posterior time estimates is partitioned into three sources: Sampling errors in the estimates of branch lengths in the tree for each locus due to limited sequence length, variation of substitution rates among lineages and among loci, and uncertainty in fossil calibrations. Using a simple but analogous estimation problem involving the multivariate normal distribution, we predict that as the number of loci ([Formula: see text]) goes to infinity, the variance in posterior time estimates decreases and approaches the infinite-data limit at the rate of 1/[Formula: see text], and the limit is independent of the number of sites in the sequence alignment. We then confirmed the predictions by using computer simulation on phylogenies of two or three species, and by analyzing a real genomic data set for six primate species. 
Our results suggest that with the fossil calibrations fixed, analyzing multiple loci or site partitions is the most effective way for improving the precision of posterior time estimation. However, even if a huge amount of sequence data is analyzed, considerable uncertainty will persist in time estimates. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society of Systematic Biologists.
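
The scaling claim in this abstract can be written compactly. The following is only a sketch: the symbols $L$ (number of loci), $V_\infty$ (the infinite-data limit of the posterior variance of a divergence time $t$) and $c$ (a positive constant) are notation assumed here for illustration and are not taken from the abstract.

```latex
\operatorname{Var}\bigl(t \mid \text{data from } L \text{ loci}\bigr)
  \;\approx\; V_\infty + \frac{c}{L},
\qquad
\lim_{L \to \infty} \operatorname{Var}\bigl(t \mid \text{data from } L \text{ loci}\bigr)
  \;=\; V_\infty \;>\; 0 .
```

Under this reading, doubling the number of loci halves the excess term $c/L$, while $V_\infty$ remains no matter how much sequence data is added, which matches the abstract's statement that considerable uncertainty persists.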
Population-genetic analysis of HvABCG31 promoter sequence in wild barley (Hordeum vulgare ssp. spontaneum)

PubMed Central

2012-01-01

Background: The cuticle is an important adaptive structure whose origin played a crucial role in the transition of plants from aqueous to terrestrial conditions. HvABCG31/Eibi1 is an ABCG transporter gene, involved in cuticle formation that was recently identified in wild barley (Hordeum vulgare ssp. spontaneum). To study the genetic variation of HvABCG31 in different habitats, its 2 kb promoter region was sequenced from 112 wild barley accessions collected from five natural populations from southern and northern Israel. The sites included three mesic and two xeric habitats, and differed in annual rainfall, soil type, and soil water capacity. Results: Phylogenetic analysis of the aligned HvABCG31 promoter sequences clustered the majority of accessions (69 out of 71) from the three northern mesic populations into one cluster, while all 21 accessions from the Dead Sea area, a xeric southern population, and two isolated accessions (one from a xeric population at Mitzpe Ramon and one from the xeric ‘African Slope’ of “Evolution Canyon”) formed the second cluster. The southern arid populations included six haplotypes, but they differed from the consensus sequence at a large number of positions, while the northern mesic populations included 15 haplotypes that were, on average, more similar to the consensus sequence. Most of the haplotypes (20 of 22) were unique to a population. Interestingly, higher genetic variation occurred within populations (54.2%) than among populations (45.8%). Analysis of the promoter region detected a large number of transcription factor binding sites: 121–128 and 121–134 sites in the two southern arid populations, and 123–128, 125–128, and 123–125 sites in the three northern mesic populations. Three types of TFBSs were significantly enriched: those related to GA (gibberellin), Dof (DNA binding with one finger), and light.
Conclusions: Drought stress and adaptive natural selection may have been important determinants in the observed sequence variation of the HvABCG31 promoter. Abiotic stresses may be involved in the regulation of HvABCG31 gene transcription, generating more protective cuticles in plants under stress. PMID:23006777
Genetic Diversity of Hepatitis A Virus in China: VP3-VP1-2A Genes and Evidence of Quasispecies Distribution in the Isolates

PubMed Central

Cao, Jingyuan; Zhou, Wenting; Yi, Yao; Jia, Zhiyuan; Bi, Shengli

2013-01-01

Hepatitis A virus (HAV) is the most common cause of infectious hepatitis throughout the world, spread largely by the fecal-oral route. To characterize the genetic diversity of the virus circulating in China, where HAV is endemic, we selected the outbreak cases with identical sequences in the VP1-2A junction region and compiled a panel of 42 isolates. The VP3-VP1-2A regions of the HAV capsid-coding genes were further sequenced and analyzed. The quasispecies distribution was evaluated by cloning the VP3 and VP1-2A genes in three clinical samples. Phylogenetic analysis demonstrated that the same genotyping results could be obtained whether using the complete VP3, VP1, or partial VP1-2A genes for analysis in this study, although some differences did exist. Most isolates clustered in sub-genotype IA, and fewer in sub-genotype IB. No amino acid mutations were found at the published neutralizing epitope sites; however, several unique amino acid substitutions in the VP3 or VP1 region were identified, with two amino acid variants located close to the immunodominant site. Quasispecies analysis showed the mutation frequencies were in the range of 7.22 × 10⁻⁴ to 2.33 × 10⁻³ substitutions per nucleotide for VP3, VP1, or VP1-2A. When compared with the consensus sequences, mutated nucleotide sites represented the minority of all the analyzed sequence sites. HAV replicated as a complex distribution of closely genetically related variants referred to as quasispecies, and was under negative selection. The results indicate that diverse HAV strains and quasispecies inside the viral populations are present in China, with unique amino acid substitutions detected close to the immunodominant site, and that the possibility of antigenic escape mutants cannot be ruled out and needs to be further analyzed. PMID:24069343
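
A mutation frequency in substitutions per nucleotide can be illustrated with a short calculation. The sketch below assumes one common definition (mismatches to the clone consensus divided by the total number of nucleotides sequenced); the abstract does not state the exact procedure used in the study, and the function names and clone sequences are hypothetical.

```python
from collections import Counter


def consensus(clones: list[str]) -> str:
    """Majority base at each column of equal-length, pre-aligned clones."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*clones))


def mutation_frequency(clones: list[str]) -> float:
    """Substitutions per nucleotide relative to the clone consensus."""
    reference = consensus(clones)
    mismatches = sum(
        base != ref_base
        for clone in clones
        for base, ref_base in zip(clone, reference)
    )
    return mismatches / (len(clones) * len(reference))


# Four toy clones of eight bases each (hypothetical data).
toy_clones = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGTACGT"]
print(consensus(toy_clones))
print(mutation_frequency(toy_clones))
```

With one substitution among 32 sequenced bases the toy frequency is far higher than the values reported above, which come from many more clones and much longer regions.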
Phylogenetic relatedness determined between antibiotic resistance and 16S rRNA genes in actinobacteria.

PubMed

Sagova-Mareckova, Marketa; Ulanova, Dana; Sanderova, Petra; Omelka, Marek; Kamenik, Zdenek; Olsovska, Jana; Kopecky, Jan

2015-04-01

Distribution and evolutionary history of resistance genes in environmental actinobacteria provide information on intensity of antibiosis and evolution of specific secondary metabolic pathways at a given site. To this day, actinobacteria producing biologically active compounds were isolated mostly from soil but only a limited range of soil environments were commonly sampled. Consequently, soil remains an unexplored environment in search for novel producers and related evolutionary questions. Ninety actinobacteria strains isolated at contrasting soil sites were characterized phylogenetically by 16S rRNA gene, for presence of erm and ABC transporter resistance genes and antibiotic production. An analogous analysis was performed in silico with 246 and 31 strains from Integrated Microbial Genomes (JGI_IMG) database selected by the presence of ABC transporter genes and erm genes, respectively. In the isolates, distances of erm gene sequences were significantly correlated to phylogenetic distances based on 16S rRNA genes, while ABC transporter gene distances were not. The phylogenetic distance of isolates was significantly correlated to soil pH and organic matter content of isolation sites. In the analysis of JGI_IMG datasets the correlation between phylogeny of resistance genes and the strain phylogeny based on 16S rRNA genes or five housekeeping genes was observed for both the erm genes and ABC transporter genes in both actinobacteria and streptomycetes. However, in the analysis of sequences from genomes where both resistance genes occurred together the correlation was observed for both ABC transporter and erm genes in actinobacteria but in streptomycetes only in the erm gene. The type of erm resistance gene sequences was influenced by linkage to 16S rRNA gene sequences and site characteristics. The phylogeny of ABC transporter gene was correlated to 16S rRNA genes mainly above the genus level. 
The results support the concept of new specific secondary metabolite scaffolds occurring more likely in taxonomically distant producers but suggest that the antibiotic selection of gene pools is also influenced by site conditions.
The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames

DOE Office of Scientific and Technical Information (OSTI.GOV)

Solovyev, V.V.; Salamov, A.A.; Lawrence, C.B.

1994-12-31

Discriminant analysis is applied to the problem of recognizing 5′-, internal and 3′-exons in human DNA sequences. Specific recognition functions were developed for revealing exons of particular types. The method is based on a splice site prediction algorithm that uses the linear Fisher discriminant to combine information about significant triplet frequencies of various functional parts of splice site regions and preferences of oligonucleotides in protein coding and intron regions. The accuracy of our splice site recognition function is about 97%. A discriminant function for 5′-exon prediction includes hexanucleotide composition of the upstream region, triplet composition around the ATG codon, ORF coding potential, donor splice site potential and composition of the downstream intron region. For internal exon prediction, we combine in a discriminant function the characteristics describing the 5′-intron region, donor splice site, coding region, acceptor splice site and 3′-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77% with a specificity of 79% and a level of pseudoexon ORF prediction of 99.96%. The recognition quality computed at the level of individual nucleotides is 89% for exon sequences and 98% for intron sequences. A discriminant function for 3′-exon prediction includes octanucleotide composition of the upstream intron region, triplet composition around the stop codon, ORF coding potential, acceptor splice site potential and hexanucleotide composition of the downstream region. We unite these three discriminant functions in the exon-predicting program FEX (find exons). FEX exactly predicts 70% of 1016 exons from the test set of 181 complete genes with specificity 73%, and 89% of exons are exactly or partially predicted. On average, 85% of nucleotides were predicted accurately with specificity 91%.
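
The linear Fisher discriminant mentioned above combines several numeric characteristics into one score. The sketch below shows the general technique on two made-up features per candidate (labelled here as a splice-site score and a coding-potential score); the feature values, names and midpoint threshold are assumptions for illustration and are not the FEX features or parameters.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_vector(rows):
    """Column-wise mean of a list of 2-feature rows."""
    return [sum(r[j] for r in rows) / len(rows) for j in range(2)]


def scatter(rows, mu):
    """2x2 scatter matrix of the rows around their mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_discriminant(pos, neg):
    """Return (w, threshold) with w = Sw^-1 (mu_pos - mu_neg)."""
    mu_p, mu_n = mean_vector(pos), mean_vector(neg)
    sp, sn = scatter(pos, mu_p), scatter(neg, mu_n)
    sw = [[sp[i][j] + sn[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_p[0] - mu_n[0], mu_p[1] - mu_n[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    # Threshold at the midpoint of the projected class means (an assumption).
    threshold = 0.5 * (dot(w, mu_p) + dot(w, mu_n))
    return w, threshold


def is_exon_like(x, w, threshold):
    """Classify a feature vector by its projection onto w."""
    return dot(w, x) > threshold


# Hypothetical training data: (splice-site score, coding-potential score).
EXONS = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6), (1.0, 0.7)]
PSEUDOEXONS = [(0.2, 0.3), (0.1, 0.1), (0.3, 0.2), (0.4, 0.4)]

w, threshold = fisher_discriminant(EXONS, PSEUDOEXONS)
print(is_exon_like((0.85, 0.75), w, threshold))
```

A real predictor would use many more features, as the abstract lists, and would choose the threshold to balance sensitivity against the very large number of pseudoexons.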
Phylogeny and population dynamics of respiratory syncytial virus (RSV) A and B.

PubMed

Martinelli, Marianna; Frati, Elena Rosanna; Zappa, Alessandra; Ebranati, Erika; Bianchi, Silvia; Pariani, Elena; Amendola, Antonella; Zehender, Gianguglielmo; Tanzi, Elisabetta

2014-08-30

Respiratory syncytial virus (RSV) is a major cause of lower respiratory tract infections in infants and young children. RSV is characterised by high variability, especially in the G glycoprotein, which may play a significant role in RSV pathogenicity by allowing immune evasion. To reconstruct the origin and phylodynamic history of RSV, we evaluated the genetic diversity and evolutionary dynamics of RSV A and RSV B isolated from children under 3 years old infected in Italy from 2006 to 2012. Phylogenetic analysis revealed that most of the RSV A sequences clustered with the NA1 genotype, and RSV B sequences were included in the Buenos Aires genotype. The mean evolutionary rates for RSV A and RSV B were estimated to be 2.1 × 10⁻³ substitutions (subs)/site/year and 3.03 × 10⁻³ subs/site/year, respectively. The time of most recent common ancestor for the tree root went back to the 1940s (95% highest posterior density-HPD: 1927-1951) for RSV A and the 1950s (95%HPD: 1951-1960) for RSV B. The RSV A Bayesian skyline plot (BSP) showed a decrease in transmission events ending in about 2005, when a sharp growth restored the original viral population size. RSV B BSP showed a similar trend. Site-specific selection analysis identified 10 codons under positive selection in RSV A sequences and only one site in RSV B sequences. Although RSV remains difficult to control due to its antigenic diversity, it is important to monitor changes in its coding sequences, to permit the identification of future epidemic strains and to implement vaccine and therapy strategies. Copyright © 2014 Elsevier B.V. All rights reserved.
MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods

PubMed Central

Tamura, Koichiro; Peterson, Daniel; Peterson, Nicholas; Stecher, Glen; Nei, Masatoshi; Kumar, Sudhir

2011-01-01

Comparative analysis of molecular sequence data is essential for reconstructing the evolutionary histories of species and inferring the nature and extent of selective forces shaping the evolution of genes and species. Here, we announce the release of Molecular Evolutionary Genetics Analysis version 5 (MEGA5), which is a user-friendly software for mining online databases, building sequence alignments and phylogenetic trees, and using methods of evolutionary bioinformatics in basic biology, biomedicine, and evolution. The newest addition in MEGA5 is a collection of maximum likelihood (ML) analyses for inferring evolutionary trees, selecting best-fit substitution models (nucleotide or amino acid), inferring ancestral states and sequences (along with probabilities), and estimating evolutionary rates site-by-site. In computer simulation analyses, ML tree inference algorithms in MEGA5 compared favorably with other software packages in terms of computational efficiency and the accuracy of the estimates of phylogenetic trees, substitution parameters, and rate variation among sites. The MEGA user interface has now been enhanced to be activity driven to make it easier for the use of both beginners and experienced scientists. This version of MEGA is intended for the Windows platform, and it has been configured for effective use on Mac OS X and Linux desktops. It is available free of charge from http://www.megasoftware.net. PMID:21546353
Sieve analysis of breakthrough HIV-1 sequences in HVTN 505 identifies vaccine pressure targeting the CD4 binding site of Env-gp120.

PubMed

deCamp, Allan C; Rolland, Morgane; Edlefsen, Paul T; Sanders-Buell, Eric; Hall, Breana; Magaret, Craig A; Fiore-Gartland, Andrew J; Juraska, Michal; Carpp, Lindsay N; Karuna, Shelly T; Bose, Meera; LePore, Steven; Miller, Shana; O'Sullivan, Annemarie; Poltavee, Kultida; Bai, Hongjun; Dommaraju, Kalpana; Zhao, Hong; Wong, Kim; Chen, Lennie; Ahmed, Hasan; Goodman, Derrick; Tay, Matthew Z; Gottardo, Raphael; Koup, Richard A; Bailer, Robert; Mascola, John R; Graham, Barney S; Roederer, Mario; O'Connell, Robert J; Michael, Nelson L; Robb, Merlin L; Adams, Elizabeth; D'Souza, Patricia; Kublin, James; Corey, Lawrence; Geraghty, Daniel E; Frahm, Nicole; Tomaras, Georgia D; McElrath, M Juliana; Frenkel, Lisa; Styrchak, Sheila; Tovanabutra, Sodsai; Sobieszczyk, Magdalena E; Hammer, Scott M; Kim, Jerome H; Mullins, James I; Gilbert, Peter B

2017-01-01

Although the HVTN 505 DNA/recombinant adenovirus type 5 vector HIV-1 vaccine trial showed no overall efficacy, analysis of breakthrough HIV-1 sequences in participants can help determine whether vaccine-induced immune responses impacted viruses that caused infection. We analyzed 480 HIV-1 genomes sampled from 27 vaccine and 20 placebo recipients and found that intra-host HIV-1 diversity was significantly lower in vaccine recipients (P ≤ 0.04, Q-values ≤ 0.09) in Gag, Pol, Vif and envelope glycoprotein gp120 (Env-gp120). Furthermore, Env-gp120 sequences from vaccine recipients were significantly more distant from the subtype B vaccine insert than sequences from placebo recipients (P = 0.01, Q-value = 0.12). These vaccine effects were associated with signatures mapping to CD4 binding site and CD4-induced monoclonal antibody footprints. These results suggest either (i) no vaccine efficacy to block acquisition of any viral genotype but vaccine-accelerated Env evolution post-acquisition; or (ii) vaccine efficacy against HIV-1s with Env sequences closest to the vaccine insert combined with increased acquisition due to other factors, potentially including the vaccine vector.
Sieve analysis of breakthrough HIV-1 sequences in HVTN 505 identifies vaccine pressure targeting the CD4 binding site of Env-gp120

PubMed Central

deCamp, Allan C.; Rolland, Morgane; Edlefsen, Paul T.; Sanders-Buell, Eric; Hall, Breana; Magaret, Craig A.; Fiore-Gartland, Andrew J.; Juraska, Michal; Carpp, Lindsay N.; Karuna, Shelly T.; Bose, Meera; LePore, Steven; Miller, Shana; O'Sullivan, Annemarie; Poltavee, Kultida; Bai, Hongjun; Dommaraju, Kalpana; Zhao, Hong; Wong, Kim; Chen, Lennie; Ahmed, Hasan; Goodman, Derrick; Tay, Matthew Z.; Gottardo, Raphael; Koup, Richard A.; Bailer, Robert; Mascola, John R.; Graham, Barney S.; Roederer, Mario; O’Connell, Robert J.; Michael, Nelson L.; Robb, Merlin L.; Adams, Elizabeth; D’Souza, Patricia; Kublin, James; Corey, Lawrence; Geraghty, Daniel E.; Frahm, Nicole; Tomaras, Georgia D.; McElrath, M. Juliana; Frenkel, Lisa; Styrchak, Sheila; Tovanabutra, Sodsai; Sobieszczyk, Magdalena E.; Hammer, Scott M.; Kim, Jerome H.; Mullins, James I.; Gilbert, Peter B.

2017-01-01

Although the HVTN 505 DNA/recombinant adenovirus type 5 vector HIV-1 vaccine trial showed no overall efficacy, analysis of breakthrough HIV-1 sequences in participants can help determine whether vaccine-induced immune responses impacted viruses that caused infection. We analyzed 480 HIV-1 genomes sampled from 27 vaccine and 20 placebo recipients and found that intra-host HIV-1 diversity was significantly lower in vaccine recipients (P ≤ 0.04, Q-values ≤ 0.09) in Gag, Pol, Vif and envelope glycoprotein gp120 (Env-gp120). Furthermore, Env-gp120 sequences from vaccine recipients were significantly more distant from the subtype B vaccine insert than sequences from placebo recipients (P = 0.01, Q-value = 0.12). These vaccine effects were associated with signatures mapping to CD4 binding site and CD4-induced monoclonal antibody footprints. These results suggest either (i) no vaccine efficacy to block acquisition of any viral genotype but vaccine-accelerated Env evolution post-acquisition; or (ii) vaccine efficacy against HIV-1s with Env sequences closest to the vaccine insert combined with increased acquisition due to other factors, potentially including the vaccine vector. PMID:29149197
Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity.

PubMed

Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

2015-09-01

The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. 
© 2015 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
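As a rough illustration of the network construction described in the abstract above, the sketch below keeps only edges whose similarity score meets a threshold and reports the connected components as clusters. The protein names, scores, and threshold are hypothetical. The study itself derived its edge metrics from BLAST, TM-Align, and DASP, none of which is reproduced here.

```python
from collections import defaultdict

def cluster_by_threshold(scores, threshold):
    """Return connected components of the graph whose edges are the
    pairs with similarity >= threshold (higher score = more similar)."""
    nodes = set()
    adjacency = defaultdict(set)
    for (a, b), score in scores.items():
        nodes.update((a, b))
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, clusters = set(), []
    for node in sorted(nodes):
        if node in seen:
            continue
        # Depth-first walk to collect everything reachable from this node.
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

# Hypothetical pairwise similarity scores between five proteins.
scores = {
    ("P1", "P2"): 0.9,
    ("P2", "P3"): 0.8,
    ("P3", "P4"): 0.3,
    ("P4", "P5"): 0.7,
}
print(cluster_by_threshold(scores, 0.5))
```

Re-running the function over a range of thresholds gives a crude view of how the clustering changes with the edge cutoff, which is the kind of comparison the abstract describes for each edge metric.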
Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity

PubMed Central

Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

2015-01-01

The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. PMID:26073648
Molecular cloning, overexpression, purification, and sequence analysis of the giant panda (Ailuropoda melanoleuca) ferritin light polypeptide.

PubMed

Fu, L; Hou, Y L; Ding, X; Du, Y J; Zhu, H Q; Zhang, N; Hou, W R

2016-08-30

The complementary DNA (cDNA) of the giant panda (Ailuropoda melanoleuca) ferritin light polypeptide (FTL) gene was successfully cloned using reverse transcription-polymerase chain reaction technology. We constructed a recombinant expression vector containing FTL cDNA and overexpressed it in Escherichia coli using pET28a plasmids. The expressed protein was then purified by nickel chelate affinity chromatography. The cloned cDNA fragment was 580 bp long and contained an open reading frame of 525 bp. The deduced protein sequence was composed of 175 amino acids and had an estimated molecular weight of 19.90 kDa, with an isoelectric point of 5.53. Topology prediction revealed one N-glycosylation site, two casein kinase II phosphorylation sites, one N-myristoylation site, two protein kinase C phosphorylation sites, and one cell attachment sequence. Alignment indicated that the nucleotide and deduced amino acid sequences are highly conserved across several mammals, including Homo sapiens, Cavia porcellus, Equus caballus, and Felis catus, among others. The FTL gene was readily expressed in E. coli, which gave rise to the accumulation of a polypeptide of the expected size (25.50 kDa, including an N-terminal polyhistidine tag).
Identification and positional distribution analysis of transcription factor binding sites for genes from the wheat fl-cDNA sequences.

PubMed

Chen, Zhen-Yong; Guo, Xiao-Jiang; Chen, Zhong-Xu; Chen, Wei-Ying; Wang, Ji-Rui

2017-06-01

The binding sites of transcription factors (TFs) in upstream DNA regions are called transcription factor binding sites (TFBSs). TFBSs are important elements for regulating gene expression. To date, there have been few studies on the profiles of TFBSs in plants. In total, 4873 sequences with 5' upstream regions from 8530 wheat fl-cDNA sequences were used to predict TFBSs. We found 4572 TFBSs for the MADS TF family, which was twice as many as for bHLH (1951), B3 (1951), HB superfamily (1914), ERF (1820), and AP2/ERF (1725) TFs, and was approximately four times higher than the remaining TFBS types. The percentage of TFBSs and TF members showed a distinct distribution in different tissues. Overall, the distribution of TFBSs in the upstream regions of wheat fl-cDNA sequences differed significantly. Meanwhile, high frequencies of some types of TFBSs were found in specific regions in the upstream sequences. Both TFs and fl-cDNA with TFBSs predicted in the same tissues exhibited specific distribution preferences for regulating gene expression. The tissue-specific analysis of TFs and fl-cDNA with TFBSs provides useful information for functional research, and can be used to identify relationships between tissue-specific TFs and fl-cDNA with TFBSs. Moreover, the positional distribution of TFBSs indicates that some types of wheat TFBS have different positional distribution preferences in the upstream regions of genes.

Precise determination, cross-recognition, and functional analysis of the double-strand origins of the rolling-circle replication plasmids in haloarchaea.

PubMed

Zhou, Ligang; Zhou, Meixian; Sun, Chaomin; Han, Jing; Lu, Qiuhe; Zhou, Jian; Xiang, Hua

2008-08-01

The precise nick site in the double-strand origin (DSO) of pZMX201, a 1,668-bp rolling-circle replication (RCR) plasmid from the haloarchaeon Natrinema sp. CX2021, was determined by electron microscopy and DSO mapping. In this plasmid, DSO nicking occurred between residues C404 and G405 within a heptanucleotide sequence (TCTC/GGC) located in the stem region of an imperfect hairpin structure. This nick site sequence was conserved among the haloarchaeal RCR plasmids, including pNB101, suggesting that the DSO nick site might be the same for all members of this plasmid family. Interestingly, the DSOs of pZMX201 and pNB101 were found to be cross-recognized in RCR initiation and termination in a hybrid plasmid system. Mutation analysis of the DSO from pZMX201 (DSO(Z)) in this hybrid plasmid system revealed that: (i) the nucleotides in the middle of the conserved TCTCGGC sequence play more-important roles in the initiation and termination process; (ii) the left half of the hairpin structure is required for initiation but not for termination; and (iii) a 36-bp sequence containing TCTCGGC and the downstream sequence is essential and sufficient for termination. In conclusion, these haloarchaeal plasmids, with novel features that are different from the characteristics of both single-stranded DNA phages and bacterial RCR plasmids, might serve as a good model for studying the evolution of RCR replicons.
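To make the nick-site coordinates above concrete, here is a minimal sketch that scans a sequence for the conserved heptanucleotide TCTCGGC and reports the two residues flanking the nick (TCTC/GGC). The test sequence is a hypothetical placeholder, not the pZMX201 sequence, and only the given strand is searched.

```python
MOTIF = "TCTCGGC"   # conserved heptanucleotide reported in the abstract
NICK_OFFSET = 4     # the nick falls after the first four bases: TCTC/GGC

def find_nick_sites(seq):
    """Return 1-based (left, right) residue positions flanking each nick."""
    seq = seq.upper()
    sites = []
    start = seq.find(MOTIF)
    while start != -1:
        # start is 0-based, so the last C of TCTC is at 1-based start + 4.
        left = start + NICK_OFFSET
        sites.append((left, left + 1))
        start = seq.find(MOTIF, start + 1)
    return sites

# Hypothetical sequence with the motif placed so that it starts at residue 401.
toy_plasmid = "A" * 400 + MOTIF + "A" * 10
print(find_nick_sites(toy_plasmid))
```

With the motif starting at residue 401, the reported pair corresponds to a nick between residues 404 and 405, the same relative position the abstract gives for C404 and G405.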
Application of Inter-Simple Sequence Repeat Markers in the Analysis of Populations of the Chagas Disease Vector Triatoma infestans (Hemiptera, Reduviidae)

PubMed Central

Pérez de Rosas, Alicia R.; Restelli, María F.; Fernández, Cintia J.; Blariza, María J.; García, Beatriz A.

2017-01-01

Here we apply inter-simple sequence repeat (ISSR) markers to explore the fine-scale genetic structure and dispersal in populations of Triatoma infestans. Five selected primers from 30 primers were used to amplify ISSRs by polymerase chain reaction. A total of 90 polymorphic bands were detected across 134 individuals captured from 11 peridomestic sites from the locality of San Martín (Capayán Department, Catamarca Province, Argentina). Significant levels of genetic differentiation suggest limited gene flow among sampling sites. Spatial autocorrelation analysis confirms that dispersal occurs on the scale of ∼469 m, suggesting that insecticide spraying should be extended at least within a radius of ∼500 m around the infested area. Moreover, Bayesian clustering algorithms indicated genetic exchange among different sites analyzed, supporting the hypothesis of an important role of peridomestic structures in the process of reinfestation. PMID:28115670
[Comparison of genotype characteristics between the circulating mumps virus strain in Beijing area and the vaccine strain].

PubMed

Chen, Meng; Zhang, Tie-gang; Chen, Li-juan; Wu, Jiang; Yang, Jie; Zhang, Wei

2009-11-01

The aim was to compare the genetic characteristics of the mumps virus strains circulating in Beijing with those of the vaccine strain and to make a preliminary analysis of the reasons for vaccine ineffectiveness. The following methods were used: isolation and identification of mumps virus circulating in Beijing, analysis of immunization history, SH gene sequence analysis and comparison of genotype homology with reference strains, and analysis of variation at key amino acid sites of HN. Among the 38 mumps cases from which virus had been isolated, seven were IgM negative. In 2007 and 2008, the positive rates of virus isolation, RT-PCR and IgM decreased significantly, while the number of cases with an immunization history increased. Cases without a history of vaccination had higher positive rates for both virus isolation and IgM. The thirty-eight strains belonged to genotype F, whereas the vaccine strain was genotype A. The circulating viruses showed 5.6% nucleotide sequence divergence in the SH gene and 16.0%-18.1% divergence from the vaccine strain. Conserved hydrophobic amino acids in the SH protein of some Beijing strains had changed; for example, six strains carried an L→F substitution at position 8. The circulating viruses showed 2.3% divergence in HN protein amino acid sequences and 4.2%-5.3% divergence from the vaccine strain. The amino acid sites that determine cross-neutralization ability differed between the Beijing strains and the vaccine strains. At sites 354 and 356, all the Beijing strains differed from the vaccine strains. The N-glycosylation sites on HN of the Beijing strains also differed from those on the vaccine strains: positions 464-466 were NCS in the Beijing strains but NCR in the vaccine strains. Another 18 amino acid sites of unknown function in all Beijing strains differed from those in the vaccine strains. In recent years, genotype F has become the main genotype of circulating strains in Beijing without genotype variation, but considerable differences were found among the strains.
The SH and HN proteins of the Beijing strains differed substantially from those of the vaccine strain, which might explain the ineffectiveness of the vaccine.
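The NCS versus NCR difference at positions 464-466 can be checked against the commonly used N-glycosylation sequon rule, N-X-S/T with X not proline. The abstract does not state which rule the authors applied, so the rule and the helper name below are assumptions made for illustration only.

```python
def is_n_glyc_sequon(tripeptide):
    """True if three residues match the simple N-X-S/T sequon (X != P)."""
    t = tripeptide.upper()
    return (
        len(t) == 3
        and t[0] == "N"          # asparagine that would carry the glycan
        and t[1] != "P"          # proline in the middle position blocks the site
        and t[2] in ("S", "T")   # serine or threonine in the third position
    )

# Residues 464-466 as reported in the abstract.
for label, residues in (("Beijing strains", "NCS"), ("vaccine strains", "NCR")):
    print(label, residues, is_n_glyc_sequon(residues))
```

Under this simplified rule, NCS qualifies as a potential N-glycosylation site while NCR does not, which is consistent with the abstract's statement that the N-glycosylation sites differ between the two groups.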
Short Tree, Long Tree, Right Tree, Wrong Tree: New Acquisition Bias Corrections for Inferring SNP Phylogenies

PubMed Central

Leaché, Adam D.; Banbury, Barbara L.; Felsenstein, Joseph; Nieto-Montes de Oca, Adrián; Stamatakis, Alexandros

2015-01-01

Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practises for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis, including the analysis of full sequences (i.e., SNPs together with invariant sites), or the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach. 
Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the presence of missing data. Phylogenetic analysis of RAD loci requires careful attention to model assumptions, especially if downstream analyses depend on branch lengths. PMID:26227865
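The two acquisition bias corrections described above can be sketched as adjustments to a log-likelihood computed from variable sites only. This is a simplified reading of the abstract, not the authors' implementation. It assumes a single probability `p_invariant` that a site is invariant under the current tree and model, and the function and parameter names are hypothetical.

```python
import math

def conditional_lnl(site_lnls, p_invariant):
    """Conditional likelihood: each variable site's likelihood is divided
    by the probability that a site is variable, 1 - p_invariant."""
    if not 0.0 <= p_invariant < 1.0:
        raise ValueError("p_invariant must be in [0, 1)")
    return sum(site_lnls) - len(site_lnls) * math.log(1.0 - p_invariant)

def reconstituted_lnl(site_lnls, p_invariant, n_invariant):
    """Reconstituted approach: add back the contribution of a user-supplied
    number of unsampled invariant sites."""
    if not 0.0 < p_invariant <= 1.0:
        raise ValueError("p_invariant must be in (0, 1]")
    return sum(site_lnls) + n_invariant * math.log(p_invariant)

# Hypothetical per-site log-likelihoods for three SNPs.
snp_lnls = [math.log(0.01)] * 3
print(conditional_lnl(snp_lnls, p_invariant=0.5))
print(reconstituted_lnl(snp_lnls, p_invariant=0.5, n_invariant=2))
```

In a real analysis the invariant-site probability depends on the tree and branch lengths and is recomputed during optimization. The sketch only shows where each correction enters the log-likelihood and why the second one needs the count of unsampled invariant sites.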
Molecular evolution of miraculin-like proteins in soybean Kunitz super-family.

PubMed

Selvakumar, Purushotham; Gahloth, Deepankar; Tomar, Prabhat Pratap Singh; Sharma, Nidhi; Sharma, Ashwani Kumar

2011-12-01

Miraculin-like proteins (MLPs) belong to the soybean Kunitz super-family and have been characterized from many plant families, such as Rutaceae, Solanaceae, and Rubiaceae. Many of them possess trypsin inhibitory activity and are involved in plant defense. MLPs exhibit significant sequence identity (~30-95%) to native miraculin protein, which also belongs to the Kunitz super-family, compared with their identity to a typical Kunitz family member (~30%). The sequence and structure-function comparison of MLPs with that of a classical Kunitz inhibitor has demonstrated that MLPs have evolved to form a distinct group within the Kunitz super-family. Sequence analysis of new genes along with available MLP sequences in the literature revealed three major groups for these proteins. A significant feature of Rutaceae MLP type 2 sequences is the presence of a phosphorylation motif. Subtle changes are seen in putative reactive loop residues among different MLPs, suggesting altered specificities to specific proteases. In phylogenetic analysis, Rutaceae MLP type 1 and type 2 proteins clustered together on separate branches, whereas native miraculin along with other MLPs formed distinct clusters. Site-specific positive Darwinian selection was observed at many sites in both groups of Rutaceae MLP sequences, with most of the residues undergoing positive selection located in loop regions. The results demonstrate the sequence and thereby the structure-function divergence of MLPs as a distinct group within the soybean Kunitz super-family due to biotic and abiotic stresses of the local environment.
Cloning, expression and phylogenetic analysis of Hemolin, from the Chinese oak silkmoth, Antheraea pernyi.

PubMed

Li, Wenli; Terenius, Olle; Hirai, Makoto; Nilsson, Anders S; Faye, Ingrid

2005-01-01

The Chinese oak silk moth Antheraea pernyi is an important silk producer. To understand microbial resistance of this moth, we cloned Hemolin, encoding a multifunctional immune protein belonging to the immunoglobulin superfamily, and examined the expression in gonads and fat body. The ApHemolin amino acid sequence was compared to other Hemolin sequences in order to predict functional sites. Several sites were conserved; among them a phosphate binding site, which according to 3D structure modelling does not appear in neuroglian, the phylogenetically closest related protein. In addition, two conserved KDG sequences in the C-C' loop of immunoglobulin domains 1 and 3, give rise to gamma-turns, which is a common motif in the C'-C'' loop of the hypervariable region L2 in vertebrate immunoglobulins. The comparisons also show variable regions of specific interest for future studies of hemolin and its interaction with microbial entities.
Genomic DNA sequence and cytosine methylation changes of adult rice leaves after seeds space flight

NASA Astrophysics Data System (ADS)

Shi, Jinming

In this study, cytosine methylation at CCGG sites and genomic DNA sequence changes in adult rice leaves after seed space flight were detected by the methylation-sensitive amplification polymorphism (MSAP) and amplified fragment length polymorphism (AFLP) techniques, respectively. Rice seeds were planted in the trial field after a 4-day space flight on China's Shenzhou-6 spacecraft. Adult leaves of space-treated rice, comprising 8 randomly chosen plants and 2 plants with phenotypic mutations, were used for AFLP and MSAP analysis. Polymorphisms in both DNA sequence and cytosine methylation were detected. For MSAP analysis, the average polymorphic frequencies of the on-ground controls, space-treated plants and mutants were 1.3%, 3.1% and 11%, respectively. For AFLP analysis, the average polymorphic frequencies were 1.4%, 2.9% and 8%, respectively. In total, 27 and 22 polymorphic fragments were cloned and sequenced from the MSAP and AFLP analyses, respectively. Nine of the 27 fragments from MSAP analysis showed homology to coding sequences. Of the 22 polymorphic fragments from AFLP analysis, none showed homology to mRNA sequences and eight showed homology to repeat regions or retrotransposon sequences. These results suggest that although both genomic DNA sequence and cytosine methylation status can be affected by space flight, the genomic regions homologous to the fragments from the genomic DNA and cytosine methylation analyses differed.
BIOCHEMICAL AND PHYLOGENETIC CHARACTERIZATION OF TWO NOVEL DEEP-SEA THERMOCOCCUS ISOLATES WITH POTENTIALLY BIOTECHNOLOGICAL APPLICATIONS

EPA Science Inventory

The partial 16S rDNA gene sequences of two thermophilic archaeal strains, TY and TYS, previously isolated from the Guaymas Basin hydrothermal vent site were determined. Lipid analyses and a comparative analysis performed with 16S rDNA sequences of similar thermophilic species sho...
Reference genotype and exome data from an Australian Aboriginal population for health-based research

PubMed Central

Tang, Dave; Anderson, Denise; Francis, Richard W.; Syn, Genevieve; Jamieson, Sarra E.; Lassmann, Timo; Blackwell, Jenefer M.

2016-01-01

Genetic analyses, including genome-wide association studies and whole exome sequencing (WES), provide powerful tools for the analysis of complex and rare genetic diseases. To date there are no reference data for Aboriginal Australians to underpin the translation of health-based genomic research. Here we provide a catalogue of variants called after sequencing the exomes of 72 Aboriginal individuals to a depth of 20X coverage in ∼80% of the sequenced nucleotides. We determined 320,976 single nucleotide variants (SNVs) and 47,313 insertions/deletions using the Genome Analysis Toolkit. We had previously genotyped a subset of the Aboriginal individuals (70/72) using the Illumina Omni2.5 BeadChip platform and found ~99% concordance at overlapping sites, which suggests high quality genotyping. Finally, we compared our SNVs to six publicly available variant databases, such as dbSNP and the Exome Sequencing Project, and 70,115 of our SNVs did not overlap any of the single nucleotide polymorphic sites in all the databases. Our data set provides a useful reference point for genomic studies on Aboriginal Australians. PMID:27070114
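The concordance figure quoted above compares genotypes at sites called by both exome sequencing and the array. A minimal sketch of that kind of calculation follows. The site keys, genotype encoding, and example calls are hypothetical and do not reflect the study's actual pipeline or file formats.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of sites present in both call sets whose genotypes agree.

    Each call set maps (chromosome, position) to a pair of alleles.
    Allele order is ignored, so ("A", "G") matches ("G", "A").
    Returns None when the two sets share no sites.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return None
    matches = sum(
        1
        for site in shared
        if tuple(sorted(calls_a[site])) == tuple(sorted(calls_b[site]))
    )
    return matches / len(shared)

# Hypothetical calls for one individual from the two platforms.
exome_calls = {("chr1", 100): ("A", "G"), ("chr1", 200): ("C", "C"), ("chr2", 50): ("T", "T")}
array_calls = {("chr1", 100): ("G", "A"), ("chr1", 200): ("C", "T"), ("chr3", 5): ("A", "A")}
print(genotype_concordance(exome_calls, array_calls))
```

Only overlapping sites enter the denominator, which matches the abstract's phrasing of concordance "at overlapping sites". Sites seen on one platform alone are ignored rather than counted as mismatches.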
Reference genotype and exome data from an Australian Aboriginal population for health-based research.

PubMed

Tang, Dave; Anderson, Denise; Francis, Richard W; Syn, Genevieve; Jamieson, Sarra E; Lassmann, Timo; Blackwell, Jenefer M

2016-04-12

Genetic analyses, including genome-wide association studies and whole exome sequencing (WES), provide powerful tools for the analysis of complex and rare genetic diseases. To date there are no reference data for Aboriginal Australians to underpin the translation of health-based genomic research. Here we provide a catalogue of variants called after sequencing the exomes of 72 Aboriginal individuals to a depth of 20X coverage in ∼80% of the sequenced nucleotides. We determined 320,976 single nucleotide variants (SNVs) and 47,313 insertions/deletions using the Genome Analysis Toolkit. We had previously genotyped a subset of the Aboriginal individuals (70/72) using the Illumina Omni2.5 BeadChip platform and found ~99% concordance at overlapping sites, which suggests high quality genotyping. Finally, we compared our SNVs to six publicly available variant databases, such as dbSNP and the Exome Sequencing Project, and 70,115 of our SNVs did not overlap any of the single nucleotide polymorphic sites in all the databases. Our data set provides a useful reference point for genomic studies on Aboriginal Australians.
LEDGF/p75 interacts with mRNA splicing factors and targets HIV-1 integration to highly spliced genes

PubMed Central

Singh, Parmit Kumar; Plumb, Matthew R.; Ferris, Andrea L.; Iben, James R.; Wu, Xiaolin; Fadel, Hind J.; Luke, Brian T.; Esnault, Caroline; Poeschla, Eric M.; Hughes, Stephen H.; Kvaratskhelia, Mamuka; Levin, Henry L.

2015-01-01

The host chromatin-binding factor LEDGF/p75 interacts with HIV-1 integrase and directs integration to active transcription units. To understand how LEDGF/p75 recognizes transcription units, we sequenced 1 million HIV-1 integration sites isolated from cultured HEK293T cells. Analysis of integration sites showed that cancer genes were preferentially targeted, raising concerns about using lentivirus vectors for gene therapy. Additional analysis led to the discovery that introns and alternative splicing contributed significantly to integration site selection. These correlations were independent of transcription levels, size of transcription units, and length of the introns. Multivariate analysis with five parameters previously found to predict integration sites showed that intron density is the strongest predictor of integration density in transcription units. Analysis of previously published HIV-1 integration site data showed that integration density in transcription units in mouse embryonic fibroblasts also correlated strongly with intron number, and this correlation was absent in cells lacking LEDGF. Affinity purification showed that LEDGF/p75 is associated with a number of splicing factors, and RNA sequencing (RNA-seq) analysis of HEK293T cells lacking LEDGF/p75 or the LEDGF/p75 integrase-binding domain (IBD) showed that LEDGF/p75 contributes to splicing patterns in half of the transcription units that have alternative isoforms. Thus, LEDGF/p75 interacts with splicing factors, contributes to exon choice, and directs HIV-1 integration to transcription units that are highly spliced. PMID:26545813
Assessment of In-Situ Reductive Dechlorination Using Compound-Specific Stable Isotopes, Functional-Gene Pcr, and Geochemical Data

PubMed Central

Carreón-Diazconti, Concepción; Santamaría, Johanna; Berkompas, Justin; Field, James A.; Brusseau, Mark L.

2010-01-01

Isotopic analysis and molecular-based bioassay methods were used in conjunction with geochemical data to assess intrinsic reductive dechlorination processes for a chlorinated-solvent contaminated site in Tucson, Arizona. Groundwater samples were obtained from monitoring wells within a contaminant plume comprising tetrachloroethene and its metabolites trichloroethene, cis-1,2-dichloroethene, vinyl chloride, and ethene, as well as compounds associated with free-phase diesel present at the site. Compound specific isotope (CSI) analysis was performed to characterize biotransformation processes influencing the transport and fate of the chlorinated contaminants. PCR analysis was used to assess the presence of indigenous reductive dechlorinators. The target regions employed were the 16S rRNA gene sequences of Dehalococcoides sp. and Desulfuromonas sp., and DNA sequences of genes pceA, tceA, bvcA, and vcrA, which encode reductive dehalogenases. The results of the analyses indicate that relevant microbial populations are present and that reductive dechlorination is presently occurring at the site. The results further show that potential degrader populations as well as biotransformation activity are non-uniformly distributed within the site. The results of laboratory microcosm studies conducted using groundwater collected from the field site confirmed the reductive dechlorination of tetrachloroethene to dichloroethene. This study illustrates the use of an integrated, multiple-method approach for assessing natural attenuation at a complex chlorinated-solvent contaminated site. PMID:19603638
Phylogenomics of Phrynosomatid Lizards: Conflicting Signals from Sequence Capture versus Restriction Site Associated DNA Sequencing

PubMed Central

Leaché, Adam D.; Chavez, Andreas S.; Jones, Leonard N.; Grummer, Jared A.; Gottscho, Andrew D.; Linkem, Charles W.

2015-01-01

Sequence capture and restriction site associated DNA sequencing (RADseq) are popular methods for obtaining large numbers of loci for phylogenetic analysis. These methods are typically used to collect data at different evolutionary timescales; sequence capture is primarily used for obtaining conserved loci, whereas RADseq is designed for discovering single nucleotide polymorphisms (SNPs) suitable for population genetic or phylogeographic analyses. Phylogenetic questions that span both “recent” and “deep” timescales could benefit from either type of data, but studies that directly compare the two approaches are lacking. We compared phylogenies estimated from sequence capture and double digest RADseq (ddRADseq) data for North American phrynosomatid lizards, a species-rich and diverse group containing nine genera that began diversifying approximately 55 Ma. Sequence capture resulted in 584 loci that provided a consistent and strong phylogeny using concatenation and species tree inference. However, the phylogeny estimated from the ddRADseq data was sensitive to the bioinformatics steps used for determining homology, detecting paralogs, and filtering missing data. The topological conflicts among the SNP trees were not restricted to any particular timescale, but instead were associated with short internal branches. Species tree analysis of the largest SNP assembly, which also included the most missing data, supported a topology that matched the sequence capture tree. This preferred phylogeny provides strong support for the paraphyly of the earless lizard genera Holbrookia and Cophosaurus, suggesting that the earless morphology either evolved twice or evolved once and was subsequently lost in Callisaurus. PMID:25663487
Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants.

PubMed

Eaton, Deren A R; Spriggs, Elizabeth L; Park, Brian; Donoghue, Michael J

2017-05-01

Restriction-site associated DNA (RAD) sequencing and related methods rely on the conservation of enzyme recognition sites to isolate homologous DNA fragments for sequencing, with the consequence that mutations disrupting these sites lead to missing information. There is thus a clear expectation for how missing data should be distributed, with fewer loci recovered between more distantly related samples. This observation has led to a related expectation: that RAD-seq data are insufficiently informative for resolving deeper scale phylogenetic relationships. Here we investigate the relationship between missing information among samples at the tips of a tree and information at edges within it. We re-analyze and review the distribution of missing data across ten RAD-seq data sets and carry out simulations to determine expected patterns of missing information. We also present new empirical results for the angiosperm clade Viburnum (Adoxaceae, with a crown age >50 Ma) for which we examine phylogenetic information at different depths in the tree and with varied sequencing effort. The total number of loci, the proportion that are shared, and phylogenetic informativeness varied dramatically across the examined RAD-seq data sets. Insufficient or uneven sequencing coverage accounted for similar proportions of missing data as dropout from mutation-disruption. Simulations reveal that mutation-disruption, which results in phylogenetically distributed missing data, can be distinguished from the more stochastic patterns of missing data caused by low sequencing coverage. In Viburnum, doubling sequencing coverage nearly doubled the number of parsimony informative sites, and increased by >10X the number of loci with data shared across >40 taxa. Our analysis leads to a set of practical recommendations for maximizing phylogenetic information in RAD-seq studies. 
[hierarchical redundancy; phylogenetic informativeness; quartet informativeness; Restriction-site associated DNA (RAD) sequencing; sequencing coverage; Viburnum.] © The authors 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins.

PubMed

Harper, Angela F; Leuthaeuser, Janelle B; Babbitt, Patricia C; Morris, John H; Ferrin, Thomas E; Poole, Leslie B; Fetrow, Jacquelyn S

2017-02-01

Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially; MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. 
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.
An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins

PubMed Central

Babbitt, Patricia C.; Ferrin, Thomas E.

2017-01-01

Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially—MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method’s novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. 
The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences. PMID:28187133
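The self-identification criterion described in this abstract can be illustrated with a toy sketch. This is not the MISST implementation: `TOY_DATABASE`, `toy_search`, and the family labels are hypothetical stand-ins for a profile-based GenBank search. The only idea carried over from the abstract is that a group is accepted when a search seeded by the group returns the group's own members and nothing else.

```python
from typing import Callable, Iterable, Set

# Hypothetical toy "database": sequence id -> functional family label.
TOY_DATABASE = {"a1": "Prx1", "a2": "Prx1", "b1": "Prx6", "b2": "Prx6"}


def toy_search(group: Set[str]) -> Set[str]:
    """Stand-in for a database search: return every id that shares a
    family label with any member of the query group."""
    labels = {TOY_DATABASE[seq_id] for seq_id in group}
    return {seq_id for seq_id, label in TOY_DATABASE.items() if label in labels}


def self_identifies(group: Set[str],
                    search: Callable[[Set[str]], Iterable[str]]) -> bool:
    """True when the search returns exactly the group's members."""
    return set(search(group)) == set(group)


print(self_identifies({"a1", "a2"}, toy_search))  # homogeneous, complete group
print(self_identifies({"a1", "b1"}, toy_search))  # group mixing two families
```

In this toy setting a mixed group fails the test because the search also pulls in the remaining members of both families, which is the signal a divisive step would act on.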
A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

PubMed

Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

2015-01-01

The upstream region of coding genes is important for several reasons, for instance for locating transcription factor binding sites and initiation start sites in genomic DNA. Motivated by a recent study in which a multivariate approach was successfully applied to coding sequence modeling, we introduce a partial least squares (PLS) based procedure for classifying true upstream prokaryotic sequences against background upstream sequences. The upstream sequences of coding genes conserved across genomes were considered in the analysis; conserved coding genes were identified using the pan-genomics concept for each prokaryotic species considered. PLS uses a position-specific scoring matrix (PSSM) to study the characteristics of the upstream region. Results obtained by the PLS-based method were compared with the Gini importance of random forest (RF) and with support vector machine (SVM), which is a widely used method for sequence classification. Classification performance was evaluated by cross-validation, and the suggested approach identifies prokaryotic upstream regions significantly better than RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.
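For readers unfamiliar with PSSMs, the following minimal sketch shows how a log-odds position-specific scoring matrix can be built from equal-length DNA sequences and used to score a candidate. It covers only the PSSM step; the PLS classifier from the abstract is not reproduced. The pseudocount, the uniform background frequency, and the example sequences are assumptions chosen for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pssm(seqs, pseudocount=1.0, background=0.25):
    """Build a log2-odds PSSM (one dict per position) from aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    total = len(seqs) + pseudocount * len(ALPHABET)
    pssm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        column = {
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in ALPHABET
        }
        pssm.append(column)
    return pssm


def score(pssm, seq):
    """Sum the per-position log-odds scores of `seq` under the PSSM."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match PSSM length")
    return sum(column[base] for column, base in zip(pssm, seq))


# Hypothetical training set of short upstream motifs.
training = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pssm = build_pssm(training)
print(score(pssm, "TATAAT") > score(pssm, "GCGCGC"))
```

Per-position scores of this kind could then serve as input features for a downstream classifier.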
[Genetic analysis of two children patients affected with CHARGE syndrome].

PubMed

Li, Guoqiang; Li, Niu; Xu, Yufei; Li, Juan; Ding, Yu; Shen, Yiping; Wang, Xiumin; Wang, Jian

2018-04-10

To analyze two Chinese pediatric patients with multiple malformations and delayed growth and development. Both patients were subjected to targeted gene sequencing, and the results were analyzed with Ingenuity Variant Analysis software. Suspected pathogenic variants were verified by Sanger sequencing. High-throughput sequencing showed that both patients carried heterozygous variants of the CHD7 gene. Patient 1 carried a nonsense mutation in exon 36 (c.7957C>T, p.Arg2653*), while patient 2 carried a nonsense mutation in exon 2 (c.718C>T, p.Gln240*). Sanger sequencing confirmed these mutations in both patients, while their parents were wild-type at the corresponding sites, indicating that both mutations arose de novo. Both patients were diagnosed with CHARGE syndrome on the basis of high-throughput sequencing.
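The coding and protein positions reported in this abstract can be cross-checked with simple arithmetic: coding position n falls in codon (n - 1) // 3 + 1. The sketch below applies that check to the two variants. The regular expressions cover only simple substitutions of the form shown in the abstract and are an illustrative assumption, not a full HGVS parser.

```python
import re

# Simplified patterns for substitutions such as "c.7957C>T" and "p.Arg2653*".
CODING_RE = re.compile(r"c\.(\d+)([ACGT])>([ACGT])")
PROTEIN_RE = re.compile(r"p\.([A-Z][a-z]{2})(\d+)(\*|Ter|[A-Z][a-z]{2})")


def codon_number(c_pos: int) -> int:
    """1-based codon index containing 1-based coding position `c_pos`."""
    return (c_pos - 1) // 3 + 1


def codon_offset(c_pos: int) -> int:
    """Position (1, 2 or 3) of the base within its codon."""
    return (c_pos - 1) % 3 + 1


def consistent(coding: str, protein: str) -> bool:
    """True when the coding position maps to the stated protein position."""
    c = CODING_RE.fullmatch(coding)
    p = PROTEIN_RE.fullmatch(protein)
    if c is None or p is None:
        raise ValueError("unsupported variant notation")
    return codon_number(int(c.group(1))) == int(p.group(2))


for coding, protein in [("c.7957C>T", "p.Arg2653*"), ("c.718C>T", "p.Gln240*")]:
    print(coding, protein, consistent(coding, protein))
```

Both substitutions fall on the first base of their codon, which fits a C>T change converting an Arg (CGA) or Gln (CAA/CAG) codon into a stop codon.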
Repair of DNA double-strand breaks by templated nucleotide sequence insertions derived from distant regions of the genome.

PubMed

Onozawa, Masahiro; Zhang, Zhenhua; Kim, Yoo Jung; Goldberg, Liat; Varga, Tamas; Bergsagel, P Leif; Kuehl, W Michael; Aplan, Peter D

2014-05-27

We used the I-SceI endonuclease to produce DNA double-strand breaks (DSBs) and observed that a fraction of these DSBs were repaired by insertion of sequences, which we termed "templated sequence insertions" (TSIs), derived from distant regions of the genome. These TSIs were derived from genic, retrotransposon, or telomere sequences and were not deleted from the donor site in the genome, leading to the hypothesis that they were derived from reverse-transcribed RNA. Cotransfection of RNA and an I-SceI expression vector demonstrated insertion of RNA-derived sequences at the DNA-DSB site, and TSIs were suppressed by reverse-transcriptase inhibitors. Both observations support the hypothesis that TSIs were derived from RNA templates. In addition, similar insertions were detected at sites of DNA DSBs induced by transcription activator-like effector nuclease proteins. Whole-genome sequencing of myeloma cell lines revealed additional TSIs, demonstrating that repair of DNA DSBs via insertion was not restricted to experimentally produced DNA DSBs. Analysis of publicly available databases revealed that many of these TSIs are polymorphic in the human genome. Taken together, these results indicate that insertional events should be considered as alternatives to gross chromosomal rearrangements in the interpretation of whole-genome sequence data and that this mutagenic form of DNA repair may play a role in genetic disease, exon shuffling, and mammalian evolution.
CisSERS: Customizable in silico sequence evaluation for restriction sites

DOE PAGES

Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus; ...

2016-04-12

High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated to enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERS enable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a Java-based graphical user interface built around a Perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERS and results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.

CisSERS: Customizable in silico sequence evaluation for restriction sites

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus

High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated to enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERS enable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a Java-based graphical user interface built around a Perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERS and results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.
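The kind of in silico digest that underlies a predicted gel can be sketched in a few lines. This is not CisSERS code: the function names are hypothetical, the enzyme (EcoRI, recognition site GAATTC, cutting after the first G) is hard-coded rather than read from REBASE, the sequence is treated as linear, and only the given strand is scanned, which is sufficient here because the site is palindromic.

```python
def find_sites(seq: str, site: str) -> list:
    """Return 0-based start positions of every occurrence of `site`,
    including overlapping occurrences."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest_linear(seq: str, site: str, cut_offset: int) -> list:
    """Fragment lengths, in sequence order, after cutting a linear molecule
    `cut_offset` bases into each occurrence of `site`."""
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Toy sequence with two EcoRI sites; EcoRI cuts G^AATTC, so the offset is 1.
toy = "AAAGAATTCCCCGAATTCTT"
fragments = digest_linear(toy, "GAATTC", 1)
print(fragments)                        # fragment lengths in sequence order
print(sorted(fragments, reverse=True))  # order in which bands appear on a gel
```

Comparing the fragment lists of two alleles is the basic step behind choosing an enzyme for a CAPS marker.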
The complete sequence and structural analysis of human apolipoprotein B-100: relationship between apoB-100 and apoB-48 forms.

PubMed Central

Cladaras, C; Hadzopoulou-Cladaras, M; Nolte, R T; Atkinson, D; Zannis, V I

1986-01-01

We have isolated and sequenced overlapping cDNA clones covering the entire sequence of human apolipoprotein B-100 (apoB-100). DNA sequence analysis and determination of the mRNA transcription initiation site by S1 nuclease mapping showed that the apoB mRNA consists of 14,112 nucleotides including the 5' and 3' untranslated regions which are 128 and 301 nucleotides respectively. The DNA-derived protein sequence shows that apoB-100 is 513,000 daltons and contains 4560 amino acids including a 24-amino-acid-long signal peptide. The molecular weight of apoB-100 implies that there is one apoB molecule per LDL particle. Computer analysis of the predicted secondary structure of the protein showed that some of the potential alpha helical and beta sheet structures are amphipathic, whereas others have non-amphipathic neutral to apolar character. These latter regions may contribute to the formation of the lipid-binding domains of apoB-100. The protein contains 25 cysteines and 20 potential N-glycosylation sites. The majority of cysteines are distributed in the amino terminal portion of the protein. Four of the potential glycosylation sites are in predicted beta turn structures and may represent true glycosylation positions. ApoB lacks the tandem repeats which are characteristic of other apolipoproteins. The mean hydrophobicity (the mean value of H1) and helical hydrophobic moment (the mean value of μH) profiles of apoB showed the presence of several potential helical regions with strong polar character and high hydrophobic moment. The region with the highest hydrophobic moment, between amino acid residues 3352 and 3369, contains five closely spaced, positively charged residues, and has sequence homology to the LDL receptor binding site of apoE. This region is flanked by three neighbouring regions with positively charged amino acids and high hydrophobic moment that are located between residues 3174 and 3681.
One or more of these closely spaced apoB sequences may be involved in the formation of the LDL receptor-binding domain of apoB-100. Blotting analysis of intestinal RNA and hybridization of the blots with carboxy apoB cDNA probes produced a single 15-kb hybridization band whereas hybridization with amino terminal probes produced two hybridization bands of 15 and 8 kb. Our data indicate that both forms of apoB mRNA contain common sequences which extend from the amino terminal of apoB-100 to the vicinity of nucleotide residue 6300. These two messages may have resulted from differential splicing of the same primary apoB mRNA transcript. PMID:3030729
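The sequence figures quoted in this abstract are mutually consistent, which a short calculation makes explicit. The sketch below assumes that the stop codon is counted with the coding region rather than with the 3' untranslated region; that assumption is not stated in the abstract, and `coding_summary` is a hypothetical helper.

```python
def coding_summary(mrna_length: int, utr5: int, utr3: int):
    """Return (codons, leftover_nt) for the region between the two UTRs."""
    coding_nt = mrna_length - utr5 - utr3
    return divmod(coding_nt, 3)


# Values taken from the abstract.
codons, leftover = coding_summary(14112, 128, 301)
residues = codons - 1          # assumption: the last codon is the stop codon
mature = residues - 24         # remove the 24-residue signal peptide
mean_residue_mass = 513000 / residues

print(codons, leftover)        # codons between the UTRs, leftover nucleotides
print(residues, mature)        # translated residues, mature protein length
print(mean_residue_mass)       # average mass per residue in daltons
```

Under that assumption the 13,683 nucleotides between the untranslated regions give 4561 codons, that is 4560 residues plus a stop codon, matching the stated protein length, and the implied mean residue mass of 112.5 Da is in a typical range for proteins.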
An approach to functionally relevant clustering of the protein universe: Active site profile‐based clustering of protein structures and sequences

PubMed Central

Knutson, Stacy T.; Westwood, Brian M.; Leuthaeuser, Janelle B.; Turner, Brandon E.; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D.; Harper, Angela F.; Brown, Shoshana D.; Morris, John H.; Ferrin, Thomas E.; Babbitt, Patricia C.

2017-01-01

Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants, the amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. PMID:28054422
An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences.

PubMed

Knutson, Stacy T; Westwood, Brian M; Leuthaeuser, Janelle B; Turner, Brandon E; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D; Harper, Angela F; Brown, Shoshana D; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C; Fetrow, Jacquelyn S

2017-04-01

Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow identification of mechanistic determinants, the amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
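As a reminder of how an F-measure combines search quality figures like those quoted above, here is a minimal sketch. The counts are hypothetical and chosen only so that recall (the true positive rate) is 96%; the abstract does not report raw counts, and its "false positive rate" is not necessarily defined in a way that yields the precision used here.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positive, false positive and false negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 96 true hits found, 4 missed, 4 spurious hits.
print(f_measure(tp=96, fp=4, fn=4))
```

With equal precision and recall the F1 score equals both, so these illustrative counts give a value of about 0.96.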
A KCNH2 branch point mutation causing aberrant splicing contributes to an explanation of genotype-negative long QT syndrome.

PubMed

Crotti, Lia; Lewandowska, Marzena A; Schwartz, Peter J; Insolia, Roberto; Pedrazzini, Matteo; Bussani, Erica; Dagradi, Federica; George, Alfred L; Pagani, Franco

2009-02-01

Genetic screening of long QT syndrome (LQTS) fails to identify disease-causing mutations in about 30% of patients. So far, molecular screening has focused mainly on coding sequence mutations or on substitutions at canonical splice sites. The purpose of this study was to explore the possibility that intronic variants not at canonical splice sites might affect splicing regulatory elements, lead to aberrant transcripts, and cause LQTS. Molecular screening was performed through DHPLC and sequence analysis. The role of the intronic mutation identified was assessed with a hybrid minigene splicing assay. A three-generation LQTS family was investigated. Molecular screening failed to identify an obvious disease-causing mutation in the coding sequences of the major LQTS genes but revealed an intronic A-to-G substitution in KCNH2 (IVS9-28A/G) cosegregating with the clinical phenotype in family members. In vitro analysis proved that the mutation disrupts the acceptor splice site definition by affecting the branch point (BP) sequence and promoting intron retention. We further demonstrated a tight functional relationship between the BP and the polypyrimidine tract, whose weakness is responsible for the pathological effect of the IVS9-28A/G mutation. We identified a novel BP mutation in KCNH2 that disrupts the intron 9 acceptor splice site definition and causes LQT2. The present finding demonstrates that intronic mutations affecting pre-mRNA processing may contribute to the failure of traditional molecular screening in identifying disease-causing mutations in LQTS subjects and offers a rational strategy for the reduction of genotype-negative cases.
Biodiversity of arbuscular mycorrhizal fungi in roots and soils of two salt marshes.

PubMed

Wilde, Petra; Manal, Astrid; Stodden, Marc; Sieverding, Ewald; Hildebrandt, Ulrich; Bothe, Hermann

2009-06-01

The occurrence of arbuscular mycorrhizal fungi (AMF) was assessed by both morphological and molecular criteria in two salt marshes: (i) a NaCl site on the island of Terschelling, Atlantic Coast, the Netherlands and (ii) a K2CO3 marsh at Schreyahn, Northern Germany. The overall biodiversity of AMF, based on sequence analysis, was comparably low in roots at both sites. However, the morphological spore analyses of soil samples from both sites exhibited a higher AMF biodiversity. Glomus geosporum was the only fungus of the Glomerales that was detected both as spores in soil samples and in roots of the AMF-colonized salt plants Aster tripolium and Puccinellia sp. at both saline sites and on all sampling dates (with one exception). In roots, sequences of Glomus intraradices prevailed, but this fungus could not be identified unambiguously from DNA of soil spores. Likewise, an uncultured Glomus sp., deposited only as a sequence in the database, was widely detected by DNA sequencing in root samples. All attempts to obtain the corresponding sequences from spores isolated from soil samples failed consistently. A small-sized Archaeospora sp. was detected, by morphological and/or molecular analyses, in roots or soil spores, in dead AMF spores, or in oribatid mites. The study noted inconsistencies between morphological characterization and identification by DNA sequencing of the 5.8S rDNA-ITS2 region or part of the 18S rDNA gene. The distribution of AMF is unlikely to have followed the salt gradient at both sites, in contrast to the zone formation of plant species. Zygotes of the alga Vaucheria erythrospora (Xanthophyceae) were retrieved and should not be mistaken for AMF spores.
SSU rRNA-based phylogenetic position of the genera Amoeba and Chaos (Lobosea, Gymnamoebia): the origin of gymnamoebae revisited.

PubMed

Bolivar, I; Fahrni, J F; Smirnov, A; Pawlowski, J

2001-12-01

Naked lobose amoebae (gymnamoebae) are among the most abundant groups of protists present in all aquatic and terrestrial biotopes. Yet, because of the lack of informative morphological characters, the origin and evolutionary history of gymnamoebae are poorly known. The first molecular studies revealed multiple origins for the amoeboid lineages and an extraordinary diversity of amoebae species. Molecular data, however, exist only for a few species of the numerous taxa belonging to this group. Here, we present the small-subunit (SSU) rDNA sequences of four species of typical large gymnamoebae: Amoeba proteus, Amoeba leningradensis, Chaos nobile, and Chaos carolinense. Sequence analysis suggests that the four species are closely related to the species of genera Saccamoeba, Leptomyxa, Rhizamoeba, Paraflabellula, Hartmannella, and Echinamoeba. All of them form a relatively well-supported clade, which corresponds to the subclass Gymnamoebia, in agreement with morphology-based taxonomy. The other gymnamoebae cluster in small groups or branch separately. Their relationships change depending on the type of analysis and the model of nucleotide substitution. All gymnamoebae branch together in Neighbor-Joining analysis with corrections for among-site rate heterogeneity and proportion of invariable sites. This clade, however, is not statistically supported by SSU rRNA gene sequences and further analysis of protein sequence data will be necessary to test the monophyly of gymnamoebae.
Molecular analysis of immunoglobulin variable genes supports a germinal center experienced normal counterpart in primary cutaneous diffuse large B-cell lymphoma, leg-type.

PubMed

Pham-Ledard, Anne; Prochazkova-Carlotti, Martina; Deveza, Mélanie; Laforet, Marie-Pierre; Beylot-Barry, Marie; Vergier, Béatrice; Parrens, Marie; Feuillard, Jean; Merlio, Jean-Philippe; Gachard, Nathalie

2017-11-01

The immunophenotype of primary cutaneous diffuse large B-cell lymphoma, leg-type (PCLBCL-LT) suggests a germinal center-experienced B lymphocyte (BCL2+ MUM1+ BCL6+/-). As the maturation history of a B cell is "imprinted" during B-cell development on the immunoglobulin gene sequence, we studied the structure and sequence of the variable part of the genes (IGHV, IGLV, IGKV), immunoglobulin surface expression and features of class switching in order to determine the PCLBCL-LT cell of origin. Clonality analysis with the BIOMED2 protocol and VH leader primers was performed on DNA extracted from frozen skin biopsies, using retrospective samples from 14 patients. The clonal IGHV DNA sequence of the tumor was aligned and compared with the closest germline sequence, and the homology percentage was calculated. Superantigen binding sites were studied. Features of selection pressure were evaluated with the multinomial Lossos model. A functional monoclonal sequence was observed in 14 cases as determined for IGHV (10), IGLV (2) or IGKV (3). IGV mutation rates were high (>5%) in all cases but one (median: 15.5%), with conservation of superantigen binding sites. Features of selection pressure were identified in 11/12 interpretable cases, more frequently negative (75%) than positive (25%). Intraclonal variation was detected in 3 of 8 tumor specimens, with a low rate of mutations. Surface immunoglobulin was IgM in 12/12 cases. FISH analysis of the IGHM locus, which is deleted during class switching, showed heterozygous IGHM gene deletion in half of the cases. Genomic PCR analysis confirmed the deletions within the switch μ region. IGV sequences were highly mutated but functional, with negative features of selection pressure suggesting one or more germinal center passage(s) with somatic hypermutation, but with conservation of superantigen (SpA) binding sites. Genetic features of class switching were observed, but on the non-functional allele and coexisting with expression of the primary isotype IgM.
These data suggest that the cell of origin is a germinal center-experienced, superantigen-selected B cell at a stage between germinal center B cell and plasma cell. Copyright © 2017 Japanese Society for Investigative Dermatology. Published by Elsevier B.V. All rights reserved.
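The homology percentage and mutation rate mentioned in this abstract amount to comparing an aligned tumor sequence with its closest germline sequence. The sketch below is a simplified illustration, not the authors' pipeline: it assumes the two sequences are already aligned to equal length, skips any column containing a gap character, and uses made-up ten-base sequences. Conventions used by dedicated immunoglobulin tools, such as which region is compared, are not reproduced.

```python
def mutation_rate(tumor: str, germline: str, gap: str = "-") -> float:
    """Percentage of mismatched positions among gap-free aligned columns."""
    if len(tumor) != len(germline):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for t, g in zip(tumor.upper(), germline.upper()):
        if t == gap or g == gap:
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if t != g:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Made-up aligned fragments differing at one of ten positions.
rate = mutation_rate("ACGTACGTAC", "ACGTACGTAA")
print(rate, 100.0 - rate)  # mutation rate and homology, in percent
```

A sequence would be called highly mutated in the abstract's sense when this rate exceeds 5%.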
novPTMenzy: a database for enzymes involved in novel post-translational modifications

PubMed Central

Khater, Shradha; Mohanty, Debasisa

2015-01-01

With the recent discoveries of novel post-translational modifications (PTMs) which play important roles in signaling and biosynthetic pathways, identification of such PTM catalyzing enzymes by genome mining has been an area of major interest. Unlike well-known PTMs like phosphorylation, glycosylation, SUMOylation, no bioinformatics resources are available for enzymes associated with novel and unusual PTMs. Therefore, we have developed the novPTMenzy database which catalogs information on the sequence, structure, active site and genomic neighborhood of experimentally characterized enzymes involved in five novel PTMs, namely AMPylation, Eliminylation, Sulfation, Hydroxylation and Deamidation. Based on a comprehensive analysis of the sequence and structural features of these known PTM catalyzing enzymes, we have created Hidden Markov Model profiles for the identification of similar PTM catalyzing enzymatic domains in genomic sequences. We have also created predictive rules for grouping them into functional subfamilies and deciphering their mechanistic details by structure-based analysis of their active site pockets. These analytical modules have been made available as user friendly search interfaces of novPTMenzy database. It also has a specialized analysis interface for some PTMs like AMPylation and Eliminylation. The novPTMenzy database is a unique resource that can aid in discovery of unusual PTM catalyzing enzymes in newly sequenced genomes. Database URL: http://www.nii.ac.in/novptmenzy.html PMID:25931459
Promoter classifier: software package for promoter database analysis.

PubMed

Gershenzon, Naum I; Ioshikhes, Ilya P

2005-01-01

Promoter Classifier is a package of seven stand-alone Windows-based C++ programs allowing the following basic manipulations with a set of promoter sequences: (i) calculation of positional distributions of nucleotides averaged over all promoters of the dataset; (ii) calculation of the averaged occurrence frequencies of the transcription factor binding sites and their combinations; (iii) division of the dataset into subsets of sequences containing or lacking certain promoter elements or combinations; (iv) extraction of the promoter subsets containing or lacking CpG islands around the transcription start site; and (v) calculation of spatial distributions of the promoter DNA stacking energy and bending stiffness. All programs have a user-friendly interface and provide the results in a convenient graphical form. The Promoter Classifier package is an effective tool for various basic manipulations with eukaryotic promoter sequences that usually are necessary for analysis of large promoter datasets. The program Promoter Divider is described in more detail as a representative component of the package.
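Manipulation (i), the positional distribution of nucleotides averaged over all promoters, can be sketched as follows. This is an illustrative Python version, not the package's C++ code: it assumes the promoter sequences are already aligned on the transcription start site and have equal length, and the four short sequences are invented.

```python
from collections import Counter

ALPHABET = "ACGT"


def positional_frequencies(promoters):
    """For each position, return the fraction of promoters carrying each base."""
    length = len(promoters[0])
    if any(len(p) != length for p in promoters):
        raise ValueError("promoters must be aligned to the same length")
    n = len(promoters)
    profile = []
    for pos in range(length):
        counts = Counter(p[pos].upper() for p in promoters)
        profile.append({base: counts[base] / n for base in ALPHABET})
    return profile


# Invented four-base "promoters" aligned on a common reference point.
profile = positional_frequencies(["ACGT", "ACGA", "TCGA", "ACCA"])
for pos, freqs in enumerate(profile):
    print(pos, freqs)
```

The same per-position counting extends naturally to dinucleotides or to occurrence frequencies of binding-site motifs.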
PredictProtein—an open resource for online prediction of protein structural and functional features

PubMed Central

Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

2014-01-01

PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431
Alteration of gene expression in human hepatocellular carcinoma with integrated hepatitis B virus DNA.

PubMed

Tamori, Akihiro; Yamanishi, Yoshihiro; Kawashima, Shuichi; Kanehisa, Minoru; Enomoto, Masaru; Tanaka, Hiromu; Kubo, Shoji; Shiomi, Susumu; Nishiguchi, Shuhei

2005-08-15

Integration of hepatitis B virus (HBV) DNA into the human genome is one of the most important steps in HBV-related carcinogenesis. This study attempted to find the link between HBV DNA, the adjoining cellular sequence, and altered gene expression in hepatocellular carcinoma (HCC) with integrated HBV DNA. We examined 15 cases of HCC infected with HBV by cassette ligation-mediated PCR. The human DNA adjacent to the integrated HBV DNA was sequenced. Protein coding sequences were searched for in the human sequence. In five cases with HBV DNA integration, from which good quality RNA was extracted, gene expression was examined by cDNA microarray analysis. The human DNA sequence successive to integrated HBV DNA was determined in the 15 HCCs. Eight protein-coding regions were involved: ras-responsive element binding protein 1, calmodulin 1, mixed lineage leukemia 2 (MLL2), FLJ333655, LOC220272, LOC255345, LOC220220, and LOC168991. The MLL2 gene was expressed in three cases with HBV DNA integrated into exon 3 of MLL2 and in one case with HBV DNA integrated into intron 3 of MLL2. Gene expression analysis suggested that two HCCs with HBV integrated into MLL2 had similar patterns of gene expression compared with three HCCs with HBV integrated into other loci of human chromosomes. HBV DNA was integrated at random sites of human DNA, and the MLL2 gene was one of the targets for integration. Our results suggest that HBV DNA might modulate human genes near integration sites, followed by integration site-specific expression of such genes during hepatocarcinogenesis.
Solving the problem of comparing whole bacterial genomes across different sequencing platforms.

PubMed

Kaas, Rolf S; Leekitcharoenphon, Pimlapas; Aarestrup, Frank M; Lund, Ole

2014-01-01

Whole genome sequencing (WGS) shows great potential for real-time monitoring and identification of infectious disease outbreaks. However, rapid and reliable comparison of data generated in multiple laboratories and using multiple technologies is essential. So far studies have focused on using one technology because each technology has a systematic bias, making integration of data generated from different platforms difficult. We developed two different procedures for identifying variable sites and inferring phylogenies in WGS data across multiple platforms. The methods were evaluated on three bacterial data sets sequenced on three different platforms (Illumina, 454, Ion Torrent). We show that the methods are able to overcome the systematic biases caused by the sequencers and infer the expected phylogenies. We conclude that the success of these new procedures is due to validation of all informative sites that are included in the analysis. The procedures are available as web tools.
Cytogenetic Analysis of Populus trichocarpa - Ribosomal DNA, Telomere Repeat Sequence, and Marker-selected BACs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tuskan, Gerald A; Gunter, Lee E; DiFazio, Stephen P

The 18S-28S rDNA and 5S rDNA loci in Populus trichocarpa were localized using fluorescent in situ hybridization (FISH). Two 18S-28S rDNA sites and one 5S rDNA site were identified and located at the ends of 3 different chromosomes. FISH signals from the Arabidopsis-type telomere repeat sequence were observed at the distal ends of each chromosome. Six BAC clones selected from 2 linkage groups based on genome sequence assembly (LG-I and LG-VI) were localized on 2 chromosomes, as expected. BACs from LG-I hybridized to the longest chromosome in the complement. All BAC positions were found to be concordant with sequence assembly positions. BAC-FISH will be useful for delineating each of the Populus trichocarpa chromosomes and improving the sequence assembly of this model angiosperm tree species.
Recruiting Human Microbiome Shotgun Data to Site-Specific Reference Genomes

PubMed Central

Xie, Gary; Lo, Chien-Chi; Scholz, Matthew; Chain, Patrick S. G.

2014-01-01

The human body consists of innumerable multifaceted environments that predispose colonization by a number of distinct microbial communities, which play fundamental roles in human health and disease. In addition to community surveys and shotgun metagenomes that seek to explore the composition and diversity of these microbiomes, there are significant efforts to sequence reference microbial genomes from many body sites of healthy adults. To illustrate the utility of reference genomes when studying more complex metagenomes, we present a reference-based analysis of sequence reads generated from 55 shotgun metagenomes, selected from 5 major body sites, including 16 sub-sites. Interestingly, between 13% and 92% (62.3% average) of these shotgun reads were aligned to a then-complete list of 2780 reference genomes, including 1583 references for the human microbiome. However, no reference genome was universally found in all body sites. For any given metagenome, the body site-specific reference genomes, derived from the same body site as the sample, accounted for an average of 58.8% of the mapped reads. While different body sites did differ in abundant genera, proximal or symmetrical body sites were found to be most similar to one another. The extent of variation observed, both between individuals sampled within the same microenvironment, or at the same site within the same individual over time, calls into question comparative studies across individuals even if sampled at the same body site. This study illustrates the high utility of reference genomes and the need for further site-specific reference microbial genome sequencing, even within the already well-sampled human microbiome. PMID:24454771
A novel site-specific recombination system derived from bacteriophage phiMR11.

PubMed

Rashel, Mohammad; Uchiyama, Jumpei; Ujihara, Takako; Takemura, Iyo; Hoshiba, Hiroshi; Matsuzaki, Shigenobu

2008-04-04

We report identification of a novel site-specific DNA recombination system that functions both in vivo and in vitro, derived from lysogenic Staphylococcus aureus phage phiMR11. In silico analysis of the phiMR11 genome indicated orf1 as a putative integrase gene. Phage and bacterial attachment sites (attP and attB, respectively) and attachment junctions were determined and their nucleotide sequences decoded. Sequences of attP and attB were mostly different from each other except for a two-bp common core that was the crossover point. We found several inverted repeats adjacent to the core sequence of attP as potential protein binding sites. The precise and efficient integration properties of phiMR11 integrase were shown on attP and attB in Escherichia coli and the minimum size of attP was found to be 34 bp. In in vitro assays using crude or purified integrase, only buffer and substrate DNAs were required for the recombination reaction, indicating that other bacterially encoded factors are not essential for activity.
Microevolution of symbiotic Bradyrhizobium populations associated with soybeans in east North America

PubMed Central

Tang, Jie; Bromfield, E S P; Rodrigue, N; Cloutier, S; Tambong, J T

2012-01-01

Microevolution and origins of Bradyrhizobium populations associated with soybeans at two field sites (A and B, 280 km apart in Canada) with contrasting histories of inoculation were investigated using probabilistic analyses of six core (housekeeping) gene sequences. These analyses supported division of 220 isolates into five lineages corresponding either to B. japonicum groups 1 and 1a or to one of three novel lineages within the genus Bradyrhizobium. None of the isolates from site A and about 20% from site B (the only site with a recent inoculation history) were attributed to inoculation sources. The data suggest that most isolates were of indigenous origin based on sequence analysis of 148 isolates of soybean-nodulating bacteria from native legumes (Amphicarpaea bracteata and Desmodium canadense). Isolates from D. canadense clustered with B. japonicum group 1, whereas those from A. bracteata were placed in two novel lineages encountered at soybean field sites. One of these novel lineages predominated at soybean sites and exhibited a significant clonal expansion likely reflecting selection by the plant host. Homologous recombination events detected in the 35 sequence types from soybean sites had an effect on genetic diversification that was approximately equal to mutation. Interlineage transfer of core genes was infrequent and mostly attributable to gyrB that had a history of frequent recombination. Symbiotic gene sequences (nodC and nifH) of isolates from soybean sites and native legumes clustered in two lineages corresponding to B. japonicum and B. elkanii with the inheritance of these genes appearing predominantly by vertical transmission. The data suggest that soybean-nodulating bacteria associated with native legumes represent a novel source of ecologically adapted bacteria for soybean inoculation. PMID:23301163
Sequence variability of the respiratory syncytial virus (RSV) fusion gene among contemporary and historical genotypes of RSV/A and RSV/B

PubMed Central

Hause, Anne M.; Henke, David M.; Avadhanula, Vasanthi; Shaw, Chad A.; Tapia, Lorena I.

2017-01-01

Background: The fusion (F) protein of RSV is the major vaccine target. This protein undergoes a conformational change from pre-fusion to post-fusion. Both conformations share antigenic sites II and IV. Pre-fusion F has unique antigenic sites p27, ø, α2α3β3β4, and MPE8; whereas, post-fusion F has unique antigenic site I. Our objective was to determine the antigenic variability for RSV/A and RSV/B isolates from contemporary and historical genotypes compared to a historical RSV/A strain. Methods: The F sequences of isolates from GenBank, Houston, and Chile (N = 1,090) were used for this analysis. Sequences were compared pair-wise to a reference sequence, a historical RSV/A Long strain. Variability (calculated as %) was defined as changes at each amino acid (aa) position when compared to the reference sequence. Only aa at antigenic sites with variability ≥5% were reported. Results: A total of 1,090 sequences (822 RSV/A and 268 RSV/B) were analyzed. When compared to the reference F, those domains with the greatest number of non-synonymous changes included the signal peptide, p27, heptad repeat domain 2, antigenic site ø, and the transmembrane domain. RSV/A subgroup had 7 aa changes in the antigenic sites: site I (N = 1), II (N = 1), p27 (N = 4), α2α3β3β4(AM14) (N = 1), ranging in frequency from 7–91%. In comparison, RSV/B had 19 aa changes in antigenic sites: I (N = 3), II (N = 1), p27 (N = 9), ø (N = 4), α2α3β3β4(AM14) (N = 1), and MPE8 (N = 1), ranging in frequency from 79–100%. Discussion: Although antigenic sites of RSV F are generally well conserved, differences are observed when comparing the two subgroups to the reference RSV/A Long strain. Further, these discrepancies are accented in the antigenic sites in pre-fusion F of RSV/B isolates, often occurring with a frequency of 100%. This could be of importance if a monovalent F protein from the historical GA1 genotype of RSV/A is used for vaccine development. PMID:28414749
Sequence variability of the respiratory syncytial virus (RSV) fusion gene among contemporary and historical genotypes of RSV/A and RSV/B.

PubMed

Hause, Anne M; Henke, David M; Avadhanula, Vasanthi; Shaw, Chad A; Tapia, Lorena I; Piedra, Pedro A

2017-01-01

The fusion (F) protein of RSV is the major vaccine target. This protein undergoes a conformational change from pre-fusion to post-fusion. Both conformations share antigenic sites II and IV. Pre-fusion F has unique antigenic sites p27, ø, α2α3β3β4, and MPE8; whereas, post-fusion F has unique antigenic site I. Our objective was to determine the antigenic variability for RSV/A and RSV/B isolates from contemporary and historical genotypes compared to a historical RSV/A strain. The F sequences of isolates from GenBank, Houston, and Chile (N = 1,090) were used for this analysis. Sequences were compared pair-wise to a reference sequence, a historical RSV/A Long strain. Variability (calculated as %) was defined as changes at each amino acid (aa) position when compared to the reference sequence. Only aa at antigenic sites with variability ≥5% were reported. A total of 1,090 sequences (822 RSV/A and 268 RSV/B) were analyzed. When compared to the reference F, those domains with the greatest number of non-synonymous changes included the signal peptide, p27, heptad repeat domain 2, antigenic site ø, and the transmembrane domain. RSV/A subgroup had 7 aa changes in the antigenic sites: site I (N = 1), II (N = 1), p27 (N = 4), α2α3β3β4(AM14) (N = 1), ranging in frequency from 7-91%. In comparison, RSV/B had 19 aa changes in antigenic sites: I (N = 3), II (N = 1), p27 (N = 9), ø (N = 4), α2α3β3β4(AM14) (N = 1), and MPE8 (N = 1), ranging in frequency from 79-100%. Although antigenic sites of RSV F are generally well conserved, differences are observed when comparing the two subgroups to the reference RSV/A Long strain. Further, these discrepancies are accented in the antigenic sites in pre-fusion F of RSV/B isolates, often occurring with a frequency of 100%. This could be of importance if a monovalent F protein from the historical GA1 genotype of RSV/A is used for vaccine development.
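The per-position variability measure described in this abstract can be illustrated with a minimal sketch. It assumes the sequences are already aligned to the reference and have equal length; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
def position_variability(reference, sequences):
    """Percent of sequences that differ from the reference at each position."""
    if not sequences:
        raise ValueError("need at least one sequence")
    n = len(sequences)
    result = []
    for i, ref_aa in enumerate(reference):
        changed = sum(1 for s in sequences if s[i] != ref_aa)
        result.append(100.0 * changed / n)
    return result


def variable_positions(reference, sequences, threshold=5.0):
    """1-based positions whose variability meets the threshold (>= 5% in the abstract)."""
    variability = position_variability(reference, sequences)
    return [i + 1 for i, v in enumerate(variability) if v >= threshold]


# Toy data (hypothetical): one reference and four aligned isolates.
reference = "MELLI"
isolates = ["MELLI", "MELPI", "MDLPI", "MELLI"]

print(position_variability(reference, isolates))  # [0.0, 25.0, 0.0, 50.0, 0.0]
print(variable_positions(reference, isolates))    # [2, 4]
```

Restricting the report to antigenic sites, as the abstract does, would then amount to intersecting the returned positions with a list of site coordinates, which is not given here.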
In silico study of breast cancer associated gene 3 using LION Target Engine and other tools.

PubMed

León, Darryl A; Cànaves, Jaume M

2003-12-01

Sequence analysis of individual targets is an important step in annotation and validation. As a test case, we investigated human breast cancer associated gene 3 (BCA3) with LION Target Engine and with other bioinformatics tools. LION Target Engine confirmed that the BCA3 gene is located on 11p15.4 and that the two most likely splice variants (lacking exon 3 and exons 3 and 5, respectively) exist. Based on our manual curation of sequence data, it is proposed that an additional variant (missing only exon 5) published in a public sequence repository, is a prediction artifact. A significant number of new orthologs were also identified, and these were the basis for a high-quality protein secondary structure prediction. Moreover, our research confirmed several distinct functional domains as described in earlier reports. Sequence conservation from multiple sequence alignments, splice variant identification, secondary structure predictions, and predicted phosphorylation sites suggest that the removal of interaction sites through alternative splicing might play a modulatory role in BCA3. This in silico approach shows the depth and relevance of an analysis that can be accomplished by including a variety of publicly available tools with an integrated and customizable life science informatics platform.

Conserved antigenic sites between MERS-CoV and Bat-coronavirus are revealed through sequence analysis.

PubMed

Sharmin, Refat; Islam, Abul B M M K

2016-01-01

MERS-CoV is a newly emerged human coronavirus reported to be closely related to the HKU4 and HKU5 bat coronaviruses. Bat and MERS coronaviruses are structurally related. Therefore, it is of interest to estimate the degree of conserved antigenic sites among them. It is of importance to elucidate the shared antigenic sites and extent of conservation between them to understand the evolutionary dynamics of MERS-CoV. Multiple sequence alignment of the spike (S), membrane (M), envelope (E) and nucleocapsid (N) proteins was employed to identify the sequence conservation among MERS and bat (HKU4, HKU5) coronaviruses. We used various in silico tools to predict the conserved antigenic sites. We found that MERS-CoV shared 30% of its S protein antigenic sites with HKU4 and 70% with HKU5 bat-CoV, whereas 100% of its E, M and N protein antigenic sites were found to be conserved with those in HKU4 and HKU5. This sharing suggests that in terms of pathogenicity MERS-CoV is more closely related to HKU5 bat-CoV than to HKU4 bat-CoV. The conserved epitopes indicate their evolutionary relationship and ancestry of pathogenicity.
High-throughput detection of RNA processing in bacteria.

PubMed

Gill, Erin E; Chan, Luisa S; Winsor, Geoffrey L; Dobson, Neil; Lo, Raymond; Ho Sui, Shannan J; Dhillon, Bhavjinder K; Taylor, Patrick K; Shrestha, Raunak; Spencer, Cory; Hancock, Robert E W; Unrau, Peter J; Brinkman, Fiona S L

2018-03-27

Understanding the RNA processing of an organism's transcriptome is an essential but challenging step in understanding its biology. Here we investigate with unprecedented detail the transcriptome of Pseudomonas aeruginosa PAO1, a medically important and innately multi-drug resistant bacterium. We systematically mapped RNA cleavage and dephosphorylation sites that result in 5'-monophosphate terminated RNA (pRNA) using monophosphate RNA-Seq (pRNA-Seq). Transcriptional start sites (TSS) were also mapped using differential RNA-Seq (dRNA-Seq) and both datasets were compared to conventional RNA-Seq performed in a variety of growth conditions. The pRNA-Seq library revealed known tRNA, rRNA and transfer-messenger RNA (tmRNA) processing sites, together with previously uncharacterized RNA cleavage events that were found disproportionately near the 5' ends of transcripts associated with basic bacterial functions such as oxidative phosphorylation and purine metabolism. The majority (97%) of the processed mRNAs were cleaved at precise codon positions within defined sequence motifs indicative of distinct endonucleolytic activities. The most abundant of these motifs corresponded closely to an E. coli RNase E site previously established in vitro. Using the dRNA-Seq library, we performed an operon analysis and predicted 3159 potential TSS. A correlation analysis uncovered 105 antiparallel pairs of TSS that were separated by 18 bp from each other and were centered on single palindromic TAT(A/T)ATA motifs (likely -10 promoter elements), suggesting that, consistent with previous in vitro experimentation, these sites can initiate transcription bi-directionally and may thus provide a novel form of transcriptional regulation. TSS and RNA-Seq analysis allowed us to confirm expression of small non-coding RNAs (ncRNAs), many of which are differentially expressed in swarming and biofilm formation conditions.
This study uses pRNA-Seq, a method that provides a genome-wide survey of RNA processing, to study the bacterium Pseudomonas aeruginosa and discover extensive transcript processing not previously appreciated. We have also gained novel insight into RNA maturation and turnover as well as a potential novel form of transcription regulation. NOTE: All sequence data has been submitted to the NCBI sequence read archive. Accession numbers are as follows: [NCBI sequence read archive: SRX156386, SRX157659, SRX157660, SRX157661, SRX157683 and SRX158075]. The sequence data is viewable using Jbrowse on www.pseudomonas.com .
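The palindromic TAT(A/T)ATA motif mentioned in this abstract can be searched for with a short sketch like the one below. It reads "(A/T)" as "A or T at that position"; the helper names and the toy sequence are hypothetical, and this is not the authors' pipeline.

```python
import re

# TAT(A/T)ATA, wrapped in a lookahead so overlapping hits are also reported.
MOTIF = re.compile(r"(?=(TAT[AT]ATA))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq):
    """Return (0-based start, matched text) for every hit on the given strand."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq.upper())]


# Toy sequence (hypothetical) carrying both variants of the motif.
genome = "GGCTATAATACCGTATTATAGG"
print(find_motif(genome))             # [(3, 'TATAATA'), (13, 'TATTATA')]
print(reverse_complement("TATAATA"))  # TATTATA
```

The second print shows why the motif reads as palindromic: the reverse complement of one variant is the other variant, so a site matching on one strand also matches on the opposite strand.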
EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A

PubMed Central

Ndhlovu, Andrew; Durand, Pierre M.; Hazelhurst, Scott

2015-01-01

The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. Database URL: http://www.bioinf.wits.ac.za/software/fire/evodb PMID:26140928
EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A.

PubMed

Ndhlovu, Andrew; Durand, Pierre M; Hazelhurst, Scott

2015-01-01

The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. © The Author(s) 2015. Published by Oxford University Press.
Characterization of a protein that binds multiple sequences in mammalian type C retrovirus enhancers.

PubMed Central

Sun, W; O'Connell, M; Speck, N A

1993-01-01

Mammalian type C retrovirus enhancer factor 1 (MCREF-1) is a nuclear protein that binds several directly repeated sequences (CNGGN6CNGG) in the Moloney and Friend murine leukemia virus (MLV) enhancers (N. R. Manley, M. O'Connell, W. Sun, N. A. Speck, and N. Hopkins, J. Virol. 67:1967-1975, 1993). In this paper, we describe the partial purification of MCREF-1 from calf thymus nuclei and further characterize the binding properties of MCREF-1. MCREF-1 binds four sites in the Moloney MLV enhancer and three sites in the Friend MLV enhancer. Ethylation interference analysis suggests that the MCREF-1 binding site spans two adjacent minor grooves of DNA. PMID:8445719
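As a rough illustration, the CNGGN6CNGG consensus quoted in this abstract can be turned into a regular expression. The sketch assumes that N stands for any nucleotide and that N6 means six consecutive unspecified nucleotides; the function name and the test sequence are hypothetical.

```python
import re

# C N G G, six unspecified nucleotides, then C N G G (14 nt in total).
# The lookahead allows overlapping candidate sites to be reported.
MCREF1_SITE = re.compile(r"(?=(C[ACGT]GG[ACGT]{6}C[ACGT]GG))")


def find_sites(seq):
    """Return (0-based start, matched 14-mer) for each candidate site."""
    return [(m.start(), m.group(1)) for m in MCREF1_SITE.finditer(seq.upper())]


# Toy sequence (hypothetical), not taken from an MLV enhancer.
enhancer = "TTCAGGACGTACCTGGTT"
print(find_sites(enhancer))  # [(2, 'CAGGACGTACCTGG')]
```

A consensus match of this kind only nominates candidate sites; the abstract's binding data come from biochemical assays, not from pattern matching.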
Cooperative DNA binding and sequence discrimination by the Opaque2 bZIP factor.

PubMed Central

Yunes, J A; Vettore, A L; da Silva, M J; Leite, A; Arruda, P

1998-01-01

The maize Opaque2 (O2) protein is a basic leucine zipper transcription factor that controls the expression of distinct classes of endosperm genes through the recognition of different cis-acting elements in their promoters. The O2 target region in the promoter of the alpha-coixin gene was analyzed in detail and shown to comprise two closely adjacent binding sites, named O2u and O2d, which are related in sequence to the GCN4 binding site. Quantitative DNase footprint analysis indicated that O2 binding to alpha-coixin target sites is best described by a cooperative model. Transient expression assays showed that the two adjacent sites act synergistically. This synergy is mediated in part by cooperative DNA binding. In tobacco protoplasts, O2 binding at the O2u site is more important for enhancer activity than is binding at the O2d site, suggesting that the architecture of the O2-DNA complex is important for interaction with the transcriptional machinery. PMID:9811800
Cooperative DNA binding and sequence discrimination by the Opaque2 bZIP factor.

PubMed

Yunes, J A; Vettore, A L; da Silva, M J; Leite, A; Arruda, P

1998-11-01

The maize Opaque2 (O2) protein is a basic leucine zipper transcription factor that controls the expression of distinct classes of endosperm genes through the recognition of different cis-acting elements in their promoters. The O2 target region in the promoter of the alpha-coixin gene was analyzed in detail and shown to comprise two closely adjacent binding sites, named O2u and O2d, which are related in sequence to the GCN4 binding site. Quantitative DNase footprint analysis indicated that O2 binding to alpha-coixin target sites is best described by a cooperative model. Transient expression assays showed that the two adjacent sites act synergistically. This synergy is mediated in part by cooperative DNA binding. In tobacco protoplasts, O2 binding at the O2u site is more important for enhancer activity than is binding at the O2d site, suggesting that the architecture of the O2-DNA complex is important for interaction with the transcriptional machinery.
Targeting of Repeated Sequences Unique to a Gene Results in Significant Increases in Antisense Oligonucleotide Potency

PubMed Central

Vickers, Timothy A.; Freier, Susan M.; Bui, Huynh-Hoa; Watt, Andrew; Crooke, Stanley T.

2014-01-01

A new strategy for identifying potent RNase H-dependent antisense oligonucleotides (ASOs) is presented. Our analysis of the human transcriptome revealed that a significant proportion of genes contain unique repeated sequences of 16 or more nucleotides in length. Activities of ASOs targeting these repeated sites in several representative genes were compared to those of ASOs targeting unique single sites in the same transcript. Antisense activity at repeated sites was also evaluated in a highly controlled minigene system. Targeting both native and minigene repeat sites resulted in significant increases in potency as compared to targeting of non-repeated sites. The increased potency at these sites is a result of increased frequency of ASO/RNA interactions which, in turn, increases the probability of a productive interaction between the ASO/RNA heteroduplex and human RNase H1 in the cell. These results suggest a new, highly efficient strategy for rapid identification of highly potent ASOs. PMID:25334092
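The idea of locating sequences of 16 or more nucleotides that occur more than once within a transcript can be sketched with simple k-mer indexing. This is only an illustration under stated assumptions: the function name and toy transcript are hypothetical, and the sketch does not check that a repeat is unique to one gene across the transcriptome, as the abstract requires.

```python
from collections import defaultdict


def repeated_kmers(transcript, k=16):
    """Map each k-mer that occurs more than once to its 0-based start positions."""
    positions = defaultdict(list)
    for i in range(len(transcript) - k + 1):
        positions[transcript[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Toy transcript (hypothetical): the same 16-nt unit placed at two sites.
unit = "ACGTTGCAAGCTTAGC"
transcript = "GG" + unit + "TTTT" + unit + "CC"

repeats = repeated_kmers(transcript)
print(repeats[unit])  # [2, 22]
```

An oligonucleotide complementary to such a unit would have two binding sites in this toy transcript instead of one, which is the situation the abstract associates with increased potency.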
Analysis of Multiallelic CNVs by Emulsion Haplotype Fusion PCR.

PubMed

Tyson, Jess; Armour, John A L

2017-01-01

Emulsion-fusion PCR recovers long-range sequence information by combining products in cis from individual genomic DNA molecules. Emulsion droplets act as very numerous small reaction chambers in which different PCR products from a single genomic DNA molecule are condensed into short joint products, to unite sequences in cis from widely separated genomic sites. These products can therefore provide information about the arrangement of sequences and variants at a larger scale than established long-read sequencing methods. The method has been useful in defining the phase of variants in haplotypes, the typing of inversions, and determining the configuration of sequence variants in multiallelic CNVs. In this description we outline the rationale for the application of emulsion-fusion PCR methods to the analysis of multiallelic CNVs, and give practical details for our own implementation of the method in that context.
Integrative Clinical Genomics of Metastatic Cancer

PubMed Central

Robinson, Dan R.; Wu, Yi-Mi; Lonigro, Robert J.; Vats, Pankaj; Cobain, Erin; Everett, Jessica; Cao, Xuhong; Rabban, Erica; Kumar-Sinha, Chandan; Raymond, Victoria; Schuetze, Scott; Alva, Ajjai; Siddiqui, Javed; Chugh, Rashmi; Worden, Francis; Zalupski, Mark M.; Innis, Jeffrey; Mody, Rajen J.; Tomlins, Scott A.; Lucas, David; Baker, Laurence H.; Ramnath, Nithya; Schott, Ann F.; Hayes, Daniel F.; Vijai, Joseph; Offit, Kenneth; Stoffel, Elena M.; Roberts, J. Scott; Smith, David C.; Kunju, Lakshmi P.; Talpaz, Moshe; Cieslik, Marcin; Chinnaiyan, Arul M.

2017-01-01

SUMMARY Metastasis is the primary cause of cancer-related deaths. While The Cancer Genome Atlas (TCGA) has sequenced primary tumor types obtained from surgical resections, much less comprehensive molecular analysis is available from clinically acquired metastatic cancers. Here, we perform whole exome and transcriptome sequencing of 500 adult patients with metastatic solid tumors of diverse lineage and biopsy site. The most prevalent genes somatically altered in metastatic cancer included TP53, CDKN2A, PTEN, PIK3CA, and RB1. Putative pathogenic germline variants were present in 12.2% of cases of which 75% were related to defects in DNA repair. RNA sequencing complemented DNA sequencing for the identification of gene fusions, pathway activation, and immune profiling. Integrative sequence analysis provides a clinically relevant, multi-dimensional view of the complex molecular landscape and microenvironment of metastatic cancers. PMID:28783718
Phylo-mLogo: an interactive and hierarchical multiple-logo visualization tool for alignment of many sequences

PubMed Central

Shih, Arthur Chun-Chieh; Lee, DT; Peng, Chin-Lin; Wu, Yu-Wei

2007-01-01

Background: When aligning several hundreds or thousands of sequences, such as epidemic virus sequences or homologous/orthologous sequences of some big gene families, to reconstruct the epidemiological history or their phylogenies, how to analyze and visualize the alignment results of many sequences has become a new challenge for computational biologists. Although there are several tools available for visualization of very long sequence alignments, few of them are applicable to the alignments of many sequences. Results: A multiple-logo alignment visualization tool, called Phylo-mLogo, is presented in this paper. Phylo-mLogo calculates the variabilities and homogeneities of alignment sequences by base frequencies or entropies. Different from the traditional representations of sequence logos, Phylo-mLogo not only displays the global logo patterns of the whole alignment of multiple sequences, but also demonstrates their local homologous logos for each clade hierarchically. In addition, Phylo-mLogo also allows the user to focus only on the analysis of some important, structurally or functionally constrained sites in the alignment selected by the user or by built-in automatic calculation. Conclusion: With Phylo-mLogo, the user can symbolically and hierarchically visualize hundreds of aligned sequences simultaneously and easily check the changes of their amino acid sites when analyzing many homologous/orthologous or influenza virus sequences. More information of Phylo-mLogo can be found at URL . PMID:17319966
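A per-column entropy of the kind this abstract refers to can be sketched as follows. The sketch assumes Shannon entropy in bits over the residues observed in each alignment column; the abstract does not state which entropy definition Phylo-mLogo uses, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    # p * log2(1/p) summed over observed residues; 0 for a fully conserved column.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def alignment_entropies(alignment):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]


# Toy alignment (hypothetical): four sequences, four columns.
alignment = ["ACGT", "ACGA", "ACTA", "AGTA"]
for i, h in enumerate(alignment_entropies(alignment), start=1):
    print(f"column {i}: {h:.3f} bits")
```

Low-entropy columns are conserved and high-entropy columns are variable; computing the same quantity on the subset of sequences in each clade would give the kind of per-clade view the abstract describes.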
Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets.

PubMed

Nelson, Christopher S; Fuller, Chris K; Fordyce, Polly M; Greninger, Alexander L; Li, Hao; DeRisi, Joseph L

2013-07-01

The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein's DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2's binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved.
Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets

PubMed Central

Nelson, Christopher S.; Fuller, Chris K.; Fordyce, Polly M.; Greninger, Alexander L.; Li, Hao; DeRisi, Joseph L.

2013-01-01

The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein’s DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2’s binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved. PMID:23625967
Interstitial telomeric sequences in human chromosomes cluster with common fragile sites, mutagen sensitive sites, viral integration sites, cancer breakpoints, proto-oncogenes and breakpoints involved in primate evolution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Adekunle, S.S.A.; Wyandt, H.; Mark, H.F.L.

1994-09-01

Recently we mapped the telomeric repeat sequences to 111 interstitial sites in the human genome and to sites of gaps and breaks induced by aphidicolin and sister chromatid exchange sites detected by BrdU. Many of these sites correspond to conserved fragile sites in man, gorilla and chimpanzee, to sites of conserved sister chromatid exchange in the mammalian X chromosome, to mutagenic sensitive sites, mapped locations of proto-oncogenes, breakpoints implicated in primate evolution and to breakpoints indicated as the sole anomaly in neoplasia. This observation prompted us to investigate whether the interstitial telomeric sites cluster with these sites. An extensive literature search was carried out to find all the available published sites mentioned above. For comparison, we also carried out a statistical analysis of the clustering of the sites of the telomeric repeats with the gene locations where only nucleotide mutations have been observed as the only chromosomal abnormality. Our results indicate that the telomeric repeats cluster most with fragile sites, mutagenic sensitive sites and breakpoints implicated in primate evolution and least with cancer breakpoints, mapped locations of proto-oncogenes and other genes with nucleotide mutations.
Functional Evolution of PLP-dependent Enzymes based on Active-Site Structural Similarities

PubMed Central

Catazaro, Jonathan; Caprez, Adam; Guru, Ashu; Swanson, David; Powers, Robert

2014-01-01

Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not yet been undertaken. Pyridoxal-5’-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the Comparison of Protein Active Site Structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. PMID:24920327
Functional evolution of PLP-dependent enzymes based on active-site structural similarities.

PubMed

Catazaro, Jonathan; Caprez, Adam; Guru, Ashu; Swanson, David; Powers, Robert

2014-10-01

Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not yet been undertaken. Pyridoxal-5'-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the comparison of protein active site structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional-fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. © 2014 Wiley Periodicals, Inc.
Molecular analysis of bacterial microbiota associated with oysters (Crassostrea gigas and Crassostrea corteziensis) in different growth phases at two cultivation sites.

PubMed

Trabal, Natalia; Mazón-Suástegui, José M; Vázquez-Juárez, Ricardo; Asencio-Valle, Felipe; Morales-Bojórquez, Enrique; Romero, Jaime

2012-08-01

Microbiota presumably plays an essential role in inhibiting pathogen colonization and in the maintenance of health in oysters, but limited data exist concerning their different growth phases and conditions. We analyzed the bacterial microbiota composition of two commercial oysters: Crassostrea gigas and Crassostrea corteziensis. Differences in microbiota were assayed in three growth phases: post-larvae at the hatchery, juvenile, and adult at two grow-out cultivation sites. Variations in the microbiota were assessed by PCR analysis of the 16S rRNA gene in DNA extracted from depurated oysters. Restriction fragment length polymorphism (RFLP) profiles were studied using Dice's similarity coefficient (Cs) and statistical principal component analysis (PCA). The microbiota composition was determined by sequencing temperature gradient gel electrophoresis (TGGE) bands. The RFLP analysis of post-larvae revealed homology in the microbiota of both oyster species (Cs > 88 %). Dice and PCA analyses of C. corteziensis but not C. gigas showed differences in the microbiota according to the cultivation sites. The sequencing analysis revealed low bacterial diversity (primarily β-Proteobacteria, Firmicutes, and Spirochaetes), with Burkholderia cepacia being the most abundant bacteria in both oyster species. This study provides the first description of the microbiota in C. corteziensis, which was shown to be influenced by cultivation site conditions. During early growth, we observed that B. cepacia colonized and remained strongly associated with the two oysters, probably in a symbiotic host-bacteria relationship. This association was maintained in the three growth phases and was not altered by environmental conditions or the management of the oysters at the grow-out site.
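For readers unfamiliar with Dice's similarity coefficient as applied to banding profiles, the following is a minimal sketch, not the authors' analysis. It assumes each RFLP profile is reduced to a set of band sizes, uses the standard definition Cs = 2 x (shared bands) / (bands in A + bands in B), and the band values are invented.

```python
def dice_similarity(bands_a, bands_b):
    """Dice coefficient between two band profiles, each given as an iterable
    of band sizes (hypothetical representation of an RFLP profile)."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0  # two empty profiles are treated as identical (assumption)
    return 2 * len(a & b) / (len(a) + len(b))

# Invented band sizes (bp) for two samples; three of four bands are shared.
sample_1 = [100, 250, 400, 800]
sample_2 = [100, 250, 400, 900]
print(dice_similarity(sample_1, sample_2))  # 2 * 3 / (4 + 4) = 0.75
```

Multiplying the result by 100 gives the percentage form used in the abstract (for example, Cs > 88 %). Real gel data would also need a tolerance for matching band positions, which this sketch omits.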
Information theory applications for biological sequence analysis.

PubMed

Vinga, Susana

2014-05-01

Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
Landscapes, depositional environments and human occupation at Middle Paleolithic open-air sites in the southern Levant, with new insights from Nesher Ramla, Israel

NASA Astrophysics Data System (ADS)

Zaidner, Yossi; Frumkin, Amos; Friesem, David; Tsatskin, Alexander; Shahack-Gross, Ruth

2016-04-01

Middle Paleolithic human occupation in the Levant (250-50 ka ago) has been recorded in roofed (cave and rockshelter) and open-air sites. Research at these different types of sites yielded different perspectives on the Middle Paleolithic human behavior and evolution. Until recently, open-air Middle Paleolithic sites in the Levant were found in three major sedimentary environments: fluvial, lake-margin and spring. Here we describe a unique depositional environment and formation processes at the recently discovered open-air site of Nesher Ramla (Israel) and discuss their contribution to understanding site formation processes in open-air sites in the Levant. The site is an 8-m-thick Middle Paleolithic sequence (OSL dated to 170-80 ka) that is located in a karst sinkhole formed by gravitational deformation and sagging into underground voids. The sedimentary sequence was shaped by gravitational collapse, cyclic colluviation of soil and gravel into the depression, waterlogging, in situ pedogenesis and human occupation. Original bedding and combustion features are well-preserved in the Lower archaeological sequence, a rare occurrence in comparison to other open-air archaeological sites. This phenomenon coincides with episodes of fast sedimentation/burial, which also allowed better preservation of microscopic remains such as ash. The Upper archaeological sequence does not exhibit bedding or preservation of ash, despite presence of heat-affected lithic artifacts, which makes it similar to other open-air sites in the Levant. We suggest that rate of burial is the major factor that caused the difference between the Upper and Lower sequences. The differences in the burial rate may be connected to environmental and vegetation changes at the end of MIS 6. We also identified an interplay between sediment in-wash and density of human activity remains, i.e., during episodes of low natural sediment input the density of artifacts is higher relative to episodes with high rate of sediment in-wash. The detailed analysis of natural and anthropogenic processes at Nesher Ramla suggests a much wider spectrum of processes than previously reported for southern Levantine Paleolithic sites. Nesher Ramla shares certain depositional and post-depositional characteristics with both cave and open-air sites and provides a better insight into processes which control both types of sites.
Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice

PubMed Central

Shavit Grievink, Liat; Penny, David; Holland, Barbara R.

2013-01-01

Phylogenetic studies based on molecular sequence alignments are expected to become more accurate as the number of sites in the alignments increases. With the advent of genomic-scale data, where alignments have very large numbers of sites, bootstrap values close to 100% and posterior probabilities close to 1 are the norm, suggesting that the number of sites is now seldom a limiting factor on phylogenetic accuracy. This provokes the question, should we be fussy about the sites we choose to include in a genomic-scale phylogenetic analysis? If some sites contain missing data, ambiguous character states, or gaps, then why not just throw them away before conducting the phylogenetic analysis? Indeed, this is exactly the approach taken in many phylogenetic studies. Here, we present an example where the decision on how to treat sites with missing data is of equal importance to decisions on taxon sampling and model choice, and we introduce a graphical method for illustrating this. PMID:23471508

Target Site Recognition by a Diversity-Generating Retroelement

PubMed Central

Guo, Huatao; Tse, Longping V.; Nieh, Angela W.; Czornyj, Elizabeth; Williams, Steven; Oukil, Sabrina; Liu, Vincent B.; Miller, Jeff F.

2011-01-01

Diversity-generating retroelements (DGRs) are in vivo sequence diversification machines that are widely distributed in bacterial, phage, and plasmid genomes. They function to introduce vast amounts of targeted diversity into protein-encoding DNA sequences via mutagenic homing. Adenine residues are converted to random nucleotides in a retrotransposition process from a donor template repeat (TR) to a recipient variable repeat (VR). Using the Bordetella bacteriophage BPP-1 element as a prototype, we have characterized requirements for DGR target site function. Although sequences upstream of VR are dispensable, a 24 bp sequence immediately downstream of VR, which contains short inverted repeats, is required for efficient retrohoming. The inverted repeats form a hairpin or cruciform structure and mutational analysis demonstrated that, while the structure of the stem is important, its sequence can vary. In contrast, the loop has a sequence-dependent function. Structure-specific nuclease digestion confirmed the existence of a DNA hairpin/cruciform, and marker coconversion assays demonstrated that it influences the efficiency, but not the site of cDNA integration. Comparisons with other phage DGRs suggested that similar structures are a conserved feature of target sequences. Using a kanamycin resistance determinant as a reporter, we found that transplantation of the IMH and hairpin/cruciform-forming region was sufficient to target the DGR diversification machinery to a heterologous gene. In addition to furthering our understanding of DGR retrohoming, our results suggest that DGRs may provide unique tools for directed protein evolution via in vivo DNA diversification. PMID:22194701
Factor IX[sub Madrid 2]: A deletion/insertion in Factor IX gene which abolishes the sequence of the donor junction at the exon IV-intron d splice site

DOE Office of Scientific and Technical Information (OSTI.GOV)

Solera, J.; Magallon, M.; Martin-Villar, J.

1992-02-01

DNA from a patient with severe hemophilia B was evaluated by RFLP analysis, producing results which suggested the existence of a partial deletion within the factor IX gene. The deletion was further localized and characterized by PCR amplification and sequencing. The altered allele has a 4,442-bp deletion which removes both the donor splice site located at the 5' end of intron d and the two last coding nucleotides located at the 3' end of exon IV in the normal factor IX gene; this fragment has been inserted in inverted orientation. Two homologous sequences have been discovered at the ends of the deleted DNA fragment.
Toward rules relating zinc finger protein sequences and DNA binding site preferences.

PubMed

Desjarlais, J R; Berg, J M

1992-08-15

Zinc finger proteins of the Cys2-His2 type consist of tandem arrays of domains, where each domain appears to contact three adjacent base pairs of DNA through three key residues. We have designed and prepared a series of variants of the central zinc finger within the DNA binding domain of Sp1 by using information from an analysis of a large data base of zinc finger protein sequences. Through systematic variations at two of the three contact positions (underlined), relatively specific recognition of sequences of the form 5'-GGGGN(G or T)GGG-3' has been achieved. These results provide the basis for rules that may develop into a code that will allow the design of zinc finger proteins with preselected DNA site specificity.
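The site form 5'-GGGGN(G or T)GGG-3' quoted in this abstract can be written as a simple pattern for scanning a DNA string. The sketch below is only an illustration of how such a consensus might be searched for; the function name and test sequence are invented, and it scans the given strand only, without the reverse complement.

```python
import re

# Assumed encoding of the reported site form 5'-GGGGN(G or T)GGG-3':
# N is any of A/C/G/T, and the sixth position is G or T.
# The lookahead lets overlapping occurrences be reported as well.
SITE_PATTERN = re.compile(r"(?=(GGGG[ACGT][GT]GGG))")

def find_sites(sequence):
    """Return (0-based start, matched 9-mer) pairs on the given strand only."""
    return [(m.start(), m.group(1))
            for m in SITE_PATTERN.finditer(sequence.upper())]

# Invented test sequence containing one site starting at index 2.
print(find_sites("AAGGGGCTGGGTT"))  # [(2, 'GGGGCTGGG')]
```

A pattern match says nothing about binding affinity; it only flags sequences that fit the consensus form described in the abstract.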
Characterization of the human gene (TBXAS1) encoding thromboxane synthase.

PubMed

Miyata, A; Yokoyama, C; Ihara, H; Bandoh, S; Takeda, O; Takahashi, E; Tanabe, T

1994-09-01

The gene encoding human thromboxane synthase (TBXAS1) was isolated from a human EMBL3 genomic library using human platelet thromboxane synthase cDNA as a probe. Nucleotide sequencing revealed that the human thromboxane synthase gene spans more than 75 kb and consists of 13 exons and 12 introns, of which the splice donor and acceptor sites conform to the GT/AG rule. The exon-intron boundaries of the thromboxane synthase gene were similar to those of the human cytochrome P450 nifedipine oxidase gene (CYP3A4) except for introns 9 and 10, although the primary sequences of these enzymes exhibited 35.8% identity to each other. The 1.2 kb of 5'-flanking region sequence contained potential binding sites for several transcription factors (AP-1, AP-2, GATA-1, CCAAT box, xenobiotic-response element, PEA-3, LF-A1, myb, basic transcription element and cAMP-response element). Primer-extension analysis indicated multiple transcription-start sites, and the major start site was identified as an adenine residue located 142 bases upstream of the translation-initiation site. However, neither a typical TATA box nor a typical CAAT box is found within the 100 bases upstream of the translation-initiation site. Southern-blot analysis revealed the presence of one copy of the thromboxane synthase gene per haploid genome. Furthermore, a fluorescence in situ hybridization study revealed that the human gene for thromboxane synthase is localized to band q33-q34 of the long arm of chromosome 7. A tissue-distribution study demonstrated that thromboxane synthase mRNA is widely expressed in human tissues and is particularly abundant in peripheral blood leukocytes, spleen, lung and liver. Low but significant levels of mRNA were observed in kidney, placenta and thymus.
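The GT/AG rule mentioned above states that introns typically begin with GT (the splice donor) and end with AG (the splice acceptor). A minimal sketch of checking it follows, assuming exons are given as sorted, 0-based, half-open coordinate pairs on a toy sequence; the helper names and the sequence are invented and unrelated to the actual TBXAS1 data.

```python
def introns_from_exons(sequence, exons):
    """Extract intron sequences lying between consecutive exons.

    `exons` is a sorted list of (start, end) pairs, 0-based and half-open
    (an assumed convention for this illustration).
    """
    return [sequence[prev_end:next_start]
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]

def follows_gt_ag_rule(intron):
    """True if the intron starts with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

# Toy gene: exon 1 (6 nt) + intron (12 nt) + exon 2 (6 nt); invented sequence.
toy_gene = "ATGGCC" + "GTAAGTTTTCAG" + "GATTAA"
toy_exons = [(0, 6), (18, 24)]

for intron in introns_from_exons(toy_gene, toy_exons):
    print(intron, follows_gt_ag_rule(intron))
```

With real annotation, coordinates are often 1-based and inclusive, so they would need converting before being used with this sketch.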
Analysis of the site-specific integration system of the Streptomyces aureofaciens phage μ1/6.

PubMed

Farkašovská, Jarmila; Godány, Andrej

2012-03-01

The bacteriophage μ1/6 integrates its DNA into the chromosome of tetracycline producing strains of Streptomyces aureofaciens by a site-specific recombination process. A bioinformatic analysis of the μ1/6 genome revealed that orf5 encodes a putative integrase, a basic protein of 416 amino acids. The μ1/6 integrase was found to belong to the integrase family of site-specific tyrosine recombinases. The phage attachment site (attP) was localized downstream of the int gene. The attachment junctions (attL and attR) were determined, allowing identification of the bacterial attachment site (attB). All attachment sites shared a 46-bp common core sequence within which a site-specific recombination occurs. This core sequence comprises the 3' end of a putative tRNA(Thr) gene (anticodon TGT) which is completely restored in attL after integration of the phage into the host genome. An integration vector containing μ1/6 int-attP region was inserted stably into the S. aureofaciens B96, S. lividans TK24, and S. coelicolor A3. The μ1/6 integrase was shown to be functional in vivo in heterologous Escherichia coli without any other factors encoded by Streptomyces. In vitro recombination assay using purified μ1/6 integrase demonstrated its ability to catalyze integrative recombination in the presence of a crude extract of E. coli cells.
Progress of targeted genome modification approaches in higher plants.

PubMed

Cardi, Teodoro; Neal Stewart, C

2016-07-01

Transgene integration in plants is based on illegitimate recombination between non-homologous sequences. The low control of integration site and number of (trans/cis)gene copies might have negative consequences on the expression of transferred genes and their insertion within endogenous coding sequences. The first experiments conducted to use precise homologous recombination for gene integration commenced soon after the first demonstration that transgenic plants could be produced. Modern transgene targeting categories used in plant biology are: (a) homologous recombination-dependent gene targeting; (b) recombinase-mediated site-specific gene integration; (c) oligonucleotide-directed mutagenesis; (d) nuclease-mediated site-specific genome modifications. New tools enable precise gene replacement or stacking with exogenous sequences and targeted mutagenesis of endogenous sequences. The possibility to engineer chimeric designer nucleases, which are able to target virtually any genomic site, and use them for inducing double-strand breaks in host DNA creates new opportunities for both applied plant breeding and functional genomics. CRISPR is the most recent technology available for precise genome editing. Its rapid adoption in biological research is based on its inherent simplicity and efficacy. Its utilization, however, depends on available sequence information, especially for genome-wide analysis. We will review the approaches used for genome modification, specifically those for affecting gene integration and modification in higher plants. For each approach, the advantages and limitations will be noted. We also will speculate on how their actual commercial development and implementation in plant breeding will be affected by governmental regulations.
Segmental duplications and evolutionary plasticity at tumor chromosome break-prone regions

PubMed Central

Darai-Ramqvist, Eva; Sandlund, Agneta; Müller, Stefan; Klein, George; Imreh, Stefan; Kost-Alimova, Maria

2008-01-01

We have previously found that the borders of evolutionarily conserved chromosomal regions often coincide with tumor-associated deletion breakpoints within human 3p12-p22. Moreover, a detailed analysis of a frequently deleted region at 3p21.3 (CER1) showed associations between tumor breaks and gene duplications. We now report on the analysis of 54 chromosome 3 breaks by multipoint FISH (mpFISH) in 10 carcinoma-derived cell lines. The centromeric region was broken in five lines. In lines with highly complex karyotypes, breaks were clustered near known fragile sites, FRA3B, FRA3C, and FRA3D (three lines), and in two other regions: 3p12.3-p13 (∼75 Mb position) and 3q21.3-q22.1 (∼130 Mb position) (six lines). All locations are shown based on NCBI Build 36.1 human genome sequence. The last two regions participated in three of four chromosome 3 inversions during primate evolution. Regions at 75, 127, and 131 Mb positions carry a large (∼250 kb) segmental duplication (tumor break-prone segmental duplication [TBSD]). TBSD homologous sequences were found at 15 sites on different chromosomes. They were located within bands frequently involved in carcinoma-associated breaks. Thirteen of them have been involved in inversions during primate evolution; 10 were reused by breaks during mammalian evolution; 14 showed copy number polymorphism in man. TBSD sites showed an increase in satellite repeats, retrotransposed sequences, and other segmental duplications. We propose that the instability of these sites stems from specific organization of the chromosomal region, associated with location at a boundary between different CG-content isochores and with the presence of TBSDs and “instability elements,” including satellite repeats and retroviral sequences. PMID:18230801
Segmental duplications and evolutionary plasticity at tumor chromosome break-prone regions.

PubMed

Darai-Ramqvist, Eva; Sandlund, Agneta; Müller, Stefan; Klein, George; Imreh, Stefan; Kost-Alimova, Maria

2008-03-01

We have previously found that the borders of evolutionarily conserved chromosomal regions often coincide with tumor-associated deletion breakpoints within human 3p12-p22. Moreover, a detailed analysis of a frequently deleted region at 3p21.3 (CER1) showed associations between tumor breaks and gene duplications. We now report on the analysis of 54 chromosome 3 breaks by multipoint FISH (mpFISH) in 10 carcinoma-derived cell lines. The centromeric region was broken in five lines. In lines with highly complex karyotypes, breaks were clustered near known fragile sites, FRA3B, FRA3C, and FRA3D (three lines), and in two other regions: 3p12.3-p13 (approximately 75 Mb position) and 3q21.3-q22.1 (approximately 130 Mb position) (six lines). All locations are shown based on NCBI Build 36.1 human genome sequence. The last two regions participated in three of four chromosome 3 inversions during primate evolution. Regions at 75, 127, and 131 Mb positions carry a large (approximately 250 kb) segmental duplication (tumor break-prone segmental duplication [TBSD]). TBSD homologous sequences were found at 15 sites on different chromosomes. They were located within bands frequently involved in carcinoma-associated breaks. Thirteen of them have been involved in inversions during primate evolution; 10 were reused by breaks during mammalian evolution; 14 showed copy number polymorphism in man. TBSD sites showed an increase in satellite repeats, retrotransposed sequences, and other segmental duplications. We propose that the instability of these sites stems from specific organization of the chromosomal region, associated with location at a boundary between different CG-content isochores and with the presence of TBSDs and "instability elements," including satellite repeats and retroviral sequences.
Metagenome Sequence Analysis of Filamentous Microbial Communities Obtained from Geochemically Distinct Geothermal Channels Reveals Specialization of Three Aquificales Lineages

PubMed Central

Takacs-Vesbach, Cristina; Inskeep, William P.; Jay, Zackary J.; Herrgard, Markus J.; Rusch, Douglas B.; Tringe, Susannah G.; Kozubal, Mark A.; Hamamura, Natsuko; Macur, Richard E.; Fouke, Bruce W.; Reysenbach, Anna-Louise; McDermott, Timothy R.; Jennings, Ryan deM.; Hengartner, Nicolas W.; Xie, Gary

2013-01-01

The Aquificales are thermophilic microorganisms that inhabit hydrothermal systems worldwide and are considered one of the earliest lineages of the domain Bacteria. We analyzed metagenome sequence obtained from six thermal “filamentous streamer” communities (∼40 Mbp per site), which targeted three different groups of Aquificales found in Yellowstone National Park (YNP). Unassembled metagenome sequence and PCR-amplified 16S rRNA gene libraries revealed that acidic, sulfidic sites were dominated by Hydrogenobaculum (Aquificaceae) populations, whereas the circum-neutral pH (6.5–7.8) sites containing dissolved sulfide were dominated by Sulfurihydrogenibium spp. (Hydrogenothermaceae). Thermocrinis (Aquificaceae) populations were found primarily in the circum-neutral sites with undetectable sulfide, and to a lesser extent in one sulfidic system at pH 8. Phylogenetic analysis of assembled sequence containing 16S rRNA genes as well as conserved protein-encoding genes revealed that the composition and function of these communities varied across geochemical conditions. Each Aquificales lineage contained genes for CO2 fixation by the reverse-TCA cycle, but only the Sulfurihydrogenibium populations perform citrate cleavage using ATP citrate lyase (Acl). The Aquificaceae populations use an alternative pathway catalyzed by two separate enzymes, citryl-CoA synthetase (Ccs), and citryl-CoA lyase (Ccl). All three Aquificales lineages contained evidence of aerobic respiration, albeit due to completely different types of heme Cu oxidases (subunit I) involved in oxygen reduction. The distribution of Aquificales populations and differences among functional genes involved in energy generation and electron transport is consistent with the hypothesis that geochemical parameters (e.g., pH, sulfide, H2, O2) have resulted in niche specialization among members of the Aquificales. PMID:23755042
Identification of novel point mutations in splicing sites integrating whole-exome and RNA-seq data in myeloproliferative diseases.

PubMed

Spinelli, Roberta; Pirola, Alessandra; Redaelli, Sara; Sharma, Nitesh; Raman, Hima; Valletta, Simona; Magistroni, Vera; Piazza, Rocco; Gambacorti-Passerini, Carlo

2013-11-01

Point mutations in intronic regions near mRNA splice junctions can affect the splicing process. To identify novel splicing variants from exome sequencing data, we developed a bioinformatics splice-site prediction procedure to analyze next-generation sequencing (NGS) data (SpliceFinder). SpliceFinder integrates two functional annotation tools for NGS, ANNOVAR and MutationTaster, and two canonical splice site prediction programs for single mutation analysis, SSPNN and NetGene2. Using SpliceFinder, we identified somatic mutations affecting RNA splicing in a colon cancer sample, in eight atypical chronic myeloid leukemia (aCML), and eight CML patients. A novel homozygous splicing mutation was found in APC (NM_000038.4:c.1312+5G>A) and six heterozygous mutations in GNAQ (NM_002072.2:c.735+1C>T), ABCC3 (NM_003786.3:c.1783-1G>A), KLHDC1 (NM_172193.1:c.568-2A>G), HOOK1 (NM_015888.4:c.1662-1G>A), SMAD9 (NM_001127217.2:c.1004-1C>T), and DNAH9 (NM_001372.3:c.10242+5G>A). Integrating whole-exome and RNA sequencing in aCML and CML, we assessed the phenotypic effect of mutations on mRNA splicing for GNAQ, ABCC3, HOOK1. In ABCC3 and HOOK1, RNA-Seq showed the presence of aberrant transcripts with activation of a cryptic splice site or intron retention, validated by the reverse transcription-polymerase chain reaction (RT-PCR) in the case of HOOK1. In GNAQ, RNA-Seq showed 22% of wild-type transcript and 78% of mRNA skipping exon 5, resulting in a 4-6 frameshift fusion confirmed by RT-PCR. The pipeline can be useful to identify intronic variants affecting RNA sequence by complementing conventional exome analysis.
Double layer zinc-UDP coordination polymers: structure and properties.

PubMed

Qiu, Qi-Ming; Gu, Leilei; Ma, Hongwei; Yan, Li; Liu, Minghua; Li, Hui

2018-05-17

A homochiral Zn-UDP coordination polymer with an alternating parallel ABAB sequence was constructed and studied by X-ray single crystal diffraction analysis. Its crystal structure shows that there are potentially open sites in the 2D layers. The activation of the sites makes the coordination polymer a fluorescent sensor for novel heterogeneous detection of amino acids.
Structural and sequencing analysis of local target DNA recognition by MLV integrase.

PubMed

Aiyer, Sriram; Rossi, Paolo; Malani, Nirav; Schneider, William M; Chandar, Ashwin; Bushman, Frederic D; Montelione, Gaetano T; Roth, Monica J

2015-06-23

Target-site selection by retroviral integrase (IN) proteins profoundly affects viral pathogenesis. We describe the solution nuclear magnetic resonance structure of the Moloney murine leukemia virus IN (M-MLV) C-terminal domain (CTD) and a structural homology model of the catalytic core domain (CCD). In solution, the isolated MLV IN CTD adopts an SH3 domain fold flanked by a C-terminal unstructured tail. We generated a concordant MLV IN CCD structural model using SWISS-MODEL, MMM-tree and I-TASSER. Using the X-ray crystal structure of the prototype foamy virus IN target capture complex together with our MLV domain structures, residues within the CCD α2 helical region and the CTD β1-β2 loop were predicted to bind target DNA. The role of these residues was analyzed in vivo through point mutants and motif interchanges. Viable viruses with substitutions at the IN CCD α2 helical region and the CTD β1-β2 loop were tested for effects on integration target site selection. Next-generation sequencing and analysis of integration target sequences indicate that the CCD α2 helical region, in particular P187, interacts with the sequences distal to the scissile bonds whereas the CTD β1-β2 loop binds to residues proximal to it. These findings validate our structural model and disclose IN-DNA interactions relevant to target site selection. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
In Silico Detection of Sequence Variations Modifying Transcriptional Regulation

PubMed Central

Andersen, Malin C; Engström, Pär G; Lithwick, Stuart; Arenillas, David; Eriksson, Per; Lenhard, Boris; Wasserman, Wyeth W; Odeberg, Jacob

2008-01-01

Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation. PMID:18208319
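The binding-site step described above can be illustrated by scoring both alleles of a SNP against a transcription factor profile and comparing the results. The following is a minimal sketch only: the count matrix, the uniform background, the pseudocount, and all function names are hypothetical and are not taken from RAVEN, which also applies phylogenetic footprinting.

```python
import math

# Hypothetical 4-position count matrix (one list per base); not a real TF profile.
COUNTS = {
    "A": [8, 1, 0, 7],
    "C": [0, 1, 9, 1],
    "G": [1, 7, 0, 1],
    "T": [1, 1, 1, 1],
}
BACKGROUND = 0.25   # uniform base composition, an assumption
PSEUDOCOUNT = 1.0   # avoids log(0) for unobserved bases


def log_odds(base, pos):
    """Log2 odds of seeing `base` at motif position `pos` versus background."""
    total = sum(COUNTS[b][pos] for b in "ACGT") + 4 * PSEUDOCOUNT
    freq = (COUNTS[base][pos] + PSEUDOCOUNT) / total
    return math.log2(freq / BACKGROUND)


def best_window_score(seq):
    """Highest motif score over all windows of `seq` (forward strand only)."""
    width = len(COUNTS["A"])
    return max(
        sum(log_odds(seq[i + j], j) for j in range(width))
        for i in range(len(seq) - width + 1)
    )


def allele_score_difference(flank5, ref, alt, flank3):
    """Score change (alt minus ref) when a SNP alters a candidate site."""
    return best_window_score(flank5 + alt + flank3) - best_window_score(flank5 + ref + flank3)
```

A strongly negative difference suggests the alternative allele weakens a predicted site. As the abstract notes, such predictions have poor specificity unless the relevant transcription factor is already known.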
Cascade detection for the extraction of localized sequence features; specificity results for HIV-1 protease and structure-function results for the Schellman loop.

PubMed

Newell, Nicholas E

2011-12-15

The extraction of the set of features most relevant to function from classified biological sequence sets is still a challenging problem. A central issue is the determination of expected counts for higher order features so that artifact features may be screened. Cascade detection (CD), a new algorithm for the extraction of localized features from sequence sets, is introduced. CD is a natural extension of the proportional modeling techniques used in contingency table analysis into the domain of feature detection. The algorithm is successfully tested on synthetic data and then applied to feature detection problems from two different domains to demonstrate its broad utility. An analysis of HIV-1 protease specificity reveals patterns of strong first-order features that group hydrophobic residues by side chain geometry and exhibit substantial symmetry about the cleavage site. Higher order results suggest that favorable cooperativity is weak by comparison and broadly distributed, but indicate possible synergies between negative charge and hydrophobicity in the substrate. Structure-function results for the Schellman loop, a helix-capping motif in proteins, contain strong first-order features and also show statistically significant cooperativities that provide new insights into the design of the motif. These include a new 'hydrophobic staple' and multiple amphipathic and electrostatic pair features. CD should prove useful not only for sequence analysis, but also for the detection of multifactor synergies in cross-classified data from clinical studies or other sources. A Windows XP/7 application and data files are available at https://sites.google.com/site/cascadedetect/home. Contact: nacnewell@comcast.net. Supplementary information is available at Bioinformatics online.
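The role of expected counts can be shown with the simplest contingency-table baseline: the number of sequences expected to carry two residue features together if the two positions were independent. This sketch is not the CD algorithm itself, whose details are not given in the abstract; the function name and the toy data are hypothetical.

```python
def expected_pair_count(sequences, pos_a, res_a, pos_b, res_b):
    """Observed and expected co-occurrence counts of two (position, residue) features.

    The expected count assumes the two positions are independent:
    E = n * P(res_a at pos_a) * P(res_b at pos_b) = count_a * count_b / n.
    """
    n = len(sequences)
    count_a = sum(1 for s in sequences if s[pos_a] == res_a)
    count_b = sum(1 for s in sequences if s[pos_b] == res_b)
    observed = sum(
        1 for s in sequences if s[pos_a] == res_a and s[pos_b] == res_b
    )
    expected = count_a * count_b / n
    return observed, expected
```

An observed count well above the expected one hints at cooperativity between the two features, while a count close to it suggests the pair is explained by its first-order components.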
PFAAT version 2.0: a tool for editing, annotating, and analyzing multiple sequence alignments.

PubMed

Caffrey, Daniel R; Dana, Paul H; Mathur, Vidhya; Ocano, Marco; Hong, Eun-Jong; Wang, Yaoyu E; Somaroo, Shyamal; Caffrey, Brian E; Potluri, Shobha; Huang, Enoch S

2007-10-11

By virtue of their shared ancestry, homologous sequences are similar in their structure and function. Consequently, multiple sequence alignments are routinely used to identify trends that relate to function. This type of analysis is particularly productive when it is combined with structural and phylogenetic analysis. Here we describe the release of PFAAT version 2.0, a tool for editing, analyzing, and annotating multiple sequence alignments. Support for multiple annotations is a key component of this release as it provides a framework for most of the new functionalities. The sequence annotations are accessible from the alignment and tree, where they are typically used to label sequences or hyperlink them to related databases. Sequence annotations can be created manually or extracted automatically from UniProt entries. Once a multiple sequence alignment is populated with sequence annotations, sequences can be easily selected and sorted through a sophisticated search dialog. The selected sequences can be further analyzed using statistical methods that explicitly model relationships between the sequence annotations and residue properties. Residue annotations are accessible from the alignment viewer and are typically used to designate binding sites or properties for a particular residue. Residue annotations are also searchable, and allow one to quickly select alignment columns for further sequence analysis, e.g. computing percent identities. Other features include: novel algorithms to compute sequence conservation, mapping conservation scores to a 3D structure in Jmol, displaying secondary structure elements, and sorting sequences by residue composition. PFAAT provides a framework whereby end-users can specify knowledge for a protein family in the form of annotation. The annotations can be combined with sophisticated analysis to test hypotheses that relate to sequence, structure and function.
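Computing percent identity over a selected set of alignment columns, as mentioned above, might look like the following sketch. It does not reproduce PFAAT's implementation; the function name and the convention of skipping gapped columns are assumptions.

```python
def percent_identity(seq_a, seq_b, columns=None):
    """Percent identity of two aligned sequences over the chosen alignment columns.

    Columns where either sequence has a gap ('-') are skipped. This is one of
    several common conventions and is an assumption here.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if columns is None:
        columns = range(len(seq_a))
    compared = 0
    matches = 0
    for i in columns:
        a, b = seq_a[i], seq_b[i]
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

Passing `columns` restricts the calculation to, for example, the columns annotated as a binding site.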
Combined sequence and structure analysis of the fungal laccase family.

PubMed

Kumar, S V Suresh; Phale, Prashant S; Durani, S; Wangikar, Pramod P

2003-08-20

Plant and fungal laccases belong to the family of multi-copper oxidases and show much broader substrate specificity than other members of the family. Laccases have consequently been of interest for potential industrial applications. We have analyzed the essential sequence features of fungal laccases based on multiple sequence alignments of more than 100 laccases. This has resulted in identification of a set of four ungapped sequence regions, L1-L4, as the overall signature sequences that can be used to identify the laccases, distinguishing them within the broader class of multi-copper oxidases. The 12 amino acid residues in the enzymes serving as the copper ligands are housed within these four identified conserved regions, of which L2 and L4 conform to the earlier reported copper signature sequences of multi-copper oxidases while L1 and L3 are distinctive to the laccases. The mapping of regions L1-L4 on to the three-dimensional structure of the Coprinus cinereus laccase indicates that many of the non-copper-ligating residues of the conserved regions could be critical in maintaining a specific, more or less C-2 symmetric, protein conformational motif characterizing the active site apparatus of the enzymes. The observed intraprotein homologies between L1 and L3 and between L2 and L4 at both the structure and the sequence levels suggest that the quasi C-2 symmetric active site conformational motif may have arisen from a structural duplication event that neither the sequence homology analysis nor the structure homology analysis alone would have unraveled. Although the sequence and structure homology is not detectable in the rest of the protein, the relative orientation of region L1 with L2 is similar to that of L3 with L4. The structure duplication of first-shell and second-shell residues has become cryptic because the intraprotein sequence homology noticeable for a given laccase becomes significant only after comparing the conservation pattern in several fungal laccases. The identified motifs, L1-L4, can be useful in searching the newly sequenced genomes for putative laccase enzymes. Copyright 2003 Wiley Periodicals, Inc. Biotechnol Bioeng 83: 386-394, 2003.
Microbial characterization of nitrification in a shallow, nitrogen-contaminated aquifer, Cape Cod, Massachusetts and detection of a novel cluster associated with nitrifying Betaproteobacteria

USGS Publications Warehouse

Miller, D.N.; Smith, R.L.

2009-01-01

Groundwater nitrification is a poorly characterized process affecting the speciation and transport of nitrogen. Cores from two sites in a plume of contamination were examined using culture-based and molecular techniques targeting nitrification processes. The first site, located beneath a sewage effluent infiltration bed, received treated effluent containing O2 (> 300 µM) and NH4+ (51-800 µM). The second site was 2.5 km down-gradient near the leading edge of the ammonium zone within the contaminant plume and featured vertical gradients of O2, NH4+, and NO3- (0-300, 0-500, and 100-200 µM with depth, respectively). Ammonia- and nitrite-oxidizers enumerated by the culture-based MPN method were low in abundance at both sites (1.8 to 350 g-1 and 33 to 35,000 g-1, respectively). Potential nitrifying activity measured in core material in the laboratory was also very low, requiring several weeks for products to accumulate. Molecular analysis of aquifer DNA (nested PCR followed by cloning and 16S rDNA sequencing) detected primarily sequences associated with the Nitrosospira genus throughout the cores at the down-gradient site and a smaller proportion from the Nitrosomonas genus in the deeper anoxic, NH4+ zone at the down-gradient site. Only a single Nitrosospira sequence was detected beneath the infiltration bed. Furthermore, the majority of Nitrosospira-associated sequences represent an unrecognized cluster. We conclude that an uncharacterized group associated with Nitrosospira dominates at the geochemically stable, down-gradient site, but found little evidence for Betaproteobacteria nitrifiers beneath the infiltration beds where geochemical conditions were more variable.
Microbial characterization of nitrification in a shallow, nitrogen-contaminated aquifer, Cape Cod, Massachusetts and detection of a novel cluster associated with nitrifying Betaproteobacteria

NASA Astrophysics Data System (ADS)

Miller, Daniel N.; Smith, Richard L.

2009-01-01

Groundwater nitrification is a poorly characterized process affecting the speciation and transport of nitrogen. Cores from two sites in a plume of contamination were examined using culture-based and molecular techniques targeting nitrification processes. The first site, located beneath a sewage effluent infiltration bed, received treated effluent containing O2 (> 300 µM) and NH4+ (51-800 µM). The second site was 2.5 km down-gradient near the leading edge of the ammonium zone within the contaminant plume and featured vertical gradients of O2, NH4+, and NO3- (0-300, 0-500, and 100-200 µM with depth, respectively). Ammonia- and nitrite-oxidizers enumerated by the culture-based MPN method were low in abundance at both sites (1.8 to 350 g-1 and 33 to 35,000 g-1, respectively). Potential nitrifying activity measured in core material in the laboratory was also very low, requiring several weeks for products to accumulate. Molecular analysis of aquifer DNA (nested PCR followed by cloning and 16S rDNA sequencing) detected primarily sequences associated with the Nitrosospira genus throughout the cores at the down-gradient site and a smaller proportion from the Nitrosomonas genus in the deeper anoxic, NH4+ zone at the down-gradient site. Only a single Nitrosospira sequence was detected beneath the infiltration bed. Furthermore, the majority of Nitrosospira-associated sequences represent an unrecognized cluster. We conclude that an uncharacterized group associated with Nitrosospira dominates at the geochemically stable, down-gradient site, but found little evidence for Betaproteobacteria nitrifiers beneath the infiltration beds where geochemical conditions were more variable.
Comprehensive profiling of retroviral integration sites using target enrichment methods from historical koala samples without an assembled reference genome

PubMed Central

Alquezar-Planas, David E.; Ishida, Yasuko; Courtiol, Alexandre; Timms, Peter; Johnson, Rebecca N.; Lenz, Dorina; Helgen, Kristofer M.; Roca, Alfred L.; Hartman, Stefanie

2016-01-01

Background. Retroviral integration into the host germline results in permanent viral colonization of vertebrate genomes. The koala retrovirus (KoRV) is currently invading the germline of the koala (Phascolarctos cinereus) and provides a unique opportunity for studying retroviral endogenization. Previous analyses of KoRV integration patterns in modern koalas demonstrate that they share integration sites primarily if they are related, indicating that the process is currently driven by vertical transmission rather than infection. However, due to methodological challenges, KoRV integrations have not been comprehensively characterized. Results. To overcome these challenges, we applied and compared three target enrichment techniques coupled with next generation sequencing (NGS) and a newly customized sequence-clustering based computational pipeline to determine the integration sites for 10 museum Queensland and New South Wales (NSW) koala samples collected between the 1870s and late 1980s. A secondary aim of this study was to identify common integration sites across modern and historical specimens by comparing our dataset to previously published studies. Several million sequences were processed, and the KoRV integration sites in each koala were characterized. Conclusions. Although the three enrichment methods each exhibited bias in integration site retrieval, a combination of two methods, Primer Extension Capture and hybridization capture, is recommended for future studies on historical samples. Moreover, identification of integration sites shows that the proportion of integration sites shared between any two koalas is quite small. PMID:27069793
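The proportion of integration sites shared between two animals can be expressed as a set overlap. The sketch below uses the Jaccard index as one plausible definition; the abstract does not state which measure the authors used, and the sample names and site identifiers are hypothetical (without an assembled reference genome, sites would be labeled by sequence cluster rather than by coordinate).

```python
from itertools import combinations


def shared_fraction(sites_a, sites_b):
    """Shared sites divided by all distinct sites in the two samples (Jaccard index)."""
    union = sites_a | sites_b
    return len(sites_a & sites_b) / len(union) if union else 0.0


def pairwise_sharing(samples):
    """Map each unordered pair of sample names to its shared-site fraction."""
    return {
        (a, b): shared_fraction(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }
```

For example, `pairwise_sharing({"koala1": {"cl01", "cl02"}, "koala2": {"cl01", "cl03", "cl04"}})` reports one shared cluster out of four distinct clusters for the single pair.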
Sequence characterization of S100A8 gene reveals structural differences of protein and transcriptional factor binding sites in water buffalo and yak.

PubMed

Kathiravan, P; Goyal, S; Kataria, R S; Mishra, B P; Jayakumar, S; Joshi, B K

2011-01-01

The present study was undertaken to characterize the structure of S100A8 gene and its promoter in water buffalo and yak. Sequence data of 2.067 kb, 2.071 kb, and 2.052 kb with respect to complete S100A8 gene including 5' flanking region was generated in river buffalo, swamp buffalo, and yak, respectively. BLAST analysis of coding DNA sequences (CDS) of S100A8 gene revealed 95% homology of buffalo sequence with cattle, 85% with pig and horse, 83% with dog, 72-73% with murines, and around 79% with primates and humans. Phylogenetic analysis of predicted CDS revealed distinct clustering of murines, primates, and domestic animals with bovines and bubalines forming a subcluster among farm animals. In silico translation of predicted CDS revealed a sequence of 89 amino acids with 7 amino acid changes between cattle and buffalo and 2 changes between cattle and yak. The search for Pfam family revealed the N-terminal calcium binding domain and the noncanonical EF hand domain in the carboxy terminus, with more variations being observed in the N-terminal domain among different species. Two amino acid changes observed in carboxy terminal EF hand domain resulted in altered secondary structure of yak S100A8 protein. Analysis of S100A8 gene promoter revealed 14 putative motifs for transcriptional factor binding sites. Two putative motifs viz. C/EBP and v-Myb were found to be absent in swamp buffalo as compared to river buffalo and cattle. Differences in the structure of S100A8 protein and the transcriptional factor binding sites identified in the present study need to be analyzed further for their functional significance in yak and swamp buffalo respectively. Copyright © Taylor & Francis Group, LLC

Complete cDNA sequence and amino acid analysis of a bovine ribonuclease K6 gene.

PubMed

Pietrowski, D; Förster, M

2000-01-01

The complete cDNA sequence of a ribonuclease k6 gene of Bos taurus has been determined. It codes for a protein with 154 amino acids and contains the invariant cysteine, histidine and lysine residues as well as the characteristic motifs specific to ribonuclease active sites. The deduced protein sequence is 27 residues longer than other known ribonucleases k6 and shows amino acid exchanges which could reflect a strain specificity or polymorphism within the bovine genome. Based on sequence similarity we have termed the identified gene bovine ribonuclease k6 b (brk6b).
Epigallocatechin-3-gallate preferentially induces aggregation of amyloidogenic immunoglobulin light chains

PubMed Central

Hora, Manuel; Carballo-Pacheco, Martin; Weber, Benedikt; Morris, Vanessa K.; Wittkopf, Antje; Buchner, Johannes; Strodel, Birgit; Reif, Bernd

2017-01-01

Antibody light chain amyloidosis is a rare disease caused by fibril formation of secreted immunoglobulin light chains (LCs). The huge variety of antibody sequences poses a serious challenge to drug discovery. The green tea polyphenol epigallocatechin-3-gallate (EGCG) is known to interfere with fibril formation in general. Here we present solution- and solid-state NMR studies as well as MD simulations to characterise the interaction of EGCG with LC variable domains. We identified two distinct EGCG binding sites, both of which include a proline as an important recognition element. The binding sites were confirmed by site-directed mutagenesis and solid-state NMR analysis. The EGCG-induced protein complexes are unstructured. We propose a general mechanistic model for EGCG binding to a conserved site in LCs. We find that EGCG reacts selectively with amyloidogenic mutants. This makes this compound a promising lead structure that can handle the immense sequence variability of antibody LCs. PMID:28128355
mtDNA variation in the Yanomami: evidence for additional New World founding lineages.

PubMed

Easton, R D; Merriwether, D A; Crews, D E; Ferrell, R E

1996-07-01

Native Americans have been classified into four founding haplogroups with as many as seven founding lineages based on mtDNA RFLPs and DNA sequence data. mtDNA analysis was completed for 83 Yanomami from eight villages in the Surucucu and Catrimani Plateau regions of Roraima in northwestern Brazil. Samples were typed for 15 polymorphic mtDNA sites (14 RFLP sites and 1 deletion site), and a subset was sequenced for both hypervariable regions of the mitochondrial D-loop. Substantial mitochondrial diversity was detected among the Yanomami: five of seven accepted founding haplotypes and three others were observed. Of the 83 samples, 4 (4.8%) were lineage B1, 1 (1.2%) was lineage B2, 31 (37.4%) were lineage C1, 29 (34.9%) were lineage C2, 2 (2.4%) were lineage D1, 6 (7.2%) were lineage D2, 7 (8.4%) were a haplotype we designated "X6," and 3 (3.6%) were a haplotype we designated "X7." Sequence analysis found 43 haplotypes in 50 samples. B2, X6, and X7 are previously unrecognized mitochondrial founding lineage types of Native Americans. The widespread distribution of these haplotypes in the New World and Asia provides support for declaring these lineages to be New World founding types.
mtDNA variation in the Yanomami: evidence for additional New World founding lineages.

PubMed Central

Easton, R. D.; Merriwether, D. A.; Crews, D. E.; Ferrell, R. E.

1996-01-01

Native Americans have been classified into four founding haplogroups with as many as seven founding lineages based on mtDNA RFLPs and DNA sequence data. mtDNA analysis was completed for 83 Yanomami from eight villages in the Surucucu and Catrimani Plateau regions of Roraima in northwestern Brazil. Samples were typed for 15 polymorphic mtDNA sites (14 RFLP sites and 1 deletion site), and a subset was sequenced for both hypervariable regions of the mitochondrial D-loop. Substantial mitochondrial diversity was detected among the Yanomami: five of seven accepted founding haplotypes and three others were observed. Of the 83 samples, 4 (4.8%) were lineage B1, 1 (1.2%) was lineage B2, 31 (37.4%) were lineage C1, 29 (34.9%) were lineage C2, 2 (2.4%) were lineage D1, 6 (7.2%) were lineage D2, 7 (8.4%) were a haplotype we designated "X6," and 3 (3.6%) were a haplotype we designated "X7." Sequence analysis found 43 haplotypes in 50 samples. B2, X6, and X7 are previously unrecognized mitochondrial founding lineage types of Native Americans. The widespread distribution of these haplotypes in the New World and Asia provides support for declaring these lineages to be New World founding types. PMID:8659527
Sequence-dependent modelling of local DNA bending phenomena: curvature prediction and vibrational analysis.

PubMed

Vlahovicek, K; Munteanu, M G; Pongor, S

1999-01-01

Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is only distorted but not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporates sequence-dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including affinity of protein binding, kinking, as well as sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations depends quite sensitively on the sequence; for example, the spectrum of a curved sequence is characteristically different from the spectrum of straight sequence motifs of identical basepair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions; that is, sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server is set up for the prediction of curvature and generation of 3D models from DNA sequences (http://www.icgeb.trieste.it/dna).
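The Fourier step described above, turning time-domain displacement values into a frequency spectrum, can be sketched with a naive discrete Fourier transform. This is an illustration under stated assumptions: the displacement values are taken to be uniformly sampled, the function names are hypothetical, and no finite element model is included.

```python
import cmath
import math


def amplitude_spectrum(samples, dt):
    """Return (frequency, amplitude) pairs from a naive discrete Fourier transform.

    `samples` are displacement values taken every `dt` time units; only the
    non-negative frequencies up to the Nyquist limit are returned.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        spectrum.append((k / (n * dt), abs(coeff) / n))
    return spectrum


def dominant_frequency(samples, dt):
    """Frequency of the largest non-constant (k > 0) spectral component."""
    return max(amplitude_spectrum(samples, dt)[1:], key=lambda fa: fa[1])[0]
```

Comparing the spectra of two simulated segments would then amount to comparing the lists returned by `amplitude_spectrum`. The naive transform costs O(n²) operations, so a fast Fourier transform would be preferable for long time series.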
In silico evolution of the Drosophila gap gene regulatory sequence under elevated mutational pressure.

PubMed

Chertkova, Aleksandra A; Schiffman, Joshua S; Nuzhdin, Sergey V; Kozlov, Konstantin N; Samsonova, Maria G; Gursky, Vitaly V

2017-02-07

Cis-regulatory sequences are often composed of many low-affinity transcription factor binding sites (TFBSs). Determining the evolutionary and functional importance of regulatory sequence composition is impeded without a detailed knowledge of the genotype-phenotype map. We simulate the evolution of regulatory sequences involved in Drosophila melanogaster embryo segmentation during early development. Natural selection evaluates gene expression dynamics produced by a computational model of the developmental network. We observe a dramatic decrease in the total number of transcription factor binding sites through the course of evolution. Despite a decrease in average sequence binding energies through time, the regulatory sequences tend towards organisations containing increased high affinity transcription factor binding sites. Additionally, the binding energies of separate sequence segments demonstrate ubiquitous mutual correlations through time. Fewer than 10% of initial TFBSs are maintained throughout the entire simulation, deemed 'core' sites. These sites have increased functional importance as assessed under wild-type conditions and their binding energy distributions are highly conserved. Furthermore, TFBSs within close proximity of core sites exhibit increased longevity, reflecting functional regulatory interactions with core sites. In response to elevated mutational pressure, evolution tends to sample regulatory sequence organisations with fewer, albeit on average stronger, functional transcription factor binding sites. These organisations are also shaped by the regulatory interactions among core binding sites with sites in their local vicinity.
Sequence-Specific Targeting of Dosage Compensation in Drosophila Favors an Active Chromatin Context

PubMed Central

Gelbart, Marnie; Tolstorukov, Michael Y.; Plachetka, Annette; Kharchenko, Peter V.; Jung, Youngsook L.; Gorchakov, Andrey A.; Larschan, Erica; Gu, Tingting; Minoda, Aki; Riddle, Nicole C.; Schwartz, Yuri B.; Elgin, Sarah C. R.; Karpen, Gary H.; Pirrotta, Vincenzo; Kuroda, Mitzi I.; Park, Peter J.

2012-01-01

The Drosophila MSL complex mediates dosage compensation by increasing transcription of the single X chromosome in males approximately two-fold. This is accomplished through recognition of the X chromosome and subsequent acetylation of histone H4K16 on X-linked genes. Initial binding to the X is thought to occur at “entry sites” that contain a consensus sequence motif (“MSL recognition element” or MRE). However, this motif is only ∼2 fold enriched on X, and only a fraction of the motifs on X are initially targeted. Here we ask whether chromatin context could distinguish between utilized and non-utilized copies of the motif, by comparing their relative enrichment for histone modifications and chromosomal proteins mapped in the modENCODE project. Through a comparative analysis of the chromatin features in male S2 cells (which contain MSL complex) and female Kc cells (which lack the complex), we find that the presence of active chromatin modifications, together with an elevated local GC content in the surrounding sequences, has strong predictive value for functional MSL entry sites, independent of MSL binding. We tested these sites for function in Kc cells by RNAi knockdown of Sxl, resulting in induction of MSL complex. We show that ectopic MSL expression in Kc cells leads to H4K16 acetylation around these sites and a relative increase in X chromosome transcription. Collectively, our results support a model in which a pre-existing active chromatin environment, coincident with H3K36me3, contributes to MSL entry site selection. The consequences of MSL targeting of the male X chromosome include increase in nucleosome lability, enrichment for H4K16 acetylation and JIL-1 kinase, and depletion of linker histone H1 on active X-linked genes. Our analysis can serve as a model for identifying chromatin and local sequence features that may contribute to selection of functional protein binding sites in the genome. PMID:22570616
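One of the local sequence features mentioned above, GC content in the sequence surrounding each motif copy, is straightforward to compute. The sketch below is illustrative only: it matches an exact motif string on one strand, whereas the study used a consensus motif, and the function names and flank size are assumptions.

```python
def gc_content(seq):
    """Fraction of G or C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def motif_contexts(genome, motif, flank):
    """Yield (position, GC fraction) for each exact, possibly overlapping motif match.

    The GC fraction is computed over the motif plus `flank` bases on each side,
    truncated at the ends of the sequence.
    """
    start = genome.find(motif)
    while start != -1:
        lo = max(0, start - flank)
        hi = min(len(genome), start + len(motif) + flank)
        yield start, gc_content(genome[lo:hi])
        start = genome.find(motif, start + 1)
```

Motif copies could then be ranked by local GC fraction alongside chromatin marks to separate likely utilized sites from non-utilized ones.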
Characterization, genetic diversity, and evolutionary link of Cucumber mosaic virus strain New Delhi from India.

PubMed

Koundal, Vikas; Haq, Qazi Mohd Rizwanul; Praveen, Shelly

2011-02-01

The genome of Cucumber mosaic virus New Delhi strain (CMV-ND) from India, obtained from tomato, was completely sequenced and compared with full genome sequences of 14 known CMV strains from subgroups I and II, for their genetic diversity. Sequence analysis suggests CMV-ND shares maximum sequence identity at the nucleotide level with a CMV strain from Taiwan. Among all 15 strains of CMV, the encoded protein 2b is least conserved, whereas the coat protein (CP) is most conserved. Sequence identity values and phylogram results indicate that CMV-ND belongs to subgroup I. Based on the recombination detection program result, it appears that CMV is prone to recombination, and different RNA components of CMV-ND have evolved differently. Recombinational analysis of all 15 CMV strains detected maximum recombination breakpoints in RNA2; CP showed the least recombination sites.
Identification of novel alleles of the rice blast resistance gene Pi54

NASA Astrophysics Data System (ADS)

Vasudevan, Kumar; Gruissem, Wilhelm; Bhullar, Navreet K.

2015-10-01

Rice blast is one of the most devastating rice diseases and continuous resistance breeding is required to control the disease. The rice blast resistance gene Pi54, initially identified in an Indian cultivar, confers broad-spectrum resistance in India. We explored the allelic diversity of the Pi54 gene among 885 Indian rice genotypes that were found resistant in our screening against a field mixture of naturally existing M. oryzae strains as well as against five unique strains. These genotypes are also annotated as rice blast resistant in the International Rice Genebank database. Sequence-based allele mining was used to amplify and clone the Pi54 allelic variants. Nine new alleles of Pi54 were identified based on the nucleotide sequence comparison to the Pi54 reference sequence as well as to already known Pi54 alleles. DNA sequence analysis of the newly identified Pi54 alleles revealed several single polymorphic sites, three double deletions and an eight base pair deletion. A SNP-rich region was found between a tyrosine kinase phosphorylation site and the nucleotide binding site (NBS) domain. Together, the newly identified Pi54 alleles expand the allelic series and are candidates for rice blast resistance breeding programs.
Position-specific binding of FUS to nascent RNA regulates mRNA length

PubMed Central

Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen

2015-01-01

More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189
Identification of MicroRNAs in Helicoverpa armigera and Spodoptera litura Based on Deep Sequencing and Homology Analysis

PubMed Central

Ge, Xie; Zhang, Yong; Jiang, Jianhao; Zhong, Yi; Yang, Xiaonan; Li, Zhiqian; Huang, Yongping; Tan, Anjiang

2013-01-01

The current identification of microRNAs (miRNAs) in insects is largely dependent on genome sequences. However, the lack of available genome sequences inhibits the identification of miRNAs in various insect species. In this study, we used a miRNA database of the silkworm Bombyx mori as a reference to identify miRNAs in Helicoverpa armigera and Spodoptera litura using deep sequencing and homology analysis. Because all three species belong to the Lepidoptera, the experiment produced reliable results. Our study identified 97 and 91 conserved miRNAs in H. armigera and S. litura, respectively. Using the genome of B. mori and BAC sequences of H. armigera as references, 1 novel miRNA and 8 novel miRNA candidates were identified in H. armigera, and 4 novel miRNA candidates were identified in S. litura. An evolutionary analysis revealed that most of the identified miRNAs were insect-specific, and more than 20 miRNAs were Lepidoptera-specific. The investigation of the expression patterns of miR-2a, miR-34, miR-2796-3p and miR-11 revealed their potential roles in insect development. miRNA target prediction revealed that conserved miRNA target sites exist in various genes in the 3 species. Conserved miRNA target sites for the Hsp90 gene among the 3 species were validated in the mammalian 293T cell line using a dual-luciferase reporter assay. Our study provides a new approach with which to identify miRNAs in insects lacking genome information and contributes to the functional analysis of insect miRNAs. PMID:23289012
On site DNA barcoding by nanopore sequencing

PubMed Central

Menegon, Michele; Cantaloni, Chiara; Rodriguez-Prieto, Ana; Centomo, Cesare; Abdelfattah, Ahmed; Rossato, Marzia; Bernardi, Massimo; Xumerle, Luciano; Loader, Simon; Delledonne, Massimo

2017-01-01

Biodiversity research is becoming increasingly dependent on genomics, which allows the unprecedented digitization and understanding of the planet’s biological heritage. The use of genetic markers, i.e. DNA barcoding, has proved to be a powerful tool in species identification. However, full exploitation of this approach is hampered by the high sequencing costs and the absence of equipped facilities in biodiversity-rich countries. In the present work, we developed a portable sequencing laboratory based on the portable DNA sequencer from Oxford Nanopore Technologies, the MinION. Complementary laboratory equipment and reagents were selected to be used in remote and tough environmental conditions. The performance of the MinION sequencer and the portable laboratory was tested for DNA barcoding in a simulated tropical environment, as well as in a remote rainforest of Tanzania lacking electricity. Despite the relatively high sequencing error rate of the MinION, the development of a suitable pipeline for data analysis allowed the accurate identification of different species of vertebrates, including amphibians, reptiles and mammals. In situ sequencing of a wild frog allowed us to rapidly identify the species captured, thus confirming that effective DNA barcoding in the field is possible. These results open new perspectives for real-time, on-site DNA sequencing, thus potentially increasing opportunities for the understanding of biodiversity in areas lacking conventional laboratory facilities. PMID:28977016
PreCisIon: PREdiction of CIS-regulatory elements improved by gene's positION.

PubMed

Elati, Mohamed; Nicolle, Rémy; Junier, Ivan; Fernández, David; Fekih, Rim; Font, Julio; Képès, François

2013-02-01

Conventional approaches to predict transcriptional regulatory interactions usually rely on the definition of a shared motif sequence on the target genes of a transcription factor (TF). These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices, which may match large numbers of sites and produce an unreliable list of target genes. To improve the prediction of binding sites, we propose to additionally use the unrelated knowledge of the genome layout. Indeed, it has been shown that co-regulated genes tend to be either neighbors or periodically spaced along the whole chromosome. This study demonstrates that respective gene positioning carries significant information. This novel type of information is combined with traditional sequence information by a machine learning algorithm called PreCisIon. To optimize this combination, PreCisIon builds a strong gene target classifier by adaptively combining weak classifiers based on either local binding sequence or global gene position. This strategy generically paves the way to the optimized incorporation of any future advances in gene target prediction based on local sequence, genome layout or on novel criteria. With the current state of the art, PreCisIon consistently improves methods based on sequence information only. This is shown by implementing a cross-validation analysis of the 20 major TFs from two phylogenetically remote model organisms. For Bacillus subtilis and Escherichia coli, respectively, PreCisIon achieves on average an area under the receiver operating characteristic curve of 70 and 60%, a sensitivity of 80 and 70% and a specificity of 60 and 56%. The newly predicted gene targets are demonstrated to be functionally consistent with previously known targets, as assessed by analysis of Gene Ontology enrichment or of the relevant literature and databases.
The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information

PubMed Central

Chen, Tsute; Yu, Wen-Han; Izard, Jacques; Baranova, Oxana V.; Lakshmanan, Abirami; Dewhirst, Floyd E.

2010-01-01

The human oral microbiome is the most studied human microflora, but 53% of the species have not yet been validly named and 35% remain uncultivated. The uncultivated taxa are known primarily from 16S rRNA sequence information. Sequence information tied solely to obscure isolate or clone numbers, and usually lacking accurate phylogenetic placement, is a major impediment to working with human oral microbiome data. The goal of creating the Human Oral Microbiome Database (HOMD) is to provide the scientific community with a body site-specific comprehensive database for the more than 600 prokaryote species that are present in the human oral cavity based on a curated 16S rRNA gene-based provisional naming scheme. Currently, two primary types of information are provided in HOMD—taxonomic and genomic. Named oral species and taxa identified from 16S rRNA gene sequence analysis of oral isolates and cloning studies were placed into defined 16S rRNA phylotypes and each given unique Human Oral Taxon (HOT) number. The HOT interlinks phenotypic, phylogenetic, genomic, clinical and bibliographic information for each taxon. A BLAST search tool is provided to match user 16S rRNA gene sequences to a curated, full length, 16S rRNA gene reference data set. For genomic analysis, HOMD provides comprehensive set of analysis tools and maintains frequently updated annotations for all the human oral microbial genomes that have been sequenced and publicly released. Oral bacterial genome sequences, determined as part of the Human Microbiome Project, are being added to the HOMD as they become available. We provide HOMD as a conceptual model for the presentation of microbiome data for other human body sites. Database URL: http://www.homd.org PMID:20624719
Identification of RAN1 orthologue associated with sex determination through whole genome sequencing analysis in fig (Ficus carica L.).

PubMed

Mori, Kazuki; Shirasawa, Kenta; Nogata, Hitoshi; Hirata, Chiharu; Tashiro, Kosuke; Habu, Tsuyoshi; Kim, Sangwan; Himeno, Shuichi; Kuhara, Satoru; Ikegami, Hidetoshi

2017-01-25

With the aim of identifying sex determinants of fig, we generated the first draft genome sequence of fig and conducted the subsequent analyses. Linkage analysis with a high-density genetic map established by a restriction-site associated sequencing technique, and genome-wide association study followed by whole-genome resequencing analysis identified two missense mutations in RESPONSIVE-TO-ANTAGONIST1 (RAN1) orthologue encoding copper-transporting ATPase completely associated with sex phenotypes of investigated figs. This result suggests that RAN1 is a possible sex determinant candidate in the fig genome. The genomic resources and genetic findings obtained in this study can contribute to general understanding of Ficus species and provide an insight into fig's and plant's sex determination system.
Nucleotide sequence and structural organization of the human vasopressin pituitary receptor (V3) gene.

PubMed

René, P; Lenne, F; Ventura, M A; Bertagna, X; de Keyzer, Y

2000-01-04

In the pituitary, vasopressin triggers ACTH release through a specific receptor subtype, termed V3 or V1b. We cloned the V3 cDNA and showed that its expression was almost exclusive to pituitary corticotrophs and some corticotroph tumors. To study the determinants of this tissue specificity, we have now cloned the gene for the human (h) V3 receptor and characterized its structure. It is composed of two exons, spanning 10kb, with the coding region interrupted between transmembrane domains 6 and 7. We established that the transcription initiation site is located 498 nucleotides upstream of the initiator codon and showed that two polyadenylation sites may be used, while the most frequent is the most downstream. Sequence analysis of the promoter region showed no TATA box but identified consensus binding motifs for Sp1, CREB, and half sites of the estrogen receptor binding site. However comparison with another corticotroph-specific gene, proopiomelanocortin, did not identify common regulatory elements in the two promoters except for a short GC-rich region. Unexpectedly, hV3 gene analysis revealed that a formerly cloned 'artifactual' hV3 cDNA indeed corresponded to a spliced antisense transcript, overlapping the 5' part of the coding sequence in exon 1 and the promoter region. This transcript, hV3rev, was detected in normal pituitary and in many corticotroph tumors expressing hV3 sense mRNA and may therefore play a role in hV3 gene expression.
The molecular mechanism for interaction of ceruloplasmin and myeloperoxidase

NASA Astrophysics Data System (ADS)

Bakhautdin, Bakytzhan; Bakhautdin, Esen Göksöy

2016-04-01

Ceruloplasmin (Cp) is a copper-containing ferroxidase with potent antioxidant activity. Cp is expressed by hepatocytes and activated macrophages and is known as a physiologic inhibitor of myeloperoxidase (MPO). The enzymatic activity of MPO produces anti-microbial agents and strong prooxidants such as hypochlorous acid and has the potential to damage host tissue at sites of inflammation and infection. Thus, Cp-MPO interaction and inhibition of MPO have previously been suggested as an important control mechanism of excessive MPO activity. Our aim in this study was to identify the minimal Cp domain or peptide that interacts with MPO. We first confirmed Cp-MPO interaction by ELISA and surface plasmon resonance (SPR). SPR analysis of the interaction yielded 30 nM affinity between Cp and MPO. We then designed and synthesized 87 overlapping peptides spanning the entire amino acid sequence of Cp. Each of the peptides was tested for binding to MPO by direct binding ELISA. Two of the 87 peptides, P18 and P76, strongly interacted with MPO. Amino acid sequence analysis of the identified peptides revealed high sequence and structural homology between them. Further structural analysis of Cp's crystal structure with PyMOL software revealed that both peptides represent surface-exposed sites of Cp and face nearly the same direction. To confirm our finding, we raised anti-P18 antiserum in rabbit and demonstrated that this antiserum disrupts Cp-MPO binding and rescues MPO activity. Collectively, our results confirm Cp-MPO interaction and identify two nearly identical sites on Cp that specifically bind MPO. We propose that inhibition of MPO by Cp requires two nearly identical sites on Cp to bind homodimeric MPO simultaneously and at an angle of at least 120 degrees, which, in turn, exerts tension on MPO and results in conformational change.
Analysis of the regulatory region of the protease III (ptr) gene of Escherichia coli K-12.

PubMed

Claverie-Martin, F; Diaz-Torres, M R; Kushner, S R

1987-01-01

The ptr gene of Escherichia coli encodes protease III (Mr 110,000) and a 50-kDa polypeptide, both of which are found in the periplasmic space. The gene is physically located between the recC and recB loci on the E. coli chromosome. The nucleotide sequence of a 1167-bp EcoRV-ClaI fragment of chromosomal DNA containing the promoter region and 885 bp of the ptr coding sequence has been determined. S1 nuclease mapping analysis showed that the major 5' end of the ptr mRNA was localized 127 bp upstream from the ATG start codon. The open reading frame (ORF), preceded by a Shine-Dalgarno sequence, extends to the end of the sequenced DNA. Downstream from the -35 and -10 regions is a sequence that strongly fits the consensus sequence of known nitrogen-regulated promoters. A signal peptide of 23 amino acids residues is present at the N terminus of the derived amino acid sequence. The cleavage site as well as the ORF were confirmed by sequencing the N terminus of mature protease III.
Diversity and distribution patterns of root-associated fungi on herbaceous plants in alpine meadows of southwestern China.

PubMed

Gao, Qian; Yang, Zhu L

2016-01-01

The diversity of root-associated fungi associated with four ectomycorrhizal herbaceous species, Kobresia capillifolia, Carex parva, Polygonum macrophyllum and Potentilla fallens, collected in three sites of alpine meadows in southwestern China, was estimated based on internal transcribed spacer (ITS) rDNA sequence analysis of root tips. Three hundred seventy-seven fungal sequences sorted to 154 operational taxonomic units (OTUs; sequence similarity of ≥ 97% across the ITS) were obtained from the four plant species across all three sites. For most of the OTUs, no similar taxa (≥ 97% similarity) were found in GenBank and/or UNITE. Ectomycorrhiza made up 64% of the fungal OTUs, endophytes constituted 4% and the other 33% were unidentified root-associated fungi. Fungal OTUs were represented by 57% basidiomycetes and 43% ascomycetes. Inocybe, Tomentella/Thelophora, Sebacina, Hebeloma, Pezizomycotina, Cenococcum geophilum complex, Cortinarius, Lactarius and Helotiales were OTU-rich fungal lineages. Across the sites and host species the root-associated fungal communities generally exhibited low host and site specificity but high host and sampling site preference. Collectively our study revealed noteworthy diversity and endemism of root-associated fungi of alpine plants in this global biodiversity hotspot. © 2016 by The Mycological Society of America.
Characterization of proviruses cloned from mink cell focus-forming virus-infected cellular DNA.

PubMed Central

Khan, A S; Repaske, R; Garon, C F; Chan, H W; Rowe, W P; Martin, M A

1982-01-01

Two proviruses were cloned from EcoRI-digested DNA extracted from mink cells chronically infected with AKR mink cell focus-forming (MCF) 247 murine leukemia virus (MuLV), using a lambda phage host vector system. One cloned MuLV DNA fragment (designated MCF 1) contained sequences extending 6.8 kilobases from an EcoRI restriction site in the 5' long terminal repeat (LTR) to an EcoRI site located in the envelope (env) region and was indistinguishable by restriction endonuclease mapping for 5.1 kilobases (except for the EcoRI site in the LTR) from the 5' end of AKR ecotropic proviral DNA. The DNA segment extending from 5.1 to 6.8 kilobases contained several restriction sites that were not present in the AKR ecotropic provirus. A 0.5-kilobase DNA segment located at the 3' end of MCF 1 DNA contained sequences which hybridized to a xenotropic env-specific DNA probe but not to labeled ecotropic env-specific DNA. This dual character of MCF 1 proviral DNA was also confirmed by analyzing heteroduplex molecules by electron microscopy. The second cloned proviral DNA (designated MCF 2) was a 6.9-kilobase EcoRI DNA fragment which contained LTR sequences at each end and a 2.0-kilobase deletion encompassing most of the env region. The MCF 2 proviral DNA proved to be a useful reagent for detecting LTRs electron microscopically due to the presence of nonoverlapping, terminally located LTR sequences which effected its circularization with DNAs containing homologous LTR sequences. Nucleotide sequence analysis demonstrated the presence of a 104-base-pair direct repeat in the LTR of MCF 2 DNA. In contrast, only a single copy of the reiterated component of the direct repeat was present in MCF 1 DNA. PMID:6281459

Nucleic acid analysis using terminal-phosphate-labeled nucleotides

DOEpatents

Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

2008-04-22

The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
Combining Comprehensive Analysis of Off-Site Lambda Phage Integration with a CRISPR-Based Means of Characterizing Downstream Physiology.

PubMed

Tanouchi, Yu; Covert, Markus W

2017-09-19

During its lysogenic life cycle, the phage genome is integrated into the host chromosome by site-specific recombination. In this report, we analyze lambda phage integration into noncanonical sites using next-generation sequencing and show that it generates significant genetic diversity by targeting over 300 unique sites in the host Escherichia coli genome. Moreover, these integration events can have important phenotypic consequences for the host, including changes in cell motility and increased antibiotic resistance. Importantly, the new technologies that we developed to enable this study-sequencing secondary sites using next-generation sequencing and then selecting relevant lysogens using clustered regularly interspaced short palindromic repeat (CRISPR)/Cas9-based selection-are broadly applicable to other phage-bacterium systems. IMPORTANCE Bacteriophages play an important role in bacterial evolution through lysogeny, where the phage genome is integrated into the host chromosome. While phage integration generally occurs at a specific site in the host chromosome, it is also known to occur at other, so-called secondary sites. In this study, we developed a new experimental technology to comprehensively study secondary integration sites and discovered that phage can integrate into over 300 unique sites in the host genome, resulting in significant genetic diversity in bacteria. We further developed an assay to examine the phenotypic consequence of such diverse integration events and found that phage integration can cause changes in evolutionarily relevant traits such as bacterial motility and increases in antibiotic resistance. Importantly, our method is readily applicable to other phage-bacterium systems. Copyright © 2017 Tanouchi and Covert.
Accurate detection for a wide range of mutation and editing sites of microRNAs from small RNA high-throughput sequencing profiles

PubMed Central

Zheng, Yun; Ji, Bo; Song, Renhua; Wang, Shengpeng; Li, Ting; Zhang, Xiaotuo; Chen, Kun; Li, Tianqing; Li, Jinyan

2016-01-01

Various types of mutation and editing (M/E) events in microRNAs (miRNAs) can change the stabilities of pre-miRNAs and/or complementarities between miRNAs and their targets. Small RNA (sRNA) high-throughput sequencing (HTS) profiles can contain many mutated and edited miRNAs. Systematic detection of miRNA mutation and editing sites from the huge volume of sRNA HTS profiles is computationally difficult, as high sensitivity and a low false positive rate (FPR) are both required. We propose a novel method (named MiRME) for accurate and fast detection of miRNA M/E sites using a progressive sequence alignment approach which refines sensitivity and improves FPR step by step. From 70 sRNA HTS profiles with over 1.3 billion reads, MiRME has detected thousands of statistically significant M/E sites, including 3′-editing sites, 57 A-to-I editing sites (of which 32 are novel), as well as some putative non-canonical editing sites. By integrating the analysis of genome HTS profiles of two human cell lines, we demonstrated that a few non-canonical editing sites did not result from mutations in the genome, suggesting the existence of new editing types that further diversify the functions of miRNAs. Compared with six existing studies or methods, MiRME has shown much superior performance for the identification and visualization of the M/E sites of miRNAs from the ever-increasing sRNA HTS profiles. PMID:27229138
Study of Streptococcus thermophilus population on a world-wide and historical collection by a new MLST scheme.

PubMed

Delorme, Christine; Legravet, Nicolas; Jamet, Emmanuel; Hoarau, Caroline; Alexandre, Bolotin; El-Sharoud, Walid M; Darwish, Mohamed S; Renault, Pierre

2017-02-02

We analyzed 178 Streptococcus thermophilus strains isolated from diverse products, from around the world, over a 60-year period with a new multilocus sequence typing (MLST) scheme. This collection included isolates from two traditional cheese-making sites with different starter-use practices, in sampling campaigns carried out over a three years period. The nucleotide diversity of the S. thermophilus population was limited, but 116 sequence types (ST) were identified. Phylogenetic analysis of the concatenated sequences of the six housekeeping genes revealed the existence of groups confirmed by eBURST analysis. Deeper analyses performed on 25 strains by CRISPR and whole-genome analysis showed that phylogenies obtained by MLST and whole-genome analysis were in agreement but differed from that inferred by CRISPR analysis. Strains isolated from traditional products could cluster in specific groups indicating their origin, but also be mixed in groups containing industrial starter strains. In the traditional cheese-making sites, we found that S. thermophilus persisted on dairy equipment, but that occasionally added starter strains may become dominant. It underlined the impact of starter use that may reshape S. thermophilus populations including in traditional products. This new MLST scheme thus provides a framework for analyses of S. thermophilus populations and the management of its biodiversity. Copyright © 2016 Elsevier B.V. All rights reserved.
Structural and functional analysis of mouse Msx1 gene promoter: sequence conservation with human MSX1 promoter points at potential regulatory elements.

PubMed

Gonzalez, S M; Ferland, L H; Robert, B; Abdelhay, E

1998-06-01

Vertebrate Msx genes are related to one of the most divergent homeobox genes of Drosophila, the muscle segment homeobox (msh) gene, and are expressed in a well-defined pattern at sites of tissue interactions. This pattern of expression is conserved in vertebrates as diverse as quail, zebrafish, and mouse in a range of sites including neural crest, appendages, and craniofacial structures. In the present work, we performed structural and functional analyses in order to identify potential cis-acting elements that may be regulating Msx1 gene expression. To this end, a 4.9-kb segment of the 5'-flanking region was sequenced and analyzed for transcription-factor binding sites. Four regions showing a high concentration of these sites were identified. Transfection assays with fragments of regulatory sequences driving the expression of the bacterial lacZ reporter gene showed that a region of 4 kb upstream of the transcription start site contains positive and negative elements responsible for controlling gene expression. Interestingly, a fragment of 130 bp seems to contain the minimal elements necessary for gene expression, as its removal completely abolishes gene expression in cultured cells. These results are reinforced by comparison of this region with the human Msx1 gene promoter, which shows extensive conservation, including many consensus binding sites, suggesting a regulatory role for them.
Molecular analysis of carbon monoxide-oxidizing bacteria associated with recent Hawaiian volcanic deposits.

PubMed

Dunfield, Kari E; King, Gary M

2004-07-01

Genomic DNA extracts from four sites at Kilauea Volcano were used as templates for PCR amplification of the large subunit (coxL) of aerobic carbon monoxide dehydrogenase. The sites included a 42-year-old tephra deposit, a 108-year-old lava flow, a 212-year-old partially vegetated ash-and-tephra deposit, and an approximately 300-year-old forest. PCR primers amplified coxL sequences from the OMP clade of CO oxidizers, which includes isolates such as Oligotropha carboxidovorans, Mycobacterium tuberculosis, and Pseudomonas thermocarboxydovorans. PCR products were used to create clone libraries that provide the first insights into the diversity and phylogenetic affiliations of CO oxidizers in situ. On the basis of phylogenetic and statistical analyses, clone libraries for each site were distinct. Although some clone sequences were similar to coxL sequences from known organisms, many sequences appeared to represent phylogenetic lineages not previously known to harbor CO oxidizers. On the basis of average nucleotide diversity and average pairwise difference, a forested site supported the most diverse CO-oxidizing populations, while an 1894 lava flow supported the least diverse populations. Neither parameter correlated with previous estimates of atmospheric CO uptake rates, but both parameters correlated positively with estimates of microbial biomass and respiration. Collectively, the results indicate that the CO oxidizer functional group associated with recent volcanic deposits of the remote Hawaiian Islands contains substantial and previously unsuspected diversity.
Molecular Analysis of Carbon Monoxide-Oxidizing Bacteria Associated with Recent Hawaiian Volcanic Deposits

PubMed Central

Dunfield, Kari E.; King, Gary M.

2004-01-01

Genomic DNA extracts from four sites at Kilauea Volcano were used as templates for PCR amplification of the large subunit (coxL) of aerobic carbon monoxide dehydrogenase. The sites included a 42-year-old tephra deposit, a 108-year-old lava flow, a 212-year-old partially vegetated ash-and-tephra deposit, and an approximately 300-year-old forest. PCR primers amplified coxL sequences from the OMP clade of CO oxidizers, which includes isolates such as Oligotropha carboxidovorans, Mycobacterium tuberculosis, and Pseudomonas thermocarboxydovorans. PCR products were used to create clone libraries that provide the first insights into the diversity and phylogenetic affiliations of CO oxidizers in situ. On the basis of phylogenetic and statistical analyses, clone libraries for each site were distinct. Although some clone sequences were similar to coxL sequences from known organisms, many sequences appeared to represent phylogenetic lineages not previously known to harbor CO oxidizers. On the basis of average nucleotide diversity and average pairwise difference, a forested site supported the most diverse CO-oxidizing populations, while an 1894 lava flow supported the least diverse populations. Neither parameter correlated with previous estimates of atmospheric CO uptake rates, but both parameters correlated positively with estimates of microbial biomass and respiration. Collectively, the results indicate that the CO oxidizer functional group associated with recent volcanic deposits of the remote Hawaiian Islands contains substantial and previously unsuspected diversity. PMID:15240307
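
The two diversity parameters named in this abstract, average pairwise difference and average nucleotide diversity, can be illustrated with a minimal Python sketch. This is not the authors' analysis pipeline: it assumes the clone sequences are already aligned to equal length, treats every mismatched column (including gaps) as a difference, and uses short made-up sequences (`clones`) purely for illustration. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_differences(a: str, b: str) -> int:
    """Count mismatched columns between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def diversity_stats(seqs):
    """Return (average pairwise difference, nucleotide diversity per site).

    Nucleotide diversity is taken here as the average pairwise difference
    divided by the alignment length (an assumption of this sketch).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(seqs, 2))
    mean_diff = sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
    return mean_diff, mean_diff / len(seqs[0])


# Illustrative toy alignment, not data from the study.
clones = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
mean_diff, pi = diversity_stats(clones)
print(f"average pairwise difference = {mean_diff:.3f}, nucleotide diversity = {pi:.3f}")
```

Computing both values per clone library would allow sites to be ranked from least to most diverse, in the way the abstract compares the forested site with the 1894 lava flow.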
Reduced-median-network analysis of complete mitochondrial DNA coding-region sequences for the major African, Asian, and European haplogroups.

PubMed

Herrnstadt, Corinna; Elson, Joanna L; Fahy, Eoin; Preston, Gwen; Turnbull, Douglass M; Anderson, Christen; Ghosh, Soumitra S; Olefsky, Jerrold M; Beal, M Flint; Davis, Robert E; Howell, Neil

2002-05-01

The evolution of the human mitochondrial genome is characterized by the emergence of ethnically distinct lineages or haplogroups. Nine European, seven Asian (including Native American), and three African mitochondrial DNA (mtDNA) haplogroups have been identified previously on the basis of the presence or absence of a relatively small number of restriction-enzyme recognition sites or on the basis of nucleotide sequences of the D-loop region. We have used reduced-median-network approaches to analyze 560 complete European, Asian, and African mtDNA coding-region sequences from unrelated individuals to develop a more complete understanding of sequence diversity both within and between haplogroups. A total of 497 haplogroup-associated polymorphisms were identified, 323 (65%) of which were associated with one haplogroup and 174 (35%) of which were associated with two or more haplogroups. Approximately one-half of these polymorphisms are reported for the first time here. Our results confirm and substantially extend the phylogenetic relationships among mitochondrial genomes described elsewhere from the major human ethnic groups. Another important result is that there were numerous instances both of parallel mutations at the same site and of reversion (i.e., homoplasy). It is likely that homoplasy in the coding region will confound evolutionary analysis of small sequence sets. By a linkage-disequilibrium approach, additional evidence for the absence of human mtDNA recombination is presented here.
[Using IRAP markers for analysis of genetic variability in populations of resource and rare species of plants].

PubMed

Boronnikova, S V; Kalendar', R N

2010-01-01

Species-specific LTR retrotransposons were first cloned in five rare relic species of drug plants located in the Perm' region. Sequences of LTR retrotransposons were used for PCR analysis based on amplification of repeated sequences from LTR or other sites of retrotransposons (IRAP). Genetic diversity was studied in six populations of rare relic species of plants Adonis vernalis L. by means of the IRAP method; 125 polymorphic IRAP-markers were analyzed. Parameters for DNA polymorphism and genetic diversity of A. vernalis populations were determined.
Analysis of drug binding pockets and repurposing opportunities for twelve essential enzymes of ESKAPE pathogens

PubMed Central

Naz, Sadia; Ngo, Tony; Farooq, Umar

2017-01-01

Background: The rapid increase in antibiotic resistance by various bacterial pathogens underlies the significance of developing new therapies and exploring different drug targets. A fraction of bacterial pathogens abbreviated as ESKAPE by the European Center for Disease Prevention and Control have been considered a major threat due to the rise in nosocomial infections. Here, we compared putative drug binding pockets of twelve essential and mostly conserved metabolic enzymes in numerous bacterial pathogens including those of the ESKAPE group and Mycobacterium tuberculosis. The comparative analysis will provide guidelines for the likelihood of transferability of the inhibitors from one species to another. Methods: Nine bacterial species including six ESKAPE pathogens, Mycobacterium tuberculosis along with Mycobacterium smegmatis and Escherichia coli, two non-pathogenic bacteria, have been selected for drug binding pocket analysis of twelve essential enzymes. The amino acid sequences were obtained from Uniprot, aligned using ICM v3.8-4a and matched against the Pocketome encyclopedia. We used known co-crystal structures of selected target enzyme orthologs to evaluate the location of their active sites and binding pockets and to calculate a matrix of pairwise sequence identities for each target enzyme across the different species. This was used to generate sequence maps. Results: High sequence identity of enzyme binding pockets, derived from experimentally determined co-crystallized structures, was observed among various species. Comparison at both full sequence level and for drug binding pockets of key metabolic enzymes showed that binding pockets are highly conserved (sequence similarity up to 100%) among various ESKAPE pathogens as well as Mycobacterium tuberculosis. Enzyme orthologs having conserved binding sites may have the potential to interact with inhibitors in a similar way and might be helpful for the design of a similar class of inhibitors for a particular species. The derived pocket alignments and distance-based maps provide guidelines for drug discovery and repurposing. In addition, they also provide recommendations for the relevant model bacteria that may be used for initial drug testing. Discussion: Comparing ligand binding sites through sequence identity calculation could be an effective approach to identify conserved orthologs, as drug binding pockets have shown a higher level of conservation among various species. By using this approach we could avoid the problems associated with full sequence comparison. We identified essential metabolic enzymes among ESKAPE pathogens that share high sequence identity in their putative drug binding pockets (up to 100%), of which known inhibitors can potentially antagonize these identical pockets in the various species in a similar manner. PMID:28948099
Analysis of drug binding pockets and repurposing opportunities for twelve essential enzymes of ESKAPE pathogens.

PubMed

Naz, Sadia; Ngo, Tony; Farooq, Umar; Abagyan, Ruben

2017-01-01

The rapid increase in antibiotic resistance by various bacterial pathogens underlies the significance of developing new therapies and exploring different drug targets. A fraction of bacterial pathogens abbreviated as ESKAPE by the European Center for Disease Prevention and Control have been considered a major threat due to the rise in nosocomial infections. Here, we compared putative drug binding pockets of twelve essential and mostly conserved metabolic enzymes in numerous bacterial pathogens including those of the ESKAPE group and Mycobacterium tuberculosis. The comparative analysis will provide guidelines for the likelihood of transferability of the inhibitors from one species to another. Nine bacterial species including six ESKAPE pathogens, Mycobacterium tuberculosis along with Mycobacterium smegmatis and Escherichia coli, two non-pathogenic bacteria, have been selected for drug binding pocket analysis of twelve essential enzymes. The amino acid sequences were obtained from UniProt, aligned using ICM v3.8-4a and matched against the Pocketome encyclopedia. We used known co-crystal structures of selected target enzyme orthologs to evaluate the location of their active sites and binding pockets and to calculate a matrix of pairwise sequence identities across each target enzyme across the different species. This was used to generate sequence maps. High sequence identity of enzyme binding pockets, derived from experimentally determined co-crystallized structures, was observed among various species. Comparison at both full sequence level and for drug binding pockets of key metabolic enzymes showed that binding pockets are highly conserved (sequence similarity up to 100%) among various ESKAPE pathogens as well as Mycobacterium tuberculosis. Enzyme orthologs having conserved binding sites may have potential to interact with inhibitors in a similar way and might be helpful for design of a similar class of inhibitors for a particular species. 
The derived pocket alignments and distance-based maps provide guidelines for drug discovery and repurposing. In addition they also provide recommendations for the relevant model bacteria that may be used for initial drug testing. Comparing ligand binding sites through sequence identity calculation could be an effective approach to identify conserved orthologs as drug binding pockets have shown higher level of conservation among various species. By using this approach we could avoid the problems associated with full sequence comparison. We identified essential metabolic enzymes among ESKAPE pathogens that share high sequence identity in their putative drug binding pockets (up to 100%), of which known inhibitors can potentially antagonize these identical pockets in the various species in a similar manner.
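The pairwise identity matrix described in this abstract can be illustrated with a minimal sketch. It assumes the pocket residues have already been aligned to equal length with "-" for gaps; the species labels, the toy sequences, and the function names are hypothetical, and the gap handling (skipping any column with a gap) is one common convention, not necessarily the one used by the authors.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def identity_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric matrix of pairwise percent identities, keyed by name pairs."""
    names = sorted(aligned)
    matrix = {(n, n): 100.0 for n in names}
    for p, q in combinations(names, 2):
        value = percent_identity(aligned[p], aligned[q])
        matrix[(p, q)] = matrix[(q, p)] = value
    return matrix


# Hypothetical aligned pocket residues for three species (illustration only).
pockets = {
    "species_A": "GDSLK-HT",
    "species_B": "GDSLKAHT",
    "species_C": "GESLR-HT",
}
matrix = identity_matrix(pockets)
for (p, q), value in sorted(matrix.items()):
    print(f"{p}\t{q}\t{value:.1f}")
```

A matrix of this kind is what a distance-based map would be built from; restricting the input to pocket residues rather than full sequences is the point the abstract makes.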
Protospacer Adjacent Motif (PAM)-Distal Sequences Engage CRISPR Cas9 DNA Target Cleavage

PubMed Central

Ethier, Sylvain; Schmeing, T. Martin; Dostie, Josée; Pelletier, Jerry

2014-01-01

The clustered regularly interspaced short palindromic repeat (CRISPR)-associated enzyme Cas9 is an RNA-guided nuclease that has been widely adapted for genome editing in eukaryotic cells. However, the in vivo target specificity of Cas9 is poorly understood and most studies rely on in silico predictions to define the potential off-target editing spectrum. Using chromatin immunoprecipitation followed by sequencing (ChIP-seq), we delineate the genome-wide binding panorama of catalytically inactive Cas9 directed by two different single guide (sg) RNAs targeting the Trp53 locus. Cas9:sgRNA complexes are able to load onto multiple sites with short seed regions adjacent to 5′NGG3′ protospacer adjacent motifs (PAM). Yet among 43 ChIP-seq sites harboring seed regions analyzed for mutational status, we find editing only at the intended on-target locus and one off-target site. In vitro analysis of target site recognition revealed that interactions between the 5′ end of the guide and PAM-distal target sequences are necessary to efficiently engage Cas9 nucleolytic activity, providing an explanation for why off-target editing is significantly lower than expected from ChIP-seq data. PMID:25275497
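As an illustration of the kind of in silico prediction this abstract refers to, the sketch below lists positions where the PAM-proximal seed of a guide is immediately followed by a 5′NGG3′ PAM. The seed length of 5, the function names, and the toy sequences are assumptions made for the example; minus-strand positions are reported in reverse-complement coordinates.

```python
import re


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def seed_pam_sites(genome: str, guide: str, seed_len: int = 5) -> list[tuple[int, str]]:
    """Return (position, strand) for every occurrence of seed + NGG.

    The seed is taken as the last `seed_len` bases of the guide (the
    PAM-proximal end). A lookahead is used so overlapping hits are found.
    """
    seed = guide[-seed_len:]
    pattern = re.compile(f"(?=({re.escape(seed)}[ACGT]GG))")
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for m in pattern.finditer(seq):
            hits.append((m.start(1), strand))
    return hits


# Toy example: a 20-nt guide whose 5-nt seed is ACGTA (hypothetical sequences).
genome = "TTACGTAAGGCC"
guide = "GGGGGGGGGGGGGGGACGTA"
print(seed_pam_sites(genome, guide))
```

Such a scan only predicts where a Cas9:sgRNA complex could load; the abstract's finding is that cleavage additionally depends on pairing between the 5′ end of the guide and PAM-distal target sequence, which a seed-only search does not check.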
Forensic massively parallel sequencing data analysis tool: Implementation of MyFLq as a standalone web- and Illumina BaseSpace(®)-application.

PubMed

Van Neste, Christophe; Gansemans, Yannick; De Coninck, Dieter; Van Hoofstat, David; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

2015-03-01

Routine use of massively parallel sequencing (MPS) for forensic genomics is on the horizon. Over the last few years, several algorithms and workflows have been developed to analyze forensic MPS data. However, none have yet been tailored to the needs of the forensic analyst who does not possess an extensive bioinformatics background. We developed our previously published forensic MPS data analysis framework MyFLq (My-Forensic-Loci-queries) into an open-source, user-friendly, web-based application. It can be installed as a standalone web application, or run directly from the Illumina BaseSpace environment. In the former, laboratories can keep their data on-site, while in the latter, data from forensic samples that are sequenced on an Illumina sequencer can be uploaded to BaseSpace during acquisition, and can subsequently be analyzed using the published MyFLq BaseSpace application. Additional features were implemented such as an interactive graphical report of the results, an interactive threshold selection bar, and an allele length-based analysis in addition to the sequence-based analysis. Practical use of the application is demonstrated through the analysis of four 16-plex short tandem repeat (STR) samples, showing the complementarity between the sequence- and length-based analysis of the same MPS data. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
Genetic variability of Rickettsia spp. in Ixodes persulcatus ticks from continental and island areas of the Russian Far East.

PubMed

Igolkina, Y; Bondarenko, E; Rar, V; Epikhina, T; Vysochina, N; Pukhovskaya, N; Tikunov, A; Ivanov, L; Golovljova, I; Ivanov, M; Tikunova, N

2016-10-01

Rickettsia spp. are intracellular Gram-negative bacteria transmitted by arthropods. Two potentially pathogenic rickettsiae, Candidatus Rickettsia tarasevichiae and Rickettsia helvetica, have been found in unfed adult Ixodes persulcatus ticks. The aim of this study was to assess the prevalence and genetic variability of Rickettsia spp. in I. persulcatus ticks collected from different locations in the Russian Far East. In total, 604 adult I. persulcatus ticks collected from four sites in the Khabarovsk Territory (continental area) and one site in Sakhalin Island were examined for the presence of Rickettsia spp. by real-time PCR. Nested PCR with species-specific primers and sequencing were used for genotyping of revealed rickettsiae. The overall prevalence of Rickettsia spp. in ticks collected in different sites varied from 67.9 to 90.7%. However, the proportion of different Rickettsia species observed in ticks from Sakhalin Island significantly differed from that in ticks from the Khabarovsk Territory. In Sakhalin Island, R. helvetica prevailed in examined ticks, while Candidatus R. tarasevichiae was predominant in the Khabarovsk Territory. For gltA and ompB gene fragments, the sequences obtained for Candidatus R. tarasevichiae from all studied sites were identical to each other and to the known sequences of this species. According to sequence analysis of gltA, ompB and sca4 genes, R. helvetica isolates from Sakhalin Island and the Khabarovsk Territory were identical to each other, but they differed from R. helvetica from other regions and from those found in other tick species. For the first time, DNA of pathogenic Rickettsia heilongjiangensis was detected in I. persulcatus ticks in two sites from the Khabarovsk Territory. The gltA, ompA and ompB gene sequences of R. heilongjiangensis were identical to or had solitary mismatches with the corresponding sequences of R. heilongjiangensis found in other tick species. Copyright © 2016 Elsevier GmbH. All rights reserved.
Diversity of Functionally Permissive Sequences in the Receptor-Binding Site of Influenza Hemagglutinin.

PubMed

Wu, Nicholas C; Xie, Jia; Zheng, Tianqing; Nycholat, Corwin M; Grande, Geramie; Paulson, James C; Lerner, Richard A; Wilson, Ian A

2017-06-14

Influenza A virus hemagglutinin (HA) initiates viral entry by engaging host receptor sialylated glycans via its receptor-binding site (RBS). The amino acid sequence of the RBS naturally varies across avian and human influenza virus subtypes and is also evolvable. However, functional sequence diversity in the RBS has not been fully explored. Here, we performed a large-scale mutational analysis of the RBS of A/WSN/33 (H1N1) and A/Hong Kong/1/1968 (H3N2) HAs. Many replication-competent mutants not yet observed in nature were identified, including some that could escape from an RBS-targeted broadly neutralizing antibody. This functional sequence diversity is made possible by pervasive epistasis in the RBS 220-loop and can be buffered by avidity in viral receptor binding. Overall, our study reveals that the HA RBS can accommodate a much greater range of sequence diversity than previously thought, which has significant implications for the complex evolutionary interrelationships between receptor specificity and immune escape. Copyright © 2017 Elsevier Inc. All rights reserved.
Transcription Factor Map Alignment of Promoter Regions

PubMed Central

Blanco, Enrique; Messeguer, Xavier; Smith, Temple F; Guigó, Roderic

2006-01-01

We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments. PMID:16733547
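The idea of aligning sequences of factor labels rather than nucleotides can be sketched with an ordinary global alignment over label lists. This is a simplification under stated assumptions: the authors adapted a restriction-map alignment algorithm, whereas the sketch below is plain Needleman-Wunsch scoring that ignores site positions and prediction scores; the scoring values, function name, and factor labels are hypothetical.

```python
def align_labels(a: list[str], b: list[str],
                 match: float = 1.0, mismatch: float = -1.0,
                 gap: float = -1.0) -> float:
    """Global alignment score of two label sequences (Needleman-Wunsch)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    # Leading gaps along the first column and first row.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


# Hypothetical TF-maps: predicted binding-site labels in promoter order.
human_map = ["SP1", "TBP", "CEBP"]
mouse_map = ["SP1", "CEBP"]
print(align_labels(human_map, mouse_map))
```

Two promoters with no detectable nucleotide conservation can still score well here if they share the same factors in the same order, which is the property the abstract exploits.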
Structural analysis of two length variants of the rDNA intergenic spacer from Eruca sativa.

PubMed

Lakshmikumaran, M; Negi, M S

1994-03-01

Restriction enzyme analysis of the rRNA genes of Eruca sativa indicated the presence of many length variants within a single plant and also between different cultivars which is unusual for most crucifers studied so far. Two length variants of the rDNA intergenic spacer (IGS) from a single individual E. sativa (cv. Itsa) plant were cloned and characterized. The complete nucleotide sequences of both the variants (3 kb and 4 kb) were determined. The intergenic spacer contains three families of tandemly repeated DNA sequences denoted as A, B and C. However, the long (4 kb) variant shows the presence of an additional repeat, denoted as D, which is a duplication of a 224 bp sequence just upstream of the putative transcription initiation site. Repeat units belonging to the three different families (A, B and C) were in the size range of 22 to 30 bp. Such short repeat elements are present in the IGS of most of the crucifers analysed so far. Sequence analysis of the variants (3 kb and 4 kb) revealed that the length heterogeneity of the spacer is located at three different regions and is due to the varying copy numbers of repeat units belonging to families A and B. Length variation of the spacer is also due to the presence of a large duplication (D repeats) in the 4 kb variant which is absent in the 3 kb variant. The putative transcription initiation site was identified by comparisons with the rDNA sequences from other plant species.
Genetic diversity of Grapevine virus A in Washington and California vineyards.

PubMed

Alabi, Olufemi J; Al Rwahnih, Maher; Mekuria, Tefera A; Naidu, Rayapati A

2014-05-01

Grapevine virus A (GVA; genus Vitivirus, family Betaflexiviridae) has been implicated with the Kober stem grooving disorder of the rugose wood disease complex. In this study, 26 isolates of GVA recovered from wine grape (Vitis vinifera) cultivars from California and Washington were analyzed for their genetic diversity. An analysis of a portion of the RNA-dependent RNA polymerase (RdRp) and complete coat protein (CP) sequences revealed intra- and inter-isolate sequence diversity. Our results indicated that both RdRp and CP are under strong negative selection based on the normalized values for the ratio of nonsynonymous substitutions per nonsynonymous site to synonymous substitutions per synonymous site. A global phylogenetic analysis of CP sequences revealed segregation of virus isolates into four major clades with no geographic clustering. In contrast, the RdRp-based phylogenetic tree indicated segregation of GVA isolates from California and Washington into six clades, independent of geographic origin or cultivar. Phylogenetic network coupled with recombination analyses showed putative recombination events in both RdRp and CP sequence data sets, with more of these events located in the CP sequence. The preponderance of divergent variants of GVA co-replicating within individual grapevines could increase viral genotypic complexity with implications for phylogenetic analysis and evolutionary history of the virus. The knowledge of genetic diversity of GVA generated in this study will provide a foundation for elucidating the epidemiological characteristics of virus populations at different scales and implementing appropriate management strategies for minimizing the spread of genetic variants of the virus by vectors and via planting materials supplied to nurseries and grape growers.
cis-acting intron mutations that affect the efficiency of avian retroviral RNA splicing: implication for mechanisms of control.

PubMed Central

Katz, R A; Kotler, M; Skalka, A M

1988-01-01

The full-length retroviral RNA transcript serves as (i) mRNA for the gag and pol gene products, (ii) genomic RNA that is assembled into progeny virions, and (iii) a pre-mRNA for spliced subgenomic mRNAs. Therefore, a balance of spliced and unspliced RNA is required to generate the appropriate levels of protein and RNA products for virion production. We have introduced an insertion mutation near the avian sarcoma virus env splice acceptor site that results in a significant increase in splicing to form functional env mRNA. The mutant virus is replication defective, but phenotypic revertant viruses that have acquired second-site mutations near the splice acceptor site can be isolated readily. Detailed analysis of one of these viruses revealed that a single nucleotide change at -20 from the splice acceptor site, within the original mutagenic insert, was sufficient to restore viral growth and significantly decrease splicing efficiency compared with the original mutant and wild-type viruses. Thus, minor sequence alterations near the env splice acceptor site can produce major changes in the balance of spliced and unspliced RNAs. Our results suggest a mechanism of control in which splicing is modulated by cis-acting sequences at the env splice acceptor site. Furthermore, this retroviral system provides a powerful genetic method for selection and analysis of mutations that affect splicing control. PMID:2839694
Analysis of capsid portal protein and terminase functional domains: interaction sites required for DNA packaging in bacteriophage T4.

PubMed

Lin, H; Rao, V B; Black, L W

1999-06-04

Bacteriophage DNA packaging results from an ATP-driven translocation of concatemeric DNA into the prohead by the phage terminase complexed with the portal vertex dodecamer of the prohead. Functional domains of the bacteriophage T4 terminase and portal gene 20 product (gp20) were determined by mutant analysis and sequence localization within the structural genes. Interaction regions of the portal vertex and large terminase subunit (gp17) were determined by genetic (terminase-portal intergenic suppressor mutations), biochemical (column retention of gp17 and inhibition of in vitro DNA packaging by gp20 peptides), and immunological (co-immunoprecipitation of polymerized gp20 peptide and gp17) studies. The specificity of the interaction was tested by means of a phage T4 HOC (highly antigenic outer capsid protein) display system in which wild-type, cs20, and scrambled portal peptide sequences were displayed on the HOC protein of phage T4. Binding affinities of these recombinant phages, as determined by the retention of these phages by a His-tag immobilized gp17 column and by co-immunoprecipitation with purified terminase, supported the specific nature of the portal protein and terminase interaction sites. In further support of specificity, a gp20 peptide corresponding to a portion of the identified site inhibited packaging whereas the scrambled sequence peptide did not block DNA packaging in vitro. The portal interaction site is localized to 28 residues in the central portion of the linear sequence of gp20 (524 residues). As judged by two pairs of intergenic portal-terminase suppressor mutations, two separate regions of the terminase large subunit gp17 (central and COOH-terminal) interact through hydrophobic contacts at the portal site. 
Although the terminase apparently interacts with this gp20 portal peptide, polyclonal antibody against the portal peptide appears unable to access it in the native structure, suggesting intimate association of gp20 and gp17 possibly internalizes terminase regions within the portal in the packasome complex. Both similarities and differences are seen in comparison to analogous sites which have been identified in phages T3 and lambda. Copyright 1999 Academic Press.

Plasmid integration in a wide range of bacteria mediated by the integrase of Lactobacillus delbrueckii bacteriophage mv4.

PubMed Central

Auvray, F; Coddeville, M; Ritzenthaler, P; Dupont, L

1997-01-01

Bacteriophage mv4 is a temperate phage infecting Lactobacillus delbrueckii subsp. bulgaricus. During lysogenization, the phage integrates its genome into the host chromosome at the 3' end of a tRNA(Ser) gene through a site-specific recombination process (L. Dupont et al., J. Bacteriol., 177:586-595, 1995). A nonreplicative vector (pMC1) based on the mv4 integrative elements (attP site and integrase-coding int gene) is able to integrate into the chromosome of a wide range of bacterial hosts, including Lactobacillus plantarum, Lactobacillus casei (two strains), Lactococcus lactis subsp. cremoris, Enterococcus faecalis, and Streptococcus pneumoniae. Integrative recombination of pMC1 into the chromosomes of all of these species is dependent on the int gene product and occurs specifically at the pMC1 attP site. The isolation and sequencing of pMC1 integration sites from these bacteria showed that in lactobacilli, pMC1 integrated into the conserved tRNA(Ser) gene. In the other bacterial species, where this tRNA gene is less conserved or not conserved, secondary integration sites either in potential protein-coding regions or in intergenic DNA were used. A consensus sequence was deduced from the analysis of the different integration sites. The comparison of these sequences demonstrated the flexibility of the integrase for the bacterial integration site and suggested the importance of the trinucleotide CCT at the 5' end of the core in the strand exchange reaction. PMID:9068626
Sequence Stratigraphic Analysis and Facies Architecture of the Cretaceous Mancos Shale on and Near the Jicarilla Apache Indian Reservation, New Mexico-their relation to Sites of Oil Accumulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ridgley, Jennie

2001-08-21

The purpose of phase 1 and phase 2 of the Department of Energy funded project Analysis of oil-bearing Cretaceous Sandstone Hydrocarbon Reservoirs, exclusive of the Dakota Sandstone, on the Jicarilla Apache Indian Reservation, New Mexico was to define the facies of the oil producing units within the Mancos Shale and interpret the depositional environments of these facies within a sequence stratigraphic context. The focus of this report will center on (1) redefinition of the area and vertical extent of the "Gallup sandstone" or El Vado Sandstone Member of the Mancos Shale, (2) determination of the facies distribution within the "Gallup sandstone" and other oil-producing sandstones within the lower Mancos, placing these facies within the overall depositional history of the San Juan Basin, (3) application of the principles of sequence stratigraphy to the depositional units that comprise the Mancos Shale, and (4) evaluation of the structural features on the Reservation as they may control sites of oil accumulation.
Genome-Wide SNP Discovery and Analysis of Genetic Diversity in Farmed Sika Deer (Cervus nippon) in Northeast China Using Double-Digest Restriction Site-Associated DNA Sequencing.

PubMed

Ba, Hengxing; Jia, Boyin; Wang, Guiwu; Yang, Yifeng; Kedem, Gilead; Li, Chunyi

2017-09-07

Sika deer are an economically valuable species owing to their use in traditional Chinese medicine, particularly their velvet antlers. Sika deer in northeast China are mostly farmed in enclosure. Therefore, genetic management of farmed sika deer would benefit from detailed knowledge of their genetic diversity. In this study, we generated over 1.45 billion high-quality paired-end reads (288 Gbp) across 42 unrelated individuals using double-digest restriction site-associated DNA sequencing (ddRAD-seq). A total of 96,188 (29.63%) putative biallelic SNP loci were identified with an average sequencing depth of 23×. Based on the analysis, we found that the majority of the loci had a deficit of heterozygotes (FIS > 0) and low values of Hobs, which could be due to inbreeding and Wahlund effects. We also developed a collection of high-quality SNP probes that will likely be useful in a variety of applications in genotyping for cervid species in the future. Copyright © 2017 Ba et al.
Genome-Wide SNP Discovery and Analysis of Genetic Diversity in Farmed Sika Deer (Cervus nippon) in Northeast China Using Double-Digest Restriction Site-Associated DNA Sequencing

PubMed Central

Ba, Hengxing; Jia, Boyin; Wang, Guiwu; Yang, Yifeng; Kedem, Gilead; Li, Chunyi

2017-01-01

Sika deer are an economically valuable species owing to their use in traditional Chinese medicine, particularly their velvet antlers. Sika deer in northeast China are mostly farmed in enclosure. Therefore, genetic management of farmed sika deer would benefit from detailed knowledge of their genetic diversity. In this study, we generated over 1.45 billion high-quality paired-end reads (288 Gbp) across 42 unrelated individuals using double-digest restriction site-associated DNA sequencing (ddRAD-seq). A total of 96,188 (29.63%) putative biallelic SNP loci were identified with an average sequencing depth of 23×. Based on the analysis, we found that the majority of the loci had a deficit of heterozygotes (FIS >0) and low values of Hobs, which could be due to inbreeding and Wahlund effects. We also developed a collection of high-quality SNP probes that will likely be useful in a variety of applications in genotyping for cervid species in the future. PMID:28751500
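The heterozygote deficit reported in this abstract is conventionally expressed as FIS = 1 - Hobs/Hexp, where Hexp is the heterozygosity expected under Hardy-Weinberg proportions. The sketch below computes these quantities for one biallelic SNP from genotype counts; the counts and the function name are hypothetical, and the estimator is the simple uncorrected form, not necessarily the one used in the study.

```python
def locus_stats(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float, float]:
    """Return (Hobs, Hexp, FIS) for one biallelic locus from genotype counts."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_AA + n_Aa) / (2 * n)      # frequency of allele A
    h_obs = n_Aa / n                     # observed heterozygosity
    h_exp = 2 * p * (1 - p)              # Hardy-Weinberg expectation
    # FIS is undefined at a monomorphic locus (Hexp == 0).
    f_is = 1 - h_obs / h_exp if h_exp > 0 else float("nan")
    return h_obs, h_exp, f_is


# Hypothetical counts for 42 individuals: fewer heterozygotes than expected.
h_obs, h_exp, f_is = locus_stats(20, 10, 12)
print(f"Hobs={h_obs:.3f} Hexp={h_exp:.3f} FIS={f_is:.3f}")
```

A positive FIS at most loci, as reported here, is consistent with inbreeding or with pooling differentiated subpopulations (the Wahlund effect), but the statistic alone does not distinguish the two.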
Analysis of DNA methylation in Arabidopsis thaliana based on methylation-sensitive AFLP markers.

PubMed

Cervera, M T; Ruiz-García, L; Martínez-Zapater, J M

2002-12-01

AFLP analysis using restriction enzyme isoschizomers that differ in their sensitivity to methylation of their recognition sites has been used to analyse the methylation state of anonymous CCGG sequences in Arabidopsis thaliana. The technique was modified to improve the quality of fingerprints and to visualise larger numbers of scorable fragments. Sequencing of amplified fragments indicated that detection was generally associated with non-methylation of the cytosine to which the isoschizomer is sensitive. Comparison of EcoRI/HpaII and EcoRI/MspI patterns in different ecotypes revealed that 35-43% of CCGG sites were differentially digested by the isoschizomers. Interestingly, the pattern of digestion among different plants belonging to the same ecotype is highly conserved, with the rate of intra-ecotype methylation-sensitive polymorphisms being less than 1%. However, pairwise comparisons of methylation patterns between samples belonging to different ecotypes revealed differences in up to 34% of the methylation-sensitive polymorphisms. The lack of correlation between inter-ecotype similarity matrices based on methylation-insensitive or methylation-sensitive polymorphisms suggests that whatever the mechanisms regulating methylation may be, they are not related to nucleotide sequence variation.
VISMapper: ultra-fast exhaustive cartography of viral insertion sites for gene therapy.

PubMed

Juanes, José M; Gallego, Asunción; Tárraga, Joaquín; Chaves, Felipe J; Marín-Garcia, Pablo; Medina, Ignacio; Arnau, Vicente; Dopazo, Joaquín

2017-09-20

The possibility of integrating viral vectors to become a persistent part of the host genome makes them a crucial element of clinical gene therapy. However, viral integration has associated risks, such as the unintentional activation of oncogenes that can result in cancer. Therefore, the analysis of integration sites of retroviral vectors is a crucial step in developing safer vectors for therapeutic use. Here we present VISMapper, a vector integration site analysis web server, to analyze next-generation sequencing data for retroviral vector integration sites. VISMapper can be found at: http://vismapper.babelomics.org. Because it uses novel mapping algorithms, VISMapper is remarkably faster than previously available programs. It also provides a useful graphical interface to analyze the integration sites found in the genomic context.
WormBase ParaSite - a comprehensive resource for helminth genomics.

PubMed

Howe, Kevin L; Bolt, Bruce J; Shafie, Myriam; Kersey, Paul; Berriman, Matthew

2017-07-01

The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite.wormbase.org) for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Construction of a high-density genetic map for grape using next generation restriction-site associated DNA sequencing

PubMed Central

2012-01-01

Background Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. Results An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variations. Conclusions The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison. PMID:22908993
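The in-silico digestion-site prediction mentioned in this abstract amounts to locating every recognition site of a candidate enzyme in the reference sequence and examining the resulting fragment sizes. A minimal sketch follows; the abstract does not name the enzyme, so EcoRI (recognition site GAATTC, cutting after the first G) is used purely as an example, and the toy sequence and function name are hypothetical.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after cutting a linear sequence at every site occurrence.

    `cut_offset` is the number of bases from the start of the recognition
    site to the cut position on the top strand (1 for EcoRI, G^AATTC).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy sequence with two EcoRI sites (illustration only).
toy = "AAGAATTCCCCGAATTCTT"
fragments = digest(toy, "GAATTC", 1)
print(len(fragments) - 1, "cut sites; fragment lengths:", fragments)
```

Comparing the number and spacing of predicted sites across candidate enzymes is one way to choose an enzyme that yields a suitable marker density for a RAD library.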
Global Mapping of Transcription Factor Binding Sites by Sequencing Chromatin Surrogates: a Perspective on Experimental Design, Data Analysis, and Open Problems.

PubMed

Wei, Yingying; Wu, George; Ji, Hongkai

2013-05-01

Mapping genome-wide binding sites of all transcription factors (TFs) in all biological contexts is a critical step toward understanding gene regulation. The state-of-the-art technologies for mapping transcription factor binding sites (TFBSs) couple chromatin immunoprecipitation (ChIP) with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip). These technologies have limitations: they are low-throughput with respect to surveying many TFs. Recent advances in genome-wide chromatin profiling, including development of technologies such as DNase-seq, FAIRE-seq and ChIP-seq for histone modifications, make it possible to predict in vivo TFBSs by analyzing chromatin features at computationally determined DNA motif sites. This promising new approach may allow researchers to monitor the genome-wide binding sites of many TFs simultaneously. In this article, we discuss various experimental design and data analysis issues that arise when applying this approach. Through a systematic analysis of the data from the Encyclopedia Of DNA Elements (ENCODE) project, we compare the predictive power of individual and combinations of chromatin marks using supervised and unsupervised learning methods, and evaluate the value of integrating information from public ChIP and gene expression data. We also highlight the challenges and opportunities for developing novel analytical methods, such as resolving the one-motif-multiple-TF ambiguity and distinguishing functional and non-functional TF binding targets from the predicted binding sites. The online version of this article (doi:10.1007/s12561-012-9066-5) contains supplementary material, which is available to authorized users.
Oral Microbiome of Deep and Shallow Dental Pockets In Chronic Periodontitis

PubMed Central

Ge, Xiuchun; Rodriguez, Rafael; Trinh, My; Gunsolley, John; Xu, Ping

2013-01-01

We examined the subgingival bacterial biodiversity in untreated chronic periodontitis patients by sequencing 16S rRNA genes. The primary purpose of the study was to compare the oral microbiome in deep (diseased) and shallow (healthy) sites. A secondary purpose was to evaluate the influences of smoking, race and dental caries on this relationship. A total of 88 subjects from two clinics were recruited. Paired subgingival plaque samples were taken from each subject, one from a probing site depth >5 mm (deep site) and the other from a probing site depth ≤3 mm (shallow site). A universal primer set was designed to amplify the V4–V6 region for oral microbial 16S rRNA sequences. Differences in genera and species attributable to deep and shallow sites were determined by statistical analysis using a two-part model and false discovery rate. Fifty-one of 170 genera and 200 of 746 species were found to differ significantly in abundance between shallow and deep sites. Besides previously identified periodontal disease-associated bacterial species, additional species were found markedly changed in diseased sites. Cluster analysis revealed that the microbiome difference between deep and shallow sites was influenced by patient-level effects such as clinic location, race and smoking. The differences between clinic locations may be influenced by racial distribution, in that all of the African American subjects were seen at the same clinic. Our results suggested that there were influences from the microbiome for caries and periodontal disease and these influences are independent. PMID:23762384
Ebolavirus comparative genomics

DOE PAGES

Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; ...

2015-07-14

The 2014 Ebola outbreak in West Africa is the largest documented for this virus. We examine the dynamics of this genome, comparing more than one hundred currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus, and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP), and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. In conclusion, this information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.
High-throughput and site-specific identification of 2'-O-methylation sites using ribose oxidation sequencing (RibOxi-seq).

PubMed

Zhu, Yinzhou; Pirnie, Stephan P; Carmichael, Gordon G

2017-08-01

Ribose methylation (2'-O-methylation, 2'-OMe) occurs at high frequencies in rRNAs and other small RNAs and is carried out using a shared mechanism across eukaryotes and archaea. As RNA modifications are important for ribosome maturation, and alterations in these modifications are associated with cellular defects and diseases, it is important to characterize the landscape of 2'-O-methylation. Here we report the development of a highly sensitive and accurate method for ribose methylation detection using next-generation sequencing. A key feature of this method is the generation of RNA fragments with random 3'-ends, followed by periodate oxidation of all molecules terminating in 2',3'-OH groups. This allows only RNAs harboring 2'-OMe groups at their 3'-ends to be sequenced. Although currently requiring microgram amounts of starting material, this method is robust for the analysis of rRNAs even at low sequencing depth. © 2017 Zhu et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Human adenovirus serotype 12 virion precursors pMu and pVI are cleaved at amino-terminal and carboxy-terminal sites that conform to the adenovirus 2 endoproteinase cleavage consensus sequence.

PubMed

Freimuth, P; Anderson, C W

1993-03-01

The sequence of a 1158-base pair fragment of the human adenovirus serotype 12 (Ad12) genome was determined. This segment encodes the precursors for virion components Mu and VI. Both Ad12 precursors contain two sequences that conform to a consensus sequence motif for cleavage by the endoproteinase of adenovirus 2 (Ad2). Analysis of the amino terminus of VI and of the peptide fragments found in Ad12 virions demonstrated that these sites are cleaved during Ad12 maturation. This observation suggests that the recognition motif for adenovirus endoproteinases is highly conserved among human serotypes. The adenovirus 2 endoproteinase polypeptide requires additional co-factors for activity (C. W. Anderson, Protein Expression Purif., 1993, 4, 8-15). Synthetic Ad12 or Ad2 pVI carboxy-terminal peptides each permitted efficient cleavage of an artificial endoproteinase substrate by recombinant Ad2 endoproteinase polypeptide.
Cytochrome c oxidase subunit I barcoding of the green bee-eater (Merops orientalis).

PubMed

Arif, I A; Khan, H A; Shobrak, M; Williams, J

2011-10-21

DNA barcoding using mitochondrial cytochrome c oxidase subunit I (COI) is regarded as a standard method for species identification. Recent reports have also shown extended applications of COI gene analysis in phylogeny and molecular diversity studies. The bee-eaters are a group of near passerine birds in the family Meropidae. There are 26 species worldwide; five of them are found in Saudi Arabia. Until now, GenBank included a COI barcode for only one species of bee-eater, the European bee-eater (Merops apiaster). We sequenced the 694-bp segment of the COI gene of the green bee-eater M. orientalis and compared the sequences with those of M. apiaster. Pairwise sequence comparison showed 66 variable sites across all the eight sequences from both species, with an interspecific genetic distance of 0.0362. Two and one within-species variable sites were found, with genetic distances of 0.0005 and 0.0003 for M. apiaster and M. orientalis, respectively. This is the first study reporting barcodes for M. orientalis.
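The pairwise comparison described in the abstract above (variable sites and between-species distance) can be sketched as follows. The abstract does not state which distance model was used, so the uncorrected p-distance here is an assumption, and the short aligned sequences are invented toy data rather than real COI barcodes.

```python
def variable_sites(aligned):
    """Count alignment columns that contain more than one distinct base."""
    return sum(len(set(column)) > 1 for column in zip(*aligned))

def p_distance(a, b):
    """Proportion of positions that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mean_between_groups(group_a, group_b):
    """Average p-distance over all pairs drawn from two groups."""
    distances = [p_distance(a, b) for a in group_a for b in group_b]
    return sum(distances) / len(distances)

# Hypothetical 10-bp alignments standing in for two species' barcodes.
species_a = ["ACGTACGTAC", "ACGTACGTAT"]
species_b = ["ACGAACGTCC", "ACGAACGTCC"]

n_variable = variable_sites(species_a + species_b)
interspecific = mean_between_groups(species_a, species_b)
print(n_variable, "variable sites; mean interspecific p-distance:", interspecific)
```

The same `variable_sites` call applied to one species at a time gives the within-species counts.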
ReadXplorer—visualization and analysis of mapped sequences

PubMed Central

Hilker, Rolf; Stadermann, Kai Bernd; Doppmeier, Daniel; Kalinowski, Jörn; Stoye, Jens; Straube, Jasmin; Winnebald, Jörn; Goesmann, Alexander

2014-01-01

Motivation: Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of the ever-growing size of genomic and transcriptomic next-generation sequencing data. Results: ReadXplorer is a software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A unique specialty of ReadXplorer is the quality classification of the read mappings. It is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) normalizable plot and (iii) histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capability covers RNA secondary structure prediction, single nucleotide polymorphism and deletion–insertion polymorphism detection, genomic feature and general coverage analysis. Especially for RNA-Seq data, it offers differential gene expression analysis, transcription start site and operon detection as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets. Availability and implementation: ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual. Contact: rhilker@mikrobio.med.uni-giessen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24790157
Epoxyalkane:Coenzyme M Transferase Gene Diversity and Distribution in Groundwater Samples from Chlorinated-Ethene-Contaminated Sites

PubMed Central

Liu, Xikun

2016-01-01

ABSTRACT Epoxyalkane:coenzyme M transferase (EaCoMT) plays a critical role in the aerobic biodegradation and assimilation of alkenes, including ethene, propene, and the toxic chloroethene vinyl chloride (VC). To improve our understanding of the diversity and distribution of EaCoMT genes in the environment, novel EaCoMT-specific terminal-restriction fragment length polymorphism (T-RFLP) and nested-PCR methods were developed and applied to groundwater samples from six different contaminated sites. T-RFLP analysis revealed 192 different EaCoMT T-RFs. Using clone libraries, we retrieved 139 EaCoMT gene sequences from these samples. Phylogenetic analysis revealed that a majority of the sequences (78.4%) grouped with EaCoMT genes found in VC- and ethene-assimilating Mycobacterium strains and Nocardioides sp. strain JS614. The four most-abundant T-RFs were also matched with EaCoMT clone sequences related to Mycobacterium and Nocardioides strains. The remaining EaCoMT sequences clustered within two emergent EaCoMT gene subgroups represented by sequences found in propene-assimilating Gordonia rubripertincta strain B-276 and Xanthobacter autotrophicus strain Py2. EaCoMT gene abundance was positively correlated with VC and ethene concentrations at the sites studied. IMPORTANCE The EaCoMT gene plays a critical role in assimilation of short-chain alkenes, such as ethene, VC, and propene. An improved understanding of EaCoMT gene diversity and distribution is significant to the field of bioremediation in several ways. The expansion of the EaCoMT gene database and identification of incorrectly annotated EaCoMT genes currently in the database will facilitate improved design of environmental molecular diagnostic tools and high-throughput sequencing approaches for future bioremediation studies. Our results further suggest that potentially significant aerobic VC degraders in the environment are not well represented in pure culture. Future research should aim to isolate and characterize aerobic VC-degrading bacteria from these underrepresented groups. PMID:27016563
Characterization of a Chlamydomonas Transposon, Gulliver, Resembling Those in Higher Plants

PubMed Central

Ferris, P. J.

1989-01-01

While pursuing a chromosomal walk through the mt(+) locus of linkage group VI of Chlamydomonas reinhardtii, I encountered a 12-kb sequence that was found to be present in approximately 12 copies in the nuclear genome. Comparison of various C. reinhardtii laboratory strains provided evidence that the sequence was mobile and therefore a transposon. One of two separate natural isolates interfertile with C. reinhardtii, C. smithii (CC-1373), contained the transposon, but at completely different locations in its nuclear genome than C. reinhardtii; and a second, CC-1952 (S1-C5), lacked the transposon altogether. Genetic analysis indicated that the transposon was found at dispersed sites throughout the genome, but had a conserved structure at each location. Sequence homology between the termini was limited to an imperfect 15-bp inverted repeat. An 8-bp target site duplication was created by insertion; transposon sequences were completely removed upon excision leaving behind both copies of the target site duplication, with minor base changes. The transposon contained an internal region of unique repetitive sequence responsible for restriction fragment length heterogeneity among the various copies of the transposon. In several cases it was possible to identify which of the dozen transposons in a given strain served as the donor when a transposition event occurred. The transposon often moved into a site genetically linked to the donor, and transposition appeared to be nonreplicative. Thus the mechanism of transposition and excision of the transposon, which I have named Gulliver, resembles that of certain higher plant transposons, like the Ac transposon of maize. PMID:2570007
Detecting Coevolution in and among Protein Domains

PubMed Central

Yeang, Chen-Hsiang; Haussler, David

2007-01-01

Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporate phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We employ this model to large-scale screenings on the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level. PMID:17983264
CAPRRESI: Chimera Assembly by Plasmid Recovery and Restriction Enzyme Site Insertion.

PubMed

Santillán, Orlando; Ramírez-Romero, Miguel A; Dávila, Guillermo

2017-06-25

Here, we present chimera assembly by plasmid recovery and restriction enzyme site insertion (CAPRRESI). CAPRRESI benefits from many strengths of the original plasmid recovery method and introduces restriction enzyme digestion to ease DNA ligation reactions (required for chimera assembly). For this protocol, users clone wildtype genes into the same plasmid (pUC18 or pUC19). After the in silico selection of amino acid sequence regions where chimeras should be assembled, users obtain all the synonym DNA sequences that encode them. Ad hoc Perl scripts enable users to determine all synonym DNA sequences. After this step, another Perl script searches for restriction enzyme sites on all synonym DNA sequences. This in silico analysis is also performed using the ampicillin resistance gene (ampR) found on pUC18/19 plasmids. Users design oligonucleotides inside synonym regions to disrupt wildtype and ampR genes by PCR. After obtaining and purifying complementary DNA fragments, restriction enzyme digestion is accomplished. Chimera assembly is achieved by ligating appropriate complementary DNA fragments. pUC18/19 vectors are selected for CAPRRESI because they offer technical advantages, such as small size (2,686 base pairs), high copy number, advantageous sequencing reaction features, and commercial availability. The usage of restriction enzymes for chimera assembly eliminates the need for DNA polymerases yielding blunt-ended products. CAPRRESI is a fast and low-cost method for fusing protein-coding genes.
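The two in silico steps described in the abstract above (listing every synonymous DNA sequence for an amino acid region, then searching those sequences for restriction enzyme sites) can be sketched as below. The original protocol uses ad hoc Perl scripts; this Python version is a substitute written for illustration. The codon table is deliberately partial, and the peptide `MGS`, the helper names and the two-enzyme list are assumptions.

```python
from itertools import product

# Partial standard-genetic-code table: only the residues used in the toy
# peptide below. A real run would need all 20 amino acids.
CODONS = {
    "M": ["ATG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
}

# Recognition sequences of two common enzymes (illustrative subset).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

def synonymous_sequences(peptide):
    """Return every DNA sequence that encodes `peptide`."""
    codon_choices = (CODONS[residue] for residue in peptide)
    return ["".join(codons) for codons in product(*codon_choices)]

def find_sites(dna, enzymes):
    """Map enzyme name to the first 0-based position of its site in `dna`."""
    return {name: dna.find(site) for name, site in enzymes.items() if site in dna}

candidates = synonymous_sequences("MGS")  # hypothetical target region
hits = {dna: find_sites(dna, ENZYMES) for dna in candidates}
hits = {dna: found for dna, found in hits.items() if found}

print(len(candidates), "synonymous sequences")
print(hits)
```

Because the number of synonymous sequences grows multiplicatively with peptide length, exhaustive enumeration like this is only practical for short regions.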
Albumin Redhill (-1 Arg, 320 Ala → Thr): A glycoprotein variant of human serum albumin whose precursor has an aberrant signal peptidase cleavage site

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brennan, S.O.; Myles, T.; Peach, R.J.

1990-01-01

Albumin Redhill is an electrophoretically slow genetic variant of human serum albumin that does not bind ⁶³Ni²⁺ and has a molecular mass 2.5 kDa higher than normal albumin. Its inability to bind Ni²⁺ was explained by the finding of an additional residue of Arg at position -1. This did not explain the molecular basis of the genetic variation or the increase in apparent molecular mass. Fractionation of tryptic digests on concanavalin A-Sepharose followed by peptide mapping of the bound and unbound fractions and sequence analysis of the glycopeptides identified a mutation of 320 Ala → Thr. This introduces an Asn-Tyr-Thr oligosaccharide attachment sequence centered on Asn-318 and explains the increase in molecular mass. This, however, did not satisfactorily explain the presence of the additional Arg residue at position -1. DNA sequencing of polymerase chain reaction-amplified genomic DNA encoding the prepro sequence of albumin indicated an additional mutation of -2 Arg → Cys. The authors propose that the new Phe-Cys-Arg sequence in the propeptide is an aberrant signal peptidase cleavage site and that the signal peptidase cleaves the propeptide of albumin Redhill in the lumen of the endoplasmic reticulum before it reaches the Golgi vesicles, the site of the diarginyl-specific proalbumin convertase.

Binding sites for abundant nuclear factors modulate RNA polymerase I-dependent enhancer function in Saccharomyces cerevisiae.

PubMed

Kang, J J; Yokoi, T J; Holland, M J

1995-12-01

The 190-base pair (bp) rDNA enhancer within the intergenic spacer sequences of Saccharomyces cerevisiae rRNA cistrons activates synthesis of the 35S-rRNA precursor about 20-fold in vivo (Mestel, R., Yip, M., Holland, J. P., Wang, E., Kang, J., and Holland, M. J. (1989) Mol. Cell. Biol. 9, 1243-1254). We now report identification and analysis of transcriptional activities mediated by three cis-acting sites within a 90-bp portion of the rDNA enhancer designated the modulator region. In vivo, these sequences mediated termination of transcription by RNA polymerase I and potentiated the activity of the rDNA enhancer element. Two trans-acting factors, REB1 and REB2, bind independently to sites within the modulator region (Morrow, B. E., Johnson, S. P., and Warner, J. R. (1989) J. Biol. Chem. 264, 9061-9068). We show that REB2 is identical to the ABF1 protein. Site-directed mutagenesis of REB1 and ABF1 binding sites demonstrated uncoupling of RNA polymerase I-dependent termination from transcriptional activation in vivo. We conclude that REB1 and ABF1 are required for RNA polymerase I-dependent termination and enhancer function, respectively. Since REB1 and ABF1 proteins also regulate expression of class II genes and other nuclear functions, our results suggest further similarities between RNA polymerase I and II regulatory mechanisms. Two rDNA enhancers flanking a rDNA minigene stimulated RNA polymerase I transcription in a "multiplicative" fashion. Deletion mapping analysis showed that similar cis-acting sequences were required for enhancer function when positioned upstream or downstream from a rDNA minigene.
A comprehensive analysis of 3′ end sequencing data sets reveals novel polyadenylation signals and the repressive role of heterogeneous ribonucleoprotein C on cleavage and polyadenylation

PubMed Central

Gruber, Andreas J.; Schmidt, Ralf; Gruber, Andreas R.; Martin, Georges; Ghosh, Souvik; Belmadani, Manuel; Keller, Walter

2016-01-01

Alternative polyadenylation (APA) is a general mechanism of transcript diversification in mammals, which has been recently linked to proliferative states and cancer. Different 3′ untranslated region (3′ UTR) isoforms interact with different RNA-binding proteins (RBPs), which modify the stability, translation, and subcellular localization of the corresponding transcripts. Although the heterogeneity of pre-mRNA 3′ end processing has been established with high-throughput approaches, the mechanisms that underlie systematic changes in 3′ UTR lengths remain to be characterized. Through a uniform analysis of a large number of 3′ end sequencing data sets, we have uncovered 18 signals, six of which are novel, whose positioning with respect to pre-mRNA cleavage sites indicates a role in pre-mRNA 3′ end processing in both mouse and human. With 3′ end sequencing we have demonstrated that the heterogeneous ribonucleoprotein C (HNRNPC), which binds the poly(U) motif whose frequency also peaks in the vicinity of polyadenylation (poly(A)) sites, has a genome-wide effect on poly(A) site usage. HNRNPC-regulated 3′ UTRs are enriched in ELAV-like RBP 1 (ELAVL1) binding sites and include those of the CD47 gene, which participate in the recently discovered mechanism of 3′ UTR–dependent protein localization (UDPL). Our study thus establishes an up-to-date, high-confidence catalog of 3′ end processing sites and poly(A) signals, and it uncovers an important role of HNRNPC in regulating 3′ end processing. It further suggests that U-rich elements mediate interactions with multiple RBPs that regulate different stages in a transcript's life cycle. PMID:27382025
Probabilistic grammatical model for helix‐helix contact site classification

PubMed Central

2013-01-01

Background Hidden Markov Models power many state‐of‐the‐art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not directly convey information on medium‐ and long‐range residue‐residue interactions. This requires an expressive power of at least context‐free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. Results In this work, we present a probabilistic grammatical framework for problem‐specific protein languages and apply it to classification of transmembrane helix‐helix pair configurations. The core of the model consists of a probabilistic context‐free grammar, automatically inferred by a genetic algorithm from only a generic set of expert‐based rules and positive training samples. The model was applied to produce sequence‐based descriptors of four classes of transmembrane helix‐helix contact site configurations. The highest performance of the classifiers reached AUCROC of 0.70. The analysis of grammar parse trees revealed the ability of representing structural features of helix‐helix contact sites. Conclusions We demonstrated that our probabilistic context‐free framework for analysis of protein sequences outperforms the state of the art in the task of helix‐helix contact site classification. However, this is achieved without necessarily requiring modeling long‐range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human‐readable. Thus they could provide biologically meaningful information for molecular biologists. PMID:24350601
Comprehensive analysis of genetic variations in strictly-defined Leber congenital amaurosis with whole-exome sequencing in Chinese.

PubMed

Wang, Shi-Yuan; Zhang, Qi; Zhang, Xiang; Zhao, Pei-Quan

2016-01-01

To make a comprehensive analysis of the potential pathogenic genes related to Leber congenital amaurosis (LCA) in Chinese patients. LCA subjects and their families were retrospectively collected from 2013 to 2015. First, whole-exome sequencing was performed in patients who had undergone gene mutation screening with nothing found; homozygous sites were then selected, candidate sites were annotated, and pathogenicity analysis was conducted using software including Sorting Intolerant from Tolerant (SIFT), PolyPhen-2, Mutation Assessor, Condel, and Functional Analysis through Hidden Markov Models (FATHMM). Furthermore, Gene Ontology function and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of pathogenic genes were performed, followed by co-segregation analysis using Fisher's exact test. Sanger sequencing was used to validate single-nucleotide variations (SNVs). Expanded verification was performed in the remaining patients. In total, 51 LCA families with 53 patients and 24 family members were recruited. A total of 104 SNVs (66 LCA-related genes and 15 co-segregated genes) were submitted for expanded verification. Homozygous mutations of KRT12 and CYP1A1 were observed together in 3 families. Enrichment analysis showed that the potential pathogenic genes were mainly enriched in functions related to cell adhesion, biological adhesion, retinoid metabolic process, and eye development biological adhesion. Additionally, WFS1 and STAU2 had the highest homozygous frequencies. LCA is a highly heterogeneous disease. Mutations in KRT12, CYP1A1, WFS1, and STAU2 may be involved in the development of LCA.
Transcriptogenomics identification and characterization of RNA editing sites in human primary monocytes using high-depth next generation sequencing data.

PubMed

Leong, Wai-Mun; Ripen, Adiratna Mat; Mirsafian, Hoda; Mohamad, Saharuddin Bin; Merican, Amir Feisal

2018-06-07

High-depth next generation sequencing data provide valuable insights into the number and distribution of RNA editing events. Here, we report the RNA editing events at the cellular level in human primary monocytes using high-depth whole genomic and transcriptomic sequencing data. We identified over ten thousand putative RNA editing sites, and 69% of the sites were A-to-I editing sites. The sites were enriched in repetitive sequences and intronic regions. High-depth sequencing datasets revealed that 90% of the canonical sites were edited at lower frequencies (<0.7). Single and multiple human monocyte and brain tissue samples were analyzed through a genome sequence-independent approach. The latter approach was observed to identify more editing sites. Monocytes were observed to contain more C-to-U editing sites than brain tissues. Our results establish a comparable pipeline that can address current limitations and demonstrate the potential for highly sensitive detection of RNA editing events in a single cell type. Copyright © 2018 Elsevier Inc. All rights reserved.
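The site-level bookkeeping behind figures such as "69% A-to-I" and "edited at lower frequencies (<0.7)" can be illustrated with a minimal sketch. It is not the authors' pipeline: the candidate sites are invented, strand orientation is ignored, and the `classify` helper is an assumption. It relies only on the fact that inosine is read as G and edited C (U) is read as T in cDNA sequencing.

```python
from collections import Counter

def classify(genomic_base, rna_base):
    """Label a DNA-versus-RNA mismatch by editing type."""
    if (genomic_base, rna_base) == ("A", "G"):
        return "A-to-I"  # inosine is sequenced as G
    if (genomic_base, rna_base) == ("C", "T"):
        return "C-to-U"  # U appears as T in cDNA reads
    return f"{genomic_base}-to-{rna_base}"

# Hypothetical candidate sites:
# (genomic base, RNA base, reads supporting the edit, total reads).
sites = [
    ("A", "G", 12, 100),
    ("A", "G", 80, 100),
    ("C", "T", 5, 50),
    ("A", "G", 30, 60),
    ("G", "A", 3, 40),
]

type_counts = Counter(classify(g, r) for g, r, _, _ in sites)
a_to_i_fraction = type_counts["A-to-I"] / len(sites)

# Editing level = fraction of reads at the site that carry the edited base.
levels = [edited / total for _, _, edited, total in sites]
low_level_fraction = sum(level < 0.7 for level in levels) / len(sites)

print(dict(type_counts))
print("A-to-I fraction:", a_to_i_fraction)
print("fraction of sites edited below 0.7:", low_level_fraction)
```

A real analysis would also filter known genomic SNPs and low-coverage positions before counting, which is where matched whole-genome data become useful.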
Classification of European mtDNAs from an Analysis of Three European Populations

PubMed Central

Torroni, A.; Huoponen, K.; Francalacci, P.; Petrozzi, M.; Morelli, L.; Scozzari, R.; Obinu, D.; Savontaus, M. L.; Wallace, D. C.

1996-01-01

Mitochondrial DNA (mtDNA) sequence variation was examined in Finns, Swedes and Tuscans by PCR amplification and restriction analysis. About 99% of the mtDNAs were subsumed within 10 mtDNA haplogroups (H, I, J, K, M, T, U, V, W, and X), suggesting that the identified haplogroups could encompass virtually all European mtDNAs. Because both hypervariable segments of the mtDNA control region were previously sequenced in the Tuscan samples, the mtDNA haplogroups and control region sequences could be compared. Using a combination of haplogroup-specific restriction site changes and control region nucleotide substitutions, the distribution of the haplogroups was surveyed through the published restriction site polymorphism and control region sequence data of Caucasoids. This supported the conclusion that most haplogroups observed in Europe are Caucasoid-specific, and that at least some of them occur at varying frequencies in different Caucasoid populations. The classification of almost all European mtDNA variation in a number of well-defined haplogroups could provide additional insights about the origin and relationships of Caucasoid populations and the process of human colonization of Europe, and is valuable for the definition of the role played by mtDNA backgrounds in the expression of pathological mtDNA mutations. PMID:8978068
Molecular cloning of a cDNA encoding the glycoprotein of hen oviduct microsomal signal peptidase.

PubMed Central

Newsome, A L; McLean, J W; Lively, M O

1992-01-01

Detergent-solubilized hen oviduct signal peptidase has been characterized previously as an apparent complex of a 19 kDa protein and a 23 kDa glycoprotein (GP23) [Baker & Lively (1987) Biochemistry 26, 8561-8567]. A cDNA clone encoding GP23 from a chicken oviduct lambda gt11 cDNA library has now been characterized. The cDNA encodes a protein of 180 amino acid residues with a single site for asparagine-linked glycosylation that has been directly identified by amino acid sequence analysis of a tryptic-digest peptide containing the glycosylated site. Immunoblot analysis reveals cross-reactivity with a dog pancreas protein. Comparison of the deduced amino acid sequence of GP23 with the 22/23 kDa glycoprotein of dog microsomal signal peptidase [Shelness, Kanwar & Blobel (1988) J. Biol. Chem. 263, 17063-17070], one of five proteins associated with this enzyme, reveals that the amino acid sequences are 90% identical. Thus the signal peptidase glycoprotein is as highly conserved as the sequences of cytochromes c and b from these same species and is likely to be found in a similar form in many, if not all, vertebrate species. The data also show conclusively that the dog and avian signal peptidases have at least one protein subunit in common. Images Fig. 1. PMID:1546959
1,4-Benzoquinone reductase from Phanerochaete chrysosporium: cDNA cloning and regulation of expression

DOE Office of Scientific and Technical Information (OSTI.GOV)

Akileswaran, L.; Brock, B.J.; Cereghino, J.L.

1999-02-01

A cDNA clone encoding a quinone reductase (QR) from the white rot basidiomycete Phanerochaete chrysosporium was isolated and sequenced. The cDNA consisted of 1,007 nucleotides and a poly(A) tail and encoded a deduced protein containing 271 amino acids. The experimentally determined eight-amino-acid N-terminal sequence of the purified QR protein from P. chrysosporium matched amino acids 72 to 79 of the predicted translation product of the cDNA. The Mr of the predicted translation product, beginning with Pro-72, was essentially identical to the experimentally determined Mr of one monomer of the QR dimer, and this finding suggested that QR is synthesized as a proenzyme. The results of in vitro transcription-translation experiments suggested that QR is synthesized as a proenzyme with a 71-amino-acid leader sequence. This leader sequence contains two potential KEX2 cleavage sites and numerous potential cleavage sites for dipeptidyl aminopeptidase. The QR activity in cultures of P. chrysosporium increased following the addition of 2-dimethoxybenzoquinone, vanillic acid, or several other aromatic compounds. An immunoblot analysis indicated that induction resulted in an increase in the amount of QR protein, and a Northern blot analysis indicated that this regulation occurs at the level of the qr mRNA.
Identification of Sequence Specificity of 5-Methylcytosine Oxidation by Tet1 Protein with High-Throughput Sequencing.

PubMed

Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi

2016-03-02

Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10⁴-member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Shifts in phylogenetic diversity of archaeal communities in mangrove sediments at different sites and depths in southeastern Brazil.

PubMed

Mendes, Lucas William; Taketani, Rodrigo Gouvêa; Navarrete, Acácio Aparecido; Tsai, Siu Mui

2012-06-01

This study focused on the structure and composition of archaeal communities in sediments of tropical mangroves in order to obtain insight into two Brazilian sites from different locations (one pristine and another located in an urban area) and at different depth levels from the surface. Terminal restriction fragment length polymorphism (T-RFLP) of PCR-amplified 16S rRNA gene fragments was used to scan the archaeal community structure, and 16S rRNA gene clone libraries were used to determine the community composition. Redundancy analysis of T-RFLP patterns revealed differences in archaeal community structure according to location, depth and soil attributes. Parameters such as pH, organic matter, potassium and magnesium presented significant correlation with general community structure. Furthermore, phylogenetic analysis revealed a community composition distributed differently according to depth where, in shallow samples, 74.3% of sequences were affiliated with Euryarchaeota and 25.7% were shared between Crenarchaeota and Thaumarchaeota, while for the deeper samples, 24.3% of the sequences were affiliated with Euryarchaeota and 75.7% with Crenarchaeota and Thaumarchaeota. Archaeal diversity measurements based on 16S rRNA gene clone libraries decreased with increasing depth and there was a greater difference between depths (<18% of sequences shared) than sites (>25% of sequences shared). Taken together, our findings indicate that mangrove ecosystems support a diverse archaeal community that may be involved in nutrient cycles and is affected by sediment properties, depth and location. Copyright © 2012 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
Methylation analysis of plasma cell-free DNA for breast cancer early detection using bisulfite next-generation sequencing.

PubMed

Li, Zibo; Guo, Xinwu; Tang, Lili; Peng, Limin; Chen, Ming; Luo, Xipeng; Wang, Shouman; Xiao, Zhi; Deng, Zhongping; Dai, Lizhong; Xia, Kun; Wang, Jun

2016-10-01

Circulating cell-free DNA (cfDNA) has been considered as a potential biomarker for non-invasive cancer detection. To evaluate the methylation levels of six candidate genes (EGFR, GREM1, PDGFRB, PPM1E, SOX17, and WRN) in plasma cfDNA as biomarkers for breast cancer early detection, quantitative analysis of the promoter methylation of these genes from 86 breast cancer patients and 67 healthy controls was performed by using microfluidic-PCR-based target enrichment and next-generation bisulfite sequencing technology. The predictive performance of different logistic models based on methylation status of candidate genes was investigated by means of the area under the ROC curve (AUC) and odds ratio (OR) analysis. Results revealed that EGFR, PPM1E, and 8 gene-specific CpG sites showed significant hypermethylation in cancer patients' plasma and were significantly associated with breast cancer (OR ranging from 2.51 to 9.88). The AUC values for these biomarkers ranged from 0.66 to 0.75. Combinations of multiple hypermethylated genes or CpG sites substantially improved the predictive performance for breast cancer detection. Our study demonstrated the feasibility of quantitative measurement of candidate gene methylation in cfDNA by using microfluidic-PCR-based target enrichment and bisulfite next-generation sequencing, which is worthy of further validation and potentially benefits a broad range of applications in clinical oncology practice. Quantitative analysis of the methylation pattern of plasma cfDNA by next-generation sequencing might be a valuable non-invasive tool for early detection of breast cancer.
Genetic diversity analysis of Leuconostoc mesenteroides from Korean vegetables and food products by multilocus sequence typing.

PubMed

Sharma, Anshul; Kaur, Jasmine; Lee, Sulhee; Park, Young-Seo

2018-06-01

In the present study, 35 Leuconostoc mesenteroides strains isolated from vegetables and food products from South Korea were studied by multilocus sequence typing (MLST) of seven housekeeping genes (atpA, groEL, gyrB, pheS, pyrG, rpoA, and uvrC). The fragment sizes of the seven amplified housekeeping genes ranged from 366 to 1414 bp. Sequence analysis indicated 27 different sequence types (STs), with 25 of them being represented by a single strain, indicating high genetic diversity, whereas the remaining 2 were represented by five strains each. In total, 220 polymorphic nucleotide sites were detected among the seven housekeeping genes. The phylogenetic analysis based on the STs of the seven loci indicated that the 35 strains belonged to two major groups, A (28 strains) and B (7 strains). Split decomposition analysis showed that intraspecies recombination played a role in generating diversity among strains. The minimum spanning tree showed that the evolution of the STs was not correlated with food source. This study shows that multilocus sequence typing is a valuable tool to assess the genetic diversity among L. mesenteroides strains from South Korea and can be used further to monitor evolutionary changes.
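The way MLST turns per-locus sequences into sequence types can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function name `assign_sequence_types`, the numbering of alleles and STs by order of first appearance, and the toy input are all assumptions; only the seven locus names come from the abstract.

```python
from typing import Dict, Tuple

# The seven housekeeping loci named in the abstract.
LOCI = ("atpA", "groEL", "gyrB", "pheS", "pyrG", "rpoA", "uvrC")


def assign_sequence_types(strains: Dict[str, Dict[str, str]]) -> Dict[str, int]:
    """Map each strain name to a sequence type (ST) number.

    Hypothetical numbering scheme: every distinct sequence at a locus gets an
    allele number in order of first appearance, and every distinct tuple of
    allele numbers (the allelic profile) gets an ST number the same way.
    """
    allele_ids: Dict[str, Dict[str, int]] = {locus: {} for locus in LOCI}
    st_ids: Dict[Tuple[int, ...], int] = {}
    result: Dict[str, int] = {}
    for name, seqs in strains.items():
        profile = tuple(
            allele_ids[locus].setdefault(seqs[locus], len(allele_ids[locus]) + 1)
            for locus in LOCI
        )
        result[name] = st_ids.setdefault(profile, len(st_ids) + 1)
    return result


# Toy data: s1 and s2 are identical at all loci, s3 differs at gyrB only.
base = {locus: "ACGT" for locus in LOCI}
toy = {"s1": dict(base), "s2": dict(base), "s3": {**base, "gyrB": "ACGA"}}
sequence_types = assign_sequence_types(toy)
```

With this toy input, `s1` and `s2` share one ST and `s3` receives a second one, which mirrors how 35 strains can collapse into 27 STs.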
cDNA, genomic sequence cloning and overexpression of ribosomal protein S25 gene (RPS25) from the Giant Panda.

PubMed

Hao, Yan-Zhe; Hou, Wan-Ru; Hou, Yi-Ling; Du, Yu-Jie; Zhang, Tian; Peng, Zheng-Song

2009-11-01

RPS25 is a component of the 40S small ribosomal subunit encoded by the RPS25 gene, which is specific to eukaryotes. Few studies have addressed the RPS25 gene in animals. The Giant Panda (Ailuropoda melanoleuca), known as a "living fossil", is of increasing concern to the world community. Studies on RPS25 of the Giant Panda could provide scientific data for investigating the hereditary traits of the gene and formulating a protection strategy for the Giant Panda. The cDNA of RPS25 cloned from the Giant Panda is 436 bp in size, containing an open reading frame of 378 bp encoding 125 amino acids. The genomic sequence is 1,992 bp long and was found to possess four exons and three introns. Blast alignment analysis indicated that the nucleotide sequence of the coding sequence shows high homology to those of Homo sapiens, Bos taurus, Mus musculus and Rattus norvegicus (92.6, 94.4, 89.2 and 91.5%, respectively). Primary structure analysis revealed that the molecular weight of the putative RPS25 protein is 13.7421 kDa with a theoretical pI of 10.12. Topology prediction showed one N-glycosylation site, one cAMP- and cGMP-dependent protein kinase phosphorylation site, two protein kinase C phosphorylation sites and one tyrosine kinase phosphorylation site in the RPS25 protein of the Giant Panda. The RPS25 gene was overexpressed in E. coli BL21, and Western blotting of the RPS25 protein was also performed. The results indicated that the RPS25 gene can be expressed in E. coli and that the RPS25 protein, fused with an N-terminal His tag, accumulated as a polypeptide of the expected 17.4 kDa. The cDNA and the genomic sequence of RPS25 were cloned successfully for the first time from the Giant Panda using RT-PCR and touchdown PCR, respectively, and both were sequenced and analyzed preliminarily; the cDNA of the RPS25 gene was then overexpressed in E. coli BL21 and immunoblotted. This is the first report on the RPS25 gene from the Giant Panda. The data will enrich and supplement the information about RPS25 and will contribute to the protection of gene resources and the discussion of genetic polymorphism.
Uncertainties in Eddy Covariance fluxes due to post-field data processing: a multi-site, full factorial analysis

NASA Astrophysics Data System (ADS)

Sabbatini, S.; Fratini, G.; Arriga, N.; Papale, D.

2012-04-01

Eddy Covariance (EC) is the only technologically available direct method to measure carbon and energy fluxes between ecosystems and atmosphere. However, uncertainties related to this method have not been exhaustively assessed yet, including those deriving from post-field data processing. The latter arise because there is no exact processing sequence established for any given situation, and the sequence itself is long and complex, with many processing steps and options available. However, the consistency and inter-comparability of flux estimates may be largely affected by the adoption of different processing sequences. The goal of our work is to quantify the uncertainty introduced in each processing step by the fact that different options are available, and to study how the overall uncertainty propagates throughout the processing sequence. We propose an easy-to-use methodology to assign a confidence level to the calculated fluxes of energy and mass, based on the adopted processing sequence, and on available information such as the EC system type (e.g. open vs. closed path), the climate and the ecosystem type. The proposed methodology synthesizes the results of a massive full-factorial experiment. We use one year of raw data from 15 European flux stations and process them so as to cover all possible combinations of the available options across a selection of the most relevant processing steps. The 15 sites have been selected to be representative of different ecosystems (forests, croplands and grasslands), climates (mediterranean, nordic, arid and humid) and instrumental setup (e.g. open vs. closed path). The software used for this analysis is EddyPro™ 3.0 (www.licor.com/eddypro). 
The critical processing steps, selected on the basis of the different options commonly used in the FLUXNET community, are: angle of attack correction; coordinate rotation; trend removal; time lag compensation; low- and high- frequency spectral correction; correction for air density fluctuations; and length of the flux averaging interval. We illustrate the results of the full-factorial combination relative to a subset of the selected sites with particular emphasis on the total uncertainty at different time scales and aggregations, as well as a preliminary analysis of the most critical steps for their contribution to the total uncertainties and their potential relation with site set-up characteristics and ecosystem type.
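A full-factorial design of this kind enumerates every combination of options across the processing steps. The sketch below is only an illustration: the step names follow the abstract loosely, but the specific option values in `PROCESSING_OPTIONS` and the helper `full_factorial` are assumptions, not the actual EddyPro settings used in the study.

```python
from itertools import product
from typing import Dict, List

# Hypothetical option sets for three of the processing steps; the real study
# covered more steps and its exact options are not given in the abstract.
PROCESSING_OPTIONS: Dict[str, List] = {
    "coordinate_rotation": ["double_rotation", "planar_fit"],
    "trend_removal": ["block_average", "linear_detrending"],
    "averaging_interval_min": [30, 60],
}


def full_factorial(options: Dict[str, List]) -> List[Dict[str, object]]:
    """Return one configuration dict per combination of step options."""
    steps = list(options)
    return [
        dict(zip(steps, combo))
        for combo in product(*(options[step] for step in steps))
    ]


runs = full_factorial(PROCESSING_OPTIONS)
print(len(runs))  # 2 * 2 * 2 = 8 processing sequences
```

Each dictionary in `runs` describes one processing sequence; running the same raw data through all of them and comparing the resulting fluxes gives the spread attributable to processing choices.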
Novel Molecular Method for Identification of Streptococcus pneumoniae Applicable to Clinical Microbiology and 16S rRNA Sequence-Based Microbiome Studies

PubMed Central

Scholz, Christian F. P.; Poulsen, Knud

2012-01-01

The close phylogenetic relationship of the important pathogen Streptococcus pneumoniae and several species of commensal streptococci, particularly Streptococcus mitis and Streptococcus pseudopneumoniae, and the recently demonstrated sharing of genes and phenotypic traits previously considered specific for S. pneumoniae hamper the exact identification of S. pneumoniae. Based on sequence analysis of 16S rRNA genes of a collection of 634 streptococcal strains, identified by multilocus sequence analysis, we detected a cytosine at position 203 present in all 440 strains of S. pneumoniae but replaced by an adenosine residue in all strains representing other species of mitis group streptococci. The S. pneumoniae-specific sequence signature could be demonstrated by sequence analysis or indirectly by restriction endonuclease digestion of a PCR amplicon covering the site. The S. pneumoniae-specific signature offers an inexpensive means for validation of the identity of clinical isolates and should be used as an integrated marker in the annotation procedure employed in 16S rRNA-based molecular studies of complex human microbiotas. This may avoid frequent misidentifications such as those we demonstrate to have occurred in previous reports and in reference sequence databases. PMID:22442329
Detection of possible restriction sites for type II restriction enzymes in DNA sequences.

PubMed

Gagniuc, P; Cimponeriu, D; Ionescu-Tîrgovişte, C; Mihai, Andrada; Stavarachi, Monica; Mihai, T; Gavrilă, L

2011-01-01

In order to advance knowledge of the mechanisms operating in complex polygenic disorders such as diabetes and obesity, this paper proposes a new algorithm (PRSD, possible restriction site detection) and its implementation in the Applied Genetics software. This software can be used for in silico detection of potential (hidden) recognition sites for endonucleases and for identification of nucleotide repeats. The recognition sites for endonucleases may result from hidden sequences through deletion or insertion of a specific number of nucleotides. Tests were conducted on DNA sequences downloaded from NCBI servers using specific recognition sites for common type II restriction enzymes introduced in the software database (n = 126). Each possible recognition site indicated by the PRSD algorithm implemented in Applied Genetics was checked and confirmed by NEBcutter V2.0 and Webcutter 2.0 software. In the sequence NG_008724.1 (which includes 63632 nucleotides) we found a high number of potential restriction sites for EcoRI that may be produced by deletion (n = 43 sites) or insertion (n = 591 sites) of one nucleotide. The second module of Applied Genetics has been designed to find simple repeats, which may help in understanding the role of SNPs (single nucleotide polymorphisms) in the pathogenesis of complex metabolic disorders. We have tested the presence of simple repetitive sequences in five DNA sequences. The software indicated the exact position of each repeat detected in the tested sequences. Future development of Applied Genetics can provide an alternative to powerful tools used to search for restriction sites or repetitive sequences, or to improve genotyping methods.
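The idea of a "hidden" recognition site that appears after a single-nucleotide deletion or insertion can be illustrated with a short sketch. This is not the PRSD algorithm itself, whose details the abstract does not give; the function names are hypothetical, and the sketch conservatively skips edits that fall next to an already existing site. It counts edits, so deleting either base of a repeated pair is reported twice.

```python
from typing import List, Tuple

ECORI = "GAATTC"  # EcoRI recognition sequence


def sites_by_single_deletion(seq: str, site: str = ECORI) -> List[int]:
    """Positions whose deletion creates `site` across the new junction."""
    k = len(site)
    hits = []
    for i in range(len(seq)):
        lo = max(0, i - k + 1)
        if site in seq[lo:i + k]:
            continue  # a site already exists here, so nothing is hidden
        # Only windows that straddle the deleted base can form a new site.
        if site in seq[lo:i] + seq[i + 1:i + k]:
            hits.append(i)
    return hits


def sites_by_single_insertion(seq: str, site: str = ECORI) -> List[Tuple[int, str]]:
    """(position, base) pairs where inserting `base` before `position` creates `site`."""
    k = len(site)
    hits = []
    for i in range(len(seq) + 1):
        lo = max(0, i - k + 1)
        if site in seq[lo:i + k - 1]:
            continue  # skip edits adjacent to an existing site
        for base in "ACGT":
            if site in seq[lo:i] + base + seq[i:i + k - 1]:
                hits.append((i, base))
    return hits


deletion_hits = sites_by_single_deletion("GAATTTC")   # one T too many
insertion_hits = sites_by_single_insertion("GAATC")   # one T missing
```

In the first toy sequence, removing any of the three consecutive T bases yields GAATTC; in the second, inserting a T at either of two adjacent positions does.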
New splicing mutation in the choline kinase beta (CHKB) gene causing a muscular dystrophy detected by whole-exome sequencing.

PubMed

Oliveira, Jorge; Negrão, Luís; Fineza, Isabel; Taipa, Ricardo; Melo-Pires, Manuel; Fortuna, Ana Maria; Gonçalves, Ana Rita; Froufe, Hugo; Egas, Conceição; Santos, Rosário; Sousa, Mário

2015-06-01

Muscular dystrophies (MDs) are a group of hereditary muscle disorders that include two particularly heterogeneous subgroups: limb-girdle MD and congenital MD, linked to 52 different genes (seven common to both subgroups). Massive parallel sequencing technology may avoid the usual stepwise gene-by-gene analysis. We report the whole-exome sequencing (WES) analysis of a patient with childhood-onset progressive MD, also presenting mental retardation and dilated cardiomyopathy. Conventional sequencing had excluded eight candidate genes. WES of the trio (patient and parents) was performed using the Ion Proton sequencing system. Data analysis, which relied on filtering steps using the GEMINI software, revealed a novel silent variant in the choline kinase beta (CHKB) gene. Inspection of sequence alignments ultimately identified the causal variant (CHKB:c.1031+3G>C). This splice site mutation was confirmed using Sanger sequencing and its effect was further evaluated with gene expression analysis. On reassessment of the muscle biopsy, typical abnormal mitochondrial oxidative changes were observed. Mutations in CHKB have been shown to cause phosphatidylcholine deficiency in myofibers, causing a rare form of CMD (only 21 patients reported). Notwithstanding interpretative difficulties that need to be overcome before the integration of WES in the diagnostic workflow, this work corroborates its utility in solving cases from highly heterogeneous groups of diseases, in which conventional diagnostic approaches fail to provide a definitive diagnosis.
Screening and Characterization of RAPD Markers in Viscerotropic Leishmania Parasites

PubMed Central

Mkada–Driss, Imen; Talbi, Chiraz; Guerbouj, Souheila; Driss, Mehdi; Elamine, Elwaleed M.; Cupolillo, Elisa; Mukhtar, Moawia M.; Guizani, Ikram

2014-01-01

Visceral leishmaniasis (VL) is mainly due to the Leishmania donovani complex. VL is endemic in many countries worldwide including East Africa and the Mediterranean region where the epidemiology is complex. Taxonomy of these pathogens is under controversy but there is a correlation between their genetic diversity and geographical origin. With steady increase in genome knowledge, RAPD is still a useful approach to identify and characterize novel DNA markers. Our aim was to identify and characterize polymorphic DNA markers in VL Leishmania parasites in diverse geographic regions using RAPD in order to constitute a pool of PCR targets having the potential to differentiate among the VL parasites. 100 different oligonucleotide decamers having arbitrary DNA sequences were screened for reproducible amplification and a selection of 28 was used to amplify DNA from 12 L. donovani, L. archibaldi and L. infantum strains having diverse origins. A total of 155 bands were amplified of which 60.65% appeared polymorphic. 7 out of 28 primers provided monomorphic patterns. Phenetic analysis allowed clustering the parasites according to their geographical origin. Differentially amplified bands were selected, among them 22 RAPD products were successfully cloned and sequenced. Bioinformatic analysis allowed mapping of the markers and sequences and priming sites analysis. This study was complemented with Southern-blot to confirm assignment of markers to the kDNA. The bioinformatic analysis identified 16 nuclear and 3 minicircle markers. Analysis of these markers highlighted polymorphisms at RAPD priming sites with mainly 5′ end transversions, and presence of inter– and intra– taxonomic complex sequence and microsatellites variations; a bias in transitions over transversions and indels between the different sequences compared is observed, which is however less marked between L. infantum and L. donovani. 
The study delivers a pool of well-documented polymorphic DNA markers, to develop molecular diagnostics assays to characterize and differentiate VL causing agents. PMID:25313833
Sequence features and phylogenetic analysis of the stress protein Hsp90α in chinook salmon Oncorhynchus tshawytscha, a poikilothermic vertebrate

USGS Publications Warehouse

Palmisano, Aldo N.; Winton, James R.; Dickhoff, Walton W.

1999-01-01

We cloned and sequenced a chinook salmon Hsp90 cDNA; sequence analysis shows it to be Hsp90α. Phylogenetic analysis supports the hypothesis that α and β paralogs of Hsp90 arose as a result of a gene duplication event and that they diverged early in the evolution of vertebrates, before tetrapods separated from the teleost lineage. Among several differences distinguishing poikilothermic Hsp90α sequences from their bird and mammal orthologs, the teleost versions specifically lack a characteristic QTQDQP phosphorylation site near the N-terminus. We used the cDNA to develop an RNA (Northern) blot to quantify cellular Hsp90 mRNA levels. Chinook salmon embryonic (CHSE-214) cells responded to heat shock with a rapid rise in Hsp90 mRNA through 4 h, followed by a gradual decline over the next 20 h. Hsp90 mRNA level may be useful as a stress indicator, especially in a laboratory setting or in response to acute heat stress.
Subsurface Analysis of the Mesaverde Group on and near the Jicarilla Apache Indian Reservation, New Mexico-its implication on Sites of Oil and Gas Accumulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ridgley, Jennie

2001-08-21

The purpose of the phase 2 Mesaverde study part of the Department of Energy funded project "Analysis of oil-bearing Cretaceous Sandstone Hydrocarbon Reservoirs, exclusive of the Dakota Sandstone, on the Jicarilla Apache Indian Reservation, New Mexico" was to define the facies of the oil-producing units within the subsurface units of the Mesaverde Group and integrate these results with outcrop studies that defined the depositional environments of these facies within a sequence stratigraphic context. The focus of this report will center on (1) integration of subsurface correlations with outcrop correlations of components of the Mesaverde, (2) application of the sequence stratigraphic model determined in the phase one study to these correlations, (3) determination of the facies distribution of the Mesaverde Group and their relationship to sites of oil and gas accumulation, (4) evaluation of the thermal maturity and potential source rocks for oil and gas in the Mesaverde Group, and (5) evaluation of the structural features on the Reservation as they may control sites of oil accumulation.

The HIP1 binding site is required for growth regulation of the dihydrofolate reductase gene promoter.

PubMed

Means, A L; Slansky, J E; McMahon, S L; Knuth, M W; Farnham, P J

1992-03-01

The transcription rate of the dihydrofolate reductase (DHFR) gene increases at the G1/S boundary of the proliferative cell cycle. Through analysis of transiently and stably transfected NIH 3T3 cells, we have now demonstrated that DHFR promoter sequences extending from -270 to +20 are sufficient to confer similar regulation on a reporter gene. Mutation of a protein binding site that spans sequences from -16 to +11 in the DHFR promoter resulted in loss of the transcriptional increase at the G1/S boundary. Purification of an activity from HeLa nuclear extract that binds to this region enriched for a 180-kDa polypeptide (HIP1). Using this HIP1 preparation, we have identified specific positions within the binding site that are critical for efficient protein-DNA interactions. An analysis of association and dissociation rates suggests that bound HIP1 protein can exchange rapidly with free protein. This rapid exchange may facilitate the burst of transcriptional activity from the DHFR promoter at the G1/S boundary.
The HIP1 binding site is required for growth regulation of the dihydrofolate reductase gene promoter.

PubMed Central

Means, A L; Slansky, J E; McMahon, S L; Knuth, M W; Farnham, P J

1992-01-01

The transcription rate of the dihydrofolate reductase (DHFR) gene increases at the G1/S boundary of the proliferative cell cycle. Through analysis of transiently and stably transfected NIH 3T3 cells, we have now demonstrated that DHFR promoter sequences extending from -270 to +20 are sufficient to confer similar regulation on a reporter gene. Mutation of a protein binding site that spans sequences from -16 to +11 in the DHFR promoter resulted in loss of the transcriptional increase at the G1/S boundary. Purification of an activity from HeLa nuclear extract that binds to this region enriched for a 180-kDa polypeptide (HIP1). Using this HIP1 preparation, we have identified specific positions within the binding site that are critical for efficient protein-DNA interactions. An analysis of association and dissociation rates suggests that bound HIP1 protein can exchange rapidly with free protein. This rapid exchange may facilitate the burst of transcriptional activity from the DHFR promoter at the G1/S boundary. PMID:1545788
Confirmation of translatability and functionality certifies the dual endothelin1/VEGFsp receptor (DEspR) protein.

PubMed

Herrera, Victoria L M; Steffen, Martin; Moran, Ann Marie; Tan, Glaiza A; Pasion, Khristine A; Rivera, Keith; Pappin, Darryl J; Ruiz-Opazo, Nelson

2016-06-14

In contrast to rat and mouse databases, the NCBI gene database lists the human dual-endothelin1/VEGFsp receptor (DEspR, formerly Dear) as a unitary transcribed pseudogene due to a stop [TGA]-codon at codon#14 in automated DNA and RNA sequences. However, re-analysis is needed given prior single gene studies detected a tryptophan [TGG]-codon#14 by manual Sanger sequencing, demonstrated DEspR translatability and functionality, and since the demonstration of actual non-translatability through expression studies, the standard-of-excellence for pseudogene designation, has not been performed. Re-analysis must meet UNIPROT criteria for demonstration of a protein's existence at the highest (protein) level, which a priori, would override DNA- or RNA-based deductions. To dissect the nucleotide sequence discrepancy, we performed Maxam-Gilbert sequencing and reviewed 727 RNA-seq entries. To comply with the highest level multiple UNIPROT criteria for determining DEspR's existence, we performed various experiments using multiple anti-DEspR monoclonal antibodies (mAbs) targeting distinct DEspR epitopes with one spanning the contested tryptophan [TGG]-codon#14, assessing: (a) DEspR protein expression, (b) predicted full-length protein size, (c) sequence-predicted protein-specific properties beyond codon#14: receptor glycosylation and internalization, (d) protein-partner interactions, and (e) DEspR functionality via DEspR-inhibition effects. Maxam-Gilbert sequencing and some RNA-seq entries demonstrate two guanines, hence a tryptophan [TGG]-codon#14 within a compression site spanning an error-prone compression sequence motif. Western blot analysis using anti-DEspR mAbs targeting distinct DEspR epitopes detect the identical glycosylated 17.5 kDa pull-down protein. Decrease in DEspR-protein size after PNGase-F digest demonstrates post-translational glycosylation, concordant with the consensus-glycosylation site beyond codon#14. 
As with other small single-transmembrane proteins, mass spectrometry analysis of anti-DEspR mAb pull-down proteins does not detect DEspR itself, but does detect DEspR-protein interactions with proteins implicated in intracellular trafficking and cancer. FACS analyses also detect DEspR-protein in different human cancer stem-like cells (CSCs). DEspR-inhibition studies identify DEspR-roles in CSC survival and growth. Live cell imaging detects fluorescently-labeled anti-DEspR mAb targeted-receptor internalization, concordant with the single internalization-recognition sequence also located beyond codon#14. Data confirm translatability of DEspR, the full-length DEspR protein beyond codon#14, and elucidate DEspR-specific functionality. Along with detection of the tryptophan [TGG]-codon#14 within an error-prone compression site, cumulative data demonstrating DEspR protein existence fulfill multiple UNIPROT criteria, thus refuting its pseudogene designation.
Secure Genomic Computation through Site-Wise Encryption

PubMed Central

Zhao, Yongan; Wang, XiaoFeng; Tang, Haixu

2015-01-01

Commercial clouds provide on-demand IT services for big-data analysis, which have become an attractive option for users who have no access to comparable infrastructure. However, utilizing these services for human genome analysis is highly risky, as human genomic data contains identifiable information of human individuals and their disease susceptibility. Therefore, currently, no computation on personal human genomic data is conducted on public clouds. To address this issue, here we present a site-wise encryption approach to encrypt whole human genome sequences, which can be subject to secure searching of genomic signatures on public clouds. We implemented this method within the Hadoop framework, and tested it on the case of searching disease markers retrieved from the ClinVar database against patients’ genomic sequences. The secure search runs only one order of magnitude slower than the simple search without encryption, indicating our method is ready to be used for secure genomic computation on public clouds. PMID:26306278
DNA mimic proteins: functions, structures, and bioinformatic analysis.

PubMed

Wang, Hao-Ching; Ho, Chun-Han; Hsu, Kai-Cheng; Yang, Jinn-Moon; Wang, Andrew H-J

2014-05-13

DNA mimic proteins have DNA-like negative surface charge distributions, and they function by occupying the DNA binding sites of DNA binding proteins to prevent these sites from being accessed by DNA. DNA mimic proteins control the activities of a variety of DNA binding proteins and are involved in a wide range of cellular mechanisms such as chromatin assembly, DNA repair, transcription regulation, and gene recombination. However, the sequences and structures of DNA mimic proteins are diverse, making them difficult to predict by bioinformatic search. To date, only a few DNA mimic proteins have been reported. These DNA mimics were not found by searching for functional motifs in their sequences but were revealed only by structural analysis of their charge distribution. This review highlights the biological roles and structures of 16 reported DNA mimic proteins. We also discuss approaches that might be used to discover new DNA mimic proteins.
Secure Genomic Computation through Site-Wise Encryption.

PubMed

Zhao, Yongan; Wang, XiaoFeng; Tang, Haixu

2015-01-01

Commercial clouds provide on-demand IT services for big-data analysis, which have become an attractive option for users who have no access to comparable infrastructure. However, utilizing these services for human genome analysis is highly risky, as human genomic data contains identifiable information of human individuals and their disease susceptibility. Therefore, currently, no computation on personal human genomic data is conducted on public clouds. To address this issue, here we present a site-wise encryption approach to encrypt whole human genome sequences, which can be subject to secure searching of genomic signatures on public clouds. We implemented this method within the Hadoop framework, and tested it on the case of searching disease markers retrieved from the ClinVar database against patients' genomic sequences. The secure search runs only one order of magnitude slower than the simple search without encryption, indicating our method is ready to be used for secure genomic computation on public clouds.
Analysis of evolutionary conservation patterns and their influence on identifying protein functional sites.

PubMed

Fang, Chun; Noguchi, Tamotsu; Yamana, Hayato

2014-10-01

Evolutionary conservation information included in position-specific scoring matrices (PSSMs) has been widely adopted by sequence-based methods for identifying protein functional sites, because all functional sites, whether in ordered or disordered proteins, are found to be conserved to some extent. However, different functional sites have different conservation patterns: some are linearly contextual, some are mingled with highly variable residues, and others seem to be conserved independently. Every value in a PSSM is calculated independently of the others, without carrying the contextual information of residues in the sequence. Therefore, adopting the direct output of a PSSM for prediction fails to consider the relationship between conservation patterns of residues and the distribution of conservation scores in PSSMs. In order to demonstrate the importance of combining PSSMs with the specific conservation patterns of functional sites for prediction, three different PSSM-based methods for identifying three kinds of functional sites have been analyzed. Results suggest that different PSSM-based methods differ in their capability to identify different patterns of functional sites, and that better combining PSSMs with the specific conservation patterns of residues would largely facilitate the prediction.
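The point that PSSM values are computed column by column, with no sequence context, can be made concrete with a small sketch. Everything here is an assumption for illustration: a plain log-odds matrix with a uniform background and add-one pseudocounts (real PSSMs, such as those from PSI-BLAST, use sequence weighting and substitution-based pseudocounts), and a simple sliding-window average as one possible way to add context. It is not any of the three methods analyzed in the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds score per column and residue; each column is scored independently."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def windowed_scores(
    pssm: List[Dict[str, float]], sequence: str, half_width: int = 1
) -> Tuple[List[float], List[float]]:
    """Return raw per-position scores and a window-averaged (contextual) version."""
    raw = [pssm[i][aa] for i, aa in enumerate(sequence)]
    smoothed = []
    for i in range(len(raw)):
        window = raw[max(0, i - half_width): i + half_width + 1]
        smoothed.append(sum(window) / len(window))
    return raw, smoothed


toy_alignment = ["ACD", "ACE", "ACD"]
toy_pssm = build_pssm(toy_alignment)
raw_scores, context_scores = windowed_scores(toy_pssm, "ACD")
```

The raw scores treat each position in isolation, whereas the window-averaged scores let a residue's neighbors influence its value, which is the kind of contextual information the abstract says direct PSSM output lacks.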
A stochastic context free grammar based framework for analysis of protein sequences

PubMed Central

Dyrka, Witold; Nebel, Jean-Christophe

2009-01-01

Background In the last decade, there have been many applications of formal language theory in bioinformatics such as RNA structure prediction and detection of patterns in DNA. However, in the field of proteomics, the size of the protein alphabet and the complexity of relationship between amino acids have mainly limited the application of formal language theory to the production of grammars whose expressive power is not higher than stochastic regular grammars. However, these grammars, like other state of the art methods, cannot cover any higher-order dependencies such as nested and crossing relationships that are common in proteins. In order to overcome some of these limitations, we propose a Stochastic Context Free Grammar based framework for the analysis of protein sequences where grammars are induced using a genetic algorithm. Results This framework was implemented in a system aiming at the production of binding site descriptors. These descriptors not only allow detection of protein regions that are involved in these sites, but also provide insight in their structure. Grammars were induced using quantitative properties of amino acids to deal with the size of the protein alphabet. Moreover, we imposed some structural constraints on grammars to reduce the extent of the rule search space. Finally, grammars based on different properties were combined to convey as much information as possible. Evaluation was performed on sites of various sizes and complexity described either by PROSITE patterns, domain profiles or a set of patterns. Results show the produced binding site descriptors are human-readable and, hence, highlight biologically meaningful features. Moreover, they achieve good accuracy in both annotation and detection. In addition, findings suggest that, unlike current state-of-the-art methods, our system may be particularly suited to deal with patterns shared by non-homologous proteins. 
Conclusion A new Stochastic Context Free Grammar based framework has been introduced allowing the production of binding site descriptors for analysis of protein sequences. Experiments have shown that this new approach is not only valid but also produces human-readable descriptors for binding sites which have been beyond the capability of current machine learning techniques. PMID:19814800
Genetic variability in Melipona quinquefasciata (Hymenoptera, Apidae, Meliponini) from northeastern Brazil determined using the first internal transcribed spacer (ITS1).

PubMed

Pereira, J O P; Freitas, B M; Jorge, D M M; Torres, D C; Soares, C E A; Grangeiro, T B

2009-01-01

Melipona quinquefasciata is a ground-nesting South American stingless bee whose geographic distribution was believed to comprise only the central and southern states of Brazil. We obtained partial sequences (about 500-570 bp) of first internal transcribed spacer (ITS1) nuclear ribosomal DNA from Melipona specimens putatively identified as M. quinquefasciata collected from different localities in northeastern Brazil. To confirm the taxonomic identity of the northeastern samples, specimens from the state of Goiás (Central region of Brazil) were included for comparison. All sequences were deposited in GenBank (accession numbers EU073751-EU073759). The mean nucleotide divergence (excluding sites with insertions/deletions) in the ITS1 sequences was only 1.4%, ranging from 0 to 4.1%. When the sites with insertions/deletions were also taken into account, sequence divergences varied from 0 to 5.3%. In all pairwise comparisons, the ITS1 sequence from the specimens collected in Goiás was most divergent compared to the ITS1 sequences of the bees from the other locations. However, neighbor-joining phylogenetic analysis showed that all ITS1 sequences from northeastern specimens along with the sample of Goiás were resolved in a single clade with a bootstrap support of 100%. The ITS1 sequencing data thus support the occurrence of M. quinquefasciata in northeast Brazil.
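The two divergence figures quoted in this abstract (with and without insertion/deletion sites) correspond to a simple proportion of differing sites between aligned sequences. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `p_distance`, the `count_gaps` flag, and the toy sequences are illustrative assumptions.

```python
def p_distance(seq_a, seq_b, count_gaps=False):
    """Proportion of differing sites between two aligned sequences.

    Hypothetical helper for illustration: with count_gaps=False, columns
    where either sequence has a gap ('-') are skipped; with count_gaps=True,
    a gap opposite a base counts as a difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        a_gap, b_gap = a == "-", b == "-"
        if a_gap and b_gap:
            continue  # column is empty in both sequences
        if (a_gap or b_gap) and not count_gaps:
            continue  # exclude indel sites
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


# Toy aligned pair (made-up data, not GenBank sequences)
s1 = "ACGT-ACGTA"
s2 = "ACGTTACGAA"
without_indels = p_distance(s1, s2)                 # 1 difference over 9 sites
with_indels = p_distance(s1, s2, count_gaps=True)   # 2 differences over 10 sites
```

Counting indel sites can only add comparisons in which the two sequences differ, which is consistent with the wider divergence range reported when those sites were included.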
Functional organization of a single nif cluster in the mesophilic archaeon Methanosarcina mazei strain Gö1

PubMed Central

Ehlers, Claudia; Veit, Katharina; Gottschalk, Gerhard; Schmitz, Ruth A.

2002-01-01

The mesophilic methanogenic archaeon Methanosarcina mazei strain Gö1 is able to utilize molecular nitrogen (N2) as its sole nitrogen source. We have identified and characterized a single nitrogen fixation (nif) gene cluster in M. mazei Gö1 with an approximate length of 9 kbp. Sequence analysis revealed seven genes with sequence similarities to nifH, nifI1, nifI2, nifD, nifK, nifE and nifN, similar to other diazotrophic methanogens and certain bacteria such as Clostridium acetobutylicum, with the two glnB-like genes (nifI1 and nifI2) located between nifH and nifD. Phylogenetic analysis of deduced amino acid sequences for the nitrogenase structural genes of M. mazei Gö1 showed that they are most closely related to Methanosarcina barkeri nif2 genes, and also closely resemble those for the corresponding nif products of the gram-positive bacterium C. acetobutylicum. Northern blot analysis and reverse transcription PCR analysis demonstrated that the M. mazei nif genes constitute an operon transcribed only under nitrogen starvation as a single 8 kb transcript. Sequence analysis revealed a palindromic sequence at the transcriptional start site in front of the M. mazei nifH gene, which may have a function in transcriptional regulation of the nif operon. PMID:15803652
MinION Analysis and Reference Consortium: Phase 1 data release and analysis

PubMed Central

Eccles, David A.; Zalunin, Vadim; Urban, John M.; Piazza, Paolo; Bowden, Rory J.; Paten, Benedict; Mwaigwisya, Solomon; Batty, Elizabeth M.; Simpson, Jared T.; Snutch, Terrance P.

2015-01-01

The advent of a miniaturized DNA sequencing device with a high-throughput contextual sequencing capability embodies the next generation of large scale sequencing tools. The MinION™ Access Programme (MAP) was initiated by Oxford Nanopore Technologies™ in April 2014, giving public access to their USB-attached miniature sequencing device. The MinION Analysis and Reference Consortium (MARC) was formed by a subset of MAP participants, with the aim of evaluating and providing standard protocols and reference data to the community. Envisaged as a multi-phased project, this study provides the global community with the Phase 1 data from MARC, where the reproducibility of the performance of the MinION was evaluated at multiple sites. Five laboratories on two continents generated data using a control strain of Escherichia coli K-12, preparing and sequencing samples according to a revised ONT protocol. Here, we provide the details of the protocol used, along with a preliminary analysis of the characteristics of typical runs including the consistency, rate, volume and quality of data produced. Further analysis of the Phase 1 data presented here, and additional experiments in Phase 2 of E. coli from MARC are already underway to identify ways to improve and enhance MinION performance. PMID:26834992
Identification of Colletotrichum spp. isolated from strawberry in Zhejiang Province and Shanghai City, China*

PubMed Central

Xie, Liu; Zhang, Jing-ze; Wan, Yao; Hu, Dong-wei

2010-01-01

Strawberry anthracnose, caused by Colletotrichum spp., is a major disease of cultivated strawberry. This study identifies 31 isolates of Colletotrichum spp. which cause strawberry anthracnose in Zhejiang Province and Shanghai City, China. Eleven isolates were identified as C. acutatum, 10 as C. gloeosporioides and 10 as C. fragariae based on morphological characteristics, phylogenetic and sequence analyses. Species-specific polymerase chain reaction (PCR) and enzyme digestion further confirmed the identification of the Colletotrichum spp., demonstrating that these three species are currently the causal agents of strawberry anthracnose in the studied regions. Based on analysis of rDNA internal transcribed spacers (ITS) sequences, sequences of all C. acutatum were identical, and little genetic variability was observed between C. fragariae and C. gloeosporioides. However, the conservative nature of the MvnI specific site from isolates of C. gloeosporioides was confirmed, and this site could be used to differentiate C. gloeosporioides from C. fragariae. PMID:20043353
Amino acid sequence of tyrosinase from Neurospora crassa.

PubMed Central

Lerch, K

1978-01-01

The amino-acid sequence of tyrosinase from Neurospora crassa (monophenol,dihydroxyphenylalanine:oxygen oxidoreductase, EC 1.14.18.1) is reported. This copper-containing oxidase consists of a single polypeptide chain of 407 amino acids. The primary structure was determined by automated and manual sequence analysis on fragments produced by cleavage with cyanogen bromide and on peptides obtained by digestion with trypsin, pepsin, thermolysin, or chymotrypsin. The amino terminus of the protein is acetylated and the single cysteinyl residue 96 is covalently linked via a thioether bridge to histidyl residue 94. The formation and the possible role of this unusual structure in Neurospora tyrosinase is discussed. Dye-sensitized photooxidation of apotyrosinase and active-site-directed inactivation of the native enzyme indicate the possible involvement of histidyl residues 188, 192, 289, and 305 or 306 as ligands to the active-site copper as well as in the catalytic mechanism of this monooxygenase. PMID:151279
Functional specificity of a Hox protein mediated by the recognition of minor groove structure.

PubMed

Joshi, Rohit; Passner, Jonathan M; Rohs, Remo; Jain, Rinku; Sosinsky, Alona; Crickmore, Michael A; Jacob, Vinitha; Aggarwal, Aneel K; Honig, Barry; Mann, Richard S

2007-11-02

The recognition of specific DNA-binding sites by transcription factors is a critical yet poorly understood step in the control of gene expression. Members of the Hox family of transcription factors bind DNA by making nearly identical major groove contacts via the recognition helices of their homeodomains. In vivo specificity, however, often depends on extended and unstructured regions that link Hox homeodomains to a DNA-bound cofactor, Extradenticle (Exd). Using a combination of structure determination, computational analysis, and in vitro and in vivo assays, we show that Hox proteins recognize specific Hox-Exd binding sites via residues located in these extended regions that insert into the minor groove but only when presented with the correct DNA sequence. Our results suggest that these residues, which are conserved in a paralog-specific manner, confer specificity by recognizing a sequence-dependent DNA structure instead of directly reading a specific DNA sequence.
Depositional sequence analysis and sedimentologic modeling for improved prediction of Pennsylvanian reservoirs (Annex 1). Annual report, February 1, 1991--January 31, 1992

DOE Office of Scientific and Technical Information (OSTI.GOV)

Watney, W.L.

1992-08-01

Interdisciplinary studies of the Upper Pennsylvanian Lansing and Kansas City groups have been undertaken in order to improve the geologic characterization of petroleum reservoirs and to develop a quantitative understanding of the processes responsible for formation of associated depositional sequences. To this end, concepts and methods of sequence stratigraphy are being used to define and interpret the three-dimensional depositional framework of the Kansas City Group. The investigation includes characterization of reservoir rocks in oil fields in western Kansas, description of analog equivalents in near-surface and surface sites in southeastern Kansas, and construction of regional structural and stratigraphic framework to link the site specific studies. Geologic inverse and simulation models are being developed to integrate quantitative estimates of controls on sedimentation to produce reconstructions of reservoir-bearing strata in an attempt to enhance our ability to predict reservoir characteristics.
Depositional sequence analysis and sedimentologic modeling for improved prediction of Pennsylvanian reservoirs (Annex 1)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Watney, W.L.

1992-01-01

Interdisciplinary studies of the Upper Pennsylvanian Lansing and Kansas City groups have been undertaken in order to improve the geologic characterization of petroleum reservoirs and to develop a quantitative understanding of the processes responsible for formation of associated depositional sequences. To this end, concepts and methods of sequence stratigraphy are being used to define and interpret the three-dimensional depositional framework of the Kansas City Group. The investigation includes characterization of reservoir rocks in oil fields in western Kansas, description of analog equivalents in near-surface and surface sites in southeastern Kansas, and construction of regional structural and stratigraphic framework to link the site specific studies. Geologic inverse and simulation models are being developed to integrate quantitative estimates of controls on sedimentation to produce reconstructions of reservoir-bearing strata in an attempt to enhance our ability to predict reservoir characteristics.
Identification of Human Lineage-Specific Transcriptional Coregulators Enabled by a Glossary of Binding Modules and Tunable Genomic Backgrounds.

PubMed

Mariani, Luca; Weinand, Kathryn; Vedenko, Anastasia; Barrera, Luis A; Bulyk, Martha L

2017-09-27

Transcription factors (TFs) control cellular processes by binding specific DNA motifs to modulate gene expression. Motif enrichment analysis of regulatory regions can identify direct and indirect TF binding sites. Here, we created a glossary of 108 non-redundant TF-8mer "modules" of shared specificity for 671 metazoan TFs from publicly available and new universal protein binding microarray data. Analysis of 239 ENCODE TF chromatin immunoprecipitation sequencing datasets and associated RNA sequencing profiles suggests the 8mer modules are more precise than position weight matrices in identifying indirect binding motifs and their associated tethering TFs. We also developed GENRE (genomically equivalent negative regions), a tunable tool for construction of matched genomic background sequences for analysis of regulatory regions. GENRE outperformed four state-of-the-art approaches to background sequence construction. We used our TF-8mer glossary and GENRE in the analysis of the indirect binding motifs for the co-occurrence of tethering factors, suggesting novel TF-TF interactions. We anticipate that these tools will aid in elucidating tissue-specific gene-regulatory programs. Copyright © 2017 Elsevier Inc. All rights reserved.
Flanking signal and mature peptide residues influence signal peptide cleavage

PubMed Central

Choo, Khar Heng; Ranganathan, Shoba

2008-01-01

Background Signal peptides (SPs) mediate the targeting of secretory precursor proteins to the correct subcellular compartments in prokaryotes and eukaryotes. Identifying these transient peptides is crucial to the medical, food and beverage and biotechnology industries yet our understanding of these peptides remains limited. This paper examines the most common type of signal peptides cleavable by the endoprotease signal peptidase I (SPase I), and the residues flanking the cleavage sites of three groups of signal peptide sequences, namely (i) eukaryotes (Euk) (ii) Gram-positive (Gram+) bacteria, and (iii) Gram-negative (Gram-) bacteria. Results In this study, 2352 secretory peptide sequences from a variety of organisms with amino-terminal SPs are extracted from the manually curated SPdb database for analysis based on physicochemical properties such as pI, aliphatic index, GRAVY score, hydrophobicity, net charge and position-specific residue preferences. Our findings show that the three groups share several similarities in general, but they display distinctive features upon examination in terms of their amino acid compositions and frequencies, and various physico-chemical properties. Thus, analysis or prediction of their sequences should be separated and treated as distinct groups. Conclusion We conclude that the peptide segment recognized by SPase I extends to the start of the mature protein to a limited extent, upon our survey of the amino acid residues surrounding the cleavage processing site. These flanking residues possibly influence the cleavage processing and contribute to non-canonical cleavage sites. Our findings are applicable in defining more accurate prediction tools for recognition and identification of cleavage site of SPs. PMID:19091014
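Two of the physicochemical properties named in this abstract, the GRAVY score and net charge, can be computed directly from a peptide sequence. The sketch below uses the standard Kyte-Doolittle hydropathy scale for GRAVY; the net-charge function is a deliberately crude assumption (Lys/Arg counted as +1, Asp/Glu as -1, ignoring His, termini and pH) and is not claimed to be the calculation used in the study. Function names and the example peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values per amino acid
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Grand average of hydropathy: mean Kyte-Doolittle value over residues."""
    residues = peptide.upper()
    if not residues:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)


def crude_net_charge(peptide):
    """Rough net charge (ASSUMPTION: K/R = +1, D/E = -1, all else neutral)."""
    residues = peptide.upper()
    positive = sum(residues.count(r) for r in "KR")
    negative = sum(residues.count(r) for r in "DE")
    return positive - negative


# Made-up signal-peptide-like sequence, for illustration only
example = "MKKLLLAALLLA"
example_gravy = gravy(example)
example_charge = crude_net_charge(example)
```

A positive GRAVY value indicates an overall hydrophobic segment, which is the kind of per-group property the abstract compares across eukaryotic, Gram-positive and Gram-negative signal peptides.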
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring

PubMed Central

2012-01-01

Background Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. Results The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. 
Conclusions Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family. PMID:22793672
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring.

PubMed

Durston, Kirk K; Chiu, David Ky; Wong, Andrew Kc; Li, Gary Cl

2012-07-13

Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families. The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function. 
Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.
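The clustering described in this abstract rests on a pairwise interdependency score between aligned sites. As a rough illustration of that building block only (not the k-modes algorithm itself), the sketch below computes mutual information between two alignment columns and normalizes it by their joint entropy. The abstract does not state which normalization was used, so dividing by joint entropy is an assumption here, as are the function names and the toy columns.

```python
from collections import Counter
from math import log2


def entropy(counts, total):
    """Shannon entropy (bits) of a Counter of observed symbols."""
    return -sum((c / total) * log2(c / total) for c in counts.values())


def normalized_mi(col_x, col_y):
    """Mutual information of two alignment columns divided by joint entropy.

    ASSUMPTION: normalization by H(X, Y); other normalizations exist.
    Returns 0.0 when both columns are fully conserved (joint entropy 0).
    """
    if len(col_x) != len(col_y) or not col_x:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_x)
    h_x = entropy(Counter(col_x), n)
    h_y = entropy(Counter(col_y), n)
    h_xy = entropy(Counter(zip(col_x, col_y)), n)
    if h_xy == 0:
        return 0.0
    return (h_x + h_y - h_xy) / h_xy


# Toy columns: each string holds one aligned site across four sequences
coupled = normalized_mi("AAGG", "CCTT")      # sites vary together
independent = normalized_mi("AAGG", "CTCT")  # sites vary independently
```

A score near 1 means that knowing the residue at one site determines the residue at the other, while a score near 0 means the sites vary independently; a site-clustering method could group sites so that such scores are high within each group.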

Cloning and sequence analysis of the invertase gene INV 1 from the yeast Pichia anomala.

PubMed

Pérez, J A; Rodríguez, J; Rodríguez, L; Ruiz, T

1996-02-01

A genomic library from the yeast Pichia anomala has been constructed and employed to clone the gene encoding the sucrose-hydrolysing enzyme invertase by complementation of a sucrose non-fermenting mutant of Saccharomyces cerevisiae. The cloned gene, INV1, was sequenced and found to encode a polypeptide of 550 amino acids which contained a 22 amino-acid signal sequence and ten potential glycosylation sites. The amino-acid sequence shows significant identity with other yeast invertases and also with Kluyveromyces marxianus inulinase, a yeast beta-fructofuranosidase which has a different substrate specificity. The nucleotide sequences of the 5' and 3' non-coding regions were found to contain several consensus motifs probably involved in the initiation and termination of gene transcription.
The challenge of annotating protein sequences: The tale of eight domains of unknown function in Pfam.

PubMed

Goonesekere, Nalin C W; Shipely, Krysten; O'Connor, Kevin

2010-06-01

The Pfam database is an important tool in genome annotation, since it provides a collection of curated protein families. However, a subset of these families, known as domains of unknown function (DUFs), remains poorly characterized. We have related sequences from DUF404, DUF407, DUF482, DUF608, DUF810, DUF853, DUF976 and DUF1111 to homologs in PDB, within the midnight zone (9-20%) of sequence identity. These relationships were extended to provide functional annotation by sequence analysis and model building. Also described are examples of residue plasticity within enzyme active sites, and change of function within homologous sequences of a DUF. Copyright 2010 Elsevier Ltd. All rights reserved.
Associative diazotrophic bacteria in grass roots and soils from heavy metal contaminated sites.

PubMed

Moreira, Fátima M S; Lange, Anderson; Klauberg-Filho, Osmar; Siqueira, José O; Nóbrega, Rafaela S A; Lima, Adriana S

2008-12-01

This work aimed to evaluate the density of associative diazotrophic bacteria populations in soil and grass root samples from heavy metal contaminated sites, and to characterize isolates from these populations both phenotypically (zinc, cadmium and NaCl tolerance in vitro, and protein profiles) and genotypically (16S rDNA sequencing), as compared to type strains of known diazotrophic species. Densities were evaluated by using NFb, Fam and JNFb media, commonly used for enrichment cultures of diazotrophic bacteria. Bacterial densities found in soil and grass root samples from contaminated sites were similar to those reported for agricultural soils. Azospirillum spp. isolates from contaminated sites and type strains from non-contaminated sites varied substantially in their in vitro tolerance to Zn2+ and Cd2+, with Cd2+ being more toxic than Zn2+. Among the most tolerant isolates (UFLA 1S, 1R, S181, S34 and S22), some (1R, S34 and S22) were more tolerant to heavy metals than rhizobia from tropical and temperate soils. The majority of the isolates tolerant to heavy metals were also tolerant to salt stress, as indicated by their ability to grow in solid medium supplemented with 30 g L(-1) NaCl. Five isolates exhibited high dissimilarity in protein profiles, and the 16S rDNA sequence analysis of two of them revealed new sequences for Azospirillum.
In silico structural analysis of group 3, 6 and 9 allergens from Dermatophagoides farinae.

PubMed

Teng, Feixiang; Yu, Lili; Bian, Yonghua; Sun, Jinxia; Wu, Juansong; Ling, Cunbao; Yang, Li; Wang, Yungang; Cui, Yubao

2015-05-01

Dermatophagoides farinae (Hughes; Acari: Pyroglyphidae) are the predominant source of dust mite allergens, which provoke allergic diseases, such as rhinitis, asthma and eczema. Of the 30 allergen groups produced by D. farinae, the Der f 3, Der f 6 and Der f 9 allergens are all trypsin‑associated proteins, however little else is currently known about them. The present study used in silico tools to compare the amino acid sequences, and predict the secondary and tertiary structures of Der f 3, Der f 6 and Der f 9 allergens. Protein sequence alignment detected ~46% identity between Der f 3, Der f 6 and Der f 9. Furthermore, each protein was shown to contain three active sites and two highly conserved trypsin functional domains. Predictions of the secondary and tertiary structure identified α‑helices, β‑sheets and random coils. The active sites of the three proteins appeared to fold onto each other in a three‑dimensional model, constituting the active site of the enzyme. Epitope analysis demonstrated that Der f 3, Der f 6 and Der f 9 have 4‑5 potential epitopes located in random coils, and the epitope sequences of Der f 3, Der f 6 and Der f 9 were shown to overlap in two domains (at amino acids 83‑87 and 179‑180); however the residues in these two domains were not identical. The present study aimed to conduct a biochemical and genetic analysis of these three allergens, and to potentially contribute to the development of vaccines for allergen‑specific immunotherapy.
Metabolic primers for detection of (Per)chlorate-reducing bacteria in the environment and phylogenetic analysis of cld gene sequences.

PubMed

Bender, Kelly S; Rice, Melissa R; Fugate, William H; Coates, John D; Achenbach, Laurie A

2004-09-01

Natural attenuation of the environmental contaminant perchlorate is a cost-effective alternative to current removal methods. The success of natural perchlorate remediation is dependent on the presence and activity of dissimilatory (per)chlorate-reducing bacteria (DPRB) within a target site. To detect DPRB in the environment, two degenerate primer sets targeting the chlorite dismutase (cld) gene were developed and optimized. A nested PCR approach was used in conjunction with these primer sets to increase the sensitivity of the molecular detection method. Screening of environmental samples indicated that all products amplified by this method were cld gene sequences. These sequences were obtained from pristine sites as well as contaminated sites from which DPRB were isolated. More than one cld phylotype was also identified from some samples, indicating the presence of more than one DPRB strain at those sites. The use of these primer sets represents a direct and sensitive molecular method for the qualitative detection of (per)chlorate-reducing bacteria in the environment, thus offering another tool for monitoring natural attenuation. Sequences of cld genes isolated in the course of this project were also generated from various DPRB and provided the first opportunity for a phylogenetic treatment of this metabolic gene. Comparisons of the cld and 16S ribosomal DNA (rDNA) gene trees indicated that the cld gene does not track 16S rDNA phylogeny, further implicating the possible role of horizontal transfer in the evolution of (per)chlorate respiration.
Structural analysis of the 5′ region of mouse and human Huntington disease genes reveals conservation of putative promoter region and di- and trinucleotide polymorphisms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Biaoyang; Nasir, J.; Kalchman, M.A.

1995-02-10

We have previously cloned and characterized the murine homologue of the Huntington disease (HD) gene and shown that it maps to mouse chromosome 5 within a region of conserved synteny with human chromosome 4p16.3. Here we present a detailed comparison of the sequence of the putative promoter and the organization of the 5′ genomic region of the murine (Hdh) and human HD genes encompassing the first five exons. We show that in this region these two genes share identical exon boundaries, but have different-size introns. Two dinucleotide (CT) and one trinucleotide intronic polymorphism in Hdh and an intronic CA polymorphism in the HD gene were identified. Comparison of 940-bp sequence 5′ to the putative translation start site reveals a highly conserved region (78.8% nucleotide identity) between Hdh and the HD gene from nucleotide -56 to -206 (of Hdh). Neither Hdh nor the HD gene has typical TATA or CCAAT elements, but both show one putative AP2 binding site and numerous potential Sp1 binding sites. The high sequence identity between Hdh and the HD gene for approximately 200 bp 5′ to the putative translation start site indicates that these sequences may play a role in regulating expression of the Huntington disease gene. 30 refs., 4 figs., 2 tabs.
Multilocus sequence analysis of Thermoanaerobacter isolates reveals recombining, but differentiated, populations from geothermal springs of the Uzon Caldera, Kamchatka, Russia

PubMed Central

Wagner, Isaac D.; Varghese, Litty B.; Hemme, Christopher L.; Wiegel, Juergen

2013-01-01

Thermal environments have island-like characteristics and provide a unique opportunity to study population structure and diversity patterns of microbial taxa inhabiting these sites. Strains having ≥98% 16S rRNA gene sequence similarity to the obligately anaerobic Firmicutes Thermoanaerobacter uzonensis were isolated from seven geothermal springs, separated by up to 1600 m, within the Uzon Caldera (Kamchatka, Russian Far East). The intraspecies variation and spatial patterns of diversity for this taxon were assessed by multilocus sequence analysis (MLSA) of 106 strains. Analysis of eight protein-coding loci (gyrB, lepA, leuS, pyrG, recA, recG, rplB, and rpoB) revealed that all loci were polymorphic and that nucleotide substitutions were mostly synonymous. There were 148 variable nucleotide sites across 8003 bp concatenates of the protein-coding loci. While pairwise FST values indicated a small but significant level of genetic differentiation between most subpopulations, there was a negligible relationship between genetic divergence and spatial separation. Strains with the same allelic profile were only isolated from the same hot spring, occasionally from consecutive years, and single locus variant (SLV) sequence types were usually derived from the same spring. While recombination occurred, there was an “epidemic” population structure in which a particular T. uzonensis sequence type rose in frequency relative to the rest of the population. These results demonstrate spatial diversity patterns for an anaerobic bacterial species in a relatively small geographic location and reinforce the view that terrestrial geothermal springs are excellent places to look for biogeographic diversity patterns regardless of the involved distances. PMID:23801987
Identification of a DNA sequence motif required for expression of iron-regulated genes in pseudomonads.

PubMed

Rombel, I T; McMorran, B J; Lamont, I L

1995-02-20

Many bacteria respond to a lack of iron in the environment by synthesizing siderophores, which act as iron-scavenging compounds. Fluorescent pseudomonads synthesize strain-specific but chemically related siderophores called pyoverdines or pseudobactins. We have investigated the mechanisms by which iron controls expression of genes involved in pyoverdine metabolism in Pseudomonas aeruginosa. Transcription of these genes is repressed by the presence of iron in the growth medium. Three promoters from these genes were cloned and the activities of the promoters were dependent on the amounts of iron in the growth media. Two of the promoters were sequenced and the transcriptional start sites were identified by S1 nuclease analysis. Sequences similar to the consensus binding site for the Fur repressor protein, which controls expression of iron-repressible genes in several gram-negative species, were not present in the promoters, suggesting that they are unlikely to have a high affinity for Fur. However, comparison of the promoter sequences with those of iron-regulated genes from other Pseudomonas species and also the iron-regulated exotoxin gene of P. aeruginosa allowed identification of a shared sequence element, with the consensus sequence (G/C)CTAAAT-CCC, which is likely to act as a binding site for a transcriptional activator protein. Mutations in this sequence greatly reduced the activities of the promoters characterized here as well as those of other iron-regulated promoters. The requirement for this motif in the promoters of iron-regulated genes of different Pseudomonas species indicates that similar mechanisms are likely to be involved in controlling expression of a range of iron-regulated genes in pseudomonads.
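A consensus such as (G/C)CTAAAT-CCC can be scanned for in promoter sequences with a regular expression. The sketch below is illustrative only: the abstract does not define the hyphen in the consensus, so treating it as one nucleotide of any identity is an assumption made here, and only the given strand is searched. The function name `find_motif` and the test sequence are hypothetical.

```python
import re

# ASSUMPTION: the hyphen in "(G/C)CTAAAT-CCC" is read as exactly one
# unspecified nucleotide; the source abstract does not define it.
# The lookahead lets overlapping matches be reported.
MOTIF = re.compile(r"(?=([GC]CTAAAT[ACGT]CCC))")


def find_motif(sequence):
    """Return (0-based start, matched text) for each motif occurrence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq)]


# Made-up promoter fragment, not a real Pseudomonas sequence
hits = find_motif("ttGCTAAATGCCCaa")
```

A fuller scan would also search the reverse complement and might tolerate mismatches, since the abstract reports a consensus rather than an invariant sequence.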
Molecular cloning of the pheromone biosynthesis-activating neuropeptide in Helicoverpa zea.

PubMed Central

Davis, M T; Vakharia, V N; Henry, J; Kempe, T G; Raina, A K

1992-01-01

Pheromone biosynthesis-activating neuropeptide (PBAN) regulates sex pheromone biosynthesis in female Helicoverpa (Heliothis) zea. Two oligonucleotide probes representing two overlapping amino acid regions of PBAN were used to screen 2.5 × 10^5 recombinant plaques, and a positive recombinant clone was isolated. Sequence analysis of the isolated clone showed that the PBAN gene is interrupted after the codon encoding amino acid 14 by a 0.63-kilobase (kb) intron. Preceding the PBAN amino acid sequence is a 10-amino acid sequence containing a pentapeptide Phe-Thr-Pro-Arg-Leu, which is followed by a Gly-Arg-Arg processing site. Immediately after the PBAN amino acid sequence is a Gly-Arg processing site and a short stretch of 10 amino acids. This 10-amino acid sequence contains a repeat of the PBAN C-terminal pentapeptide Phe-Ser-Pro-Arg-Leu and is terminated by another Gly-Arg processing site. It is suggested that the PBAN gene in H. zea might carry, besides PBAN, a 7- and an 8-residue amidated peptide, which share with PBAN the core C-terminal pentapeptide Phe-(Ser or Thr)-Pro-Arg-Leu-NH2. The C-terminal pentapeptide sequence of PBAN represents the minimum sequence required for pheromonotropic activity in H. zea and also bears a high degree of homology to the pyrokinin family of insect peptides with myotropic activity. It is possible that the putative heptapeptide and octapeptide might be new members of the pyrokinin family, with pheromonotropic and/or myotropic activities. Thus, the PBAN gene products, besides affecting sexual behavior, might have broad influence on many biological processes in H. zea. PMID:1729680
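The shared core Phe-(Ser or Thr)-Pro-Arg-Leu followed by a Gly-Arg processing site lends itself to a small search routine. The following Python sketch is a hedged illustration: the helper names and the toy precursor string are hypothetical, and the real H. zea precursor sequence is not reproduced here.

```python
import re

# One-letter codes for the residues named in the abstract.
THREE_TO_ONE = {"Phe": "F", "Ser": "S", "Thr": "T", "Pro": "P",
                "Arg": "R", "Leu": "L", "Gly": "G"}


def to_one_letter(peptide):
    """Convert a hyphen-separated three-letter peptide to one-letter code."""
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split("-"))


# Core pentapeptide F[ST]PRL immediately followed by Gly-Arg (checked, not consumed).
CORE = re.compile(r"F[ST]PRL(?=GR)")


def find_cores(protein):
    """Return (0-based start, matched core) pairs in a one-letter protein string."""
    return [(m.start(), m.group()) for m in CORE.finditer(protein.upper())]


print(to_one_letter("Phe-Ser-Pro-Arg-Leu"))  # FSPRL
print(find_cores("AAFTPRLGRRAAFSPRLGRAA"))   # [(2, 'FTPRL'), (12, 'FSPRL')]
```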
Identification of the HrpS binding site in the hrpL promoter and effect of the RpoN binding site of HrpS on the regulation of the type III secretion system in Erwinia amylovora.

PubMed

Lee, Jae Hoon; Sundin, George W; Zhao, Youfu

2016-06-01

The type III secretion system (T3SS) is a key pathogenicity factor in Erwinia amylovora. Previous studies have demonstrated that the T3SS in E. amylovora is transcriptionally regulated by an RpoN-HrpL sigma factor cascade, which is activated by the bacterial alarmone (p)ppGpp. In this study, the binding site of HrpS, an enhancer binding protein, was identified for the first time in plant-pathogenic bacteria. Complementation of the hrpL mutant with promoter deletion constructs of the hrpL gene and promoter activity analyses using various lengths of the hrpL promoter fused to a promoter-less green fluorescent protein (gfp) reporter gene delineated the upstream region for HrpS binding. Sequence analysis revealed a dyad symmetry sequence between -138 and -125 nucleotides (TGCAA-N4-TTGCA) as the potential HrpS binding site, which is conserved in the promoter of the hrpL gene among plant enterobacterial pathogens. Results of quantitative real-time reverse transcription-polymerase chain reaction (qRT-PCR) and electrophoresis mobility shift assay coupled with site-directed mutagenesis (SDM) analysis showed that the intact dyad symmetry sequence was essential for HrpS binding, full activation of T3SS gene expression and virulence. In addition, the role of the GAYTGA motif (RpoN binding site) of HrpS in the regulation of T3SS gene expression in E. amylovora was characterized by complementation of the hrpS mutant using mutant variants generated by SDM. Results showed that a Y100F substitution of HrpS complemented the hrpS mutant, whereas Y100A and Y101A substitutions did not. These results suggest that tyrosine (Y) and phenylalanine (F) function interchangeably in the conserved GAYTGA motif of HrpS in E. amylovora. © 2015 BSPP AND JOHN WILEY & SONS LTD.
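The dyad symmetry TGCAA-N4-TTGCA can be checked computationally: the right half-site is the reverse complement of the left one, and the element spans 5 + 4 + 5 = 14 nucleotides, which matches the -138 to -125 interval. The Python sketch below is a minimal illustration; the function names and the test sequence are hypothetical and not taken from the study.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Left half-site, a 4-base spacer of any sequence, then the right half-site.
DYAD = re.compile(r"(?=(TGCAA)([ACGT]{4})(TTGCA))")


def find_dyads(seq):
    """Return (start, matched element, halves_are_symmetric) for each hit."""
    hits = []
    for m in DYAD.finditer(seq.upper()):
        left, spacer, right = m.groups()
        hits.append((m.start(), left + spacer + right,
                     reverse_complement(left) == right))
    return hits


print(find_dyads("GGTGCAAACGTTTGCACC"))  # [(2, 'TGCAAACGTTTGCA', True)]
```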
APADB: a database for alternative polyadenylation and microRNA regulation events

PubMed Central

Müller, Sören; Rycak, Lukas; Afonso-Grunz, Fabian; Winter, Peter; Zawada, Adam M.; Damrath, Ewa; Scheider, Jessica; Schmäh, Juliane; Koch, Ina; Kahl, Günter; Rotter, Björn

2014-01-01

Alternative polyadenylation (APA) is a widespread mechanism that contributes to the sophisticated dynamics of gene regulation. Approximately 50% of all protein-coding human genes harbor multiple polyadenylation (PA) sites; their selective and combinatorial use gives rise to transcript variants with differing length of their 3′ untranslated region (3′UTR). Shortened variants escape UTR-mediated regulation by microRNAs (miRNAs), especially in cancer, where global 3′UTR shortening accelerates disease progression, dedifferentiation and proliferation. Here we present APADB, a database of vertebrate PA sites determined by 3′ end sequencing, using massive analysis of complementary DNA ends. APADB provides (A)PA sites for coding and non-coding transcripts of human, mouse and chicken genes. For human and mouse, several tissue types, including different cancer specimens, are available. APADB records the loss of predicted miRNA binding sites and visualizes next-generation sequencing reads that support each PA site in a genome browser. The database tables can either be browsed according to organism and tissue or alternatively searched for a gene of interest. APADB is the largest database of APA in human, chicken and mouse. The stored information provides experimental evidence for thousands of PA sites and APA events. APADB combines 3′ end sequencing data with prediction algorithms of miRNA binding sites, allowing further improvement of prediction algorithms. Current databases lack correct information about 3′UTR lengths, especially for chicken, and APADB provides necessary information to close this gap. Database URL: http://tools.genxpro.net/apadb/ PMID:25052703
Construction of transformed, cultured silkworm cells and transgenic silkworm using the site-specific integrase system from phage φC31.

PubMed

Yin, Yajuan; Cao, Guangli; Xue, Renyu; Gong, Chengliang

2014-10-01

The Streptomyces bacteriophage, φC31, uses a site-specific integrase enzyme to perform efficient recombination. The recombination system uses specific sequences to integrate exogenous DNA from the phage into a host. The sequences are known as the attP site in the phage and the attB site in the host. The system can be used as a genetic manipulation tool. In this study it has been applied to the transformation of cultured BmN cells and the construction of transgenic Bombyx mori individuals. A plasmid, pSK-attB/Pie1-EGFP/Zeo-PASV40, containing a cassette designed to express an egfp-zeocin fusion gene, was co-transfected into cultured BmN cells with a helper plasmid, pSK-Pie1/NLS-Int/NSL. Expression of the egfp-zeocin fusion gene was driven by an ie-1 promoter, downstream of a φC31 attB site. The helper plasmid encoded the φC31 integrase enzyme, which was flanked by two nuclear localization signals. Expression of the egfp-zeocin fusion gene could be observed in transformed cells. The two plasmids were also transferred into silkworm eggs to obtain transgenic silkworms. Successful integration of the fusion gene was indicated by the detection of green fluorescence, which was emitted by the silkworms. Nucleotide sequence analysis demonstrated that the attB site had been cut, to allow recombination between the attB and endogenous pseudo attP sites in the cultured silkworm cells and silkworm individuals.
Immune Selection In Vitro Reveals Human Immunodeficiency Virus Type 1 Nef Sequence Motifs Important for Its Immune Evasion Function In Vivo

PubMed Central

Lee, Patricia; Ng, Hwee L.; Yang, Otto O.

2012-01-01

Human immunodeficiency virus type 1 (HIV-1) Nef downregulates major histocompatibility complex class I (MHC-I), impairing the clearance of infected cells by CD8+ cytotoxic T lymphocytes (CTLs). While sequence motifs mediating this function have been determined by in vitro mutagenesis studies of laboratory-adapted HIV-1 molecular clones, it is unclear whether the highly variable Nef sequences of primary isolates in vivo rely on the same sequence motifs. To address this issue, nef quasispecies from nine chronically HIV-1-infected persons were examined for sequence evolution and altered MHC-I downregulatory function under Gag-specific CTL immune pressure in vitro. This selection resulted in decreased nef diversity and strong purifying selection. Site-by-site analysis identified 13 codons undergoing purifying selection and 1 undergoing positive selection. Of the former, only 6 have been reported to have roles in Nef function, including 4 associated with MHC-I downregulation. Functional testing of naturally occurring in vivo polymorphisms at the 7 sites with no previously known functional role revealed 3 mutations (A84D, Y135F, and G140R) that ablated MHC-I downregulation and 3 (N52A, S169I, and V180E) that partially impaired MHC-I downregulation. Globally, the CTL pressure in vitro selected functional Nef from the in vivo quasispecies mixtures that predominately lacked MHC-I downregulatory function at the baseline. Overall, these data demonstrate that CTL pressure exerts a strong purifying selective pressure for MHC-I downregulation and identifies novel functional motifs present in Nef sequences in vivo. PMID:22553319
Splice-site mutations identified in PDE6A responsible for retinitis pigmentosa in consanguineous Pakistani families

PubMed Central

Khan, Shahid Y.; Ali, Shahbaz; Naeem, Muhammad Asif; Khan, Shaheen N.; Husnain, Tayyab; Butt, Nadeem H.; Qazi, Zaheeruddin A.; Akram, Javed; Riazuddin, Sheikh; Ayyagari, Radha; Hejtmancik, J. Fielding

2015-01-01

Purpose This study was conducted to localize and identify causal mutations associated with autosomal recessive retinitis pigmentosa (RP) in consanguineous familial cases of Pakistani origin. Methods Ophthalmic examinations that included funduscopy and electroretinography (ERG) were performed to confirm the affectation status. Blood samples were collected from all participating individuals, and genomic DNA was extracted. A genome-wide scan was performed, and two-point logarithm of odds (LOD) scores were calculated. Sanger sequencing was performed to identify the causative variants. Subsequently, we performed whole exome sequencing to rule out the possibility of a second causal variant within the linkage interval. Sequence conservation was performed with alignment analyses of PDE6A orthologs, and in silico splicing analysis was completed with Human Splicing Finder version 2.4.1. Results A large multigenerational consanguineous family diagnosed with early-onset RP was ascertained. An ophthalmic clinical examination consisting of fundus photography and electroretinography confirmed the diagnosis of RP. A genome-wide scan was performed, and suggestive two-point LOD scores were observed with markers on chromosome 5q. Haplotype analyses identified the region; however, the region did not segregate with the disease phenotype in the family. Subsequently, we performed a second genome-wide scan that excluded the entire genome except the chromosome 5q region harboring PDE6A. Next-generation whole exome sequencing identified a splice acceptor site mutation in intron 16: c.2028–1G>A, which was completely conserved in PDE6A orthologs and was absent in ethnically matched 350 control chromosomes, the 1000 Genomes database, and the NHLBI Exome Sequencing Project. Subsequently, we investigated our entire cohort of RP familial cases and identified a second family who harbored a splice acceptor site mutation in intron 10: c.1408–2A>G. 
In silico analysis suggested that these mutations will result in the elimination of wild-type splice acceptor sites that would result in either skipping of the respective exon or the creation of a new cryptic splice acceptor site; both possibilities would result in retinal photoreceptor cells that lack PDE6A wild-type protein. Conclusions We report two splice acceptor site variations in PDE6A in consanguineous Pakistani families who manifested cardinal symptoms of RP. Taken together with our previously published work, our data suggest that mutations in PDE6A account for about 2% of the total genetic load of RP in our cohort and possibly in the Pakistani population as well. PMID:26321862
Molecular evolution of the CYP2D subfamily in primates: purifying selection on substrate recognition sites without the frequent or long-tract gene conversion.

PubMed

Yasukochi, Yoshiki; Satta, Yoko

2015-03-25

The human cytochrome P450 (CYP) 2D6 gene is a member of the CYP2D gene subfamily, along with the CYP2D7P and CYP2D8P pseudogenes. Although the CYP2D6 enzyme has been studied extensively because of its clinical importance, the evolution of the CYP2D subfamily has not yet been fully understood. Therefore, the goal of this study was to reveal the evolutionary process of the human drug metabolic system. Here, we investigate molecular evolution of the CYP2D subfamily in primates by comparing 14 CYP2D sequences from humans to New World monkey genomes. Window analysis and statistical tests revealed that entire genomic sequences of paralogous genes were extensively homogenized by gene conversion during molecular evolution of CYP2D genes in primates. A neighbor-joining tree based on genomic sequences at the nonsubstrate recognition sites showed that CYP2D6 and CYP2D8 genes were clustered together due to gene conversion. In contrast, a phylogenetic tree using amino acid sequences at substrate recognition sites did not cluster the CYP2D6 and CYP2D8 genes, suggesting that the functional constraint on substrate specificity is one of the causes for purifying selection at the substrate recognition sites. Our results suggest that the CYP2D gene subfamily in primates has evolved to maintain the regioselectivity for a substrate hydroxylation activity between individual enzymes, even though extensive gene conversion has occurred across CYP2D coding sequences. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Molecular Evolution of the CYP2D Subfamily in Primates: Purifying Selection on Substrate Recognition Sites without the Frequent or Long-Tract Gene Conversion

PubMed Central

Yasukochi, Yoshiki; Satta, Yoko

2015-01-01

The human cytochrome P450 (CYP) 2D6 gene is a member of the CYP2D gene subfamily, along with the CYP2D7P and CYP2D8P pseudogenes. Although the CYP2D6 enzyme has been studied extensively because of its clinical importance, the evolution of the CYP2D subfamily has not yet been fully understood. Therefore, the goal of this study was to reveal the evolutionary process of the human drug metabolic system. Here, we investigate molecular evolution of the CYP2D subfamily in primates by comparing 14 CYP2D sequences from humans to New World monkey genomes. Window analysis and statistical tests revealed that entire genomic sequences of paralogous genes were extensively homogenized by gene conversion during molecular evolution of CYP2D genes in primates. A neighbor-joining tree based on genomic sequences at the nonsubstrate recognition sites showed that CYP2D6 and CYP2D8 genes were clustered together due to gene conversion. In contrast, a phylogenetic tree using amino acid sequences at substrate recognition sites did not cluster the CYP2D6 and CYP2D8 genes, suggesting that the functional constraint on substrate specificity is one of the causes for purifying selection at the substrate recognition sites. Our results suggest that the CYP2D gene subfamily in primates has evolved to maintain the regioselectivity for a substrate hydroxylation activity between individual enzymes, even though extensive gene conversion has occurred across CYP2D coding sequences. PMID:25808902
The Extent of mRNA Editing Is Limited in Chicken Liver and Adipose, but Impacted by Tissular Context, Genotype, Age, and Feeding as Exemplified with a Conserved Edited Site in COG3.

PubMed

Roux, Pierre-François; Frésard, Laure; Boutin, Morgane; Leroux, Sophie; Klopp, Christophe; Djari, Anis; Esquerré, Diane; Martin, Pascal G P; Zerjal, Tatiana; Gourichon, David; Pitel, Frédérique; Lagarrigue, Sandrine

2015-12-04

RNA editing is a posttranscriptional process leading to differences between genomic DNA and transcript sequences, potentially enhancing transcriptome diversity. With recent advances in high-throughput sequencing, many efforts have been made to describe mRNA editing at the transcriptome scale, especially in mammals, yielding contradictory conclusions regarding the extent of this phenomenon. We show, by detailed description of the 25 studies focusing so far on mRNA editing at the whole-transcriptome scale, that systematic sequencing artifacts are considered in most studies whereas biological replication is often neglected and multi-alignment not properly evaluated, which ultimately impairs the legitimacy of results. We recently developed a rigorous strategy to identify mRNA editing using mRNA and genomic DNA sequencing, taking into account sequencing and mapping artifacts, and biological replicates. We applied this method to screen for mRNA editing in liver and white adipose tissue from eight chickens and confirm the small extent of mRNA recoding in this species. Among the 25 unique edited sites identified, three events were previously described in mammals, attesting that this phenomenon is conserved throughout evolution. Deeper investigations on five sites revealed the impact of tissular context, genotype, age, feeding conditions, and sex on mRNA editing levels. More specifically, this analysis highlighted that the editing level at the site located on COG3 was strongly regulated by four of these factors. By comprehensively characterizing the mRNA editing landscape in chickens, our results highlight how this phenomenon is limited and suggest regulation of editing levels by various genetic and environmental factors. Copyright © 2016 Roux et al.
The Extent of mRNA Editing Is Limited in Chicken Liver and Adipose, but Impacted by Tissular Context, Genotype, Age, and Feeding as Exemplified with a Conserved Edited Site in COG3

PubMed Central

Roux, Pierre-François; Frésard, Laure; Boutin, Morgane; Leroux, Sophie; Klopp, Christophe; Djari, Anis; Esquerré, Diane; Martin, Pascal GP; Zerjal, Tatiana; Gourichon, David; Pitel, Frédérique; Lagarrigue, Sandrine

2015-01-01

RNA editing is a posttranscriptional process leading to differences between genomic DNA and transcript sequences, potentially enhancing transcriptome diversity. With recent advances in high-throughput sequencing, many efforts have been made to describe mRNA editing at the transcriptome scale, especially in mammals, yielding contradictory conclusions regarding the extent of this phenomenon. We show, by detailed description of the 25 studies focusing so far on mRNA editing at the whole-transcriptome scale, that systematic sequencing artifacts are considered in most studies whereas biological replication is often neglected and multi-alignment not properly evaluated, which ultimately impairs the legitimacy of results. We recently developed a rigorous strategy to identify mRNA editing using mRNA and genomic DNA sequencing, taking into account sequencing and mapping artifacts, and biological replicates. We applied this method to screen for mRNA editing in liver and white adipose tissue from eight chickens and confirm the small extent of mRNA recoding in this species. Among the 25 unique edited sites identified, three events were previously described in mammals, attesting that this phenomenon is conserved throughout evolution. Deeper investigations on five sites revealed the impact of tissular context, genotype, age, feeding conditions, and sex on mRNA editing levels. More specifically, this analysis highlighted that the editing level at the site located on COG3 was strongly regulated by four of these factors. By comprehensively characterizing the mRNA editing landscape in chickens, our results highlight how this phenomenon is limited and suggest regulation of editing levels by various genetic and environmental factors. PMID:26637431
Crystallization and preliminary X-ray diffraction analysis of the Bacillus subtilis replication termination protein in complex with the 37-base-pair TerI-binding site

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vivian, J. P.; Porter, C.; Wilce, J. A.

2006-11-01

A preparation of replication terminator protein (RTP) of B. subtilis and a 37-base-pair TerI sequence (comprising two binding sites for RTP) has been purified and crystallized. The replication terminator protein (RTP) of Bacillus subtilis binds to specific DNA sequences that halt the progression of the replisome in a polar manner. These terminator complexes flank a defined region of the chromosome into which they allow replication forks to enter but not exit. Forcing the fusion of replication forks in a specific zone is thought to allow the coordination of post-replicative processes. The functional terminator complex comprises two homodimers each of 29 kDa bound to overlapping binding sites. A preparation of RTP and a 37-base-pair TerI sequence (comprising two binding sites for RTP) has been purified and crystallized. A data set to 3.9 Å resolution with 97.0% completeness and an R_sym of 12% was collected from a single flash-cooled crystal using synchrotron radiation. The diffraction data are consistent with space group P622, with unit-cell parameters a = b = 118.8, c = 142.6 Å.
Human Splicing Finder: an online bioinformatics tool to predict splicing signals.

PubMed

Desmet, François-Olivier; Hamroun, Dalil; Lalande, Marine; Collod-Béroud, Gwenaëlle; Claustres, Mireille; Béroud, Christophe

2009-05-01

Thousands of mutations are identified yearly. Although many directly affect protein expression, an increasing proportion of mutations is now believed to influence mRNA splicing. They mostly affect existing splice sites, but synonymous, non-synonymous or nonsense mutations can also create or disrupt splice sites or auxiliary cis-splicing sequences. To facilitate the analysis of the different mutations, we designed Human Splicing Finder (HSF), a tool to predict the effects of mutations on splicing signals or to identify splicing motifs in any human sequence. It contains all available matrices for auxiliary sequence prediction as well as new ones for binding sites of the 9G8 and Tra2-beta Serine-Arginine proteins and the hnRNP A1 ribonucleoprotein. We also developed new Position Weight Matrices to assess the strength of 5' and 3' splice sites and branch points. We evaluated HSF efficiency using a set of 83 intronic and 35 exonic mutations known to result in splicing defects. We showed that the mutation effect was correctly predicted in almost all cases. HSF could thus represent a valuable resource for research, diagnostic and therapeutic (e.g. therapeutic exon skipping) purposes as well as for global studies, such as the GEN2PHEN European Project or the Human Variome Project.

Human Splicing Finder: an online bioinformatics tool to predict splicing signals

PubMed Central

Desmet, François-Olivier; Hamroun, Dalil; Lalande, Marine; Collod-Béroud, Gwenaëlle; Claustres, Mireille; Béroud, Christophe

2009-01-01

Thousands of mutations are identified yearly. Although many directly affect protein expression, an increasing proportion of mutations is now believed to influence mRNA splicing. They mostly affect existing splice sites, but synonymous, non-synonymous or nonsense mutations can also create or disrupt splice sites or auxiliary cis-splicing sequences. To facilitate the analysis of the different mutations, we designed Human Splicing Finder (HSF), a tool to predict the effects of mutations on splicing signals or to identify splicing motifs in any human sequence. It contains all available matrices for auxiliary sequence prediction as well as new ones for binding sites of the 9G8 and Tra2-β Serine-Arginine proteins and the hnRNP A1 ribonucleoprotein. We also developed new Position Weight Matrices to assess the strength of 5′ and 3′ splice sites and branch points. We evaluated HSF efficiency using a set of 83 intronic and 35 exonic mutations known to result in splicing defects. We showed that the mutation effect was correctly predicted in almost all cases. HSF could thus represent a valuable resource for research, diagnostic and therapeutic (e.g. therapeutic exon skipping) purposes as well as for global studies, such as the GEN2PHEN European Project or the Human Variome Project. PMID:19339519
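Scoring a candidate splice site with a position weight matrix can be sketched generically in Python. This is a plain log-odds score under stated assumptions, not HSF's own scoring scheme: the frequency table, the uniform background and the function names are all hypothetical.

```python
import math

# Hypothetical base frequencies for a 4-position window; illustrative only,
# not one of the matrices used by HSF.
FREQS = [
    {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.60, "C": 0.05, "G": 0.30, "T": 0.05},
]
BACKGROUND = 0.25  # assumed uniform background frequency


def pwm_score(site, freqs=FREQS):
    """Sum of log2(observed / background) over the positions of `site`."""
    site = site.upper()
    if len(site) != len(freqs):
        raise ValueError("site length must match the matrix width")
    return sum(math.log2(column[base] / BACKGROUND)
               for column, base in zip(freqs, site))


def mutation_delta(wild_type, mutant):
    """Score change caused by a mutation; negative means a weaker site."""
    return pwm_score(mutant) - pwm_score(wild_type)


print(mutation_delta("GGTA", "GGCA") < 0)  # True
```

Comparing wild-type and mutant scores in this way mirrors the general idea of predicting whether a variant weakens an existing site.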
Recent sequence variation in probe binding site affected detection of respiratory syncytial virus group B by real-time RT-PCR.

PubMed

Kamau, Everlyn; Agoti, Charles N; Lewa, Clement S; Oketch, John; Owor, Betty E; Otieno, Grieven P; Bett, Anne; Cane, Patricia A; Nokes, D James

2017-03-01

Direct immuno-fluorescence test (IFAT) and multiplex real-time RT-PCR have been central to RSV diagnosis in Kilifi, Kenya. Recently, these two methods showed discrepancies with an increasing number of PCR undetectable RSV-B viruses. Establish if mismatches in the primer and probe binding sites could have reduced real-time RT-PCR sensitivity. Nucleoprotein (N) and glycoprotein (G) genes were sequenced for real-time RT-PCR positive and negative samples. Primer and probe binding regions in N gene were checked for mismatches and phylogenetic analyses done to determine molecular epidemiology of these viruses. New primers and probe were designed and tested on the previously real-time RT-PCR negative samples. N gene sequences revealed 3 different mismatches in the probe target site of PCR negative, IFAT positive viruses. The primers target sites had no mismatches. Phylogenetic analysis of N and G genes showed that real-time RT-PCR positive and negative samples fell into distinct clades. Newly designed primers-probe pair improved detection and recovered previous PCR undetectable viruses. An emerging RSV-B variant is undetectable by a quite widely used real-time RT-PCR assay due to polymorphisms that influence probe hybridization affecting PCR accuracy. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
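Checking a probe binding region for mismatches amounts to a position-by-position comparison. The Python sketch below is a minimal illustration that assumes the probe and the target region are already aligned, equal in length and on the same strand; the sequences and the function name `mismatches` are hypothetical, not the assay's actual oligonucleotides.

```python
def mismatches(probe, target):
    """Return (0-based position, probe base, target base) for each difference."""
    probe, target = probe.upper(), target.upper()
    if len(probe) != len(target):
        raise ValueError("probe and target region must have equal length")
    return [(i, p, t) for i, (p, t) in enumerate(zip(probe, target)) if p != t]


# Hypothetical probe and variant target differing at three positions.
print(mismatches("ACGTACGTAC", "ACATACCTAG"))
# [(2, 'G', 'A'), (6, 'G', 'C'), (9, 'C', 'G')]
```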
Identification of novel point mutations in splicing sites integrating whole-exome and RNA-seq data in myeloproliferative diseases

PubMed Central

Spinelli, Roberta; Pirola, Alessandra; Redaelli, Sara; Sharma, Nitesh; Raman, Hima; Valletta, Simona; Magistroni, Vera; Piazza, Rocco; Gambacorti-Passerini, Carlo

2013-01-01

Point mutations in intronic regions near mRNA splice junctions can affect the splicing process. To identify novel splicing variants from exome sequencing data, we developed a bioinformatics splice-site prediction procedure to analyze next-generation sequencing (NGS) data (SpliceFinder). SpliceFinder integrates two functional annotation tools for NGS, ANNOVAR and MutationTaster and two canonical splice site prediction programs for single mutation analysis, SSPNN and NetGene2. By SpliceFinder, we identified somatic mutations affecting RNA splicing in a colon cancer sample, in eight atypical chronic myeloid leukemia (aCML), and eight CML patients. A novel homozygous splicing mutation was found in APC (NM_000038.4:c.1312+5G>A) and six heterozygous in GNAQ (NM_002072.2:c.735+1C>T), ABCC3 (NM_003786.3:c.1783-1G>A), KLHDC1 (NM_172193.1:c.568-2A>G), HOOK1 (NM_015888.4:c.1662-1G>A), SMAD9 (NM_001127217.2:c.1004-1C>T), and DNAH9 (NM_001372.3:c.10242+5G>A). Integrating whole-exome and RNA sequencing in aCML and CML, we assessed the phenotypic effect of mutations on mRNA splicing for GNAQ, ABCC3, HOOK1. In ABCC3 and HOOK1, RNA-Seq showed the presence of aberrant transcripts with activation of a cryptic splice site or intron retention, validated by the reverse transcription-polymerase chain reaction (RT-PCR) in the case of HOOK1. In GNAQ, RNA-Seq showed 22% of wild-type transcript and 78% of mRNA skipping exon 5, resulting in a 4–6 frameshift fusion confirmed by RT-PCR. The pipeline can be useful to identify intronic variants affecting RNA sequence by complementing conventional exome analysis. PMID:24498620
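Variant names such as NM_000038.4:c.1312+5G>A encode where a substitution lies relative to the nearest exon boundary, so a first-pass triage can be sketched in Python. This is an assumption-laden illustration rather than the SpliceFinder pipeline: it handles only simple intronic substitutions, and the function name and the "canonical" cutoff of two bases are choices made for this example.

```python
import re

# Simple intronic substitutions only; other variant types are deliberately ignored.
INTRONIC = re.compile(
    r"^(?:(?P<transcript>[A-Z]{2}_\d+\.\d+):)?"
    r"c\.(?P<exon_pos>\d+)(?P<sign>[+-])(?P<offset>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_intronic(variant):
    """Parse a variant string; return None if it is not a simple intronic substitution."""
    # The minus sign is sometimes typeset as an en dash, so normalize it first.
    m = INTRONIC.match(variant.strip().replace("\u2013", "-"))
    if m is None:
        return None
    offset = int(m.group("offset"))
    return {
        "transcript": m.group("transcript"),
        "exon_pos": int(m.group("exon_pos")),
        # "+" counts into the intron after an exon (donor side);
        # "-" counts back from the next exon (acceptor side).
        "side": "donor" if m.group("sign") == "+" else "acceptor",
        "offset": offset,
        "canonical": offset <= 2,  # within the two intronic bases at the junction
        "change": m.group("ref") + ">" + m.group("alt"),
    }


print(parse_intronic("NM_003786.3:c.1783-1G>A")["side"])  # acceptor
```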
Cell type-specific termination of transcription by transposable element sequences.

PubMed

Conley, Andrew B; Jordan, I King

2012-09-30

Transposable elements (TEs) encode sequences necessary for their own transposition, including signals required for the termination of transcription. TE sequences within the introns of human genes show an antisense orientation bias, which has been proposed to reflect selection against TE sequences in the sense orientation owing to their ability to terminate the transcription of host gene transcripts. While there is evidence in support of this model for some elements, the extent to which TE sequences actually terminate transcription of human genes across the genome remains an open question. Using high-throughput sequencing data, we have characterized over 9,000 distinct TE-derived sequences that provide transcription termination sites for 5,747 human genes across eight different cell types. Rarefaction curve analysis suggests that there may be twice as many TE-derived termination sites (TE-TTS) genome-wide among all human cell types. The local chromatin environment for these TE-TTS is similar to that seen for 3' UTR canonical TTS and distinct from the chromatin environment of other intragenic TE sequences. However, those TE-TTS located within the introns of human genes were found to be far more cell type-specific than the canonical TTS. TE-TTS were much more likely to be found in the sense orientation than other intragenic TE sequences of the same TE family and TE-TTS in the sense orientation terminate transcription more efficiently than those found in the antisense orientation. Alu sequences were found to provide a large number of relatively weak TTS, whereas LTR elements provided a smaller number of much stronger TTS. TE sequences provide numerous termination sites to human genes, and TE-derived TTS are particularly cell type-specific. Thus, TE sequences provide a powerful mechanism for the diversification of transcriptional profiles between cell types and among evolutionary lineages, since most TE-TTS are evolutionarily young.
The extent of transcription termination by TEs seen here, along with the preference for sense-oriented TE insertions to provide TTS, is consistent with the observed antisense orientation bias of human TEs.
RSAT 2018: regulatory sequence analysis tools 20th anniversary.

PubMed

Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques; Medina-Rivera, Alejandra; Thomas-Chollier, Morgane

2018-05-02

RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
New Insights into the Classification and Integration Specificity of Streptococcus Integrative Conjugative Elements through Extensive Genome Exploration

PubMed Central

Ambroset, Chloé; Coluzzi, Charles; Guédon, Gérard; Devignes, Marie-Dominique; Loux, Valentin; Lacroix, Thomas; Payot, Sophie; Leblond-Bourget, Nathalie

2016-01-01

Recent genome analyses suggest that integrative and conjugative elements (ICEs) are widespread in bacterial genomes and therefore play an essential role in horizontal transfer. However, only a few of these elements are precisely characterized and correctly delineated within sequenced bacterial genomes. Even though previous analysis showed the presence of ICEs in some species of Streptococci, the global prevalence and diversity of ICEs was not analyzed in this genus. In this study, we searched for ICEs in the completely sequenced genomes of 124 strains belonging to 27 streptococcal species. These exhaustive analyses revealed 105 putative ICEs and 26 slightly decayed elements whose limits were assessed and whose insertion site was identified. These ICEs were grouped in seven distinct unrelated or distantly related families, according to their conjugation modules. Integration of these streptococcal ICEs is catalyzed either by a site-specific tyrosine integrase, a low-specificity tyrosine integrase, a site-specific single serine integrase, a triplet of site-specific serine integrases or a DDE transposase. Analysis of their integration site led to the detection of 18 target-genes for streptococcal ICE insertion including eight that had not been identified previously (ftsK, guaA, lysS, mutT, rpmG, rpsI, traG, and ebfC). It also suggests that all specificities have evolved to minimize the impact of the insertion on the host. This overall analysis of streptococcal ICEs emphasizes their prevalence and diversity and demonstrates that exchanges or acquisitions of conjugation and recombination modules are frequent. PMID:26779141
New Insights into the Classification and Integration Specificity of Streptococcus Integrative Conjugative Elements through Extensive Genome Exploration.

PubMed

Ambroset, Chloé; Coluzzi, Charles; Guédon, Gérard; Devignes, Marie-Dominique; Loux, Valentin; Lacroix, Thomas; Payot, Sophie; Leblond-Bourget, Nathalie

2015-01-01

Recent genome analyses suggest that integrative and conjugative elements (ICEs) are widespread in bacterial genomes and therefore play an essential role in horizontal transfer. However, only a few of these elements are precisely characterized and correctly delineated within sequenced bacterial genomes. Even though previous analysis showed the presence of ICEs in some species of Streptococci, the global prevalence and diversity of ICEs was not analyzed in this genus. In this study, we searched for ICEs in the completely sequenced genomes of 124 strains belonging to 27 streptococcal species. These exhaustive analyses revealed 105 putative ICEs and 26 slightly decayed elements whose limits were assessed and whose insertion site was identified. These ICEs were grouped in seven distinct unrelated or distantly related families, according to their conjugation modules. Integration of these streptococcal ICEs is catalyzed either by a site-specific tyrosine integrase, a low-specificity tyrosine integrase, a site-specific single serine integrase, a triplet of site-specific serine integrases or a DDE transposase. Analysis of their integration site led to the detection of 18 target-genes for streptococcal ICE insertion including eight that had not been identified previously (ftsK, guaA, lysS, mutT, rpmG, rpsI, traG, and ebfC). It also suggests that all specificities have evolved to minimize the impact of the insertion on the host. This overall analysis of streptococcal ICEs emphasizes their prevalence and diversity and demonstrates that exchanges or acquisitions of conjugation and recombination modules are frequent.
GPS-PAIL: prediction of lysine acetyltransferase-specific modification sites from protein sequences.

PubMed

Deng, Wankun; Wang, Chenwei; Zhang, Ying; Xu, Yang; Zhang, Shuang; Liu, Zexian; Xue, Yu

2016-12-22

Protein acetylation catalyzed by specific histone acetyltransferases (HATs) is an essential post-translational modification (PTM) and is involved in the regulation of a broad spectrum of biological processes in eukaryotes. Although tens of thousands of acetylation sites have been experimentally identified, the upstream HATs for most of the sites are unclear. Thus, the identification of HAT-specific acetylation sites is fundamental for understanding the regulatory mechanisms of protein acetylation. In this work, we first collected 702 known HAT-specific acetylation sites of 205 proteins from the literature and public data resources, and a motif-based analysis demonstrated that different types of HATs exhibit similar but considerably distinct sequence preferences for substrate recognition. Using 544 human HAT-specific sites for training, we constructed GPS-PAIL, a tool for the prediction of HAT-specific sites for up to seven HATs, including CREBBP, EP300, HAT1, KAT2A, KAT2B, KAT5 and KAT8. The prediction accuracy of GPS-PAIL was critically evaluated, with satisfactory performance. Using GPS-PAIL, we also performed a large-scale prediction of potential HATs for known acetylation sites identified from high-throughput experiments in nine eukaryotes. Both an online service and local packages were implemented, and GPS-PAIL is freely available at: http://pail.biocuckoo.org.
GPS-PAIL: prediction of lysine acetyltransferase-specific modification sites from protein sequences

PubMed Central

Deng, Wankun; Wang, Chenwei; Zhang, Ying; Xu, Yang; Zhang, Shuang; Liu, Zexian; Xue, Yu

2016-01-01

Protein acetylation catalyzed by specific histone acetyltransferases (HATs) is an essential post-translational modification (PTM) and is involved in the regulation of a broad spectrum of biological processes in eukaryotes. Although tens of thousands of acetylation sites have been experimentally identified, the upstream HATs for most of the sites are unclear. Thus, the identification of HAT-specific acetylation sites is fundamental for understanding the regulatory mechanisms of protein acetylation. In this work, we first collected 702 known HAT-specific acetylation sites of 205 proteins from the literature and public data resources, and a motif-based analysis demonstrated that different types of HATs exhibit similar but considerably distinct sequence preferences for substrate recognition. Using 544 human HAT-specific sites for training, we constructed GPS-PAIL, a tool for the prediction of HAT-specific sites for up to seven HATs, including CREBBP, EP300, HAT1, KAT2A, KAT2B, KAT5 and KAT8. The prediction accuracy of GPS-PAIL was critically evaluated, with satisfactory performance. Using GPS-PAIL, we also performed a large-scale prediction of potential HATs for known acetylation sites identified from high-throughput experiments in nine eukaryotes. Both an online service and local packages were implemented, and GPS-PAIL is freely available at: http://pail.biocuckoo.org. PMID:28004786
[Comparative analysis of clustered regularly interspaced short palindromic repeats (CRISPRs) loci in the genomes of halophilic archaea].

PubMed

Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian

2009-11-01

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to analyze CRISPR genome-wide in the extremely halophilic archaea for which whole genome sequences are currently available. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were highly conserved, and that the two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as a recognition site by having palindromic structures in flanking regions, and that the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.
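Two of the sequence-level checks named in this abstract, GC content and palindromic structure, are simple enough to sketch. The following is a minimal illustration rather than the authors' pipeline: the function names are hypothetical, and "palindromic" is assumed here to mean that a DNA sequence equals its own reverse complement.

```python
# Hypothetical helpers; not taken from the cited study.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True when the sequence reads the same on both strands."""
    return seq.upper() == reverse_complement(seq)
```

A flanking region could be scanned by sliding a fixed-width window along it and calling `is_palindromic` on each window; the window width would be a free parameter of such a scan.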
Translation of vph mRNA in Streptomyces lividans and Escherichia coli after removal of the 5' untranslated leader.

PubMed

Wu, C J; Janssen, G R

1996-10-01

The Streptomyces vinaceus viomycin phosphotransferase (vph) mRNA contains an untranslated leader with a conventional Shine-Dalgarno homology. The vph leader was removed by ligation of the vph coding sequence to the transcriptional start site of a Streptomyces or an Escherichia coli promoter, such that transcription would initiate at the first position of the vph start codon. Analysis of mRNA demonstrated that transcription initiated primarily at the A of the vph AUG translational start codon in both Streptomyces lividans and E. coli; cells expressing the unleadered vph mRNA were resistant to viomycin indicating that the Shine-Dalgarno sequence, or other features contained within the leader, was not necessary for vph translation. Addition of four nucleotides (5'-AUGC-3') onto the 5' end of the unleadered vph mRNA resulted in translation initiation from the vph start codon and the AUG triplet contained within the added sequence. Translational fusions of vph sequence to a Tn5 neo reporter gene indicated that the first 16 codons of vph coding sequence were sufficient to specify the translational start site and reading frame for expression of neomycin resistance in both E. coli and S. lividans.
A history estimate and evolutionary analysis of rabies virus variants in China.

PubMed

Ming, Pinggang; Yan, Jiaxin; Rayner, Simon; Meng, Shengli; Xu, Gelin; Tang, Qing; Wu, Jie; Luo, Jing; Yang, Xiaoming

2010-03-01

To investigate the evolutionary dynamics of rabies virus (RABV) in China, we collected and sequenced 55 isolates sampled from 14 Chinese provinces over the last 40 years and performed a coalescent-based analysis of the G gene. This revealed that the RABV currently circulating in China is composed of three main groups. Bayesian coalescent analysis estimated the date of the most recent common ancestor for the current RABV Chinese strains to be 1412 (with a 95% confidence interval of 1006-1736). The estimated mean substitution rate for the G gene sequences (3.961 × 10^-4 substitutions per site per year) was in accordance with previous reports for RABV.
Evidence for tyrosine-linked glycosaminoglycan in a bacterial surface protein.

PubMed

Peters, J; Rudolf, S; Oschkinat, H; Mengele, R; Sumper, M; Kellermann, J; Lottspeich, F; Baumeister, W

1992-04-01

The S-layer protein of Acetogenium kivui was subjected to proteolysis with different proteases and several high molecular mass glycosaminoglycan peptides containing glucose, galactosamine and an unidentified sugar-related component were separated by molecular sieve chromatography and reversed-phase HPLC and subjected to N-terminal sequence analysis. By methylation analysis glucose was found to be uniformly 1,6-linked, whereas galactosamine was exclusively 1,4-linked. Hydrazinolysis and subsequent amino-acid analysis as well as two-dimensional NMR spectroscopy were used to demonstrate that in these peptides carbohydrate was covalently linked to tyrosine. As all of the four Tyr-glycosylation sites were found to be preceded by valine, a new recognition sequence for glycosylation is suggested.
Discrimination of Bacillus anthracis from closely related microorganisms by analysis of 16S and 23S rRNA with oligonucleotide microchips

DOEpatents

Bavykin, Sergei G.; Mirzabekov, Andrei D.

2007-10-30

The present invention is directed to a novel method of discriminating the highly infectious bacterium Bacillus anthracis from a group of closely related microorganisms. Sequence variations in the 16S and 23S rRNA of the B. cereus subgroup including B. anthracis are utilized to construct an array that can detect these sequence variations through selective hybridizations. The identification and analysis of these sequence variations enables positive discrimination of isolates of the B. cereus group that includes B. anthracis. Discrimination of single base differences in rRNA was achieved with a microchip during analysis of B. cereus group isolates in both single and mixed probes, as was identification of polymorphic sites. Successful use of a microchip to determine the appropriate subgroup classification, using eight reference microorganisms from the B. cereus group as a study set, was demonstrated.
Sequence features of viral and human Internal Ribosome Entry Sites predictive of their activity

PubMed Central

Elias-Kirma, Shani; Nir, Ronit; Segal, Eran

2017-01-01

Translation of mRNAs through Internal Ribosome Entry Sites (IRESs) has emerged as a prominent mechanism of cellular and viral initiation. It supports cap-independent translation of select cellular genes under normal conditions, and in conditions when cap-dependent translation is inhibited. IRES structure and sequence are believed to be involved in this process. However due to the small number of IRESs known, there have been no systematic investigations of the determinants of IRES activity. With the recent discovery of thousands of novel IRESs in human and viruses, the next challenge is to decipher the sequence determinants of IRES activity. We present the first in-depth computational analysis of a large body of IRESs, exploring RNA sequence features predictive of IRES activity. We identified predictive k-mer features resembling IRES trans-acting factor (ITAF) binding motifs across human and viral IRESs, and found that their effect on expression depends on their sequence, number and position. Our results also suggest that the architecture of retroviral IRESs differs from that of other viruses, presumably due to their exposure to the nuclear environment. Finally, we measured IRES activity of synthetically designed sequences to confirm our prediction of increasing activity as a function of the number of short IRES elements. PMID:28922394
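The k-mer features mentioned in this abstract can be pictured with a short counting sketch. This is only an illustration of what a k-mer count is, assuming a plain RNA string as input; the function name is hypothetical and the study's actual feature set (which also considers motif number and position) is not reproduced here.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    # range() is empty when the sequence is shorter than k.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```

Counts like these could serve as one row of a feature matrix, one column per k-mer, for whatever predictive model is fitted afterwards.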
oPOSSUM-3: Advanced Analysis of Regulatory Motif Over-Representation Across Genes or ChIP-Seq Datasets

PubMed Central

Kwon, Andrew T.; Arenillas, David J.; Hunt, Rebecca Worsley; Wasserman, Wyeth W.

2012-01-01

oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca. PMID:22973536
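The abstract does not spell out how over-representation is scored, so the following is an assumption rather than a description of oPOSSUM-3: a one-sided hypergeometric test, which is one common way to ask whether a binding-site motif occurs in more target sequences than chance would predict. All names and parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, population_hits: int,
                       sample: int, sample_hits: int) -> float:
    """P(X >= sample_hits) under a hypergeometric null.

    population       total number of sequences (targets plus background)
    population_hits  sequences in the population containing the motif
    sample           number of target sequences
    sample_hits      target sequences containing the motif
    """
    upper = min(sample, population_hits)
    # Sum the exact integer numerators, then divide once.
    favourable = sum(
        comb(population_hits, i) * comb(population - population_hits, sample - i)
        for i in range(sample_hits, upper + 1)
    )
    return favourable / comb(population, sample)
```

A small p-value suggests the motif is enriched among the targets; with many motifs tested at once, a multiple-testing correction would normally follow.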
oPOSSUM-3: advanced analysis of regulatory motif over-representation across genes or ChIP-Seq datasets.

PubMed

Kwon, Andrew T; Arenillas, David J; Worsley Hunt, Rebecca; Wasserman, Wyeth W

2012-09-01

oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca.
Genetic diversity and classification of Tibetan yak populations based on the mtDNA COIII gene.

PubMed

Song, Q Q; Chai, Z X; Xin, J W; Zhao, S J; Ji, Q M; Zhang, C F; Ma, Z J; Zhong, J C

2015-03-13

To determine the level of genetic diversity and phylogenetic relationships among Tibetan yak populations, the mitochondrial DNA cytochrome c oxidase subunit 3 (COIII) genes of 378 yak individuals from 16 populations were analyzed in this study. The results showed that the length of cytochrome c oxidase subunit 3 gene sequences was 781 bp, with nucleotide frequencies of 29.2, 29.4, 26.1, and 15.2% for T, C, A, and G, respectively. A total of 26 haplotypes were identified, with 69 polymorphic sites, including 11 parsimony-informative sites and 58 single-nucleotide polymorphism sites. No deletions/insertions were found in sequence comparison, indicating that nucleotide mutation types were transitions and transversions. Haplotype and nucleotide diversities were 0.562 and 0.00138, respectively, indicating a high level of genetic diversity in Tibetan yak populations. Phylogenetic relationship analysis indicated that Tibetan yak populations are divided into 2 groups.
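For readers unfamiliar with the statistic, haplotype diversity can be sketched as follows. The abstract does not state which estimator was used, so this assumes Nei's unbiased form, h = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n sampled sequences; the function name is hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes: list) -> float:
    """Nei's unbiased haplotype diversity for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled sequences are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)
```

The value is 0 when every individual carries the same haplotype and approaches 1 when every individual carries a different one.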
Pseudoexon activation increases phenotype severity in a Becker muscular dystrophy patient.

PubMed

Greer, Kane; Mizzi, Kayla; Rice, Emily; Kuster, Lukas; Barrero, Roberto A; Bellgard, Matthew I; Lynch, Bryan J; Foley, Aileen Reghan; O Rathallaigh, Eoin; Wilton, Steve D; Fletcher, Sue

2015-07-01

We report a dystrophinopathy patient with an in-frame deletion of DMD exons 45-47, and therefore a genetic diagnosis of Becker muscular dystrophy, who presented with a more severe than expected phenotype. Analysis of the patient DMD mRNA revealed an 82 bp pseudoexon, derived from intron 44, that disrupts the reading frame and is expected to yield a nonfunctional dystrophin. Since the sequence of the pseudoexon and canonical splice sites does not differ from the reference sequence, we concluded that the genomic rearrangement promoted recognition of the pseudoexon, causing a severe dystrophic phenotype. We characterized the deletion breakpoints and identified motifs that might influence selection of the pseudoexon. We concluded that the donor splice site was strengthened by juxtaposition of intron 47, and loss of intron 44 silencer elements, normally located downstream of the pseudoexon donor splice site, further enhanced pseudoexon selection and inclusion in the DMD transcript in this patient.
Mitochondrial sequence analysis for forensic identification using pyrosequencing technology.

PubMed

Andréasson, H; Asp, A; Alderborn, A; Gyllensten, U; Allen, M

2002-01-01

Over recent years, requests for mtDNA analysis in the field of forensic medicine have notably increased, and the results of such analyses have proved to be very useful in forensic cases where nuclear DNA analysis cannot be performed. Traditionally, mtDNA has been analyzed by DNA sequencing of the two hypervariable regions, HVI and HVII, in the D-loop. DNA sequence analysis using conventional Sanger sequencing is very robust but time consuming and labor intensive. By contrast, mtDNA analysis based on the pyrosequencing technology provides fast and accurate results from the human mtDNA present in many types of evidence materials in forensic casework. The assay has been developed to determine polymorphic sites in the mitochondrial D-loop as well as the coding region to further increase the discrimination power of mtDNA analysis. The pyrosequencing technology for analysis of mtDNA polymorphisms has been tested with regard to sensitivity, reproducibility, and success rate when applied to control samples and actual casework materials. The results show that the method is very accurate and sensitive; the results are easily interpreted and provide a high success rate on casework samples. The panel of pyrosequencing reactions for the mtDNA polymorphisms was chosen to result in an optimal discrimination power in relation to the number of bases determined.

Molecular Characterization of a Novel Bovine Viral Diarrhea Virus Isolate SD-15

PubMed Central

Zhu, Lisai; Lu, Haibing; Cao, Yufeng; Gai, Xiaochun; Guo, Changming; Liu, Yajing; Liu, Jiaxu; Wang, Xinping

2016-01-01

As one of the major pathogens, bovine viral diarrhea virus (BVDV) has caused significant economic loss to the livestock industry worldwide. Although BVDV infections have increasingly been reported in China in recent years, the molecular aspects of those BVDV strains have barely been characterized. In this study, we report the identification and characterization of a novel BVDV isolate from cattle, designated SD-15, which is associated with an outbreak characterized by severe hemorrhagic and mucous diarrhea with high morbidity and mortality in Shandong, China. SD-15 was revealed to be a noncytopathic BVDV with a complete genomic sequence of 12,285 nucleotides that contains a large open reading frame encoding 3900 amino acids. Alignment analysis showed that SD-15 has 93.8% nucleotide sequence identity with the BVDV ZM-95 isolate, a previous BVDV strain isolated from pigs manifesting clinical signs and lesions resembling classical swine fever. Phylogenetic analysis clustered SD-15 with the BVDV-1m subgenotype. Analysis of the deduced amino acid sequence of glycoproteins revealed that E2 has several highly conserved and variable regions within BVDV-1 genotypes. An additional N-glycosylation site (240NTT) was revealed exclusively in SD-15-encoded E2 in addition to four potential glycosylation sites (Asn-X-Ser/Thr) shared by all BVDV-1 genotypes. Furthermore, unique amino acid and linear epitope mutations were revealed in the SD-15-encoded Erns glycoprotein compared with known BVDV-1 genotypes. In conclusion, we have isolated a noncytopathic BVDV-1m strain that is associated with a disease characterized by high morbidity and mortality, revealed the complete genome sequence of the first BVDV-1m virus originating from cattle, and found a unique glycosylation site in E2 and a linear epitope mutation in Erns encoded by the SD-15 strain. These results will broaden the current understanding of BVDV infection and lay a basis for future investigation of SD-15-related pathogenesis. PMID:27764206
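The Asn-X-Ser/Thr pattern cited in this abstract lends itself to a short scanning sketch. The code below is illustrative only and is not the method used in the study: the function name is hypothetical, and it additionally assumes the widely used convention that X may be any residue except proline, which the abstract itself does not state.

```python
import re

# N, then any residue except P, then S or T. The lookahead makes the
# match zero-width so that overlapping sequons are all reported.
_SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list:
    """Return (1-based position, tripeptide) for each candidate sequon."""
    protein = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in _SEQUON.finditer(protein)]
```

Formatting a hit as `f"{position}{tripeptide}"` gives labels in the same style as the 240NTT site reported above. A sequence match marks only a potential site; whether it is actually glycosylated has to be shown experimentally.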
Role of DNA secondary structures in fragile site breakage along human chromosome 10

PubMed Central

Dillon, Laura W.; Pierce, Levi C. T.; Ng, Maggie C. Y.; Wang, Yuh-Hwa

2013-01-01

The formation of alternative DNA secondary structures can result in DNA breakage leading to cancer and other diseases. Chromosomal fragile sites, which are regions of the genome that exhibit chromosomal breakage under conditions of mild replication stress, are predicted to form stable DNA secondary structures. DNA breakage at fragile sites is associated with regions that are deleted, amplified or rearranged in cancer. Despite the correlation, unbiased examination of the ability to form secondary structures has not been evaluated in fragile sites. Here, using the Mfold program, we predict potential DNA secondary structure formation on the human chromosome 10 sequence, and utilize this analysis to compare fragile and non-fragile DNA. We found that aphidicolin (APH)-induced common fragile sites contain more sequence segments with potential high secondary structure-forming ability, and these segments clustered more densely than those in non-fragile DNA. Additionally, using a threshold of secondary structure-forming ability, we refined legitimate fragile sites within the cytogenetically defined boundaries, and identified potential fragile regions within non-fragile DNA. In vitro detection of alternative DNA structure formation and a DNA breakage cell assay were used to validate the computational predictions. Many of the regions identified by our analysis coincide with genes mutated in various diseases and regions of copy number alteration in cancer. This study supports the role of DNA secondary structures in common fragile site instability, provides a systematic method for their identification and suggests a mechanism by which DNA secondary structures can lead to human disease. PMID:23297364
Regulation of CYBB Gene Expression in Human Phagocytes by a Distant Upstream NF-κB Binding Site.

PubMed

Frazão, Josias B; Thain, Alison; Zhu, Zhiqing; Luengo, Marcos; Condino-Neto, Antonio; Newburger, Peter E

2015-09-01

The human CYBB gene encodes the gp91-phox component of the phagocyte oxidase enzyme complex, which is responsible for generating superoxide and other downstream reactive oxygen species essential to microbial killing. In the present study, we have identified by sequence analysis a putative NF-κB binding site in a DNase I hypersensitive site, termed HS-II, located in the distant 5' flanking region of the CYBB gene. Electrophoretic mobility assays showed binding of the sequence element by recombinant NF-κB protein p50 and by proteins in nuclear extract from the HL-60 myeloid leukemia cell line corresponding to p50 and to p50/p65 heterodimers. Chromatin immunoprecipitation demonstrated NF-κB binding to the site in intact HL-60 cells. Chromosome conformation capture (3C) assays demonstrated physical interaction between the NF-κB binding site and the CYBB promoter region. Inhibition of NF-κB activity by salicylate reduced CYBB expression in peripheral blood neutrophils and differentiated U937 monocytic leukemia cells. U937 cells transfected with a mutant inhibitor of κB "super-repressor" showed markedly diminished CYBB expression. Luciferase reporter analysis of the NF-κB site linked to the CYBB 5' flanking promoter region revealed enhanced expression, augmented by treatment with interferon-γ. These studies indicate a role for this distant, 15 kb upstream, binding site in NF-κB regulation of the CYBB gene, an essential component of phagocyte-mediated host defense. © 2015 Wiley Periodicals, Inc.
Triplex-mediated analysis of cytosine methylation at CpA sites in DNA.

PubMed

Johannsen, Marie W; Gerrard, Simon R; Melvin, Tracy; Brown, Tom

2014-01-18

Modified triplex-forming oligonucleotides distinguish 5-methyl cytosine from unmethylated cytosine in DNA duplexes by differences in triplex melting temperatures. The discrimination is sequence-specific; dramatic differences in stabilisation are seen for CpA methylation, whereas CpG methylation is not detected. This direct detection of DNA methylation constitutes a new approach for epigenetic analysis.
Discovery of Undefined Protein Crosslinking Chemistry: A Comprehensive Methodology Utilizing 18O-labeling and Mass Spectrometry

PubMed Central

Liu, Min; Zhang, Zhongqi; Zang, Tianzhu; Spahr, Chris; Cheetham, Janet; Ren, Da; Sunny Zhou, Zhaohui

2013-01-01

Characterization of protein crosslinking, particularly without prior knowledge of the chemical nature and site of crosslinking, poses a significant challenge due to their intrinsic structural complexity and the lack of a comprehensive analytical approach. Towards this end, we have developed a generally applicable workflow, XChem-Finder, that involves four stages: (1) detection of crosslinked peptides via 18O-labeling at C-termini; (2) determination of the putative partial sequences of each crosslinked peptide pair using a fragment ion mass database search against known protein sequences coupled with a de novo sequence tag search; (3) extension to full sequences based on protease specificity, the unique combination of mass, and other constraints; (4) deduction of crosslinking chemistry and site. The mass difference between the sum of two putative full-length peptides and the crosslinked peptide provides the formulas (elemental composition analysis) for the functional groups involved in each cross-linking. Combined with sequence restraint from MS/MS data, plausible crosslinking chemistry and site were inferred, and ultimately, confirmed by matching with all data. Applying our approach to a stressed IgG2 antibody, ten cross-linked peptides were discovered and found to be connected via thioether originating from disulfides at locations that had not been previously recognized. Furthermore, once the crosslink chemistry was revealed, a targeted crosslink search yielded four additional crosslinked peptides that all contain the C-terminus of the light chain. PMID:23634697
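The mass-difference step in stage 4 can be pictured with a small arithmetic sketch. It is not the XChem-Finder implementation: the function name, the tolerance, and the short table of candidate neutral losses are all assumptions made for illustration, and both peptide masses are assumed to be monoisotopic masses of the free, unmodified peptides. Under those assumptions, a loss of H2 would be consistent with a disulfide formed from two thiols, and a loss of H2S with a thioether.

```python
# Illustrative candidate neutral losses (monoisotopic masses, Da).
CANDIDATE_LOSSES = {
    "H2": 2.01565,
    "NH3": 17.02655,
    "H2O": 18.01056,
    "H2S": 33.98772,
}


def explain_mass_difference(peptide_a: float, peptide_b: float,
                            crosslinked: float, tol: float = 0.005):
    """Return the mass lost on crosslinking and the candidate losses
    whose mass agrees with it within tol (Da)."""
    delta = peptide_a + peptide_b - crosslinked
    matches = [name for name, loss in CANDIDATE_LOSSES.items()
               if abs(delta - loss) <= tol]
    return delta, matches
```

A real search would cover a far larger space of elemental compositions and would be constrained by the MS/MS sequence evidence described above.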
Effects of a Short Drilling Implant Protocol on Osteotomy Site Temperature and Drill Torque.

PubMed

Mihali, Sorin G; Canjau, Silvana; Cernescu, Anghel; Bortun, Cristina M; Wang, Hom-Lay; Bratu, Emanuel

2018-02-01

This study aimed to establish a protocol for reducing the drilling sequence during implant site preparation based on temperature and insertion torque. The conventional drilling sequence (several drills used in 0.6-mm increments) was compared with the proposed short drilling protocol (only 2 drills used: an initial and a final drill). One hundred drilling osteotomies were performed in bovine and porcine bones. Sets of 2 osteotomy sites were created in 5 bone densities using 2 types of drilling protocols. Thermographic pictures were captured throughout all drilling procedures and analyzed using ThermaCAM Researcher Professional 2.10. Torque values were determined during drilling by measuring electrical input and drill speed. There were statistically significant differences in bone temperature between the conventional and short drilling protocols during implant site preparation (analysis of variance P = 0.0008). However, there were no significant differences between the 2 types of drilling protocols for both implant diameters. Implant site preparation time was significantly reduced when using the short drilling protocol compared with the conventional drilling protocol (P < 0.001). Within the limitations of the study, the short drilling protocol proposed herein may represent a safe approach for implant site preparation.
Comprehensive analysis of genetic variations in strictly-defined Leber congenital amaurosis with whole-exome sequencing in Chinese

PubMed Central

Wang, Shi-Yuan; Zhang, Qi; Zhang, Xiang; Zhao, Pei-Quan

2016-01-01

AIM: To make a comprehensive analysis of the potential pathogenic genes related to Leber congenital amaurosis (LCA) in Chinese patients. METHODS: LCA subjects and their families were retrospectively collected from 2013 to 2015. First, whole-exome sequencing was performed in patients in whom previous gene mutation screening had found nothing; homozygous sites were then selected, candidate sites were annotated, and pathogenicity analysis was conducted using software including Sorting Tolerant from Intolerant (SIFT), Polyphen-2, Mutation assessor, Condel, and Functional Analysis through Hidden Markov Models (FATHMM). Furthermore, Gene Ontology function and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of pathogenic genes were performed, followed by co-segregation analysis using Fisher's exact test. Sanger sequencing was used to validate single-nucleotide variations (SNVs). Expanded verification was performed in the remaining patients. RESULTS: In total, 51 LCA families with 53 patients and 24 family members were recruited. A total of 104 SNVs (66 LCA-related genes and 15 co-segregated genes) were submitted for expanded verification. Homozygous mutations of KRT12 and CYP1A1 were simultaneously observed in 3 families. Enrichment analysis showed that the potential pathogenic genes were mainly enriched in functions related to cell adhesion, biological adhesion, retinoid metabolic process, and eye development biological adhesion. Additionally, WFS1 and STAU2 had the highest homozygous frequencies. CONCLUSION: LCA is a highly heterogeneous disease. Mutations in KRT12, CYP1A1, WFS1, and STAU2 may be involved in the development of LCA. PMID:27672588
ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences.

PubMed

Bonizzoni, Paola; Rizzi, Raffaella; Pesole, Graziano

2005-10-05

Currently available methods to predict splice sites are mainly based on the independent and progressive alignment of transcript data (mostly ESTs) to the genomic sequence. Apart from often being computationally expensive, this approach is vulnerable to several problems, hence the need to develop novel strategies. We propose a method, based on a novel multiple genome-EST alignment algorithm, for the detection of splice sites. To avoid limitations of splice site prediction (mainly, over-predictions) due to independent single EST alignments to the genomic sequence, our approach performs a multiple alignment of transcript data to the genomic sequence based on the combined analysis of all available data. We recast the problem of predicting constitutive and alternative splicing as an optimization problem, where the optimal multiple transcript alignment minimizes the number of exons and hence of splice site observations. We have implemented a splice site predictor based on this algorithm in the software tool ASPIC (Alternative Splicing PredICtion). It is distinguished from other methods based on BLAST-like tools by the incorporation of entirely new ad hoc procedures for accurate and computationally efficient transcript alignment, and it adopts dynamic programming for the refinement of intron boundaries. ASPIC also provides the minimal set of non-mergeable transcript isoforms compatible with the detected splicing events. The ASPIC web resource is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility. Extensive benchmarking shows that ASPIC outperforms other existing methods in the detection of novel splicing isoforms and in the minimization of over-predictions. ASPIC also requires a lower computation time for processing a single gene and an EST cluster. The ASPIC web resource is available at http://aspic.algo.disco.unimib.it/aspic-devel/.
The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4)

DOE PAGES

Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...

2016-02-24

The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provided via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation is followed by functional annotation including assignment of protein product names and connection to various protein family databases.
Random whole metagenomic sequencing for forensic discrimination of soils.

PubMed

Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

2014-01-01

Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.
Transterm: a database to aid the analysis of regulatory sequences in mRNAs

PubMed Central

Jacobs, Grant H.; Chen, Augustine; Stevens, Stewart G.; Stockwell, Peter A.; Black, Michael A.; Tate, Warren P.; Brown, Chris M.

2009-01-01

Messenger RNAs, in addition to coding for proteins, may contain regulatory elements that affect how the protein is translated. These include protein and microRNA-binding sites. Transterm (http://mRNA.otago.ac.nz/Transterm.html) is a database of regions and elements that affect translation with two major unique components. The first is integrated results of analysis of general features that affect translation (initiation, elongation, termination) for species or strains in Genbank, processed through a standard pipeline. The second is curated descriptions of experimentally determined regulatory elements that function as translational control elements in mRNAs. Transterm focuses on protein binding sites, particularly those in 3′-untranslated regions (3′-UTR). For this release the interface has been extensively updated based on user feedback. The data is now accessible by strain rather than species, for example there are 10 Escherichia coli strains (genomes) analysed separately. In addition to providing a repository of data, the database also provides tools for users to query their own mRNA sequences. Users can search sequences for Transterm or user defined regulatory elements, including protein or miRNA targets. Transterm also provides a central core of links to related resources for complementary analyses. PMID:18984623
rVISTA 2.0: Evolutionary Analysis of Transcription Factor Binding Sites

DOE Office of Scientific and Technical Information (OSTI.GOV)

Loots, G G; Ovcharenko, I

2004-01-28

Identifying and characterizing the patterns of DNA cis-regulatory modules represents a challenge that has the potential to reveal the regulatory language the genome uses to dictate transcriptional dynamics. Several studies have demonstrated that regulatory modules are under positive selection and therefore are often conserved between related species. Using this evolutionary principle we have created a comparative tool, rVISTA, for analyzing the regulatory potential of noncoding sequences. The rVISTA tool combines transcription factor binding site (TFBS) predictions, sequence comparisons and cluster analysis to identify noncoding DNA regions that are highly conserved and present in a specific configuration within an alignment. Here we present the newly developed version 2.0 of the rVISTA tool that can process alignments generated by both zPicture and PipMaker alignment programs or use pre-computed pairwise alignments of seven vertebrate genomes available from the ECR Browser. The rVISTA web server is closely interconnected with the TRANSFAC database, allowing users to either search for matrices present in the TRANSFAC library collection or search for user-defined consensus sequences. rVISTA tool is publicly available at http://rvista.dcode.org/.
ScaffoldSeq: Software for characterization of directed evolution populations.

PubMed

Woldring, Daniel R; Holec, Patrick V; Hackel, Benjamin J

2016-07-01

ScaffoldSeq is software designed for the numerous applications, including directed evolution analysis, in which a user generates a population of DNA sequences encoding for partially diverse proteins with related functions and would like to characterize the single site and pairwise amino acid frequencies across the population. A common scenario for enzyme maturation, antibody screening, and alternative scaffold engineering involves naïve and evolved populations that contain diversified regions, varying in both sequence and length, within a conserved framework. Analyzing the diversified regions of such populations is facilitated by high-throughput sequencing platforms; however, length variability within these regions (e.g., antibody CDRs) encumbers the alignment process. To overcome this challenge, the ScaffoldSeq algorithm takes advantage of conserved framework sequences to quickly identify diverse regions. Beyond this, unintended biases in sequence frequency are generated throughout the experimental workflow required to evolve and isolate clones of interest prior to DNA sequencing. ScaffoldSeq software uniquely handles this issue by providing tools to quantify and remove background sequences, cluster similar protein families, and dampen the impact of dominant clones. The software produces graphical and tabular summaries for each region of interest, allowing users to evaluate diversity in a site-specific manner as well as identify epistatic pairwise interactions. The code and detailed information are freely available at http://research.cems.umn.edu/hackel. Proteins 2016; 84:869-874. © 2016 Wiley Periodicals, Inc.
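The single-site frequency tabulation and the dampening of dominant clones mentioned in this abstract can be illustrated with a short sketch. This is not ScaffoldSeq's code: the function name `sitewise_frequencies`, the `damping` exponent, and the input format (a dict of equal-length amino acid sequences mapped to read counts) are all assumptions made for illustration.

```python
from collections import Counter

def sitewise_frequencies(clone_counts, damping=0.5):
    """Per-position amino acid frequencies with dampened clone weights.

    clone_counts: dict mapping an amino acid sequence to its read count.
    All sequences must already have the same length (hypothetical input;
    a real pipeline would first extract and align the diversified region).
    damping: exponent applied to each read count (assumption). A value of
    1.0 uses raw counts; smaller values reduce the weight of dominant clones.
    """
    lengths = {len(seq) for seq in clone_counts}
    if len(lengths) != 1:
        raise ValueError("expected at least one sequence, all of equal length")
    n_sites = lengths.pop()

    # Accumulate the dampened weight of every residue at every position.
    weights = [Counter() for _ in range(n_sites)]
    for seq, count in clone_counts.items():
        w = count ** damping
        for i, aa in enumerate(seq):
            weights[i][aa] += w

    # Normalise each position so its frequencies sum to 1.
    freqs = []
    for site in weights:
        total = sum(site.values())
        freqs.append({aa: w / total for aa, w in site.items()})
    return freqs
```

With `damping=1.0` a clone read 100 times outweighs a singleton 100 to 1; with `damping=0.5` the ratio drops to 10 to 1, so rare variants remain visible in the per-site summary.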
UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

PubMed

Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

2016-01-04

The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Cloning and sequence analysis of complementary DNA encoding an aberrantly rearranged human T-cell gamma chain.

PubMed Central

Dialynas, D P; Murre, C; Quertermous, T; Boss, J M; Leiden, J M; Seidman, J G; Strominger, J L

1986-01-01

Complementary DNA (cDNA) encoding a human T-cell gamma chain has been cloned and sequenced. At the junction of the variable and joining regions, there is an apparent deletion of two nucleotides in the human cDNA sequence relative to the murine gamma-chain cDNA sequence, resulting simultaneously in the generation of an in-frame stop codon and in a translational frameshift. For this reason, the sequence presented here encodes an aberrantly rearranged human T-cell gamma chain. There are several surprising differences between the deduced human and murine gamma-chain amino acid sequences. These include poor homology in the variable region, poor homology in a discrete segment of the constant region precisely bounded by the expected junctions of exon CII, and the presence in the human sequence of five potential sites for N-linked glycosylation. PMID:3458221
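Potential N-linked glycosylation sites such as those counted in this abstract are commonly predicted by scanning a deduced protein sequence for the canonical sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below shows that scan; the function name `find_sequons` and the example sequence are hypothetical, and the abstract does not state which criterion the authors applied.

```python
import re

# Canonical N-linked glycosylation sequon: Asn, any residue except Pro,
# then Ser or Thr. The zero-width lookahead lets overlapping sequons
# (for example in "NNSS") be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(protein):
    """Return 1-based positions of the Asn in each N-X-[S/T] sequon (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

# Made-up example: Asn-1 and Asn-8 qualify; Asn-4 is followed by Pro.
print(find_sequons("NGSNPTANAT"))
```

A sequon match only marks a potential site; whether the asparagine is actually glycosylated depends on the cellular context and has to be checked experimentally.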
Nucleotide sequence determination of guinea-pig casein B mRNA reveals homology with bovine and rat alpha s1 caseins and conservation of the non-coding regions of the mRNA.

PubMed Central

Hall, L; Laird, J E; Craig, R K

1984-01-01

Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. PMID:6548375
Molecular analysis of 16S rRNA genes identifies potentially periodontal pathogenic bacteria and archaea in the plaque of partially erupted third molars.

PubMed

Mansfield, J M; Campbell, J H; Bhandari, A R; Jesionowski, A M; Vickerman, M M

2012-07-01

Small subunit rRNA sequencing and phylogenetic analysis were used to identify cultivable and uncultivable microorganisms present in the dental plaque of symptomatic and asymptomatic partially erupted third molars to determine the prevalence of putative periodontal pathogens in pericoronal sites. Template DNA prepared from subgingival plaque collected from partially erupted symptomatic and asymptomatic mandibular third molars and healthy incisors was used in polymerase chain reaction with broad-range oligonucleotide primers to amplify 16S rRNA bacterial and archaeal genes. Amplicons were cloned, sequenced, and compared with known nucleotide sequences in online databases to identify the microorganisms present. Two thousand three hundred two clones from the plaque of 12 patients carried bacterial sequences from 63 genera belonging to 11 phyla, including members of the uncultivable TM7, SR1, and Chloroflexi, and difficult-to-cultivate Synergistetes and Spirochaetes. Dialister invisus, Filifactor alocis, Fusobacterium nucleatum, Porphyromonas endodontalis, Prevotella denticola, Tannerella forsythia, and Treponema denticola, which have been associated with periodontal disease, were found in significantly greater abundance in pericoronal compared with incisor sites. Dialister invisus and F. nucleatum were found in greater abundance in sites exhibiting clinical symptoms. The archaeal species, Methanobrevibacter oralis, which has been associated with severe periodontitis, was found in 3 symptomatic patients. These findings have provided new insights into the complex microbiota of pericoronitis. Several bacterial and archaeal species implicated in periodontal disease were recovered in greater incidence and abundance from the plaque of partially erupted third molars compared with incisors, supporting the hypothesis that the pericoronal region may provide a favored niche for periodontal pathogens in otherwise healthy mouths.
Copyright © 2012 American Association of Oral and Maxillofacial Surgeons. Published by Elsevier Inc. All rights reserved.
Life-history, substrate choice and Cytochrome Oxidase I variations in sandy beach peracaridans along the Rio de la Plata estuary

NASA Astrophysics Data System (ADS)

Fanini, L.; Zampicinini, G.; Tsigenopoulos, C. S.; Barboza, F. R.; Lozoya, J. P.; Gómez, J.; Celentano, E.; Lercari, D.; Marchetti, G. M.; Defeo, O.

2017-03-01

Life-history, substrate choice and Cytochrome Oxidase I (COI) sequences were analysed in populations of two peracaridans, the supralittoral talitrid Atlantorchestoidea brasiliensis and the intertidal cirolanid Excirolana armata. Three populations of each species, from beaches with similar grain size and located at different points along the natural gradient generated by the Rio de la Plata estuary were analysed. Abundance of E. armata increased with distance from the estuary, while the opposite trend was observed for A. brasiliensis. The proportion of females decreased towards high salinities for both species, significantly for E. armata. A test on substrate salinity preference revealed the absence of patterns due to active choice in E. armata. By contrast, A. brasiliensis showed no preference for the population closer to the estuary, while individuals from the other two sites significantly preferred high salinity substrates. Mitochondrial COI sequences were obtained from A. brasiliensis specimens tested for behaviour. Sequence analysis showed the population from the intermediate site to differ significantly from the other two. No significant genetic differentiation was instead found between populations from the two most distant sites, nor between individuals that expressed different salinity preference. Results showed that diverse sets of traits at the population level enable sandy beach species to cope with local environmental changes: life-history and behavioural traits appear to change in response to different ecological conditions, and, in the case of A. brasiliensis, independently of the population structure inferred from COI sequence variation. Information from multiple traits allowed detection of population profiles, highlighting the relevance of multidisciplinary information and the concurrent analysis of field data and laboratory experiments, to detect responses of resident biota to environmental changes.
Sulfonium Ion Derivatization, Isobaric Stable Isotope Labeling and Data Dependent CID- and ETD-MS/MS for Enhanced Phosphopeptide Quantitation, Identification and Phosphorylation Site Characterization

PubMed Central

Lu, Yali; Zhou, Xiao; Stemmer, Paul M.; Reid, Gavin E.

2014-01-01

An amine specific peptide derivatization strategy involving the use of novel isobaric stable isotope encoded ‘fixed charge’ sulfonium ion reagents, coupled with an analysis strategy employing capillary HPLC, ESI-MS, and automated data dependent ion trap CID-MS/MS, -MS3, and/or ETD-MS/MS, has been developed for the improved quantitative analysis of protein phosphorylation, and for identification and characterization of their site(s) of modification. Derivatization of 50 synthetic phosphopeptides with S,S′-dimethylthiobutanoylhydroxysuccinimide ester iodide (DMBNHS), followed by analysis using capillary HPLC-ESI-MS, yielded an average 2.5-fold increase in ionization efficiencies and a significant increase in the presence and/or abundance of higher charge state precursor ions compared to the non-derivatized phosphopeptides. Notably, 44% of the phosphopeptides (22 of 50) in their underivatized states yielded precursor ions whose maximum charge states corresponded to +2, while only 8% (4 of 50) remained at this maximum charge state following DMBNHS derivatization. Quantitative analysis was achieved by measuring the abundances of the diagnostic product ions corresponding to the neutral losses of ‘light’ (S(CH3)2) and ‘heavy’ (S(CD3)2) dimethylsulfide exclusively formed upon CID-MS/MS of isobaric stable isotope labeled forms of the DMBNHS derivatized phosphopeptides. Under these conditions, the phosphate group stayed intact. Access for a greater number of peptides to provide enhanced phosphopeptide sequence identification and phosphorylation site characterization was achieved via automated data-dependent CID-MS3 or ETD-MS/MS analysis due to the formation of the higher charge state precursor ions. Importantly, improved sequence coverage was observed using ETD-MS/MS following introduction of the sulfonium ion fixed charge, but with no detrimental effects on ETD fragmentation efficiency. PMID:21952753

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium

PubMed Central

2014-01-01

We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed, for these and qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings. PMID:25150838
Evolution of toll-like receptors in the context of terrestrial ungulates and cetaceans diversification.

PubMed

Ishengoma, Edson; Agaba, Morris

2017-02-16

Toll-like receptors (TLRs) are the frontline actors in the innate immune response to various pathogens and are expected to be targets of natural selection in species adapted to habitats with contrasting pathogen burdens. The recent publication of genome sequences of giraffe and okapi together afforded the opportunity to examine the evolution of selected TLRs in a broad range of terrestrial ungulates and cetaceans during their complex habitat diversification. Through direct sequence comparisons and standard evolutionary approaches, the extent of nucleotide and protein sequence diversity in seven Toll-like receptors (TLR2, TLR3, TLR4, TLR5, TLR7, TLR9 and TLR10) between giraffe and closely related species was determined. In addition, comparison of the patterning of key TLR motifs and domains between giraffe and related species was performed. The quantification of selection pressure and divergence on TLRs among terrestrial ungulates and cetaceans was also performed. Sequence analysis shows that giraffe has 94-99% nucleotide identity with okapi and cattle for all TLRs analyzed. Variations in the number of Leucine-rich repeats were observed in some of the TLRs between giraffe, okapi and cattle. Patterning of key TLR domains did not reveal any significant differences in the domain architecture among giraffe, okapi and cattle. Molecular evolutionary analysis for selection pressure identifies positive selection on key sites for all TLRs examined, suggesting that pervasive evolutionary pressure has taken place during the evolution of terrestrial ungulates and cetaceans. Analysis of positively selected sites showed some sites to be part of Leucine-rich motifs, suggesting functional relevance in species-specific recognition of pathogen associated molecular patterns. Notably, clade analysis reveals significant selection divergence between terrestrial ungulates and cetaceans in viral sensing TLR3.
Mapping of giraffe TLR3 key substitutions to the structure of the receptor indicates that at least one of the altered giraffe sites coincides with a TLR3 residue known to play a critical role in receptor signaling activity. There is overall structural conservation in TLRs among giraffe, okapi and cattle, indicating that the mechanism for innate immune response utilizing TLR pathways may not have changed very much during the evolution of these species. However, a broader phylogenetic analysis revealed signatures of adaptive evolution among terrestrial ungulates and cetaceans, including the observed selection divergence in TLR3. This suggests that long-term ecological dynamics have led to species-specific innovation and functional variation in the mechanisms mediating innate immunity in terrestrial ungulates and cetaceans.
Isolation and primary structural analysis of two conjugated polyketone reductases from Candida parapsilosis.

PubMed

Hidalgo, A R; Akond, M A; Kita, K; Kataoka, M; Shimizu, S

2001-12-01

Two conjugated polyketone reductases (CPRs) were isolated from Candida parapsilosis IFO 0708. The primary structures of CPRs (C1 and C2) were analyzed by amino acid sequencing. The amino acid sequences of both enzymes had high similarity to those of several proteins of the aldo-keto-reductase (AKR) superfamily. However, several amino acid residues in the putative active sites of AKRs were not conserved in CPRs-C1 and -C2.
Targeted DNA sequencing and in situ mutation analysis using mobile phone microscopy

NASA Astrophysics Data System (ADS)

Kühnemund, Malte; Wei, Qingshan; Darai, Evangelia; Wang, Yingjie; Hernández-Neuta, Iván; Yang, Zhao; Tseng, Derek; Ahlford, Annika; Mathot, Lucy; Sjöblom, Tobias; Ozcan, Aydogan; Nilsson, Mats

2017-01-01

Molecular diagnostics is typically outsourced to well-equipped centralized laboratories, often far from the patient. We developed molecular assays and portable optical imaging designs that permit on-site diagnostics with a cost-effective mobile-phone-based multimodal microscope. We demonstrate that targeted next-generation DNA sequencing reactions and in situ point mutation detection assays in preserved tumour samples can be imaged and analysed using mobile phone microscopy, achieving a new milestone for tele-medicine technologies.
From sequence analysis of three novel ascorbate peroxidases from Arabidopsis thaliana to structure, function and evolution of seven types of ascorbate peroxidase.

PubMed Central

Jespersen, H M; Kjaersgård, I V; Ostergaard, L; Welinder, K G

1997-01-01

Ascorbate peroxidases are haem proteins that efficiently scavenge H2O2 in the cytosol and chloroplasts of plants. Database analyses retrieved 52 expressed sequence tags coding for Arabidopsis thaliana ascorbate peroxidases. Complete sequencing of non-redundant clones revealed three novel types in addition to the two cytosol types described previously in Arabidopsis. Analysis of sequence data available for all plant ascorbate peroxidases resulted in the following classification: two types of cytosol soluble ascorbate peroxidase designated cs1 and cs2; three types of cytosol membrane-bound ascorbate peroxidase, namely cm1, bound to microbodies via a C-terminal membrane-spanning segment, and cm2 and cm3, both of unknown location; two types of chloroplast ascorbate peroxidase with N-terminal transit sequences, the stromal ascorbate peroxidase (chs), and the thylakoid-bound ascorbate peroxidase showing a C-terminal transmembrane segment and designated cht. Further comparison of the patterns of conserved residues and the crystal structure of pea ascorbate peroxidase showed that active site residues are conserved, and three peptide segments implicated in interaction with reducing substrate are similar, excepting cm2 and cm3 types. A change of Phe-175 in cytosol types to Trp-175 in chloroplast types might explain the greater ascorbate specificity of chloroplast compared with cytosol ascorbate peroxidases. Residues involved in homodimeric subunit interaction are conserved only in cs1, cs2 and cm1 types. The proximal cation (K+)-binding site observed in pea ascorbate peroxidase seems to be conserved. In addition, cm1, cm2, cm3, chs and cht ascorbate peroxidases contain Asp-43, Asn-57 and Ser-59, indicative of a distal monovalent cation site. The data support the hypothesis that present-day peroxidases evolved by an early gene duplication event. PMID:9291097
Genetic characterization of the non-structural protein-3 gene of bluetongue virus serotype-2 isolate from India.

PubMed

Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu

2017-03-01

Sequence analysis and phylogenetic studies based on non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Analysis of the NS3 gene indicated that Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of BTV2 isolate is different from other global isolates, the deduced amino acid sequence of NS3 protein demonstrated high molecular stability.
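Identity figures like the 81% to 98% nucleotide identities reported in this abstract are computed from aligned sequences. The sketch below uses one common convention, counting matching columns among columns where neither sequence has a gap; the function name `percent_identity` is hypothetical, and the abstract does not say which convention or software the authors used. Other conventions (for example dividing by the full alignment length) give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped (one convention
    among several; an assumption made for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Made-up example: 7 of 8 aligned bases match.
print(percent_identity("ACGTACGT", "ACGTACGA"))
```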
Genetic characterization of the non-structural protein-3 gene of bluetongue virus serotype-2 isolate from India

PubMed Central

Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu

2017-01-01

Aim: Sequence analysis and phylogenetic studies based on non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. Materials and Methods: The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. Results: The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Conclusion: Analysis of the NS3 gene indicated that Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of BTV2 isolate is different from other global isolates, the deduced amino acid sequence of NS3 protein demonstrated high molecular stability. PMID:28435199
Primary and secondary structural analyses of glutathione S-transferase pi from human placenta.

PubMed

Ahmad, H; Wilson, D E; Fritz, R R; Singh, S V; Medh, R D; Nagle, G T; Awasthi, Y C; Kurosky, A

1990-05-01

The primary structure of glutathione S-transferase (GST) pi from a single human placenta was determined. The structure was established by chemical characterization of tryptic and cyanogen bromide peptides as well as automated sequence analysis of the intact enzyme. The structural analysis indicated that the protein is comprised of 209 amino acid residues and gave no evidence of post-translational modifications. The amino acid sequence differed from that of the deduced amino acid sequence determined by nucleotide sequence analysis of a cDNA clone (Kano, T., Sakai, M., and Muramatsu, M., 1987, Cancer Res. 47, 5626-5630) at position 104 which contained both valine and isoleucine whereas the deduced sequence from nucleotide sequence analysis identified only isoleucine at this position. These results demonstrated that in the one individual placenta studied at least two GST pi genes are coexpressed, probably as a result of allelomorphism. Computer assisted consensus sequence evaluation identified a hydrophobic region in GST pi (residues 155-181) that was predicted to be either a buried transmembrane helical region or a signal sequence region. The significance of this hydrophobic region was interpreted in relation to the mode of action of the enzyme especially in regard to the potential involvement of a histidine in the active site mechanism. A comparison of the chemical similarity of five known human GST complete enzyme structures, one of pi, one of mu, two of alpha, and one microsomal, gave evidence that all five enzymes have evolved by a divergent evolutionary process after gene duplication, with the microsomal enzyme representing the most divergent form.
Positive selection in octopus haemocyanin indicates functional links to temperature adaptation.

PubMed

Oellermann, Michael; Strugnell, Jan M; Lieb, Bernhard; Mark, Felix C

2015-07-05

Octopods have successfully colonised the world's oceans from the tropics to the poles. Yet, successful persistence in these habitats has required adaptations of their advanced physiological apparatus to compensate for impaired oxygen supply. Their oxygen transporter haemocyanin plays a major role in cold tolerance and accordingly has undergone functional modifications to sustain oxygen release at sub-zero temperatures. However, it remains unknown how molecular properties evolved to explain the observed functional adaptations. We thus aimed to assess whether natural selection affected molecular and structural properties of haemocyanin that explain temperature adaptation in octopods. Analysis of 239 partial sequences of the haemocyanin functional units (FU) f and g of 28 octopod species of polar, temperate, subtropical and tropical origin revealed that natural selection was acting primarily on charge properties of surface residues. Polar octopods contained haemocyanins with higher net surface charge due to decreased glutamic acid content and higher numbers of basic amino acids. Within the analysed partial sequences, positive selection was present at site 2545, positioned between the active copper binding centre and the FU g surface. At this site, methionine was the dominant amino acid in polar octopods and leucine was dominant in tropical octopods. Sites directly involved in oxygen binding or quaternary interactions were highly conserved within the analysed sequence. This study has provided the first insight into molecular and structural mechanisms that have enabled octopods to sustain oxygen supply from polar to tropical conditions. Our findings imply modulation of oxygen binding via charge-charge interaction at the protein surface, which stabilizes quaternary interactions among functional units to reduce detrimental effects of high pH on venous oxygen release.
Of the observed partial haemocyanin sequence, residue 2545 formed a close link between the FU g surface and the active centre, suggesting a role as an allosteric binding site. The prevalence of methionine at this site in polar octopods implies regulation of oxygen affinity via increased sensitivity to allosteric metal binding. High sequence conservation of sites directly involved in oxygen binding indicates that functional modifications of octopod haemocyanin rather occur via more subtle mechanisms, as observed in this study.
Bacterial community composition of chronic periodontitis and novel oral sampling sites for detecting disease indicators

PubMed Central

2014-01-01

Background: Periodontitis is an infectious and inflammatory disease of polymicrobial etiology that can lead to the destruction of bones and tissues that support the teeth. The management of chronic periodontitis (CP) relies heavily on elimination or at least control of known pathogenic consortia associated with the disease. Until now, microbial plaque obtained from the subgingival (SubG) sites has been the primary focus for bacterial community analysis using deep sequencing. In addition to the use of SubG plaque, here, we investigated whether plaque obtained from supragingival (SupG) and tongue dorsum sites can serve as alternatives for monitoring CP-associated bacterial biomarkers. Results: Using SubG, SupG, and tongue plaque DNA from 11 healthy and 13 diseased subjects, we sequenced V3 regions (approximately 200 bases) of the 16S rRNA gene using Illumina sequencing. After quality filtering, approximately 4.1 million sequences were collapsed into operational taxonomic units (OTUs; sequence identity cutoff of >97%) that were classified to a total of 19 phyla spanning 114 genera. Bacterial community diversity and overall composition were not affected by health or disease, and multiresponse permutation procedure (MRPP) on Bray-Curtis distance measures only supported weakly distinct bacterial communities in SubG and tongue plaque depending on health or disease status (P < 0.05). Nonetheless, in SubG and tongue sites, the relative abundance of Firmicutes was increased significantly from health to disease and members of Synergistetes were found in higher abundance across all sites in disease. Taxa indicative of CP were identified in all three locations (for example, Treponema denticola, Porphyromonas gingivalis, Synergistes oral taxa 362 and 363). Conclusions: For the first time, this study demonstrates that SupG and tongue dorsum plaque can serve as alternative sources for detecting and enumerating known and novel bacterial biomarkers of CP.
This finding is clinically important because, in contrast with SubG sampling that requires trained professionals, obtaining plaque from SupG and tongue sites is convenient and minimally-invasive and offers a novel means to track CP-biomarker organisms during treatment outcome monitoring. PMID:25225610
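The Bray-Curtis distance mentioned in the abstract above can be illustrated with a minimal sketch. The OTU counts and sample names below are hypothetical and are not data from the study; the function computes the standard Bray-Curtis dissimilarity, the sum of absolute count differences divided by the sum of all counts.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples given as {taxon: count} dicts."""
    taxa = set(counts_a) | set(counts_b)
    # Sum of absolute differences in abundance over every taxon seen in either sample.
    diff = sum(abs(counts_a.get(t, 0) - counts_b.get(t, 0)) for t in taxa)
    # Total abundance across both samples.
    total = sum(counts_a.get(t, 0) + counts_b.get(t, 0) for t in taxa)
    if total == 0:
        raise ValueError("both samples are empty")
    return diff / total


# Hypothetical phylum-level counts for two plaque samples (illustration only).
subg = {"Firmicutes": 40, "Bacteroidetes": 30, "Synergistetes": 10}
tongue = {"Firmicutes": 50, "Bacteroidetes": 20, "Proteobacteria": 10}
print(bray_curtis(subg, tongue))
```

A value of 0 means identical composition and 1 means no shared taxa; a permutation procedure such as MRPP would then be run on the matrix of pairwise values.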
Characterization of Fe(II) oxidizing bacterial activities and communities at two acidic Appalachian coalmine drainage-impacted sites

DOE Office of Scientific and Technical Information (OSTI.GOV)

Senko, John M.; Wanjugi, Pauline; Lucas, Melanie

2008-06-12

We characterized the microbiologically mediated oxidative precipitation of Fe(II) from coalmine-derived acidic mine drainage (AMD) along flow-paths at two sites in northern Pennsylvania. At the Gum Boot site, dissolved Fe(II) was efficiently removed from AMD whereas minimal Fe(II) removal occurred at the Fridays-2 site. Neither site received human intervention to treat the AMD. Culturable Fe(II) oxidizing bacteria were most abundant at sampling locations along the AMD flow path corresponding to greatest Fe(II) removal and where overlying water contained abundant dissolved O2. Rates of Fe(II) oxidation determined in laboratory-based sediment incubations were also greatest at these sampling locations. Ribosomal RNA intergenic spacer analysis and sequencing of partial 16S rRNA genes recovered from sediment bacterial communities revealed similarities among populations at points receiving regular inputs of Fe(II)-rich AMD and provided evidence for the presence of bacterial lineages capable of Fe(II) oxidation. A notable difference between bacterial communities at the two sites was the abundance of Chloroflexi-affiliated 16S rRNA gene sequences in clone libraries derived from the Gum Boot sediments. Our results suggest that inexpensive and reliable AMD treatment strategies can be implemented by mimicking the conditions present at the Gum Boot field site.
Mojo Hand, a TALEN design tool for genome editing applications.

PubMed

Neff, Kevin L; Argue, David P; Ma, Alvin C; Lee, Han B; Clark, Karl J; Ekker, Stephen C

2013-01-16

Recent studies of transcription activator-like (TAL) effector domains fused to nucleases (TALENs) demonstrate enormous potential for genome editing. Effective design of TALENs requires a combination of selecting appropriate genetic features, finding pairs of binding sites based on a consensus sequence, and, in some cases, identifying endogenous restriction sites for downstream molecular genetic applications. We present the web-based program Mojo Hand for designing TAL and TALEN constructs for genome editing applications (http://www.talendesign.org). We describe the algorithm and its implementation. The features of Mojo Hand include (1) automatic download of genomic data from the National Center for Biotechnology Information, (2) analysis of any DNA sequence to reveal pairs of binding sites based on a user-defined template, (3) selection of restriction-enzyme recognition sites in the spacer between the TAL monomer binding sites including options for the selection of restriction enzyme suppliers, and (4) output files designed for subsequent TALEN construction using the Golden Gate assembly method. Mojo Hand enables the rapid identification of TAL binding sites for use in TALEN design. The assembly of TALEN constructs is also simplified by using the TAL-site prediction program in conjunction with a spreadsheet management aid of reagent concentrations and TALEN formulation. Mojo Hand enables scientists to more rapidly deploy TALENs for genome editing applications.
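The idea of scanning a sequence for pairs of binding sites that match a template can be sketched as follows. This is not Mojo Hand's algorithm, which the abstract does not detail; the template used here (a T immediately 5' of the left site, an A immediately 3' of the right site on the forward strand, fixed site length, and a spacer range) and the function name `find_site_pairs` are assumptions made for illustration.

```python
def find_site_pairs(seq, site_len=15, spacer_min=14, spacer_max=16):
    """Return (left_start, spacer_length, right_start) tuples, 0-based.

    Assumed template: T + left site + spacer + right site + A,
    where the trailing A is the forward-strand complement of the T
    that precedes the right site on the reverse strand.
    """
    seq = seq.upper()
    pairs = []
    for i, base in enumerate(seq):
        if base != "T":
            continue
        left_start = i + 1
        left_end = left_start + site_len  # exclusive end of the left site
        for spacer in range(spacer_min, spacer_max + 1):
            right_start = left_end + spacer
            right_end = right_start + site_len  # index of the base after the right site
            if right_end >= len(seq):
                break  # longer spacers would also run off the sequence
            if seq[right_end] == "A":
                pairs.append((left_start, spacer, right_start))
    return pairs


# Toy sequence (hypothetical): T, a 3-base site, a 4-base spacer, a 3-base site, A.
toy = "TCCCGGGGCCCA"
print(find_site_pairs(toy, site_len=3, spacer_min=4, spacer_max=4))
```

A real design tool would add further filters (for example restriction sites within the spacer), which are omitted here.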
Analysis of variable sites between two complete South China tiger (Panthera tigris amoyensis) mitochondrial genomes.

PubMed

Zhang, Wenping; Yue, Bisong; Wang, Xiaofang; Zhang, Xiuyue; Xie, Zhong; Liu, Nonglin; Fu, Wenyuan; Yuan, Yaohua; Chen, Daqing; Fu, Danghua; Zhao, Bo; Yin, Yuzhong; Yan, Xiahui; Wang, Xinjing; Zhang, Rongying; Liu, Jie; Li, Maoping; Tang, Yao; Hou, Rong; Zhang, Zhihe

2011-10-01

In order to investigate the mitochondrial genome of Panthera tigris amoyensis, two South China tigers (P25 and P27) were analyzed following 15 cymt-specific primer sets. The entire mtDNA sequence was found to be 16,957 bp and 17,001 bp long for P25 and P27 respectively, and this difference in length between P25 and P27 occurred in the number of tandem repeats in the RS-3 segment of the control region. The structural characteristics of complete P. t. amoyensis mitochondrial genomes were also highly similar to those of P. uncia. Additionally, the rate of point mutation was only 0.3% and a total of 59 variable sites between P25 and P27 were found. Out of the 59 variable sites, 6 were located in 6 different tRNA genes, 6 in the 2 rRNA genes, 7 in non-coding regions (one located between tRNA-Asn and tRNA-Tyr and six in the D-loop), and 40 in 10 protein-coding genes. COI had the largest number of variable sites (9 sites) and Cytb had the highest rate of variation (0.7%) in the complete sequences. Moreover, out of the 40 variable sites located in 10 protein-coding genes, 12 sites were nonsynonymous.
Technical adequacy of bisulfite sequencing and pyrosequencing for detection of mitochondrial DNA methylation: Sources and avoidance of false-positive detection.

PubMed

Owa, Chie; Poulin, Matthew; Yan, Liying; Shioda, Toshi

2018-01-01

The existence of cytosine methylation in mammalian mitochondrial DNA (mtDNA) is a controversial subject. Because detection of DNA methylation depends on resistance of 5'-modified cytosines to bisulfite-catalyzed conversion to uracil, we examined parameters that affect technical adequacy of mtDNA methylation analysis. Negative control amplicons (NCAs) devoid of cytosine methylation were amplified to cover the entire human or mouse mtDNA by long-range PCR. When the pyrosequencing template amplicons were gel-purified after bisulfite conversion, bisulfite pyrosequencing of NCAs did not detect significant levels of bisulfite-resistant cytosines (brCs) at ND1 (7 CpG sites) or CYTB (8 CpG sites) genes (CI95 = 0%-0.94%); without gel-purification, significant false-positive brCs were detected from NCAs (CI95 = 4.2%-6.8%). Bisulfite pyrosequencing of highly purified, linearized mtDNA isolated from human iPS cells or mouse liver detected significant brCs (~30%) in human ND1 gene when the sequencing primer was not selective in bisulfite-converted and unconverted templates. However, repeated experiments using a sequencing primer selective in bisulfite-converted templates almost completely (< 0.8%) suppressed brC detection, supporting the false-positive nature of brCs detected using the non-selective primer. Bisulfite-seq deep sequencing of linearized, gel-purified human mtDNA detected 9.4%-14.8% brCs for 9 CpG sites in ND1 gene. However, because all these brCs were associated with adjacent non-CpG brCs showing the same degrees of bisulfite resistance, DNA methylation in this mtDNA-encoded gene was not confirmed. Without linearization, data generated by bisulfite pyrosequencing or deep sequencing of purified mtDNA templates did not pass the quality control criteria. Shotgun bisulfite sequencing of human mtDNA detected extremely low levels of CpG methylation (<0.65%) over non-CpG methylation (<0.55%).
Taken together, our study demonstrates that adequacy of mtDNA methylation analysis using methods dependent on bisulfite conversion needs to be established for each experiment, taking effects of incomplete bisulfite conversion and template impurity or topology into consideration.
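The principle behind bisulfite-based detection described above (unmethylated C reads as T after conversion, while a resistant C still reads as C) can be sketched in a few lines. The reference sequence, reads, and function names are hypothetical; reads are assumed to be already aligned to the reference without gaps, which real pipelines do not guarantee.

```python
def bisulfite_convert(seq, protected=frozenset()):
    """In-silico conversion: every C becomes T unless its 0-based index is protected."""
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


def cpg_positions(reference):
    """0-based indices of the C in each CpG dinucleotide."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def resistant_fraction(reference, reads):
    """Fraction of reads showing an unconverted C at each CpG site."""
    result = {}
    for pos in cpg_positions(reference):
        calls = [r[pos].upper() for r in reads]
        calls = [c for c in calls if c in ("C", "T")]  # ignore other base calls
        result[pos] = calls.count("C") / len(calls) if calls else None
    return result


# Hypothetical 7-base reference with CpG sites at indices 1 and 4.
reference = "ACGTCGA"
converted = bisulfite_convert(reference)                 # fully converted read
resistant = bisulfite_convert(reference, protected={1})  # C at index 1 left unconverted
reads = [converted, converted, converted, resistant]
print(resistant_fraction(reference, reads))
```

As the abstract stresses, a non-zero fraction from such a tally can reflect incomplete conversion or template impurity rather than methylation, so negative controls are needed to interpret it.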
Seasonal and regional diversity of maple sap microbiota revealed using community PCR fingerprinting and 16S rRNA gene clone libraries.

PubMed

Filteau, Marie; Lagacé, Luc; LaPointe, Gisèle; Roy, Denis

2010-04-01

An arbitrary primed community PCR fingerprinting technique based on capillary electrophoresis was developed to study maple sap microbial community characteristics among 19 production sites in Québec over the tapping season. Presumptive fragment identification was made with corresponding fingerprint profiles of bacterial isolate cultures. Maple sap microbial communities were subsequently compared using a representative subset of 13 16S rRNA gene clone libraries followed by gene sequence analysis. Results from both methods indicated that all maple sap production sites and flow periods shared common microbiota members, but distinctive features also existed. Changes over the season in relative abundance of predominant populations showed evidence of a common pattern. Pseudomonas (64%) and Rahnella (8%) were the most abundantly and frequently represented genera of the 2239 sequences analyzed. Janthinobacterium, Leuconostoc, Lactococcus, Weissella, Epilithonimonas and Sphingomonas were revealed as occasional contaminants in maple sap. Maple sap microbiota showed a low level of deep diversity along with a high variation of similar 16S rRNA gene sequences within the Pseudomonas genus. Predominance of Pseudomonas is suggested as a typical feature of maple sap microbiota across geographical regions, production sites, and sap flow periods.
[Genetic characteristics of hemagglutinin in measles viruses isolated in Henan Province, China].

PubMed

Feng, Da-Xing; Seng, Ming-Hua; Liu, Qian; Zhang, Zhen-Ying

2014-03-01

This study aims to investigate the genetic characteristics of hemagglutinin in wild-type measles viruses in Henan Province, China and to provide a basis for measles control and elimination. Specimens were collected from suspected measles cases in Henan during 2008-2012. Cell culture was performed for virus isolation, and RT-PCR was used to amplify hemagglutinin gene. The PCR products were sequenced and analyzed, including construction of phylogenetic tree and analysis of the distance between the isolated virus and the reference virus; then, the variations in predicted amino acids were analyzed. The results showed that 12 measles viruses were isolated in Henan Province and identified as H1a genotype; the nucleotide and amino acid homologies were 98.0%-100% and 97.2%-99.8%, respectively. One glycosylation site changed in all the 12 sequences because of the amino acid mutation from serine to asparagine at the 240th site, as compared with Edmonston-wt. USA/54/A. Overall, the wild-type measles virus genotype circulating in Henan Province from 2008 to 2012 was H1a, with high homology between strains; there were some variations in amino acid sequences, resulting in glycosylation site deletion.
Building a dictionary for genomes: Identification of presumptive regulatory sites by statistical analysis

PubMed Central

Bussemaker, Harmen J.; Li, Hao; Siggia, Eric D.

2000-01-01

The availability of complete genome sequences and mRNA expression data for all genes creates new opportunities and challenges for identifying DNA sequence motifs that control gene expression. An algorithm, “MobyDick,” is presented that decomposes a set of DNA sequences into the most probable dictionary of motifs or words. This method is applicable to any set of DNA sequences: for example, all upstream regions in a genome or all genes expressed under certain conditions. Identification of words is based on a probabilistic segmentation model in which the significance of longer words is deduced from the frequency of shorter ones of various lengths, eliminating the need for a separate set of reference data to define probabilities. We have built a dictionary with 1,200 words for the 6,000 upstream regulatory regions in the yeast genome; the 500 most significant words (some with as few as 10 copies in all of the upstream regions) match 114 of 443 experimentally determined sites (a significance level of 18 standard deviations). When analyzing all of the genes up-regulated during sporulation as a group, we find many motifs in addition to the few previously identified by analyzing the expression subclusters individually. Applying MobyDick to the genes derepressed when the general repressor Tup1 is deleted, we find known as well as putative binding sites for its regulatory partners. PMID:10944202
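The notion of judging a longer word against the frequencies of shorter ones can be illustrated with a much simpler stand-in than the probabilistic segmentation model named above. The sketch below is not MobyDick: it only compares the observed count of a word with the count expected if bases occurred independently at their single-letter frequencies. The function name and toy sequence are assumptions for illustration.

```python
from collections import Counter


def observed_vs_expected(sequences, word):
    """Return (observed, expected) counts of `word` over all sequences.

    Expected count assumes independent bases drawn at the observed
    single-base frequencies (a crude background model).
    """
    word = word.upper()
    base_counts = Counter()
    observed = 0
    windows = 0
    for s in sequences:
        s = s.upper()
        base_counts.update(s)
        n = len(s) - len(word) + 1  # number of start positions in this sequence
        if n > 0:
            windows += n
            observed += sum(1 for i in range(n) if s[i:i + len(word)] == word)
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no sequence data")
    p = 1.0
    for base in word:
        p *= base_counts[base] / total
    return observed, p * windows


# Toy input (hypothetical): "ACG" occurs twice in a sequence with uniform base usage.
obs, exp = observed_vs_expected(["ACGTACGT"], "ACG")
print(obs, exp)
```

Words whose observed count greatly exceeds the expectation are candidates for a dictionary; the published method refines this by re-estimating word probabilities jointly rather than one word at a time.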
Comparative analysis of amino acid composition in the active site of nirk gene encoding copper-containing nitrite reductase (CuNiR) in bacterial spp.

PubMed

Adhikari, Utpal Kumar; Rahman, M Mizanur

2017-04-01

The nirk gene encodes the copper-containing nitrite reductase (CuNiR), a key catalytic enzyme in the environmental denitrification process that helps to produce nitric oxide from nitrite. The molecular mechanism of the denitrification process is complex, and a theoretical investigation was conducted to determine the sequence information and amino acid composition of the active site of the CuNiR enzyme using various bioinformatics tools. Ten FASTA-formatted sequences were retrieved from the NCBI database, and domain and disordered-region identification and phylogenetic analyses were performed on these sequences. Comparative modeling of the proteins was performed with the Modeller 9v14 program and visualized with PyMOL. Validated protein models were deposited in the Protein Model Database (PMDB) (PMDB id: PM0080150 to PM0080159). Active sites of the nirk-encoded CuNiR enzyme were identified with the CASTp server. PROCHECK showed significant scores for four protein models in the most favored regions of the Ramachandran plot. Active site and cavity prediction showed that the amino acids glycine, alanine, histidine, aspartic acid, glutamic acid, threonine, and glutamine were common to four predicted protein models. The present in silico study anticipates that the active site analysis results will pave the way for further research on the complex denitrification mechanism of the selected species in the experimental laboratory. Copyright © 2016. Published by Elsevier Ltd.
High-resolution melt analysis to identify and map sequence-tagged site anchor points onto linkage maps: a white lupin (Lupinus albus) map as an exemplar.

PubMed

Croxford, Adam E; Rogers, Tom; Caligari, Peter D S; Wilkinson, Michael J

2008-01-01

* The provision of sequence-tagged site (STS) anchor points allows meaningful comparisons between mapping studies but can be a time-consuming process for nonmodel species or orphan crops. * Here, the first use of high-resolution melt analysis (HRM) to generate STS markers for use in linkage mapping is described. This strategy is rapid and low-cost, and circumvents the need for labelled primers or amplicon fractionation. * Using white lupin (Lupinus albus, x = 25) as a case study, HRM analysis was applied to identify 91 polymorphic markers from expressed sequence tag (EST)-derived and genomic libraries. Of these, 77 generated STS anchor points in the first fully resolved linkage map of the species. The map also included 230 amplified fragment length polymorphisms (AFLP) loci, spanned 1916 cM (84.2% coverage) and divided into the expected 25 linkage groups. * Quantitative trait loci (QTL) analyses performed on the population revealed genomic regions associated with several traits, including the agronomically important time to flowering (tf), alkaloid synthesis and stem height (Ph). Use of HRM-STS markers also allowed us to make direct comparisons between our map and that of the related crop, Lupinus angustifolius, based on the conversion of RFLP, microsatellite and single nucleotide polymorphism (SNP) markers into HRM markers.
Prediction of Protein Modification Sites of Pyrrolidone Carboxylic Acid Using mRMR Feature Selection and Analysis

PubMed Central

Zheng, Lu-Lu; Niu, Shen; Hao, Pei; Feng, KaiYan; Cai, Yu-Dong; Li, Yixue

2011-01-01

Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor to predict the modification sites of PCA based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features that belonged to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 features were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum of MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from PCA's surrounding sites contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of the PCA formation and guide relevant experimental validations. PMID:22174779
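The Matthews correlation coefficient (MCC) used above to score the predictor can be computed directly from a confusion matrix. The sketch below uses the standard MCC formula; the counts passed to it are hypothetical and do not reproduce the study's data.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: a perfect, an inverted, and a chance-level classifier.
print(mcc(50, 50, 0, 0), mcc(0, 0, 50, 50), mcc(25, 25, 25, 25))
```

In an incremental feature selection loop, a score like this would be recomputed each time the next mRMR-ranked feature is added, and the feature count giving the highest MCC is kept.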

Formation of cis-diamminedichloroplatinum(II) 1,2-intrastrand cross-links on DNA is flanking-sequence independent.

PubMed

Burstyn, J N; Heiger-Bernays, W J; Cohen, S M; Lippard, S J

2000-11-01

Mapping of cis-diamminedichloroplatinum(II) (cis-DDP, cisplatin) DNA adducts over >3000 nucleotides was carried out using a replication blockage assay. The sites of inhibition of modified T4 DNA polymerase, also referred to as stop sites, were analyzed to determine the effects of local sequence context on the distribution of intrastrand cisplatin cross-links. In a 3120 base fragment from replicative form M13mp18 DNA containing 24.6% guanine, 25.5% thymine, 26.9% adenine and 23.0% cytosine, 166 individual stop sites were observed at a bound platinum/nucleotide ratio of 1-2 per thousand. The majority of stop sites (90%) occurred at G(n>2) sequences and the remainder were located at sites containing an AG dinucleotide. For all of the GG sites present in the mapped sequences, including those with Gn(>)2, 89% blocked replication, whereas for the AG sites only 17% blocked replication. These blockage sites were independent of flanking nucleotides in a sequence of N(1)G*G*N(2) where N(1), N(2) = A, C, G, T and G*G* indicates a 1,2-intrastrand platinum cross-link. The absence of long-range sequence dependence was confirmed by monitoring the reaction of cisplatin with a plasmid containing an 800 bp insert of the human telomere repeat sequence (TTAGGG)(n). Platination reactions monitored at several formal platinum/nucleotide ratios or as a function of time reveal that the telomere insert was not preferentially damaged by cisplatin. Both replication blockage and telomere-insert plasmid platination experiments indicate that cisplatin 1,2-intrastrand adducts do not form preferentially at G-rich sequences in vitro.
Multiple splicing defects in an intronic false exon.

PubMed

Sun, H; Chasin, L A

2000-09-01

Splice site consensus sequences alone are insufficient to dictate the recognition of real constitutive splice sites within the typically large transcripts of higher eukaryotes, and large numbers of pseudoexons flanked by pseudosplice sites with good matches to the consensus sequences can be easily designated. In an attempt to identify elements that prevent pseudoexon splicing, we have systematically altered known splicing signals, as well as immediately adjacent flanking sequences, of an arbitrarily chosen pseudoexon from intron 1 of the human hprt gene. The substitution of a 5' splice site that perfectly matches the 5' consensus combined with mutation to match the CAG/G sequence of the 3' consensus failed to get this model pseudoexon included as the central exon in a dhfr minigene context. Provision of a real 3' splice site and a consensus 5' splice site and removal of an upstream inhibitory sequence were necessary and sufficient to confer splicing on the pseudoexon. This activated context also supported the splicing of a second pseudoexon sequence containing no apparent enhancer. Thus, both the 5' splice site sequence and the polypyrimidine tract of the pseudoexon are defective despite their good agreement with the consensus. On the other hand, the pseudoexon body did not exert a negative influence on splicing. The introduction into the pseudoexon of a sequence selected for binding to ASF/SF2 or its replacement with beta-globin exon 2 only partially reversed the effect of the upstream negative element and the defective polypyrimidine tract. These results support the idea that exon-bridging enhancers are not a prerequisite for constitutive exon definition and suggest that intrinsically defective splice sites and negative elements play important roles in distinguishing the real splicing signal from the vast number of false splicing signals.
DNA Barcodes of Asian Houbara Bustard (Chlamydotis undulata macqueenii)

PubMed Central

Arif, Ibrahim A.; Khan, Haseeb A.; Williams, Joseph B.; Shobrak, Mohammad; Arif, Waad I.

2012-01-01

Populations of Houbara Bustards have dramatically declined in recent years. Captive breeding and reintroduction programs have had limited success in reviving population numbers and thus new technological solutions involving molecular methods are essential for the long term survival of this species. In this study, we sequenced the 694 bp segment of COI gene of the four specimens of Asian Houbara Bustard (Chlamydotis undulata macqueenii). We also compared these sequences with earlier published barcodes of 11 individuals comprising different families of the orders Gruiformes, Ciconiiformes, Podicipediformes and Crocodylia (out group). The pair-wise sequence comparison showed a total of 254 variable sites across all the 15 sequences from different taxa. Three of the four specimens of Houbara Bustard had an identical sequence of COI gene and one individual showed a single nucleotide difference (G > A transition at position 83). Within the bustard family (Otididae), comparison among the three species (Asian Houbara Bustard, Great Bustard (Otis tarda) and the Little Bustard (Tetrax tetrax)), representing three different genera, showed 116 variable sites. For another family (Rallidae), the intra-family variable sites among the individuals of four different genera were found to be 146. The COI genetic distances among the 15 individuals varied from 0.000 to 0.431. Phylogenetic analysis using 619 bp nucleotide segment of COI clearly discriminated all the species representing different genera, families and orders. All the four specimens of Houbara Bustard formed a single clade and are clearly separated from other two individuals of the same family (Otis tarda and Tetrax tetrax). The nucleotide sequence of partial segment of COI gene effectively discriminated the closely related species. This is the first study reporting the barcodes of Houbara Bustard and would be helpful in future molecular studies, particularly for the conservation of this threatened bird in Saudi Arabia. 
PMID:22408462
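The pairwise comparisons reported above (variable sites, genetic distance, a G > A transition) can be sketched for two pre-aligned sequences. The sequences below are hypothetical, and the distance is the simple uncorrected proportion of differing sites (p-distance), which may differ from the substitution model used in the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_aligned(seq_a, seq_b):
    """Count transitions and transversions and compute p-distance for aligned DNA."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    differences = transitions + transversions
    return {
        "compared": compared,
        "transitions": transitions,
        "transversions": transversions,
        "p_distance": differences / compared if compared else None,
    }


# Hypothetical 8-base alignment: one G/A transition and one T/A transversion.
result = compare_aligned("ACGTACGT", "ACATACGA")
print(result)
```

Applying the function to every pair in an alignment yields the distance matrix from which a tree can be built.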
Prototype foamy virus envelope glycoprotein leader peptide processing is mediated by a furin-like cellular protease, but cleavage is not essential for viral infectivity.

PubMed

Duda, Anja; Stange, Annett; Lüftenegger, Daniel; Stanke, Nicole; Westphal, Dana; Pietschmann, Thomas; Eastman, Scott W; Linial, Maxine L; Rethwilm, Axel; Lindemann, Dirk

2004-12-01

Analogous to cellular glycoproteins, viral envelope proteins contain N-terminal signal sequences responsible for targeting them to the secretory pathway. The prototype foamy virus (PFV) envelope (Env) shows a highly unusual biosynthesis. Its precursor protein has a type III membrane topology with both the N and C terminus located in the cytoplasm. Coexpression of FV glycoprotein and interaction of its leader peptide (LP) with the viral capsid is essential for viral particle budding and egress. Processing of PFV Env into the particle-associated LP, surface (SU), and transmembrane (TM) subunits occurs posttranslationally during transport to the cell surface by yet-unidentified cellular proteases. Here we provide strong evidence that furin itself or a furin-like protease and not the signal peptidase complex is responsible for both processing events. N-terminal protein sequencing of the SU and TM subunits of purified PFV Env-immunoglobulin G immunoadhesin identified furin consensus sequences upstream of both cleavage sites. Mutagenesis analysis of two overlapping furin consensus sequences at the PFV LP/SU cleavage site in the wild-type protein confirmed the sequencing data and demonstrated utilization of only the first site. Fully processed SU was almost completely absent in viral particles of mutants having conserved arginine residues replaced by alanines in the first furin consensus sequence, but normal processing was observed upon mutation of the second motif. Although these mutants displayed a significant loss in infectivity as a result of reduced particle release, no correlation to processing inhibition was observed, since another mutant having normal LP/SU processing had a similar defect.
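Overlapping furin consensus sequences of the kind described above can be located with a lookahead regular expression. The sketch assumes the commonly cited consensus R-X-(K/R)-R and uses a hypothetical peptide, not the PFV Env sequence.

```python
import re

# Lookahead so that overlapping motifs are all reported.
FURIN_CONSENSUS = re.compile(r"(?=(R.[KR]R))")


def find_furin_motifs(protein):
    """Return (1-based start position, motif) for every R-X-(K/R)-R match."""
    return [(m.start() + 1, m.group(1)) for m in FURIN_CONSENSUS.finditer(protein.upper())]


# Hypothetical peptide containing two overlapping consensus motifs.
peptide = "AAARKRRKRSAA"
print(find_furin_motifs(peptide))
```

A sequence match only marks a candidate; as the abstract shows, mutagenesis is needed to establish which of two overlapping motifs is actually used.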
Albumin Redhill (-1 Arg, 320 Ala→Thr): a glycoprotein variant of human serum albumin whose precursor has an aberrant signal peptidase cleavage site.

PubMed

Brennan, S O; Myles, T; Peach, R J; Donaldson, D; George, P M

1990-01-01

Albumin Redhill is an electrophoretically slow genetic variant of human serum albumin that does not bind 63Ni2+ and has a molecular mass 2.5 kDa higher than normal albumin. Its inability to bind Ni2+ was explained by the finding of an additional residue of Arg at position -1. This did not explain the molecular basis of the genetic variation (since proalbumin contains adjacent Arg residues at -1 and -2) or the increase in apparent molecular mass. Fractionation of tryptic digests on concanavalin A-Sepharose followed by peptide mapping of the bound and unbound fractions and sequence analysis of the glycopeptides identified a mutation of 320 Ala→Thr. This introduces an Asn-Tyr-Thr oligosaccharide attachment sequence centered on Asn-318 and explains the increase in molecular mass. This, however, did not satisfactorily explain the presence of the additional Arg residue at position -1. DNA sequencing of polymerase chain reaction-amplified genomic DNA encoding the prepro sequence of albumin indicated an additional mutation of -2 Arg→Cys. This introduces a prepro sequence, Met-Lys-Trp-Val-Thr-Phe-Ile-Ser-Leu-Leu-Phe-Leu-Phe-Ser-Ser-Ala-Tyr-Ser-Arg-Gly-Val-Phe-Cys-Arg (cf. -Tyr-Ser-Arg-Gly-Val-Phe-Arg-Arg- in normal human pre-proalbumin). We propose that the new Phe-Cys-Arg sequence in the propeptide is an aberrant signal peptidase cleavage site and that the signal peptidase cleaves the propeptide of albumin Redhill in the lumen of the endoplasmic reticulum before it reaches the Golgi vesicles, the site of the diarginyl-specific proalbumin convertase.
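How a single Ala to Thr substitution creates an oligosaccharide attachment sequence can be shown with a short scan for the N-glycosylation sequon Asn-X-Ser/Thr (X not Pro). The five-residue peptides below are hypothetical: only the central Asn-Tyr-Ala and Asn-Tyr-Thr tripeptides follow the abstract, and the flanking residues are placeholders.

```python
import re

# N-X-S/T with X != P; lookahead reports overlapping sequons too.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein):
    """Return (1-based position of Asn, tripeptide) for each N-glycosylation sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


normal = "KNYAE"   # hypothetical context: Asn-Tyr-Ala, no sequon
variant = "KNYTE"  # same context after Ala -> Thr: Asn-Tyr-Thr
print(find_sequons(normal), find_sequons(variant))
```

The presence of a sequon is necessary but not sufficient for glycosylation, which is why the study confirmed the attached carbohydrate by lectin binding and peptide analysis.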
454 pyrosequencing analysis of bacterial diversity revealed by a comparative study of soils from mining subsidence and reclamation areas.

PubMed

Li, Yuanyuan; Chen, Longqian; Wen, Hongyu; Zhou, Tianjian; Zhang, Ting; Gao, Xiali

2014-03-28

Significant alteration in the microbial community can occur across reclamation areas suffering subsidence from mining. A reclamation site undergoing fertilization practices and an adjacent coal-excavated subsidence site (sites A and B, respectively) were examined to characterize the bacterial diversity using 454 high-throughput 16S rDNA sequencing. The dominant taxonomic groups in both the sites were Proteobacteria, Acidobacteria, Bacteroidetes, Betaproteobacteria, Actinobacteria, Gammaproteobacteria, Alphaproteobacteria, Deltaproteobacteria, Chloroflexi, and Firmicutes. However, the bacterial communities' abundance, diversity, and composition differed significantly between the sites. Site A presented higher bacterial diversity and more complex community structures than site B. The majority of sequences related to Proteobacteria, Gemmatimonadetes, Chloroflexi, Nitrospirae, Firmicutes, Betaproteobacteria, Deltaproteobacteria, and Anaerolineae were from site A; whereas those related to Actinobacteria, Planctomycetes, Bacteroidetes, Verrucomicrobia, Gammaproteobacteria, Nitriliruptoria, Alphaproteobacteria, and Phycisphaerae originated from site B. The distribution of some bacterial groups and subgroups in the two sites correlated with soil properties and vegetation due to reclamation practice. Site A exhibited enriched bacterial community, soil organic matter (SOM), and total nitrogen (TN), suggesting the presence of relatively diverse microorganisms. SOM and TN were important factors shaping the underlying microbial communities. Furthermore, the specific plant functional group (legumes) was also an important factor influencing soil microbial community composition. Thus, the effectiveness of 454 pyrosequencing in analyzing soil bacterial diversity was validated and an association between land ecological system restoration, mostly mediated by microbial communities, and an improvement in soil properties in coalmining reclamation areas was suggested.
A diverse family of serine proteinase genes expressed in cotton boll weevil (Anthonomus grandis): implications for the design of pest-resistant transgenic cotton plants.

PubMed

Oliveira-Neto, Osmundo B; Batista, João A N; Rigden, Daniel J; Fragoso, Rodrigo R; Silva, Rodrigo O; Gomes, Eliane A; Franco, Octávio L; Dias, Simoni C; Cordeiro, Célia M T; Monnerat, Rose G; Grossi-De-Sá, Maria F

2004-09-01

Fourteen different cDNA fragments encoding serine proteinases were isolated by reverse transcription-PCR from cotton boll weevil (Anthonomus grandis) larvae. A large diversity between the sequences was observed, with a mean pairwise identity of 22% in the amino acid sequence. The cDNAs encompassed 11 trypsin-like sequences classifiable into three families and three chymotrypsin-like sequences belonging to a single family. Using a combination of 5' and 3' RACE, the full-length sequence was obtained for five of the cDNAs, named Agser2, Agser5, Agser6, Agser10 and Agser21. The encoded proteins included amino acid sequence motifs of serine proteinase active sites, conserved cysteine residues, and both zymogen activation and signal peptides. Southern blotting analysis suggested that one or two copies of these serine proteinase genes exist in the A. grandis genome. Northern blotting analysis of Agser2 and Agser5 showed that for both genes, expression is induced upon feeding and is concentrated in the gut of larvae and adult insects. Reverse northern analysis of the 14 cDNA fragments showed that only two trypsin-like and two chymotrypsin-like were expressed at detectable levels. Under the effect of the serine proteinase inhibitors soybean Kunitz trypsin inhibitor and black-eyed pea trypsin/chymotrypsin inhibitor, expression of one of the trypsin-like sequences was upregulated while expression of the two chymotrypsin-like sequences was downregulated. Copyright 2004 Elsevier Ltd.
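The "mean pairwise identity" figure quoted in this abstract can be illustrated with a minimal sketch. The abstract does not say how identity was computed, so the definition used here (identical residues divided by columns where neither sequence has a gap) and the toy alignment are assumptions for illustration only.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def mean_pairwise_identity(aligned: list[str]) -> float:
    """Average identity over every unordered pair of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical three-sequence alignment, not data from the paper.
toy_alignment = ["IVGG-YTC", "IVGGSYNC", "IINGEDAC"]
print(round(mean_pairwise_identity(toy_alignment), 3))
```

Other conventions (for example dividing by the full alignment length or by the shorter sequence) would give different numbers, so a reported identity is only comparable when the denominator is stated.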
Virome Assembly and Annotation: A Surprise in the Namib Desert

PubMed Central

Hesse, Uljana; van Heusden, Peter; Kirby, Bronwyn M.; Olonade, Israel; van Zyl, Leonardo J.; Trindade, Marla

2017-01-01

Sequencing, assembly, and annotation of environmental virome samples is challenging. Methodological biases and differences in species abundance result in fragmentary read coverage; sequence reconstruction is further complicated by the mosaic nature of viral genomes. In this paper, we focus on biocomputational aspects of virome analysis, emphasizing latent pitfalls in sequence annotation. Using simulated viromes that mimic environmental data challenges we assessed the performance of five assemblers (CLC-Workbench, IDBA-UD, SPAdes, RayMeta, ABySS). Individual analyses of relevant scaffold length fractions revealed shortcomings of some programs in reconstruction of viral genomes with excessive read coverage (IDBA-UD, RayMeta), and in accurate assembly of scaffolds ≥50 kb (SPAdes, RayMeta, ABySS). The CLC-Workbench assembler performed best in terms of genome recovery (including highly covered genomes) and correct reconstruction of large scaffolds; and was used to assemble a virome from a copper rich site in the Namib Desert. We found that scaffold network analysis and cluster-specific read reassembly improved reconstruction of sequences with excessive read coverage, and that strict data filtering for non-viral sequences prior to downstream analyses was essential. In this study we describe novel viral genomes identified in the Namib Desert copper site virome. Taxonomic affiliations of diverse proteins in the dataset and phylogenetic analyses of circovirus-like proteins indicated links to the marine habitat. Considering additional evidence from this dataset we hypothesize that viruses may have been carried from the Atlantic Ocean into the Namib Desert by fog and wind, highlighting the impact of the extended environment on an investigated niche in metagenome studies. PMID:28167933
Targeted DNA sequencing and in situ mutation analysis using mobile phone microscopy

PubMed Central

Kühnemund, Malte; Wei, Qingshan; Darai, Evangelia; Wang, Yingjie; Hernández-Neuta, Iván; Yang, Zhao; Tseng, Derek; Ahlford, Annika; Mathot, Lucy; Sjöblom, Tobias; Ozcan, Aydogan; Nilsson, Mats

2017-01-01

Molecular diagnostics is typically outsourced to well-equipped centralized laboratories, often far from the patient. We developed molecular assays and portable optical imaging designs that permit on-site diagnostics with a cost-effective mobile-phone-based multimodal microscope. We demonstrate that targeted next-generation DNA sequencing reactions and in situ point mutation detection assays in preserved tumour samples can be imaged and analysed using mobile phone microscopy, achieving a new milestone for tele-medicine technologies. PMID:28094784
Multiplex sequence analysis demonstrates the competitive growth advantage of the A-to-G mutants of clarithromycin-resistant Helicobacter pylori.

PubMed

Wang, G; Rahman, M S; Humayun, M Z; Taylor, D E

1999-03-01

Clarithromycin resistance in Helicobacter pylori is due to point mutation within the 23S rRNA. We examined the growth rates of different types of site-directed mutants and demonstrated quantitatively the competitive growth advantage of A-to-G mutants over other types of mutants by a multiplex sequencing assay. The results provide a rational explanation of why A-to-G mutants are predominantly observed among clarithromycin-resistant clinical isolates.
Multiplex Sequence Analysis Demonstrates the Competitive Growth Advantage of the A-to-G Mutants of Clarithromycin-Resistant Helicobacter pylori

PubMed Central

Wang, Ge; Rahman, M. Sayeedur; Humayun, M. Zafri; Taylor, Diane E.

1999-01-01

Clarithromycin resistance in Helicobacter pylori is due to point mutation within the 23S rRNA. We examined the growth rates of different types of site-directed mutants and demonstrated quantitatively the competitive growth advantage of A-to-G mutants over other types of mutants by a multiplex sequencing assay. The results provide a rational explanation of why A-to-G mutants are predominantly observed among clarithromycin-resistant clinical isolates. PMID:10049289
ShortRead: a bioconductor package for input, quality assessment and exploration of high-throughput sequence data

PubMed Central

Morgan, Martin; Anders, Simon; Lawrence, Michael; Aboyoun, Patrick; Pagès, Hervé; Gentleman, Robert

2009-01-01

Summary: ShortRead is a package for input, quality assessment, manipulation and output of high-throughput sequencing data. ShortRead is provided in the R and Bioconductor environments, allowing ready access to additional facilities for advanced statistical analysis, data transformation, visualization and integration with diverse genomic resources. Availability and Implementation: This package is implemented in R and available at the Bioconductor web site; the package contains a ‘vignette’ outlining typical work flows. Contact: mtmorgan@fhcrc.org PMID:19654119
Ensembl 2004.

PubMed

Birney, E; Andrews, D; Bevan, P; Caccamo, M; Cameron, G; Chen, Y; Clarke, L; Coates, G; Cox, T; Cuff, J; Curwen, V; Cutts, T; Down, T; Durbin, R; Eyras, E; Fernandez-Suarez, X M; Gane, P; Gibbins, B; Gilbert, J; Hammond, M; Hotz, H; Iyer, V; Kahari, A; Jekosch, K; Kasprzyk, A; Keefe, D; Keenan, S; Lehvaslaiho, H; McVicker, G; Melsopp, C; Meidl, P; Mongin, E; Pettett, R; Potter, S; Proctor, G; Rae, M; Searle, S; Slater, G; Smedley, D; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Storey, R; Ureta-Vidal, A; Woodwark, C; Clamp, M; Hubbard, T

2004-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities of the system range from sequence analysis to data storage and visualization and installations exist around the world both in companies and at academic sites. With a total of nine genome sequences available from Ensembl and more genomes to follow, recent developments have focused mainly on closer integration between genomes and external data.
A DNA sequence element that advances replication origin activation time in Saccharomyces cerevisiae.

PubMed

Pohl, Thomas J; Kolor, Katherine; Fangman, Walton L; Brewer, Bonita J; Raghuraman, M K

2013-11-06

Eukaryotic origins of DNA replication undergo activation at various times in S-phase, allowing the genome to be duplicated in a temporally staggered fashion. In the budding yeast Saccharomyces cerevisiae, the activation times of individual origins are not intrinsic to those origins but are instead governed by surrounding sequences. Currently, there are two examples of DNA sequences that are known to advance origin activation time, centromeres and forkhead transcription factor binding sites. By combining deletion and linker scanning mutational analysis with two-dimensional gel electrophoresis to measure fork direction in the context of a two-origin plasmid, we have identified and characterized a 19- to 23-bp and a larger 584-bp DNA sequence that are capable of advancing origin activation time.
High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA.

PubMed

Chandrananda, Dineika; Thorne, Natalie P; Bahlo, Melanie

2015-06-17

High-throughput sequencing of cell-free DNA fragments found in human plasma has been used to non-invasively detect fetal aneuploidy, monitor organ transplants and investigate tumor DNA. However, many biological properties of this extracellular genetic material remain unknown. Research that further characterizes circulating DNA could substantially increase its diagnostic value by allowing the application of more sophisticated bioinformatics tools that lead to an improved signal-to-noise ratio in the sequencing data. In this study, we investigate various features of cell-free DNA in plasma using deep-sequencing data from two pregnant women (>70X, >50X) and compare them with matched cellular DNA. We utilize a descriptive approach to examine how the biological cleavage of cell-free DNA affects different sequence signatures such as fragment lengths, sequence motifs at fragment ends and the distribution of cleavage sites along the genome. We show that the size distributions of these cell-free DNA molecules are dependent on their autosomal and mitochondrial origin as well as the genomic location within chromosomes. DNA mapping to particular microsatellites and alpha repeat elements displays unique size signatures. By correlating the mapping locations of these fragments with ENCODE annotation of chromatin organization, we show that cell-free fragments occur in clusters along the genome, localize to nucleosomal arrays and are preferentially cleaved at linker regions. Our work further demonstrates that cell-free autosomal DNA cleavage is sequence dependent. The region spanning up to 10 positions on either side of the DNA cleavage site shows a consistent pattern of preference for specific nucleotides. This sequence motif is present in cleavage sites localized to nucleosomal cores and linker regions but is absent in nucleosome-free mitochondrial DNA. These background signals in cell-free DNA sequencing data stem from the non-random biological cleavage of these fragments. This sequence structure can be harnessed to improve bioinformatics algorithms, in particular for CNV and structural variant detection. Descriptive measures for cell-free DNA features developed here could also be used in biomarker analysis to monitor the changes that occur during different pathological conditions.
Novel Immune Modulating Cellular Vaccine for Prostate Cancer

DTIC Science & Technology

2014-10-01

restriction sites. Murine PSMA: The cDNA encoding mPSMA was purchased from Sino Biologicals and was cloned into the HindIII and BamHI sites of pSP73-Sph/A64...sequence) and reverse primer 5’-TATATAGAGCTCTCAGATGTTCCGATACACATCTC-3’ Murine PSMA no signal sequence (mPSMA-SS): Murine PSMA minus the signal sequence...contains a HindIII site for cloning and utilizes an ATG that lies downstream of the signal sequence as the start codon in PSMA-SS (PSMA without signal
The Bacterial Response Regulator ArcA Uses a Diverse Binding Site Architecture to Regulate Carbon Oxidation Globally

PubMed Central

Park, Dan M.; Akhtar, Md. Sohail; Ansari, Aseem Z.; Landick, Robert; Kiley, Patricia J.

2013-01-01

Despite the importance of maintaining redox homeostasis for cellular viability, how cells control redox balance globally is poorly understood. Here we provide new mechanistic insight into how the balance between reduced and oxidized electron carriers is regulated at the level of gene expression by mapping the regulon of the response regulator ArcA from Escherichia coli, which responds to the quinone/quinol redox couple via its membrane-bound sensor kinase, ArcB. Our genome-wide analysis reveals that ArcA reprograms metabolism under anaerobic conditions such that carbon oxidation pathways that recycle redox carriers via respiration are transcriptionally repressed by ArcA. We propose that this strategy favors use of catabolic pathways that recycle redox carriers via fermentation akin to lactate production in mammalian cells. Unexpectedly, bioinformatic analysis of the sequences bound by ArcA in ChIP-seq revealed that most ArcA binding sites contain additional direct repeat elements beyond the two required for binding an ArcA dimer. DNase I footprinting assays suggest that non-canonical arrangements of cis-regulatory modules dictate both the length and concentration-sensitive occupancy of DNA sites. We propose that this plasticity in ArcA binding site architecture provides both an efficient means of encoding binding sites for ArcA, σ70-RNAP and perhaps other transcription factors within the same narrow sequence space and an effective mechanism for global control of carbon metabolism to maintain redox homeostasis. PMID:24146625
Transcriptional activation of transforming growth factor alpha by estradiol: requirement for both a GC-rich site and an estrogen response element half-site.

PubMed

Vyhlidal, C; Samudio, I; Kladde, M P; Safe, S

2000-06-01

17beta-Estradiol (E2) induces transforming growth factor alpha (TGFalpha) gene expression in MCF-7 cells and previous studies have identified a 53 bp (-252 to -200) sequence containing two imperfect estrogen responsive elements (EREs) that contribute to E2 responsiveness. Deletion analysis of the TGFalpha gene promoter in this study identified a second upstream region of the promoter (-623 to -549) that is also E2 responsive. This sequence contains three GC-rich sites and an imperfect ERE half-site, and the specific cis-elements and trans-acting factors were determined by promoter analysis in transient transfection experiments, gel mobility shift assays and in vitro DNA footprinting. The results are consistent with an estrogen receptor alpha (ERalpha)/Sp1 complex interacting with an Sp1(N)(30) ERE half-site ((1/2)) motif in which both ERalpha and Sp1 bind promoter DNA. The ER/Sp1-DNA complex is formed using nuclear extracts from MCF-7 cells but not with recombinant human ERalpha or Sp1 proteins, suggesting that other nuclear factor(s) are required for complex stabilization. The E2-responsive Sp1(N)(x)ERE(1/2) motif identified in the TGFalpha gene promoter has also been characterized in the cathepsin D and heat shock protein 27 gene promoters; however, in the latter two promoters the numbers of intervening nucleotides are 23 and 10 respectively.
Sediment Enzyme Activities and Microbial Community Diversity in an Oligotrophic Drinking Water Reservoir, Eastern China

PubMed Central

Zhang, Haihan; Huang, Tinglin; Liu, Tingting

2013-01-01

Drinking water reservoirs play a vital role in the security of urban water supply, yet little is known about the microbial community diversity harbored in the sediment of these oligotrophic freshwater ecosystems. In the present study, integrating community level physiological profiles (CLPPs), nested polymerase chain reaction (PCR)-denaturing gradient gel electrophoresis (DGGE) and clone sequencing technologies, we examined the sediment urease and protease activities, bacterial community functional diversity, and genetic diversity of bacterial and fungal communities in sediments from six sampling sites of Zhou cun drinking water reservoir, eastern China. The results showed that sediment urease activity differed markedly among the sites, ranging from 2.48 to 11.81 mg NH3-N/(g·24h). The highest average well color development (AWCD) was found at site C, indicating the highest metabolic activity of the heterotrophic bacterial community. Principal component analysis (PCA) revealed large differences in the functional (metabolic) diversity patterns of the sediment bacterial communities from different sites. Meanwhile, DGGE fingerprints also indicated spatial changes in the genetic diversity of sediment bacterial and fungal communities. The sequence BLAST analysis of all the sediment samples found that Comamonas sp. was the dominant bacterial species harbored in site A. Alternaria alternata, Allomyces macrogynus and Rhizophydium sp. were the most commonly detected fungal species in sediments of the Zhou cun drinking water reservoir. The results from this work provide new insights into the heterogeneity of sediment microbial community metabolic activity and genetic diversity in the oligotrophic drinking water reservoir. PMID:24205265
Structural and functional analysis of an enhancer GPEI having a phorbol 12-O-tetradecanoate 13-acetate responsive element-like sequence found in the rat glutathione transferase P gene.

PubMed

Okuda, A; Imagawa, M; Maeda, Y; Sakai, M; Muramatsu, M

1989-10-05

We have recently identified a typical enhancer, termed GPEI, located about 2.5 kilobases upstream from the transcription initiation site of the rat glutathione transferase P gene. Analyses of 5' and 3' deletion mutants revealed that the cis-acting sequence of GPEI contains a phorbol 12-O-tetradecanoate 13-acetate responsive element (TRE)-like sequence. For maximal activity, however, GPEI required an adjacent upstream sequence of about 19 base pairs in addition to the TRE-like sequence. With the DNA binding gel-shift assay, we could detect protein(s) that specifically bind to the TRE-like sequence of the GPEI fragment, possibly the c-jun/c-fos complex or a similar protein complex. The sequence immediately upstream of the TRE-like sequence did not have any activity by itself, but augmented the activity of the TRE-like sequence about 5-fold.

Correlation of fitness landscapes from three orthologous TIM barrels originates from sequence and structure constraints

PubMed Central

Chan, Yvonne H.; Venev, Sergey V.; Zeldovich, Konstantin B.; Matthews, C. Robert

2017-01-01

Sequence divergence of orthologous proteins enables adaptation to environmental stresses and promotes evolution of novel functions. Limits on evolution imposed by constraints on sequence and structure were explored using a model TIM barrel protein, indole-3-glycerol phosphate synthase (IGPS). Fitness effects of point mutations in three phylogenetically divergent IGPS proteins during adaptation to temperature stress were probed by auxotrophic complementation of yeast with prokaryotic, thermophilic IGPS. Analysis of beneficial mutations pointed to an unexpected, long-range allosteric pathway towards the active site of the protein. Significant correlations between the fitness landscapes of distant orthologues implicate both sequence and structure as primary forces in defining the TIM barrel fitness landscape and suggest that fitness landscapes can be translocated in sequence space. Exploration of fitness landscapes in the context of a protein fold provides a strategy for elucidating the sequence-structure-fitness relationships in other common motifs. PMID:28262665
[Topographic mapping of retinal function with a scanning laser ophthalmoscope and multifocal electroretinography using short M-sequences].

PubMed

Rudolph, G; Bechmann, M; Berninger, T; Kutschbach, E; Held, U; Tornow, R P; Kalpadakis, P; Zol'nikova, I V; Shamshinova, A M

2001-01-01

A new method of multifocal electroretinography is proposed that uses a scanning laser ophthalmoscope with a wavelength of 630 nm (SLO-m-ERG) to project short spatial visual stimuli onto the retina. The algorithm for presenting the visual stimuli and analyzing the distribution of local electroretinograms over the surface of the retina is based on short m-sequences. Mathematical cross-correlation analysis yields a three-dimensional distribution of bioelectrical activity of the retina in the central visual field. In normal subjects, cone bioelectrical activity is maximal in the macular area (corresponding to the density of cone distribution) and absent in the blind spot. The method detects the slightest pathological changes in the retina while the site of stimulation and the ophthalmoscopic picture of the fundus oculi are monitored. The site of the pathological process correlates with the topography of changes in bioelectrical activity of the examined retinal area in diseases of the macular area and in retinitis pigmentosa detectable by ophthalmoscopy.
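As background to the m-sequence stimulation described in this abstract, the sketch below generates one period of a binary m-sequence with a linear-feedback shift register and checks its circular autocorrelation. The register length (4 bits) and the tap positions (4, 3) are illustrative choices, not parameters taken from the paper; real multifocal ERG systems typically use much longer sequences.

```python
def m_sequence(taps: list[int], nbits: int) -> list[int]:
    """One period (2**nbits - 1 samples) of an LFSR sequence, mapped to +1/-1.

    `taps` are 1-indexed register positions XORed to form the feedback bit.
    The result is a maximal-length sequence only if the taps correspond to a
    primitive polynomial; (4, 3) with nbits=4 is one such choice.
    """
    state = [1] * nbits  # any non-zero seed works
    bits = []
    for _ in range(2 ** nbits - 1):
        bits.append(state[-1])           # output the last register cell
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]  # shift right, insert feedback
    return [1 if b else -1 for b in bits]


def circular_autocorrelation(seq: list[int], shift: int) -> int:
    """Sum of products of the sequence with a cyclically shifted copy."""
    n = len(seq)
    return sum(seq[i] * seq[(i + shift) % n] for i in range(n))


seq = m_sequence(taps=[4, 3], nbits=4)
print(len(seq), [circular_autocorrelation(seq, k) for k in range(4)])
```

The autocorrelation equals the sequence length at zero shift and -1 at every other shift, which is why cross-correlating a summed recording with shifted copies of the same sequence can separate the responses of individual retinal areas.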
Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA.

PubMed

Guo, Shicheng; Diep, Dinh; Plongthongkum, Nongluk; Fung, Ho-Lim; Zhang, Kang; Zhang, Kun

2017-04-01

Adjacent CpG sites in mammalian genomes can be co-methylated owing to the processivity of methyltransferases or demethylases, yet discordant methylation patterns have also been observed, which are related to stochastic or uncoordinated molecular processes. We focused on a systematic search and investigation of regions in the full human genome that show highly coordinated methylation. We defined 147,888 blocks of tightly coupled CpG sites, called methylation haplotype blocks, after analysis of 61 whole-genome bisulfite sequencing data sets and validation with 101 reduced-representation bisulfite sequencing data sets and 637 methylation array data sets. Using a metric called methylation haplotype load, we performed tissue-specific methylation analysis at the block level. Subsets of informative blocks were further identified for deconvolution of heterogeneous samples. Finally, using methylation haplotypes we demonstrated quantitative estimation of tumor load and tissue-of-origin mapping in the circulating cell-free DNA of 59 patients with lung or colorectal cancer.
AT-rich sequence elements promote nascent transcript cleavage leading to RNA polymerase II termination

PubMed Central

White, Eleanor; Kamieniarz-Gdula, Kinga; Dye, Michael J.; Proudfoot, Nick J.

2013-01-01

RNA Polymerase II (Pol II) termination is dependent on RNA processing signals as well as specific terminator elements located downstream of the poly(A) site. One of the two major terminator classes described so far is the Co-Transcriptional Cleavage (CoTC) element. We show that homopolymer A/T tracts within the human β-globin CoTC-mediated terminator element play a critical role in Pol II termination. These short A/T tracts, dispersed within seemingly random sequences, are strong terminator elements, and bioinformatics analysis confirms the presence of such sequences in 70% of the putative terminator regions (PTRs) genome-wide. PMID:23258704
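A scan for homopolymer A/T tracts of the kind described in this abstract might look like the following minimal sketch. The minimum tract length of four bases and the example sequence are assumptions made for illustration; the abstract does not state the threshold used in the genome-wide analysis.

```python
import re


def homopolymer_at_tracts(seq: str, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (0-based start, tract) for every run of A's or of T's >= min_len."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


# Hypothetical sequence, not taken from the beta-globin terminator.
example = "GCAAAAAGCTTTTGCATAT"
print(homopolymer_at_tracts(example))
```

Mixed stretches such as "ATAT" are deliberately not reported, because the pattern matches runs of a single base only.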
Prediction of active sites of enzymes by maximum relevance minimum redundancy (mRMR) feature selection.

PubMed

Gao, Yu-Fei; Li, Bi-Qing; Cai, Yu-Dong; Feng, Kai-Yan; Li, Zhan-Dong; Jiang, Yang

2013-01-27

Identification of catalytic residues plays a key role in understanding how enzymes work. Although numerous computational methods have been developed to predict catalytic residues and active sites, the prediction accuracy remains relatively low with high false positives. In this work, we developed a novel predictor based on the Random Forest algorithm (RF) aided by the maximum relevance minimum redundancy (mRMR) method and incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility to predict active sites of enzymes and achieved an overall accuracy of 0.885687 and MCC of 0.689226 on an independent test dataset. Feature analysis showed that every category of the features except disorder contributed to the identification of active sites. It was also shown via the site-specific feature analysis that the features derived from the active site itself contributed most to the active site determination. Our prediction method may become a useful tool for identifying the active sites and the key features identified by the paper may provide valuable insights into the mechanism of catalysis.
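The two performance figures quoted in this abstract, overall accuracy and the Matthews correlation coefficient (MCC), are both derived from a confusion matrix. The sketch below shows the standard definitions; the counts are invented for illustration and are not the paper's test-set results.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: 40 true positives, 45 true negatives,
# 5 false positives, 10 false negatives.
print(accuracy(40, 45, 5, 10), round(mcc(40, 45, 5, 10), 3))
```

MCC is usually reported alongside accuracy for active-site prediction because catalytic residues are a small minority class, and accuracy alone can look high for a predictor that rarely calls any residue catalytic.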
Novel splice mutation in microthalmia-associated transcription factor in Waardenburg Syndrome.

PubMed

Brenner, Laura; Burke, Kelly; Leduc, Charles A; Guha, Saurav; Guo, Jiancheng; Chung, Wendy K

2011-01-01

Waardenburg Syndrome (WS) is a syndromic form of hearing loss associated with mutations in six different genes. We identified a large family with WS that had previously undergone clinical testing, with no reported pathogenic mutation. Using linkage analysis, a region on 3p14.1 with an LOD score of 6.6 was identified. Microthalmia-Associated Transcription Factor, a gene known to cause WS, is located within this region of linkage. Sequencing of Microthalmia-Associated Transcription Factor demonstrated a c.1212 G>A synonymous variant that segregated with the WS in the family and was predicted to cause a novel splicing site that was confirmed with expression analysis of the mRNA. This case illustrates the need to computationally analyze novel synonymous sequence variants for possible effects on splicing to maximize the clinical sensitivity of sequence-based genetic testing.
Isolation and characterization of target sequences of the chicken CdxA homeobox gene.

PubMed Central

Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A

1993-01-01

The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The DNA fragments isolated were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT rich, which is a common feature of homeodomain binding sites. By electrophoretic mobility shift assays it was shown that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the isolated genomic fragments were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA-expressing cells. This study determined the DNA sequence specificity of the CDXA protein and also shows that this protein can activate transcription in cells in culture. PMID:7909943
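The consensus reported in this abstract (A, A/T, T, A/T, A, T, A/G) can be written as a seven-position pattern and searched for on both strands. The following is a minimal sketch of such a scan; the function names and the example sequence are hypothetical, and an exact-match search like this ignores the affinity differences the authors observed between sites.

```python
import re

# Consensus A, A/T, T, A/T, A, T, A/G as a regular expression.
# The lookahead makes overlapping matches visible.
CONSENSUS = re.compile(r"(?=(A[AT]T[AT]AT[AG]))")
SITE_LEN = 7
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str) -> list[tuple[int, str, str]]:
    """Return (start on the given strand, strand, matched site), sorted by start."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in CONSENSUS.finditer(seq)]
    for m in CONSENSUS.finditer(revcomp(seq)):
        # Convert the reverse-strand coordinate back to the forward strand.
        start = len(seq) - (m.start() + SITE_LEN)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)


print(find_sites("GGATTTATGCC"))
```

Reporting minus-strand hits in forward-strand coordinates keeps the output directly comparable with footprinting positions given along one strand.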
OSIRIS-REx Touch-And-Go (TAG) Navigation Performance

NASA Technical Reports Server (NTRS)

Berry, Kevin; Antreasian, Peter; Moreau, Michael C.; May, Alex; Sutter, Brian

2015-01-01

The Origins Spectral Interpretation Resource Identification Security Regolith Explorer (OSIRIS-REx) mission is a NASA New Frontiers mission launching in 2016 to rendezvous with the near-Earth asteroid (101955) Bennu in late 2018. Following an extensive campaign of proximity operations activities to characterize the properties of Bennu and select a suitable sample site, OSIRIS-REx will fly a Touch-And-Go (TAG) trajectory to the asteroid's surface to obtain a regolith sample. The paper summarizes the mission design of the TAG sequence, the propulsive maneuvers required to achieve the trajectory, and the sequence of events leading up to the TAG event. The paper will summarize the Monte-Carlo simulation of the TAG sequence and present analysis results that demonstrate the ability to conduct the TAG within 25 meters of the selected sample site and ±2 cm/s of the targeted contact velocity. The paper will describe some of the challenges associated with conducting precision navigation operations and ultimately contacting a very small asteroid.
Non-Coding RNA Analysis Using the Rfam Database.

PubMed

Kalvari, Ioanna; Nawrocki, Eric P; Argasinska, Joanna; Quinones-Olvera, Natalia; Finn, Robert D; Bateman, Alex; Petrov, Anton I

2018-06-01

Rfam is a database of non-coding RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. Using a combination of manual and literature-based curation and a custom software pipeline, Rfam converts descriptions of RNA families found in the scientific literature into computational models that can be used to annotate RNAs belonging to those families in any DNA or RNA sequence. Valuable research outputs that are often locked up in figures and supplementary information files are encapsulated in Rfam entries and made accessible through the Rfam Web site. The data produced by Rfam have a broad application, from genome annotation to providing training sets for algorithm development. This article gives an overview of how to search and navigate the Rfam Web site, and how to annotate sequences with RNA families. The Rfam database is freely available at http://rfam.org. © 2018 by John Wiley & Sons, Inc.
OSIRIS-REx Touch and Go (TAG) Navigation Performance

NASA Technical Reports Server (NTRS)

Berry, Kevin; Antreasian, Peter; Moreau, Michael C.; May, Alex; Sutter, Brian

2015-01-01

The Origins Spectral Interpretation Resource Identification Security Regolith Explorer (OSIRIS-REx) mission is a NASA New Frontiers mission launching in 2016 to rendezvous with the near-Earth asteroid (101955) Bennu in late 2018. Following an extensive campaign of proximity operations activities to characterize the properties of Bennu and select a suitable sample site, OSIRIS-REx will fly a Touch-And-Go (TAG) trajectory to the asteroid's surface to obtain a regolith sample. The paper summarizes the mission design of the TAG sequence, the propulsive maneuvers required to achieve the trajectory, and the sequence of events leading up to the TAG event. The paper also summarizes the Monte-Carlo simulation of the TAG sequence and presents analysis results that demonstrate the ability to conduct the TAG within 25 meters of the selected sample site and 2 cm/s of the targeted contact velocity. The paper describes some of the challenges associated with conducting precision navigation operations and ultimately contacting a very small asteroid.
A large inversion in the linear chromosome of Streptomyces griseus caused by replicative transposition of a new Tn3 family transposon.

PubMed

Murata, M; Uchida, T; Yang, Y; Lezhava, A; Kinashi, H

2011-04-01

We have comprehensively analyzed the linear chromosomes of Streptomyces griseus mutants constructed and kept in our laboratory. During this study, macrorestriction analysis of AseI and DraI fragments of mutant 402-2 suggested a large chromosomal inversion. The junctions of chromosomal inversion were cloned and sequenced and compared with the corresponding target sequences in the parent strain 2247. Consequently, a transposon-involved mechanism was revealed. Namely, a transposon originally located at the left target site was replicatively transposed to the right target site in an inverted direction, which generated a second copy and at the same time caused a 2.5-Mb chromosomal inversion. The involved transposon named TnSGR was grouped into a new subfamily of the resolvase-encoding Tn3 family transposons based on its gene organization. At the end, terminal diversity of S. griseus chromosomes is discussed by comparing the sequences of strains 2247 and IFO13350.
An analysis by metabolic labelling of the encephalomyocarditis virus ribosomal frameshifting efficiency and stimulators.

PubMed

Ling, Roger; Firth, Andrew E

2017-08-01

Programmed -1 ribosomal frameshifting is a mechanism of gene expression whereby specific signals within messenger RNAs direct a proportion of ribosomes to shift -1 nt and continue translating in the new reading frame. Such frameshifting normally depends on an RNA structure stimulator 3'-adjacent to a 'slippery' heptanucleotide shift site sequence. Recently we identified an unusual frameshifting mechanism in encephalomyocarditis virus, where the stimulator involves a trans-acting virus protein. Thus, in contrast to other examples of -1 frameshifting, the efficiency of frameshifting in encephalomyocarditis virus is best studied in the context of virus infection. Here we use metabolic labelling to analyse the frameshifting efficiency of wild-type and mutant viruses. Confirming previous results, frameshifting depends on a G_GUU_UUU shift site sequence and a 3'-adjacent stem-loop structure, but is not appreciably affected by the 'StopGo' sequence present ~30 nt upstream. At late timepoints, frameshifting was estimated to be 46-76 % efficient.
Molecular Characterization of Transgenic Events Using Next Generation Sequencing Approach.

PubMed

Guttikonda, Satish K; Marri, Pradeep; Mammadov, Jafar; Ye, Liang; Soe, Khaing; Richey, Kimberly; Cruse, James; Zhuang, Meibao; Gao, Zhifang; Evans, Clive; Rounsley, Steve; Kumpatla, Siva P

2016-01-01

Demand for the commercial use of genetically modified (GM) crops has been increasing in light of the projected growth of world population to nine billion by 2050. A prerequisite of paramount importance for regulatory submissions is the rigorous safety assessment of GM crops. One of the components of safety assessment is molecular characterization at DNA level which helps to determine the copy number, integrity and stability of a transgene; characterize the integration site within a host genome; and confirm the absence of vector DNA. Historically, molecular characterization has been carried out using Southern blot analysis coupled with Sanger sequencing. While this is a robust approach to characterize the transgenic crops, it is both time- and resource-consuming. The emergence of next-generation sequencing (NGS) technologies has provided highly sensitive and cost- and labor-effective alternative for molecular characterization compared to traditional Southern blot analysis. Herein, we have demonstrated the successful application of both whole genome sequencing and target capture sequencing approaches for the characterization of single and stacked transgenic events and compared the results and inferences with traditional method with respect to key criteria required for regulatory submissions.
Stress-induced rearrangement of Fusarium retrotransposon sequences.

PubMed

Anaya, N; Roncero, M I

1996-11-27

Rearrangement of the Fusarium oxysporum retrotransposon skippy was induced by growth in the presence of potassium chlorate. Three fungal strains, one sensitive to chlorate (Co60) and two resistant to chlorate and deficient for nitrate reductase (Co65 and Co94), were studied by Southern analysis of their genomic DNA. Polymorphism was detected in their hybridization banding pattern, relative to the wild type grown in the absence of chlorate, using various enzymes with or without restriction sites within the retrotransposon. Results were consistent with the assumption that three different events had occurred in strain Co60: genomic amplification of skippy yielding tandem arrays of the element, generation of new skippy sequences, and deletion of skippy sequences. Amplification of Co60 genomic DNA using the polymerase chain reaction and divergent primers derived from the retrotransposon generated a new band, corresponding to one long terminal repeat plus flanking sequences, that was not present in the wild-type strain. Molecular analysis of nitrate reductase-deficient mutants showed that generation and deletion of skippy sequences, but not genomic amplification in tandem repeats, had occurred in their genomes.
Transcriptome dynamics through alternative polyadenylation in developmental and environmental responses in plants revealed by deep sequencing

PubMed Central

Shen, Yingjia; Venu, R.C.; Nobuta, Kan; Wu, Xiaohui; Notibala, Varun; Demirci, Caghan; Meyers, Blake C.; Wang, Guo-Liang; Ji, Guoli; Li, Qingshun Q.

2011-01-01

Polyadenylation sites mark the ends of mRNA transcripts. Alternative polyadenylation (APA) may alter sequence elements and/or the coding capacity of transcripts, a mechanism that has been demonstrated to regulate gene expression and transcriptome diversity. To study the role of APA in transcriptome dynamics, we analyzed a large-scale data set of RNA “tags” that signify poly(A) sites and expression levels of mRNA. These tags were derived from a wide range of tissues and developmental stages that were mutated or exposed to environmental treatments, and generated using digital gene expression (DGE)–based protocols of the massively parallel signature sequencing (MPSS-DGE) and the Illumina sequencing-by-synthesis (SBS-DGE) sequencing platforms. The data offer a global view of APA and how it contributes to transcriptome dynamics. Upon analysis of these data, we found that ∼60% of Arabidopsis genes have multiple poly(A) sites. Likewise, ∼47% and 82% of rice genes use APA, supported by MPSS-DGE and SBS-DGE tags, respectively. In both species, ∼49%–66% of APA events were mapped upstream of annotated stop codons. Interestingly, 10% of the transcriptomes are made up of APA transcripts that are differentially distributed among developmental stages and in tissues responding to environmental stresses, providing an additional level of transcriptome dynamics. Examples of pollen-specific APA switching and salicylic acid treatment-specific APA clearly demonstrated such dynamics. The significance of these APAs is more evident in the 3034 genes that have conserved APA events between rice and Arabidopsis. PMID:21813626
Modeling positional effects of regulatory sequences with spline transformations increases prediction accuracy of deep neural networks

PubMed Central

Avsec, Žiga; Cheng, Jun; Gagneur, Julien

2018-01-01

Motivation: Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results: Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Availability and implementation: Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at https://github.com/gagneurlab/Manuscript_Avsec_Bioinformatics_2017. Contact: avsec@in.tum.de or gagneur@in.tum.de. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29155928
Promoter mapping of the mouse Tcp-10bt gene in transgenic mice identifies essential male germ cell regulatory sequences.

PubMed

Ewulonu, U K; Snyder, L; Silver, L M; Schimenti, J C

1996-03-01

Transgenic mice were generated to localize essential promoter elements in the mouse testis-expressed Tcp-10 genes. These genes are expressed exclusively in male germ cells, and exhibit a diffuse range of transcriptional start sites, possibly due to the absence of a TATA box. A series of transgene constructs containing different amounts of 5' flanking DNA revealed that all sequences necessary for appropriate temporal and tissue-specific transcription of Tcp-10 reside between positions -1 to -973. All transgenic animals containing these sequences expressed a chimeric transgene at high levels, in a pattern that paralleled the endogenous genes. These experiments further defined a 227 bp fragment from -746 to -973 that was absolutely essential for expression. In a gel-shift assay, this 227-bp fragment bound nuclear protein from testis, but not other tissues, to yield two retarded bands. Sequence analysis of this fragment revealed a half-site for the AP-2 transcription factor recognition sequence. Gel shift assays using native or mutant oligonucleotides demonstrated that the putative AP-2 recognition sequence was essential for generating the retarded bands. Since the binding activity is testis-specific, but AP-2 expression is not exclusive to male germ cells, it is possible that transcription of Tcp-10 requires interaction between AP-2 and a germ cell-specific transcription factor.
Genomic structure and paralogous regions of the inversion breakpoint occurring between human chromosome 3p12.3 and orangutan chromosome 2.

PubMed

Yue, Y; Grossmann, B; Tsend-Ayush, E; Grützner, F; Ferguson-Smith, M A; Yang, F; Haaf, T

2005-01-01

Intrachromosomal duplications play a significant role in human genome pathology and evolution. To better understand the molecular basis of evolutionary chromosome rearrangements, we performed molecular cytogenetic and sequence analyses of the breakpoint region that distinguishes human chromosome 3p12.3 and orangutan chromosome 2. FISH with region-specific BAC clones demonstrated that the breakpoint-flanking sequences are duplicated intrachromosomally on orangutan 2 and human 3q21 as well as at many pericentromeric and subtelomeric sites throughout the genomes. Breakage and rearrangement of the human 3p12.3-homologous region in the orangutan lineage were associated with a partial loss of duplicated sequences in the breakpoint region. Consistent with our FISH mapping results, computational analysis of the human chromosome 3 genomic sequence revealed three 3p12.3-paralogous sequence blocks on human chromosome 3q21 and smaller blocks on the short arm end 3p26→p25. This is consistent with the view that sequences from an ancestral site at 3q21 were duplicated at 3p12.3 in a common ancestor of orangutan and humans. Our results show that evolutionary chromosome rearrangements are associated with microduplications and microdeletions, contributing to the DNA differences between closely related species. Copyright (c) 2005 S. Karger AG, Basel.
Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq)

PubMed Central

Langley, Alexander R.; Gräf, Stefan; Smith, James C.; Krude, Torsten

2016-01-01

Next-generation sequencing has enabled the genome-wide identification of human DNA replication origins. However, different approaches to mapping replication origins, namely (i) sequencing isolated small nascent DNA strands (SNS-seq); (ii) sequencing replication bubbles (bubble-seq) and (iii) sequencing Okazaki fragments (OK-seq), show only limited concordance. To address this controversy, we describe here an independent high-resolution origin mapping technique that we call initiation site sequencing (ini-seq). In this approach, newly replicated DNA is directly labelled with digoxigenin-dUTP near the sites of its initiation in a cell-free system. The labelled DNA is then immunoprecipitated and genomic locations are determined by DNA sequencing. Using this technique we identify >25,000 discrete origin sites at sub-kilobase resolution on the human genome, with high concordance between biological replicates. Most activated origins identified by ini-seq are found at transcriptional start sites and contain G-quadruplex (G4) motifs. They tend to cluster in early-replicating domains, providing a correlation between early replication timing and local density of activated origins. Origins identified by ini-seq show highest concordance with sites identified by SNS-seq, followed by OK-seq and bubble-seq. Furthermore, germline origins identified by positive nucleotide distribution skew jumps overlap with origins identified by ini-seq and OK-seq more frequently and more specifically than do sites identified by either SNS-seq or bubble-seq. PMID:27587586
Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq).

PubMed

Langley, Alexander R; Gräf, Stefan; Smith, James C; Krude, Torsten

2016-12-01

Next-generation sequencing has enabled the genome-wide identification of human DNA replication origins. However, different approaches to mapping replication origins, namely (i) sequencing isolated small nascent DNA strands (SNS-seq); (ii) sequencing replication bubbles (bubble-seq) and (iii) sequencing Okazaki fragments (OK-seq), show only limited concordance. To address this controversy, we describe here an independent high-resolution origin mapping technique that we call initiation site sequencing (ini-seq). In this approach, newly replicated DNA is directly labelled with digoxigenin-dUTP near the sites of its initiation in a cell-free system. The labelled DNA is then immunoprecipitated and genomic locations are determined by DNA sequencing. Using this technique we identify >25,000 discrete origin sites at sub-kilobase resolution on the human genome, with high concordance between biological replicates. Most activated origins identified by ini-seq are found at transcriptional start sites and contain G-quadruplex (G4) motifs. They tend to cluster in early-replicating domains, providing a correlation between early replication timing and local density of activated origins. Origins identified by ini-seq show highest concordance with sites identified by SNS-seq, followed by OK-seq and bubble-seq. Furthermore, germline origins identified by positive nucleotide distribution skew jumps overlap with origins identified by ini-seq and OK-seq more frequently and more specifically than do sites identified by either SNS-seq or bubble-seq. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

Complete sequence of the genome of avian paramyxovirus type 9 and comparison with other paramyxoviruses

PubMed Central

Samuel, Arthur S.; Kumar, Sachin; Madhuri, Subbiah; Collins, Peter L.; Samal, Siba K.

2009-01-01

The complete genome consensus sequence was determined for avian paramyxovirus (APMV) serotype 9 prototype strain PMV-9/domestic Duck/New York/22/78. The genome is 15,438 nucleotides (nt) long and encodes six non-overlapping genes in the order of 3′-N-P/V/W-M-F-HN-L-5′ with intergenic regions of 0–30 nt. The genome length follows the “rule of six” and contains a 55-nt leader sequence at the 3′ end and a 47-nt trailer sequence at the 5′ end. The cleavage site of the F protein is I-R-E-G-R-I↓F, which does not conform to the conventional cleavage site of the ubiquitous cellular protease furin. The virus required exogenous protease for in vitro replication and grew only in a few established cell lines, indicating a restricted host range. Alignment and phylogenetic analysis of the predicted amino acid sequences of APMV-9 proteins with the cognate proteins of viruses of all five genera of family Paramyxoviridae showed that APMV-9 is more closely related to APMV-1 than to other APMVs. The mean death time in embryonated chicken eggs was found to be more than 120 h, indicating APMV-9 to be avirulent for chickens. PMID:19185593
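The "rule of six" mentioned above refers to a genome length that is an exact multiple of six nucleotides. A minimal sketch of that arithmetic check, using the 15,438-nt length reported in the abstract (the function and constant names are illustrative, not taken from the paper):

```python
def follows_rule_of_six(genome_length_nt: int) -> bool:
    """Return True when the genome length is an exact multiple of six."""
    return genome_length_nt % 6 == 0


# Genome length in nucleotides, as reported in the abstract.
APMV9_GENOME_LENGTH = 15_438

if __name__ == "__main__":
    print(follows_rule_of_six(APMV9_GENOME_LENGTH))  # True
    print(APMV9_GENOME_LENGTH // 6)                  # 2573 hexamers
```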
Zaba: a novel miniature transposable element present in genomes of legume plants.

PubMed

Macas, J; Neumann, P; Pozárková, D

2003-08-01

A novel family of miniature transposable elements, named Zaba, was identified in pea (Pisum sativum) and subsequently also in other legume species using computer analysis of their DNA sequences. Zaba elements are 141-190 bp long, generate 10-bp target site duplications, and their terminal inverted repeats make up most of the sequence. Zaba elements thus resemble class 3 foldback transposons. The elements are only moderately repetitive in pea (tens to hundreds copies per haploid genome), but they are present in up to thousands of copies in the genomes of several Medicago and Vicia species. More detailed analysis of the elements from pea, including isolation of new sequences from a genomic library, revealed that a fraction of these elements are truncated, and that their last transposition probably did not occur recently. A search for Zaba sequences in EST databases showed that at least some elements are transcribed, most probably due to their association with genic regions.
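Terminal inverted repeats of the kind described above can be checked by comparing an element with its own reverse complement. The sketch below is only an illustration under simplifying assumptions (exact matching, no mismatches or gaps, uppercase A/C/G/T input). It is not the computer analysis used by the authors, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tir_length(element: str) -> int:
    """Length of the perfect terminal inverted repeat of an element.

    Counts how many leading bases equal the reverse complement of the
    trailing bases, capped at half the element length so the two arms
    cannot overlap.
    """
    rc = reverse_complement(element)
    matched = 0
    for left, right in zip(element, rc):
        if left != right:
            break
        matched += 1
    return min(matched, len(element) // 2)


if __name__ == "__main__":
    # Toy element: the arms GGCATT ... AATGCC are inverted repeats.
    print(tir_length("GGCATTACGAATGCC"))  # 6
```

Real elements rarely carry perfect repeats, so a practical search would tolerate mismatches; that refinement is left out here.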
Cleavage sites within the poliovirus capsid protein precursors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Larsen, G.R.; Anderson, C.W.; Dorner, A.J.

1982-01-01

Partial amino-terminal sequence analysis was performed on radiolabeled poliovirus capsid proteins VP1, VP2, and VP3. A computer-assisted comparison of the amino acid sequences obtained with that predicted by the nucleotide sequence of the poliovirus genome allows assignment of the amino terminus of each capsid protein to a unique position within the virus polyprotein. Sequence analysis of trypsin-digested VP4, which has a blocked amino terminus, demonstrates that VP4 is encoded at or very near to the amino terminus of the polyprotein. The gene order of the capsid proteins is VP4-VP2-VP3-VP1. Cleavage of VP0 to VP4 and VP2 is shown to occur between asparagine and serine, whereas the cleavages that separate VP2/VP3 and VP3/VP1 occur between glutamine and glycine residues. This finding supports the hypothesis that the cleavage of VP0, which occurs during virion morphogenesis, is distinct from the cleavages that separate functional regions of the polyprotein.
A survey of the sorghum transcriptome using single-molecule long reads

DOE PAGES

Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; ...

2016-06-24

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. Lastly, these results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.
A survey of the sorghum transcriptome using single-molecule long reads

PubMed Central

Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; Ngam, Peter; Devitt, Nicholas; Schilkey, Faye; Ben-Hur, Asa; Reddy, Anireddy S. N.

2016-01-01

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism. PMID:27339290
Conservation and variability of West Nile virus proteins.

PubMed

Koo, Qi Ying; Khan, Asif M; Jung, Keun-Ok; Ramdas, Shweta; Miotto, Olivo; Tan, Tin Wee; Brusic, Vladimir; Salmon, Jerome; August, J Thomas

2009-01-01

West Nile virus (WNV) has emerged globally as an increasingly important pathogen for humans and domestic animals. Studies of the evolutionary diversity of the virus over its known history will help to elucidate conserved sites, and characterize their correspondence to other pathogens and their relevance to the immune system. We describe a large-scale analysis of the entire WNV proteome, aimed at identifying and characterizing evolutionarily conserved amino acid sequences. This study, which used 2,746 WNV protein sequences collected from the NCBI GenPept database, focused on analysis of peptides of length 9 amino acids or more, which are immunologically relevant as potential T-cell epitopes. Entropy-based analysis of the diversity of WNV sequences revealed the presence of numerous evolutionarily stable nonamer positions across the proteome (entropy value of ≤ 1). The representation (frequency) of nonamers variant to the predominant peptide at these stable positions was, generally, low (≤ 10% of the WNV sequences analyzed). Eighty-eight fragments of length 9-29 amino acids, representing approximately 34% of the WNV polyprotein length, were identified to be identical and evolutionarily stable in all analyzed WNV sequences. Of the 88 completely conserved sequences, 67 are also present in other flaviviruses, and several have been associated with the functional and structural properties of viral proteins. Immunoinformatic analysis revealed that the majority (78/88) of conserved sequences are potentially immunogenic, while 44 contained experimentally confirmed human T-cell epitopes. This study identified a comprehensive catalogue of completely conserved WNV sequences, many of which are shared by other flaviviruses, and the majority are potential epitopes. The complete conservation of these immunologically relevant sequences through the entire recorded WNV history suggests they will be valuable as components of peptide-specific vaccines or other therapeutic applications, for sequence-specific diagnosis of a wide range of Flavivirus infections, and for studies of homologous sequences among other flaviviruses.
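The entropy-based analysis of nonamer positions described above can be illustrated with plain Shannon entropy over the 9-mers observed at one alignment position. The abstract does not specify the exact estimator or any corrections the authors applied, so the following is a sketch under that assumption; the function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def nonamers_at(aligned_seqs, start, k=9):
    """Collect the gap-free k-mers beginning at column `start`."""
    peptides = (s[start:start + k] for s in aligned_seqs)
    return [p for p in peptides if len(p) == k and "-" not in p]


def nonamer_entropy(aligned_seqs, start, k=9):
    """Shannon entropy (bits) of the k-mers at one alignment position."""
    peptides = nonamers_at(aligned_seqs, start, k)
    if not peptides:
        raise ValueError("no gap-free peptides at this position")
    total = len(peptides)
    probs = [n / total for n in Counter(peptides).values()]
    return sum(-p * math.log2(p) for p in probs)


def variant_fraction(aligned_seqs, start, k=9):
    """Fraction of sequences whose k-mer differs from the predominant one."""
    peptides = nonamers_at(aligned_seqs, start, k)
    if not peptides:
        raise ValueError("no gap-free peptides at this position")
    predominant_count = Counter(peptides).most_common(1)[0][1]
    return 1 - predominant_count / len(peptides)


if __name__ == "__main__":
    toy = ["MSKKPGGPG", "MSKKPGGPG", "MSKKPGGPG", "MSRKPGGPG"]
    print(nonamer_entropy(toy, 0))   # below 1 bit, so "stable" by the <= 1 cutoff
    print(variant_fraction(toy, 0))  # 0.25
```

A fully conserved position gives an entropy of 0, and two equally frequent nonamers give exactly 1 bit, which is the cutoff quoted in the abstract.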
Chronology of Eocene-Miocene sequences on the New Jersey shallow shelf: implications for regional, interregional, and global correlations

USGS Publications Warehouse

Browning, James V.; Miller, Kenneth G.; Sugarman, Peter J.; Barron, John; McCarthy, Francine M.G.; Kulhanek, Denise K.; Katz, Miriam E.; Feigenson, Mark D.

2013-01-01

Integrated Ocean Drilling Program Expedition 313 continuously cored and logged latest Eocene to early-middle Miocene sequences at three sites (M27, M28, and M29) on the inner-middle continental shelf offshore New Jersey, providing an opportunity to evaluate the ages, global correlations, and significance of sequence boundaries. We provide a chronology for these sequences using integrated strontium isotopic stratigraphy and biostratigraphy (primarily calcareous nannoplankton, diatoms, and dinocysts [dinoflagellate cysts]). Despite challenges posed by shallow-water sediments, age resolution is typically ±0.5 m.y. and in many sequences is as good as ±0.25 m.y. Three Oligocene sequences were sampled at Site M27 on sequence bottomsets. Fifteen early to early-middle Miocene sequences were dated at Sites M27, M28, and M29 across clinothems in topsets, foresets (where the sequences are thickest), and bottomsets. A few sequences have coarse (∼1 m.y.) or little age constraint due to barren zones; we constrain the age estimates of these less well dated sequences by applying the principle of superposition, i.e., sediments above sequence boundaries in any site are younger than the sediments below the sequence boundaries at other sites. Our age control provides constraints on the timing of deposition in the clinothem; sequences on the topsets are generally the youngest in the clinothem, whereas the bottomsets generally are the oldest. The greatest amount of time is represented on foresets, although we have no evidence for a correlative conformity. Our chronology provides a baseline for regional and interregional correlations and sea-level reconstructions: (1) we correlate a major increase in sedimentation rate precisely with the timing of the middle Miocene climate changes associated with the development of a permanent East Antarctic Ice Sheet; and (2) the timing of sequence boundaries matches the deep-sea oxygen isotopic record, implicating glacioeustasy as a major driver for forming sequence boundaries.
Site-directed mutagenesis in Petunia × hybrida protoplast system using direct delivery of purified recombinant Cas9 ribonucleoproteins.

PubMed

Subburaj, Saminathan; Chung, Sung Jin; Lee, Choongil; Ryu, Seuk-Min; Kim, Duk Hyoung; Kim, Jin-Soo; Bae, Sangsu; Lee, Geung-Joo

2016-07-01

Site-directed mutagenesis of nitrate reductase genes using direct delivery of purified Cas9 protein preassembled with guide RNA produces mutations efficiently in Petunia × hybrida protoplast system. The clustered, regularly interspaced, short palindromic repeat (CRISPR)-CRISPR associated endonuclease 9 (CRISPR/Cas9) system has been recently announced as a powerful molecular breeding tool for site-directed mutagenesis in higher plants. Here, we report a site-directed mutagenesis method targeting Petunia nitrate reductase (NR) gene locus. This method could create mutations efficiently using direct delivery of purified Cas9 protein and single guide RNA (sgRNA) into protoplast cells. After transient introduction of RNA-guided endonuclease (RGEN) ribonucleoproteins (RNPs) with different sgRNAs targeting NR genes, mutagenesis at the targeted loci was detected by T7E1 assay and confirmed by targeted deep sequencing. T7E1 assay showed that RGEN RNPs induced site-specific mutations at frequencies ranging from 2.4 to 21 % at four different sites (NR1, 2, 4 and 6) in the PhNR gene locus with average mutation efficiency of 14.9 ± 2.2 %. Targeted deep DNA sequencing revealed mutation rates of 5.3-17.8 % with average mutation rate of 11.5 ± 2 % at the same NR gene target sites in DNA fragments of analyzed protoplast transfectants. Further analysis from targeted deep sequencing showed that the average ratio of deletion to insertion produced collectively by the four NR-RGEN target sites (NR1, 2, 4, and 6) was about 63:37. Our results demonstrated that direct delivery of RGEN RNPs into protoplast cells of Petunia can be exploited as an efficient tool for site-directed mutagenesis of genes or genome editing in plant systems.
GeneChip™ screening assay for cystic fibrosis mutations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cronn, M.T.; Miyada, C.G.; Fucini, R.V.

1994-09-01

GeneChip™ assays are based on high density, carefully designed arrays of short oligonucleotide probes (13-16 bases) built directly on derivatized silica substrates. DNA target sequence analysis is achieved by hybridizing fluorescently labeled amplification products to these arrays. Fluorescent hybridization signals located within the probe array are translated into target sequence information using the known probe sequence at each array feature. The mutation screening assay for cystic fibrosis includes sets of oligonucleotide probes designed to detect numerous different mutations that have been described in 14 exons and one intron of the CFTR gene. Each mutation site is addressed by a sub-array of at least 40 probe sequences, half designed to detect the wild type gene sequence and half designed to detect the reported mutant sequence. Hybridization with homozygous mutant, homozygous wild type or heterozygous targets results in distinctive hybridization patterns within a sub-array, permitting specific discrimination of each mutation. The GeneChip probe arrays are very small (approximately 1 cm²). Their miniature size coupled with their high information content make GeneChip probe arrays a useful and practical means for providing CF mutation analysis in a clinical setting.
Analysis of the cytochrome c oxidase subunit II (COX2) gene in giant panda, Ailuropoda melanoleuca.

PubMed

Ling, S S; Zhu, Y; Lan, D; Li, D S; Pang, H Z; Wang, Y; Li, D Y; Wei, R P; Zhang, H M; Wang, C D; Hu, Y D

2017-01-23

The giant panda, Ailuropoda melanoleuca (Ursidae), has a unique bamboo-based diet; however, this low-energy intake has been sufficient to maintain the metabolic processes of this species since the fourth ice age. As mitochondria are the main sites for energy metabolism in animals, the protein-coding genes involved in mitochondrial respiratory chains, particularly cytochrome c oxidase subunit II (COX2), which is the rate-limiting enzyme in electron transfer, could play an important role in giant panda metabolism. Therefore, the present study aimed to isolate, sequence, and analyze the COX2 DNA from individuals kept at the Giant Panda Protection and Research Center, China, and compare these sequences with those of the other Ursidae family members. Multiple sequence alignment showed that the COX2 gene had three point mutations that defined three haplotypes, with 60% of the sequences corresponding to haplotype I. The neutrality tests revealed that the COX2 gene was conserved throughout evolution, and the maximum likelihood phylogenetic analysis, using homologous sequences from other Ursidae species, showed clustering of the COX2 sequences of giant pandas, suggesting that this gene evolved differently in them.
SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments.

PubMed

Ajawatanawong, Pravech; Atkinson, Gemma C; Watson-Haigh, Nathan S; Mackenzie, Bryony; Baldauf, Sandra L

2012-07-01

Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.
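The idea of extracting indel regions from a protein alignment can be sketched as finding maximal runs of columns that contain at least one gap. This is a simplified illustration only. SeqFIRE's actual criteria (user-adjustable conservation and entropy thresholds) are not reproduced here, and the function name and toy alignment are hypothetical.

```python
def indel_regions(alignment, gap="-"):
    """Return (first_col, last_col) pairs, 0-based and inclusive, for each
    maximal run of alignment columns in which any row carries a gap."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    regions = []
    run_start = None
    for col in range(length):
        has_gap = any(row[col] == gap for row in alignment)
        if has_gap and run_start is None:
            run_start = col                         # a gapped run begins
        elif not has_gap and run_start is not None:
            regions.append((run_start, col - 1))    # the run just ended
            run_start = None
    if run_start is not None:                       # run reaches the last column
        regions.append((run_start, length - 1))
    return regions


if __name__ == "__main__":
    toy_alignment = [
        "MKT--LLV",
        "MKTAALLV",
        "MKT-ALL-",
    ]
    print(indel_regions(toy_alignment))  # [(3, 4), (7, 7)]
```

The complement of these regions gives the gap-free blocks, which a fuller tool would then filter by conservation before reporting them as conserved blocks.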
MethylViewer: computational analysis and editing for bisulfite sequencing and methyltransferase accessibility protocol for individual templates (MAPit) projects.

PubMed

Pardo, Carolina E; Carr, Ian M; Hoffman, Christopher J; Darst, Russell P; Markham, Alexander F; Bonthron, David T; Kladde, Michael P

2011-01-01

Bisulfite sequencing is a widely-used technique for examining cytosine DNA methylation at nucleotide resolution along single DNA strands. Probing with cytosine DNA methyltransferases followed by bisulfite sequencing (MAPit) is an effective technique for mapping protein-DNA interactions. Here, MAPit methylation footprinting with M.CviPI, a GC methyltransferase we previously cloned and characterized, was used to probe hMLH1 chromatin in HCT116 and RKO colorectal cancer cells. Because M.CviPI-probed samples contain both CG and GC methylation, we developed a versatile, visually-intuitive program, called MethylViewer, for evaluating the bisulfite sequencing results. Uniquely, MethylViewer can simultaneously query cytosine methylation status in bisulfite-converted sequences at as many as four different user-defined motifs, e.g. CG, GC, etc., including motifs with degenerate bases. Data can also be exported for statistical analysis and as publication-quality images. Analysis of hMLH1 MAPit data with MethylViewer showed that endogenous CG methylation and accessible GC sites were both mapped on single molecules at high resolution. Disruption of positioned nucleosomes on single molecules of the PHO5 promoter was detected in budding yeast using M.CviPII, increasing the number of enzymes available for probing protein-DNA interactions. MethylViewer provides an integrated solution for primer design and rapid, accurate and detailed analysis of bisulfite sequencing or MAPit datasets from virtually any biological or biochemical system.
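Querying methylation status at several user-defined motifs can be sketched as follows. This is a simplified illustration, not MethylViewer's implementation. It assumes a read already aligned to the reference on the same strand with no indels, it treats a retained C as methylated and a C read as T as unmethylated, and it does not handle degenerate bases. The function name and the example sequences are hypothetical.

```python
def call_methylation(reference, read, motifs):
    """Call methylation at each motif occurrence in the reference.

    motifs maps a motif string to the index of the queried cytosine
    within that motif, e.g. {"CG": 0, "GC": 1}.
    """
    calls = {}
    for motif, c_offset in motifs.items():
        sites = []
        start = reference.find(motif)
        while start != -1:
            pos = start + c_offset
            base = read[pos]
            # After bisulfite conversion: C retained = methylated, C->T = unmethylated.
            status = {"C": "methylated", "T": "unmethylated"}.get(base, "ambiguous")
            sites.append((pos, status))
            start = reference.find(motif, start + 1)
        calls[motif] = sites
    return calls

# Hypothetical reference and bisulfite-converted read of the same length.
reference = "ACGTGCAACGGCT"
read      = "ACGTGTAATGGCT"

calls = call_methylation(reference, read, {"CG": 0, "GC": 1})
```

In a GCG context the same cytosine belongs to both a GC and a CG motif, so a real analysis would need to decide how to treat such sites. This sketch simply reports the site under both motifs.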
Nuclear factors that bind to the enhancer region of nondefective Friend murine leukemia virus.

PubMed Central

Manley, N R; O'Connell, M A; Sharp, P A; Hopkins, N

1989-01-01

Nondefective Friend murine leukemia virus (MuLV) causes erythroleukemia when injected into newborn NFS mice, while Moloney MuLV causes T-cell lymphoma. Exchange of the Friend virus enhancer region, a sequence of about 180 nucleotides including the direct repeat and a short 3'-adjacent segment, for the corresponding region in Moloney MuLV confers the ability to cause erythroid disease on Moloney MuLV. We have used the electrophoretic mobility shift assay and methylation interference analysis to identify cellular factors which bind to the Friend virus enhancer region and compared these with factors, previously identified, that bind to the Moloney virus direct repeat (N. A. Speck and D. Baltimore, Mol. Cell. Biol. 7:1101-1110, 1987). We identified five binding sites for sequence-specific DNA-binding proteins in the Friend virus enhancer region. While some binding sites are present in both the Moloney and Friend virus enhancers, both viruses contain unique sites not present in the other. Although none of the factors identified in this report which bind to these unique sites are present exclusively in T cells or erythroid cells, they bind to three regions of the enhancer shown by genetic analysis to encode disease specificity and thus are candidates to mediate the tissue-specific expression and distinct disease specificities encoded by these virus enhancer elements. PMID:2778872
Global Transcriptional Start Site Mapping Using Differential RNA Sequencing Reveals Novel Antisense RNAs in Escherichia coli

PubMed Central

Thomason, Maureen K.; Bischler, Thorsten; Eisenbart, Sara K.; Förstner, Konrad U.; Zhang, Aixia; Herbig, Alexander; Nieselt, Kay

2014-01-01

While the model organism Escherichia coli has been the subject of intense study for decades, the full complement of its RNAs is only now being examined. Here we describe a survey of the E. coli transcriptome carried out using a differential RNA sequencing (dRNA-seq) approach, which can distinguish between primary and processed transcripts, and an automated prediction algorithm for transcriptional start sites (TSS). With the criterion of expression under at least one of three growth conditions examined, we predicted 14,868 TSS candidates, including 5,574 internal to annotated genes (iTSS) and 5,495 TSS corresponding to potential antisense RNAs (asRNAs). We examined expression of 14 candidate asRNAs by Northern analysis using RNA from wild-type E. coli and from strains defective for RNases III and E, two RNases reported to be involved in asRNA processing. Interestingly, nine asRNAs detected as distinct bands by Northern analysis were differentially affected by the rnc and rne mutations. We also compared our asRNA candidates with previously published asRNA annotations from RNA-seq data and discuss the challenges associated with these cross-comparisons. Our global transcriptional start site map represents a valuable resource for identification of transcription start sites, promoters, and novel transcripts in E. coli and is easily accessible, together with the cDNA coverage plots, in an online genome browser. PMID:25266388
Haemagglutinin and neuraminidase sequencing delineate nosocomial influenza outbreaks with accuracy equivalent to whole genome sequencing.

PubMed

Houghton, Rebecca; Ellis, Joanna; Galiano, Monica; Clark, Tristan W; Wyllie, Sarah

2017-04-01

We describe haemagglutinin (HA) and neuraminidase (NA) sequencing in an apparent cross-site influenza A(H1N1) outbreak in renal transplant and haemodialysis patients, confirmed with whole genome sequencing (WGS). Isolates were sequenced from influenza positive individuals. Phylogenetic trees were constructed using HA and NA sequencing and subsequently WGS. Sequence data were analysed to determine genetic relatedness of viruses obtained from inpatient and outpatient cohorts and compared with epidemiological outbreak information. There were 6 patient cases of influenza in the inpatient renal ward cohort (associated with 3 deaths) and 9 patient cases in the outpatient haemodialysis unit cohort (no deaths). WGS confirmed clustered transmission of two genetically different influenza A(H1N1)pdm09 strains initially identified by analysis of HA and NA genes. WGS took longer, and in this case was not required to determine whether or not the two seemingly linked outbreaks were related. Rapid sequencing of HA and NA genes may be sufficient to aid early influenza outbreak investigation, making it appealing for future outbreak investigation. However, as next generation sequencing becomes cheaper and more widely available and bioinformatics software is now freely accessible, next generation whole genome analysis may increasingly become a valuable tool for real-time influenza outbreak investigation. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach.

PubMed

Algama, Manjula; Tasker, Edward; Williams, Caitlin; Parslow, Adam C; Bryson-Richardson, Robert J; Keith, Jonathan M

2017-03-27

Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. 
We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.
The sequence specificity of UV-induced DNA damage in a systematically altered DNA sequence.

PubMed

Khoe, Clairine V; Chung, Long H; Murray, Vincent

2018-06-01

The sequence specificity of UV-induced DNA damage was investigated in a specifically designed DNA plasmid using two procedures: end-labelling and linear amplification. Absorption of UV photons by DNA leads to dimerisation of pyrimidine bases and produces two major photoproducts, cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). A previous study had determined that two hexanucleotide sequences, 5'-GCTC*AC and 5'-TATT*AA, were high intensity UV-induced DNA damage sites. The UV clone plasmid was constructed by systematically altering each nucleotide of these two hexanucleotide sequences. One of the main goals of this study was to determine the influence of single nucleotide alterations on the intensity of UV-induced DNA damage. The sequence 5'-GCTC*AC was designed to examine the sequence specificity of 6-4PPs and the highest intensity 6-4PP damage sites were found at 5'-GTTC*CC nucleotides. The sequence 5'-TATT*AA was devised to investigate the sequence specificity of CPDs and the highest intensity CPD damage sites were found at 5'-TTTT*CG nucleotides. It was proposed that the tetranucleotide DNA sequence, 5'-YTC*Y (where Y is T or C), was the consensus sequence for the highest intensity UV-induced 6-4PP adduct sites; while it was 5'-YTT*C for the highest intensity UV-induced CPD damage sites. These consensus tetranucleotides are composed entirely of consecutive pyrimidines and must have a DNA conformation that is highly productive for the absorption of UV photons. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
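The two consensus tetranucleotides proposed in this abstract can be located in a sequence with a short scan. The sketch below only illustrates matching the stated consensus patterns (Y = T or C). The function name, dictionary keys, and example sequence are hypothetical, and a match says nothing about the measured damage intensity at that site.

```python
import re

# Consensus tetranucleotides from the abstract, with Y expanded to [CT].
CONSENSUS = {
    "6-4PP": "[CT]TC[CT]",  # 5'-YTC*Y
    "CPD": "[CT]TTC",       # 5'-YTT*C
}

def find_consensus_sites(seq):
    """Return 0-based start positions of every (possibly overlapping) consensus match."""
    seq = seq.upper()
    hits = {}
    for name, pattern in CONSENSUS.items():
        # The lookahead lets overlapping occurrences be reported separately.
        hits[name] = [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]
    return hits

# Hypothetical test sequence, not taken from the UV clone plasmid.
example = "GATTCCAGCTTCG"
hits = find_consensus_sites(example)
```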
High Throughput Sequencing Identifies Misregulated Genes in the Drosophila Polypyrimidine Tract-Binding Protein (hephaestus) Mutant Defective in Spermatogenesis.

PubMed

Sridharan, Vinod; Heimiller, Joseph; Robida, Mark D; Singh, Ravinder

2016-01-01

The Drosophila polypyrimidine tract-binding protein (dmPTB or hephaestus) plays an important role during spermatogenesis. The heph2 mutation in this gene results in a specific defect in spermatogenesis, causing aberrant spermatid individualization and male sterility. However, the array of molecular defects in the mutant remains uncharacterized. Using an unbiased high throughput sequencing approach, we have identified transcripts that are misregulated in this mutant. Aberrant transcripts show altered expression levels, exon skipping, and alternative 5' ends. We independently verified these findings by reverse-transcription and polymerase chain reaction (RT-PCR) analysis. Our analysis shows misregulation of transcripts that have been connected to spermatogenesis, including components of the actomyosin cytoskeletal apparatus. We show, for example, that the Myosin light chain 1 (Mlc1) transcript is aberrantly spliced. Furthermore, bioinformatics analysis reveals that Mlc1 contains a high affinity binding site(s) for dmPTB and that the site is conserved in many Drosophila species. We discuss that Mlc1 and other components of the actomyosin cytoskeletal apparatus offer important molecular links between the loss of dmPTB function and the observed developmental defect in spermatogenesis. This study provides the first comprehensive list of genes misregulated in vivo in the heph2 mutant in Drosophila and offers insight into the role of dmPTB during spermatogenesis.
Best Practices for Environmental Site Management: A Practical Guide for Applying Environmental Sequence Stratigraphy to Improve Conceptual Site Models

EPA Science Inventory

Presented here is a practical guide on the application of the geologic principles of sequence stratigraphy and facies models to the characterization of stratigraphic heterogeneity at hazardous waste sites. This technology is applicable to sites underlain by clastic aquifers (int...
A palindrome-mediated mechanism distinguishes translocations involving LCR-B of chromosome 22q11.2.

PubMed

Gotter, Anthony L; Shaikh, Tamim H; Budarf, Marcia L; Rhodes, C Harker; Emanuel, Beverly S

2004-01-01

Two known recurrent constitutional translocations, t(11;22) and t(17;22), as well as a non-recurrent t(4;22), display derivative chromosomes that have joined to a common site within the low copy repeat B (LCR-B) region of 22q11.2. This breakpoint is located between two AT-rich inverted repeats that form a nearly perfect palindrome. Breakpoints within the 11q23, 17q11 and 4q35 partner chromosomes also fall near the center of palindromic sequences. In the present work the breakpoints of a fourth translocation involving LCR-B, a balanced ependymoma-associated t(1;22), were characterized not only to localize this junction relative to known genes, but also to further understand the mechanism underlying these rearrangements. FISH mapping was used to localize the 22q11.2 breakpoint to LCR-B and the 1p21 breakpoint to single BAC clones. STS mapping narrowed the 1p21.2 breakpoint to a 1990 bp AT-rich region, and junction fragments were amplified by nested PCR. Junction fragment-derived sequence indicates that the 1p21.2 breakpoint splits a 278 nt palindrome capable of forming stem-loop secondary structure. In contrast, the 1p21.2 reference genomic sequence from clones in the database does not exhibit this configuration, suggesting a predisposition for regional genomic instability perhaps etiologic for this rearrangement. Given its similarity to known chromosomal fragile site (FRA) sequences, this polymorphic 1p21.2 sequence may represent one of the FRA1 loci. Comparative analysis of the secondary structure of sequences surrounding translocation breakpoints that involve LCR-B with those not involving this region indicate a unique ability of the former to form stem-loop structures. The relative likelihood of forming these configurations appears to be related to the rate of translocation occurrence. Further analysis suggests that constitutional translocations in general occur between sequences of similar melting temperature and propensity for secondary structure.

A palindrome-mediated mechanism distinguishes translocations involving LCR-B of chromosome 22q11.2

PubMed Central

Gotter, Anthony L.; Shaikh, Tamim H.; Budarf, Marcia L.; Rhodes, C. Harker; Emanuel, Beverly S.

2010-01-01

Two known recurrent constitutional translocations, t(11;22) and t(17;22), as well as a non-recurrent t(4;22), display derivative chromosomes that have joined to a common site within the low copy repeat B (LCR-B) region of 22q11.2. This breakpoint is located between two AT-rich inverted repeats that form a nearly perfect palindrome. Breakpoints within the 11q23, 17q11 and 4q35 partner chromosomes also fall near the center of palindromic sequences. In the present work the breakpoints of a fourth translocation involving LCR-B, a balanced ependymoma-associated t(1;22), were characterized not only to localize this junction relative to known genes, but also to further understand the mechanism underlying these rearrangements. FISH mapping was used to localize the 22q11.2 breakpoint to LCR-B and the 1p21 breakpoint to single BAC clones. STS mapping narrowed the 1p21.2 breakpoint to a 1990 bp AT-rich region, and junction fragments were amplified by nested PCR. Junction fragment-derived sequence indicates that the 1p21.2 breakpoint splits a 278 nt palindrome capable of forming stem–loop secondary structure. In contrast, the 1p21.2 reference genomic sequence from clones in the database does not exhibit this configuration, suggesting a predisposition for regional genomic instability perhaps etiologic for this rearrangement. Given its similarity to known chromosomal fragile site (FRA) sequences, this polymorphic 1p21.2 sequence may represent one of the FRA1 loci. Comparative analysis of the secondary structure of sequences surrounding translocation breakpoints that involve LCR-B with those not involving this region indicate a unique ability of the former to form stem–loop structures. The relative likelihood of forming these configurations appears to be related to the rate of translocation occurrence. Further analysis suggests that constitutional translocations in general occur between sequences of similar melting temperature and propensity for secondary structure. PMID:14613967
Digital Biological Converter

DTIC Science & Technology

2013-06-28

of cuts that each fragment should be cut into so the fragments are no greater than a specific length threshold. Additionally, vector sequences and...restriction sites are attached to each fragment while ensuring the restriction sites are unique to each sequence. The vector sequences serve as hooks...for assembly into vector for cloning purposes, and also as primer binding domains for PCR amplification. The restriction sites are added to
Divergence of Structure and Function in the Haloacid Dehalogenase Enzyme Superfamily: Bacteroides thetaiotaomicron BT2127 is an Inorganic Pyrophosphatase

PubMed Central

Huang, Hua; Yury, Patskovsky; Toro, Rafael; Farelli, Jeremiah D.; Pandya, Chetanya; Almo, Steven C.; Allen, Karen N.; Dunaway-Mariano, Debra

2012-01-01

The explosion of protein sequence information requires that current strategies for function assignment must evolve to complement experimental approaches with computationally-based function prediction. This necessitates the development of strategies based on the identification of sequence markers in the form of specificity determinants and a more informed definition of orthologues. Herein, we have undertaken the function assignment of the unknown Haloalkanoate Dehalogenase superfamily member BT2127 (Uniprot accession # Q8A5V9) from Bacteroides thetaiotaomicron using an integrated bioinformatics/structure/mechanism approach. The substrate specificity profile and steady-state rate constants of BT2127 (with kcat/Km value for pyrophosphate of ∼1 × 10^5 M^−1 s^−1), together with the gene context, support the assigned in vivo function as an inorganic pyrophosphatase. The X-ray structural analysis of the wild-type BT2127 and several variants generated by site-directed mutagenesis shows that substrate discrimination is based, in part, on active site space restrictions imposed by the cap domain (specifically by residues Tyr76 and Glu47). Structure guided site directed mutagenesis coupled with kinetic analysis of the mutant enzymes identified the residues required for catalysis, substrate binding, and domain-domain association. Based on this structure-function analysis, the catalytic residues Asp11, Asp13, Thr113, and Lys147 as well as the metal binding residues Asp171, Asn172 and Glu47 were used as markers to confirm BT2127 orthologues identified via sequence searches. This bioinformatic analysis demonstrated that the biological range of BT2127 orthologue is restricted to the phylum Bacteroidetes/Chlorobi. The key structural determinants in the divergence of BT2127 and its closest homologue β-phosphoglucomutase control the leaving group size (phosphate vs. glucose-phosphate) and the position of the Asp acid/base in the open vs. closed conformations.
HADSF pyrophosphatases represent a third mechanistic and fold type for bacterial pyrophosphatases. PMID:21894910
Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS

PubMed Central

Li, Bi-Qing; Feng, Kai-Yan; Chen, Lei; Huang, Tao; Cai, Yu-Dong

2012-01-01

Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from being solved. In this study, we developed a novel predictor based on Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residual disorder, secondary structure and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites and achieved an overall accuracy of 0.672997 and MCC of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. It was also shown via site-specific feature analysis that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction. PMID:22937126
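For readers unfamiliar with the two performance measures quoted in this abstract, the sketch below computes accuracy and the Matthews correlation coefficient (MCC) from a binary confusion matrix using the standard definitions. The function name and the counts in the usage line are hypothetical and are not the study's data.

```python
import math

def accuracy_and_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; 0.0 is returned by convention here.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc

# Hypothetical counts for interface (positive) vs non-interface (negative) residues.
acc, mcc = accuracy_and_mcc(tp=40, tn=30, fp=20, fn=10)
```

MCC ranges from -1 to 1 and stays informative when the classes are imbalanced, which is one reason it is often reported alongside accuracy for site-prediction tasks.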
Transposase-Mediated Excision, Conjugative Transfer, and Diversity of ICE6013 Elements in Staphylococcus aureus.

PubMed

Sansevere, Emily A; Luo, Xiao; Park, Joo Youn; Yoon, Sunghyun; Seo, Keun Seok; Robinson, D Ashley

2017-04-15

ICE6013 represents one of two families of integrative conjugative elements (ICEs) identified in the pan-genome of the human and animal pathogen Staphylococcus aureus. Here we investigated the excision and conjugation functions of ICE6013 and further characterized the diversity of this element. ICE6013 excision was not significantly affected by growth, temperature, pH, or UV exposure and did not depend on recA. The IS30-like DDE transposase (Tpase; encoded by orf1 and orf2) of ICE6013 must be uninterrupted for excision to occur, whereas disrupting three of the other open reading frames (ORFs) on the element significantly affects the level of excision. We demonstrate that ICE6013 conjugatively transfers to different S. aureus backgrounds at frequencies approaching that of the conjugative plasmid pGO1. We found that excision is required for conjugation, that not all S. aureus backgrounds are successful recipients, and that transconjugants acquire the ability to transfer ICE6013. Sequencing of chromosomal integration sites in serially passaged transconjugants revealed a significant integration site preference for a 15-bp AT-rich palindromic consensus sequence, which surrounds the 3-bp target site that is duplicated upon integration. A sequence analysis of ICE6013 from different host strains of S. aureus and from eight other species of staphylococci identified seven divergent subfamilies of ICE6013 that include sequences previously classified as a transposon, a plasmid, and various ICEs. In summary, these results indicate that the IS30-like Tpase functions as the ICE6013 recombinase and that ICE6013 represents a diverse family of mobile genetic elements that mediate conjugation in staphylococci. IMPORTANCE Integrative conjugative elements (ICEs) encode the abilities to integrate into and excise from bacterial chromosomes and plasmids and mediate conjugation between bacteria. As agents of horizontal gene transfer, ICEs may affect bacterial evolution.
ICE6013 represents one of two known families of ICEs in the pathogen Staphylococcus aureus, but its core functions of excision and conjugation are not well studied. Here, we show that ICE6013 depends on its IS30-like DDE transposase for excision, which is unique among ICEs, and we demonstrate the conjugative transfer and integration site preference of ICE6013. A sequence analysis revealed that ICE6013 has diverged into seven subfamilies that are dispersed among staphylococci. Copyright © 2017 American Society for Microbiology.
Information analysis of sequences that bind the replication initiator RepA | Center for Cancer Research

Cancer.gov

The tall letters represent the highly conserved bases in DNA binding sites of several prokaryotic repressors and activators. Conservation is strongest where major grooves of the double helical DNA (represented by crests of a cosine wave) face the protein. This shows that conservation analysis alone can be used to predict the face of DNA that contacts the proteins.
Web-ware bioinformatical analysis and structure modelling of N-terminus of human multisynthetase complex auxiliary component protein p43.

PubMed

Deineko, Viktor

2006-01-01

Human multisynthetase complex auxiliary component, protein p43, is an endothelial monocyte-activating polypeptide II precursor. In this study, comprehensive sequence analysis of the N-terminus has been performed to identify structural domains, motifs, sites of post-translational modification and other functionally important parameters. The spatial structure model of full-chain protein p43 is obtained.
Functional identification and regulatory analysis of Δ6-fatty acid desaturase from the oleaginous fungus Mucor sp. EIM-10.

PubMed

Jiang, Xianzhang; Liu, Hongjiao; Niu, Yongchao; Qi, Feng; Zhang, Mingliang; Huang, Jianzhong

2017-03-01

To enlarge the diversity of the desaturases associated with PUFA biosynthesis and to better understand the transcriptional regulation of desaturases, a Δ6-desaturase gene (Md6) from Mucor sp. and its 5'-upstream sequence were functionally identified in Saccharomyces cerevisiae. Expression of the Δ6-fatty acid desaturase (Md6) in S. cerevisiae showed that Md6 could convert linolenic acid to γ-linolenic acid. Computational analysis of the promoter of Md6 suggested it contains several eukaryotic fundamental transcription regulatory elements. In vivo functional analysis of the promoter showed the 5'-upstream sequence of Md6 could initiate expression of GFP and Md6 itself in S. cerevisiae. A serial deletion analysis of the promoter suggested that the sequence between -919 and -784 bp (relative to the start site), named eMd6, is the key factor for high activity of Δ6-desaturase. The activity of Δ6-desaturase was increased by 2.8-fold and 2.5-fold when the eMd6 sequence was placed upstream of -434 in the forward or reverse orientation, respectively. To the best of our knowledge, the native promoter of Md6 from Mucor is the strongest promoter for Δ6-desaturase reported so far, and the sequence between -919 and -784 bp is an enhancer for Δ6-desaturase activity.
Mapping RNA Structure In Vitro with SHAPE Chemistry and Next-Generation Sequencing (SHAPE-Seq).

PubMed

Watters, Kyle E; Lucks, Julius B

2016-01-01

Mapping RNA structure with selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistry has proven to be a versatile method for characterizing RNA structure in a variety of contexts. SHAPE reagents covalently modify RNAs in a structure-dependent manner to create adducts at the 2'-OH group of the ribose backbone at nucleotides that are structurally flexible. The positions of these adducts are detected using reverse transcriptase (RT) primer extension, which stops one nucleotide before the modification, to create a pool of cDNAs whose lengths reflect the location of SHAPE modification. Quantification of the cDNA pools is used to estimate the "reactivity" of each nucleotide in an RNA molecule to the SHAPE reagent. High reactivities indicate nucleotides that are structurally flexible, while low reactivities indicate nucleotides that are inflexible. These SHAPE reactivities can then be used to infer RNA structures by restraining RNA structure prediction algorithms. Here, we provide a state-of-the-art protocol describing how to perform in vitro RNA structure probing with SHAPE chemistry using next-generation sequencing to quantify cDNA pools and estimate reactivities (SHAPE-Seq). The use of next-generation sequencing allows for higher throughput, more consistent data analysis, and multiplexing capabilities. The technique described herein, SHAPE-Seq v2.0, uses a universal reverse transcription priming site that is ligated to the RNA after SHAPE modification. The introduced priming site allows for the structural analysis of an RNA independent of its sequence.
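The relationship between reverse-transcription stops and modified nucleotides described in this abstract can be made concrete with a small sketch. Because RT halts one nucleotide before the adduct, the modified nucleotide lies immediately 5' of the last RNA nucleotide copied. The reactivity calculation shown here is a deliberately crude background subtraction chosen for illustration. It is not the estimation procedure of the SHAPE-Seq pipeline, which the abstract does not describe, and all function names and counts are hypothetical.

```python
def stop_to_modified_position(last_copied_position):
    """Map an RT stop to the modified nucleotide (1-based RNA coordinates, 5'->3').

    RT copies the RNA 3'->5' and halts one nucleotide before the adduct, so the
    modified nucleotide sits immediately 5' of the last nucleotide copied.
    """
    return last_copied_position - 1

def crude_reactivities(plus_stops, minus_stops, rna_length):
    """Background-subtracted stop frequencies per nucleotide (illustrative only)."""
    def freqs(stops):
        counts = [0] * (rna_length + 1)        # index 0 is unused (1-based positions)
        for pos in stops:
            mod = stop_to_modified_position(pos)
            if 1 <= mod <= rna_length:
                counts[mod] += 1
        total = sum(counts) or 1               # avoid division by zero on empty input
        return [c / total for c in counts]

    plus, minus = freqs(plus_stops), freqs(minus_stops)
    # Subtract the no-reagent (-) channel and clip negative values to zero.
    return [max(p - m, 0.0) for p, m in zip(plus[1:], minus[1:])]

# Hypothetical stop positions for reagent-treated (+) and control (-) libraries.
reactivities = crude_reactivities(plus_stops=[3, 3, 3, 5], minus_stops=[3, 5], rna_length=5)
```

In this toy case only nucleotide 2 has more stops in the treated library than in the control, so it is the only position with a non-zero value.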
Short interspersed elements (SINEs) are a major source of canine genomic diversity.

PubMed

Wang, Wei; Kirkness, Ewen F

2005-12-01

SINEs are retrotransposons that have enjoyed remarkable reproductive success during the course of mammalian evolution, and have played a major role in shaping mammalian genomes. Previously, an analysis of survey-sequence data from an individual dog (a poodle) indicated that canine genomes harbor a high frequency of alleles that differ only by the absence or presence of a SINEC_Cf repeat. Comparison of this survey-sequence data with a draft genome sequence of a distinct dog (a boxer) has confirmed this prediction, and revealed the chromosomal coordinates for >10,000 loci that are bimorphic for SINEC_Cf insertions. Analysis of SINE insertion sites from the genomes of nine additional dogs indicates that 3%-5% are absent from either the poodle or boxer genome sequences--suggesting that an additional 10,000 bimorphic loci could be readily identified in the general dog population. We describe a methodology that can be used to identify these loci, and could be adapted to exploit these bimorphic loci for genotyping purposes. Approximately half of all annotated canine genes contain SINEC_Cf repeats, and these elements are occasionally transcribed. When transcribed in the antisense orientation, they provide splice acceptor sites that can result in incorporation of novel exons. The high frequency of bimorphic SINE insertions in the dog population is predicted to provide numerous examples of allele-specific transcription patterns that will be valuable for the study of differential gene expression among multiple dog breeds.
Epitope mapping of the variable repetitive region with the MB antigen of Ureaplasma urealyticum.

PubMed Central

Zheng, X; Lau, K; Frazier, M; Cassell, G H; Watson, H L

1996-01-01

One of the major surface structures of Ureaplasma urealyticum recognized by antibodies of patients during infection is the MB antigen. Previously, we showed by Western blot (immunoblot) analysis that any one of the anti-MB monoclonal antibodies (MAbs) 3B1.5, 5B1.1, and 10C6.6 could block the binding of patient antibodies to MB. Subsequent DNA sequencing revealed that a unique six-amino-acid direct tandem repeat region composed the carboxy two-thirds of this antigen. In the present study, using antibody-reactive peptide scanning of this repeat region, we demonstrated that the amino acids defining the epitopes for MAbs 3B1.5, 5B1.1, and 10C6.6 are EQP, GK, and KEQPA, respectively. Peptide scanning analysis of an infected patient's serum antibody response showed that the dominant epitope was defined by the sequence PAGK. Mapping of these continuous epitopes revealed overlap between all MAb and patient polyclonal antibody binding sites, thus explaining the ability of a single MAb to apparently block all polyclonal antibody binding sites. We also show that a single amino acid difference in the sequence of the repeats of serovars 3 and 14 accounts for the lack of reactivity with serovar 14 of two of the serovar 3-specific MAbs. Finally, the data demonstrate the need to obtain the sequences of the mba genes of all serovars before an effective serovar-specific antibody detection method can be developed. PMID:8914774
Generation and Analysis of the Expressed Sequence Tags from the Mycelium of Ganoderma lucidum

PubMed Central

Huang, Yen-Hua; Wu, Hung-Yi; Wu, Keh-Ming; Liu, Tze-Tze; Liou, Ruey-Fen; Tsai, Shih-Feng; Shiao, Ming-Shi; Ho, Low-Tone; Tzean, Shean-Shong; Yang, Ueng-Cheng

2013-01-01

Ganoderma lucidum (G. lucidum) is a medicinal mushroom renowned in East Asia for its potential biological effects. To enable a systematic exploration of the genes associated with the various phenotypes of the fungus, the genome consortium of G. lucidum has carried out an expressed sequence tag (EST) sequencing project. Using a Sanger sequencing based approach, 47,285 ESTs were obtained from in vitro cultures of G. lucidum mycelium of various durations. These ESTs were further clustered and merged into 7,774 non-redundant expressed loci. The features of these expressed contigs were explored in terms of over-representation, alternative splicing, and natural antisense transcripts. Our results provide an invaluable information resource for exploring the G. lucidum transcriptome and its regulation. Many of the genes over-represented in fast-growing dikaryotic mycelium are closely related to growth, for example those involved in cell wall and bioactive compound synthesis. In addition, the EST-genome alignments containing putative cassette exons and retained introns were manually curated and then used to make inferences about the predominating splice-site recognition mechanism of G. lucidum. Moreover, a number of putative antisense transcripts have been pinpointed, two of which are likely to reveal hitherto undiscovered biological pathways. To allow users to access the data and the initial analysis of the results of this project, a dedicated web site has been created at http://csb2.ym.edu.tw/est/. PMID:23658685
Cloning and sequencing of the pheP gene, which encodes the phenylalanine-specific transport system of Escherichia coli.

PubMed Central

Pi, J; Wookey, P J; Pittard, A J

1991-01-01

The phenylalanine-specific permease gene (pheP) of Escherichia coli has been cloned and sequenced. The gene was isolated on a 6-kb Sau3AI fragment from a chromosomal library, and its presence was verified by complementation of a mutant lacking the functional phenylalanine-specific permease. Subcloning from this fragment localized the pheP gene on a 2.7-kb HindIII-HindII fragment. The nucleotide sequence of this 2.7-kb region was determined. An open reading frame was identified which extends from a putative start point of translation (GTG at position 636) to a termination signal (TAA at position 2010). The assignment of the GTG as the initiation codon was verified by site-directed mutagenesis of the initiation codon and by introducing a chain termination mutation into the pheP-lacZ fusion construct. A single initiation site of transcription 30 bp upstream of the start point of translation was identified by primer extension analysis. The pheP structural gene consists of 1,374 nucleotides specifying a protein of 458 amino acid residues. The PheP protein is very hydrophobic (71% nonpolar residues). A topological model predicted from the sequence analysis defines 12 transmembrane segments. This protein is highly homologous with the AroP (general aromatic transport) system of E. coli (59.6% identity) and to a lesser extent with the yeast permeases CAN1 (arginine), PUT4 (proline), and HIP1 (histidine) of Saccharomyces cerevisiae. PMID:1711024
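The coordinates quoted in this abstract are internally consistent, which a short calculation makes explicit. The sketch below assumes that both positions (636 and 2010) refer to the first nucleotide of the respective codon in 1-based numbering; the abstract does not state this convention. The function names are hypothetical.

```python
def coding_length(start_codon_pos, stop_codon_pos):
    """Nucleotides from the first base of the start codon up to, but not
    including, the first base of the stop codon (1-based positions)."""
    if stop_codon_pos <= start_codon_pos:
        raise ValueError("stop codon must lie downstream of the start codon")
    return stop_codon_pos - start_codon_pos


def residues_encoded(coding_nt):
    """Number of amino acid residues encoded by a stop-free coding length."""
    if coding_nt % 3 != 0:
        raise ValueError("coding length is not a whole number of codons")
    return coding_nt // 3


# Positions from the abstract: GTG at 636, TAA at 2010.
# ASSUMPTION: each position marks the first base of its codon.
phep_coding_nt = coding_length(636, 2010)
phep_residues = residues_encoded(phep_coding_nt)

print(phep_coding_nt, "nucleotides ->", phep_residues, "residues")
```

With that convention the result agrees with the 1,374 nucleotides and 458 residues reported for the pheP structural gene.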
Diversity of the small subunit ribosomal RNA gene of the arbuscular mycorrhizal fungi colonizing Clintonia borealis from a mixed-wood boreal forest.

PubMed

DeBellis, Tonia; Widden, Paul

2006-11-01

Arbuscular mycorrhizal fungi (AMF) communities in Clintonia borealis roots from a boreal mixed forest in northwestern Québec were investigated. Roots were sampled from 100 m² plots whose overstory was dominated by either trembling aspen (Populus tremuloides Michx.), white birch (Betula papyrifera Marsh.), or mixed white spruce (Picea glauca (Moench) Voss) and balsam fir (Abies balsamea (L.) Mill.). Part of the 18S ribosomal gene of the AMF was amplified and the resulting PCR products were cloned. Restriction analysis of the 576 resulting clones yielded 92 different restriction patterns which were then sequenced. Fifty-two sequences closely matched other Glomus sequences from GenBank. Phylogenetic analysis revealed 10 different AMF sequence types, most of which clustered with other uncultured AM sequences from plant roots from various field sites. Compared with other AMF communities from comparable studies, richness and diversity were higher than observed in an arable field, but lower than seen in a tropical forest and a temperate wetland. The AMF communities from Clintonia roots under the different canopy types did not differ significantly and the dominant sequence type, which clustered with AM sequences from a variety of environments and hosts at distant geographical locations, represented 66.9% of all the clones analyzed.
VKCDB: voltage-gated K+ channel database updated and upgraded.

PubMed

Gallin, Warren J; Boutet, Patrick A

2011-01-01

The Voltage-gated K(+) Channel DataBase (VKCDB) (http://vkcdb.biology.ualberta.ca) makes a comprehensive set of sequence data readily available for phylogenetic and comparative analysis. The current update contains 2063 entries for full-length or nearly full-length unique channel sequences from Bacteria (477), Archaea (18) and Eukaryotes (1568), an increase from 346 solely eukaryotic entries in the original release. In addition to protein sequences for channels, nucleotide sequences of the open reading frames corresponding to the amino acid sequences are now available and can be extracted in parallel with sets of protein sequences. Channels are categorized into subfamilies by phylogenetic analysis and by using hidden Markov model analyses. Although the raw database contains a number of fragmentary, duplicated, obsolete and non-channel sequences that were collected in early steps of data collection, the web interface will only return entries that have been validated as likely K(+) channels. The retrieval function of the web interface allows retrieval of entries that contain a substantial fraction of the core structural elements of VKCs, fragmentary entries, or both. The full database can be downloaded from the web site as either a MySQL dump or an XML dump. We have now implemented automated updates at quarterly intervals.
In Silico Analysis of Correlations between Protein Disorder and Post-Translational Modifications in Algae

PubMed Central

Kurotani, Atsushi; Sakurai, Tetsuya

2015-01-01

Recent proteome analyses have reported that intrinsically disordered regions (IDRs) of proteins play important roles in biological processes. In higher plants whose genomes have been sequenced, the correlation between IDRs and post-translational modifications (PTMs) has been reported. The genomes of various eukaryotic algae as common ancestors of plants have also been sequenced. However, no analysis of the relationship to protein properties such as structure and PTMs in algae has been reported. Here, we describe correlations between IDR content and the number of PTM sites for phosphorylation, glycosylation, and ubiquitination, and between IDR content and regions rich in proline, glutamic acid, serine, and threonine (PEST) and transmembrane helices in the sequences of 20 algae proteomes. Phosphorylation, O-glycosylation, ubiquitination, and PEST preferentially occurred in disordered regions. In contrast, transmembrane helices were favored in ordered regions. N-glycosylation tended to occur in ordered regions in most of the studied algae; however, it correlated positively with disordered protein content in diatoms. Additionally, we observed that disordered protein content and the number of PTM sites were significantly increased in the species-specific protein clusters compared to common protein clusters among the algae. Moreover, there were specific relationships between IDRs and PTMs among the algae from different groups. PMID:26307970
Tomato (Solanum lycopersicum) variety discrimination and hybridization analysis based on the 5S rRNA region.

PubMed

Sun, Yan-Lin; Kang, Ho-Min; Kim, Young-Sik; Baek, Jun-Pill; Zheng, Shi-Lin; Xiang, Jin-Jun; Hong, Soon-Kwan

2014-05-04

The tomato (Solanum lycopersicum) is a major vegetable crop worldwide. To satisfy popular demand, more than 500 tomato varieties have been bred. However, a clear means of variety identification has not been found. Thorough understanding of the phylogenetic relationship and hybridization information of tomato varieties is very important for further variety breeding. Thus, in this study, we collected 26 tomato varieties and attempted to distinguish them based on the 5S rRNA region, which is widely used in the determination of phylogenetic relations. Sequence analysis of the 5S rRNA region suggested that a large number of nucleotide variations exist among tomato varieties. These variable nucleotide sites were also informative regarding hybridization. Chromas sequencing of Yellow Mountain View and Seuwiteuking varieties indicated three and one variable nucleotide sites, respectively, in the non-transcribed spacer (NTS) of the 5S rRNA region showing hybridization. Based on a phylogenetic tree constructed using the 5S rRNA sequences, we observed that 16 tomato varieties were divided into three groups at 95% similarity. Rubiking and Sseommeoking, Lang Selection Procedure and Seuwiteuking, and Acorn Gold and Yellow Mountain View exhibited very high identity with their partners. This work will aid variety authentication and provides a basis for further tomato variety breeding.
Interferon-gamma of the giant panda (Ailuropoda melanoleuca): complementary DNA cloning, expression, and phylogenetic analysis.

PubMed

Tao, Yaqiong; Zeng, Bo; Xu, Liu; Yue, Bisong; Yang, Dong; Zou, Fangdong

2010-01-01

Interferon-gamma (IFN-gamma) is the only member of the type II IFN family and is vital in the regulation of immune and inflammatory responses. Herein we report the cloning, expression, and sequence analysis of IFN-gamma from the giant panda (Ailuropoda melanoleuca). The open reading frame of this gene is 501 base pairs in length and encodes a polypeptide consisting of 166 amino acids. All conserved N-linked glycosylation sites and cysteine residues among carnivores were found in the predicted amino acid sequence of the giant panda. Recombinant giant panda IFN-gamma with a V5 epitope and polyhistidine tag was expressed in HEK293 host cells and confirmed by Western blotting. Phylogenetic analysis of mammalian IFN-gamma-coding sequences indicated that the giant panda IFN-gamma was closest to that of carnivores, then to ungulates and dolphin, and shared a distant relationship with mouse and human. These results represent a first step into the study of IFN-gamma in giant panda.
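The relationship between the 501-bp open reading frame and the 166-residue polypeptide follows if the quoted ORF length includes the stop codon. That convention is an assumption here, since the abstract does not say so. A minimal sketch, with a hypothetical function name:

```python
def polypeptide_length(orf_bp, includes_stop_codon=True):
    """Residues encoded by an ORF of orf_bp base pairs.

    When the ORF length counts the stop codon, one codon encodes no residue.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a whole number of codons")
    codons = orf_bp // 3
    return codons - 1 if includes_stop_codon else codons


# 501 bp from the abstract; ASSUMPTION: this length includes the stop codon.
panda_ifn_gamma_residues = polypeptide_length(501)
print(panda_ifn_gamma_residues)
```

Under that assumption, 501 bp corresponds to 167 codons, of which 166 encode amino acids, in agreement with the abstract.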
The Neandertal type site revisited: Interdisciplinary investigations of skeletal remains from the Neander Valley, Germany

PubMed Central

Schmitz, Ralf W.; Serre, David; Bonani, Georges; Feine, Susanne; Hillgruber, Felix; Krainitzki, Heike; Pääbo, Svante; Smith, Fred H.

2002-01-01

The 1856 discovery of the Neandertal type specimen (Neandertal 1) in western Germany marked the beginning of human paleontology and initiated the longest-standing debate in the discipline: the role of Neandertals in human evolutionary history. We report excavations of cave sediments that were removed from the Feldhofer caves in 1856. These deposits have yielded over 60 human skeletal fragments, along with a large series of Paleolithic artifacts and faunal material. Our analysis of this material represents the first interdisciplinary analysis of Neandertal remains incorporating genetic, direct dating, and morphological dimensions simultaneously. Three of these skeletal fragments fit directly on Neandertal 1, whereas several others have distinctively Neandertal features. At least three individuals are represented in the skeletal sample. Radiocarbon dates for Neandertal 1, from which a mtDNA sequence was determined in 1997, and a second individual indicate an age of ≈40,000 yr for both. mtDNA analysis on the same second individual yields a sequence that clusters with other published Neandertal sequences. PMID:12232049
