silico sequence analysis: Topics by Science.gov

Sample records for silico sequence analysis

The Salmonella In Silico Typing Resource (SISTR): An Open Web-Accessible Tool for Rapidly Typing and Subtyping Draft Salmonella Genome Assemblies.

PubMed

Yoshida, Catherine E; Kruczkiewicz, Peter; Laing, Chad R; Lingohr, Erika J; Gannon, Victor P J; Nash, John H E; Taboada, Eduardo N

2016-01-01

For nearly 100 years serotyping has been the gold standard for the identification of Salmonella serovars. Despite the increasing adoption of DNA-based subtyping approaches, serotype information remains a cornerstone in food safety and public health activities aimed at reducing the burden of salmonellosis. At the same time, recent advances in whole-genome sequencing (WGS) promise to revolutionize our ability to perform advanced pathogen characterization in support of improved source attribution and outbreak analysis. We present the Salmonella In Silico Typing Resource (SISTR), a bioinformatics platform for rapidly performing simultaneous in silico analyses for several leading subtyping methods on draft Salmonella genome assemblies. In addition to performing serovar prediction by genoserotyping, this resource integrates sequence-based typing analyses for: Multi-Locus Sequence Typing (MLST), ribosomal MLST (rMLST), and core genome MLST (cgMLST). We show how phylogenetic context from cgMLST analysis can supplement the genoserotyping analysis and increase the accuracy of in silico serovar prediction to over 94.6% on a dataset comprised of 4,188 finished genomes and WGS draft assemblies. In addition to allowing analysis of user-uploaded whole-genome assemblies, the SISTR platform incorporates a database comprising over 4,000 publicly available genomes, allowing users to place their isolates in a broader phylogenetic and epidemiological context. The resource incorporates several metadata driven visualizations to examine the phylogenetic, geospatial and temporal distribution of genome-sequenced isolates. As sequencing of Salmonella isolates at public health laboratories around the world becomes increasingly common, rapid in silico analysis of minimally processed draft genome assemblies provides a powerful approach for molecular epidemiology in support of public health investigations. 
Moreover, this type of integrated analysis using multiple sequence-based methods of sub-typing allows for continuity with historical serotyping data as we transition towards the increasing adoption of genomic analyses in epidemiology. The SISTR platform is freely available on the web at https://lfz.corefacility.ca/sistr-app/.
[Prediction of ETA oligopeptides antagonists from Glycine max based on in silico proteolysis].

PubMed

Qiao, Lian-Sheng; Jiang, Lu-di; Luo, Gang-Gang; Lu, Fang; Chen, Yan-Kun; Wang, Ling-Zhi; Li, Gong-Yu; Zhang, Yan-Ling

2017-02-01

Oligopeptides are one of the key pharmaceutically effective constituents of traditional Chinese medicine (TCM). Systematic study of the composition and efficacy of TCM oligopeptides is essential for analyzing the material basis and mechanism of TCM. In this study, potential anti-hypertensive oligopeptides from Glycine max and their endothelin receptor A (ETA) antagonistic activity were discovered and predicted based on in silico technologies. The main protein sequences of G. max were collected and oligopeptides were obtained using in silico gastrointestinal tract proteolysis. Then, the pharmacophore of ETA antagonistic peptides was constructed; it included one hydrophobic feature, one ionizable negative feature, one ring aromatic feature and five excluded volumes. Meanwhile, a three-dimensional structure of ETA was developed by homology modeling methods for further docking studies. According to docking analysis and consensus score, the key amino acid GLN165 was identified for ETA antagonistic activity. In total, 27 oligopeptides from G. max were predicted as potential ETA antagonists by pharmacophore and docking studies. In silico proteolysis could be used to analyze the protein sequences from TCM. By combining in silico proteolysis and molecular simulation, the biological activities of oligopeptides could be predicted rapidly based on the known TCM protein sequence. This might provide a methodological basis for rapidly and efficiently implementing the mechanism analysis of TCM oligopeptides. Copyright© by the Chinese Pharmaceutical Association.
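The in silico proteolysis step described above can be illustrated with a minimal sketch. It assumes a simplified trypsin-like rule (cleave after K or R unless the next residue is P), which covers only one gastrointestinal enzyme; the abstract does not state which enzymes or cleavage rules were used. The function name `digest`, the length limits, and the input sequence are hypothetical.

```python
def digest(protein: str, min_len: int = 2, max_len: int = 10) -> list:
    """Cleave after K or R unless followed by P (simplified rule, an assumption)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_res = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_res != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Keep any C-terminal fragment left after the last cleavage site.
    if start < len(protein):
        peptides.append(protein[start:])
    # Retain only oligopeptide-sized fragments (limits are illustrative).
    return [p for p in peptides if min_len <= len(p) <= max_len]


toy_protein = "MAKLVRPGWKDEFR"  # hypothetical sequence, not from G. max
print(digest(toy_protein))
```

Fragments produced this way would then be the input to the pharmacophore and docking screens; a fuller model would add pepsin and chymotrypsin rules.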
Whole-exome sequencing analysis of Waardenburg syndrome in a Chinese family.

PubMed

Chen, Dezhong; Zhao, Na; Wang, Jing; Li, Zhuoyu; Wu, Changxin; Fu, Jie; Xiao, Han

2017-01-01

Waardenburg syndrome (WS) is a dominantly inherited, genetically heterogeneous auditory-pigmentary syndrome characterized by non-progressive sensorineural hearing loss and iris discoloration. By whole-exome sequencing (WES), we identified a nonsense mutation (c.598C>T) in the PAX3 gene, predicted to be disease-causing by in silico analysis. This is the first report of a genetically diagnosed case of WS with the PAX3 c.598C>T nonsense mutation in a family of Chinese ethnic origin, identified by WES and in silico functional prediction methods.
Whole-exome sequencing analysis of Waardenburg syndrome in a Chinese family

PubMed Central

Chen, Dezhong; Zhao, Na; Wang, Jing; Li, Zhuoyu; Wu, Changxin; Fu, Jie; Xiao, Han

2017-01-01

Waardenburg syndrome (WS) is a dominantly inherited, genetically heterogeneous auditory-pigmentary syndrome characterized by non-progressive sensorineural hearing loss and iris discoloration. By whole-exome sequencing (WES), we identified a nonsense mutation (c.598C>T) in the PAX3 gene, predicted to be disease-causing by in silico analysis. This is the first report of a genetically diagnosed case of WS with the PAX3 c.598C>T nonsense mutation in a family of Chinese ethnic origin, identified by WES and in silico functional prediction methods. PMID:28690861
When Genomics Is Not Enough: Experimental Evidence for a Decrease in LINE-1 Activity During the Evolution of Australian Marsupials

PubMed Central

Gallus, Susanne; Lammers, Fritjof

2016-01-01

The autonomous transposable element LINE-1 is a highly abundant element that makes up between 15% and 20% of therian mammal genomes. Since their origin before the divergence of marsupials and placental mammals, LINE-1 elements have contributed actively to the genome landscape. A previous in silico screen of the Tasmanian devil genome revealed a lack of functional coding LINE-1 sequences. In this study we present the results of an in vitro analysis from a partial LINE-1 reverse transcriptase coding sequence in five marsupial species. Our experimental screen supports the in silico findings of the genome-wide degradation of LINE-1 sequences in the Tasmanian devil, and identifies a high frequency of degraded LINE-1 sequences in other Australian marsupials. The comparison between the experimentally obtained LINE-1 sequences and reference genome assemblies suggests that conclusions from in silico analyses of retrotransposition activity can be influenced by incomplete genome assemblies from short reads. PMID:27389686
Purification, developmental expression, and in silico characterization of α-amylase inhibitor from Echinochloa frumentacea.

PubMed

Panwar, Priyankar; Verma, A K; Dubey, Ashutosh

2018-05-01

Barnyard (Echinochloa frumentacea) and finger (Eleusine coracana) millet growing in the northwestern Himalaya were explored for α-amylase inhibitor (α-AI). The mature seeds of barnyard millet variety PRJ1 had the maximum α-AI activity, which increased across developmental stages. α-AI was purified up to 22.25-fold from barnyard millet variety PRJ1. Semi-quantitative PCR of different developmental stages of barnyard millet seeds showed increased levels of the transcript from 7 to 28 days. Sequence analysis revealed a 315 bp nucleotide sequence encoding 104 amino acids, with a molecular weight of 10.72 kDa. The predicted 3D structure of α-AI was 86.73% similar to a bifunctional inhibitor of ragi. In silico analysis of 71 α-AI protein sequences was carried out for biochemical features, homology search, multiple sequence alignment, phylogenetic tree construction, motif, and superfamily distribution of protein sequences. Analysis of the multiple sequence alignment revealed a conserved region, NPLP[S/G]CRWYVV[S/Q][Q/R]TCG[V/I], throughout the sequences. Superfam analysis revealed that α-AI protein sequences were distributed among seven different superfamilies.
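The conserved region is written with bracketed alternatives such as [S/G]. As a rough sketch, that notation can be turned into a regular expression and searched against a protein sequence. The helper names and both test sequences below are hypothetical; only the motif string comes from the abstract.

```python
import re

# Motif exactly as reported in the abstract.
MOTIF = "NPLP[S/G]CRWYVV[S/Q][Q/R]TCG[V/I]"


def motif_to_regex(motif: str) -> re.Pattern:
    """Turn '[S/G]' style alternatives into regex character classes '[SG]'."""
    return re.compile(motif.replace("/", ""))


PATTERN = motif_to_regex(MOTIF)


def find_motif(sequence: str) -> int:
    """Return the 0-based start of the first motif match, or -1 if absent."""
    match = PATTERN.search(sequence)
    return match.start() if match else -1


toy_hit = "MAAANPLPSCRWYVVSQTCGVKK"   # hypothetical, contains the motif
toy_miss = "MAAANPLPACRWYVVSQTCGVKK"  # hypothetical, A at the [S/G] position
print(find_motif(toy_hit), find_motif(toy_miss))
```

A plain regex treats every position as equally strict; a profile or HMM would be the usual choice when partial matches matter.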
Rapid in silico cloning of genes using expressed sequence tags (ESTs).

PubMed

Gill, R W; Sanseau, P

2000-01-01

Expressed sequence tags (ESTs) are short single-pass DNA sequences obtained from either end of cDNA clones. These ESTs are derived from a vast number of cDNA libraries obtained from different species. Human ESTs make up the bulk of the data and have been widely used to identify new members of gene families, as markers on the human chromosomes, to discover polymorphism sites and to compare expression patterns in different tissues or pathological states. Information strategies have been devised to query EST databases. Since most of the analysis is performed with a computer, the term "in silico" strategy has been coined. In this chapter we will review the current status of EST databases, the pros and cons of EST-type data and describe possible strategies to retrieve meaningful information.
Evaluation of a Genome-Scale In Silico Metabolic Model for Geobacter metallireducens Using Proteomic Data from a Field Biostimulation Experiment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fang, Yilin; Wilkins, Michael J.; Yabusaki, Steven B.

2012-12-12

Biomass and shotgun global proteomics data that reflected relative protein abundances from samples collected during the 2008 experiment at the U.S. Department of Energy Integrated Field-Scale Subsurface Research Challenge site in Rifle, Colorado, provided an unprecedented opportunity to validate a genome-scale metabolic model of Geobacter metallireducens and assess its performance with respect to prediction of metal reduction, biomass yield, and growth rate under dynamic field conditions. Reconstructed from annotated genomic sequence, biochemical, and physiological data, the constraint-based in silico model of G. metallireducens relates an annotated genome sequence to the physiological functions with 697 reactions controlled by 747 enzyme-coding genes. Proteomic analysis showed that 180 of the 637 G. metallireducens proteins detected during the 2008 experiment were associated with specific metabolic reactions in the in silico model. When the field-calibrated Fe(III) terminal electron acceptor process reaction in a reactive transport model for the field experiments was replaced with the genome-scale model, the model predicted that the largest metabolic fluxes through the in silico model reactions generally correspond to the highest abundances of proteins that catalyze those reactions. Central metabolism predicted by the model agrees well with protein abundance profiles inferred from proteomic analysis. Model discrepancies with the proteomic data, such as the relatively low fluxes through amino acid transport and metabolism, revealed pathways or flux constraints in the in silico model that could be updated to more accurately predict metabolic processes that occur in the subsurface environment.
In Silico Detection of Sequence Variations Modifying Transcriptional Regulation

PubMed Central

Andersen, Malin C; Engström, Pär G; Lithwick, Stuart; Arenillas, David; Eriksson, Per; Lenhard, Boris; Wasserman, Wyeth W; Odeberg, Jacob

2008-01-01

Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation. PMID:18208319
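To make the idea of allele-specific binding site prediction concrete, the sketch below scores a reference and a variant sequence against a position frequency matrix using log-odds and reports the change in the best score. This is not the RAVEN implementation; the 4-bp matrix, the uniform background, and both sequences are hypothetical values chosen for illustration.

```python
import math

# Hypothetical position frequency matrix for a 4-bp binding site (one dict per position).
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base composition


def site_score(site: str) -> float:
    """Log-odds score (bits) of one window against the matrix."""
    return sum(math.log2(PFM[i][base] / BACKGROUND) for i, base in enumerate(site))


def best_score(sequence: str) -> float:
    """Highest-scoring window on the given strand."""
    width = len(PFM)
    return max(site_score(sequence[i:i + width])
               for i in range(len(sequence) - width + 1))


ref_allele = "TTAGCTAA"  # hypothetical reference sequence
alt_allele = "TTAGATAA"  # same sequence with a C>A SNP at index 4
delta = best_score(alt_allele) - best_score(ref_allele)
print(f"score change caused by the SNP: {delta:.2f} bits")
```

A negative change suggests the variant weakens the predicted site. As the abstract notes, such scores alone have poor specificity, so they are most useful when restricted to conserved regions or to transcription factors already implicated in regulating the gene.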
In silico prediction of the pathogenic effect of a novel variant of BCKDHA leading to classical maple syrup urine disease identified using clinical exome sequencing.

PubMed

Fernández-Lainez, Cynthia; Aláez-Verson, Carmen; Ibarra-González, Isabel; Enríquez-Flores, Sergio; Carrillo-Sanchez, Karol; Flores-Lagunes, Leonardo; Guillén-López, Sara; Belmont-Martínez, Leticia; Vela-Amieva, Marcela

2018-04-16

Maple syrup urine disease (MSUD) is a metabolic disorder caused by mutations in three of the branched-chain α-keto acid dehydrogenase complex (BCKDC) genes. Classical MSUD symptoms can be observed immediately after birth and include ketoacidosis, irritability, lethargy, and coma, which can lead to death or irreversible neurodevelopmental delay in survivors. The molecular diagnosis of MSUD can be time-consuming and difficult to establish using conventional Sanger sequencing because it could be due to pathogenic variants of any of the BCKDC genes. Next-generation sequencing-based methodologies have revolutionized the molecular diagnosis of inborn errors in metabolism and offer a superior approach for genotyping these patients. Here, we report an MSUD case whose molecular diagnosis was performed by clinical exome sequencing (CES), and the possible structural pathogenic effect of a novel E1α subunit pathogenic variant was analyzed using in silico analysis of the α and β subunit crystallographic structure. Molecular analysis revealed a new homozygous nonsense variant of BCKDHA, c.1267C>T (p.Gln423Ter). The novel BCKDHA variant is considered pathogenic because it caused a premature stop codon that probably led to the loss of the last 22 amino acid residues of the E1α subunit C-terminal end. In silico analysis of this region showed that it is in contact with several residues of the E1β subunit mainly through polar contacts, hydrogen bonds, and hydrophobic interactions. A CES strategy could benefit patients and families by offering precise and prompt diagnosis and better genetic counseling. Copyright © 2018 Elsevier B.V. All rights reserved.
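The correspondence between the coding change c.1267C>T and the protein change p.Gln423Ter can be checked with simple arithmetic on the standard genetic code. The short sketch below is an illustration only; the function name `codon_of` is hypothetical.

```python
def codon_of(coding_pos: int) -> tuple:
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    return (coding_pos - 1) // 3 + 1, (coding_pos - 1) % 3 + 1


STOP_CODONS = {"TAA", "TAG", "TGA"}
GLN_CODONS = ["CAA", "CAG"]  # the two glutamine codons in the standard code

codon_number, offset = codon_of(1267)
print(codon_number, offset)  # 423 1

# A C>T change at the first codon position turns either Gln codon into a stop.
mutated = ["T" + codon[1:] for codon in GLN_CODONS]
print(mutated, all(c in STOP_CODONS for c in mutated))
```

Position 1267 is the first base of codon 423, and both glutamine codons become stop codons (TAA or TAG) after a C>T change there, which is consistent with the reported p.Gln423Ter.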
A comprehensive characterization of rare mitochondrial DNA variants in neuroblastoma.

PubMed

Calabrese, Francesco Maria; Clima, Rosanna; Pignataro, Piero; Lasorsa, Vito Alessandro; Hogarty, Michael D; Castellano, Aurora; Conte, Massimo; Tonini, Gian Paolo; Iolascon, Achille; Gasparre, Giuseppe; Capasso, Mario

2016-08-02

Neuroblastoma, a tumor of the developing sympathetic nervous system, is a common childhood neoplasm that is often lethal. Mitochondrial DNA (mtDNA) mutations have been found in most tumors including neuroblastoma. We extracted mtDNA data from a cohort of neuroblastoma samples that had undergone Whole Exome Sequencing (WES) and also used snap-frozen samples in which mtDNA was entirely sequenced by Sanger technology. We next undertook the challenge of determining those mutations that are relevant to, or have arisen during, tumor development. The bioinformatics pipeline used to extract mitochondrial variants from matched tumor/blood samples was enriched by a set of filters inclusive of heteroplasmic fraction, nucleotide variability, and in silico prediction of pathogenicity. Our in silico multistep workflow, applied to both WES and Sanger-sequenced neuroblastoma samples, allowed us to identify a limited burden of somatic and germline mitochondrial mutations with a potential pathogenic impact. The few singleton germline and somatic mitochondrial mutations that emerged, according to our in silico analysis, do not appear to impact the development of neuroblastoma. Our findings are consistent with the hypothesis that most mitochondrial somatic mutations can be considered as 'passengers' and consequently have no discernible effect in this type of cancer.
in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, Xiaofan; Peris, David; Kominek, Jacek

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

DOE PAGES

Zhou, Xiaofan; Peris, David; Kominek, Jacek; ...

2016-09-16

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
in silico Whole Genome Sequencer & Analyzer (iWGS): a computational pipeline to guide the design and analysis of de novo genome sequencing studies

USDA-ARS's Scientific Manuscript database

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding it...
Isolation and characterization of full-length putative alcohol dehydrogenase genes from Polygonum minus

NASA Astrophysics Data System (ADS)

Hamid, Nur Athirah Abd; Ismail, Ismanizan

2013-11-01

Polygonum minus, locally known as Kesum, is an aromatic herb that is high in secondary metabolite content. Alcohol dehydrogenase (ADH) is an important enzyme that catalyzes the reversible oxidation of alcohol and aldehyde in the presence of NAD(P)(H) as co-factor. The main focus of this research is to identify the ADH gene. Total RNA was extracted from leaves of P. minus treated with 150 μM jasmonic acid. The full-length cDNA sequence of ADH was isolated via rapid amplification of cDNA ends (RACE). Subsequently, in silico analysis was conducted on the full-length cDNA sequence, and PCR was performed on genomic DNA to determine the exon and intron organization. Two ADH sequences, designated PmADH1 and PmADH2, were successfully isolated. Both sequences have an ORF of 801 bp, which encodes 266 aa residues. Nucleotide sequence comparison of PmADH1 and PmADH2 indicated that the sequences are highly similar in the ORF region but divergent in the 3' untranslated regions (UTR). The amino acid sequences differ at residue 107: PmADH1 contains a Gly (G) residue while PmADH2 contains a Cys (C) residue. The intron-exon organization of both sequences is also the same, with 3 introns and 4 exons. Based on in silico analysis, both sequences contain the "classical" short chain alcohol dehydrogenases/reductases ((c) SDRs) conserved domain. The results suggest that both sequences are members of the short chain alcohol dehydrogenase family.
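The reported ORF length and protein length are consistent if the 801 bp include the stop codon, which is an assumption here since the abstract does not say so. A small worked check (the function name is hypothetical):

```python
def orf_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but encodes no residue.
    return codons - 1 if includes_stop else codons


print(orf_protein_length(801))  # 266
```

That is, 801 / 3 = 267 codons, of which 266 encode residues and one is the stop codon.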
Prospecting Biotechnologically-Relevant Monooxygenases from Cold Sediment Metagenomes: An In Silico Approach

DOE PAGES

Musumeci, Matias A.; Lozada, Mariana; Rial, Daniela V.; ...

2017-04-09

The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. As a result, this work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.
Prospecting Biotechnologically-Relevant Monooxygenases from Cold Sediment Metagenomes: An In Silico Approach

DOE Office of Scientific and Technical Information (OSTI.GOV)

Musumeci, Matias A.; Lozada, Mariana; Rial, Daniela V.

The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. As a result, this work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.
Prospecting Biotechnologically-Relevant Monooxygenases from Cold Sediment Metagenomes: An In Silico Approach.

PubMed

Musumeci, Matías A; Lozada, Mariana; Rial, Daniela V; Mac Cormack, Walter P; Jansson, Janet K; Sjöling, Sara; Carroll, JoLynn; Dionisi, Hebe M

2017-04-09

The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. This work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.
Prospecting Biotechnologically-Relevant Monooxygenases from Cold Sediment Metagenomes: An In Silico Approach

PubMed Central

Musumeci, Matías A.; Lozada, Mariana; Rial, Daniela V.; Mac Cormack, Walter P.; Jansson, Janet K.; Sjöling, Sara; Carroll, JoLynn; Dionisi, Hebe M.

2017-01-01

The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer–Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. This work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments. PMID:28397770
In silico study of breast cancer associated gene 3 using LION Target Engine and other tools.

PubMed

León, Darryl A; Cànaves, Jaume M

2003-12-01

Sequence analysis of individual targets is an important step in annotation and validation. As a test case, we investigated human breast cancer associated gene 3 (BCA3) with LION Target Engine and with other bioinformatics tools. LION Target Engine confirmed that the BCA3 gene is located on 11p15.4 and that the two most likely splice variants (lacking exon 3 and exons 3 and 5, respectively) exist. Based on our manual curation of sequence data, it is proposed that an additional variant (missing only exon 5) published in a public sequence repository, is a prediction artifact. A significant number of new orthologs were also identified, and these were the basis for a high-quality protein secondary structure prediction. Moreover, our research confirmed several distinct functional domains as described in earlier reports. Sequence conservation from multiple sequence alignments, splice variant identification, secondary structure predictions, and predicted phosphorylation sites suggest that the removal of interaction sites through alternative splicing might play a modulatory role in BCA3. This in silico approach shows the depth and relevance of an analysis that can be accomplished by including a variety of publicly available tools with an integrated and customizable life science informatics platform.

Bioinformatics Identification of Modules of Transcription Factor Binding Sites in Alzheimer's Disease-Related Genes by In Silico Promoter Analysis and Microarrays

PubMed Central

Augustin, Regina; Lichtenthaler, Stefan F.; Greeff, Michael; Hansen, Jens; Wurst, Wolfgang; Trümbach, Dietrich

2011-01-01

The molecular mechanisms and genetic risk factors underlying Alzheimer's disease (AD) pathogenesis are only partly understood. To identify new factors, which may contribute to AD, different approaches are taken including proteomics, genetics, and functional genomics. Here, we used a bioinformatics approach and found that distinct AD-related genes share modules of transcription factor binding sites, suggesting a transcriptional coregulation. To detect additional coregulated genes, which may potentially contribute to AD, we established a new bioinformatics workflow with known multivariate methods like support vector machines, biclustering, and predicted transcription factor binding site modules by using in silico analysis and over 400 expression arrays from human and mouse. Two significant modules are composed of three transcription factor families: CTCF, SP1F, and EGRF/ZBPF, which are conserved between human and mouse APP promoter sequences. The specific combination of in silico promoter and multivariate analysis can identify regulation mechanisms of genes involved in multifactorial diseases. PMID:21559189
SACCHARIS: an automated pipeline to streamline discovery of carbohydrate active enzyme activities within polyspecific families and de novo sequence datasets.

PubMed

Jones, Darryl R; Thomas, Dallas; Alger, Nicholas; Ghavidel, Ata; Inglis, G Douglas; Abbott, D Wade

2018-01-01

Deposition of new genetic sequences in online databases is expanding at an unprecedented rate. As a result, sequence identification continues to outpace functional characterization of carbohydrate active enzymes (CAZymes). In this paradigm, the discovery of enzymes with novel functions is often hindered by high volumes of uncharacterized sequences, particularly when the enzyme sequence belongs to a family that exhibits diverse functional specificities (i.e., polyspecificity). Therefore, to direct sequence-based discovery and characterization of new enzyme activities we have developed an automated in silico pipeline entitled: Sequence Analysis and Clustering of CarboHydrate Active enzymes for Rapid Informed prediction of Specificity (SACCHARIS). This pipeline streamlines the selection of uncharacterized sequences for discovery of new CAZyme or CBM specificity from families currently maintained on the CAZy website or within user-defined datasets. SACCHARIS was used to generate a phylogenetic tree of GH43, a CAZyme family with defined subfamily designations. This analysis confirmed that large datasets can be organized into sequence clusters of manageable sizes that possess related functions. Seeding this tree with a GH43 sequence from Bacteroides dorei DSM 17855 (BdGH43b) revealed that it partitioned as a single sequence within the tree. This pattern was consistent with it possessing a unique enzyme activity for GH43, as BdGH43b is the first α-glucanase described for this family. The capacity of SACCHARIS to extract and cluster characterized carbohydrate binding module sequences was demonstrated using family 6 CBMs (i.e., CBM6s). This CBM family displays a polyspecific ligand binding profile and contains many structurally determined members. Using SACCHARIS to identify a cluster of divergent sequences, a CBM6 sequence from a unique clade was demonstrated to bind yeast mannan, which represents the first description of an α-mannan binding CBM.
Additionally, we have performed a CAZome analysis of an in-house sequenced bacterial genome and a comparative analysis of B. thetaiotaomicron VPI-5482 and B. thetaiotaomicron 7330, to demonstrate that SACCHARIS can generate "CAZome fingerprints", which differentiate between the saccharolytic potential of two related strains in silico. Establishing sequence-function and sequence-structure relationships in polyspecific CAZyme families are promising approaches for streamlining enzyme discovery. SACCHARIS facilitates this process by embedding CAZyme and CBM family trees generated from biochemically to structurally characterized sequences, with protein sequences that have unknown functions. In addition, these trees can be integrated with user-defined datasets (e.g., genomics, metagenomics, and transcriptomics) to inform experimental characterization of new CAZymes or CBMs not currently curated, and for researchers to compare differential sequence patterns between entire CAZomes. In this light, SACCHARIS provides an in silico tool that can be tailored for enzyme bioprospecting in datasets of increasing complexity and for diverse applications in glycobiotechnology.
A comprehensive characterization of rare mitochondrial DNA variants in neuroblastoma

PubMed Central

Pignataro, Piero; Lasorsa, Vito Alessandro; Hogarty, Michael D.; Castellano, Aurora; Conte, Massimo; Tonini, Gian Paolo; Iolascon, Achille; Gasparre, Giuseppe; Capasso, Mario

2016-01-01

Background Neuroblastoma, a tumor of the developing sympathetic nervous system, is a common childhood neoplasm that is often lethal. Mitochondrial DNA (mtDNA) mutations have been found in most tumors including neuroblastoma. We extracted mtDNA data from a cohort of neuroblastoma samples that had undergone Whole Exome Sequencing (WES) and also used snap-frozen samples in which mtDNA was entirely sequenced by Sanger technology. We next undertook the challenge of determining which mutations are relevant to, or arose during, tumor development. The bioinformatics pipeline used to extract mitochondrial variants from matched tumor/blood samples was enriched by a set of filters inclusive of heteroplasmic fraction, nucleotide variability, and in silico prediction of pathogenicity. Results Our in silico multistep workflow, applied to both WES and Sanger-sequenced neuroblastoma samples, allowed us to identify a limited burden of somatic and germline mitochondrial mutations with a potential pathogenic impact. Conclusions The few singleton germline and somatic mitochondrial mutations that emerged do not, according to our in silico analysis, appear to affect the development of neuroblastoma. Our findings are consistent with the hypothesis that most mitochondrial somatic mutations can be considered as ‘passengers’ and consequently have no discernible effect in this type of cancer. PMID:27351283
In silico prediction of splice-altering single nucleotide variants in the human genome.

PubMed

Jian, Xueqiu; Boerwinkle, Eric; Liu, Xiaoming

2014-12-16

In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
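
The idea of measuring the splicing signal change caused by a variant with a position weight matrix can be sketched as follows. This is a minimal illustration only, not the authors' model: the training sites, the uniform background frequency, the pseudocount and all function names are assumptions chosen for the example.

```python
import math

# Hypothetical aligned 5' splice-site sequences (3 exonic + 6 intronic bases).
# Toy data for illustration only; not taken from the study.
TRAINING_SITES = [
    "CAGGTAAGT",
    "AAGGTGAGT",
    "CAGGTAAGA",
    "TAGGTAAGT",
    "CAGGTGAGG",
    "AAGGTAAGC",
]
BASES = "ACGT"
BACKGROUND = 0.25   # assumed uniform background base frequency
PSEUDOCOUNT = 1.0   # assumed pseudocount to avoid log(0)


def build_pwm(sites):
    """Return one {base: log2-odds score} dict per alignment column."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + PSEUDOCOUNT * len(BASES)
        pwm.append({
            base: math.log2(((column.count(base) + PSEUDOCOUNT) / total) / BACKGROUND)
            for base in BASES
        })
    return pwm


def pwm_score(pwm, seq):
    """Sum the per-position log-odds scores of seq (same length as the PWM)."""
    return sum(column[base] for column, base in zip(pwm, seq))


pwm = build_pwm(TRAINING_SITES)
reference = "CAGGTAAGT"
variant = "CAGATAAGT"   # G>A at the first intronic position (index 3)

# A negative delta means the variant weakens the predicted splice signal.
delta = pwm_score(pwm, variant) - pwm_score(pwm, reference)
print(f"reference={pwm_score(pwm, reference):.2f} "
      f"variant={pwm_score(pwm, variant):.2f} delta={delta:.2f}")
```

Scoring the reference and variant alleles with the same matrix and reporting the difference gives a single signed number per variant, which is the kind of directly interpretable output the abstract asks for.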
In silico characterization of a novel pathogenic deletion mutation identified in XPA gene in a Pakistani family with severe xeroderma pigmentosum.

PubMed

Nasir, Muhammad; Ahmad, Nafees; Sieber, Christian M K; Latif, Amir; Malik, Salman Akbar; Hameed, Abdul

2013-09-24

Xeroderma Pigmentosum (XP) is a rare skin disorder characterized by skin hypersensitivity to sunlight and abnormal pigmentation. The aim of this study was to investigate the genetic cause of a severe XP phenotype in a consanguineous Pakistani family and to characterize in silico any identified disease-associated mutation. The XP complementation group was assigned by genotyping the family for known XP loci. Genotyping data mapped the family to the complementation group A locus, involving the XPA gene. Mutation analysis of the candidate XP gene by DNA sequencing revealed a novel deletion mutation (c.654del A) in exon 5 of the XPA gene. The c.654del A mutation causes a frameshift that prematurely terminates the protein and results in a truncated product of 222 amino acid (aa) residues instead of 273 (p.Lys218AsnfsX5). In silico tools were applied to study the likelihood of changes in structural motifs and thus in the interaction of the mutated protein with binding partners. In silico analysis of the mutant protein sequence predicted an effect on the aa residue that adopts a coiled-coil structure. The coiled-coil structure has an important role in key cellular interactions, especially with DNA damage-binding protein 2 (DDB2), which has an important role in the DDB-mediated nucleotide excision repair (NER) system. Our findings support genetic and clinical heterogeneity in XP. The study also predicts a critical role for the DDB2-binding region of the XPA protein in the NER pathway and opens an avenue for further research on the functional role of the mutated protein domain.
Isolation and in silico analysis of Fe-superoxide dismutase in the cyanobacterium Nostoc commune.

PubMed

Kesheri, Minu; Kanchan, Swarna; Richa; Sinha, Rajeshwar P

2014-12-15

Cyanobacteria are known to endure various stress conditions due to the inbuilt potential for oxidative stress alleviation owing to the presence of an array of antioxidants. The present study shows that Antarctic cyanobacterium Nostoc commune possesses two antioxidative enzymes viz., superoxide dismutase (SOD) and catalase that jointly cope with environmental stresses prevailing at its natural habitat. Native-PAGE analysis illustrates the presence of a single prominent isoform recognized as Fe-SOD and three distinct isoforms of catalase. The protein sequence of Fe-SOD in N. commune retrieved from NCBI protein sequence database was used for in silico analysis. 3D structure of N. commune was predicted by comparative modeling using MODELLER 9v11. Further, this model was validated for its quality by Ramachandran plot, ERRAT, Verify 3D and ProSA-web which revealed good structure quality of the model. Multiple sequence alignment showed high conservation in N and C-terminal domain regions along with all metal binding positions in Fe-SOD which were also found to be highly conserved in all 28 cyanobacterial species under study, including N. commune. In silico prediction of isoelectric point and molecular weight of Fe-SOD was found to be 5.48 and 22,342.98Da respectively. The phylogenetic tree revealed that among 28 cyanobacterial species, Fe-SOD in N. commune was the closest evolutionary homolog of Fe-SOD in Nostoc punctiforme as evident by strong bootstrap value. Thus, N. commune may serve as a good biological model for studies related to survival of life under extreme conditions prevailing at the Antarctic region. Moreover cyanobacteria may be exploited for biochemical and biotechnological applications of enzymatic antioxidants. Copyright © 2014 Elsevier B.V. All rights reserved.
Single nucleotide polymorphisms from Theobroma cacao expressed sequence tags associated with witches' broom disease in cacao.

PubMed

Lima, L S; Gramacho, K P; Carels, N; Novais, R; Gaiotto, F A; Lopes, U V; Gesteira, A S; Zaidan, H A; Cascardo, J C M; Pires, J L; Micheli, F

2009-07-14

In order to increase the efficiency of cacao tree resistance to witches' broom disease, which is caused by Moniliophthora perniciosa (Tricholomataceae), we looked for molecular markers that could help in the selection of resistant cacao genotypes. Among the different markers useful for developing marker-assisted selection, single nucleotide polymorphisms (SNPs) constitute the most common type of sequence difference between alleles and can be easily detected by in silico analysis from expressed sequence tag libraries. We report the first detection and analysis of SNPs from cacao-M. perniciosa interaction expressed sequence tags, using bioinformatics. Selection based on analysis of these SNPs should be useful for developing cacao varieties resistant to this devastating disease.
New milk protein-derived peptides with potential antimicrobial activity: an approach based on bioinformatic studies.

PubMed

Dziuba, Bartłomiej; Dziuba, Marta

2014-08-20

New peptides with potential antimicrobial activity, encrypted in milk protein sequences, were searched for with the use of bioinformatic tools. The major milk proteins were hydrolyzed in silico by 28 enzymes. The obtained peptides were characterized by the following parameters: molecular weight, isoelectric point, composition and number of amino acid residues, net charge at pH 7.0, aliphatic index, instability index, Boman index, and GRAVY index, and compared with those calculated for known 416 antimicrobial peptides including 59 antimicrobial peptides (AMPs) from milk proteins listed in the BIOPEP database. A simple analysis of physico-chemical properties and the values of biological activity indicators were insufficient to select potentially antimicrobial peptides released in silico from milk proteins by proteolytic enzymes. The final selection was made based on the results of multidimensional statistical analysis such as support vector machines (SVM), random forest (RF), artificial neural networks (ANN) and discriminant analysis (DA) available in the Collection of Anti-Microbial Peptides (CAMP database). Eleven new peptides with potential antimicrobial activity were selected from all peptides released during in silico proteolysis of milk proteins.
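
The in silico hydrolysis and peptide characterization described above can be sketched for a single enzyme. This is a minimal example under stated assumptions: trypsin stands in for the 28 enzymes used in the study, the input sequence is a made-up toy string rather than a milk protein, the net charge is a crude residue count that ignores histidine and the termini, and all function names are hypothetical. The hydropathy values are the standard Kyte-Doolittle scale.

```python
# Kyte-Doolittle hydropathy values, used for the GRAVY index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def trypsin_digest(protein):
    """Cleave after K or R unless the next residue is P (common trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def gravy(peptide):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def approx_net_charge(peptide):
    """Crude net charge near pH 7: +1 per K/R, -1 per D/E (assumption)."""
    return sum(aa in "KR" for aa in peptide) - sum(aa in "DE" for aa in peptide)


toy_protein = "MKVLAAGIRPEDKLLTR"   # hypothetical sequence for illustration
for pep in trypsin_digest(toy_protein):
    print(pep, len(pep), round(gravy(pep), 2), approx_net_charge(pep))
```

Descriptors computed this way would then feed a classifier; as the abstract notes, the physico-chemical values alone were not sufficient to select candidate peptides.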
New Milk Protein-Derived Peptides with Potential Antimicrobial Activity: An Approach Based on Bioinformatic Studies

PubMed Central

Dziuba, Bartłomiej; Dziuba, Marta

2014-01-01

New peptides with potential antimicrobial activity, encrypted in milk protein sequences, were searched for with the use of bioinformatic tools. The major milk proteins were hydrolyzed in silico by 28 enzymes. The obtained peptides were characterized by the following parameters: molecular weight, isoelectric point, composition and number of amino acid residues, net charge at pH 7.0, aliphatic index, instability index, Boman index, and GRAVY index, and compared with those calculated for known 416 antimicrobial peptides including 59 antimicrobial peptides (AMPs) from milk proteins listed in the BIOPEP database. A simple analysis of physico-chemical properties and the values of biological activity indicators were insufficient to select potentially antimicrobial peptides released in silico from milk proteins by proteolytic enzymes. The final selection was made based on the results of multidimensional statistical analysis such as support vector machines (SVM), random forest (RF), artificial neural networks (ANN) and discriminant analysis (DA) available in the Collection of Anti-Microbial Peptides (CAMP database). Eleven new peptides with potential antimicrobial activity were selected from all peptides released during in silico proteolysis of milk proteins. PMID:25141106
In silico search, characterization and validation of new EST-SSR markers in the genus Prunus.

PubMed

Sorkheh, Karim; Prudencio, Angela S; Ghebinejad, Azim; Dehkordi, Mehrana Kohei; Erogul, Deniz; Rubio, Manuel; Martínez-Gómez, Pedro

2016-07-07

Simple sequence repeats (SSRs) are defined as sequence repeat units between 1 and 6 bp that occur in both coding and non-coding regions abundant in eukaryotic genomes, which may affect the expression of genes. In this study, expressed sequence tags (ESTs) of eight Prunus species were analyzed for in silico mining of EST-SSRs, protein annotation, and open reading frames (ORFs), and the identification of codon repetitions. A total of 316 SSRs were identified using MISA software. Dinucleotide SSR motifs (26.31 %) were found to be the most abundant type of repeats, followed by tri- (14.58 %), tetra- (0.53 %), and penta- (0.27 %) nucleotide motifs. An attempt was made to design primer pairs for 316 identified SSRs but these were successful for only 175 SSR sequences. The positions of SSRs with respect to ORFs were detected, and annotation of sequences containing SSRs was performed to assign function to each sequence. SSRs were also characterized (in terms of position in the reference genome and associated gene) using the two available Prunus reference genomes (mei and peach). Finally, 38 SSR markers were validated across peach, almond, plum, and apricot genotypes. This validation showed a higher transferability level of EST-SSR developed in P. mume (mei) in comparison with the rest of species analyzed. Findings will aid analysis of functionally important molecular markers and facilitate the analysis of genetic diversity.
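
The SSR mining step can be illustrated with a regular-expression scan. This is only a sketch of the general technique and does not reproduce MISA: the minimum-repeat thresholds, the restriction to motifs of 2 to 6 bp, the function names and the test sequence are all assumptions made for the example.

```python
import re

# Hypothetical minimum repeat counts per motif length (not MISA's settings).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (1-based start, motif, repeat count) tuples."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):   # skip e.g. 'GAGA' reported as a tetramer
                repeats = len(match.group(0)) // unit_len
                hits.append((match.start() + 1, motif, repeats))
    return sorted(hits)


# Toy EST-like sequence with one dinucleotide and one trinucleotide SSR.
toy_est = "GGC" + "GA" * 7 + "TTCA" + "CGG" * 5 + "AT"
for start, motif, repeats in find_ssrs(toy_est):
    print(f"pos {start}: ({motif}){repeats}")
```

Flanking sequence around each reported position would be the input for primer design, which is the step that succeeded for only 175 of the 316 SSRs in the study.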
Cytochrome C oxydase deficiency: SURF1 gene investigation in patients with Leigh syndrome.

PubMed

Maalej, Marwa; Kammoun, Thouraya; Alila-Fersi, Olfa; Kharrat, Marwa; Ammar, Marwa; Felhi, Rahma; Mkaouar-Rebai, Emna; Keskes, Leila; Hachicha, Mongia; Fakhfakh, Faiza

2018-03-18

Leigh syndrome (LS) is a rare progressive neurodegenerative disorder occurring in infancy. The most common clinical signs reported in LS are growth retardation, optic atrophy, ataxia, psychomotor retardation, dystonia, hypotonia, seizures and respiratory disorders. This paper reports three Tunisian patients presenting with LS. The aim of this study was to screen the MT-ATP6 and SURF1 genes in Tunisian patients affected with classical Leigh syndrome and to investigate computationally the effect of the detected mutations on protein structure and function. After clinical investigations, the three patients were tested for mutations in both MT-ATP6 and SURF1 genes by direct sequencing followed by in silico analyses to predict the effects of sequence variation. Mutational analysis revealed the absence of mitochondrial mutations in the MT-ATP6 gene and the presence of a known homozygous splice site mutation, c.516-517delAG, in sibling patients, together with novel double heterozygous mutations in one LS patient (c.752-18A>C / c.751+16G>A). In silico analyses of these intronic variations showed that they could alter splicing processes as well as SURF1 protein translation. Copyright © 2018 Elsevier Inc. All rights reserved.
In silico characterization of a novel pathogenic deletion mutation identified in XPA gene in a Pakistani family with severe xeroderma pigmentosum

PubMed Central

2013-01-01

Background Xeroderma Pigmentosum (XP) is a rare skin disorder characterized by skin hypersensitivity to sunlight and abnormal pigmentation. The aim of this study was to investigate the genetic cause of a severe XP phenotype in a consanguineous Pakistani family and to characterize in silico any identified disease-associated mutation. Results The XP complementation group was assigned by genotyping the family for known XP loci. Genotyping data mapped the family to the complementation group A locus, involving the XPA gene. Mutation analysis of the candidate XP gene by DNA sequencing revealed a novel deletion mutation (c.654del A) in exon 5 of the XPA gene. The c.654del A mutation causes a frameshift that prematurely terminates the protein and results in a truncated product of 222 amino acid (aa) residues instead of 273 (p.Lys218AsnfsX5). In silico tools were applied to study the likelihood of changes in structural motifs and thus in the interaction of the mutated protein with binding partners. In silico analysis of the mutant protein sequence predicted an effect on the aa residue that adopts a coiled-coil structure. The coiled-coil structure has an important role in key cellular interactions, especially with DNA damage-binding protein 2 (DDB2), which has an important role in the DDB-mediated nucleotide excision repair (NER) system. Conclusions Our findings support genetic and clinical heterogeneity in XP. The study also predicts a critical role for the DDB2-binding region of the XPA protein in the NER pathway and opens an avenue for further research on the functional role of the mutated protein domain. PMID:24063568
Approaches for in silico finishing of microbial genome sequences

PubMed Central

Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

2017-01-01

Abstract The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as “drafts”, incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing. PMID:28898352
Approaches for in silico finishing of microbial genome sequences.

PubMed

Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva

The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.
Reliable differentiation of Meyerozyma guilliermondii from Meyerozyma caribbica by internal transcribed spacer restriction fingerprinting.

PubMed

Romi, Wahengbam; Keisam, Santosh; Ahmed, Giasuddin; Jeyaram, Kumaraswamy

2014-02-28

Meyerozyma guilliermondii (anamorph Candida guilliermondii) and Meyerozyma caribbica (anamorph Candida fermentati) are closely related species of the genetically heterogeneous M. guilliermondii complex. Conventional phenotypic methods frequently misidentify the species within this complex and also confuse them with other species of the Saccharomycotina CTG clade. Even the long-established sequencing of the large subunit (LSU) rRNA gene remains ambiguous. We faced a similar problem during identification of yeast isolates of the M. guilliermondii complex from indigenous bamboo shoot fermentation in North East India. There is a need for reliable and accurate identification methods for these closely related species because of their increasing importance as emerging infectious yeasts and their associated biotechnological attributes. We targeted the highly variable internal transcribed spacer (ITS) region (ITS1-5.8S-ITS2) and identified seven restriction enzymes through in silico analysis for differentiating M. guilliermondii from M. caribbica. Fifty-five isolates of the M. guilliermondii complex that could not be delineated into species-specific taxonomic ranks by API 20 C AUX and LSU rRNA gene D1/D2 sequencing were subjected to ITS-restriction fragment length polymorphism (ITS-RFLP) analysis. TaqI ITS-RFLP distinctly differentiated the isolates into M. guilliermondii (47 isolates) and M. caribbica (8 isolates) with reproducible species-specific patterns similar to the in silico prediction. The reliability of this method was validated by ITS1-5.8S-ITS2 sequencing, mitochondrial DNA RFLP and electrophoretic karyotyping. We herein describe a reliable ITS-RFLP method for distinct differentiation of the frequently misidentified M. guilliermondii from M. caribbica. Even though in silico analysis differentiated other closely related species of the M. guilliermondii complex from the above two species, this is yet to be confirmed by in vitro analysis using reference strains.
This method can be used as a reliable tool for rapid and accurate identification of closely related species of M. guilliermondii complex and for differentiating emerging infectious yeasts of the Saccharomycotina CTG clade.
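
An in silico restriction digest of the kind used to choose the diagnostic enzymes can be sketched as below. The example assumes a linear amplicon and uses the TaqI recognition site T^CGA (cut after the first base); the two input sequences are invented toy strings, not real ITS regions, and the function name is hypothetical.

```python
def digest(seq, site="TCGA", cut_offset=1):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the recognition site that stay on
    the upstream fragment (1 for TaqI, which cuts T^CGA).
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two hypothetical amplicons that differ by one base inside the TaqI site.
species_a = "A" * 10 + "TCGA" + "G" * 20   # carries one TaqI site
species_b = "A" * 10 + "TCGG" + "G" * 20   # site abolished by A>G

print("species A fragments:", digest(species_a))
print("species B fragments:", digest(species_b))
```

Different fragment-length lists for two species predict distinguishable banding patterns on a gel, which is then confirmed in vitro as the abstract describes.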
Transcript variations, phylogenetic tree and chromosomal localization of porcine aryl hydrocarbon receptor (AhR) and AhR nuclear translocator (ARNT) genes.

PubMed

Sadowska, Agnieszka; Paukszto, Lukasz; Nynca, Anna; Szczerbal, Izabela; Orlowska, Karina; Swigonska, Sylwia; Ruszkowska, Monika; Molcan, Tomasz; Jastrzebski, Jan P; Panasiewicz, Grzegorz; Ciereszko, Renata E

2017-03-01

Aryl hydrocarbon receptor (AhR) is a ligand-activated transcription factor best known for mediating xenobiotic-induced toxicity. AhR requires aryl hydrocarbon receptor nuclear translocator (ARNT) to form an active transcription complex and promote the activation of genes which have a dioxin responsive element in their regulatory regions. The present study was performed to determine the complete cDNA sequences of porcine AhR and ARNT genes and their chromosomal localization. Total RNA from porcine livers was used to obtain the sequence of the entire porcine transcriptome by next-generation sequencing (NGS; Illumina HiSeq2500). In addition, both in silico analysis and fluorescence in situ hybridization (FISH) were used to determine the chromosomal localization of porcine AhR and ARNT genes. In silico analysis of nucleotide sequences showed that there were two transcript variants of AhR and ARNT genes in the pig. In addition, computer analysis revealed that the AhR gene in the pig is located on chromosome 9 and ARNT on chromosome 4. The results of the FISH experiment confirmed the localization of porcine AhR and ARNT genes. In the present study, for the first time, the full cDNAs of AhR and ARNT were demonstrated in the pig. In future, it would be interesting to determine the tissue distribution of AhR and ARNT transcript variants in the pig and to test whether these variants are associated with different biological functions and/or different activation pathways.
Exploring internal features of 16S rRNA gene for identification of clinically relevant species of the genus Streptococcus

PubMed Central

2011-01-01

Background Streptococcus is an economically important genus as a number of species belonging to this genus are human and animal pathogens. The genus has been divided into different groups based on 16S rRNA gene sequence similarity. The variability observed among the members of these groups is low and it is difficult to distinguish them. The present study was taken up to explore the 16S rRNA gene sequence to develop methods that can be used for preliminary identification and can supplement the existing methods for identification of clinically relevant isolates of the genus Streptococcus. Methods 16S rRNA gene sequences belonging to isolates of S. dysgalactiae, S. equi, S. pyogenes, S. agalactiae, S. bovis, S. gallolyticus, S. mutans, S. sobrinus, S. mitis, S. pneumoniae, S. thermophilus and S. anginosus were analyzed in order to define genetic variability within each species, to generate a phylogenetic framework, to identify species-specific signatures and to perform in silico restriction enzyme analysis. Results The framework-based analysis was used to segregate Streptococcus spp. previously identified only up to genus level. This segregation was validated using species-specific signatures and in silico restriction enzyme analysis. 43 uncharacterized Streptococcus spp. could be identified using this approach. Conclusions The markers generated by exploring 16S rRNA gene sequences provide a useful tool that can be further used for identification of different species of the genus Streptococcus. PMID:21702978
In-silico mining, type and frequency analysis of genic microsatellites of finger millet (Eleusine coracana (L.) Gaertn.): a comparative genomic analysis of NBS-LRR regions of finger millet with rice.

PubMed

Kalyana Babu, B; Pandey, Dinesh; Agrawal, P K; Sood, Salej; Kumar, Anil

2014-05-01

In recent years, the increased availability of DNA sequences has made it possible to develop and explore SSR markers derived from expressed sequence tags (ESTs). In the present study, a total of 1956 ESTs of finger millet were used to determine microsatellite type, distribution and frequency, and a total of 545 primer pairs were developed from these ESTs. Thirty-two EST sequences had more than two microsatellites and 1357 sequences did not have any SSR repeats. The most frequent type of repeat was the trimeric motif, followed by the dimeric motif and then tetra-, hexa- and penta-repeat motifs. The most common dimer repeat motif was GA and, in the case of trimeric SSRs, it was CGG. The EST sequences of the NBS-LRR region of finger millet and rice showed higher synteny and were found at nearly the same positions on the rice chromosome map. A total of eight out of 15 EST-based SSR primers were polymorphic among the selected resistant and susceptible finger millet genotypes. The primer FMBLEST5 was able to differentiate them into resistant and susceptible genotypes. The alleles specific to the resistant and susceptible genotypes were sequenced using the ABI 3130XL genetic analyzer; they showed similarity to NBS-LRR regions of rice and finger millet and contained the characteristic kinase-2 and kinase 3a motifs of plant R-genes belonging to the NBS-LRR region. The in silico and comparative analysis showed that the genes responsible for blast resistance can be identified, mapped and further introgressed through molecular breeding approaches for enhancing blast resistance in finger millet.
In silico identification of cross affinity towards Cry1Ac pesticidal protein with receptor enzyme in Bos taurus and sequence, structure analysis of crystal proteins for stability.

PubMed

Ebenezer, King Solomon; Nachimuthu, Ramesh; Thiagarajan, Prabha; Velu, Rajesh Kannan

2013-01-01

Any novel protein introduced into GM crops needs to be evaluated for cross affinity in living organisms. Many researchers are currently focusing on the impact of Bacillus thuringiensis (Bt) cotton on soil and microbial diversity through field experiments. Nevertheless, an in silico approach might help to elucidate the impact of cry genes. The crystal proteins produced by Bt at the time of sporulation have been used as biological pesticides against insect pests, such as Cry1Ac against Helicoverpa armigera and Cry2Ab against Spodoptera sp. and Heliothis sp. Here, we present a comprehensive in silico analysis of Cry1Ac and Cry2Ab proteins with available in silico tools, databases and docking servers. Molecular docking of Cry1Ac with procarboxypeptidase from Helicoverpa armigera and of Cry1Ac with leucine aminopeptidase from Bos taurus showed the 125th amino acid position to be the preferred site of the Cry1Ac protein. The structures were compared with each other and showed 5% similarity. The cross affinity of this toxin confirms earlier reports of ill effects of Bt cotton consumed by cattle.
JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms

PubMed Central

Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

2015-01-01

The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing inter-alia in silico prediction of primers specificity and GM targets coverage. The new tool can assist the laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patents documentation. Finally, it can help annotating poorly described GM sequences and identifying new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/ PMID:26424080
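The "in silico determination of PCR amplification" mentioned in this abstract can be illustrated with a small exact-match search. This is a hedged sketch, not the method behind the JRC GMO-Amplicons database: the names `revcomp` and `in_silico_pcr`, the exact-match requirement, the single-strand search and the `max_len` limit are all assumptions made for the example.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def in_silico_pcr(template, fwd, rev, max_len=1000):
    """Return amplicons for exact primer matches on the given strand.

    The forward primer must match the template as written; the reverse
    primer must match the opposite strand, so we search for its
    reverse complement downstream of the forward primer.
    """
    template = template.upper()
    fwd, rev_site = fwd.upper(), revcomp(rev)
    amplicons = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_site, start + len(fwd))
        if end != -1:
            amplicon = template[start:end + len(rev_site)]
            if len(amplicon) <= max_len:
                amplicons.append(amplicon)
        start = template.find(fwd, start + 1)
    return amplicons


if __name__ == "__main__":
    template = "GGG" + "ATGCCA" + "TTTTT" + "GACTGA" + "CCC"
    print(in_silico_pcr(template, "ATGCCA", revcomp("GACTGA")))
```

A realistic screen would also tolerate primer mismatches and scan both strands; the sketch omits those on purpose to keep the idea visible.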

JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms.

PubMed

Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

2015-01-01

The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing inter-alia in silico prediction of primers specificity and GM targets coverage. The new tool can assist the laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patents documentation. Finally, it can help annotating poorly described GM sequences and identifying new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/. © The Author(s) 2015. Published by Oxford University Press.
[Molecular cloning and characterization in silico of phospholipase A(2) transcript isolated from Lachesis muta peruvian snake venom].

PubMed

Jimenez, Karim L; Zavaleta, Amparo I; Izaguirre, Victor; Yarleque, Armando; Inga, Rosio R

2010-01-01

The aim was to isolate and characterize in silico the phospholipase A(2) (PLA(2)) gene from the venom of Lachesis muta from the Peruvian Amazon. RT-PCR was performed on total RNA using specific primers, and the amplified DNA product was inserted into the pGEM vector for subsequent sequencing. Bioinformatic analysis identified an open reading frame of 414 nucleotides encoding 138 amino acids, including a signal peptide of 16 amino acids; the molecular weight and pI were 13.976 kDa and 5.66, respectively. The amino acid sequence, named Lm-PLA(2)-Peru, contains an aspartate at position 49; this residue, in conjunction with other conserved residues such as Tyr-28, Gly-30, Gly-32, His-48, Tyr-52 and Asp-99, is important for enzymatic activity. Comparison with amino acid sequence data banks showed similarity to PLA(2) from Lachesis stenophrys (93%) and to other snake venom PLA(2)s, and over 80% similarity to other sPLA(2)s from venoms of the family Viperidae. A phylogenetic analysis showed that Lm-PLA(2)-Peru grouped with other acidic [Asp(49)] sPLA(2)s previously isolated from Bothriechis schlegelii venom, showing 89% nucleotide sequence identity. Finally, computer modeling indicated that the enzyme had the characteristic structure of group II sPLA(2), consisting of three α-helices, a β-wing, a short helix and a calcium-binding loop. The nucleotide sequence corresponds to the first PLA(2) gene transcript cloned from the venom of Lachesis muta, a snake from the Peruvian rainforest.
Characterisation and In Silico Analysis of Interleukin-4 cDNA of Nilgai (Boselaphus tragocamelus) and Indian Buffalo (Bubalus bubalis)

PubMed Central

Saini, M.; Palai, T. K.; Das, D. K.; Hatle, K. M.; Gupta, P. K.

2013-01-01

Interleukin-4 (IL-4) produced from Th2 cells modulates both innate and adaptive immune responses. It is a common belief that wild animals possess better immunity against diseases than domestic and laboratory animals; however, the immune system of wild animals is not fully explored yet. Therefore, a comparative study was designed to explore the wildlife immunity through characterisation of IL-4 cDNA of nilgai, a wild ruminant, and Indian buffalo, a domestic ruminant. Total RNA was extracted from peripheral blood mononuclear cells of nilgai and Indian buffalo and reverse transcribed into cDNA. Respective cDNA was further cloned and sequenced. Sequences were analysed in silico and compared with their homologues available at GenBank. The deduced 135 amino acid protein of nilgai IL-4 is 95.6% similar to that of Indian buffalo. N-linked glycosylation sequence, leader sequence, Cysteine residues in the signal peptide region, and 3′ UTR of IL-4 were found to be conserved across species. Six nonsynonymous nucleotide substitutions were found in Indian buffalo compared to nilgai amino acid sequence. Tertiary structure of this protein in both species was modeled, and it was found that this protein falls under 4-helical cytokines superfamily and short chain cytokine family. Phylogenetic analysis revealed a single cluster of ruminants including both nilgai and Indian buffalo that was placed distinct from other nonruminant mammals. PMID:24348167
Spliced synthetic genes as internal controls in RNA sequencing experiments.

PubMed

Hardwick, Simon A; Chen, Wendy Y; Wong, Ted; Deveson, Ira W; Blackburn, James; Andersen, Stacey B; Nielsen, Lars K; Mattick, John S; Mercer, Tim R

2016-09-01

RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.
Evaluation of a genome-scale in silico metabolic model for Geobacter metallireducens by using proteomic data from a field biostimulation experiment.

PubMed

Fang, Yilin; Wilkins, Michael J; Yabusaki, Steven B; Lipton, Mary S; Long, Philip E

2012-12-01

Accurately predicting the interactions between microbial metabolism and the physical subsurface environment is necessary to enhance subsurface energy development, soil and groundwater cleanup, and carbon management. This study was an initial attempt to confirm the metabolic functional roles within an in silico model using environmental proteomic data collected during field experiments. Shotgun global proteomics data collected during a subsurface biostimulation experiment were used to validate a genome-scale metabolic model of Geobacter metallireducens, specifically the ability of the metabolic model to predict metal reduction, biomass yield, and growth rate under dynamic field conditions. The constraint-based in silico model of G. metallireducens relates an annotated genome sequence to the physiological functions with 697 reactions controlled by 747 enzyme-coding genes. Proteomic analysis showed that 180 of the 637 G. metallireducens proteins detected during the 2008 experiment were associated with specific metabolic reactions in the in silico model. When the field-calibrated Fe(III) terminal electron acceptor process reaction in a reactive transport model for the field experiments was replaced with the genome-scale model, the model predicted that the largest metabolic fluxes through the in silico model reactions generally correspond to the highest abundances of proteins that catalyze those reactions. Central metabolism predicted by the model agrees well with protein abundance profiles inferred from proteomic analysis. Model discrepancies with the proteomic data, such as the relatively low abundances of proteins associated with amino acid transport and metabolism, revealed pathways or flux constraints in the in silico model that could be updated to more accurately predict metabolic processes that occur in the subsurface environment.
In silico gene expression analysis – an overview

PubMed Central

Murray, David; Doran, Peter; MacMathuna, Padraic; Moss, Alan C

2007-01-01

Efforts aimed at deciphering the molecular basis of complex disease are underpinned by the availability of high throughput strategies for the identification of biomolecules that drive the disease process. The completion of the human genome-sequencing project, coupled to major technological developments, has afforded investigators myriad opportunities for multidimensional analysis of biological systems. Nowhere has this research explosion been more evident than in the field of transcriptomics. Affordable access and availability to the technology that supports such investigations has led to a significant increase in the amount of data generated. As most biological distinctions are now observed at a genomic level, a large amount of expression information is now openly available via public databases. Furthermore, numerous computational based methods have been developed to harness the power of these data. In this review we provide a brief overview of in silico methodologies for the analysis of differential gene expression such as Serial Analysis of Gene Expression and Digital Differential Display. The performance of these strategies, at both an operational and result/output level is assessed and compared. The key considerations that must be made when completing an in silico expression analysis are also presented as a roadmap to facilitate biologists. Furthermore, to highlight the importance of these in silico methodologies in contemporary biomedical research, examples of current studies using these approaches are discussed. The overriding goal of this review is to present the scientific community with a critical overview of these strategies, so that they can be effectively added to the tool box of biomedical researchers focused on identifying the molecular mechanisms of disease. PMID:17683638
Antimicrobial Peptides of Meat Origin - An In silico and In vitro Analysis.

PubMed

Keska, Paulina; Stadnik, Joanna

2017-01-01

The aim of this study was to evaluate the antimicrobial activity of meat protein-derived peptides against selected Gram-positive and Gram-negative bacteria. The in silico and in vitro approach was combined to determine the potency of antimicrobial peptides derived from pig (Sus scrofa) and cow (Bos taurus) proteins. The in silico studies consisted of an analysis of the amino acid composition of peptides obtained from the CAMPR database, their molecular weight and other physicochemical properties (isoelectric point, molar extinction coefficient, instability index, aliphatic index, hydropathy index and net charge). The degree of similarity was estimated between the antimicrobial peptide sequences derived from the slaughtered animals and the main meat proteins. Antimicrobial activity of peptides isolated from dry-cured meat products was analysed (in vitro) against two strains of pathogenic bacteria using the disc diffusion method. There was no evidence of growth-inhibitory properties of peptides isolated from dry-cured meat products against Escherichia coli K12 ATCC 10798 and Staphylococcus aureus ATCC 25923. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Evaluation of Bioinformatic Programmes for the Analysis of Variants within Splice Site Consensus Regions

PubMed Central

Tang, Rongying; Prosser, Debra O.; Love, Donald R.

2016-01-01

The increasing diagnostic use of gene sequencing has led to an expanding dataset of novel variants that lie within consensus splice junctions. The challenge for diagnostic laboratories is the evaluation of these variants in order to determine if they affect splicing or are merely benign. A common evaluation strategy is to use in silico analysis, and it is here that a number of programmes are available online; however, currently, there are no consensus guidelines on the selection of programmes or protocols to interpret the prediction results. Using a collection of 222 pathogenic mutations and 50 benign polymorphisms, we evaluated the sensitivity and specificity of four in silico programmes in predicting the effect of each variant on splicing. The programmes comprised Human Splice Finder (HSF), Max Entropy Scan (MES), NNSplice, and ASSP. The MES and ASSP programmes gave the highest performance based on Receiver Operator Curve analysis, with an optimal cut-off of score reduction of 10%. The study also showed that the sensitivity of prediction is affected by the level of conservation of individual positions, with in silico predictions for variants at positions −4 and +7 within consensus splice sites being largely uninformative. PMID:27313609
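The evaluation described in this abstract rests on two quantities, sensitivity and specificity, computed at a score-reduction cut-off. The sketch below shows that arithmetic under stated assumptions: the function names, the input layout (pairs of wild-type and variant scores), the requirement that wild-type scores are positive, and the toy numbers are all hypothetical and are not taken from the study.

```python
def score_reduction(wild_type, variant):
    """Relative drop in splice-site score caused by a variant (0.10 == 10%).

    Assumes a positive wild-type score; tools that emit zero or negative
    scores would need separate handling.
    """
    return (wild_type - variant) / wild_type


def sensitivity_specificity(pathogenic, benign, cutoff=0.10):
    """Compute (sensitivity, specificity) at a score-reduction cut-off.

    pathogenic, benign: lists of (wild_type_score, variant_score) pairs.
    A variant is predicted to affect splicing when its score reduction
    is at least `cutoff`.
    """
    true_pos = sum(score_reduction(w, v) >= cutoff for w, v in pathogenic)
    true_neg = sum(score_reduction(w, v) < cutoff for w, v in benign)
    return true_pos / len(pathogenic), true_neg / len(benign)


if __name__ == "__main__":
    # Toy scores, purely illustrative.
    pathogenic = [(10.0, 5.0), (8.0, 7.9), (9.0, 0.0), (6.0, 3.0)]
    benign = [(10.0, 9.8), (8.0, 6.0)]
    print(sensitivity_specificity(pathogenic, benign))
```

Sweeping `cutoff` over a range of values and plotting sensitivity against one minus specificity would give the kind of receiver operating characteristic curve the abstract uses to compare programmes.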
In-silico identification of miRNAs and their regulating target functions in Ocimum basilicum.

PubMed

Singh, Noopur; Sharma, Ashok

2014-12-01

microRNA is known to play an important role in growth and development of the plants and also in environmental stress. Ocimum basilicum (Basil) is a well known herb for its medicinal properties. In this study, we used in-silico approaches to identify miRNAs and their targets regulating different functions in O. basilicum using EST approach. Additionally, functional annotation, gene ontology and pathway analysis of identified target transcripts were also done. Seven miRNA families were identified. Meaningful regulations of target transcript by identified miRNAs were computationally evaluated. Four miRNA families have been reported by us for the first time from the Lamiaceae. Our results further confirmed that uracil was the predominant base in the first positions of identified mature miRNA sequence, while adenine and uracil were predominant in pre-miRNA sequences. Phylogenetic analysis was carried out to determine the relation between O. basilicum and other plant pre-miRNAs. Thirteen potential targets were evaluated for 4 miRNA families. Majority of the identified target transcripts regulated by miRNAs showed response to stress. miRNA 5021 was also indicated for playing an important role in the amino acid metabolism and co-factor metabolism in this plant. To the best of our knowledge this is the first in silico study describing miRNAs and their regulation in different metabolic pathways of O. basilicum. Copyright © 2014 Elsevier B.V. All rights reserved.
In Silico Functional Networks Identified in Fish Nucleated Red Blood Cells by Means of Transcriptomic and Proteomic Profiling.

PubMed

Puente-Marin, Sara; Nombela, Iván; Ciordia, Sergio; Mena, María Carmen; Chico, Verónica; Coll, Julio; Ortega-Villaizan, María Del Mar

2018-04-09

Nucleated red blood cells (RBCs) of fish have, in the last decade, been implicated in several immune-related functions, such as antiviral response, phagocytosis or cytokine-mediated signaling. RNA-sequencing (RNA-seq) and label-free shotgun proteomic analyses were carried out for in silico functional pathway profiling of rainbow trout RBCs. For RNA-seq, a de novo assembly was conducted, in order to create a transcriptome database for RBCs. For proteome profiling, we developed a proteomic method that combined: (a) fractionation into cytosolic and membrane fractions, (b) hemoglobin removal of the cytosolic fraction, (c) protein digestion, and (d) a novel step with pH reversed-phase peptide fractionation and final Liquid Chromatography Electrospray Ionization Tandem Mass Spectrometric (LC ESI-MS/MS) analysis of each fraction. Combined transcriptome- and proteome- sequencing data identified, in silico, novel and striking immune functional networks for rainbow trout nucleated RBCs, which are mainly linked to innate and adaptive immunity. Functional pathways related to regulation of hematopoietic cell differentiation, antigen presentation via major histocompatibility complex class II (MHCII), leukocyte differentiation and regulation of leukocyte activation were identified. These preliminary findings further implicate nucleated RBCs in immune function, such as antigen presentation and leukocyte activation.
In Silico Functional Networks Identified in Fish Nucleated Red Blood Cells by Means of Transcriptomic and Proteomic Profiling

PubMed Central

Puente-Marin, Sara; Ciordia, Sergio; Mena, María Carmen; Chico, Verónica; Coll, Julio

2018-01-01

Nucleated red blood cells (RBCs) of fish have, in the last decade, been implicated in several immune-related functions, such as antiviral response, phagocytosis or cytokine-mediated signaling. RNA-sequencing (RNA-seq) and label-free shotgun proteomic analyses were carried out for in silico functional pathway profiling of rainbow trout RBCs. For RNA-seq, a de novo assembly was conducted, in order to create a transcriptome database for RBCs. For proteome profiling, we developed a proteomic method that combined: (a) fractionation into cytosolic and membrane fractions, (b) hemoglobin removal of the cytosolic fraction, (c) protein digestion, and (d) a novel step with pH reversed-phase peptide fractionation and final Liquid Chromatography Electrospray Ionization Tandem Mass Spectrometric (LC ESI-MS/MS) analysis of each fraction. Combined transcriptome- and proteome- sequencing data identified, in silico, novel and striking immune functional networks for rainbow trout nucleated RBCs, which are mainly linked to innate and adaptive immunity. Functional pathways related to regulation of hematopoietic cell differentiation, antigen presentation via major histocompatibility complex class II (MHCII), leukocyte differentiation and regulation of leukocyte activation were identified. These preliminary findings further implicate nucleated RBCs in immune function, such as antigen presentation and leukocyte activation. PMID:29642539
Experimental Assessment of Splicing Variants Using Expression Minigenes and Comparison with In Silico Predictions

PubMed Central

Sharma, Neeraj; Sosnay, Patrick R.; Ramalho, Anabela S.; Douville, Christopher; Franca, Arianna; Gottschalk, Laura B.; Park, Jeenah; Lee, Melissa; Vecchio-Pagan, Briana; Raraigh, Karen S.; Amaral, Margarida D.; Karchin, Rachel; Cutting, Garry R.

2015-01-01

Assessment of the functional consequences of variants near splice sites is a major challenge in the diagnostic laboratory. To address this issue, we created expression minigenes (EMGs) to determine the RNA and protein products generated by splice site variants (n = 10) implicated in cystic fibrosis (CF). Experimental results were compared with the splicing predictions of eight in silico tools. EMGs containing the full-length Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) coding sequence and flanking intron sequences generated wild-type transcript and fully processed protein in Human Embryonic Kidney (HEK293) and CF bronchial epithelial (CFBE41o-) cells. Quantification of variant induced aberrant mRNA isoforms was concordant using fragment analysis and pyrosequencing. The splicing patterns of c.1585−1G>A and c.2657+5G>A were comparable to those reported in primary cells from individuals bearing these variants. Bioinformatics predictions were consistent with experimental results for 9/10 variants (MES), 8/10 variants (NNSplice), and 7/10 variants (SSAT and Sroogle). Programs that estimate the consequences of mis-splicing predicted 11/16 (HSF and ASSEDA) and 10/16 (Fsplice and SplicePort) experimentally observed mRNA isoforms. EMGs provide a robust experimental approach for clinical interpretation of splice site variants and refinement of in silico tools. PMID:25066652
Genome-Wide Search Identifies 1.9 Mb from the Polar Bear Y Chromosome for Evolutionary Analyses

PubMed Central

Bidon, Tobias; Schreck, Nancy; Hailer, Frank; Nilsson, Maria A.; Janke, Axel

2015-01-01

The male-inherited Y chromosome is the major haploid fraction of the mammalian genome, rendering Y-linked sequences an indispensable resource for evolutionary research. However, despite recent large-scale genome sequencing approaches, only a handful of Y chromosome sequences have been characterized to date, mainly in model organisms. Using polar bear (Ursus maritimus) genomes, we compare two different in silico approaches to identify Y-linked sequences: 1) Similarity to known Y-linked genes and 2) difference in the average read depth of autosomal versus sex chromosomal scaffolds. Specifically, we mapped available genomic sequencing short reads from a male and a female polar bear against the reference genome and identify 112 Y-chromosomal scaffolds with a combined length of 1.9 Mb. We verified the in silico findings for the longer polar bear scaffolds by male-specific in vitro amplification, demonstrating the reliability of the average read depth approach. The obtained Y chromosome sequences contain protein-coding sequences, single nucleotide polymorphisms, microsatellites, and transposable elements that are useful for evolutionary studies. A high-resolution phylogeny of the polar bear patriline shows two highly divergent Y chromosome lineages, obtained from analysis of the identified Y scaffolds in 12 previously published male polar bear genomes. Moreover, we find evidence of gene conversion among ZFX and ZFY sequences in the giant panda lineage and in the ancestor of ursine and tremarctine bears. Thus, the identification of Y-linked scaffold sequences from unordered genome sequences yields valuable data to infer phylogenomic and population-genomic patterns in bears. PMID:26019166
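The read-depth approach in this abstract can be sketched as a per-scaffold comparison of normalized female and male coverage. The following is an illustration under assumptions, not the authors' implementation: the function name `classify_scaffolds`, the normalization by a genome-wide mean depth, and the thresholds `y_max` and `x_min` are hypothetical. The underlying expectation is that Y-linked scaffolds have almost no female coverage (ratio near 0), autosomal scaffolds have equal coverage (ratio near 1) and X-linked scaffolds have twice the coverage in females (ratio near 2).

```python
def classify_scaffolds(male, female, male_norm, female_norm,
                       y_max=0.3, x_min=1.7):
    """Classify scaffolds by the female/male ratio of normalized read depth.

    male, female: dict mapping scaffold name -> mean read depth.
    male_norm, female_norm: genome-wide mean depth of each sample,
    used to correct for different sequencing effort.
    y_max, x_min: illustrative thresholds (assumptions).
    """
    calls = {}
    for name, male_depth in male.items():
        m = male_depth / male_norm
        f = female.get(name, 0.0) / female_norm
        if m == 0:
            # No male reads: the ratio is undefined, so make no call.
            calls[name] = "no_male_coverage"
            continue
        ratio = f / m
        if ratio <= y_max:
            calls[name] = "Y-candidate"
        elif ratio >= x_min:
            calls[name] = "X-candidate"
        else:
            calls[name] = "autosomal"
    return calls


if __name__ == "__main__":
    male = {"s1": 30.0, "s2": 15.0, "s3": 15.0, "s4": 0.0}
    female = {"s1": 30.0, "s2": 30.0, "s3": 0.5}
    print(classify_scaffolds(male, female, 30.0, 30.0))
```

Candidates flagged this way would still need confirmation, which the abstract reports doing by male-specific in vitro amplification.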
CisSERS: Customizable in silico sequence evaluation for restriction sites

DOE PAGES

Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus; ...

2016-04-12

High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated to enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERS enable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a java based graphical user interface built around a perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERS and results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.
CisSERS: Customizable in silico sequence evaluation for restriction sites

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus

High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated to enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERS enable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a java based graphical user interface built around a perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERS and results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.
Genome wide in silico characterization of Dof gene families of pigeonpea (Cajanus cajan (L) Millsp.).

PubMed

Malviya, N; Gupta, S; Singh, V K; Yadav, M K; Bisht, N C; Sarangi, B K; Yadav, D

2015-02-01

The DNA binding with One Finger (Dof) protein is a plant specific transcription factor involved in the regulation of a wide range of processes. The analysis of the whole genome sequence of pigeonpea has identified 38 putative Dof genes (CcDof) distributed on 8 chromosomes. A total of 17 out of 38 CcDof genes were found to be intronless. A comprehensive in silico characterization of the CcDof gene family including the gene structure, chromosome location, protein motif, phylogeny, gene duplication and functional divergence has been attempted. The phylogenetic analysis resulted in 3 major clusters, and closely related members in the phylogenetic tree showed a common motif distribution. The in silico cis-regulatory element analysis revealed functional diversity with predominance of light responsive and stress responsive elements, indicating that these CcDof genes may be associated with photoperiodic control and with biotic and abiotic stress. The duplication pattern showed that tandem duplication is predominant over segmental duplication events. The comparative phylogenetic analysis of these Dof proteins along with 78 soybean, 36 Arabidopsis and 30 rice Dof proteins revealed 7 major clusters. Several groups of orthologs and paralogs were identified based on the phylogenetic tree constructed. Our study provides useful information for functional characterization of CcDof genes.
Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis.

PubMed

Journet, Etienne-Pascal; van Tuinen, Diederik; Gouzy, Jérome; Crespeau, Hervé; Carreau, Véronique; Farmer, Mary-Jo; Niebel, Andreas; Schiex, Thomas; Jaillon, Olivier; Chatagnier, Odile; Godiard, Laurence; Micheli, Fabienne; Kahn, Daniel; Gianinazzi-Pearson, Vivienne; Gamas, Pascal

2002-12-15

We report on a large-scale expressed sequence tag (EST) sequencing and analysis program aimed at characterizing the sets of genes expressed in roots of the model legume Medicago truncatula during interactions with either of two microsymbionts, the nitrogen-fixing bacterium Sinorhizobium meliloti or the arbuscular mycorrhizal fungus Glomus intraradices. We have designed specific tools for in silico analysis of EST data, in relation to chimeric cDNA detection, EST clustering, encoded protein prediction, and detection of differential expression. Our 21 473 5'- and 3'-ESTs could be grouped into 6359 EST clusters, corresponding to distinct virtual genes, along with 52 498 other M.truncatula ESTs available in the dbEST (NCBI) database that were recruited in the process. These clusters were manually annotated, using a specifically developed annotation interface. Analysis of EST cluster distribution in various M.truncatula cDNA libraries, supported by a refined R test to evaluate statistical significance and by 'electronic northern' representation, enabled us to identify a large number of novel genes predicted to be up- or down-regulated during either symbiotic root interaction. These in silico analyses provide a first global view of the genetic programs for root symbioses in M.truncatula. A searchable database has been built and can be accessed through a public interface.
Differentiation of Toxocara canis and Toxocara cati based on PCR-RFLP analyses of rDNA-ITS and mitochondrial cox1 and nad1 regions.

PubMed

Mikaeili, Fattaneh; Mathis, Alexander; Deplazes, Peter; Mirhendi, Hossein; Barazesh, Afshin; Ebrahimi, Sepideh; Kia, Eshrat Beigom

2017-09-26

The definitive genetic identification of Toxocara species is currently based on PCR/sequencing. The objectives of the present study were to design and conduct an in silico polymerase chain reaction-restriction fragment length polymorphism method for identification of Toxocara species. In silico analyses using the DNASIS and NEBcutter software packages were performed with rDNA internal transcribed spacers, and mitochondrial cox1 and nad1 sequences obtained in our previous studies along with relevant sequences deposited in GenBank. Consequently, RFLP profiles were designed and all isolates of T. canis and T. cati collected from dogs and cats in different geographical areas of Iran were investigated with the RFLP method using some of the identified suitable enzymes. The in silico analyses predicted that on the cox1 gene only the MboII enzyme is appropriate for PCR-RFLP to reliably distinguish the two species. No suitable enzyme for PCR-RFLP on the nad1 gene was identified that yields the same pattern for all isolates of a species. DNASIS software showed that there are 241 suitable restriction enzymes for the differentiation of T. canis from T. cati based on ITS sequences. RsaI, MvaI and SalI enzymes were selected to evaluate the reliability of the in silico PCR-RFLP. The sizes of restriction fragments obtained by PCR-RFLP of all samples consistently matched the expected RFLP patterns. The ITS sequences are usually conserved, and the PCR-RFLP approach targeting the ITS sequence is recommended for the molecular differentiation of Toxocara species; it can provide a reliable tool for identification purposes, particularly at the larval and egg stages.
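As a rough illustration of what an in silico PCR-RFLP step computes, the following Python sketch cuts two amplicons at every RsaI site (GT^AC, blunt) and compares the predicted fragment sizes. It is a minimal sketch, not the DNASIS or NEBcutter procedure used in the study; the function name `digest` and both toy amplicons are assumptions made up for this example and are not Toxocara sequences.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the number of bases of the recognition site that remain on
    the left-hand fragment (2 for RsaI, which cuts GT^AC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# RsaI recognition site and cut position; the site is palindromic, so
# scanning one strand is enough.
RSAI = ("GTAC", 2)

# Hypothetical toy amplicons that differ by one base inside the first site.
amplicon_a = "AAAAAGTACTTTTTTTTGTACCC"
amplicon_b = "AAAAAGTTCTTTTTTTTGTACCC"

print(digest(amplicon_a, *RSAI))  # fragment sizes for the first toy amplicon
print(digest(amplicon_b, *RSAI))  # fragment sizes for the second toy amplicon
```

An enzyme is a candidate for species discrimination when the fragment-size lists differ between species but stay the same for all isolates within a species, which is the criterion the abstract applies to cox1, nad1 and ITS.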
Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data

PubMed Central

Jun, Goo; Flickinger, Matthew; Hetrick, Kurt N.; Romm, Jane M.; Doheny, Kimberly F.; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min

2012-01-01

DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data is generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies. PMID:23103226
Development and validation of an rDNA operon based primer walking strategy applicable to de novo bacterial genome finishing

PubMed Central

Eastman, Alexander W.; Yuan, Ze-Chun

2015-01-01

Advances in sequencing technology have drastically increased the depth and feasibility of bacterial genome sequencing. However, little information is available that details the specific techniques and procedures employed during genome sequencing despite the large numbers of published genomes. Shotgun approaches employed by second-generation sequencing platforms have necessitated the development of robust bioinformatics tools for in silico assembly, and complete assembly is limited by the presence of repetitive DNA sequences and multi-copy operons. Typically, re-sequencing with multiple platforms and laborious, targeted Sanger sequencing are employed to finish a draft bacterial genome. Here we describe a novel strategy based on the identification and targeted sequencing of repetitive rDNA operons to expedite bacterial genome assembly and finishing. Our strategy was validated by finishing the genome of Paenibacillus polymyxa strain CR1, a bacterium with potential in sustainable agriculture and bio-based processes. An analysis of the 38 contigs contained in the P. polymyxa strain CR1 draft genome revealed 12 repetitive rDNA operons with varied intragenic and flanking regions of variable length, all located at contig boundaries and within contig gaps. These highly similar but not identical rDNA operons were experimentally verified and sequenced simultaneously with multiple, specially designed primer sets. This approach also identified and corrected significant sequence rearrangement generated during the initial in silico assembly of sequencing reads. Our approach reduces the required effort associated with blind primer walking for contig assembly, increasing both the speed and feasibility of genome finishing. Our study further reinforces the notion that repetitive DNA elements are major limiting factors for genome finishing. Moreover, we provide a step-by-step workflow for genome finishing, which may guide future bacterial genome finishing projects. PMID:25653642

CAPRRESI: Chimera Assembly by Plasmid Recovery and Restriction Enzyme Site Insertion.

PubMed

Santillán, Orlando; Ramírez-Romero, Miguel A; Dávila, Guillermo

2017-06-25

Here, we present chimera assembly by plasmid recovery and restriction enzyme site insertion (CAPRRESI). CAPRRESI benefits from many strengths of the original plasmid recovery method and introduces restriction enzyme digestion to ease DNA ligation reactions (required for chimera assembly). For this protocol, users clone wildtype genes into the same plasmid (pUC18 or pUC19). After the in silico selection of amino acid sequence regions where chimeras should be assembled, users obtain all the synonym DNA sequences that encode them. Ad hoc Perl scripts enable users to determine all synonym DNA sequences. After this step, another Perl script searches for restriction enzyme sites on all synonym DNA sequences. This in silico analysis is also performed using the ampicillin resistance gene (ampR) found on pUC18/19 plasmids. Users design oligonucleotides inside synonym regions to disrupt wildtype and ampR genes by PCR. After obtaining and purifying complementary DNA fragments, restriction enzyme digestion is accomplished. Chimera assembly is achieved by ligating appropriate complementary DNA fragments. pUC18/19 vectors are selected for CAPRRESI because they offer technical advantages, such as small size (2,686 base pairs), high copy number, advantageous sequencing reaction features, and commercial availability. The usage of restriction enzymes for chimera assembly eliminates the need for DNA polymerases yielding blunt-ended products. CAPRRESI is a fast and low-cost method for fusing protein-coding genes.
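The in silico steps described above (enumerate every synonymous DNA sequence for a short amino acid stretch, then look for restriction enzyme sites in them) can be sketched in Python as follows. This is an illustrative sketch only: the authors used ad hoc Perl scripts that are not reproduced here, the function names are hypothetical, and the three-enzyme site table is a small assumed example rather than the enzyme set used in the protocol.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)

# Illustrative subset of restriction enzyme recognition sites.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def synonymous_sequences(peptide):
    """Yield every DNA sequence that encodes the given peptide."""
    for combo in product(*(CODONS[aa] for aa in peptide)):
        yield "".join(combo)


def sequences_with_sites(peptide, sites=SITES):
    """Map each synonymous sequence to the enzymes whose site it contains."""
    hits = {}
    for dna in synonymous_sequences(peptide):
        found = [name for name, site in sites.items() if site in dna]
        if found:
            hits[dna] = found
    return hits


print(sequences_with_sites("EF"))  # Glu-Phe can be encoded with an EcoRI site
```

The number of synonymous sequences grows multiplicatively with peptide length (up to six codons per residue), so exhaustive enumeration of this kind is only practical for short regions.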
Environmental metabarcodes for insects: in silico PCR reveals potential for taxonomic bias.

PubMed

Clarke, Laurence J; Soubrier, Julien; Weyrich, Laura S; Cooper, Alan

2014-11-01

Studies of insect assemblages are suited to the simultaneous DNA-based identification of multiple taxa known as metabarcoding. To obtain accurate estimates of diversity, metabarcoding markers ideally possess appropriate taxonomic coverage to avoid PCR-amplification bias, as well as sufficient sequence divergence to resolve species. We used in silico PCR to compare the taxonomic coverage and resolution of newly designed insect metabarcodes (targeting 16S) with that of existing markers [16S and cytochrome oxidase c subunit I (COI)] and then compared their efficiency in vitro. Existing metabarcoding primers amplified in silico <75% of insect species with complete mitochondrial genomes available, whereas new primers targeting 16S provided >90% coverage. Furthermore, metabarcodes targeting COI appeared to introduce taxonomic PCR-amplification bias, typically amplifying a greater percentage of Lepidoptera and Diptera species, while failing to amplify certain orders in silico. To test whether bias predicted in silico was observed in vitro, we created an artificial DNA blend containing equal amounts of DNA from 14 species, representing 11 insect orders and one arachnid. We PCR-amplified the blend using five primer sets, targeting either COI or 16S, with high-throughput amplicon sequencing yielding more than 6 million reads. In vitro results typically corresponded to in silico PCR predictions, with newly designed 16S primers detecting 11 insect taxa present, thus providing equivalent or better taxonomic coverage than COI metabarcodes. Our results demonstrate that in silico PCR is a useful tool for predicting taxonomic bias in mixed template PCR and that researchers should be wary of potential bias when selecting metabarcoding markers. © 2014 John Wiley & Sons Ltd.
In silico analysis of subtilisin from Glaciozyma antarctica PI12

NASA Astrophysics Data System (ADS)

Mustafha, Siti Mardhiah; Murad, Abdul Munir Abdul; Mahadi, Nor Muhammad; Kamaruddin, Shazilah; Bakar, Farah Diba Abu

2015-09-01

Subtilisins are major industrial enzymes with a wide range of applications, especially in the detergent industry. In this study, a cDNA encoding subtilisin (GaSUBT) was extracted from the psychrophilic yeast Glaciozyma antarctica PI12, PCR amplified and sequenced. Various bioinformatics tools were used to characterize GaSUBT. GaSUBT contains 1587 bp encoding 529 amino acids. The predicted molecular weight of the deduced protein is 55.34 kDa with an isoelectric point of 6.25. GaSUBT was predicted to possess a signal peptide and a pro-peptide consisting of a peptidase inhibitor I9 sequence. Sequence alignment of the deduced amino acids with other subtilisins in the NCBI database showed that the sequences surrounding the catalytic triad that forms the catalytic domain are well conserved.
A dominant negative mutation at the ATP binding domain of AMHR2 is associated with a defective anti-Müllerian hormone signaling pathway.

PubMed

Li, Lin; Zhou, Xueya; Wang, Xi; Wang, Jing; Zhang, Wei; Wang, Binbin; Cao, Yunxia; Kee, Kehkooi

2016-09-01

Does a heterozygous mutation in AMHR2, identified in whole-exome sequencing (WES) of patients with primary ovarian insufficiency (POI), cause a defect in anti-Müllerian hormone (AMH) signaling? The I209N mutation at the adenosine triphosphate binding domain of AMHR2 exerts dominant negative defects in the AMH signaling pathway. Previous studies have demonstrated the associations of several sequence variants in AMH or AMHR2 with POI, but no functional assay has been performed to verify whether there was any defect in AMH signaling. Ninety-six unrelated female Chinese Han patients were diagnosed with idiopathic POI and subjected to WES. In silico analysis was done for the sequence variants, followed by molecular assays to examine the functional effects of the sequence variants in human granulosa cells. In silico analysis, immunostaining, Western analysis, genome-wide expression analysis and quantitative polymerase chain reaction were applied to the characterization of the sequence variants. We identified one novel heterozygous missense variant, p.Ala17Glu (A17E), in AMHR2. Subsequently, A17E and two independently reported missense variants, p.Ile209Asn (I209N) and p.Leu354Phe (L354F), were evaluated for effects on the AMH signaling pathway. In silico analysis predicted that all three variants may be deleterious. However, only one variant, I209N, showed severe defects in transducing the AMH signal as well as impaired SMAD1/5/8 phosphorylation. Furthermore, using genome-wide gene expression analysis, we identified genes whose expression was affected by the mutation; these included genes previously reported to participate in AMH signaling as well as newly identified genes. They are EMILIN2, FAM155A, GATA2, HES5, ID1, ID2, RLTPR, SMAD7, CBL, MALAT1 and SMARCA2. Although the in vitro assays demonstrated the causative effect of I209N on AMH signaling, further studies need to validate its long-term effects on folliculogenesis and POI. These results will aid both researchers and clinicians in understanding the molecular pathology of AMH signaling and POI to develop diagnostic assays or therapeutic approaches. Research funding is provided by the Ministry of Science and Technology of China [2012CB944704; 2012CB966702], and the National Natural Science Foundation of China [Grant number: 31171429]. The authors declare no conflict of interest. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
In silico analysis of a novel MKRN3 missense mutation in familial central precocious puberty.

PubMed

Neocleous, Vassos; Shammas, Christos; Phelan, Marie M; Nicolaou, Stella; Phylactou, Leonidas A; Skordis, Nicos

2016-01-01

The onset of puberty is influenced by the interplay of stimulating and restraining factors, many of which have a genetic origin. Premature activation of GnRH secretion in central precocious puberty (CPP) may arise either from gain-of-function mutations of the KISS1 and KISS1R genes or from loss-of-function mutations of the MKRN3 gene leading to MKRN3 deficiency. The aim was to explore the genetic causes responsible for CPP and the potential role of the RING finger protein 3 (MKRN3) gene. We investigated potential sequence variations in the intronless MKRN3 gene by Sanger sequencing of the entire 507 amino acid coding region of exon 1 in a family with two affected girls who presented with CPP at the ages of 6 and 5·7 years, respectively. A novel heterozygous g.Gly312Asp missense mutation in the MKRN3 gene was identified in these siblings. The MKRN3 missense mutation was also identified, as expected, in the unaffected father and followed an imprinted mode of inheritance. In silico analysis of the missense variant using the computational algorithms Polyphen2, SIFT and Mutation Taster predicted a damaging, pathogenic alteration causing CPP. The pathogenicity of the alteration at the protein level was also explored via an in silico structural model. A novel mutation in the MKRN3 gene in two sisters with CPP was identified, supporting the fundamental role of this gene in the suppression of the hypothalamic GnRH neurons. © 2015 John Wiley & Sons Ltd.
First Pass Annotation of Promoters on Human Chromosome 22

PubMed Central

Scherf, Matthias; Klingenhoff, Andreas; Frech, Kornelie; Quandt, Kerstin; Schneider, Ralf; Grote, Korbinian; Frisch, Matthias; Gailus-Durner, Valérie; Seidel, Alexander; Brack-Werner, Ruth; Werner, Thomas

2001-01-01

The publication of the first almost complete sequence of a human chromosome (chromosome 22) is a major milestone in human genomics. Together with the sequence, an excellent annotation of genes was published which certainly will serve as an information resource for numerous future projects. We noted that the annotation did not cover regulatory regions; in particular, no promoter annotation has been provided. Here we present an analysis of the complete published chromosome 22 sequence for promoters. A recent breakthrough in specific in silico prediction of promoter regions enabled us to attempt large-scale prediction of promoter regions on chromosome 22. Scanning of sequence databases revealed only 20 experimentally verified promoters, of which 10 were correctly predicted by our approach. Nearly 40% of our 465 predicted promoter regions are supported by the currently available gene annotation. Promoter finding also provides a biologically meaningful method for “chromosomal scaffolding”, by which long genomic sequences can be divided into segments starting with a gene. As one example, the combination of promoter region prediction with exon/intron structure predictions greatly enhances the specificity of de novo gene finding. The present study demonstrates that it is possible to identify promoters in silico on the chromosomal level with sufficient reliability for experimental planning and indicates that a wealth of information about regulatory regions can be extracted from current large-scale (megabase) sequencing projects. Results are available on-line at http://genomatix.gsf.de/chr22/. PMID:11230158
Genome-Wide Search Identifies 1.9 Mb from the Polar Bear Y Chromosome for Evolutionary Analyses.

PubMed

Bidon, Tobias; Schreck, Nancy; Hailer, Frank; Nilsson, Maria A; Janke, Axel

2015-05-27

The male-inherited Y chromosome is the major haploid fraction of the mammalian genome, rendering Y-linked sequences an indispensable resource for evolutionary research. However, despite recent large-scale genome sequencing approaches, only a handful of Y chromosome sequences have been characterized to date, mainly in model organisms. Using polar bear (Ursus maritimus) genomes, we compare two different in silico approaches to identify Y-linked sequences: 1) Similarity to known Y-linked genes and 2) difference in the average read depth of autosomal versus sex chromosomal scaffolds. Specifically, we mapped available genomic sequencing short reads from a male and a female polar bear against the reference genome and identify 112 Y-chromosomal scaffolds with a combined length of 1.9 Mb. We verified the in silico findings for the longer polar bear scaffolds by male-specific in vitro amplification, demonstrating the reliability of the average read depth approach. The obtained Y chromosome sequences contain protein-coding sequences, single nucleotide polymorphisms, microsatellites, and transposable elements that are useful for evolutionary studies. A high-resolution phylogeny of the polar bear patriline shows two highly divergent Y chromosome lineages, obtained from analysis of the identified Y scaffolds in 12 previously published male polar bear genomes. Moreover, we find evidence of gene conversion among ZFX and ZFY sequences in the giant panda lineage and in the ancestor of ursine and tremarctine bears. Thus, the identification of Y-linked scaffold sequences from unordered genome sequences yields valuable data to infer phylogenomic and population-genomic patterns in bears. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
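The read-depth idea described above can be sketched in a few lines of Python: a Y-linked scaffold should have close to zero depth in the female sample relative to the male, whereas an X-linked scaffold should show roughly twice the male depth in the female. This is a minimal sketch under stated assumptions, not the authors' pipeline; the function name, the thresholds `y_max` and `x_min`, and the example depths are all hypothetical.

```python
def classify_scaffolds(depths, female_mean, male_mean, y_max=0.3, x_min=1.5):
    """Classify scaffolds by the female/male ratio of normalized read depth.

    depths maps scaffold name -> (female average depth, male average depth).
    female_mean and male_mean are genome-wide average depths used to
    normalize each sample. Thresholds are illustrative assumptions.
    """
    calls = {}
    for name, (female, male) in depths.items():
        f_norm = female / female_mean
        m_norm = male / male_mean
        if m_norm == 0:
            calls[name] = "no male coverage"
            continue
        ratio = f_norm / m_norm
        if ratio <= y_max:
            calls[name] = "Y-candidate"
        elif ratio >= x_min:
            calls[name] = "X-candidate"
        else:
            calls[name] = "autosomal"
    return calls


# Hypothetical average depths per scaffold: (female, male).
example = {
    "scaffold_1": (30.0, 30.0),  # equal depth in both sexes
    "scaffold_2": (30.0, 15.0),  # female depth about twice the male depth
    "scaffold_3": (0.5, 14.0),   # almost no female reads
}
print(classify_scaffolds(example, female_mean=30.0, male_mean=30.0))
```

Candidates produced by a depth filter like this still need independent confirmation, which the study obtained through male-specific in vitro amplification.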
An "in silico" Bioinformatics Laboratory Manual for Bioscience Departments: "Prediction of Glycosylation Sites in Phosphoethanolamine Transferases"

ERIC Educational Resources Information Center

Alyuruk, Hakan; Cavas, Levent

2014-01-01

Genomics and proteomics projects have produced a huge amount of raw biological data including DNA and protein sequences. Although these data have been stored in data banks, their evaluation is strictly dependent on bioinformatics tools. These tools have been developed by multidisciplinary experts for fast and robust analysis of biological data.…
Using diverse U.S. beef cattle genomes to identify missense mutations in EPAS1, a gene associated with pulmonary hypertension

USDA-ARS's Scientific Manuscript database

The availability of whole genome sequence (WGS) data has made it possible to discover protein variants in silico. However, existing bovine WGS databases do not show data in a form conducive to protein variant analysis, and tend to under represent the breadth of genetic diversity in U.S. beef cattle...
De-MetaST-BLAST: A Tool for the Validation of Degenerate Primer Sets and Data Mining of Publicly Available Metagenomes

PubMed Central

Gulvik, Christopher A.; Effler, T. Chad; Wilhelm, Steven W.; Buchan, Alison

2012-01-01

Development and use of primer sets to amplify nucleic acid sequences of interest is fundamental to studies spanning many life science disciplines. As such, the validation of primer sets is essential. Several computer programs have been created to aid in the initial selection of primer sequences that may or may not require multiple nucleotide combinations (i.e., degeneracies). Conversely, validation of primer specificity has remained largely unchanged for several decades, and there are currently few available programs that allow for an evaluation of primers containing degenerate nucleotide bases. To address this gap, we developed the program De-MetaST, which performs an in silico amplification using user-defined nucleotide sequence dataset(s) and primer sequences that may contain degenerate bases. The program returns an output file that contains the in silico amplicons. When De-MetaST is paired with NCBI’s BLAST (De-MetaST-BLAST), the program also returns the top 10 nr NCBI database hits for each recovered in silico amplicon. While the original motivation for development of this search tool was degenerate primer validation using the wealth of nucleotide sequences available in environmental metagenome and metatranscriptome databases, this search tool has potential utility in many data mining applications. PMID:23189198
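To make the notion of in silico amplification with degenerate primers concrete, here is a short Python sketch that expands IUPAC ambiguity codes into regular expressions and extracts the region between a forward primer match and the reverse-complemented reverse primer. It is not the De-MetaST implementation: the function names, the `max_len` cutoff, the exact-match (no mismatch tolerance) behavior, the single-strand search, and the toy template are assumptions made for this example.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement that also handles degenerate bases."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_amplicons(template, fwd, rev, max_len=2000):
    """Return amplicons (primers included) found on the given strand."""
    template = template.upper()
    fwd_re = re.compile("".join(IUPAC[b] for b in fwd.upper()))
    rev_re = re.compile("".join(IUPAC[b] for b in revcomp(rev)))
    amplicons = []
    for f in fwd_re.finditer(template):
        r = rev_re.search(template, f.end())
        if r and r.end() - f.start() <= max_len:
            amplicons.append(template[f.start():r.end()])
    return amplicons


# Hypothetical template and degenerate primers (Y = C/T, R = A/G).
template = "TTTACGTACGGGGGTTGCCACC"
print(in_silico_amplicons(template, fwd="ACGYAC", rev="TGGCRA"))
```

A fuller tool would also scan the reverse strand of each template and report which sequences in the dataset yielded no amplicon, since the fraction amplified is what primer validation ultimately measures.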
Issues on machine learning for prediction of classes among molecular sequences of plants and animals

NASA Astrophysics Data System (ADS)

Stehlik, Milan; Pant, Bhasker; Pant, Kumud; Pardasani, K. R.

2012-09-01

Nowadays major laboratories of the world are turning towards in-silico experimentation because of its ease, reproducibility and accuracy. Ethical issues of the kind raised by wet lab experimentation are also minimal in in-silico experimentation. But before we turn fully towards dry lab simulations, it is necessary to understand the discrepancies and bottlenecks involved in dry lab experimentation. Before reporting any result obtained from dry lab simulations, it is necessary to perform in-depth statistical analysis of the data. With this in mind, we present a collaborative effort to correlate the findings and results of various machine learning algorithms and to check underlying regressions and mutual dependencies so as to develop an optimal classifier and predictors.
Gastrointestinal Endogenous Proteins as a Source of Bioactive Peptides - An In Silico Study

PubMed Central

Dave, Lakshmi A.; Montoya, Carlos A.; Rutherfurd, Shane M.; Moughan, Paul J.

2014-01-01

Dietary proteins are known to contain bioactive peptides that are released during digestion. Endogenous proteins secreted into the gastrointestinal tract represent a quantitatively greater supply of protein to the gut lumen than those of dietary origin. Many of these endogenous proteins are digested in the gastrointestinal tract but the possibility that these are also a source of bioactive peptides has not been considered. An in silico prediction method was used to test if bioactive peptides could be derived from the gastrointestinal digestion of gut endogenous proteins. Twenty six gut endogenous proteins and seven dietary proteins were evaluated. The peptides present after gastric and intestinal digestion were predicted based on the amino acid sequence of the proteins and the known specificities of the major gastrointestinal proteases. The predicted resultant peptides possessing amino acid sequences identical to those of known bioactive peptides were identified. After gastrointestinal digestion (based on the in silico simulation), the total number of bioactive peptides predicted to be released ranged from 1 (gliadin) to 55 (myosin) for the selected dietary proteins and from 1 (secretin) to 39 (mucin-5AC) for the selected gut endogenous proteins. Within the intact proteins and after simulated gastrointestinal digestion, angiotensin converting enzyme (ACE)-inhibitory peptide sequences were the most frequently observed in both the dietary and endogenous proteins. Among the dietary proteins, after in silico simulated gastrointestinal digestion, myosin was found to have the highest number of ACE-inhibitory peptide sequences (49 peptides), while for the gut endogenous proteins, mucin-5AC had the greatest number of ACE-inhibitory peptide sequences (38 peptides). Gut endogenous proteins may be an important source of bioactive peptides in the gut particularly since gut endogenous proteins represent a quantitatively large and consistent source of protein. PMID:24901416
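The prediction approach described above (cut each protein according to known protease specificities, then look for fragments identical to known bioactive peptides) can be illustrated with a short Python sketch. It is deliberately simplified and is not the study's method in full: only trypsin is modeled (cleavage after K or R unless the next residue is P), whereas the study considered the major gastric and intestinal proteases; the function names, the toy protein and the "known" peptide set are made-up placeholders.

```python
def trypsin_digest(protein):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def bioactive_hits(peptides, known):
    """Keep predicted peptides whose sequence exactly matches a known one."""
    return [p for p in peptides if p in known]


# Hypothetical protein and placeholder "known bioactive" sequences.
toy_protein = "AVPKIPPRGGKPLR"
toy_known = {"IPPR", "LLY"}

fragments = trypsin_digest(toy_protein)
print(fragments)
print(bioactive_hits(fragments, toy_known))
```

Because matching is by exact sequence identity, the counts such a simulation reports depend directly on which proteases are modeled and on the completeness of the reference set of bioactive peptides.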
Cadmium effects on sperm morphology and semenogelin with relates to increased ROS in infertile smokers: An in vitro and in silico approach.

PubMed

Ranganathan, Parameswari; Rao, Kamini A; Sudan, Jesu Jaya; Balasundaram, Sridharan

2018-06-01

Smoking releases cadmium (Cd), a metal toxicant that causes an imbalance in the reactive oxygen species level in seminal plasma. This imbalance is envisaged to impair sperm DNA and morphology and thereby result in male infertility. To examine this association, we performed in vitro and in silico studies and evaluated the influence of reactive oxygen species imbalance on sperm morphology impairments due to smoking. The study included 76 infertile smokers, 72 infertile non-smokers, 68 fertile smokers and 74 fertile non-smokers (control). Semen samples were collected at regular intervals from all the subjects. Semen parameters were examined by computer-assisted semen analysis, the metal toxicant was quantified by atomic absorption spectrophotometry, antioxidants were assessed by enzymatic and non-enzymatic methods, reactive oxygen species were measured by the nitro blue tetrazolium method, and the influence of Cd on sperm protein was examined by in vitro and in silico methods. Our analysis revealed that the levels of cigarette toxicants in semen were high, accompanied by low levels of antioxidants in the seminal plasma of infertile smokers. In addition, examination of Cd-treated sperm cells by scanning electron microscopy showed damage to the midpiece of spermatozoa. Dispersive X-ray analysis of the elemental composition further confirmed the presence of Cd. Finally, the in-silico analysis of semenogelin sequences revealed the D-H-D motif, which represents a favourable binding site for Cd coordination. Our findings clearly indicated the influence of Cd on reactive oxygen species, resulting in impaired sperm morphology and, in turn, male infertility. Copyright © 2018 Society for Biology of Reproduction & the Institute of Animal Reproduction and Food Research of Polish Academy of Sciences in Olsztyn. Published by Elsevier B.V. All rights reserved.
Imperfect Duplicate Insertions Type of Mutations in Plasmepsin V Modulates Binding Properties of PEXEL Motifs of Export Proteins in Indian Plasmodium vivax

PubMed Central

Rawat, Manmeet; Vijay, Sonam; Gupta, Yash; Tiwari, Pramod Kumar; Sharma, Arun

2013-01-01

Introduction Plasmepsin V (PM-V) has functionally conserved orthologues across the Plasmodium genus whose binding and antigenic processing at the PEXEL motifs for export of about 200–300 essential proteins is important for the virulence and viability of the causative Plasmodium species. This study was undertaken to determine the P. vivax plasmepsin V Ind (PvPM-V-Ind) PEXEL motif export pathway for export of pathogenicity-related proteins/antigens, thereby altering the Plasmodium exportome during erythrocytic stages. Method We identify and characterize the Plasmodium vivax plasmepsin-V-Ind (mutant) gene by cloning, sequence analysis, in silico bioinformatic protocols and structural modeling predictions based on docking studies on binding capacity with PEXEL motifs processing in terms of binding and accessibility of export proteins. Results Cloning and sequence analysis for genetic diversity demonstrate that the PvPM-V-Ind (mutant) gene is highly conserved among all isolates from different geographical regions of India. Imperfect duplicate insertion types of mutations (SVSE from 246–249 AA and SLSE from 266–269 AA) were identified among all Indian isolates in comparison to the P. vivax Sal-1 (PvPM-V-Sal 1) isolate. In silico bioinformatics interaction studies of PEXEL peptide and active enzyme reveal that PvPM-V-Ind (mutant) is only active in the endoplasmic reticulum lumen and that membrane embedding is essential for activation of plasmepsin V. Structural modeling predictions based on docking studies with the PEXEL motif show significant variation in substrate protein binding of these imperfect mutations with data-mined PEXEL sequences. The predicted variation in the docking score and interacting amino acids of PvPM-V-Ind (mutant) proteins with PEXEL and lopinavir suggests a modulation in the activity of PvPM-V in terms of binding and accessibility at these sites. Conclusion/Significance Our functional modeled validation of PvPM-V-Ind (mutant) imperfect duplicate insertions with data-mined PEXEL sequences, leading to altered binding and substrate accessibility of the enzyme, makes it a plausible target to investigate export mechanisms for in silico virtual screening and novel pharmacophore designing. PMID:23555891
Comparative Analysis of Expressed Genes from Cacao Meristems Infected by Moniliophthora perniciosa

PubMed Central

Gesteira, Abelmon S.; Micheli, Fabienne; Carels, Nicolas; Da Silva, Aline C.; Gramacho, Karina P.; Schuster, Ivan; Macêdo, Joci N.; Pereira, Gonçalo A. G.; Cascardo, Júlio C. M.

2007-01-01

Background and Aims Witches' broom disease is caused by the hemibiotrophic basidiomycete Moniliophthora perniciosa, and is one of the most important diseases of cacao in the western hemisphere. Because very little is known about the global process of such disease development, expressed sequence tags (ESTs) were used to identify genes expressed during the Theobroma cacao–Moniliophthora perniciosa interaction. Methods Two cDNA libraries corresponding to the resistant (RT) and susceptible (SP) cacao–M. perniciosa interactions were constructed from total RNA, using the DB SMART Creator cDNA library kit (Clontech). Clones were randomly selected, sequenced from the 5′ end and analysed using bioinformatics tools including in silico analysis of the differential gene expression. Key Results A total of 6884 ESTs were generated from the RT and SP cDNA libraries. These ESTs were composed of 2585 singlets and 341 contigs for a total of 2926 non-redundant sequences. The redundancy of the libraries was low and their specificity high when compared with the few other cacao libraries already published. Sequence analysis allowed the assignment of a putative functional category for 54 % of sequences, whereas approx. 22 % of sequences corresponded to unknown function and approx. 24 % of sequences did not show any significant similarity with other proteins present in the database. Despite the similar overall distribution of the sequences in functional categories between the two libraries, qualitative differences were observed. Genes involved during the defence response to pathogen infection or in programmed cell death were identified, such as pathogenesis related-proteins, trypsin inhibitor or oxalate oxidase, and some of them showed an in silico differential expression between the resistant and the susceptible interactions. Conclusions As far as is known this is the first EST resource from the cacao–M. perniciosa interaction and it is believed that it will provide a significant contribution to the understanding of the molecular mechanisms of the resistance and susceptibility of cacao to M. perniciosa, to develop strategies to control witches broom, and as a source of polymorphism for molecular marker development and marker-assisted selection. PMID:17557832
Identification of expressed sequences in the coffee genome potentially associated with somatic embryogenesis.

PubMed

Silva, A T; Paiva, L V; Andrade, A C; Barduche, D

2013-05-21

Brazil possesses the most modern and productive coffee growing farms in the world, but technological development is desired to cope with the increasing world demand. One way to increase Brazilian coffee growing productivity is wide scale production of clones with superior genotypes, which can be obtained with in vitro propagation technique, or from tissue culture. These procedures can generate thousands of clones. However, the methodologies for in vitro cultivation are genotype-dependent, which leads to an almost empirical development of specific protocols for each species. Therefore, molecular markers linked to the biochemical events of somatic embryogenesis would greatly facilitate the development of such protocols. In this context, sequences potentially involved in embryogenesis processes in the coffee plant were identified in silico from libraries generated by the Brazilian Coffee Genome Project. Through these in silico analyses, we identified 15 EST-contigs related to the embryogenesis process. Among these, 5 EST-contigs (3605, 9850, 13686, 17240, and 17265) could readily be associated with plant embryogenesis. Sequence analysis of EST-contig 3605, 9850, and 17265 revealed similarity to a polygalacturonase, to a cysteine-proteinase, and to an allergenine, respectively. Results also show that EST-contig 17265 sequences presented similarity to an expansin. Finally, analysis of EST-contig 17240 revealed similarity to a protein of unknown function, but it grouped in the similarity dendrogram with the WUSCHEL transcription factor. The data suggest that these EST-contigs are related to the embryogenic process and have potential as molecular markers to increase methodological efficiency in obtaining coffee plant embryogenic materials.
Linking disease-associated genes to regulatory networks via promoter organization

PubMed Central

Döhr, S.; Klingenhoff, A.; Maier, H.; de Angelis, M. Hrabé; Werner, T.; Schneider, R.

2005-01-01

Pathway- or disease-associated genes may participate in more than one transcriptional co-regulation network. Such gene groups can be readily obtained by literature analysis or by high-throughput techniques such as microarrays or protein-interaction mapping. We developed a strategy that defines regulatory networks by in silico promoter analysis, finding potentially co-regulated subgroups without a priori knowledge. Pairs of transcription factor binding sites conserved in orthologous genes (vertically) as well as in promoter sequences of co-regulated genes (horizontally) were used as seeds for the development of promoter models representing potential co-regulation. This approach was applied to a Maturity Onset Diabetes of the Young (MODY)-associated gene list, which yielded two models connecting functionally interacting genes within MODY-related insulin/glucose signaling pathways. Additional genes functionally connected to our initial gene list were identified by database searches with these promoter models. Thus, data-driven in silico promoter analysis allowed integrating molecular mechanisms with biological functions of the cell. PMID:15701758
Identification of a 'Candidatus Phytoplasma hispanicum'-related strain, associated with yellows-type diseases, in smoke-tree sharpshooter (Homalodisca liturata Ball).

PubMed

Servín-Villegas, Rosalía; Caamal-Chan, Maria Goretty; Chavez-Medina, Alicia; Loera-Muro, Abraham; Barraza, Aarón; Medina-Hernández, Diana; Holguín-Peña, Ramón Jaime

2018-04-11

Phytoplasmas of the 16SrXIII group were identified in salivary glands of Homalodisca liturata collected in El Comitán on the Baja California peninsula in Mexico. We positively identified 15 16S rRNA gene sequences carrying the signature sequence of 'Candidatus Phytoplasma' (CAAGAYBATKATGTKTAGCYGGDCT), and used in silico restriction fragment length polymorphism (RFLP) profiles (F value estimations) coupled with a phylogenetic analysis to confirm their relatedness to 'Candidatus Phytoplasma hispanicum', which in turn belongs to the 16SrXIII group. A restriction analysis was carried out with AluI and EcoRI to confirm that five sequences belong to subgroup D. The rest of the sequences did not exhibit any known RFLP profile related to a subgroup reported in the 16SrXIII group.
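The two in silico steps named in this abstract, matching a degenerate signature sequence and digesting a sequence with AluI and EcoRI, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the molecule is assumed to be linear, only the top strand is searched, and the F value (similarity coefficient) comparison between profiles is not computed. The signature string is taken from the abstract; the recognition sites AG^CT (AluI) and G^AATTC (EcoRI) are the standard ones for these enzymes.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Signature sequence quoted in the abstract.
SIGNATURE = "CAAGAYBATKATGTKTAGCYGGDCT"

# Recognition site and top-strand cut offset (AluI: AG^CT, EcoRI: G^AATTC).
ENZYMES = {"AluI": ("AGCT", 2), "EcoRI": ("GAATTC", 1)}


def iupac_to_regex(pattern):
    """Compile a degenerate IUPAC pattern into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in pattern.upper()))


def has_signature(seq):
    """Return True if the top strand contains a match to SIGNATURE."""
    return iupac_to_regex(SIGNATURE).search(seq.upper()) is not None


def digest(seq, enzyme):
    """Return fragment lengths after cutting a linear sequence with one enzyme."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # The lookahead finds every site occurrence, including overlapping ones.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

Calling `digest(seq, "AluI")` and `digest(seq, "EcoRI")` on the same 16S rRNA gene sequence gives the fragment-length lists that a virtual gel would display; comparing such profiles between strains is a separate step.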
‘Candidatus Phytoplasma luffae’, a novel taxon associated with a witches’-broom disease of loofah, Luffa aegyptica Mill

USDA-ARS's Scientific Manuscript database

The phytoplasma associated with witches’ broom disease of loofah (Luffa aegyptica Mill., syn. Luffa cylindrica (L.) M.J. Roem.) in Taiwan was classified in group 16SrVIII, subgroup A (16SrVIII-A), based on results from actual and in silico RFLP analysis of 16S rRNA gene sequences. Nucleotide sequ...

Prediction of the Hydrogen Peroxide-Induced Methionine Oxidation Propensity in Monoclonal Antibodies.

PubMed

Agrawal, Neeraj J; Dykstra, Andrew; Yang, Jane; Yue, Hai; Nguyen, Xichdao; Kolvenbach, Carl; Angell, Nicolas

2018-05-01

Methionine oxidation in therapeutic antibodies can impact the product's stability, clinical efficacy, and safety and hence it is desirable to address the methionine oxidation liability during antibody discovery and development phase. Although the current experimental approaches can identify the oxidation-labile methionine residues, their application is limited mostly to the development phase. We demonstrate an in silico method that can be used to predict oxidation-labile residues based solely on the antibody sequence and structure information. Since antibody sequence information is available in the discovery phase, the in silico method can be applied very early on to identify the oxidation-labile methionine residues and subsequently address the oxidation liability. We believe that the in silico method for methionine oxidation liability assessment can aid in antibody discovery and development phase to address the liability in a more rational way. Copyright © 2018 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
In silico structural analysis of group 3, 6 and 9 allergens from Dermatophagoides farinae.

PubMed

Teng, Feixiang; Yu, Lili; Bian, Yonghua; Sun, Jinxia; Wu, Juansong; Ling, Cunbao; Yang, Li; Wang, Yungang; Cui, Yubao

2015-05-01

Dermatophagoides farinae (Hughes; Acari: Pyroglyphidae) are the predominant source of dust mite allergens, which provoke allergic diseases, such as rhinitis, asthma and eczema. Of the 30 allergen groups produced by D. farinae, the Der f 3, Der f 6 and Der f 9 allergens are all trypsin‑associated proteins, however little else is currently known about them. The present study used in silico tools to compare the amino acid sequences, and predict the secondary and tertiary structures of Der f 3, Der f 6 and Der f 9 allergens. Protein sequence alignment detected ~46% identity between Der f 3, Der f 6 and Der f 9. Furthermore, each protein was shown to contain three active sites and two highly conserved trypsin functional domains. Predictions of the secondary and tertiary structure identified α‑helices, β‑sheets and random coils. The active sites of the three proteins appeared to fold onto each other in a three‑dimensional model, constituting the active site of the enzyme. Epitope analysis demonstrated that Der f 3, Der f 6 and Der f 9 have 4‑5 potential epitopes located in random coils, and the epitope sequences of Der f 3, Der f 6 and Der f 9 were shown to overlap in two domains (at amino acids 83‑87 and 179‑180); however the residues in these two domains were not identical. The present study aimed to conduct a biochemical and genetic analysis of these three allergens, and to potentially contribute to the development of vaccines for allergen‑specific immunotherapy.
Evolutionary distance from human homologs reflects allergenicity of animal food proteins.

PubMed

Jenkins, John A; Breiteneder, Heimo; Mills, E N Clare

2007-12-01

In silico analysis of allergens can identify putative relationships among protein sequence, structure, and allergenic properties. Such systematic analysis reveals that most plant food allergens belong to a restricted number of protein superfamilies, with pollen allergens behaving similarly. We have investigated the structural relationships of animal food allergens and their evolutionary relatedness to human homologs to define how closely a protein must resemble a human counterpart to lose its allergenic potential. Profile-based sequence homology methods were used to classify animal food allergens into Pfam families, and in silico analyses of their evolutionary and structural relationships were performed. Animal food allergens could be classified into 3 main families--tropomyosins, EF-hand proteins, and caseins--along with 14 minor families each composed of 1 to 3 allergens. The evolutionary relationships of each of these allergen superfamilies showed that in general, proteins with a sequence identity to a human homolog above approximately 62% were rarely allergenic. Single substitutions in otherwise highly conserved regions containing IgE epitopes in EF-hand parvalbumins may modulate allergenicity. These data support the premise that certain protein structures are more allergenic than others. Contrasting with plant food allergens, animal allergens, such as the highly conserved tropomyosins, challenge the capability of the human immune system to discriminate between foreign and self-proteins. Such immune responses run close to becoming autoimmune responses. Exploiting the closeness between animal allergens and their human homologs in the development of recombinant allergens for immunotherapy will need to consider the potential for developing unanticipated autoimmune responses.
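The identity threshold discussed in this abstract can be made concrete with a small Python sketch. It assumes the two sequences have already been aligned (gaps written as `-`) and that identity is counted over gap-free columns only; other denominators are in common use, and the abstract does not say which one the authors applied. The function names and the constant are illustrative, with the value 62 taken from the approximate figure in the abstract.

```python
# Approximate identity to a human homolog above which the abstract reports
# that proteins were rarely allergenic.
HUMAN_IDENTITY_THRESHOLD = 62.0


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gap columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def exceeds_human_identity_threshold(aligned_protein, aligned_human):
    """True if identity to the human homolog is above the reported threshold."""
    return percent_identity(aligned_protein, aligned_human) > HUMAN_IDENTITY_THRESHOLD
```

A result above the threshold is only a screening hint; the abstract notes that single substitutions in conserved IgE epitope regions may still modulate allergenicity.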
MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

PubMed Central

Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar

2007-01-01

MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813
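The 'inferred contig' idea described for ArrayOme, merging adjacent genes classified as 'present' and summing the resulting fragments into a microarray-visualized genome (MVG) size, can be sketched as follows. This is an illustration under stated assumptions, not ArrayOme's actual implementation: the `Gene` record, the function names, and the choice to count the intergenic stretch between two adjacent present genes as part of the contig are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int     # 1-based start coordinate on the reference genome
    end: int       # inclusive end coordinate
    present: bool  # hybridization call from the microarray data


def inferred_contigs(genes):
    """Merge runs of consecutive 'present' genes into (start, end) spans."""
    contigs = []
    current = None
    for gene in sorted(genes, key=lambda g: g.start):
        if gene.present:
            if current is None:
                current = [gene.start, gene.end]
            else:
                current[1] = max(current[1], gene.end)
        elif current is not None:
            # An 'absent' gene closes the contig that was being extended.
            contigs.append(tuple(current))
            current = None
    if current is not None:
        contigs.append(tuple(current))
    return contigs


def mvg_size(genes):
    """Total length of all inferred contigs (the hypothetical MVG size)."""
    return sum(end - start + 1 for start, end in inferred_contigs(genes))
```

Comparing `mvg_size(genes)` with a physically measured genome size is the kind of discordance check the abstract describes for flagging strains rich in novel genes.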
In Silico Pattern-Based Analysis of the Human Cytomegalovirus Genome

PubMed Central

Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T.; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas

2003-01-01

More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/). PMID:12634390
In silico pattern-based analysis of the human cytomegalovirus genome.

PubMed

Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas

2003-04-01

More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).
In silico characterization and expression analysis of the multigene family encoding the Bowman-Birk protease inhibitor in soybean.

PubMed

de Almeida Barros, Beatriz; da Silva, Wiliane Garcia; Moreira, Maurilio Alves; de Barros, Everaldo Gonçalves

2012-01-01

The Bowman-Birk (BBI) protease inhibitors can be used as a source of sulfur amino acids and can regulate endogenous protease activity during seed germination and during the defense response of plants to pathogens. In soybean this family has not been fully described. The goal of this work was to characterize in silico and analyze the expression of the members of this family in soybean. We identified 11 potential BBI genes in the soybean genome. In each of them at least one characteristic BBI conserved domain was detected in addition to a potential signal peptide. The sequences have been positioned in the soybean physical map and the promoter regions were analyzed with respect to known regulatory elements. Elements related to seed-specific expression and also to response to biotic and abiotic stresses have been identified. Based on the in silico analysis and also on quantitative RT-PCR data it was concluded that BBI-A, BBI-CII and BBI-DII are expressed specifically in the seed. The expression profiles of these three genes are similar throughout seed development. Their expression reaches a maximum in the intermediate stages and decreases as the seed matures. The BBI-DII transcripts are the most abundant, followed by those of BBI-A and BBI-CII.
A general framework for optimization of probes for gene expression microarray and its application to the fungus Podospora anserina.

PubMed

Bidard, Frédérique; Imbeaud, Sandrine; Reymond, Nancie; Lespinet, Olivier; Silar, Philippe; Clavé, Corinne; Delacroix, Hervé; Berteaux-Lecellier, Véronique; Debuchy, Robert

2010-06-18

The development of new microarray technologies makes custom long oligonucleotide arrays affordable for many experimental applications, notably gene expression analyses. Reliable results depend on probe design quality and selection. Probe design strategy should cope with the limited accuracy of de novo gene prediction programs, and annotation up-dating. We present a novel in silico procedure which addresses these issues and includes experimental screening, as an empirical approach is the best strategy to identify optimal probes in the in silico outcome. We used four criteria for in silico probe selection: cross-hybridization, hairpin stability, probe location relative to coding sequence end and intron position. This latter criterion is critical when exon-intron gene structure predictions for intron-rich genes are inaccurate. For each coding sequence (CDS), we selected a sub-set of four probes. These probes were included in a test microarray, which was used to evaluate the hybridization behavior of each probe. The best probe for each CDS was selected according to three experimental criteria: signal-to-noise ratio, signal reproducibility, and representative signal intensities. This procedure was applied for the development of a gene expression Agilent platform for the filamentous fungus Podospora anserina and the selection of a single 60-mer probe for each of the 10,556 P. anserina CDS. A reliable gene expression microarray version based on the Agilent 44K platform was developed with four spot replicates of each probe to increase statistical significance of analysis.
Comprehensive in silico allergenicity assessment of novel protein engineered chimeric Cry proteins for safe deployment in crops.

PubMed

Rathinam, Maniraj; Singh, Shweta; Pattanayak, Debasis; Sreevathsa, Rohini

2017-08-02

Development of chimeric Cry toxins by protein engineering of known and validated proteins is imperative for enhancing the efficacy and broadening the insecticidal spectrum of these genes. Expression of novel Cry proteins in food crops has however created apprehensions with respect to the safety aspects. To clarify this, premarket evaluation consisting of an array of analyses to evaluate the unintended effects is a prerequisite to provide safety assurance to the consumers. Additionally, series of bioinformatic tools as in silico aids are being used to evaluate the likely allergenic reaction of the proteins based on sequence and epitope similarity with known allergens. In the present study, chimeric Cry toxins developed through protein engineering were evaluated for allergenic potential using various in silico algorithms. Major emphasis was on the validation of allergenic potential on three aspects of paramount significance viz., sequence-based homology between allergenic proteins, validation of conformational epitopes towards identification of food allergens and physico-chemical properties of amino acids. Additionally, in vitro analysis pertaining to heat stability of two of the eight chimeric proteins and pepsin digestibility further demonstrated the non-allergenic potential of these chimeric toxins. The study revealed for the first time an all-encompassing evaluation that the recombinant Cry proteins did not show any potential similarity with any known allergens with respect to the parameters generally considered for a protein to be designated as an allergen. These novel chimeric proteins hence can be considered safe to be introgressed into plants.
In silico characterization and analysis of RTBP1 and NgTRF1 protein through MD simulation and molecular docking - A comparative study.

PubMed

Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

2015-02-06

Gaining access to sequence and structure information of telomere binding proteins helps in understanding the essential biological processes involved in conserved sequence specific interaction between DNA and the proteins. Rice telomere binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix turn helix motif type proteins that play a role in telomeric DNA protection and length regulation. Both proteins share the same type of domain, but until now very little has been reported on in silico studies of these complete proteins. Here we intend to do a comparative study between the two proteins through modeling of the complete proteins, physicochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC Protein Workbench were used to obtain the protein 3D structures as well as the different parameters used to characterize the proteins. MD simulation was run with the GROMOS force field in GROMACS for 10 ns. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of the TTTAGGG conserved sequence motif using the HADDOCK web server. The analysis revealed that around 120 amino acids in the tail part show good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate similarity between the proteins' structures and sequences. The MD simulation results, in terms of RMSD, RMSF, Rg, PCA and energy plots, also convey a similar type of motional behavior between them. The best complex formed by each protein in the docking results also indicates the first interaction site, which is mainly the helix3 region of the DNA binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 display a good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
In Silico Characterization and Analysis of RTBP1 and NgTRF1 Protein Through MD Simulation and Molecular Docking: A Comparative Study.

PubMed

Mukherjee, Koel; Pandey, Dev Mani; Vidyarthi, Ambarish Saran

2015-09-01

Gaining access to sequence and structure information of telomere-binding proteins helps in understanding the essential biological processes involved in conserved sequence-specific interaction between DNA and the proteins. Rice telomere-binding protein (RTBP1) and Nicotiana glutinosa telomere repeat binding factor (NgTRF1) are helix-turn-helix motif type proteins that play a role in telomeric DNA protection and length regulation. Both proteins share the same type of domain, but until now very little has been reported on in silico studies of these complete proteins. Here we intend to do a comparative study between the two proteins through modeling of the complete proteins, physicochemical characterization, MD simulation and DNA-protein docking. I-TASSER and CLC Protein Workbench were used to obtain the protein 3D structures as well as the different parameters used to characterize the proteins. MD simulation was run with the GROMOS force field in GROMACS for 10 ns. The simulated 3D structures were docked with template DNA (3D DNA modeled through 3D-DART) of the TTTAGGG conserved sequence motif using the HADDOCK Web server. The analysis revealed that around 120 amino acids in the tail part show good sequence similarity between the proteins. Molecular modeling, sequence characterization and secondary structure prediction also indicate similarity between the proteins' structures and sequences. The MD simulation results, in terms of RMSD, RMSF, Rg, PCA and energy plots, also convey a similar type of motional behavior between them. The best complex formed by each protein in the docking results also indicates the first interaction site, which is mainly the helix3 region of the DNA-binding domain. The overall computational analysis reveals that RTBP1 and NgTRF1 display a good amount of similarity in their physicochemical properties, structure, dynamics and binding mode.
Information theory-based algorithm for in silico prediction of PCR products with whole genomic sequences as templates.

PubMed

Cao, Youfang; Wang, Lianjie; Xu, Kexue; Kou, Chunhai; Zhang, Yulei; Wei, Guifang; He, Junjian; Wang, Yunfang; Zhao, Liping

2005-07-26

A new algorithm for assessing similarity between primer and template has been developed based on the hypothesis that annealing of primer to template is an information transfer process. Primer sequence is converted to a vector of the full potential hydrogen numbers (3 for G or C, 2 for A or T), while template sequence is converted to a vector of the actual hydrogen bond numbers formed after primer annealing. The former is considered as source information and the latter destination information. An information coefficient is calculated as a measure for fidelity of this information transfer process and thus a measure of similarity between primer and potential annealing site on template. Successful prediction of PCR products from whole genomic sequences with a computer program based on the algorithm demonstrated the potential of this new algorithm in areas like in silico PCR and gene finding.
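The vector encoding described in this abstract (3 potential hydrogen bonds for G or C, 2 for A or T, versus the bonds actually formed at a candidate annealing site) can be sketched in Python as below. The abstract does not give the formula for the information coefficient, so none is reproduced here; `bond_fraction` is a labeled stand-in that simply divides formed bonds by potential bonds. The sketch also assumes that only Watson-Crick pairs form bonds, that a mismatch contributes zero, and that the template site is written 3'→5' so position i pairs with primer position i. All function names are hypothetical.

```python
# Potential hydrogen bonds per primer base, as stated in the abstract.
POTENTIAL_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def primer_vector(primer):
    """Source vector: full potential hydrogen bond number at each primer base."""
    return [POTENTIAL_BONDS[base] for base in primer.upper()]


def template_vector(primer, site):
    """Destination vector: bonds actually formed at each position.

    `site` is the template segment written 3'->5', aligned to the primer.
    Assumption: mismatched positions form no hydrogen bonds.
    """
    primer, site = primer.upper(), site.upper()
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    return [
        POTENTIAL_BONDS[p] if COMPLEMENT[p] == t else 0
        for p, t in zip(primer, site)
    ]


def bond_fraction(primer, site):
    """Stand-in similarity score (NOT the paper's information coefficient)."""
    return sum(template_vector(primer, site)) / sum(primer_vector(primer))
```

Sliding a primer along a genome and keeping sites whose score passes a chosen cutoff is one plausible way to use such a measure for in silico PCR; the cutoff and scanning strategy would have to come from the original paper.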
Computational approach to analyze isolated ssDNA aptamers against angiotensin II.

PubMed

Heiat, Mohammad; Najafi, Ali; Ranjbar, Reza; Latifi, Ali Mohammad; Rasaee, Mohammad Javad

2016-07-20

Aptamers are oligonucleotides with highly structured molecules that can bind to their targets through specific 3-D conformation. Commonly, not all the nucleotides such as primer binding fixed region and some other sequences are vital for aptamers folding and interaction. Elimination of unnecessary regions needs trustworthy prediction tools to reduce experimental efforts and errors. Here we introduced a manipulated in-silico approach to predict the 3-D structure of aptamers and their target interactions. To design an approach for computational analysis of isolated ssDNA aptamers (FLC112, FLC125 and their truncated core region including CRC112 and CRC125), their secondary and tertiary structures were modeled by Mfold and RNA composer respectively. Output PDB files were modified from RNA to DNA in the discovery studio visualizer software. Using ZDOCK server, the aptamer-target interactions were predicted. Finally, the interaction scores were compared with the experimental results. In-silico interaction scores and the experimental outcomes were in the same descending arrangement of FLC112>CRC125>CRC112>FLC125 with similar intensity. The consistent results of innovative in-silico method with experimental outputs, affirmed that the present method may be a reliable approach. Also, it showed that the exact in-silico predictions can be utilized as a credible reference to find aptameric fragments binding potency. Copyright © 2016 Elsevier B.V. All rights reserved.
In silico genomic insights into aspects of food safety and defense mechanisms of a potentially probiotic Lactobacillus pentosus MP-10 isolated from brines of naturally fermented Aloreña green table olives.

PubMed

Abriouel, Hikmate; Pérez Montoro, Beatriz; Casado Muñoz, María Del Carmen; Knapp, Charles W; Gálvez, Antonio; Benomar, Nabil

2017-01-01

Lactobacillus pentosus MP-10, isolated from brines of naturally fermented Aloreña green table olives, exhibited high probiotic potential. The genome sequence of L. pentosus MP-10 is currently considered the largest genome among lactobacilli, highlighting the microorganism's ecological flexibility and adaptability. Here, we analyzed the complete genome sequence for the presence of acquired antibiotic resistance and virulence determinants to understand their defense mechanisms and explore its putative safety in food. The annotated genome sequence revealed evidence of diverse mobile genetic elements, such as prophages, transposases and transposons involved in their adaptation to brine-associated niches. In-silico analysis of L. pentosus MP-10 genome sequence identified a CRISPR (clustered regularly interspaced short palindromic repeats)/cas (CRISPR-associated protein genes) as an immune system against foreign genetic elements, which consisted of six arrays (4-12 repeats) and eleven predicted cas genes [CRISPR1 and CRISPR2 consisted of 3 (Type II-C) and 8 (Type I) genes] with high similarity to L. pentosus KCA1. Bioinformatic analyses revealed L. pentosus MP-10 to be absent of acquired antibiotic resistance genes, and most resistance genes were related to efflux mechanisms; no virulence determinants were found in the genome. This suggests that L. pentosus MP-10 could be considered safe and with high-adaptation potential, which could facilitate its application as a starter culture and probiotic in food preparations.
Computational Approaches for Decoding Select Odorant-Olfactory Receptor Interactions Using Mini-Virtual Screening

PubMed Central

Harini, K.; Sowdhamini, Ramanathan

2015-01-01

Olfactory receptors (ORs) belong to the class A G-Protein Coupled Receptor superfamily of proteins. Unlike other G-Protein Coupled Receptors, ORs exhibit a combinatorial response to odors/ligands. ORs display an affinity towards a range of odor molecules rather than binding to a specific set of ligands, and conversely a single odorant molecule may bind to a number of olfactory receptors with varying affinities. The diversity in odor recognition is linked to the highly variable transmembrane domains of these receptors. The purpose of this study is to decode the odor-olfactory receptor interactions using in silico docking studies. In this study, a ligand (odor molecules) dataset of 125 molecules was used to carry out in silico docking using the GLIDE docking tool (SCHRODINGER Inc Pvt LTD). Previous studies, with smaller datasets of ligands, have shown that orthologous olfactory receptors respond to similarly-tuned ligands, but are dramatically different in their efficacy and potency. Ligand docking results were applied on homologous pairs (with varying sequence identity) of ORs from human and mouse genomes; ligand binding residues and the ligand profile differed among such related olfactory receptor sequences. This study revealed that homologous sequences with high sequence identity need not bind to the same/similar ligand with a given affinity. A ligand profile has been obtained for each of the 20 receptors in this analysis, which will be useful for expression and mutation studies on these receptors. PMID:26221959
Application of Genomic Technologies to the Breeding of Trees

PubMed Central

Badenes, Maria L.; Fernández i Martí, Angel; Ríos, Gabino; Rubio-Cabetas, María J.

2016-01-01

The recent introduction of next generation sequencing (NGS) technologies represents a major revolution in providing new tools for identifying the genes and/or genomic intervals controlling important traits for selection in breeding programs. In perennial fruit trees with long generation times and large sizes of adult plants, the impact of these techniques is even more important. High-throughput DNA sequencing technologies have provided complete annotated sequences in many important tree species. Most of the high-throughput genotyping platforms described are being used for studies of genetic diversity and population structure. Dissection of complex traits became possible through the availability of genome sequences along with phenotypic variation data, which allow to elucidate the causative genetic differences that give rise to observed phenotypic variation. Association mapping facilitates the association between genetic markers and phenotype in unstructured and complex populations, identifying molecular markers for assisted selection and breeding. Also, genomic data provide in silico identification and characterization of genes and gene families related to important traits, enabling new tools for molecular marker assisted selection in tree breeding. Deep sequencing of transcriptomes is also a powerful tool for the analysis of precise expression levels of each gene in a sample. It consists in quantifying short cDNA reads, obtained by NGS technologies, in order to compare the entire transcriptomes between genotypes and environmental conditions. The miRNAs are non-coding short RNAs involved in the regulation of different physiological processes, which can be identified by high-throughput sequencing of RNA libraries obtained by reverse transcription of purified short RNAs, and by in silico comparison with known miRNAs from other species. 
All together, NGS techniques and their applications have increased the resources for plant breeding in tree species, closing the former gap of genetic tools between trees and annual species. PMID:27895664
Application of Genomic Technologies to the Breeding of Trees.

PubMed

Badenes, Maria L; Fernández I Martí, Angel; Ríos, Gabino; Rubio-Cabetas, María J

2016-01-01

The recent introduction of next generation sequencing (NGS) technologies represents a major revolution in providing new tools for identifying the genes and/or genomic intervals controlling important traits for selection in breeding programs. In perennial fruit trees with long generation times and large sizes of adult plants, the impact of these techniques is even more important. High-throughput DNA sequencing technologies have provided complete annotated sequences in many important tree species. Most of the high-throughput genotyping platforms described are being used for studies of genetic diversity and population structure. Dissection of complex traits became possible through the availability of genome sequences along with phenotypic variation data, which allow to elucidate the causative genetic differences that give rise to observed phenotypic variation. Association mapping facilitates the association between genetic markers and phenotype in unstructured and complex populations, identifying molecular markers for assisted selection and breeding. Also, genomic data provide in silico identification and characterization of genes and gene families related to important traits, enabling new tools for molecular marker assisted selection in tree breeding. Deep sequencing of transcriptomes is also a powerful tool for the analysis of precise expression levels of each gene in a sample. It consists in quantifying short cDNA reads, obtained by NGS technologies, in order to compare the entire transcriptomes between genotypes and environmental conditions. The miRNAs are non-coding short RNAs involved in the regulation of different physiological processes, which can be identified by high-throughput sequencing of RNA libraries obtained by reverse transcription of purified short RNAs, and by in silico comparison with known miRNAs from other species. 
All together, NGS techniques and their applications have increased the resources for plant breeding in tree species, closing the former gap of genetic tools between trees and annual species.
Microsatellite analysis in the genome of Acanthaceae: An in silico approach.

PubMed

Kaliswamy, Priyadharsini; Vellingiri, Srividhya; Nathan, Bharathi; Selvaraj, Saravanakumar

2015-01-01

Acanthaceae is one of the advanced and specialized families with conventionally used medicinal plants. Simple sequence repeats (SSRs) play a major role as molecular markers for genome analysis and plant breeding. The microsatellites existing in the complete genome sequences would help to attain a direct role in the genome organization, recombination, gene regulation, quantitative genetic variation, and evolution of genes. The current study reports the frequency of microsatellites and appropriate markers for the Acanthaceae family genome sequences. The whole nucleotide sequences of Acanthaceae species were obtained from the National Center for Biotechnology Information database and screened for the presence of SSRs. The SSR Locator tool was used to predict the microsatellites, and its built-in Primer3 module was used for primer design. In total, 110 repeats from 108 sequences of Acanthaceae family plant genomes were identified, and dinucleotide repeats were found to be abundant in the genome sequences. The essential amino acid isoleucine was found to be rich in all the sequences. We also designed SSR-based primers/markers for 59 sequences of this family that contain microsatellite repeats in their genome. The identified microsatellites and primers might be useful for breeding and genetic studies of plants that belong to the Acanthaceae family in the future.
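The SSR screening step described in this record can be illustrated with a minimal sketch. This is not the SSR Locator algorithm; it is a plain regular-expression scan for perfect repeats, and the function name find_ssrs and the minimum repeat counts per motif length are assumptions chosen only for illustration.

```python
import re

# Assumed minimum number of tandem copies per motif length (illustrative only).
DEFAULT_MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example "ATAT" is reported as "AT", "AA" is a homopolymer).
            is_compound = any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if is_compound:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)

if __name__ == "__main__":
    example = "GGG" + "AT" * 7 + "CCGT" + "GCA" * 5 + "T"
    for start, motif, copies in find_ssrs(example):
        print(start, motif, copies)
```

Counting the returned tuples by motif length gives the kind of frequency summary (for example, the share of dinucleotide repeats) that the abstract reports.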
Computational analysis and functional expression of ancestral copepod luciferase.

PubMed

Takenaka, Yasuhiro; Noda-Ogura, Akiko; Imanishi, Tadashi; Yamaguchi, Atsushi; Gojobori, Takashi; Shigeri, Yasushi

2013-10-10

We recently reported the cDNA sequences of 11 copepod luciferases from the superfamily Augaptiloidea in the order Calanoida. They were classified into two groups, Metridinidae and Heterorhabdidae/Lucicutiidae families, by phylogenetic analyses. To elucidate the evolutionary processes, we have now further isolated 12 copepod luciferases from Augaptiloidea species (Metridia asymmetrica, Metridia curticauda, Pleuromamma scutullata, Pleuromamma xiphias, Lucicutia ovaliformis and Heterorhabdus tanneri). Codon-based synonymous/nonsynonymous tests of positive selection for 25 identified copepod luciferases suggested that positive Darwinian selection operated in the evolution of Heterorhabdidae luciferases, whereas two types of Metridinidae luciferases had diversified via neutral mechanism. By in silico analysis of the decoded amino acid sequences of 25 copepod luciferases, we inferred two protein sequences as ancestral copepod luciferases. They were expressed in HEK293 cells where they exhibited notable luciferase activity both in intracellular lysates and cultured media, indicating that the luciferase activity was established before evolutionary diversification of these copepod species. © 2013.
Microbial genomic taxonomy

PubMed Central

2013-01-01

A need for a genomic species definition is emerging from several independent studies worldwide. In this commentary paper, we discuss recent studies on the genomic taxonomy of diverse microbial groups and a unified species definition based on genomics. Accordingly, strains from the same microbial species share >95% Average Amino Acid Identity (AAI) and Average Nucleotide Identity (ANI), >95% identity based on multiple alignment genes, <10 in Karlin genomic signature, and > 70% in silico Genome-to-Genome Hybridization similarity (GGDH). Species of the same genus will form monophyletic groups on the basis of 16S rRNA gene sequences, Multilocus Sequence Analysis (MLSA) and supertree analysis. In addition to the established requirements for species descriptions, we propose that new taxa descriptions should also include at least a draft genome sequence of the type strain in order to obtain a clear outlook on the genomic landscape of the novel microbe. The application of the new genomic species definition put forward here will allow researchers to use genome sequences to define simultaneously coherent phenotypic and genomic groups. PMID:24365132
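The thresholds listed in this record can be written down as a simple check. The sketch below is only an illustration of how the stated cut-offs combine; the class name GenomePairMetrics, its field names, and the function same_species are hypothetical, and the metric values are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass

@dataclass
class GenomePairMetrics:
    """Pairwise genome comparison values (hypothetical container)."""
    aai: float               # Average Amino Acid Identity, percent
    ani: float               # Average Nucleotide Identity, percent
    mla_identity: float      # identity over multiple alignment genes, percent
    karlin_signature: float  # Karlin genomic signature dissimilarity
    ggdh: float              # in silico genome-to-genome hybridization, percent

def same_species(m):
    """Apply the thresholds quoted in the abstract to one genome pair."""
    return (
        m.aai > 95
        and m.ani > 95
        and m.mla_identity > 95
        and m.karlin_signature < 10
        and m.ggdh > 70
    )

if __name__ == "__main__":
    pair = GenomePairMetrics(aai=96.0, ani=96.5, mla_identity=97.0,
                             karlin_signature=4.0, ggdh=82.0)
    print(same_species(pair))
```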

Characterizing the Grape Transcriptome. Analysis of Expressed Sequence Tags from Multiple Vitis Species and Development of a Compendium of Gene Expression during Berry Development

PubMed Central

Silva, Francisco Goes da; Iandolino, Alberto; Al-Kayal, Fadi; Bohlmann, Marlene C.; Cushman, Mary Ann; Lim, Hyunju; Ergul, Ali; Figueroa, Rubi; Kabuloglu, Elif K.; Osborne, Craig; Rowe, Joan; Tattersall, Elizabeth; Leslie, Anna; Xu, Jane; Baek, JongMin; Cramer, Grant R.; Cushman, John C.; Cook, Douglas R.

2005-01-01

We report the analysis and annotation of 146,075 expressed sequence tags from Vitis species. The majority of these sequences were derived from different cultivars of Vitis vinifera, comprising an estimated 25,746 unique contig and singleton sequences that survey transcription in various tissues and developmental stages and during biotic and abiotic stress. Putatively homologous proteins were identified for over 17,752 of the transcripts, with 1,962 transcripts further subdivided into one or more Gene Ontology categories. A simple structured vocabulary, with modules for plant genotype, plant development, and stress, was developed to describe the relationship between individual expressed sequence tags and cDNA libraries; the resulting vocabulary provides query terms to facilitate data mining within the context of a relational database. As a measure of the extent to which characterized metabolic pathways were encompassed by the data set, we searched for homologs of the enzymes leading from glycolysis, through the oxidative/nonoxidative pentose phosphate pathway, and into the general phenylpropanoid pathway. Homologs were identified for 65 of these 77 enzymes, with 86% of enzymatic steps represented by paralogous genes. Differentially expressed transcripts were identified by means of a stringent believability index cutoff of ≥98.4%. Correlation analysis and two-dimensional hierarchical clustering grouped these transcripts according to similarity of expression. In the broadest analysis, 665 differentially expressed transcripts were identified across 29 cDNA libraries, representing a range of developmental and stress conditions. The groupings revealed expected associations between plant developmental stages and tissue types, with the notable exception of abiotic stress treatments. 
A more focused analysis of flower and berry development identified 87 differentially expressed transcripts and provides the basis for a compendium that relates gene expression and annotation to previously characterized aspects of berry development and physiology. Comparison with published results for select genes, as well as correlation analysis between independent data sets, suggests that the inferred in silico patterns of expression are likely to be an accurate representation of transcript abundance for the conditions surveyed. Thus, the combined data set reveals the in silico expression patterns for hundreds of genes in V. vinifera, the majority of which have not been previously studied within this species. PMID:16219919
Characterization of free nitrogen fixing bacteria of the genus Azotobacter in organic vegetable-grown Colombian soils

PubMed Central

Jiménez, Diego Javier; Montaña, José Salvador; Martínez, María Mercedes

2011-01-01

With the purpose of isolating and characterizing free nitrogen fixing bacteria (FNFB) of the genus Azotobacter, soil samples were collected randomly from different vegetable organic cultures with neutral pH in different zones of Boyacá-Colombia. Isolations were done in selective free nitrogen Ashby-Sucrose agar obtaining a recovery of 40%. Twenty four isolates were evaluated for colony and cellular morphology, pigment production and metabolic activities. Molecular characterization was carried out using amplified ribosomal DNA restriction analysis (ARDRA). After digestion of 16S rDNA Y1-Y3 PCR products (1487pb) with AluI, HpaII and RsaI endonucleases, a polymorphism of 16% was obtained. Cluster analysis showed three main groups based on DNA fingerprints. Comparison between ribotypes generated by isolates and in silico restriction of 16S rDNA partial sequences with same restriction enzymes was done with Gen Workbench v.2.2.4 software. Nevertheless, Y1-Y2 PCR products were analysed using BLASTn. Isolate C5T from tomato (Lycopersicon esculentum) grown soils presented the same in silico restriction patterns with A. chroococcum (AY353708) and 99% of similarity with the same sequence. Isolate C5CO from cauliflower (Brassica oleracea var. botrytis) grown soils showed black pigmentation in Ashby-Benzoate agar and high similarity (91%) with A. nigricans (AB175651) sequence. In this work we demonstrated the utility of molecular techniques and bioinformatics tools as a support to conventional techniques in characterization of the genus Azotobacter from vegetable-grown soils. PMID:24031700
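The in silico restriction step described in this record can be sketched as follows. This is not the software used by the authors; it is a minimal illustration for a linear sequence using the three enzymes named in the abstract with their commonly documented recognition sites (AluI AG^CT, HpaII C^CGG, RsaI GT^AC). The names ENZYMES, cut_positions and digest are hypothetical.

```python
# Recognition site and cut offset within the site (top strand).
ENZYMES = {
    "AluI": ("AGCT", 2),   # AG^CT
    "HpaII": ("CCGG", 1),  # C^CGG
    "RsaI": ("GTAC", 2),   # GT^AC
}

def cut_positions(seq, site, offset):
    """Return every cut coordinate for one recognition site, including overlaps."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + offset)
        i = seq.find(site, i + 1)
    return positions

def digest(seq, enzyme_names):
    """Return fragment lengths, in order, after digesting a linear sequence."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        cuts.update(cut_positions(seq, site, offset))
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

if __name__ == "__main__":
    amplicon = "AAAA" + "AGCT" + "TTTT" + "CCGG" + "AAA" + "GTAC" + "AA"
    print(digest(amplicon, ["AluI"]))
    print(digest(amplicon, ["AluI", "HpaII", "RsaI"]))
```

Comparing the sorted fragment lengths from a reference 16S rDNA sequence with the band sizes observed on a gel is one simple way to match an isolate's ribotype to a predicted pattern.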
Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus

PubMed Central

Labudde, Dirk

2015-01-01

The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations. PMID:26180540
Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus.

PubMed

Grunert, Steffen; Labudde, Dirk

2015-01-01

The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations.
A general framework for optimization of probes for gene expression microarray and its application to the fungus Podospora anserina

PubMed Central

2010-01-01

Background The development of new microarray technologies makes custom long oligonucleotide arrays affordable for many experimental applications, notably gene expression analyses. Reliable results depend on probe design quality and selection. Probe design strategy should cope with the limited accuracy of de novo gene prediction programs, and annotation up-dating. We present a novel in silico procedure which addresses these issues and includes experimental screening, as an empirical approach is the best strategy to identify optimal probes in the in silico outcome. Findings We used four criteria for in silico probe selection: cross-hybridization, hairpin stability, probe location relative to coding sequence end and intron position. This latter criterion is critical when exon-intron gene structure predictions for intron-rich genes are inaccurate. For each coding sequence (CDS), we selected a sub-set of four probes. These probes were included in a test microarray, which was used to evaluate the hybridization behavior of each probe. The best probe for each CDS was selected according to three experimental criteria: signal-to-noise ratio, signal reproducibility, and representative signal intensities. This procedure was applied for the development of a gene expression Agilent platform for the filamentous fungus Podospora anserina and the selection of a single 60-mer probe for each of the 10,556 P. anserina CDS. Conclusions A reliable gene expression microarray version based on the Agilent 44K platform was developed with four spot replicates of each probe to increase statistical significance of analysis. PMID:20565839
Generating a core cluster of Fasciola hepatica virulence and immunomodulation-related genes using a comparative in silico approach.

PubMed

Haçarız, Orçun; Sayers, Gearóid P

2018-04-01

A total of 71 virulence and immunomodulation-related transcripts (VIRs) of Fasciola hepatica have been previously proposed (Haçarız et al., 2015). In an attempt to further refine this cohort, an in silico meta analysis approach was carried out using publicly available sequence data of related liver flukes, Clonorchis sinensis and Opisthorchis viverrini. Data of both liver flukes were investigated in terms of sequential homology with data of non-parasitic organisms, pathogens and VIRs of F. hepatica, directional selection (Ka/Ks), and cytokine signaling relation (protein motif based). Some VIRs of F. hepatica [showing homology with immune receptors (for toll/interleukin-1, TGF-β or TNF-α), TGF-β, TNF-α, CD147, or relation with suppressors of cytokine signaling/IKBKE 1 or stimulation of TGF-β (through thrombospondin similarity)] were found to be orthologous with those of both C. sinensis and O. viverrini. The in silico analysis indicates that on the basis of genetic commonality, a total of 30 VIRs of F. hepatica are highlighted as of foremost importance in the parasite evasion strategy, through controlling of host immune system. Findings in this study could be important to further enhance our understanding of the parasitic mechanisms and develop effective control strategies against F. hepatica and other related parasites. Copyright © 2017 Elsevier Ltd. All rights reserved.
In silico site-directed mutagenesis informs species-specific predictions of chemical susceptibility derived from the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool

EPA Science Inventory

The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to address needs for rapid, cost effective methods of species extrapolation of chemical susceptibility. Specifically, the SeqAPASS tool compares the primary sequence (Level 1), functiona...
In Silico Identification of Protein Disulfide Isomerase Gene Families in the De Novo Assembled Transcriptomes of Four Different Species of the Genus Conus.

PubMed

Figueroa-Montiel, Andrea; Ramos, Marco A; Mares, Rosa E; Dueñas, Salvador; Pimienta, Genaro; Ortiz, Ernesto; Possani, Lourival D; Licea-Navarro, Alexei F

2016-01-01

Small peptides isolated from the venom of the marine snails belonging to the genus Conus have been largely studied because of their therapeutic value. These peptides can be classified in two groups. The largest one is composed by peptides rich in disulfide bonds, and referred to as conotoxins. Despite the importance of conotoxins given their pharmacology value, little is known about the protein disulfide isomerase (PDI) enzymes that are required to catalyze their correct folding. To discover the PDIs that may participate in the folding and structural maturation of conotoxins, the transcriptomes of the venom duct of four different species of Conus from the peninsula of Baja California (Mexico) were assembled. Complementary DNA (cDNA) libraries were constructed for each species and sequenced using a Genome Analyzer Illumina platform. The raw RNA-seq data was converted into transcript sequences using Trinity, a de novo assembler that allows the grouping of reads into contigs without a reference genome. An N50 value of 605 was established as a reference for future assemblies of Conus transcriptomes using this software. Transdecoder was used to extract likely coding sequences from Trinity transcripts, and PDI-specific sequence motif "APWCGHCK" was used to capture potential PDIs. An in silico analysis was performed to characterize the group of PDI protein sequences encoded by the duct-transcriptome of each species. The computational approach entailed a structural homology characterization, based on the presence of functional Thioredoxin-like domains. Four different PDI families were characterized, which are constituted by a total of 41 different gene sequences. The sequences had an average of 65% identity with other PDIs. Using MODELLER 9.14, the homology-based three-dimensional structure prediction of a subset of the sequences reported, showed the expected thioredoxin fold which was confirmed by a "simulated annealing" method.
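Two of the computational steps mentioned in this record, the N50 assembly statistic and the motif-based capture of candidate PDIs, can be sketched briefly. This is an illustration only, not the authors' pipeline; the function names n50 and find_motif and the example sequences are hypothetical, while the motif "APWCGHCK" is taken from the abstract.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly

def find_motif(proteins, motif="APWCGHCK"):
    """Map each protein name to the start positions of an exact motif match."""
    return {
        name: [i for i in range(len(seq)) if seq.startswith(motif, i)]
        for name, seq in proteins.items()
        if motif in seq
    }

if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
    # Made-up peptide strings, used only to exercise the function.
    candidates = {"orf1": "MKAPWCGHCKLL", "orf2": "MKLLVV"}
    print(find_motif(candidates))
```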
Genome-Wide Analyses of the Soybean F-Box Gene Family in Response to Salt Stress

PubMed Central

Jia, Qi; Xiao, Zhi-Xia; Wong, Fuk-Ling; Sun, Song; Liang, Kang-Jing; Lam, Hon-Ming

2017-01-01

The F-box family is one of the largest gene families in plants that regulate diverse life processes, including salt responses. However, the knowledge of the soybean F-box genes and their roles in salt tolerance remains limited. Here, we conducted a genome-wide survey of the soybean F-box family, and their expression analysis in response to salinity via in silico analysis of online RNA-sequencing (RNA-seq) data and quantitative reverse-transcription polymerase chain reaction (qRT-PCR) to predict their potential functions. A total of 725 potential F-box proteins encoded by 509 genes were identified and classified into 9 subfamilies. The gene structures, conserved domains and chromosomal distributions were characterized. There are 76 pairs of duplicate genes identified, including genome-wide segmental and tandem duplication events, which lead to the expansion of the number of F-box genes. The in silico expression analysis showed that these genes would be involved in diverse developmental functions and play an important role in salt response. Our qRT-PCR analysis confirmed 12 salt-responding F-box genes. Overall, our results provide useful information on soybean F-box genes, especially their potential roles in salt tolerance. PMID:28417911
Genome-Wide Analyses of the Soybean F-Box Gene Family in Response to Salt Stress.

PubMed

Jia, Qi; Xiao, Zhi-Xia; Wong, Fuk-Ling; Sun, Song; Liang, Kang-Jing; Lam, Hon-Ming

2017-04-12

The F-box family is one of the largest gene families in plants that regulate diverse life processes, including salt responses. However, the knowledge of the soybean F-box genes and their roles in salt tolerance remains limited. Here, we conducted a genome-wide survey of the soybean F-box family, and their expression analysis in response to salinity via in silico analysis of online RNA-sequencing (RNA-seq) data and quantitative reverse-transcription polymerase chain reaction (qRT-PCR) to predict their potential functions. A total of 725 potential F-box proteins encoded by 509 genes were identified and classified into 9 subfamilies. The gene structures, conserved domains and chromosomal distributions were characterized. There are 76 pairs of duplicate genes identified, including genome-wide segmental and tandem duplication events, which lead to the expansion of the number of F-box genes. The in silico expression analysis showed that these genes would be involved in diverse developmental functions and play an important role in salt response. Our qRT-PCR analysis confirmed 12 salt-responding F-box genes. Overall, our results provide useful information on soybean F-box genes, especially their potential roles in salt tolerance.
The Druggable Pocketome of Corynebacterium diphtheriae: A New Approach for in silico Putative Druggable Targets

PubMed Central

Hassan, Syed S.; Jamal, Syed B.; Radusky, Leandro G.; Tiwari, Sandeep; Ullah, Asad; Ali, Javed; Behramand; de Carvalho, Paulo V. S. D.; Shams, Rida; Khan, Sabir; Figueiredo, Henrique C. P.; Barh, Debmalya; Ghosh, Preetam; Silva, Artur; Baumbach, Jan; Röttger, Richard; Turjanski, Adrián G.; Azevedo, Vasco A. C.

2018-01-01

Diphtheria, an acute and highly infectious disease previously regarded as endemic in nature but vaccine-preventable, is caused by Corynebacterium diphtheriae (Cd). In this work, we used an in silico approach across the 13 complete genome sequences of C. diphtheriae, followed by a computational assessment of structural information of the binding sites, to characterize the “pocketome druggability.” To this end, we first computed the “modelome” (3D structures of a complete genome) of a randomly selected reference strain, Cd NCTC13129, which had 13,763 open reading frames (ORFs) and resulted in 1,253 (∼9%) structure models. The amino acid sequences of these modeled structures were compared with the remaining 12 genomes and consequently, 438 conserved protein sequences were obtained. The RCSB-PDB database was consulted to check the template structures for these conserved proteins and as a result, 401 adequate 3D models were obtained. We subsequently predicted the protein pockets for the obtained set of models and kept only the conserved pockets that had highly druggable (HD) values (137 across all strains). Later, an off-target host homology analysis was performed considering the human proteome using the NCBI database. Furthermore, a gene essentiality analysis was carried out that gave a final set of 10 conserved targets possessing highly druggable protein pockets. To check the target identification robustness of the pipeline used in this work, we crosschecked the final target list with another in-house target identification approach for C. diphtheriae, thereby obtaining three common targets: hisE-phosphoribosyl-ATP pyrophosphatase, glpX-fructose 1,6-bisphosphatase II, and rpsH-30S ribosomal protein S8. Our predicted results suggest that the in silico approach used could potentially aid in experimental polypharmacological target determination in C. diphtheriae and other pathogens, and thereby might complement the existing and new drug-discovery pipelines. 
PMID:29487617
A mechanistic insight into the amyloidogenic structure of hIAPP peptide revealed from sequence analysis and molecular dynamics simulation.

PubMed

Chakraborty, Sandipan; Chatterjee, Barnali; Basu, Soumalee

2012-07-01

A collective approach of sequence analysis, phylogenetic tree construction and in silico prediction of amyloidogenicity using bioinformatics tools has been used to correlate the observed species-specific variations in IAPP sequences with the amyloid forming propensity. Observed substitution patterns indicate that probable changes in local hydrophobicity are instrumental in altering the aggregation propensity of the peptide. In particular, residues at the 17th, 22nd and 23rd positions of the IAPP peptide are found to be crucial for amyloid formation. Proline25 primarily dictates the observed non-amyloidogenicity in rodents. Furthermore, an extensive molecular dynamics simulation of 0.24 μs has been carried out with human IAPP (hIAPP) fragment 19-27, the portion showing maximum sequence variation across different species, to understand the native folding characteristic of this region. Principal component analysis in combination with free energy landscape analysis illustrates a four-residue turn spanning from residue 22 to 25. The results provide a structural insight into the intramolecular β-sheet structure of amylin, which probably is the template for nucleation of fibril formation and growth, a pathogenic feature of type II diabetes. Copyright © 2012 Elsevier B.V. All rights reserved.
Increasing Genome Sampling and Improving SNP Genotyping for Genotyping-by-Sequencing with New Combinations of Restriction Enzymes.

PubMed

Fu, Yong-Bi; Peterson, Gregory W; Dong, Yibo

2016-04-07

Genotyping-by-sequencing (GBS) has emerged as a useful genomic approach for exploring genome-wide genetic variation. However, GBS commonly samples a genome unevenly and can generate a substantial amount of missing data. These technical features can limit the power of various GBS-based genetic and genomic analyses. Here we present software called IgCoverage for in silico evaluation of genomic coverage through GBS with an individual or pair of restriction enzymes on one sequenced genome, and report a new set of 21 restriction enzyme combinations that can be applied to enhance GBS applications. These enzyme combinations were developed through an application of IgCoverage on 22 plant, animal, and fungus species with sequenced genomes, and some of them were empirically evaluated with different runs of Illumina MiSeq sequencing in 12 plant species. The in silico analysis of 22 organisms revealed up to eight times more genome coverage for the new combinations, which pair four- or five-cutter restriction enzymes, than for the commonly used enzyme combination PstI + MspI. The empirical evaluation of the new enzyme combination (HinfI + HpyCH4IV) in 12 plant species showed 1.7-6 times more genome coverage than PstI + MspI, and 2.3 times more genome coverage in dicots than monocots. Also, the SNP genotyping in 12 Arabidopsis and 12 rice plants revealed that HinfI + HpyCH4IV generated 7 and 1.3 times more SNPs (with 0-16.7% missing observations) than PstI + MspI, respectively. These findings demonstrate that these novel enzyme combinations can be utilized to increase genome sampling and improve SNP genotyping in various GBS applications. Copyright © 2016 Fu et al.
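The idea of estimating GBS genome coverage in silico for an enzyme pair can be sketched as below. This is not IgCoverage; it is a minimal illustration under two stated assumptions: only fragments flanked by cuts from the two different enzymes are sequenced, and only fragments within a size window are retained. The names SITES, cuts and gbs_coverage and the default size window are hypothetical; the recognition sites are the commonly documented ones for the enzymes named in the abstract.

```python
import re

# Recognition pattern and cut offset within the site (top strand).
SITES = {
    "PstI": ("CTGCAG", 5),       # CTGCA^G
    "MspI": ("CCGG", 1),         # C^CGG
    "HinfI": ("GA[ACGT]TC", 1),  # G^ANTC
    "HpyCH4IV": ("ACGT", 1),     # A^CGT
}

def cuts(seq, enzyme):
    """Return (cut_position, enzyme) pairs; lookahead allows overlapping sites."""
    pattern, offset = SITES[enzyme]
    return [(m.start() + offset, enzyme)
            for m in re.finditer("(?=%s)" % pattern, seq)]

def gbs_coverage(seq, enzyme_a, enzyme_b, min_len=100, max_len=400):
    """Fraction of seq in fragments with one end cut by each enzyme (assumed model)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    all_cuts = sorted(cuts(seq, enzyme_a) + cuts(seq, enzyme_b))
    covered = 0
    for (pos1, enz1), (pos2, enz2) in zip(all_cuts, all_cuts[1:]):
        # Keep adjacent cuts made by different enzymes within the size window.
        if enz1 != enz2 and min_len <= pos2 - pos1 <= max_len:
            covered += pos2 - pos1
    return covered / len(seq)

if __name__ == "__main__":
    toy = "AAAA" + "CTGCAG" + "T" * 10 + "CCGG" + "A" * 6
    print(gbs_coverage(toy, "PstI", "MspI", min_len=5, max_len=50))
```

Running the same function for two enzyme pairs on the same genome and taking the ratio gives a fold-change figure comparable in spirit to the coverage comparisons reported in the abstract.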
Analysis of B Cell Repertoire Dynamics Following Hepatitis B Vaccination in Humans, and Enrichment of Vaccine-specific Antibody Sequences.

PubMed

Galson, Jacob D; Trück, Johannes; Fowler, Anna; Clutterbuck, Elizabeth A; Münz, Márton; Cerundolo, Vincenzo; Reinhard, Claudia; van der Most, Robbert; Pollard, Andrew J; Lunter, Gerton; Kelly, Dominic F

2015-12-01

Generating a diverse B cell immunoglobulin repertoire is essential for protection against infection. The repertoire in humans can now be comprehensively measured by high-throughput sequencing. Using hepatitis B vaccination as a model, we determined how the total immunoglobulin sequence repertoire changes following antigen exposure in humans, and compared this to sequences from vaccine-specific sorted cells. Clonal sequence expansions were seen 7 days after vaccination, which correlated with vaccine-specific plasma cell numbers. These expansions caused an increase in mutation, and a decrease in diversity and complementarity-determining region 3 sequence length in the repertoire. We also saw an increase in sequence convergence between participants 14 and 21 days after vaccination, coinciding with an increase of vaccine-specific memory cells. These features allowed development of a model for in silico enrichment of vaccine-specific sequences from the total repertoire. Identifying antigen-specific sequences from total repertoire data could aid our understanding of B cell-driven immunity, and be used for disease diagnostics and vaccine evaluation.
Sequence analysis of PROTEOLYSIS 6 from Solanum lycopersicum

NASA Astrophysics Data System (ADS)

Roslan, Nur Farhana; Chew, Bee Lyn; Goh, Hoe-Han; Isa, Nurulhikma Md

2018-04-01

The N-end rule pathway is a protein degradation pathway that relates the half-life of a protein to the identity of its N-terminal residue. A destabilizing N-terminal residue is created by enzymatic reaction or chemical modification. The destabilized substrate is recognized by the PROTEOLYSIS 6 (PRT6) protein, an E3 ligase, resulting in substrate degradation by the proteasome. PRT6 has been studied in Arabidopsis thaliana and barley but has not yet been studied in fleshy fruit plants. Hence, this study was carried out in tomato, which is known as the model for fleshy fruit plants. BLASTX analysis identified Solyc09g010830 as the PRT6 gene in tomato based on its sequence similarity with PRT6 in A. thaliana. In silico gene expression analysis shows that the PRT6 gene was highly expressed in tomato fruits at the breaker +5 stage. Co-expression analysis shows that PRT6 may be involved not only in abiotic stresses but also in biotic stresses. The objective is to analyze the sequence and characterize the PRT6 gene in tomato.
Should DNA sequence be incorporated with other taxonomical data for routine identifying of plant species?

PubMed

Suesatpanit, Tanakorn; Osathanunkul, Kitisak; Madesis, Panagiotis; Osathanunkul, Maslin

2017-08-31

A variety of plants in Acanthaceae have long been used in traditional Thai medicine and commercialised with significant economic value. Nowadays medicinal plants are sold in processed forms and thus morphological authentication is almost impossible. Full identification requires comparison of the specimen with some authoritative sources, such as a full and accurate description and verification of the species deposited in a herbarium. Intake of the wrong herbal products can cause adverse effects. Identification of both raw materials and end products is therefore needed. Here, the potential of a DNA-based identification method, called Bar-HRM (DNA barcoding coupled with High Resolution Melting analysis), in raw material species identification is investigated. DNA barcode sequences from five regions (matK, rbcL, trnH-psbA spacer region, trnL and ITS2) of Acanthaceae species were retrieved for in silico analysis. Then the specific primer pairs were used in an HRM assay to generate unique melting profiles for each plant species. The method allows identification of samples lacking necessary morphological parts. In silico analyses of all five selected regions suggested that ITS2 is the most suitable marker for Bar-HRM in this study. The HRM analysis on dried samples of 16 Acanthaceae medicinal species was then performed using a primer pair derived from the ITS2 region. 100% discrimination of the tested samples at both genus and species level was observed. However, two samples documented as Clinacanthus nutans and Clinacanthus siamensis were recognised as the same species from the HRM analysis. Further investigation revealed that C. siamensis is now accepted as C. nutans. The results here showed that Bar-HRM is a promising technique for species identification of the studied medicinal plants in Acanthaceae. In addition, molecular biological data are now used in plant taxonomy and have become increasingly popular in recent years. Here, DNA barcode sequence data should be incorporated with morphological characters in species identification.
In silico genomic insights into aspects of food safety and defense mechanisms of a potentially probiotic Lactobacillus pentosus MP-10 isolated from brines of naturally fermented Aloreña green table olives

PubMed Central

Pérez Montoro, Beatriz; Casado Muñoz, María del Carmen; Knapp, Charles W.; Gálvez, Antonio; Benomar, Nabil

2017-01-01

Lactobacillus pentosus MP-10, isolated from brines of naturally fermented Aloreña green table olives, exhibited high probiotic potential. The genome sequence of L. pentosus MP-10 is currently considered the largest genome among lactobacilli, highlighting the microorganism’s ecological flexibility and adaptability. Here, we analyzed the complete genome sequence for the presence of acquired antibiotic resistance and virulence determinants to understand their defense mechanisms and explore its putative safety in food. The annotated genome sequence revealed evidence of diverse mobile genetic elements, such as prophages, transposases and transposons involved in their adaptation to brine-associated niches. In-silico analysis of L. pentosus MP-10 genome sequence identified a CRISPR (clustered regularly interspaced short palindromic repeats)/cas (CRISPR-associated protein genes) as an immune system against foreign genetic elements, which consisted of six arrays (4–12 repeats) and eleven predicted cas genes [CRISPR1 and CRISPR2 consisted of 3 (Type II-C) and 8 (Type I) genes] with high similarity to L. pentosus KCA1. Bioinformatic analyses revealed L. pentosus MP-10 to be absent of acquired antibiotic resistance genes, and most resistance genes were related to efflux mechanisms; no virulence determinants were found in the genome. This suggests that L. pentosus MP-10 could be considered safe and with high-adaptation potential, which could facilitate its application as a starter culture and probiotic in food preparations. PMID:28651019
Identification of a new hepatitis B virus recombinant D2/D3 in the city of São Paulo, Brazil.

PubMed

Santana, Luiz Claudio; Mantovani, Nathalia Pena; Ferreira, Maira Cicero; Arnold, Rafael; Duro, Rodrigo Lopes Sanz; Ferreira, Paulo Roberto Abrão; Hunter, James Richard; Leal, Élcio; Diaz, Ricardo Sobhie; Komninakis, Shirley Vasconcelos

2017-02-01

Two hundred forty million people are chronically infected with hepatitis B virus (HBV) worldwide. The rise of globalization has facilitated the emergence of novel HBV recombinants and genotypes. We evaluated HBV genotypes and recombinants, mutations associated with resistance to antivirals (AVs), progression of hepatic illness, and inefficient hepatitis B vaccination responses in chronically infected individuals in the city of São Paulo, Brazil. Forty-five full-length and 24 partial-length sequences were obtained. The genotype distribution was as follows: A (66.7%), D (15.9%), F (11.6%) and C (4.3%). We describe a new recombinant (D2/D3), confirmed through next-generation sequencing (NGS) and reconstruction of the quasispecies sequences in silico. Primary resistance and major vaccine escape mutations were not found. We did, however, find mutations in the S region that may be related to HBV antigenicity changes, as well as Pre-S deletions. The precore/core mutations A1762T + G1764A (40.9%) were found mostly in genotypes A and D, and G1896A (29.55%) was more frequent in genotype D than in genotype A. The genotypic distribution reflects the history of Brazilian immigration. This is the first description of recombination between genotypes D2 and D3 in Brazil. It is also the first confirmation through NGS and reconstruction of the quasispecies in silico. However, little is known about the response to treatment of recombinants. This demonstrates the need for molecular epidemiology studies involving the analysis of full-length HBV sequences.
In silico identification and analysis of phytoene synthase genes in plants.

PubMed

Han, Y; Zheng, Q S; Wei, Y P; Chen, J; Liu, R; Wan, H J

2015-08-14

In this study, we examined phytoene synthetase (PSY), the first key limiting enzyme in the synthesis of carotenoids and catalyzing the formation of geranylgeranyl pyrophosphate in terpenoid biosynthesis. We used known amino acid sequences of the PSY gene in tomato plants to conduct a genome-wide search and identify putative candidates in 34 sequenced plants. A total of 101 homologous genes were identified. Phylogenetic analysis revealed that PSY evolved independently in algae as well as monocotyledonous and dicotyledonous plants. Our results showed that the amino acid structures exhibited 5 motifs (motifs 1 to 5) in algae and those in higher plants were highly conserved. The PSY gene structures showed that the number of intron in algae varied widely, while the number of introns in higher plants was 4 to 5. Identification of PSY genes in plants and the analysis of the gene structure may provide a theoretical basis for studying evolutionary relationships in future analyses.
Microsatellite analysis in the genome of Acanthaceae: An in silico approach

PubMed Central

Kaliswamy, Priyadharsini; Vellingiri, Srividhya; Nathan, Bharathi; Selvaraj, Saravanakumar

2015-01-01

Background: Acanthaceae is one of the advanced and specialized families with conventionally used medicinal plants. Simple sequence repeats (SSRs) play a major role as molecular markers for genome analysis and plant breeding. The microsatellites existing in the complete genome sequences would help to attain a direct role in the genome organization, recombination, gene regulation, quantitative genetic variation, and evolution of genes. Objective: The current study reports the frequency of microsatellites and appropriate markers for the Acanthaceae family genome sequences. Materials and Methods: The whole nucleotide sequences of Acanthaceae species were obtained from the National Center for Biotechnology Information database and screened for the presence of SSRs. The SSR Locator tool was used to predict the microsatellites and its inbuilt Primer3 module was used for primer design. Results: In total, 110 repeats from 108 sequences of Acanthaceae family plant genomes were identified, and dinucleotide repeats were found to be abundant in the genome sequences. The essential amino acid isoleucine was found to be rich in all the sequences. We also designed SSR-based primers/markers for 59 sequences of this family that contain microsatellite repeats in their genome. Conclusion: The identified microsatellites and primers might be useful for breeding and genetic studies of plants that belong to the Acanthaceae family in the future. PMID:25709226
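Screening a nucleotide sequence for SSRs can be sketched with a regular expression that looks for a short motif repeated in tandem. The Python example below is only an illustration of the general technique, not the SSR Locator algorithm; the function name `find_ssrs` and the default thresholds (motif length 2, at least 5 repeats) are assumptions.

```python
import re


def find_ssrs(seq, motif_len=2, min_repeats=5):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    Assumption: only perfect repeats of one fixed motif length are
    reported; homopolymer runs (e.g. AA repeated) are skipped.
    """
    # ([ACGT]{k}) captures a motif; \1{n-1,} requires n-1 or more extra copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # a single repeated base is not a di-/trinucleotide SSR
        hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits
```

Running the function once per motif length (for example 2, 3 and 4) gives separate lists of di-, tri- and tetranucleotide repeats whose counts can then be compared.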

Complete genome sequence of probiotic Bacillus coagulans HM-08: A potential lactic acid producer.

PubMed

Yao, Guoqiang; Gao, Pengfei; Zhang, Wenyi

2016-06-20

Bacillus coagulans HM-08 is a commercialized probiotic strain in China. Its genome contains a 3.62Mb circular chromosome with an average GC content of 46.3%. In silico analysis revealed the presence of one xyl operon as well as several other genes that are correlated to xylose utilization. The genetic information provided here may help to expand its future biotechnology potential in lactic acid production. Copyright © 2016 Elsevier B.V. All rights reserved.
An RNAi in silico approach to find an optimal shRNA cocktail against HIV-1

PubMed Central

2010-01-01

Background HIV-1 can be inhibited by RNA interference in vitro through the expression of short hairpin RNAs (shRNAs) that target conserved genome sequences. In silico shRNA design for HIV has lacked a detailed study of virus variability, a possible breaking point in a clinical setting. We designed shRNAs against HIV-1 considering the variability observed in naïve and drug-resistant isolates available in public databases. Methods A Bioperl-based algorithm was developed to automatically scan multiple sequence alignments of HIV, while evaluating the possibility of identifying dominant and subdominant viral variants that could be used as efficient silencing molecules. Student's t-test and the Bonferroni-Dunn correction were used to assess the statistical significance of our findings. Results Our in silico approach identified the most common viral variants within highly conserved genome regions, with a calculated free energy of ≥ -6.6 kcal/mol. This is crucial for strand loading to the RISC complex and for a predicted silencing efficiency score, which could be used in combination for achieving over 90% silencing. Resistant and naïve isolate variability revealed that the most frequent shRNA per region targets a maximum of 85% of viral sequences. Adding more divergent sequences maintained this percentage. Specific sequence features that have been found to be related to higher silencing efficiency were rarely met in conserved regions, even when lower entropy values correlated with better scores. We identified a conserved region among most HIV-1 genomes that meets many of the sequence features for efficient silencing. Conclusions HIV-1 variability is an obstacle to achieving absolute silencing using shRNAs designed against a consensus sequence, mainly because there are many functional viral variants. Our shRNA cocktail could be truly effective at silencing dominant and subdominant naïve viral variants. Additionally, resistant isolates might be targeted under specific antiretroviral selective pressure, but in both cases these should be tested exhaustively prior to clinical use. PMID:21172023
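The entropy values mentioned in the abstract are a standard way of scoring variability at each position of a multiple sequence alignment. The Python sketch below computes per-column Shannon entropy; it is not the authors' Bioperl algorithm, and the function names and the choice to count gap characters like any other symbol are assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(aligned_seqs):
    """Per-column entropy for equal-length aligned sequences.

    Assumption: gaps ('-') are counted as ordinary symbols.
    """
    # zip(*seqs) transposes the alignment so each item is one column.
    return [column_entropy(col) for col in zip(*aligned_seqs)]
```

Columns with entropy near zero are the conserved positions; a window of consecutive low-entropy columns would be a candidate target region under this simple scoring.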
Definition and characterization of a "trypsinosome" from specific peptide characteristics by nano-HPLC-MS/MS and in silico analysis of complex protein mixtures.

PubMed

Le Bihan, Thierry; Robinson, Mark D; Stewart, Ian I; Figeys, Daniel

2004-01-01

Although HPLC-ESI-MS/MS is rapidly becoming an indispensable tool for the analysis of peptides in complex mixtures, the sequence coverage it affords is often quite poor. Low protein expression, resulting in peptide signal intensities that fall below the limit of detection of the MS system, in combination with differences in peptide ionization efficiency plays a significant role in this. A second important factor stems from differences in the physicochemical properties of each peptide and how these properties relate to chromatographic retention and ultimate detection. To identify and understand those properties, we compared data from experimentally identified peptides with data from peptides predicted by in silico digest of all corresponding proteins in the experimental set. Three different complex protein extracts were used to define a training set to evaluate the amino acid retention coefficients based on linear regression analysis. The retention coefficients were also compared with previously published hydrophobicity and retention scales. From this, we have constructed an empirical model that can be readily used to predict peptides that are likely to be observed on our HPLC-ESI-MS/MS system based on their physicochemical properties. Finally, we demonstrated that in silico prediction of peptides and their retention coefficients can be used to generate an inclusion list for a targeted mass spectrometric identification of low abundance proteins in complex protein samples. This approach is based on experimentally derived data to calibrate the method and therefore may theoretically be applied to any HPLC-MS/MS system on which data are being generated.
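An in silico tryptic digest of the kind referred to above can be sketched in a few lines of Python. The example assumes the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P); the abstract does not state which rule the authors applied, and the function name and the `missed_cleavages` parameter are illustrative.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Return peptides from a simulated trypsin digest of a protein string.

    Assumed rule: cleave C-terminal to K or R, except before P.
    """
    protein = protein.upper()
    if not protein:
        return []
    # Collect the index after every cleavable residue, plus both ends.
    cut_points = [0]
    for i, residue in enumerate(protein[:-1]):
        if residue in "KR" and protein[i + 1] != "P":
            cut_points.append(i + 1)
    cut_points.append(len(protein))

    peptides = []
    for i in range(len(cut_points) - 1):
        # Join up to `missed_cleavages` extra neighbouring fragments.
        last = min(i + 2 + missed_cleavages, len(cut_points))
        for j in range(i + 1, last):
            peptides.append(protein[cut_points[i]:cut_points[j]])
    return peptides
```

The predicted peptides could then be compared with the experimentally identified ones, which is the comparison the abstract describes; any retention coefficients would have to come from measured data.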
In silico cloning, expression of Rieske-like apoprotein gene and protein subcellular localization in the Pacific oyster, Crassostrea gigas.

PubMed

He, Xiaocui; Zhang, Yang; Yu, Ziniu

2010-10-01

The Rieske protein gene in the Pacific oyster Crassostrea gigas was obtained by in silico cloning for the first time, and its expression profiles and subcellular localization were determined. The full-length cDNA of Cgisp is 985 bp in length and contains 5'- and 3'-untranslated regions of 35 and 161 bp, respectively, with an open reading frame of 786 bp encoding a protein of 262 amino acids. The predicted molecular weight of 30 kDa of the Cgisp protein was verified by prokaryotic expression. Conserved Rieske [2Fe-2S] cluster binding sites and a closely matched tertiary structure with 3CWB_E (Gallus gallus) were revealed by homologous analysis and molecular modeling. Eleven putative SNP sites and two conserved hexapeptide sequences, box I (THLGC) and II (PCHGS), were detected by multiple alignments. Real-time PCR analysis showed that Cgisp is expressed in a wide range of tissues, with adductor muscle exhibiting the highest expression level, suggesting its biological function of energy transduction. GFP tagging of Cgisp indicated a mitochondrial localization, further confirming its physiological function.
Mutations in the LHX2 gene are not a frequent cause of micro/anophthalmia

PubMed Central

Desmaison, Annaïck; Vigouroux, Adeline; Rieubland, Claudine; Peres, Christine; Calvas, Patrick

2010-01-01

Purpose Microphthalmia and anophthalmia are at the severe end of the spectrum of abnormalities in ocular development. A few genes (orthodenticle homeobox 2 [OTX2], retina and anterior neural fold homeobox [RAX], SRY-box 2 [SOX2], CEH10 homeodomain-containing homolog [CHX10], and growth differentiation factor 6 [GDF6]) have been implicated mainly in isolated micro/anophthalmia but causative mutations of these genes explain less than a quarter of these developmental defects. The essential role of the LIM homeobox 2 (LHX2) transcription factor in early eye development has recently been documented. We postulated that mutations in this gene could lead to micro/anophthalmia, and thus performed molecular screening of its sequence in patients having micro/anophthalmia. Methods Seventy patients having non-syndromic forms of colobomatous microphthalmia (n=25), isolated microphthalmia (n=18), or anophthalmia (n=17), and syndromic forms of micro/anophthalmia (n=10) were included in this study after negative molecular screening for OTX2, RAX, SOX2, and CHX10 mutations. Mutation screening of LHX2 was performed by direct sequencing of the coding sequences and intron/exon boundaries. Results Two heterozygous variants of unknown significance (c.128C>G [p.Pro43Arg]; c.776C>A [p.Pro259Gln]) were identified in LHX2 among the 70 patients. These variations were not identified in a panel of 100 control patients of mixed origins. The variation c.776C>A (p.Pro259Gln) was considered non-pathogenic by in silico analysis, while the variation c.128C>G (p.Pro43Arg) was considered deleterious by in silico analysis and was inherited from the asymptomatic father. Conclusions Mutations in LHX2 do not represent a frequent cause of micro/anophthalmia. PMID:21203406
Mutations in the LHX2 gene are not a frequent cause of micro/anophthalmia.

PubMed

Desmaison, Annaïck; Vigouroux, Adeline; Rieubland, Claudine; Peres, Christine; Calvas, Patrick; Chassaing, Nicolas

2010-12-18

Microphthalmia and anophthalmia are at the severe end of the spectrum of abnormalities in ocular development. A few genes (orthodenticle homeobox 2 [OTX2], retina and anterior neural fold homeobox [RAX], SRY-box 2 [SOX2], CEH10 homeodomain-containing homolog [CHX10], and growth differentiation factor 6 [GDF6]) have been implicated mainly in isolated micro/anophthalmia but causative mutations of these genes explain less than a quarter of these developmental defects. The essential role of the LIM homeobox 2 (LHX2) transcription factor in early eye development has recently been documented. We postulated that mutations in this gene could lead to micro/anophthalmia, and thus performed molecular screening of its sequence in patients having micro/anophthalmia. Seventy patients having non-syndromic forms of colobomatous microphthalmia (n=25), isolated microphthalmia (n=18), or anophthalmia (n=17), and syndromic forms of micro/anophthalmia (n=10) were included in this study after negative molecular screening for OTX2, RAX, SOX2, and CHX10 mutations. Mutation screening of LHX2 was performed by direct sequencing of the coding sequences and intron/exon boundaries. Two heterozygous variants of unknown significance (c.128C>G [p.Pro43Arg]; c.776C>A [p.Pro259Gln]) were identified in LHX2 among the 70 patients. These variations were not identified in a panel of 100 control patients of mixed origins. The variation c.776C>A (p.Pro259Gln) was considered non-pathogenic by in silico analysis, while the variation c.128C>G (p.Pro43Arg) was considered deleterious by in silico analysis and was inherited from the asymptomatic father. Mutations in LHX2 do not represent a frequent cause of micro/anophthalmia.
In-silico and in-vivo analyses of EST databases unveil conserved miRNAs from Carthamus tinctorius and Cynara cardunculus

PubMed Central

2012-01-01

Background MicroRNAs (miRNAs) are small RNAs (21-24 bp) providing an RNA-based system of gene regulation highly conserved in plants and animals. In plants, miRNAs control mRNA degradation or restrain translation, affecting development and responses to stresses. Plant miRNAs show imperfect but extensive complementarity to mRNA targets, making their computational prediction possible, useful when data mining is applied on different species. In this study we used a comparative approach to identify both miRNAs and their targets, in artichoke and safflower. Results Two complete expressed sequence tags (ESTs) datasets from artichoke (3.6 × 10^4 entries) and safflower (4.2 × 10^4), were analysed with a bioinformatic pipeline and in vitro experiments, identifying 17 potential miRNAs. For each EST, using RNAhybrid program and 953 non redundant miRNA mature sequences, available in mirBase as reference, we searched matching putative targets. 8730 out of 42011 ESTs from safflower and 7145 of 36323 ESTs from artichoke showed at least one predicted miRNA target. BLAST analysis showed that 75% of all ESTs shared at least a common homologous region (E-value < 10^-4) and about 50% of these displayed 400 bp or longer aligned sequences as conserved homologous/orthologous (COS) regions. 960 and 890 ESTs of safflower and artichoke organized in COS shared 79 different miRNA targets, considered functionally conserved, and statistically significant when compared with random sequences (signal to noise ratio > 2 and specificity ≥ 0.85). Four highly significant miRNAs selected from in silico data were experimentally validated in globe artichoke leaves. Conclusions Mature miRNAs and targets were predicted within EST sequences of safflower and artichoke. Most of the miRNA targets appeared highly/moderately conserved, highlighting an important and conserved function. In this study we introduce a stringent parameter for the comparative sequence analysis, represented by the identification of the same target in the COS region. After statistical analysis 79 targets, found on the COS regions and belonging to 60 miRNA families, have a signal to noise ratio > 2, with ≥ 0.85 specificity. The putative miRNAs identified belong to 55 dicotyledon plants and to 24 families only in monocotyledon. PMID:22536958
In silico comparison of genomic regions containing genes coding for enzymes and transcription factors for the phenylpropanoid pathway in Phaseolus vulgaris L. and Glycine max L. Merr

PubMed Central

Reinprecht, Yarmilla; Yadegari, Zeinab; Perry, Gregory E.; Siddiqua, Mahbuba; Wright, Lori C.; McClean, Phillip E.; Pauls, K. Peter

2013-01-01

Legumes contain a variety of phytochemicals derived from the phenylpropanoid pathway that have important effects on human health as well as seed coat color, plant disease resistance and nodulation. However, the information about the genes involved in this important pathway is fragmentary in common bean (Phaseolus vulgaris L.). The objectives of this research were to isolate genes that function in and control the phenylpropanoid pathway in common bean, determine their genomic locations in silico in common bean and soybean, and analyze sequences of the 4CL gene family in two common bean genotypes. Sequences of phenylpropanoid pathway genes available for common bean or other plant species were aligned, and the conserved regions were used to design sequence-specific primers. The PCR products were cloned and sequenced and the gene sequences along with common bean gene-based (g) markers were BLASTed against the Glycine max v.1.0 genome and the P. vulgaris v.1.0 (Andean) early release genome. In addition, gene sequences were BLASTed against the OAC Rex (Mesoamerican) genome sequence assembly. In total, fragments of 46 structural and regulatory phenylpropanoid pathway genes were characterized in this way and placed in silico on common bean and soybean sequence maps. The maps contain over 250 common bean g and SSR (simple sequence repeat) markers and identify the positions of more than 60 additional phenylpropanoid pathway gene sequences, plus the putative locations of seed coat color genes. The majority of cloned phenylpropanoid pathway gene sequences were mapped to one location in the common bean genome but had two positions in soybean. The comparison of the genomic maps confirmed previous studies, which show that common bean and soybean share genomic regions, including those containing phenylpropanoid pathway gene sequences, with conserved synteny. 
Indels identified in the comparison of Andean and Mesoamerican common bean 4CL gene sequences might be used to develop inter-pool phenylpropanoid pathway gene-based markers. We anticipate that the information obtained by this study will simplify and accelerate selections of common bean with specific phenylpropanoid pathway alleles to increase the contents of beneficial phenylpropanoids in common bean and other legumes. PMID:24046770
The antioxidant property of chitosan green tea polyphenols complex induces transglutaminase activation in wound healing.

PubMed

Qin, Yao; Guo, Xing Wei; Li, Lei; Wang, Hong Wei; Kim, Wook

2013-06-01

The present study examined, for the first time, the in vitro wound healing potential of chitosan green tea polyphenols (CGP) complex based on the activation of transglutaminase (TGM) genes in epidermal morphogenesis. Response surface methodology was applied to determine the optimal processing condition that gave maximum extraction of green tea polyphenols. The antioxidant activity, scavenging ability, and chelating ability were studied and expressed as average EC50 values of CGP and other treatments. The TGM sequences were subjected to in silico analysis and gene coexpression network analysis. The temporal expressions of TGMs were profiled by semi-quantitative reverse transcription (RT)-PCR technology within 10 days after wounding and 2 days postwounding. CGP showed effective antioxidant properties, and histopathological photography showed advanced tissue granulation and epithelialization with CGP treatment. In silico and coexpression analysis confirmed the regulation via the TGM gene family in dermatological tissues. RT-PCR demonstrated increased levels of TGM1-3 expression induced by CGP treatment. The efficacy of CGP in wound healing based on these results may be ascribed to its antioxidant properties and activation of the expression of TGMs, and is, thus, essential for the facilitated repair of skin injury.
Lack of Detectable Allergenicity in Genetically Modified Maize Containing “Cry” Proteins as Compared to Native Maize Based on In Silico & In Vitro Analysis

PubMed Central

Mathur, Chandni; Kathuria, Pooran C.; Dahiya, Pushpa; Singh, Anand B.

2015-01-01

Background Genetically modified (GM) crops with potential allergens must be evaluated for safety and endogenous IgE binding pattern compared to the native variety, prior to market release. Objective To compare endogenous IgE binding proteins of three GM maize seeds containing Cry 1Ab, 1Ac, 1C transgenic proteins with non-GM maize. Methods An integrated approach of in silico and in vitro methods was employed. Cry proteins were tested for the presence of allergen sequences by FASTA in allergen databases. Biochemical assays for maize extracts were performed. Specific IgE (sIgE) and immunoblot assays using sera of food-sensitized patients (n = 39) against non-GM and GM maize antigens were performed. Results In silico approaches confirmed the absence of sequence similarity between the stated transgenic proteins and entries in allergen databases. An insignificant (p > 0.05) variation in protein content between GM and non-GM maize was observed. Simulated Gastric Fluid (SGF) revealed a reduced number of stable protein fractions in GM than in non-GM maize, which might be due to a shift of constituent protein expression. Specific IgE values from patients showed an insignificant difference between non-GM and GM maize extracts. Five maize-sensitized cases recognized the same 7 protein fractions of 88-28 kD as IgE binding in both GM and non-GM maize, signifying absence of variation. Four of the reported IgE binding proteins were also found to be stable by SGF. Conclusion Cry proteins did not indicate any significant similarity of >35% in allergen databases. Immunoassays also did not identify appreciable differences in endogenous IgE binding in GM and non-GM maize. PMID:25706412
Isolation and in silico analysis of a novel H+-pyrophosphatase gene orthologue from the halophytic grass Leptochloa fusca

NASA Astrophysics Data System (ADS)

Rauf, Muhammad; Saeed, Nasir A.; Habib, Imran; Ahmed, Moddassir; Shahzad, Khurram; Mansoor, Shahid; Ali, Rashid

2017-02-01

Structure prediction can provide information about the function and active sites of a protein, which helps in designing new functional proteins. H+-pyrophosphatase is a transmembrane protein involved in establishing the proton motive force for active transport of Na+ across the membrane by Na+/H+ antiporters. A full-length novel H+-pyrophosphatase gene was isolated from the halophytic grass Leptochloa fusca using RT-PCR and RACE. The full-length LfVP1 gene sequence of 2292 nucleotides encodes a protein of 764 amino acids. DNA and protein sequences were characterized using bioinformatics tools. Various important potential sites were predicted by the PROSITE webserver. Primary structural analysis showed LfVP1 to be a stable protein, and the grand average of hydropathy (GRAVY) indicated that the LfVP1 protein has good hydrosolubility. Secondary structure analysis showed that the LfVP1 protein sequence contains a significant proportion of alpha helix and random coil. Protein membrane topology suggested the presence of 14 transmembrane domains and of a catalytic domain in TM3. The three-dimensional structure predicted from the LfVP1 protein sequence also indicated the presence of 14 transmembrane domains, and a hydrophobicity surface model showed amino acid hydrophobicity. The Ramachandran plot showed that 98% of amino acid residues were predicted in the favored region.
Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome

PubMed Central

Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

2014-01-01

Usher syndrome is an autosomal recessive disorder characterized by both deafness and blindness. For the three clinical subtypes of Usher syndrome, causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is predestined for a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach with those of Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield. PMID:25333064
Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome.

PubMed

Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

2014-09-01

Usher syndrome is an autosomal recessive disorder characterized both by deafness and blindness. For the three clinical subtypes of Usher syndrome causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is predestined for a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach compared to Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield.
Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

PubMed

Huang, Ying; Chen, Shi-Yi; Deng, Feilong

2016-01-01

In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly facilitated our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. The two main aspects of ab initio gene prediction are the computed values that describe sequence features and the algorithm used to train the discriminant function; different combinations of the two are employed in various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.
Designing and conducting in silico analysis for identifying of Echinococcus spp. with discrimination of novel haplotypes: an approach to better understanding of parasite taxonomic.

PubMed

Spotin, Adel; Gholami, Shirzad; Nasab, Abbas Najafi; Fallah, Esmaeil; Oskouei, Mahmoud Mahami; Semnani, Vahid; Shariatzadeh, Seyyed Ali; Shahbazi, Abbas

2015-04-01

The definitive identification of Echinococcus species is currently carried out by sequencing and phylogenetic strategies. However, polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) patterns are not broadly used because of the heterogeneity of the Echinococcus genome in different regions of the world. Therefore, standardized patterns should be designed and applied locally in under-studied areas. In this investigation, an in silico mapping was designed and developed for eight Echinococcus spp. on the basis of regional sequences from Iran and the rest of the world. A total of 60 Echinococcus isolates were collected from the liver and lungs of 15 human, 15 sheep, 15 cattle, and 15 camel cases in Semnan province, Central Iran. DNA samples were extracted and examined by polymerase chain reaction of ribosomal DNA (rDNA) internal transcribed spacer 1 (ITS1) and PCR-RFLP with the Rsa1 endonuclease. Moreover, 15 amplicons of cytochrome oxidase 1 (Cox1) were directly sequenced in order to identify the strains/haplotypes. PCR-RFLP and phylogenetic analyses firmly revealed the presence of the G1 and G6 genotypes with heterogeneity (three novel haplotypes) of the Cox1 gene, although no other expected genotypes were found in the region. The findings show that the identification of novel haplotypes, along with discrimination of Echinococcus spp. through regional patterns, can unambiguously illustrate the real taxonomic status of the parasite in Central Iran.
Mass spectrometry analysis and in silico prediction of allergenicity of peptides in tryptic hydrolysates of the proteins from Ruditapes philippinarum.

PubMed

Yu, Yue; Liu, Hongwei; Tu, Maolin; Qiao, Meiling; Wang, Zhenyu; Du, Ming

2017-12-01

Ruditapes philippinarum is nutrient-rich and widely distributed, but little attention has been paid to the identification and characterization of the bioactive peptides in this bivalve. In the present study, we evaluated the peptides of R. philippinarum produced by tryptic hydrolysis using a combination of ultra-performance liquid chromatography separation and electrospray ionization quadrupole time-of-flight tandem mass spectrometry, followed by data processing and sequence-similarity database searching. The potential allergenicity of the peptides was assessed in silico. The enzymolysis was performed under the following conditions: E:S 3:100 (w/w), pH 9.0, 45 °C for 4 h. After separation and detection, the Swiss-Prot database and a Ruditapes philippinarum sequence database were used: 966 unique peptides were identified by non-error-tolerant database searching; 173 peptides matching 55 precursor proteins comprised highly conserved cytoskeleton proteins. The remaining 793 peptides were identified from the R. philippinarum sequence database. The results showed that 510 peptides were labeled as allergens and 31 peptides were potential allergens; 425 peptides were predicted to be nonallergenic. The abundant peptide information contributes to further investigations of the structure and potential function of R. philippinarum. Additional in vitro studies are required to demonstrate and ensure the correct production of the hydrolysates for use in the food industry with respect to R. philippinarum. © 2017 Society of Chemical Industry.
Unique CD44 intronic SNP is associated with tumor grade in breast cancer: a case control study and in silico analysis.

PubMed

Esmaeili, Rezvan; Abdoli, Nasrin; Yadegari, Fatemeh; Neishaboury, Mohamadreza; Farahmand, Leila; Kaviani, Ahmad; Majidzadeh-A, Keivan

2018-01-01

CD44, encoded by a single gene, is a cell surface transmembrane glycoprotein. Exon 2 is one of the exons important for binding of the CD44 protein to hyaluronan. Experimental evidence shows that the hyaluronan-CD44 interaction intensifies the proliferation, migration, and invasion of breast cancer cells. Therefore, the current study aimed at investigating the association between specific polymorphisms in exon 2 of CD44 and its flanking region and predisposition to breast cancer. In the current study, 175 Iranian female patients with breast cancer and 175 age-matched healthy controls were recruited at the biobank, Breast Cancer Research Center, Tehran, Iran. Single nucleotide polymorphisms of CD44 exon 2 and its flanking region were analyzed via polymerase chain reaction and gene sequencing techniques. Associations of the observed variation with breast cancer risk and clinico-pathological characteristics were studied. Subsequently, bioinformatics analysis was conducted to predict potential exonic splicing enhancer (ESE) motifs changed as the result of a mutation. A unique polymorphism of the gene encoding CD44 was identified 14 nucleotides upstream of exon 2 (A37692→G) by the sequencing method. The A > G polymorphism exhibited a significant association with higher grades of breast cancer, although no significant relation was found between this polymorphism and breast cancer risk. Finally, computational analysis revealed that the intronic mutation generated a new consensus-binding motif for the splicing factor SC35 within intron 1. The current study results indicated that the A > G polymorphism was associated with breast cancer development; in addition, in silico analysis with ESE finder prediction software showed that the change created a new SC35 binding site.
In-silico Taxonomic Classification of 373 Genomes Reveals Species Misidentification and New Genospecies within the Genus Pseudomonas.

PubMed

Tran, Phuong N; Savka, Michael A; Gan, Han Ming

2017-01-01

The genus Pseudomonas has one of the largest diversity of species within the Bacteria kingdom. To date, its taxonomy is still being revised and updated. Due to the non-standardized procedure and ambiguous thresholds at species level, largely based on 16S rRNA gene or conventional biochemical assay, species identification of publicly available Pseudomonas genomes remains questionable. In this study, we performed a large-scale analysis of all Pseudomonas genomes with species designation (excluding the well-defined P. aeruginosa) and re-evaluated their taxonomic assignment via in silico genome-genome hybridization and/or genetic comparison with valid type species. Three-hundred and seventy-three pseudomonad genomes were analyzed and subsequently clustered into 145 distinct genospecies. We detected 207 erroneous labels and corrected 43 to the proper species based on Average Nucleotide Identity Multilocus Sequence Typing (MLST) sequence similarity to the type strain. Surprisingly, more than half of the genomes initially designated as Pseudomonas syringae and Pseudomonas fluorescens should be classified either to a previously described species or to a new genospecies. Notably, high pairwise average nucleotide identity (>95%) indicating species-level similarity was observed between P. synxantha-P. libanensis, P. psychrotolerans-P. oryzihabitans, and P. kilonensis-P. brassicacearum, that were previously differentiated based on conventional biochemical tests and/or genome-genome hybridization techniques.
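To make the idea of grouping genomes into genospecies by pairwise average nucleotide identity more concrete, here is a minimal sketch, assuming single-linkage grouping at the >95% ANI level quoted in this abstract. The abstract does not describe the authors' actual clustering procedure, so this is only one plausible reading; the function name `cluster_by_ani`, the genome labels, and the ANI values are hypothetical.

```python
def cluster_by_ani(genomes, ani, threshold=95.0):
    """Group genomes so that any pair with ANI above the threshold shares a cluster.

    genomes: list of genome identifiers.
    ani: dict mapping (genome_a, genome_b) pairs to percent identity.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value > threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data, not taken from the study.
toy_genomes = ["g1", "g2", "g3", "g4"]
toy_ani = {
    ("g1", "g2"): 97.2,
    ("g2", "g3"): 96.1,
    ("g1", "g3"): 94.8,
    ("g3", "g4"): 88.0,
}
print(cluster_by_ani(toy_genomes, toy_ani))  # [['g1', 'g2', 'g3'], ['g4']]
```

Note the design consequence of single linkage: g1 and g3 end up together through g2 even though their own pairwise value is below the threshold. A stricter scheme, such as requiring every pair in a cluster to exceed the threshold, would split them.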
Proteasix: a tool for automated and large-scale prediction of proteases involved in naturally occurring peptide generation.

PubMed

Klein, Julie; Eales, James; Zürbig, Petra; Vlahou, Antonia; Mischak, Harald; Stevens, Robert

2013-04-01

In this study, we have developed Proteasix, an open-source peptide-centric tool that can be used to predict in silico the proteases involved in naturally occurring peptide generation. We developed a curated cleavage site (CS) database, containing 3500 entries about human protease/CS combinations. On top of this database, we built a tool, Proteasix, which allows CS retrieval and protease associations from a list of peptides. To establish the proof of concept of the approach, we used a list of 1388 peptides identified from human urine samples, and compared the prediction to the analysis of 1003 randomly generated amino acid sequences. Metalloprotease activity was predominantly involved in urinary peptide generation, and more particularly to peptides associated with extracellular matrix remodelling, compared to proteins from other origins. In comparison, random sequences returned almost no results, highlighting the specificity of the prediction. This study provides a tool that can facilitate linking of identified protein fragments to predicted protease activity, and therefore into presumed mechanisms of disease. Experiments are needed to confirm the in silico hypotheses; nevertheless, this approach may be of great help to better understand molecular mechanisms of disease, and define new biomarkers, and therapeutic targets. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Mild Zellweger syndrome due to a novel PEX6 mutation: correlation between clinical phenotype and in silico prediction of variant pathogenicity.

PubMed

Rydzanicz, Małgorzata; Stradomska, Teresa Joanna; Jurkiewicz, Elżbieta; Jamroz, Ewa; Gasperowicz, Piotr; Kostrzewa, Grażyna; Płoski, Rafał; Tylki-Szymańska, Anna

2017-11-01

Zellweger syndrome (ZS) is a consequence of a peroxisome biogenesis disorder (PBD) caused by the presence of a pathogenic mutation in one of the 13 genes from the PEX family. ZS is a severe multisystem condition characterized by neonatal appearance of symptoms and a shorter life. Here, we report a case of ZS with a mild phenotype, due to a novel PEX6 gene mutation. The patient presented subtle craniofacial dysmorphic features and slightly slower psychomotor development. At the age of 2 years, he was diagnosed with adrenal insufficiency, hypoacusis, and general deterioration. Magnetic resonance imaging showed a symmetrical hyperintense signal in the frontal and parietal white matter. Biochemical tests showed elevated liver transaminases, elevated serum very long chain fatty acids, and phytanic acid. After the death of the child at the age of 6 years, molecular diagnostics were continued in order to provide genetic counseling for his parents. Next generation sequencing (NGS) analysis with the TruSight One™ Sequencing Panel revealed a novel homozygous PEX6 p.Ala94Pro mutation. In silico prediction of variant severity suggested its possible benign effect. To conclude, in the milder phenotypes, adrenal insufficiency, hypoacusis, and leukodystrophy together seem to be pathognomonic for ZS.

Isolation and characterization of the promoter sequence of a cassava gene coding for Pt2L4, a glutamic acid-rich protein differentially expressed in storage roots.

PubMed

de Souza, C R; Aragão, F J; Moreira, E C O; Costa, C N M; Nascimento, S B; Carvalho, L J

2009-03-24

Cassava is one of the most important tropical food crops for more than 600 million people worldwide. Transgenic technologies can be useful for increasing its nutritional value and its resistance to viral diseases and insect pests. However, tissue-specific promoters that guarantee correct expression of transgenes would be necessary. We used inverse polymerase chain reaction to isolate a promoter sequence of the Mec1 gene coding for Pt2L4, a glutamic acid-rich protein differentially expressed in cassava storage roots. In silico analysis revealed putative cis-acting regulatory elements within this promoter sequence, including root-specific elements that may be required for its expression in vascular tissues. Transient expression experiments showed that the Mec1 promoter is functional, since this sequence was able to drive GUS expression in bean embryonic axes. Results from our computational analysis can serve as a guide for functional experiments to identify regions with tissue-specific Mec1 promoter activity. The DNA sequence that we identified is a new promoter that could be a candidate for genetic engineering of cassava roots.
Deciphering the molecular mechanisms underlying the binding of the TWIST1/E12 complex to regulatory E-box sequences

PubMed Central

Bouard, Charlotte; Terreux, Raphael; Honorat, Mylène; Manship, Brigitte; Ansieau, Stéphane; Vigneron, Arnaud M.; Puisieux, Alain; Payen, Léa

2016-01-01

The TWIST1 bHLH transcription factor controls embryonic development and cancer processes. Although molecular and genetic analyses have provided a wealth of data on the role of bHLH transcription factors, very little is known on the molecular mechanisms underlying their binding affinity to the E-box sequence of the promoter. Here, we used an in silico model of the TWIST1/E12 (TE) heterocomplex and performed molecular dynamics (MD) simulations of its binding to specific (TE-box) and modified E-box sequences. We focused on (i) active E-box and inactive E-box sequences, on (ii) modified active E-box sequences, as well as on (iii) two box sequences with modified adjacent bases, the AT- and TA-boxes. Our in silico models were supported by functional in vitro binding assays. This exploration highlighted the predominant role of protein side-chain residues, close to the heart of the complex, at anchoring the dimer to DNA sequences, and unveiled a shift towards adjacent ((-1) and (-1*)) bases and conserved bases of modified E-box sequences. In conclusion, our study provides proof of the predictive value of these MD simulations, which may contribute to the characterization of specific inhibitors by docking approaches, and their use in pharmacological therapies by blocking the tumoral TWIST1/E12 function in cancers. PMID:27151200
In Vitro vs In Silico Detected SNPs for the Development of a Genotyping Array: What Can We Learn from a Non-Model Species?

PubMed Central

Lepoittevin, Camille; Frigerio, Jean-Marc; Garnier-Géré, Pauline; Salin, Franck; Cervera, María-Teresa; Vornam, Barbara; Harvengt, Luc; Plomion, Christophe

2010-01-01

Background: There is considerable interest in the high-throughput discovery and genotyping of single nucleotide polymorphisms (SNPs) to accelerate genetic mapping and enable association studies. This study provides an assessment of EST-derived and resequencing-derived SNP quality in maritime pine (Pinus pinaster Ait.), a conifer characterized by a huge genome size (∼23.8 Gb/C). Methodology/Principal Findings: A 384-SNPs GoldenGate genotyping array was built from i/ 184 SNPs originally detected in a set of 40 re-sequenced candidate genes (in vitro SNPs), chosen on the basis of functionality scores, presence of neighboring polymorphisms, minor allele frequencies and linkage disequilibrium and ii/ 200 SNPs screened from ESTs (in silico SNPs) selected based on the number of ESTs used for SNP detection, the SNP minor allele frequency and the quality of SNP flanking sequences. The global success rate of the assay was 66.9%, and a conversion rate (considering only polymorphic SNPs) of 51% was achieved. In vitro SNPs showed significantly higher genotyping-success and conversion rates than in silico SNPs (+11.5% and +18.5%, respectively). The reproducibility was 100%, and the genotyping error rate very low (0.54%, dropping down to 0.06% when removing four SNPs showing elevated error rates). Conclusions/Significance: This study demonstrates that ESTs provide a resource for SNP identification in non-model species, which do not require any additional bench work and little bio-informatics analysis. However, the time and cost benefits of in silico SNPs are counterbalanced by a lower conversion rate than in vitro SNPs. This drawback is acceptable for population-based experiments, but could be dramatic in experiments involving samples from narrow genetic backgrounds. In addition, we showed that both the visual inspection of genotyping clusters and the estimation of a per SNP error rate should help identify markers that are not suitable to the GoldenGate technology in species characterized by a large and complex genome. PMID:20543950
Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages.

PubMed

Taminau, Jonatan; Meganck, Stijn; Lazar, Cosmin; Steenhoff, David; Coletta, Alain; Molter, Colin; Duque, Robin; de Schaetzen, Virginie; Weiss Solís, David Y; Bersini, Hugues; Nowé, Ann

2012-12-24

With an abundant amount of microarray gene expression data sets available through public repositories, new possibilities lie in combining multiple existing data sets. In this new context, analysis itself is no longer the problem, but retrieving and consistently integrating all this data before delivering it to the wide variety of existing analysis tools becomes the new bottleneck. We present the newly released inSilicoMerging R/Bioconductor package which, together with the earlier released inSilicoDb R/Bioconductor package, allows consistent retrieval, integration and analysis of publicly available microarray gene expression data sets. Inside the inSilicoMerging package a set of five visual and six quantitative validation measures are available as well. By providing (i) access to uniformly curated and preprocessed data, (ii) a collection of techniques to remove the batch effects between data sets from different sources, and (iii) several validation tools enabling the inspection of the integration process, these packages enable researchers to fully explore the potential of combining gene expression data for downstream analysis. The power of using both packages is demonstrated by programmatically retrieving and integrating gene expression studies from the InSilico DB repository [https://insilicodb.org/app/].
Medium-sized tandem repeats represent an abundant component of the Drosophila virilis genome.

PubMed

Abdurashitov, Murat A; Gonchar, Danila A; Chernukhin, Valery A; Tomilov, Victor N; Tomilova, Julia E; Schostak, Natalia G; Zatsepina, Olga G; Zelentsova, Elena S; Evgen'ev, Michael B; Degtyarev, Sergey K H

2013-11-09

Previously, we developed a simple method for carrying out a restriction enzyme analysis of eukaryotic DNA in silico, based on the known DNA sequences of the genomes. This method allows the user to calculate the lengths of all DNA fragments that are formed after a whole genome is digested at the theoretical recognition sites of a given restriction enzyme. A comparison of the observed peaks in distribution diagrams with the results of in vitro DNA cleavage by several restriction enzymes has shown good correspondence between the theoretical and experimental data in several cases. Here, we applied this approach to the annotated genome of Drosophila virilis, which is extremely rich in various repeats, and explored a combined approach to restriction analysis of D. virilis DNA. This approach revealed three abundant medium-sized tandem repeats within the D. virilis genome. While the 225 bp repeats were revealed previously in intergenic non-transcribed spacers between ribosomal genes of D. virilis, two other families comprising 154 bp and 172 bp repeats had not been described. A Tandem Repeats Finder search demonstrated that the 154 bp and 172 bp units are organized in multiple clusters in the genome of D. virilis. Characteristically, only the 154 bp repeats derived from a Helitron transposon are transcribed. In silico digestion in combination with conventional restriction analysis and sequencing of repeated DNA fragments enabled us to isolate and characterize three highly abundant families of medium-sized repeats present in the D. virilis genome. These repeats comprise a significant portion of the genome and may have important roles in genome function and structural integrity. Thus, we demonstrated an approach that makes it possible to investigate in detail the gross arrangement and expression of medium-sized repeats based on sequencing data, even in the case of incompletely assembled and/or annotated genomes.
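The in silico digestion described in this abstract, computing the lengths of all fragments produced when a sequence is cut at every recognition site of an enzyme, can be sketched as follows. This is a minimal illustration and not the authors' software: it assumes a linear sequence, a palindromic recognition site (so scanning one strand is enough), and a known cut offset within the site. The function name `digest` and the toy sequence are hypothetical; the EcoRI site GAATTC, cut after the first G, is used as a familiar example.

```python
import re
from collections import Counter


def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases between the start of the recognition
    site and the cut position on the scanned strand.
    """
    seq = sequence.upper()
    # A lookahead finds every occurrence, including overlapping ones.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site.upper())})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


# Hypothetical toy sequence with two EcoRI sites (G^AATTC).
toy = "AA" + "GAATTC" + "CCC" + "GAATTC" + "TT"
fragments = digest(toy, "GAATTC", 1)
print(fragments)           # [3, 9, 7]
print(Counter(fragments))  # fragment-length distribution
```

On a whole genome, peaks in the length distribution point to fragments that recur many times, which is how abundant tandem repeats of a fixed unit size would show up in this kind of analysis.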
Mining of Microbial Genomes for the Novel Sources of Nitrilases.

PubMed

Sharma, Nikhil; Thakur, Neerja; Raj, Tilak; Savitri; Bhalla, Tek Chand

2017-01-01

Next-generation DNA sequencing (NGS) has made it feasible to sequence a large number of microbial genomes, and advancements in computational biology have opened enormous opportunities to mine genome sequence data for novel genes and enzymes or their sources. In the present communication, in silico mining of microbial genomes has been carried out to find novel sources of nitrilases. The sequences selected were analyzed for homology and considered for designing motifs. The manually designed motifs based on amino acid sequences of nitrilases were used to screen 2000 microbial genomes (translated to proteomes). This resulted in the identification of one hundred thirty-eight putative/hypothetical sequences which could potentially code for nitrilase activity. In vitro validation of nine predicted sources of nitrilases was done for nitrile/cyanide hydrolyzing activity. Out of nine predicted nitrilases, those from Gluconacetobacter diazotrophicus, Sphingopyxis alaskensis, Saccharomonospora viridis, and Shimwellia blattae were specific for aliphatic nitriles, whereas nitrilases from Geodermatophilus obscurus, Nocardiopsis dassonvillei, Runella slithyformis, and Streptomyces albus possessed activity for aromatic nitriles. Flavobacterium indicum was specific towards potassium cyanide (KCN), which revealed the presence of a nitrilase homolog, that is, cyanide dihydratase with no activity for either aliphatic, aromatic, or aryl nitriles. The present study reports novel sources of nitrilases and cyanide dihydratase which were not reported hitherto by in silico or in vitro studies.
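The motif-based screening step in this abstract can be illustrated with a short sketch that scans protein sequences for a set of motifs written as regular expressions and keeps only proteins containing all of them. The abstract does not give the authors' motifs, so the two patterns below are purely hypothetical placeholders, as are the function name `screen` and the toy proteome; requiring every motif to be present is also an assumption.

```python
import re

# Hypothetical motifs for illustration only; these are NOT the nitrilase
# motifs designed in the study.
HYPOTHETICAL_MOTIFS = {
    "motif_A": re.compile(r"G.[ILV]C[WY]E"),
    "motif_B": re.compile(r"[FL][PG]E.[FA]"),
}


def screen(proteome, motifs):
    """Return {protein_id: {motif_name: first_match_position}} for proteins
    that contain every motif."""
    hits = {}
    for protein_id, sequence in proteome.items():
        positions = {}
        for name, pattern in motifs.items():
            match = pattern.search(sequence.upper())
            if match is None:
                break  # one missing motif disqualifies the protein
            positions[name] = match.start()
        else:
            hits[protein_id] = positions
    return hits


# Hypothetical toy proteome (made-up sequences).
toy_proteome = {
    "prot1": "MKFPEAFAAGAVCWENL",
    "prot2": "MKTAYIAKQR",
}
print(screen(toy_proteome, HYPOTHETICAL_MOTIFS))
# {'prot1': {'motif_A': 9, 'motif_B': 2}}
```

Candidates found this way are only putative; as the abstract notes, in vitro validation is what confirms the predicted activity.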
GeneSilico protein structure prediction meta-server.

PubMed

Kurowski, Michal A; Bujnicki, Janusz M

2003-07-01

Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.
GeneSilico protein structure prediction meta-server

PubMed Central

Kurowski, Michal A.; Bujnicki, Janusz M.

2003-01-01

Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313
A Sequence in the loop domain of hepatitis C virus E2 protein identified in silico as crucial for the selective binding to human CD81

PubMed Central

Chang, Chun-Chun; Hsu, Hao-Jen; Yen, Jui-Hung; Lo, Shih-Yen

2017-01-01

Hepatitis C virus (HCV) is a species-specific pathogenic virus that infects only humans and chimpanzees. Previous studies have indicated that interactions between the HCV E2 protein and CD81 on host cells are required for HCV infection. To determine the crucial factors for species-specific interactions at the molecular level, this study employed in silico molecular docking involving molecular dynamic simulations of the binding of HCV E2 onto human and rat CD81s. In vitro experiments including surface plasmon resonance measurements and cellular binding assays were applied for simple validations of the in silico results. The in silico studies identified two binding regions on the HCV E2 loop domain, namely E2-site1 and E2-site2, as being crucial for the interactions with CD81s, with the E2-site2 as the determinant factor for human-specific binding. Free energy calculations indicated that the E2/CD81 binding process might follow a two-step model involving (i) the electrostatic interaction-driven initial binding of human-specific E2-site2, followed by (ii) changes in the E2 orientation to facilitate the hydrophobic and van der Waals interaction-driven binding of E2-site1. The sequence of the human-specific, stronger-binding E2-site2 could serve as a candidate template for the future development of HCV-inhibiting peptide drugs. PMID:28481946
Computational prediction and biochemical characterization of novel RNA aptamers to Rift Valley fever virus nucleocapsid protein.

PubMed

Ellenbecker, Mary; St Goddard, Jeremy; Sundet, Alec; Lanchy, Jean-Marc; Raiford, Douglas; Lodmell, J Stephen

2015-10-01

Rift Valley fever virus (RVFV) is a potent human and livestock pathogen endemic to sub-Saharan Africa and the Arabian Peninsula that has potential to spread to other parts of the world. Although there is no proven effective and safe treatment for RVFV infections, a potential therapeutic target is the virally encoded nucleocapsid protein (N). During the course of infection, N binds to viral RNA, and perturbation of this interaction can inhibit viral replication. To gain insight into how N recognizes viral RNA specifically, we designed an algorithm that uses a distance matrix and multidimensional scaling to compare the predicted secondary structures of known N-binding RNAs, or aptamers, that were isolated and characterized in a previous in vitro evolution experiment. These aptamers did not exhibit overt sequence or predicted structure similarity, so we employed bioinformatic methods to propose novel aptamers based on analysis and clustering of secondary structures. We screened and scored the predicted secondary structures of novel randomly generated RNA sequences in silico and selected several of these putative N-binding RNAs whose secondary structures were similar to those of known N-binding RNAs. We found that overall the in silico generated RNA sequences bound well to N in vitro. Furthermore, introduction of these RNAs into cells prior to infection with RVFV inhibited viral replication in cell culture. This proof of concept study demonstrates how the predictive power of bioinformatics and the empirical power of biochemistry can be jointly harnessed to discover, synthesize, and test new RNA sequences that bind tightly to RVFV N protein. The approach would be easily generalizable to other applications. Copyright © 2015 Elsevier Ltd. All rights reserved.
Beyond an AFLP genome scan towards the identification of immune genes involved in plague resistance in Rattus rattus from Madagascar.

PubMed

Tollenaere, C; Jacquet, S; Ivanova, S; Loiseau, A; Duplantier, J-M; Streiff, R; Brouat, C

2013-01-01

Genome scans using amplified fragment length polymorphism (AFLP) markers became popular in nonmodel species within the last 10 years, but few studies have tried to characterize the anonymous outliers identified. This study follows on from an AFLP genome scan in the black rat (Rattus rattus), the reservoir of plague (Yersinia pestis infection) in Madagascar. We successfully sequenced 17 of the 22 markers previously shown to be potentially affected by plague-mediated selection and associated with a plague resistance phenotype. Searching these sequences in the genome of the closely related species Rattus norvegicus assigned them to 14 genomic regions, revealing a random distribution of outliers in the genome (no clustering). We compared these results with those of an in silico AFLP study of the R. norvegicus genome, which showed that outlier sequences could not have been inferred by this method in R. rattus (only four of the 15 sequences were predicted). However, in silico analysis allowed the prediction of AFLP markers distribution and the estimation of homoplasy rates, confirming its potential utility for designing AFLP studies in nonmodel species. The 14 genomic regions surrounding AFLP outliers (less than 300 kb from the marker) contained 75 genes encoding proteins of known function, including nine involved in immune function and pathogen defence. We identified the two interleukin 1 genes (Il1a and Il1b) that share homology with an antigen of Y. pestis, as the best candidates for genes subject to plague-mediated natural selection. At least six other genes known to be involved in proinflammatory pathways may also be affected by plague-mediated selection. © 2012 Blackwell Publishing Ltd.
Chloroplast microsatellite markers for Artocarpus (Moraceae) developed from transcriptome sequences

PubMed Central

Gardner, Elliot M.; Laricchia, Kristen M.; Murphy, Matthew; Ragone, Diane; Scheffler, Brian E.; Simpson, Sheron; Williams, Evelyn W.; Zerega, Nyree J. C.

2015-01-01

Premise of the study: Chloroplast microsatellite loci were characterized from transcriptomes of Artocarpus altilis (breadfruit) and A. camansi (breadnut). They were tested in A. odoratissimus (terap) and A. altilis and evaluated in silico for two congeners. Methods and Results: Fifteen simple sequence repeats (SSRs) were identified in chloroplast sequences from four Artocarpus transcriptome assemblies. The markers were evaluated using capillary electrophoresis in A. odoratissimus (105 accessions) and A. altilis (73). They were also evaluated in silico in A. altilis (10), A. camansi (6), and A. altilis × A. mariannensis (7) transcriptomes. All loci were polymorphic in at least one species, with all 15 polymorphic in A. camansi. Per species, average alleles per locus ranged between 2.2 and 2.5. Three loci had evidence of fragment-length homoplasy. Conclusions: These markers will complement existing nuclear markers by enabling confident identification of maternal and clone lines, which are often important in vegetatively propagated crops such as breadfruit. PMID:26421253
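As an illustration of how simple sequence repeats can be located in a chloroplast sequence, the sketch below scans for tandem repeats with a regular expression. The unit sizes and the minimum repeat count are assumptions chosen for the example, not the thresholds used in the study, and `find_ssrs` is a hypothetical name.

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Return (start, unit, repeat_count) for each tandem repeat found.

    The repeat unit is matched lazily, so the shortest unit that satisfies
    the repeat threshold is reported (e.g. "A" rather than "AA").
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits
```

Comparing the repeat counts reported at the same locus across accessions would then give a crude in silico allele call.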
Draft sequencing and comparative genomics of Xylella fastidiosa strains reveal novel biological insights.

PubMed

Bhattacharyya, Anamitra; Stilwagen, Stephanie; Reznik, Gary; Feil, Helene; Feil, William S; Anderson, Iain; Bernal, Axel; D'Souza, Mark; Ivanova, Natalia; Kapatral, Vinayak; Larsen, Niels; Los, Tamara; Lykidis, Athanasios; Selkov, Eugene; Walunas, Theresa L; Purcell, Alexander; Edwards, Rob A; Hawkins, Trevor; Haselkorn, Robert; Overbeek, Ross; Kyrpides, Nikos C; Predki, Paul F

2002-10-01

Draft sequencing is a rapid and efficient method for determining the near-complete sequence of microbial genomes. Here we report a comparative analysis of one complete and two draft genome sequences of the phytopathogenic bacterium, Xylella fastidiosa, which causes serious disease in plants, including citrus, almond, and oleander. We present highlights of an in silico analysis based on a comparison of reconstructions of core biological subsystems. Cellular pathway reconstructions have been used to identify a small number of genes, which are likely to reside within the draft genomes but are not captured in the draft assembly. These represented only a small fraction of all genes and were predominantly large and small ribosomal subunit protein components. By using this approach, some of the inherent limitations of draft sequence can be significantly reduced. Despite the incomplete nature of the draft genomes, it is possible to identify several phage-related genes, which appear to be absent from the draft genomes and not the result of insufficient sequence sampling. This region may therefore identify potential host-specific functions. Based on this first functional reconstruction of a phytopathogenic microbe, we spotlight an unusual respiration machinery as a potential target for biological control. We also predicted and developed a new defined growth medium for Xylella.
In Silico Prediction and In Vitro Characterization of Multifunctional Human RNase3

PubMed Central

Kuo, Ping-Hsueh; Chen, Chien-Jung; Chang, Hsiu-Hui; Fang, Shun-lung; Wu, Wei-Shuo; Lai, Yiu-Kay; Pai, Tun-Wen; Chang, Margaret Dah-Tsyr

2013-01-01

The human ribonuclease A (hRNaseA) superfamily consists of thirteen members with high structural similarity that nevertheless exhibit divergent physiological functions other than RNase activity. During evolution, the hRNaseA superfamily has gained novel functions, which may be preserved in a unique region or domain to account for additional molecular interactions. hRNase3 has multiple functions including ribonucleolytic, heparan sulfate (HS) binding, cellular binding, endocytic, lipid destabilization, cytotoxic, and antimicrobial activities. In this study, three putative multifunctional regions, 34RWRCK38 (HBR1), 75RSRFR79 (HBR2), and 101RPGRR105 (HBR3), of hRNase3 have been identified employing in silico sequence analysis and validated employing in vitro activity assays. A heparin binding peptide containing HBR1 is characterized to act as a key element associated with HS binding, cellular binding, and lipid binding activities. In this study, we provide novel insights into identifying functional regions of hRNase3 that may have implications for all hRNaseA superfamily members. PMID:23484086
A comparative in silico linear B-cell epitope prediction and characterization for South American and African Trypanosoma vivax strains.

PubMed

Guedes, Rafael Lucas Muniz; Rodrigues, Carla Monadeli Filgueira; Coatnoan, Nicolas; Cosson, Alain; Cadioli, Fabiano Antonio; Garcia, Herakles Antonio; Gerber, Alexandra Lehmkuhl; Machado, Rosangela Zacarias; Minoprio, Paola Marcella Camargo; Teixeira, Marta Maria Geraldes; de Vasconcelos, Ana Tereza Ribeiro

2018-02-27

Trypanosoma vivax is a parasite widespread across Africa and South America. Immunological methods using recombinant antigens have been developed aiming at specific and sensitive detection of infections caused by T. vivax. Here, we sequenced for the first time the transcriptome of a virulent T. vivax strain (Lins), isolated from an outbreak of severe disease in South America (Brazil) and performed a computational integrated analysis of genome, transcriptome and in silico predictions to identify and characterize putative linear B-cell epitopes from African and South American T. vivax. A total of 2278, 3936 and 4062 linear B-cell epitopes were respectively characterized for the transcriptomes of T. vivax LIEM-176 (Venezuela), T. vivax IL1392 (Nigeria) and T. vivax Lins (Brazil) and 4684 for the genome of T. vivax Y486 (Nigeria). The results presented are a valuable theoretical source that may pave the way for highly sensitive and specific diagnostic tools. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Java web tools for PCR, in silico PCR, and oligonucleotide assembly and analysis.

PubMed

Kalendar, Ruslan; Lee, David; Schulman, Alan H

2011-08-01

The polymerase chain reaction is fundamental to molecular biology and is the most important practical molecular technique for the research laboratory. We have developed and tested efficient tools for PCR primer and probe design, which also predict oligonucleotide properties based on experimental studies of PCR efficiency. The tools provide comprehensive facilities for designing primers for most PCR applications and their combinations, including standard, multiplex, long-distance, inverse, real-time, unique, group-specific, bisulphite modification assays, Overlap-Extension PCR Multi-Fragment Assembly, as well as a programme to design oligonucleotide sets for long sequence assembly by ligase chain reaction. The in silico PCR primer or probe search includes comprehensive analyses of individual primers and primer pairs. It calculates the melting temperature for standard and degenerate oligonucleotides including LNA and other modifications, provides analyses for a set of primers with prediction of oligonucleotide properties, dimer and G-quadruplex detection, linguistic complexity, and provides a dilution and resuspension calculator. Copyright © 2011 Elsevier Inc. All rights reserved.
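A minimal sketch of what an in silico PCR search does is given below: it looks for an exact forward-primer match, then for the reverse primer's binding site downstream, and returns the predicted amplicon. It is not the implementation of the tools described above; exact matching, the `max_len` cutoff, and the use of the Wallace rule (2 °C per A/T, 4 °C per G/C) as a rough melting-temperature estimate for short oligonucleotides are all simplifying assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, fwd, rev, max_len=3000):
    """Return the predicted amplicon, or None if no product is expected."""
    template = template.upper()
    fwd, rev = fwd.upper(), rev.upper()
    start = template.find(fwd)
    if start == -1:
        return None
    # The reverse primer anneals to the top strand's complement, so its
    # site on the top strand reads as its reverse complement.
    site = revcomp(rev)
    end = template.find(site, start + len(fwd))
    if end == -1:
        return None
    amplicon = template[start:end + len(site)]
    return amplicon if len(amplicon) <= max_len else None


def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))
```

Real primer-design tools typically allow mismatches and degenerate bases and use nearest-neighbour thermodynamics, none of which this sketch attempts.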
Strain Level Streptococcus Colonization Patterns during the First Year of Life

PubMed Central

Wright, Meredith S.; McCorrison, Jamison; Gomez, Andres M.; Beck, Erin; Harkins, Derek; Shankar, Jyoti; Mounaud, Stephanie; Segubre-Mercado, Edelwisa; Mojica, Aileen May R.; Bacay, Brian; Nzenze, Susan A.; Kimaro, Sheila Z. M.; Adrian, Peter; Klugman, Keith P.; Lucero, Marilla G.; Nelson, Karen E.; Madhi, Shabir; Sutton, Granger G.; Nierman, William C.; Losada, Liliana

2017-01-01

Pneumococcal pneumonia has decreased significantly since the implementation of the pneumococcal conjugate vaccine (PCV); nevertheless, in many developing countries pneumonia mortality in infants remains high. We have undertaken a study of the nasopharyngeal (NP) microbiome during the first year of life in infants from The Philippines and South Africa. The study entailed the determination of the Streptococcus sp. carriage using a lytA qPCR assay, whole metagenomic sequencing, and in silico serotyping of Streptococcus pneumoniae, as well as 16S rRNA amplicon based community profiling. The lytA carriage in both populations increased with infant age and lytA+ samples ranged from 24 to 85% of the samples at each sampling time point. We next developed informatic tools for determining Streptococcus community composition and pneumococcal serotype from metagenomic sequences derived from a subset of longitudinal lytA-positive Streptococcus enrichment cultures from The Philippines (n = 26 infants, 50% vaccinated) and South Africa (n = 7 infants, 100% vaccinated). NP samples from infants were passaged in enrichment media, and metagenomic DNA was purified and sequenced. In silico capsular serotyping of these 51 metagenomic assemblies assigned known serotypes in 28 samples, and the co-occurrence of serotypes in 5 samples. Eighteen samples were not typeable using known serotypes but did encode for capsule biosynthetic cluster genes similar to non-encapsulated reference sequences. In addition, we performed metagenomic assembly and 16S rRNA amplicon profiling to understand co-colonization dynamics of Streptococcus sp. and other NP genera, revealing the presence of multiple Streptococcus species as well as potential respiratory pathogens in healthy infants. A range of virulence and drug resistant elements were identified as circulating in the NP microbiomes of these infants. This study revealed the frequent co-occurrence of multiple S. pneumoniae strains along with Streptococcus sp. and other potential pathogens such as S. aureus in the NP microbiome of these infants. In addition, the in silico serotype analysis proved powerful in determining the serotypes in S. pneumoniae carriage, and may lead to developing better targeted vaccines to prevent invasive pneumococcal disease (IPD) in these countries. These findings suggest that NP colonization by S. pneumoniae during the first years of life is a dynamic process involving multiple serotypes and species. PMID:28932211
Large-scale gene discovery in the pea aphid Acyrthosiphon pisum (Hemiptera)

PubMed Central

Sabater-Muñoz, Beatriz; Legeai, Fabrice; Rispe, Claude; Bonhomme, Joël; Dearden, Peter; Dossat, Carole; Duclert, Aymeric; Gauthier, Jean-Pierre; Ducray, Danièle Giblot; Hunter, Wayne; Dang, Phat; Kambhampati, Srini; Martinez-Torres, David; Cortes, Teresa; Moya, Andrès; Nakabachi, Atsushi; Philippe, Cathy; Prunier-Leterme, Nathalie; Rahbé, Yvan; Simon, Jean-Christophe; Stern, David L; Wincker, Patrick; Tagu, Denis

2006-01-01

Aphids are the leading pests in agricultural crops. A large-scale sequencing of 40,904 ESTs from the pea aphid Acyrthosiphon pisum was carried out to define a catalog of 12,082 unique transcripts. A strong AT bias was found, indicating a compositional shift between Drosophila melanogaster and A. pisum. An in silico profiling analysis characterized 135 transcripts specific to pea-aphid tissues (relating to bacteriocytes and parthenogenetic embryos). This project is the first to address the genetics of the Hemiptera and of a hemimetabolous insect. PMID:16542494
Development and implementation of a highly-multiplexed SNP array for genetic mapping in maritime pine and comparative mapping with loblolly pine

PubMed Central

2011-01-01

Background Single nucleotide polymorphisms (SNPs) are the most abundant source of genetic variation among individuals of a species. New genotyping technologies allow examining hundreds to thousands of SNPs in a single reaction for a wide range of applications such as genetic diversity analysis, linkage mapping, fine QTL mapping, association studies, marker-assisted or genome-wide selection. In this paper, we evaluated the potential of highly-multiplexed SNP genotyping for genetic mapping in maritime pine (Pinus pinaster Ait.), the main conifer used for commercial plantation in southwestern Europe. Results We designed a custom GoldenGate assay for 1,536 SNPs detected through the resequencing of gene fragments (707 in vitro SNPs/Indels) and from Sanger-derived Expressed Sequenced Tags assembled into a unigene set (829 in silico SNPs/Indels). Offspring from three-generation outbred (G2) and inbred (F2) pedigrees were genotyped. The success rate of the assay was 63.6% and 74.8% for in silico and in vitro SNPs, respectively. A genotyping error rate of 0.4% was further estimated from segregating data of SNPs belonging to the same gene. Overall, 394 SNPs were available for mapping. A total of 287 SNPs were integrated with previously mapped markers in the G2 parental maps, while 179 SNPs were localized on the map generated from the analysis of the F2 progeny. Based on 98 markers segregating in both pedigrees, we were able to generate a consensus map comprising 357 SNPs from 292 different loci. Finally, the analysis of sequence homology between mapped markers and their orthologs in a Pinus taeda linkage map, made it possible to align the 12 linkage groups of both species. Conclusions Our results show that the GoldenGate assay can be used successfully for high-throughput SNP genotyping in maritime pine, a conifer species that has a genome seven times the size of the human genome. 
This SNP-array will be extended thanks to recent sequencing effort using new generation sequencing technologies and will include SNPs from comparative orthologous sequences that were identified in the present study, providing a wider collection of anchor points for comparative genomics among the conifers. PMID:21767361
Complete Genomic Sequence of “Thermofilum adornatus” Strain 1910bT, a Hyperthermophilic Anaerobic Organotrophic Crenarchaeon

PubMed Central

Dominova, I. N.; Kublanov, I. V.; Podosokorskaya, O. A.; Derbikova, K. S.; Patrushev, M. V.

2013-01-01

The complete genomic sequence of a novel hyperthermophilic crenarchaeon, strain 1910bT, was determined. The genome comprises a 1,750,259-bp circular chromosome containing single copies of 3 rRNA genes, 43 tRNA genes, and 1,896 protein-coding sequences. In silico genome-genome hybridization suggests the proposal of a novel species, “Thermofilum adornatus” strain 1910bT. PMID:24029764

A Nonautochthonous U.S. Strain of Vibrio parahaemolyticus Isolated from Chesapeake Bay Oysters Caused the Outbreak in Maryland in 2010

PubMed Central

Haendiges, Julie; Jones, Jessica; Myers, Robert A.; Mitchell, Clifford S.; Butler, Erin

2016-01-01

ABSTRACT In the summer of 2010, Vibrio parahaemolyticus caused an outbreak in Maryland linked to the consumption of oysters. Strains isolated from both stool and oyster samples were indistinguishable by pulsed-field gel electrophoresis (PFGE). However, the oysters contained other potentially pathogenic V. parahaemolyticus strains exhibiting different PFGE patterns. In order to assess the identity, genetic makeup, relatedness, and potential pathogenicity of the V. parahaemolyticus strains, we sequenced 11 such strains (2 clinical strains and 9 oyster strains). We analyzed these genomes by in silico multilocus sequence typing (MLST) and determined their phylogeny using a whole-genome MLST (wgMLST) analysis. Our in silico MLST analysis identified six different sequence types (STs) (ST8, ST676, ST810, ST811, ST34, and ST768), with both of the clinical and four of the oyster strains being identified as belonging to ST8. Using wgMLST, we showed that the ST8 strains from clinical and oyster samples were nearly indistinguishable and belonged to the same outbreak, confirming that local oysters were the source of the infections. The remaining oyster strains were genetically diverse, differing in >3,000 loci from the Maryland ST8 strains. eBURST analysis comparing these strains with strains of other STs available at the V. parahaemolyticus MLST website showed that the Maryland ST8 strains belonged to a clonal complex endemic to Asia. This indicates that the ST8 isolates from clinical and oyster sources were likely not endemic to Maryland. Finally, this study demonstrates the utility of whole-genome sequencing (WGS) and associated analyses for source-tracking investigations. IMPORTANCE Vibrio parahaemolyticus is an important foodborne pathogen and the leading cause of bacterial infections in the United States associated with the consumption of seafood. In the summer of 2010, Vibrio parahaemolyticus caused an outbreak in Maryland linked to oyster consumption. 
Strains isolated from stool and oyster samples were indistinguishable by pulsed-field gel electrophoresis (PFGE). The oysters also contained other potentially pathogenic V. parahaemolyticus strains with different PFGE patterns. Since their identity, genetic makeup, relatedness, and potential pathogenicity were unknown, their genomes were determined by using next-generation sequencing. Whole-genome sequencing (WGS) analysis by whole-genome multilocus sequence typing (wgMLST) allowed (i) identification of clinical and oyster strains with matching PFGE profiles as belonging to ST8, (ii) determination of oyster strain diversity, and (iii) identification of the clinical strains as belonging to a clonal complex (CC) described only in Asia. Finally, WGS and associated analyses demonstrated their utility for trace-back investigations. PMID:26994080
A Nonautochthonous U.S. Strain of Vibrio parahaemolyticus Isolated from Chesapeake Bay Oysters Caused the Outbreak in Maryland in 2010.

PubMed

Haendiges, Julie; Jones, Jessica; Myers, Robert A; Mitchell, Clifford S; Butler, Erin; Toro, Magaly; Gonzalez-Escalona, Narjol

2016-06-01

In the summer of 2010, Vibrio parahaemolyticus caused an outbreak in Maryland linked to the consumption of oysters. Strains isolated from both stool and oyster samples were indistinguishable by pulsed-field gel electrophoresis (PFGE). However, the oysters contained other potentially pathogenic V. parahaemolyticus strains exhibiting different PFGE patterns. In order to assess the identity, genetic makeup, relatedness, and potential pathogenicity of the V. parahaemolyticus strains, we sequenced 11 such strains (2 clinical strains and 9 oyster strains). We analyzed these genomes by in silico multilocus sequence typing (MLST) and determined their phylogeny using a whole-genome MLST (wgMLST) analysis. Our in silico MLST analysis identified six different sequence types (STs) (ST8, ST676, ST810, ST811, ST34, and ST768), with both of the clinical and four of the oyster strains being identified as belonging to ST8. Using wgMLST, we showed that the ST8 strains from clinical and oyster samples were nearly indistinguishable and belonged to the same outbreak, confirming that local oysters were the source of the infections. The remaining oyster strains were genetically diverse, differing in >3,000 loci from the Maryland ST8 strains. eBURST analysis comparing these strains with strains of other STs available at the V. parahaemolyticus MLST website showed that the Maryland ST8 strains belonged to a clonal complex endemic to Asia. This indicates that the ST8 isolates from clinical and oyster sources were likely not endemic to Maryland. Finally, this study demonstrates the utility of whole-genome sequencing (WGS) and associated analyses for source-tracking investigations. Vibrio parahaemolyticus is an important foodborne pathogen and the leading cause of bacterial infections in the United States associated with the consumption of seafood. In the summer of 2010, Vibrio parahaemolyticus caused an outbreak in Maryland linked to oyster consumption. 
Strains isolated from stool and oyster samples were indistinguishable by pulsed-field gel electrophoresis (PFGE). The oysters also contained other potentially pathogenic V. parahaemolyticus strains with different PFGE patterns. Since their identity, genetic makeup, relatedness, and potential pathogenicity were unknown, their genomes were determined by using next-generation sequencing. Whole-genome sequencing (WGS) analysis by whole-genome multilocus sequence typing (wgMLST) allowed (i) identification of clinical and oyster strains with matching PFGE profiles as belonging to ST8, (ii) determination of oyster strain diversity, and (iii) identification of the clinical strains as belonging to a clonal complex (CC) described only in Asia. Finally, WGS and associated analyses demonstrated their utility for trace-back investigations. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
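The two comparisons described in this record, assigning a sequence type from a small set of MLST loci and counting locus differences across a whole-genome scheme, both reduce to comparing allele profiles. The sketch below uses hypothetical locus names, allele numbers, and ST labels; none of them are taken from the V. parahaemolyticus MLST database.

```python
def allele_differences(profile_a, profile_b):
    """Count loci called in both profiles whose allele numbers differ."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])


def assign_sequence_type(profile, scheme_loci, st_table):
    """Look up the sequence type for a profile; None if it is not in the table."""
    key = tuple(profile.get(locus) for locus in scheme_loci)
    return st_table.get(key)


# Hypothetical toy data: three MLST loci plus one extra wgMLST locus.
SCHEME_LOCI = ["locus1", "locus2", "locus3"]
ST_TABLE = {(1, 4, 2): "ST-A", (3, 4, 7): "ST-B"}

clinical = {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 9}
oyster_1 = {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": 9}
oyster_2 = {"locus1": 3, "locus2": 4, "locus3": 7, "locus4": 5}
```

Here `clinical` and `oyster_1` share a sequence type and differ at no locus, the pattern expected for isolates from the same outbreak, whereas `oyster_2` differs at three of the four loci.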
Next-generation sequencing for identification of candidate genes for Fusarium wilt and sterility mosaic disease in pigeonpea (Cajanus cajan).

PubMed

Singh, Vikas K; Khan, Aamir W; Saxena, Rachit K; Kumar, Vinay; Kale, Sandip M; Sinha, Pallavi; Chitikineni, Annapurna; Pazhamala, Lekha T; Garg, Vanika; Sharma, Mamta; Sameer Kumar, Chanda Venkata; Parupalli, Swathi; Vechalapu, Suryanarayana; Patil, Suyash; Muniswamy, Sonnappa; Ghanta, Anuradha; Yamini, Kalinati Narasimhan; Dharmaraj, Pallavi Subbanna; Varshney, Rajeev K

2016-05-01

To map resistance genes for Fusarium wilt (FW) and sterility mosaic disease (SMD) in pigeonpea, sequencing-based bulked segregant analysis (Seq-BSA) was used. Resistant (R) and susceptible (S) bulks from the extreme recombinant inbred lines of ICPL 20096 × ICPL 332 were sequenced. Subsequently, SNP index was calculated between R- and S-bulks with the help of draft genome sequence and reference-guided assembly of ICPL 20096 (resistant parent). Seq-BSA has provided seven candidate SNPs for FW and SMD resistance in pigeonpea. In parallel, four additional genotypes were re-sequenced and their combined analysis with R- and S-bulks has provided a total of 8362 nonsynonymous (ns) SNPs. Of 8362 nsSNPs, 60 were found within the 2-Mb flanking regions of seven candidate SNPs identified through Seq-BSA. Haplotype analysis narrowed down to eight nsSNPs in seven genes. These eight nsSNPs were further validated by re-sequencing 11 genotypes that are resistant and susceptible to FW and SMD. This analysis revealed association of four candidate nsSNPs in four genes with FW resistance and four candidate nsSNPs in three genes with SMD resistance. Further, in silico protein analysis and expression profiling identified the two most promising candidate genes, namely C.cajan_01839 for SMD resistance and C.cajan_03203 for FW resistance. Identified candidate genomic regions/SNPs will be useful for genomics-assisted breeding in pigeonpea. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
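The SNP index used in bulked segregant analysis is commonly defined as the fraction of reads at a position that carry the non-reference allele; comparing it between bulks highlights positions where the bulks diverge. The sketch below assumes that definition and an S-minus-R sign convention; the abstract gives neither, so both are illustrative assumptions, as are the function names.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads supporting the non-reference allele at one position."""
    if total_reads <= 0:
        raise ValueError("position has no read coverage")
    return alt_reads / total_reads


def delta_snp_index(r_bulk, s_bulk):
    """Difference in SNP index between bulks (S minus R, an assumed convention).

    Each bulk is an (alt_reads, total_reads) pair for the same position.
    """
    return snp_index(*s_bulk) - snp_index(*r_bulk)
```

With the resistant parent as reference, a position where the resistant bulk shows an index near 0 and the susceptible bulk an index near 1 gives a delta near 1 and would be a candidate for follow-up.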
Molecular typing of Trichomonas vaginalis isolates by actin gene sequence analysis and carriage of T. vaginalis viruses.

PubMed

Masha, Simon C; Cools, Piet; Crucitti, Tania; Sanders, Eduard J; Vaneechoutte, Mario

2017-10-30

The protozoan parasite Trichomonas vaginalis is the most common non-viral, sexually transmitted pathogen. Although T. vaginalis is highly prevalent among women in Kenya, there is a lack of data regarding genetic diversity of isolates currently in circulation in Kenya. Typing was performed on 22 clinical isolates of T. vaginalis collected from women attending the antenatal care clinic at Kilifi County Hospital, Kenya, in 2015. Genotyping followed a previously proposed restriction fragment length polymorphism (RFLP) scheme, which involved in silico cleavage of the amplified actin gene by HindII, MseI and RsaI restriction enzymes. Phylogenetic analysis of all the sequences was performed to confirm the results obtained by RFLP-analysis and to assess the diversity within the RFLP genotypes. Additionally, we determined carriage of the four different types of Trichomonas vaginalis viruses (TVVs) by polymerase chain reaction. In silico RFLP-analysis revealed five actin genotypes; 50.0% of the isolates were of actin genotype E, 27.3% of actin genotype N, 13.6% of actin genotype G and 4.5% each of actin genotypes I and P. Phylogenetic analysis was in agreement with the RFLP-analysis, with the different actin genotypes clustering together. Prevalence of TVVs was 43.5% (95% confidence interval, CI: 23.2-65.5). TVV1 was the most prevalent, present in 39.1% of the strains and 90% of the T. vaginalis isolates which harbored TVVs had more than one type of TVV. None of the isolates of actin genotype E harbored any TVV. The presence of five actin genotypes in our study suggests notable diversity among T. vaginalis isolates occurring among pregnant women in Kilifi, Kenya. Isolates of the most prevalent actin genotype E lacked TVVs. We found no association between T. vaginalis genotype, carriage of TVVs and symptoms. Further studies with a higher number of strains should be conducted in order to corroborate these results.
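In silico RFLP amounts to locating each enzyme's recognition site in the amplicon sequence and reporting the resulting fragment lengths. The sketch below treats the amplicon as a linear top-strand sequence and uses the standard recognition sites GTY^RAC (HindII), T^TAA (MseI), and GT^AC (RsaI); the `digest` function and the example sequence are hypothetical and do not reproduce the genotyping scheme referred to above.

```python
import re

# Recognition pattern and cut offset (bases from the site start) per enzyme.
ENZYMES = {
    "HindII": ("GT[CT][AG]AC", 3),
    "MseI": ("TTAA", 1),
    "RsaI": ("GTAC", 2),
}


def digest(seq, enzyme):
    """Return fragment lengths, 5' to 3', after cutting a linear sequence."""
    pattern, offset = ENZYMES[enzyme]
    # A lookahead finds every site, including overlapping ones.
    cuts = [m.start() + offset for m in re.finditer("(?=%s)" % pattern, seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

The fragment-length patterns obtained with the three enzymes together would then be compared against the patterns that define each genotype.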
In silico analysis of SIGMAR1 variant (rs4879809) segregating in a consanguineous Pakistani family showing amyotrophic lateral sclerosis without frontotemporal lobar dementia.

PubMed

Ullah, Muhammad Ikram; Ahmad, Arsalan; Raza, Syed Irfan; Amar, Ali; Ali, Amjad; Bhatti, Attya; John, Peter; Mohyuddin, Aisha; Ahmad, Wasim; Hassan, Muhammad Jawad

2015-10-01

Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disorder affecting upper motor neurons in the brain and lower motor neurons in the brain stem and spinal cord, resulting in fatal paralysis. It has been found to be associated with frontotemporal lobar degeneration (FTLD). In the present study, we have described homozygosity mapping and gene sequencing in a consanguineous autosomal recessive Pakistani family showing non-juvenile ALS without signs of FTLD. Gene mapping was carried out in all recruited family members using microsatellite markers, and linkage was established with sigma non-opioid intracellular receptor 1 (SIGMAR1) gene at chromosome 9p13.2. Gene sequencing of SIGMAR1 revealed a novel 3'-UTR nucleotide variation c.672*31A>G (rs4879809) segregating with disease in this family. The C9ORF72 repeat region in intron 1, previously implicated in a related phenotype, was excluded through linkage, and further confirmation of exclusion was obtained by amplifying intron 1 of C9ORF72 with multiple primers in affected individuals and controls. In silico analysis was carried out to explore the possible role of 3'-UTR variant of SIGMAR1 in ALS. The Regulatory RNA motif and Element Finder program revealed disturbance in miRNA (hsa-miR-1205) binding site due to this variation. ESEFinder analysis showed new SRSF1 and SRSF1-IgM-BRCA1 binding sites with significant scores due to this variation. Our results indicate that the 3'-UTR SIGMAR1 variant c.672*31A>G may have a role in the pathogenesis of ALS in this family.
Selection and Validation of a Multilocus Variable-Number Tandem-Repeat Analysis Panel for Typing Shigella spp.

PubMed Central

Gorgé, Olivier; Lopez, Stéphanie; Hilaire, Valérie; Lisanti, Olivier; Ramisse, Vincent; Vergnaud, Gilles

2008-01-01

The Shigella genus has historically been separated into four species, based on biochemical assays. The classification within each species relies on serotyping. Recently, genome sequencing and DNA assays, in particular the multilocus sequence typing (MLST) approach, greatly improved the current knowledge of the origin and phylogenetic evolution of Shigella spp. The Shigella and Escherichia genera are now considered to belong to a unique genomospecies. Multilocus variable-number tandem-repeat (VNTR) analysis (MLVA) provides valuable polymorphic markers for genotyping and performing phylogenetic analyses of highly homogeneous bacterial pathogens. Here, we assess the capability of MLVA for Shigella typing. Thirty-two potentially polymorphic VNTRs were selected by analyzing in silico five Shigella genomic sequences and subsequently evaluated. Eventually, a panel of 15 VNTRs was selected (i.e., MLVA15 analysis). MLVA15 analysis of 78 strains or genome sequences of Shigella spp. and 11 strains or genome sequences of Escherichia coli distinguished 83 genotypes. Shigella population cluster analysis gave consistent results compared to MLST. MLVA15 analysis showed capabilities for E. coli typing, providing classification among pathogenic and nonpathogenic E. coli strains included in the study. The resulting data can be queried on our genotyping webpage (http://mlva.u-psud.fr). The MLVA15 assay is rapid, highly discriminatory, and reproducible for Shigella and Escherichia strains, suggesting that it could significantly contribute to epidemiological trace-back analysis of Shigella infections and pathogenic Escherichia outbreaks. Typing was performed on strains obtained mostly from collections. Further studies should include strains of much more diverse origins, including all pathogenic E. coli types. PMID:18216214
Molecular identification of aiiA homologous gene from endophytic Enterobacter species and in silico analysis of putative tertiary structure of AHL-lactonase.

PubMed

Rajesh, P S; Rai, V Ravishankar

2014-01-03

The aiiA homologous gene is known to encode the AHL-lactonase enzyme, which hydrolyzes the N-acylhomoserine lactone (AHL) quorum-sensing signaling molecules produced by Gram-negative bacteria. In this study, the degradation of AHL molecules by cell-free lysate of endophytic Enterobacter species was determined. The percentage of quorum quenching was confirmed and quantified by HPLC (p<0.0001). Amplification and sequence BLAST analysis showed the presence of an aiiA homologous gene in endophytic Enterobacter asburiae VT65, Enterobacter aerogenes VT66 and Enterobacter ludwigii VT70 strains. Sequence alignment analysis revealed the presence of two zinc binding sites, the "HXHXDH" motif, as well as a tyrosine residue at position 194. Based on a known template available at Swiss-Model, a putative tertiary structure of AHL-lactonase was constructed. The results showed that these novel endophytic Enterobacter strains encode a novel aiiA homologous gene and indicated its structural importance for future study. Copyright © 2013 Elsevier Inc. All rights reserved.
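Checking a translated sequence for the "HXHXDH" motif and for the residue at a given position is a simple pattern search. The sketch below is illustrative only; the function names and the short test peptide are hypothetical, and positions are reported 1-based as is usual for protein sequences.

```python
import re


def find_hxhxdh(protein):
    """Return 1-based start positions of every HXHXDH motif (X = any residue)."""
    return [m.start() + 1 for m in re.finditer(r"(?=H.H.DH)", protein.upper())]


def residue_at(protein, position):
    """Return the residue at a 1-based position, or None if out of range."""
    if 1 <= position <= len(protein):
        return protein[position - 1]
    return None
```

For a full-length AHL-lactonase homologue, `residue_at(seq, 194) == "Y"` would correspond to the tyrosine check described in the abstract, provided the sequence uses the same numbering.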
Accumulation of multiple mutations in linezolid-resistant Staphylococcus epidermidis causing bloodstream infections; in silico analysis of L3 amino acid substitutions that might confer high-level linezolid resistance.

PubMed

Ikonomidis, Alexandros; Grapsa, Anastasia; Pavlioglou, Charikleia; Demiri, Antonia; Batarli, Alexandra; Panopoulou, Maria

2016-12-01

Fifty-six Staphylococcus epidermidis clinical isolates, showing high-level linezolid resistance and causing bacteremia in critically ill patients, were studied. All isolates belonged to the ST22 clone and carried the T2504A and C2534T mutations in the gene coding for 23S rRNA as well as the C189A, G208A, C209T and G384C missense mutations in L3 protein which resulted in Asp159Tyr, Gly152Asp and Leu94Val substitutions. Other silent mutations were also detected in genes coding for ribosomal proteins L3 and L22. In silico analysis of missense mutations showed that although L3 protein retained the sequence of secondary motifs, the tertiary structure was influenced. The observed alteration in L3 protein folding provides an indication of the putative role of L3-coding gene mutations in high-level linezolid resistance. Furthermore, linezolid pressure in health care settings where linezolid consumption is high might lead to the selection of resistant mutants possessing L3 mutations that might confer high-level linezolid resistance.
Protein cleavage strategies for an improved analysis of the membrane proteome

PubMed Central

Fischer, Frank; Poetsch, Ansgar

2006-01-01

Background Membrane proteins still remain elusive in proteomic studies. This is in part due to the distribution of the amino acids lysine and arginine, which are less frequent in integral membrane proteins and almost absent in transmembrane helices. As these amino acids are cleavage targets for the commonly used protease trypsin, alternative cleavage conditions, which should improve membrane protein analysis, were tested by in silico digestion for the three organisms Saccharomyces cerevisiae, Halobacterium sp. NRC-1, and Corynebacterium glutamicum as hallmarks for eukaryotes, archaea and eubacteria. Results For the membrane proteomes from all three analyzed organisms, we identified cleavage conditions that achieve better sequence and proteome coverage than trypsin. Greater improvement was obtained for bacteria than for yeast, which was attributed to differences in protein size and GRAVY. It was demonstrated for bacteriorhodopsin that the in silico predictions agree well with the experimental observations. Conclusion For all three examined organisms, it was found that a combination of chymotrypsin and staphylococcal peptidase I gave significantly better results than trypsin. As some of the improved cleavage conditions are not more elaborate than trypsin digestion and have been proven useful in practice, we suppose that the cleavage at both hydrophilic and hydrophobic amino acids should facilitate in general the analysis of membrane proteins for all organisms. PMID:16512920
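In silico digestion of the kind compared in this abstract can be sketched by cutting a protein after each protease's target residues and then asking how much of the sequence falls in peptides of a usable length. The cleavage rules below are simplified assumptions (trypsin after K/R, chymotrypsin after F/W/Y, staphylococcal peptidase I after E, and no cleavage before proline for any of them), and the peptide length window is an arbitrary example, not the criteria used in the study.

```python
# Simplified cleavage specificities (C-terminal side of the listed residues).
CLEAVAGE_RESIDUES = {
    "trypsin": "KR",
    "chymotrypsin": "FWY",
    "glu_c": "E",  # staphylococcal peptidase I
}


def digest(protein, proteases):
    """Cut after any target residue of the given proteases, except before P."""
    residues = set()
    for name in proteases:
        residues.update(CLEAVAGE_RESIDUES[name])
    peptides, start = [], 0
    for i in range(len(protein) - 1):
        if protein[i] in residues and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])
    return peptides


def coverage(protein, peptides, min_len=6, max_len=30):
    """Fraction of the protein lying in peptides within the length window."""
    covered = sum(len(p) for p in peptides if min_len <= len(p) <= max_len)
    return covered / len(protein)
```

Running `digest` with `["trypsin"]` and with `["chymotrypsin", "glu_c"]` on the same membrane protein and comparing `coverage` mirrors, in a very reduced form, the comparison the abstract reports.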
Metasecretome-selective phage display approach for mining the functional potential of a rumen microbial community.

PubMed

Ciric, Milica; Moon, Christina D; Leahy, Sinead C; Creevey, Christopher J; Altermann, Eric; Attwood, Graeme T; Rakonjac, Jasna; Gagic, Dragana

2014-05-12

In silico, secretome proteins can be predicted from completely sequenced genomes using various available algorithms that identify membrane-targeting sequences. For the metasecretome (the collection of surface, secreted and transmembrane proteins from environmental microbial communities) this approach is impractical, considering that the metasecretome open reading frames (ORFs) comprise only 10% to 30% of the total metagenome, and are poorly represented in the dataset due to the overall low coverage of the metagenomic gene pool, even in large-scale projects. By combining secretome-selective phage display and next-generation sequencing, we focused the sequence analysis of a complex rumen microbial community on the metasecretome component of the metagenome. This approach achieved high enrichment (29 fold) of secreted fibrolytic enzymes from the plant-adherent microbial community of the bovine rumen. In particular, we identified hundreds of heretofore rare modules belonging to cellulosomes, cell-surface complexes specialised for recognition and degradation of the plant fibre. As a method, metasecretome phage display combined with next-generation sequencing has the power to sample the diversity of low-abundance surface and secreted proteins that would otherwise require exceptionally large metagenomic sequencing projects. As a resource, the metasecretome display library backed by the dataset obtained by next-generation sequencing is ready for i) affinity selection by standard phage display methodology and ii) easy purification of displayed proteins as part of the virion for individual functional analysis.
Hybridization-based antibody cDNA recovery for the production of recombinant antibodies identified by repertoire sequencing.

PubMed

Valdés-Alemán, Javier; Téllez-Sosa, Juan; Ovilla-Muñoz, Marbella; Godoy-Lozano, Elizabeth; Velázquez-Ramírez, Daniel; Valdovinos-Torres, Humberto; Gómez-Barreto, Rosa E; Martinez-Barnetche, Jesús

2014-01-01

High-throughput sequencing of the antibody repertoire is enabling a thorough analysis of B cell diversity and clonal selection, which may improve the novel antibody discovery process. Theoretically, an adequate bioinformatic analysis could allow identification of candidate antigen-specific antibodies, requiring their recombinant production for experimental validation of their specificity. Gene synthesis is commonly used for the generation of recombinant antibodies identified in silico. Novel strategies that bypass gene synthesis could offer more accessible antibody identification and validation alternatives. We developed a hybridization-based recovery strategy that targets the complementarity-determining region 3 (CDRH3) for the enrichment of cDNA of candidate antigen-specific antibody sequences. Ten clonal groups of interest were identified through bioinformatic analysis of the heavy chain antibody repertoire of mice immunized with hen egg white lysozyme (HEL). cDNA from eight of the targeted clonal groups was recovered efficiently, leading to the generation of recombinant antibodies. One representative heavy chain sequence from each clonal group recovered was paired with previously reported anti-HEL light chains to generate full antibodies, later tested for HEL-binding capacity. The recovery process proposed represents a simple and scalable molecular strategy that could enhance antibody identification and specificity assessment, enabling a more cost-efficient generation of recombinant antibodies.
Performing SELEX experiments in silico

NASA Astrophysics Data System (ADS)

Wondergem, J. A. J.; Schiessel, H.; Tompitak, M.

2017-11-01

Due to the sequence-dependent nature of the elasticity of DNA, many protein-DNA complexes and other systems in which DNA molecules must be deformed have preferences for the type of DNA sequence they interact with. SELEX (Systematic Evolution of Ligands by EXponential enrichment) experiments and similar sequence selection experiments have been used extensively to examine the (indirect readout) sequence preferences of, e.g., nucleosomes (protein spools around which DNA is wound for compactification) and DNA rings. We show how recently developed computational and theoretical tools can be used to emulate such experiments in silico. Opening up this possibility comes with several benefits. First, it allows us a better understanding of our models and systems, specifically about the roles played by the simulation temperature and the selection pressure on the sequences. Second, it allows us to compare the predictions made by the model of choice with experimental results. We find agreement on important features between predictions of the rigid base-pair model and experimental results for DNA rings and interesting differences that point out open questions in the field. Finally, our simulations allow application of the SELEX methodology to systems that are experimentally difficult to realize because they come with high energetic costs and are therefore unlikely to form spontaneously, such as very short or overwound DNA rings.
Evaluation of six primer pairs targeting the nuclear rRNA operon for characterization of arbuscular mycorrhizal fungal (AMF) communities using 454 pyrosequencing.

PubMed

Van Geel, Maarten; Busschaert, Pieter; Honnay, Olivier; Lievens, Bart

2014-11-01

In the last few years, 454 pyrosequencing-based analysis of arbuscular mycorrhizal fungal (AMF; Glomeromycota) communities has tremendously increased our knowledge of the distribution and diversity of AMF. Nonetheless, comparing results between different studies is difficult, as different target genes (or regions thereof) and primer combinations, with potentially dissimilar specificities and efficacies, are being utilized. In this study we evaluated six primer pairs that have previously been used in AMF studies (NS31-AM1, AMV4.5NF-AMDGR, AML1-AML2, NS31-AML2, FLR3-LSUmBr and Glo454-NDL22) for their use in 454 pyrosequencing based on both an in silico approach and 454 pyrosequencing of AMF communities from apple tree roots. Primers were evaluated in terms of (i) in silico coverage of Glomeromycota fungi, (ii) the number of high-quality sequences obtained, (iii) selectivity for AMF species, (iv) reproducibility and (v) ability to accurately describe AMF communities. We show that primer pairs AMV4.5NF-AMDGR, AML1-AML2 and NS31-AML2 outperformed the other tested primer pairs in terms of number of Glomeromycota reads (AMF specificity and coverage). Additionally, these primer pairs were found to have no or only few mismatches to AMF sequences and were able to consistently describe AMF communities from apple roots. However, whereas most high-quality AMF sequences were obtained for AMV4.5NF-AMDGR, our results also suggest that this primer pair favored amplification of Glomeraceae sequences at the expense of Ambisporaceae, Claroideoglomeraceae and Paraglomeraceae sequences. Furthermore, we demonstrate the complementary specificity of AMV4.5NF-AMDGR with AML1-AML2, and of AMV4.5NF-AMDGR with NS31-AML2, making these primer combinations highly suitable for tandem use in covering the diversity of AMF communities. Copyright © 2014 Elsevier B.V. All rights reserved.
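In silico primer coverage of the kind evaluated in this record is often estimated by counting mismatches between a primer and its best-matching site in each reference sequence. The Python sketch below shows that idea under stated assumptions: the primer and templates are invented toy strings (not the NS31, AML or other primers named above), only the forward orientation is searched, and the one-mismatch tolerance is an arbitrary placeholder.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer code."""
    return sum(base not in IUPAC[p] for p, base in zip(primer, site))


def best_site(primer, template):
    """Return (mismatch count, start position) of the best-matching window."""
    k = len(primer)
    return min(
        (mismatches(primer, template[i:i + k]), i)
        for i in range(len(template) - k + 1)
    )


def primer_coverage(primer, templates, max_mismatches=1):
    """Fraction of templates with a binding site within the mismatch tolerance."""
    hits = sum(best_site(primer, t)[0] <= max_mismatches for t in templates)
    return hits / len(templates)


# Hypothetical primer and reference fragments, for illustration only.
toy_primer = "ACGYT"
toy_templates = ["TTACGCTAA", "TTACGTTAA", "GGGGGGGGG"]
print(best_site(toy_primer, toy_templates[0]))
print(primer_coverage(toy_primer, toy_templates))
```

A real evaluation would also search the reverse complement for the reverse primer and would weight mismatches near the 3′ end more heavily; both are left out here to keep the sketch short.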
Systematic evaluation of the impact of ChIP-seq read designs on genome coverage, peak identification, and allele-specific binding detection.

PubMed

Zhang, Qi; Zeng, Xin; Younkin, Sam; Kawli, Trupti; Snyder, Michael P; Keleş, Sündüz

2016-02-24

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments revolutionized genome-wide profiling of transcription factors and histone modifications. Although maturing sequencing technologies allow these experiments to be carried out with short (36-50 bps), long (75-100 bps), single-end, or paired-end reads, the impact of these read parameters on the downstream data analysis is not well understood. In this paper, we evaluate the effects of different read parameters on genome sequence alignment, coverage of different classes of genomic features, peak identification, and allele-specific binding detection. We generated 101 bps paired-end ChIP-seq data for many transcription factors from human GM12878 and MCF7 cell lines. Systematic evaluations using in silico variations of these data, as well as fully simulated data, revealed complex interplay between the sequencing parameters and analysis tools, and indicated clear advantages of paired-end designs in several aspects such as alignment accuracy, peak resolution, and most notably, allele-specific binding detection. Our work elucidates the effect of design on the downstream analysis and provides insights to investigators in deciding sequencing parameters in ChIP-seq experiments. We present the first systematic evaluation of the impact of ChIP-seq designs on allele-specific binding detection and highlight the power of paired-end designs in such studies.
Short-read, high-throughput sequencing technology for STR genotyping

PubMed Central

Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.

2013-01-01

DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315
CEQer: a graphical tool for copy number and allelic imbalance detection from whole-exome sequencing data.

PubMed

Piazza, Rocco; Magistroni, Vera; Pirola, Alessandra; Redaelli, Sara; Spinelli, Roberta; Redaelli, Serena; Galbiati, Marta; Valletta, Simona; Giudici, Giovanni; Cazzaniga, Giovanni; Gambacorti-Passerini, Carlo

2013-01-01

Copy number alterations (CNA) are common events occurring in leukaemias and solid tumors. Comparative Genome Hybridization (CGH) is currently the gold standard technique to analyze CNAs; however, CGH analysis requires dedicated instruments and is able to perform only low resolution Loss of Heterozygosity (LOH) analyses. Here we present CEQer (Comparative Exome Quantification analyzer), a new graphical, event-driven tool for CNA/allelic-imbalance (AI) coupled analysis of exome sequencing data. By using case-control matched exome data, CEQer performs a comparative digital exonic quantification to generate CNA data and couples this information with exome-wide LOH and allelic imbalance detection. These data are used to build mixed statistical/heuristic models allowing the identification of CNA/AI events. To test our tool, we initially used in silico generated data, then we performed whole-exome sequencing of 20 leukemic specimens and corresponding matched controls and analyzed the results using CEQer. Taken globally, these analyses showed that the combined use of comparative digital exon quantification and LOH/AI allows the generation of very accurate CNA data. Therefore, we propose CEQer as an efficient, robust and user-friendly graphical tool for the identification of CNA/AI in the context of whole-exome sequencing data.
probeBase—an online resource for rRNA-targeted oligonucleotide probes and primers: new features 2016

PubMed Central

Greuter, Daniel; Loy, Alexander; Horn, Matthias; Rattei, Thomas

2016-01-01

probeBase (http://www.probebase.net) is a manually maintained and curated database of rRNA-targeted oligonucleotide probes and primers. Contextual information and multiple options for evaluating in silico hybridization performance against the most recent rRNA sequence databases are provided for each oligonucleotide entry, which makes probeBase an important and frequently used resource for microbiology research and diagnostics. Here we present a major update of probeBase, which was last featured in the NAR Database Issue 2007. This update describes a complete remodeling of the database architecture and environment to accommodate computationally efficient access. Improved search functions, sequence match tools and data output now extend the opportunities for finding suitable hierarchical probe sets that target an organism or taxon at different taxonomic levels. To facilitate the identification of complementary probe sets for organisms represented by short rRNA sequence reads generated by amplicon sequencing or metagenomic analysis with next generation sequencing technologies such as Illumina and IonTorrent, we introduce a novel tool that recovers surrogate near full-length rRNA sequences for short query sequences and finds matching oligonucleotides in probeBase. PMID:26586809
Mining for Nonribosomal Peptide Synthetase and Polyketide Synthase Genes Revealed a High Level of Diversity in the Sphagnum Bog Metagenome

PubMed Central

Müller, Christina A.; Oberauner-Wappis, Lisa; Peyman, Armin; Amos, Gregory C. A.; Wellington, Elizabeth M. H.

2015-01-01

Sphagnum bog ecosystems are among the oldest vegetation forms harboring a specific microbial community and are known to produce an exceptionally wide variety of bioactive substances. Although the Sphagnum metagenome shows a rich secondary metabolism, the genes have not yet been explored. To analyze nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), the diversity of NRPS and PKS genes in Sphagnum-associated metagenomes was investigated by in silico data mining and sequence-based screening (PCR amplification of 9,500 fosmid clones). The in silico Illumina-based metagenomic approach resulted in the identification of 279 NRPSs and 346 PKSs, as well as 40 PKS-NRPS hybrid gene sequences. The occurrence of NRPS sequences was strongly dominated by the members of the Proteobacteria phylum, especially by species of the Burkholderia genus, while PKS sequences were mainly affiliated with Actinobacteria. Thirteen novel NRPS-related sequences were identified by PCR amplification screening, displaying amino acid identities of 48% to 91% to annotated sequences of members of the phyla Proteobacteria, Actinobacteria, and Cyanobacteria. Some of the identified metagenomic clones showed the closest similarity to peptide synthases from Burkholderia or Lysobacter, which are emerging bacterial sources of as-yet-undescribed bioactive metabolites. This report highlights the role of extreme natural ecosystems as a promising source for detection of secondary compounds and enzymes, serving as a source for biotechnological applications. PMID:26002894
Physical and in silico approaches identify DNA-PK in a Tax DNA-damage response interactome

PubMed Central

Ramadan, Emad; Ward, Michael; Guo, Xin; Durkin, Sarah S; Sawyer, Adam; Vilela, Marcelo; Osgood, Christopher; Pothen, Alex; Semmes, Oliver J

2008-01-01

Background We have initiated an effort to exhaustively map interactions between HTLV-1 Tax and host cellular proteins. The resulting Tax interactome will have significant utility toward defining new and understanding known activities of this important viral protein. In addition, the completion of a full Tax interactome will also help shed light upon the functional consequences of these myriad Tax activities. The physical mapping process involved the affinity isolation of Tax complexes followed by sequence identification using tandem mass spectrometry. To date we have mapped 250 cellular components within this interactome. Here we present our approach to prioritizing these interactions via an in silico culling process. Results We first constructed an in silico Tax interactome comprised of 46 literature-confirmed protein-protein interactions. This number was then reduced to four Tax-interactions suspected to play a role in DNA damage response (Rad51, TOP1, Chk2, 53BP1). The first-neighbor and second-neighbor interactions of these four proteins were assembled from available human protein interaction databases. Through an analysis of betweenness and closeness centrality measures, and numbers of interactions, we ranked proteins in the first neighborhood. When this rank list was compared to the list of physical Tax-binding proteins, DNA-PK was the highest ranked protein common to both lists. An overlapping clustering of the Tax-specific second-neighborhood protein network showed DNA-PK to be one of three bridge proteins that link multiple clusters in the DNA damage response network. Conclusion The interaction of Tax with DNA-PK represents an important biological paradigm as suggested via consensus findings in vivo and in silico. We present this methodology as an approach to discovery and as a means of validating components of a consensus Tax interactome. PMID:18922151
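The network ranking step in this record (centrality measures plus interaction counts) can be sketched in a few lines of Python. The example below is only an illustration: it computes closeness centrality and degree on an invented five-node graph with placeholder protein names P1 to P5, it omits betweenness centrality for brevity, and it does not reproduce any interaction reported in the paper.

```python
from collections import deque


def closeness(graph, source):
    """Closeness centrality: reachable nodes divided by the sum of shortest-path lengths."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search over an unweighted graph
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def rank_nodes(graph):
    """Rank nodes by closeness, breaking ties by number of interactions (degree)."""
    scores = {n: (closeness(graph, n), len(graph[n])) for n in graph}
    return sorted(graph, key=lambda n: scores[n], reverse=True)


# Hypothetical undirected interaction network (adjacency sets).
toy_network = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1", "P5"},
    "P5": {"P4"},
}
print(rank_nodes(toy_network))
```

Intersecting such a ranked list with an experimentally derived list of binding partners is the comparison the abstract describes; here that second list would have to be supplied by the user.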
Extensive sequence analysis of CFTR, SCNN1A, SCNN1B, SCNN1G and SERPINA1 suggests an oligogenic basis for cystic fibrosis-like phenotypes.

PubMed

Ramos, M D; Trujillano, D; Olivar, R; Sotillo, F; Ossowski, S; Manzanares, J; Costa, J; Gartner, S; Oliva, C; Quintana, E; Gonzalez, M I; Vazquez, C; Estivill, X; Casals, T

2014-07-01

The term cystic fibrosis (CF)-like disease is used to describe patients with a borderline sweat test and suggestive CF clinical features but without two CFTR (cystic fibrosis transmembrane conductance regulator) mutations. We have performed an extensive molecular analysis of four candidate genes (SCNN1A, SCNN1B, SCNN1G and SERPINA1) in a cohort of 10 uncharacterized patients with CF and CF-like disease. We have used whole-exome sequencing to characterize mutations in the CFTR gene and these four candidate genes. CFTR molecular analysis allowed a complete characterization of three of four CF patients. Candidate variants in SCNN1A, SCNN1B, SCNN1G and SERPINA1 in six patients with CF-like phenotypes were confirmed by Sanger sequencing and were further supported by in silico predictive analysis, pedigree studies, sweat test in other family members, and analysis in CF patients and healthy subjects. Our results suggest that CF-like disease probably results from complex genotypes in several genes in an oligogenic form, with rare variants interacting with environmental factors. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

Significance of functional disease-causal/susceptible variants identified by whole-genome analyses for the understanding of human diseases.

PubMed

Hitomi, Yuki; Tokunaga, Katsushi

2017-01-01

Human genome variation may cause differences in traits and disease risks. Disease-causal/susceptible genes and variants for both common and rare diseases can be detected by comprehensive whole-genome analyses, such as whole-genome sequencing (WGS), using next-generation sequencing (NGS) technology and genome-wide association studies (GWAS). Here, in addition to the application of an NGS as a whole-genome analysis method, we summarize approaches for the identification of functional disease-causal/susceptible variants from abundant genetic variants in the human genome and methods for evaluating their functional effects in human diseases, using an NGS and in silico and in vitro functional analyses. We also discuss the clinical applications of the functional disease causal/susceptible variants to personalized medicine.
Novel proteases from the genome of the carnivorous plant Drosera capensis: structural prediction and comparative analysis

PubMed Central

Butts, Carter T.; Bierma, Jan C.; Martin, Rachel W.

2016-01-01

In his 1875 monograph on insectivorous plants, Darwin described the feeding reactions of Drosera flypaper traps and predicted that their secretions contained a “ferment” similar to mammalian pepsin, an aspartic protease. Here we report a high-quality draft genome sequence for the cape sundew, Drosera capensis, the first genome of a carnivorous plant from order Caryophyllales, which also includes the Venus flytrap (Dionaea) and the tropical pitcher plants (Nepenthes). This species was selected in part for its hardiness and ease of cultivation, making it an excellent model organism for further investigations of plant carnivory. Analysis of predicted protein sequences yields genes encoding proteases homologous to those found in other plants, some of which display sequence and structural features that suggest novel functionalities. Because the sequence similarity to proteins of known structure is in most cases too low for traditional homology modeling, 3D structures of representative proteases are predicted using comparative modeling with all-atom refinement. Although the overall folds and active residues for these proteins are conserved, we find structural and sequence differences consistent with a diversity of substrate recognition patterns. Finally, we predict differences in substrate specificities using in silico experiments, providing targets for structure/function studies of novel enzymes with biological and technological significance. PMID:27353064
Insights into the Melipona scutellaris (Hymenoptera, Apidae, Meliponini) fat body transcriptome.

PubMed

de Sousa, Cristina Soares; Serrão, José Eduardo; Bonetti, Ana Maria; Amaral, Isabel Marques Rodrigues; Kerr, Warwick Estevam; Maranhão, Andréa Queiroz; Ueira-Vieira, Carlos

2013-07-01

The insect fat body is a multifunctional organ analogous to the vertebrate liver. The fat body is involved in the metabolism of juvenile hormone, regulation of environmental stress, production of immunity regulator-like proteins in cells and protein storage. However, very little is known about the molecular mechanisms involved in fat body physiology in stingless bees. In this study, we analyzed the transcriptome of the fat body from the stingless bee Melipona scutellaris. In silico analysis of a set of cDNA library sequences yielded 1728 expressed sequence tags (ESTs) and 997 high-quality sequences that were assembled into 29 contigs and 117 singlets. The BLAST X tool showed that 86% of the ESTs shared similarity with Apis mellifera (honeybee) genes. The M. scutellaris fat body ESTs encoded proteins with roles in numerous physiological processes, including anti-oxidation, phosphorylation, metabolism, detoxification, transmembrane transport, intracellular transport, cell proliferation, protein hydrolysis and protein synthesis. This is the first report to describe a transcriptomic analysis of specific organs of M. scutellaris. Our findings provide new insights into the physiological role of the fat body in stingless bees.
Insights into the Melipona scutellaris (Hymenoptera, Apidae, Meliponini) fat body transcriptome

PubMed Central

de Sousa, Cristina Soares; Serrão, José Eduardo; Bonetti, Ana Maria; Amaral, Isabel Marques Rodrigues; Kerr, Warwick Estevam; Maranhão, Andréa Queiroz; Ueira-Vieira, Carlos

2013-01-01

The insect fat body is a multifunctional organ analogous to the vertebrate liver. The fat body is involved in the metabolism of juvenile hormone, regulation of environmental stress, production of immunity regulator-like proteins in cells and protein storage. However, very little is known about the molecular mechanisms involved in fat body physiology in stingless bees. In this study, we analyzed the transcriptome of the fat body from the stingless bee Melipona scutellaris. In silico analysis of a set of cDNA library sequences yielded 1728 expressed sequence tags (ESTs) and 997 high-quality sequences that were assembled into 29 contigs and 117 singlets. The BLAST X tool showed that 86% of the ESTs shared similarity with Apis mellifera (honeybee) genes. The M. scutellaris fat body ESTs encoded proteins with roles in numerous physiological processes, including anti-oxidation, phosphorylation, metabolism, detoxification, transmembrane transport, intracellular transport, cell proliferation, protein hydrolysis and protein synthesis. This is the first report to describe a transcriptomic analysis of specific organs of M. scutellaris. Our findings provide new insights into the physiological role of the fat body in stingless bees. PMID:23885214
Overview of recurrent chromosomal losses in retinoblastoma detected by low coverage next generation sequencing

PubMed Central

García-Chequer, A.J.; Méndez-Tenorio, A.; Olguín-Ruiz, G.; Sánchez-Vallejo, C.; Isa, P.; Arias, C.F.; Torres, J.; Hernández-Angeles, A.; Ramírez-Ortiz, M.A.; Lara, C.; Cabrera-Muñoz, M.L.; Sadowinski-Pine, S.; Bravo-Ortiz, J.C.; Ramón-García, G.; Diegopérez-Ramírez, J.; Ramírez-Reyes, G.; Casarrubias-Islas, R.; Ramírez, J.; Orjuela, M.A.; Ponce-Castañeda, M.V.

2016-01-01

Genes are frequently lost or gained in malignant tumors and the analysis of these changes can be informative about the underlying tumor biology. Retinoblastoma is a pediatric intraocular malignancy, and since deletions in chromosome 13 have been described in this tumor, we performed genome-wide sequencing with the Illumina platform to test whether recurrent losses could be detected in low coverage data from DNA pools of Rb cases. An in silico reference profile for each pool was created from the human genome sequence GRCh37p5; a chromosome integrity score and a graphical 40 kb window analysis approach allowed us to identify with high resolution previously reported nonrandom recurrent losses in all chromosomes of these tumors. We also found a pattern of gains and losses associated with clear and dark cytogenetic bands, respectively. We further analyzed a pool of medulloblastoma and found a more stable genomic profile and previously reported losses in this tumor. This approach facilitates identification of recurrent deletions from many patients that may be biologically relevant for tumor development. PMID:26883451
New insights into plant glycoside hydrolase family 32 in Agave species

PubMed Central

Avila de Dios, Emmanuel; Gomez Vargas, Alan D.; Damián Santos, Maura L.; Simpson, June

2015-01-01

In order to optimize the use of agaves for commercial applications, an understanding of fructan metabolism in these species at the molecular and genetic level is essential. Based on transcriptome data, this report describes the identification and molecular characterization of cDNAs and deduced amino acid sequences for genes encoding fructosyltransferases, invertases and fructan exohydrolases (FEH) (enzymes belonging to plant glycoside hydrolase family 32) from four different agave species (A. tequilana, A. deserti, A. victoriae-reginae, and A. striata). Conserved amino acid sequences and a hypervariable domain allowed classification of distinct isoforms for each enzyme type. Notably, however, neither 1-FFT nor 6-SFT encoding cDNAs were identified. In silico analysis revealed that distinct isoforms for certain enzymes found in a single species showed different levels and tissue-specific patterns of expression, whereas in other cases expression patterns were conserved both within the species and between different species. Relatively high levels of in silico expression for specific isoforms of both invertases and fructosyltransferases were observed in floral tissues in comparison to vegetative tissues such as leaves and stems, and this pattern was confirmed by Quantitative Real Time PCR using RNA obtained from floral and leaf tissue of A. tequilana. Thin layer chromatography confirmed the presence of fructans with degree of polymerization (DP) greater than DP three in both immature buds and fully opened flowers also obtained from A. tequilana. PMID:26300895
New insights into plant glycoside hydrolase family 32 in Agave species.

PubMed

Avila de Dios, Emmanuel; Gomez Vargas, Alan D; Damián Santos, Maura L; Simpson, June

2015-01-01

In order to optimize the use of agaves for commercial applications, an understanding of fructan metabolism in these species at the molecular and genetic level is essential. Based on transcriptome data, this report describes the identification and molecular characterization of cDNAs and deduced amino acid sequences for genes encoding fructosyltransferases, invertases and fructan exohydrolases (FEH) (enzymes belonging to plant glycoside hydrolase family 32) from four different agave species (A. tequilana, A. deserti, A. victoriae-reginae, and A. striata). Conserved amino acid sequences and a hypervariable domain allowed classification of distinct isoforms for each enzyme type. Notably, however, neither 1-FFT nor 6-SFT encoding cDNAs were identified. In silico analysis revealed that distinct isoforms for certain enzymes found in a single species showed different levels and tissue-specific patterns of expression, whereas in other cases expression patterns were conserved both within the species and between different species. Relatively high levels of in silico expression for specific isoforms of both invertases and fructosyltransferases were observed in floral tissues in comparison to vegetative tissues such as leaves and stems, and this pattern was confirmed by Quantitative Real Time PCR using RNA obtained from floral and leaf tissue of A. tequilana. Thin layer chromatography confirmed the presence of fructans with degree of polymerization (DP) greater than DP three in both immature buds and fully opened flowers also obtained from A. tequilana.
In-silico Taxonomic Classification of 373 Genomes Reveals Species Misidentification and New Genospecies within the Genus Pseudomonas

PubMed Central

Tran, Phuong N.; Savka, Michael A.; Gan, Han Ming

2017-01-01

The genus Pseudomonas has one of the largest diversities of species within the Bacteria kingdom. To date, its taxonomy is still being revised and updated. Due to the non-standardized procedure and ambiguous thresholds at species level, largely based on the 16S rRNA gene or conventional biochemical assays, species identification of publicly available Pseudomonas genomes remains questionable. In this study, we performed a large-scale analysis of all Pseudomonas genomes with species designation (excluding the well-defined P. aeruginosa) and re-evaluated their taxonomic assignment via in silico genome-genome hybridization and/or genetic comparison with valid type species. Three hundred and seventy-three pseudomonad genomes were analyzed and subsequently clustered into 145 distinct genospecies. We detected 207 erroneous labels and corrected 43 to the proper species based on Average Nucleotide Identity Multilocus Sequence Typing (MLST) sequence similarity to the type strain. Surprisingly, more than half of the genomes initially designated as Pseudomonas syringae and Pseudomonas fluorescens should be classified either to a previously described species or to a new genospecies. Notably, high pairwise average nucleotide identity (>95%) indicating species-level similarity was observed between P. synxantha-P. libanensis, P. psychrotolerans-P. oryzihabitans, and P. kilonensis-P. brassicacearum, which were previously differentiated based on conventional biochemical tests and/or genome-genome hybridization techniques. PMID:28747902
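Grouping genomes into genospecies from pairwise average nucleotide identity (ANI) can be illustrated with a small Python sketch. The abstract reports a >95% species-level threshold but does not state the clustering procedure, so the single-linkage (union-find) approach below is an assumption, and the genome labels and ANI values are invented for illustration.

```python
def cluster_by_ani(genomes, ani, threshold=95.0):
    """Single-linkage clustering: genomes joined by any ANI above `threshold` share a cluster."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), value in ani.items():
        if value > threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical genomes and pairwise ANI percentages.
toy_genomes = ["g1", "g2", "g3", "g4"]
toy_ani = {
    ("g1", "g2"): 97.2,
    ("g2", "g3"): 96.1,
    ("g1", "g3"): 94.8,
    ("g3", "g4"): 88.0,
}
print(cluster_by_ani(toy_genomes, toy_ani))
```

Note the design consequence of single linkage: g1 and g3 end up together through g2 even though their direct ANI is below the threshold. A stricter scheme, such as requiring every pair in a cluster to exceed the threshold, would split them.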
Efficient HIV-1 inhibition by a 16 nt-long RNA aptamer designed by combining in vitro selection and in silico optimisation strategies

PubMed Central

Sánchez-Luque, Francisco J.; Stich, Michael; Manrubia, Susanna; Briones, Carlos; Berzal-Herranz, Alfredo

2014-01-01

The human immunodeficiency virus type-1 (HIV-1) genome contains multiple, highly conserved structural RNA domains that play key roles in essential viral processes. Interference with the function of these RNA domains, either by disrupting their structures or by blocking their interaction with viral or cellular factors, may seriously compromise HIV-1 viability. RNA aptamers are amongst the most promising synthetic molecules able to interact with structural domains of viral genomes. However, shortening aptamers to their minimal active domain is usually necessary for scaling up production, which requires very time-consuming, trial-and-error approaches. Here we report on the in vitro selection of 64 nt-long specific aptamers against the complete 5′-untranslated region of the HIV-1 genome, which inhibit more than 75% of HIV-1 production in a human cell line. The analysis of the selected sequences and structures allowed for the identification of a highly conserved 16 nt-long stem-loop motif containing a common 8 nt-long apical loop. Based on this result, an in silico designed 16 nt-long RNA aptamer, termed RNApt16, was synthesized, with sequence 5′-CCCCGGCAAGGAGGGG-3′. The HIV-1 inhibition efficiency of this aptamer was close to 85%, making it the shortest RNA molecule described so far that efficiently interferes with HIV-1 replication. PMID:25175101
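As a quick consistency check on the reported sequence, the sketch below counts how many base pairs can form between the two ends of RNApt16 and extracts what remains as the apical loop. This is a naive end-pairing check written for illustration; it is not the structure prediction or design procedure used by the authors, and the function name and the minimum loop size of 3 nt are assumptions.

```python
# Watson-Crick pairs plus the G-U wobble pair commonly allowed in RNA stems.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def hairpin_stem_length(seq, min_loop=3):
    """Count consecutive base pairs formed between the 5' and 3' ends."""
    seq = seq.upper().replace("T", "U")
    i, j = 0, len(seq) - 1
    stem = 0
    # Keep pairing inwards while the enclosed loop stays at least min_loop long.
    while j - i - 1 >= min_loop and (seq[i], seq[j]) in PAIRS:
        stem += 1
        i += 1
        j -= 1
    return stem


RNAPT16 = "CCCCGGCAAGGAGGGG"  # sequence as given in the abstract
stem = hairpin_stem_length(RNAPT16)
loop = RNAPT16[stem:len(RNAPT16) - stem]
print(stem, loop, len(loop))
```

Under these assumptions the terminal C and G runs pair into a short stem and the enclosed segment has 8 nt, in line with the 8 nt-long apical loop described in the abstract.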
The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

PubMed Central

Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

2015-01-01

Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191
Whole-genome comparative analysis of three phytopathogenic Xylella fastidiosa strains.

PubMed

Bhattacharyya, Anamitra; Stilwagen, Stephanie; Ivanova, Natalia; D'Souza, Mark; Bernal, Axel; Lykidis, Athanasios; Kapatral, Vinayak; Anderson, Iain; Larsen, Niels; Los, Tamara; Reznik, Gary; Selkov, Eugene; Walunas, Theresa L; Feil, Helene; Feil, William S; Purcell, Alexander; Lassez, Jean-Louis; Hawkins, Trevor L; Haselkorn, Robert; Overbeek, Ross; Predki, Paul F; Kyrpides, Nikos C

2002-09-17

Xylella fastidiosa (Xf) causes wilt disease in plants and is responsible for major economic and crop losses globally. Owing to the public importance of this phytopathogen we embarked on a comparative analysis of the complete genome of Xf pv citrus and the partial genomes of two recently sequenced strains of this species: Xf pv almond and Xf pv oleander, which cause leaf scorch in almond and oleander plants, respectively. We report a reanalysis of the previously sequenced Xf 9a5c (CVC, citrus) strain and the two "gapped" Xf genomes revealing ORFs encoding critical functions in pathogenicity and conjugative transfer. Second, a detailed whole-genome functional comparison was based on the three sequenced Xf strains, identifying the unique genes present in each strain, in addition to those shared between strains. Third, an "in silico" cellular reconstruction of these organisms was made, based on a comparison of their core functional subsystems that led to a characterization of their conjugative transfer machinery, identification of potential differences in their adhesion mechanisms, and highlighting of the absence of a classical quorum-sensing mechanism. This study demonstrates the effectiveness of comparative analysis strategies in the interpretation of genomes that are closely related.
Molecular genetic analysis of macular corneal dystrophy patients from North India.

PubMed

Paliwal, Preeti; Sharma, Arundhati; Tandon, Radhika; Sharma, Namrata; Titiyal, Jeevan S; Sen, Seema; Vajpayee, Rasik B

2012-01-01

To identify underlying genetic defects in the carbohydrate sulfotransferase-6 (CHST6) gene in North Indian patients with macular corneal dystrophy (MCD). 30 clinically diagnosed MCD patients from 21 families and 50 healthy normal controls were recruited in the study. Detailed clinical evaluation in the patients was undertaken followed by histopathology and ultrastructural studies in corneal tissues. DNA from blood samples was amplified for the CHST6 coding and upstream region followed by direct sequencing and in silico analysis. We identified pathogenic mutations in 17 patients from 11 families. Of these 4 were novel (p.Ser54Tyr, p.Gln58Arg, p.Leu59His and p.Leu293Phe), 2 were previously reported (Arg93His and Glu274Lys) homozygous, 1 heterozygous stop codon (p.Trp123X) and 2 compound heterozygous (p.Arg93His + p.Arg97Pro; p.Leu22Arg + p.Gln58X) mutations. A missense single-nucleotide polymorphism was also identified in 11 patients. The novel mutations were conserved as shown by in silico analysis. Thirteen patients did not show any pathogenic CHST6 changes. This is the first report on molecular analysis of MCD in North Indian patients. All cases could not be explained by mutations in CHST6, suggesting that MCD may result from other changes in the regulatory elements of CHST6 or from genetic heterogeneity. Copyright © 2012 S. Karger AG, Basel.
SpirPep: an in silico digestion-based platform to assist bioactive peptides discovery from a genome-wide database.

PubMed

Anekthanakul, Krittima; Hongsthong, Apiradee; Senachak, Jittisak; Ruengjitchatchawalya, Marasri

2018-04-20

Bioactive peptides, including peptides derived from biological sources with different biological activities, are protein fragments that influence the functions or conditions of organisms, in particular humans and animals. Conventional methods of identifying bioactive peptides are time-consuming and costly. To speed up the process, several bioinformatics tools have recently been used to facilitate screening of potential peptides prior to their activity assessment in vitro and/or in vivo. In this study, we developed an efficient computational method, SpirPep, which offers many advantages over the currently available tools. The SpirPep web application tool is a one-stop analysis and visualization facility to assist bioactive peptide discovery. The tool is equipped with 15 customized enzymes and 1-3 miscleavage options, which allows in silico digestion of protein sequences encoded by protein-coding genes at single, multiple, or genome-wide scale, and then directly classifies the peptides by bioactivity using an in-house database that contains bioactive peptides collected from 13 public databases. With this tool, the resulting peptides are categorized by each selected enzyme and shown in a tabular format where the peptide sequences can be tracked back to their original proteins. The developed tool and webpages are coded in PHP and HTML with CSS/JavaScript. Moreover, the tool allows protein-peptide alignment visualization by Generic Genome Browser (GBrowse) to display the region and details of the proteins and peptides within each parameter, while considering digestion design for the desirable bioactivity. SpirPep is efficient; it takes less than 20 min to digest 3000 proteins (751,860 amino acids) with 15 enzymes and three miscleavages for each enzyme, and only a few seconds for single enzyme digestion. The tool identified more bioactive peptides than the benchmarked tool; an example of a validated pentapeptide (FLPIL) from LC-MS/MS was demonstrated.
The web and database server are available at http://spirpepapp.sbi.kmutt.ac.th . SpirPep, a web-based bioactive peptide discovery application, is an in silico-based tool with an overview of the results. The platform is a one-stop analysis and visualization facility; and offers advantages over the currently available tools. This tool may be useful for further bioactivity analysis and the quantitative discovery of desirable peptides.
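The idea of in silico digestion with miscleavage options can be sketched as follows. This is not SpirPep's code (the abstract states the tool is written in PHP) and the 15 enzymes it supports are not listed here; the sketch assumes the commonly used trypsin rule (cleave after K or R unless followed by P), and the function names and the toy protein are hypothetical.

```python
import re

# Assumed trypsin-like rule: cut after K or R, but not when the next residue is P.
TRYPSIN = r"(?<=[KR])(?!P)"


def cleavage_sites(protein, pattern=TRYPSIN):
    """Return internal positions at which the enzyme is assumed to cut."""
    return [m.start() for m in re.finditer(pattern, protein)
            if 0 < m.start() < len(protein)]


def digest(protein, max_missed=0, pattern=TRYPSIN):
    """Return (start_offset, peptide) pairs, allowing up to max_missed miscleavages."""
    cuts = [0] + cleavage_sites(protein, pattern) + [len(protein)]
    peptides = []
    for i in range(len(cuts) - 1):
        # k = 1 is a fully cleaved peptide; larger k skips k - 1 cleavage sites.
        for k in range(1, max_missed + 2):
            if i + k < len(cuts):
                peptides.append((cuts[i], protein[cuts[i]:cuts[i + k]]))
    return peptides


toy_protein = "MKRPAKLLR"  # hypothetical sequence for illustration
full = digest(toy_protein)
with_missed = digest(toy_protein, max_missed=1)
print(full)
print(with_missed)
```

Keeping the start offset with each peptide is one simple way to trace a peptide back to its position in the parent protein, which is the kind of traceability the abstract describes.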
Molecular evolution of gas cavity in [NiFeSe] hydrogenases resurrected in silico

NASA Astrophysics Data System (ADS)

Tamura, Takashi; Tsunekawa, Naoki; Nemoto, Michiko; Inagaki, Kenji; Hirano, Toshiyuki; Sato, Fumitoshi

2016-01-01

Oxygen tolerance of selenium-containing [NiFeSe] hydrogenases (Hases) is attributable to the high reducing power of the selenocysteine residue, which sustains the bimetallic Ni-Fe catalytic center in the large subunit. Genes encoding [NiFeSe] Hases are inherited by few sulphate-reducing δ-proteobacteria globally distributed under various anoxic conditions. Ancestral sequences of [NiFeSe] Hases were elucidated and their three-dimensional structures were recreated in silico using homology modelling and molecular dynamic simulation, which suggested that deep gas channels gradually developed in [NiFeSe] Hases under absolute anaerobic conditions, whereas the enzyme remained as a sealed edifice under environmental conditions of a higher oxygen exposure risk. The development of a gas cavity appears to be driven by non-synonymous mutations, which cause subtle conformational changes locally and distantly, even including highly conserved sequence regions.
Terminal Restriction Fragment Length Polymorphism Analysis Program, a Web-Based Research Tool for Microbial Community Analysis

PubMed Central

Marsh, Terence L.; Saxman, Paul; Cole, James; Tiedje, James

2000-01-01

Rapid analysis of microbial communities has proven to be a difficult task. This is due, in part, to both the tremendous diversity of the microbial world and the high complexity of many microbial communities. Several techniques for community analysis have emerged over the past decade, and most take advantage of the molecular phylogeny derived from 16S rRNA comparative sequence analysis. We describe a web-based research tool located at the Ribosomal Database Project web site (http://www.cme.msu.edu/RDP/html/analyses.html) that facilitates microbial community analysis using terminal restriction fragment length polymorphism of 16S ribosomal DNA. The analysis function (designated TAP T-RFLP) permits the user to perform in silico restriction digestions of the entire 16S sequence database and derive terminal restriction fragment sizes, measured in base pairs, from the 5′ terminus of the user-specified primer to the 3′ terminus of the restriction endonuclease target site. The output can be sorted and viewed either phylogenetically or by size. It is anticipated that the site will guide experimental design as well as provide insight into interpreting results of community analysis with terminal restriction fragment length polymorphisms. PMID:10919828
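The fragment-size calculation described above can be sketched in a few lines. This is not the TAP T-RFLP implementation; it follows the abstract's definition (size measured from the 5′ terminus of the primer to the 3′ terminus of the restriction target site) and uses exact string matching. The sequences and the primer are toy values invented for illustration; CCGG is the recognition site of enzymes such as MspI.

```python
def terminal_fragment_size(seq, primer, site):
    """Size in bp from the 5' end of the primer to the 3' end of the first site.

    Returns None when the primer or the restriction site is not found.
    """
    seq, primer, site = seq.upper(), primer.upper(), site.upper()
    start = seq.find(primer)
    if start == -1:
        return None
    hit = seq.find(site, start)  # first target site at or after the primer start
    if hit == -1:
        return None
    return hit + len(site) - start


# Toy sequences and primer, for illustration only.
database = {
    "seqA": "TTAGAGTTTGATCCTGGCTCAGCCGGAATT",
    "seqB": "AGAGTTCCGGAAAA",
    "seqC": "AGAGTTTTTT",
}
sizes = {name: terminal_fragment_size(s, "AGAGTT", "CCGG")
         for name, s in database.items()}

# Sort by fragment size, leaving out sequences without a fragment.
by_size = sorted((size, name) for name, size in sizes.items() if size is not None)
print(by_size)
```

A real analysis would also need to handle degenerate primers, mismatches and the enzyme's actual cut position within the site, none of which are modeled here.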
Rational assignment of key motifs for function guides in silico enzyme identification.

PubMed

Höhne, Matthias; Schätzle, Sebastian; Jochens, Helge; Robins, Karen; Bornscheuer, Uwe T

2010-11-01

Biocatalysis has emerged as a powerful alternative to traditional chemistry, especially for asymmetric synthesis. One key requirement during process development is the discovery of a biocatalyst with an appropriate enantiopreference and enantioselectivity, which can be achieved, for instance, by protein engineering or screening of metagenome libraries. We have developed an in silico strategy for a sequence-based prediction of substrate specificity and enantiopreference. First, we used rational protein design to predict key amino acid substitutions that indicate the desired activity. Then, we searched protein databases for proteins already carrying these mutations instead of constructing the corresponding mutants in the laboratory. This methodology exploits the fact that naturally evolved proteins have undergone selection over millions of years, which has resulted in highly optimized catalysts. Using this in silico approach, we have discovered 17 (R)-selective amine transaminases, which catalyzed the synthesis of several (R)-amines with excellent optical purity up to >99% enantiomeric excess.
Pitfalls in genetic analysis of pheochromocytomas/paragangliomas-case report.

PubMed

Canu, Letizia; Rapizzi, Elena; Zampetti, Benedetta; Fucci, Rossella; Nesi, Gabriella; Richter, Susan; Qin, Nan; Giachè, Valentino; Bergamini, Carlo; Parenti, Gabriele; Valeri, Andrea; Ercolino, Tonino; Eisenhofer, Graeme; Mannelli, Massimo

2014-07-01

About 35% of patients with pheochromocytoma/paraganglioma carry a germline mutation in one of the 10 main susceptibility genes. The recent introduction of next-generation sequencing will allow the analysis of all these genes in one run. When positive, the analysis is generally unequivocal due to the association between a germline mutation and a concordant clinical presentation or positive family history. When genetic analysis reveals a novel mutation with no clinical correlates, particularly in the presence of a missense variant, the question arises whether the mutation is pathogenic or a rare polymorphism. We report the case of a 35-year-old patient operated for a pheochromocytoma who turned out to be a carrier of a novel SDHD (succinate dehydrogenase subunit D) missense mutation. With no positive family history or clinical correlates, we decided to perform additional analyses to test the clinical significance of the mutation. We performed in silico analysis, tissue loss of heterozygosity analysis, immunohistochemistry, Western blot analysis, SDH enzymatic assay, and measurement of the succinate/fumarate concentration ratio in the tumor tissue by tandem mass spectrometry. Although the in silico analysis gave contradictory results according to the different methods, all the other tests demonstrated that the SDH complex was conserved and normally active. We therefore came to the conclusion that the variant was a nonpathogenic polymorphism. Advancements in technology facilitate genetic analysis of patients with pheochromocytoma but also offer new challenges to the clinician who, in some cases, needs clinical correlates and/or functional tests to give significance to the results of the genetic assay.
Performance of in silico prediction tools for the classification of rare BRCA1/2 missense variants in clinical diagnostics.

PubMed

Ernst, Corinna; Hahnen, Eric; Engel, Christoph; Nothnagel, Michael; Weber, Jonas; Schmutzler, Rita K; Hauke, Jan

2018-03-27

The use of next-generation sequencing approaches in clinical diagnostics has led to a tremendous increase in data and a vast number of variants of uncertain significance that require interpretation. Therefore, prediction of the effects of missense mutations using in silico tools has become a frequently used approach. The aim of this study was to assess the reliability of in silico prediction as a basis for clinical decision making in the context of hereditary breast and/or ovarian cancer. We tested the performance of four prediction tools (Align-GVGD, SIFT, PolyPhen-2, MutationTaster2) using a set of 236 BRCA1/2 missense variants that had previously been classified by expert committees. However, a major pitfall in the creation of a reliable evaluation set for our purpose is the generally accepted classification of BRCA1/2 missense variants using the multifactorial likelihood model, which is partially based on Align-GVGD results. To overcome this drawback we identified 161 variants whose classification is independent of any previous in silico prediction. In addition to the performance as stand-alone tools we examined the sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC) of combined approaches. PolyPhen-2 achieved the lowest sensitivity (0.67), specificity (0.67), accuracy (0.67) and MCC (0.39). Align-GVGD achieved the highest values of specificity (0.92), accuracy (0.92) and MCC (0.73), but was outperformed regarding its sensitivity (0.90) by SIFT (1.00) and MutationTaster2 (1.00). All tools suffered from poor specificities, resulting in an unacceptable proportion of false positive results in a clinical setting. This shortcoming could not be bypassed by combination of these tools. In the best case scenario, 138 families would be affected by the misclassification of neutral variants within the cohort of patients of the German Consortium for Hereditary Breast and Ovarian Cancer.
We show that due to low specificities state-of-the-art in silico prediction tools are not suitable to predict pathogenicity of variants of uncertain significance in BRCA1/2. Thus, clinical consequences should never be based solely on in silico forecasts. However, our data suggests that SIFT and MutationTaster2 could be suitable to predict benignity, as both tools did not result in false negative predictions in our analysis.
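The four performance measures named above follow from a confusion matrix using their standard definitions. The sketch below shows the arithmetic; the counts are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)          # pathogenic variants called pathogenic
    specificity = tn / (tn + fp)          # neutral variants called neutral
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical confusion-matrix counts, for illustration only.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because MCC uses all four cells of the confusion matrix, it is less easily inflated than accuracy when pathogenic and neutral variants are unevenly represented in an evaluation set.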
Identification and molecular characterisation of a homozygous missense mutation in the ADAMTS10 gene in a patient with Weill-Marchesani syndrome.

PubMed

Steinkellner, Hannes; Etzler, Julia; Gogoll, Laura; Neesen, Jürgen; Stifter, Eva; Brandau, Oliver; Laccone, Franco

2015-09-01

Weill-Marchesani syndrome is a rare disorder of the connective tissue. Functional variants in ADAMTS10 are associated with Weill-Marchesani syndrome-1 (WMS1). We identified a homozygous missense mutation, c.41T>A, of the ADAMTS10 gene in a 19-year-old female with typical symptoms of WMS1: proportionate short stature, brachydactyly, joint stiffness, and microspherophakia. The ADAMTS10 missense mutation was analysed in silico, with conflicting results as to its effects on protein function, but it was predicted to affect the leader sequence. Molecular characterisation in HEK293 Ebna cells revealed an intracellular mis-targeting of the ADAMTS10 protein with a reduced concentration of the polypeptide in the endoplasmic reticulum. A large reduction in glycosylation of the cytoplasmic fraction of the mutant ADAMTS10 protein versus the wild-type protein and a lack of secretion of the mutant protein are also evident in our results. In conclusion, we identified a novel missense mutation of the ADAMTS10 gene and confirmed the functional consequences suggested by the in silico analysis by conducting molecular studies.
In silico design of ligand triggered RNA switches.

PubMed

Findeiß, Sven; Hammer, Stefan; Wolfinger, Michael T; Kühnl, Felix; Flamm, Christoph; Hofacker, Ivo L

2018-04-13

This contribution sketches a workflow to design an RNA switch that is able to adopt two structural conformations in a ligand-dependent way. A well-characterized RNA aptamer, i.e., one with known Kd and adaptive structural features, is an essential ingredient of the described design process. We exemplify the principles using the well-known theophylline aptamer throughout this work. The aptamer in its ligand-binding competent structure represents one structural conformation of the switch, while an alternative fold that disrupts the binding-competent structure forms the other conformation. To keep it simple we do not incorporate any regulatory mechanism to control transcription or translation. We elucidate a commonly used design process by explicitly dissecting and explaining the necessary steps in detail. We developed a novel objective function which specifies the mechanistics of this simple, ligand-triggered riboswitch and describe an extensive in silico analysis pipeline to evaluate important kinetic properties of the designed sequences. This protocol and the developed software can be easily extended or adapted to fit novel design scenarios and thus can serve as a template for future needs. Copyright © 2018. Published by Elsevier Inc.

In silico serine β-lactamases analysis reveals a huge potential resistome in environmental and pathogenic species.

PubMed

Brandt, Christian; Braun, Sascha D; Stein, Claudia; Slickers, Peter; Ehricht, Ralf; Pletz, Mathias W; Makarewicz, Oliwia

2017-02-24

The secretion of antimicrobial compounds is an ancient mechanism with clear survival benefits for microbes competing with other microorganisms. Consequently, mechanisms that confer resistance are also ancient and may represent an underestimated reservoir in environmental bacteria. In this context, β-lactamases (BLs) are of great interest due to their long-term presence and diversification in the hospital environment, leading to the emergence of Gram-negative pathogens that are resistant to cephalosporins (extended spectrum BLs = ESBLs) and carbapenems (carbapenemases). In the current study, protein sequence databases were used to analyze BLs, and the results revealed a substantial number of unknown and functionally uncharacterized BLs in a multitude of environmental and pathogenic species. Together, these BLs represent an uncharacterized reservoir of potentially transferable resistance genes. Considering all available data, in silico approaches appear to more adequately reflect a given resistome than analyses of limited datasets. This approach leads to a more precise definition of BL clades and conserved motifs. Moreover, it may support the prediction of new resistance determinants and improve the tailored development of robust molecular diagnostics.
De novo transcriptome analysis of rose-scented geranium provides insights into the metabolic specificity of terpene and tartaric acid biosynthesis.

PubMed

Narnoliya, Lokesh K; Kaushal, Girija; Singh, Sudhir P; Sangwan, Rajender S

2017-01-13

Rose-scented geranium (Pelargonium sp.) is a perennial herb that produces a high-value essential oil of fragrant significance due to the characteristic compositional blend of rose-oxide and acyclic monoterpenoids in foliage. Recently, the plant has also been shown to produce tartaric acid in leaf tissues. Rose-scented geranium represents a top-tier cash crop in terms of economic returns and significance of the plant and plant products. However, there has hardly been any study on its metabolism and functional genomics, nor is any genomic expression dataset resource available in the public domain. Therefore, to begin building a molecular understanding of the specialized metabolic pathways of the plant, de novo sequencing of the rose-scented geranium leaf transcriptome, transcript assembly, annotation, expression profiling and their validation were carried out. De novo transcriptome analysis resulted in a total of 78,943 unique contigs (average length: 623 bp, and N50 length: 752 bp) from 15.44 million high quality raw reads. In silico functional annotation led to the identification of several putative genes representing terpene, ascorbic acid and tartaric acid biosynthetic pathways, hormone metabolism, and transcription factors. Additionally, a total of 6,040 simple sequence repeat (SSR) motifs were identified in 6.8% of the expressed transcripts. The highest frequency of SSR was of tri-nucleotides (50%). Further, the transcriptome assembly was validated for randomly selected putative genes by a standard PCR-based approach. The in silico expression profile of assembled contigs was validated by real-time PCR analysis of selected transcripts. As this is the first report on transcriptome analysis of rose-scented geranium, the data sets and the leads and directions reflected in this investigation will serve as a foundation for pursuing and understanding molecular aspects of its biology and specialized metabolic pathways, metabolic engineering, genetic diversity, and molecular breeding.
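The N50 length quoted for the assembly is a standard summary statistic: the contig length at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that definition follows; the contig lengths are hypothetical and are not the study's data.

```python
def n50(lengths):
    """Contig length at which the running total, taken from the longest
    contig downwards, first reaches half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 is undefined for an empty list of contigs")


# Hypothetical contig lengths in bp, for illustration only.
contigs = [100, 200, 300, 400, 500]
print(n50(contigs))
```

An N50 above the mean contig length, as reported in the abstract, reflects a length distribution skewed towards many short contigs.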
Camelid Ig V genes reveal significant human homology not seen in therapeutic target genes, providing for a powerful therapeutic antibody platform

PubMed Central

Klarenbeek, Alex; Mazouari, Khalil El; Desmyter, Aline; Blanchetot, Christophe; Hultberg, Anna; de Jonge, Natalie; Roovers, Rob C; Cambillau, Christian; Spinelli, Sylvia; Del-Favero, Jurgen; Verrips, Theo; de Haard, Hans J; Achour, Ikbel

2015-01-01

Camelid immunoglobulin variable (IGV) regions were found homologous to their human counterparts; however, the germline V repertoires of camelid heavy and light chains are still incomplete and their therapeutic potential is only beginning to be appreciated. We therefore leveraged the publicly available HTG and WGS databases of Lama pacos and Camelus ferus to retrieve the germline repertoire of V genes using human IGV genes as reference. In addition, we amplified IGKV and IGLV genes to uncover the V germline repertoire of Lama glama and sequenced BAC clones covering part of the Lama pacos IGK and IGL loci. Our in silico analysis showed that camelid counterparts of all human IGKV and IGLV families and most IGHV families could be identified, based on canonical structure and sequence homology. Interestingly, this sequence homology seemed largely restricted to the Ig V genes and was far less apparent in other genes: 6 therapeutically relevant target genes differed significantly from their human orthologs. This contributed to efficient immunization of llamas with the human proteins CD70, MET, interleukin (IL)-1β and IL-6, resulting in large panels of functional antibodies. The in silico predicted human-homologous canonical folds of camelid-derived antibodies were confirmed by X-ray crystallography solving the structure of 2 selected camelid anti-CD70 and anti-MET antibodies. These antibodies showed identical fold combinations as found in the corresponding human germline V families, yielding binding site structures closely similar to those occurring in human antibodies. In conclusion, our results indicate that active immunization of camelids can be a powerful therapeutic antibody platform. PMID:26018625
In silico peptide-binding predictions of passerine MHC class I reveal similarities across distantly related species, suggesting convergence on the level of protein function.

PubMed

Follin, Elna; Karlsson, Maria; Lundegaard, Claus; Nielsen, Morten; Wallin, Stefan; Paulsson, Kajsa; Westerdahl, Helena

2013-04-01

The major histocompatibility complex (MHC) genes are the most polymorphic genes found in the vertebrate genome, and they encode proteins that play an essential role in the adaptive immune response. Many songbirds (passerines) have been shown to have a large number of transcribed MHC class I genes compared to most mammals. To elucidate the reason for this large number of genes, we compared 14 MHC class I alleles (α1-α3 domains), from great reed warbler, house sparrow and tree sparrow, via phylogenetic analysis, homology modelling and in silico peptide-binding predictions to investigate their functional and genetic relationships. We found more pronounced clustering of the MHC class I allomorphs (allele specific proteins) in regards to their function (peptide-binding specificities) compared to their genetic relationships (amino acid sequences), indicating that the high number of alleles is of functional significance. The MHC class I allomorphs from house sparrow and tree sparrow, species that diverged 10 million years ago (MYA), had overlapping peptide-binding specificities, and these similarities across species were also confirmed in phylogenetic analyses based on amino acid sequences. Notably, there were also overlapping peptide-binding specificities in the allomorphs from house sparrow and great reed warbler, although these species diverged 30 MYA. This overlap was not found in a tree based on amino acid sequences. Our interpretation is that convergent evolution on the level of the protein function, possibly driven by selection from shared pathogens, has resulted in allomorphs with similar peptide-binding repertoires, although trans-species evolution in combination with gene conversion cannot be ruled out.
Genome-wide identification and phylogenetic analysis of the AP2/ERF gene superfamily in sweet orange (Citrus sinensis).

PubMed

Ito, T M; Polido, P B; Rampim, M C; Kaschuk, G; Souza, S G H

2014-09-26

Sweet orange (Citrus sinensis) plays an important role in the economy of more than 140 countries, but it is grown in areas with intermittent stressful soil and climatic conditions. The stress tolerance could be addressed by manipulating the ethylene response factor (ERF) transcription factors because they orchestrate plant responses to environmental stress. We performed an in silico study on the ERFs in the expressed sequence tag database of C. sinensis to identify potential genes that regulate plant responses to stress. We identified 108 putative genes encoding protein sequences of the AP2/ERF superfamily distributed within 10 groups of amino acid sequences. Ninety-one genes were assembled from the ERF family containing only one AP2/ERF domain, 13 genes were assembled from the AP2 family containing two AP2/ERF domains, and four other genes were assembled from the RAV family containing one AP2/ERF domain and a B3 domain. Some conserved domains of the ERF family genes were disrupted into a few segments by introns. This irregular distribution of genes in the AP2/ERF superfamily in different plant species could be a result of genomic losses or duplication events in a common ancestor. The in silico gene expression revealed that 67% of AP2/ERF genes are expressed in tissues with usual plant development, and 14% were expressed in stressed tissues. Because the AP2/ERF superfamily is expressed in an orchestrated way, it is possible that the manipulation of only one gene may result in changes in the whole plant function, which could result in more tolerant crops.
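The family assignment rule stated above (one AP2/ERF domain for ERF, two for AP2, one AP2/ERF domain plus a B3 domain for RAV) can be written as a small classifier. This is a simplified sketch: the function name is hypothetical, the domain counts would normally come from a domain-annotation step that is not shown, and the input list is rebuilt from the counts reported in the abstract only to confirm that they add up.

```python
from collections import Counter


def classify_ap2_erf(n_ap2_domains, has_b3):
    """Assign a gene to a family from its domain composition (simplified rule)."""
    if n_ap2_domains == 1 and has_b3:
        return "RAV"
    if n_ap2_domains == 2:
        return "AP2"
    if n_ap2_domains == 1:
        return "ERF"
    return "unclassified"


# Domain compositions rebuilt from the counts given in the abstract:
# 91 genes with one AP2/ERF domain, 13 with two, 4 with one plus a B3 domain.
genes = [(1, False)] * 91 + [(2, False)] * 13 + [(1, True)] * 4
tally = Counter(classify_ap2_erf(n, b3) for n, b3 in genes)
print(tally, sum(tally.values()))
```

The three family counts sum to the 108 putative genes reported for the superfamily.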
Is the activity of CGRP and Adrenomedullin regulated by RAMP (-2) and (-3) in Trypanosomatidae? An in-silico approach.

PubMed

Febres, Anthony; Vanegas, Oriana; Giammarresi, Michelle; Gomes, Carlos; Díaz, Emilia; Ponte-Sucre, Alicia

2018-07-01

The Calcitonin-Like Receptor (CLR) belongs to the classical seven-transmembrane segment molecules coupled to heterotrimeric G proteins. Its pharmacology depends on the simultaneous expression of the so-called Receptor Activity Modifier Proteins (RAMP-) -1, -2 and -3. RAMP-associated proteins modulate glycosylation and cellular traffic of CLR, therefore determining its pharmacodynamics. In higher eukaryotes, the complex formed by CLR and RAMP-1 preferentially binds Calcitonin Gene-Related Peptide (CGRP), whereas those formed by CLR and RAMP-2 or RAMP-3 preferentially bind Adrenomedullin (AM). In lower eukaryotes, RAMPs, or any homologous protein, have not been identified until now. Herein we demonstrated a negative chemotactic response elicited by CGRP (10⁻⁹ and 10⁻⁸ M) and AM (10⁻⁹ to 10⁻⁵ M). Whether or not this response is receptor mediated should be verified, as well as the expression of a 24 kDa band in Leishmania, recognized by western blot analysis using (human-)-RAMP-2 antibodies as detection probes. Queries with human RAMP-2 and RAMP-3 protein sequences in blastp against the Leishmania (Viannia) braziliensis predicted proteome allowed us to detect two sequence alignments in the parasite: a RAMP-2-aligned sequence corresponding to Leishmania folylpolyglutamate synthase (FPGS), and a RAMP-3-aligned protein, a hypothetical Leishmania protein of as yet unknown function. The presence of homologs of these proteins was described in silico in other members of the Trypanosomatidae. These preliminary and not yet complete data suggest that both CGRP and Adrenomedullin activities may be regulated by homologs of RAMP- (-2) and (-3) in these parasites. Copyright © 2018 Elsevier B.V. All rights reserved.
Mining for Nonribosomal Peptide Synthetase and Polyketide Synthase Genes Revealed a High Level of Diversity in the Sphagnum Bog Metagenome.

PubMed

Müller, Christina A; Oberauner-Wappis, Lisa; Peyman, Armin; Amos, Gregory C A; Wellington, Elizabeth M H; Berg, Gabriele

2015-08-01

Sphagnum bog ecosystems are among the oldest vegetation forms harboring a specific microbial community and are known to produce an exceptionally wide variety of bioactive substances. Although the Sphagnum metagenome shows a rich secondary metabolism, the genes have not yet been explored. To analyze nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), the diversity of NRPS and PKS genes in Sphagnum-associated metagenomes was investigated by in silico data mining and sequence-based screening (PCR amplification of 9,500 fosmid clones). The in silico Illumina-based metagenomic approach resulted in the identification of 279 NRPSs and 346 PKSs, as well as 40 PKS-NRPS hybrid gene sequences. The occurrence of NRPS sequences was strongly dominated by the members of the Proteobacteria phylum, especially by species of the Burkholderia genus, while PKS sequences were mainly affiliated with Actinobacteria. Thirteen novel NRPS-related sequences were identified by PCR amplification screening, displaying amino acid identities of 48% to 91% to annotated sequences of members of the phyla Proteobacteria, Actinobacteria, and Cyanobacteria. Some of the identified metagenomic clones showed the closest similarity to peptide synthases from Burkholderia or Lysobacter, which are emerging bacterial sources of as-yet-undescribed bioactive metabolites. This report highlights the role of the extreme natural ecosystems as a promising source for detection of secondary compounds and enzymes, serving as a source for biotechnological applications. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Signal sequence and keyword trap in silico for selection of full-length human cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries.

PubMed

Otsuki, Tetsuji; Ota, Toshio; Nishikawa, Tetsuo; Hayashi, Koji; Suzuki, Yutaka; Yamamoto, Jun-ichi; Wakamatsu, Ai; Kimura, Kouichi; Sakamoto, Katsuhiko; Hatano, Naoto; Kawai, Yuri; Ishii, Shizuko; Saito, Kaoru; Kojima, Shin-ichi; Sugiyama, Tomoyasu; Ono, Tetsuyoshi; Okano, Kazunori; Yoshikawa, Yoko; Aotsuka, Satoshi; Sasaki, Naokazu; Hattori, Atsushi; Okumura, Koji; Nagai, Keiichi; Sugano, Sumio; Isogai, Takao

2005-01-01

We have developed an in silico method of selection of human full-length cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries. Fullness rates were increased to about 80% by combination of the oligo-capping method and ATGpr, software for prediction of translation start point and the coding potential. Then, using 5'-end single-pass sequences, cDNAs having the signal sequence were selected by PSORT ('signal sequence trap'). We also applied 'secretion or membrane protein-related keyword trap' based on the result of BLAST search against the SWISS-PROT database for the cDNAs which could not be selected by PSORT. Using the above procedures, 789 cDNAs were primarily selected and subjected to full-length sequencing, and 334 of these cDNAs were finally selected as novel. Most of the cDNAs (295 cDNAs: 88.3%) were predicted to encode secretion or membrane proteins. In particular, 165 (80.5%) of the 205 cDNAs selected by PSORT were predicted to have signal sequences, while 70 (54.2%) of the 129 cDNAs selected by 'keyword trap' preserved the secretion or membrane protein-related keywords. Many important cDNAs were obtained, including transporters, receptors, and ligands, involved in significant cellular functions. Thus, an efficient method of selecting secretion or membrane protein-encoding cDNAs was developed by combining the above four procedures.
Ribosomal DNA analysis of tsetse and non-tsetse transmitted Ethiopian Trypanosoma vivax strains in view of improved molecular diagnosis.

PubMed

Fikru, Regassa; Matetovici, Irina; Rogé, Stijn; Merga, Bekana; Goddeeris, Bruno Maria; Büscher, Philippe; Van Reet, Nick

2016-04-15

Animal trypanosomosis caused by Trypanosoma vivax (T. vivax) is a devastating disease causing serious economic losses. Most molecular diagnostics for T. vivax infection target the ribosomal DNA locus (rDNA) but are challenged by the heterogeneity among T. vivax strains. In this study, we investigated the rDNA heterogeneity of Ethiopian T. vivax strains in relation to their presence in tsetse-infested and tsetse-free areas and its effect on molecular diagnosis. We sequenced the rDNA loci of six Ethiopian (three from tsetse-infested and three from tsetse-free areas) and one Nigerian T. vivax strain. We analysed the obtained sequences in silico for primer-mismatches of some commonly used diagnostic PCR assays and for GC content. With these data, we selected some rDNA diagnostic PCR assays for evaluation of their diagnostic accuracy. Furthermore we constructed two phylogenetic networks based on sequences within the smaller subunit (SSU) of 18S and within the 5.8S and internal transcribed spacer 2 (ITS2) to assess the relatedness of Ethiopian T. vivax strains to strains from other African countries and from South America. In silico analysis of the rDNA sequence showed important mismatches of some published diagnostic PCR primers and high GC content of T. vivax rDNA. The evaluation of selected diagnostic PCR assays with specimens from cattle under natural T. vivax challenge showed that this high GC content interferes with the diagnostic accuracy of PCR, especially in cases of mixed infections with T. congolense. Adding betaine to the PCR reaction mixture can enhance the amplification of T. vivax rDNA but decreases the sensitivity for T. congolense and Trypanozoon. The networks illustrated that Ethiopian T. vivax strains are considerably heterogeneous and two strains (one from tsetse-infested and one from tsetse-free area) are more related to the West African and South American strains than to the East African strains. The rDNA locus sequence of six Ethiopian T. vivax strains showed important differences and higher GC content compared to other animal trypanosomes but could not be related to their origin from tsetse-infested or tsetse-free area. The high GC content of T. vivax DNA renders accurate diagnosis of all pathogenic animal trypanosomes with one single PCR problematic. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
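The two in silico checks named in this abstract, GC content and primer mismatches, can be sketched in a few lines. This is a minimal illustration only: the function names and the toy sequences are assumptions, not the authors' pipeline, and the mismatch count assumes the primer has already been aligned to a binding site of equal length.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def primer_mismatches(primer: str, site: str) -> int:
    """Count positions where a primer differs from its aligned binding site.

    Both strings are assumed to be in the same orientation and of equal length.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(a != b for a, b in zip(primer.upper(), site.upper()))


# Hypothetical toy data, not taken from the study.
toy_rdna = "GGCCAT"
toy_primer = "ACGT"
toy_site = "ACGA"
gc = gc_content(toy_rdna)
mismatches = primer_mismatches(toy_primer, toy_site)
```

A real assessment would also weigh where a mismatch falls, since mismatches near the 3' end of a primer are generally more disruptive than those near the 5' end.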
Perspectives on pathway perturbation: Focused research to enhance 3R objectives

EPA Science Inventory

In vitro high-throughput screening (HTS) and in silico technologies are emerging as 21st century tools for hazard identification. Computational methods that strategically examine cross-species conservation of protein sequence/structural information for chemical molecular targets ...
Germline viral "fossils" guide in silico reconstruction of a mid-Cenozoic era marsupial adeno-associated virus.

PubMed

Smith, Richard H; Hallwirth, Claus V; Westerman, Michael; Hetherington, Nicola A; Tseng, Yu-Shan; Cecchini, Sylvain; Virag, Tamas; Ziegler, Mona-Larissa; Rogozin, Igor B; Koonin, Eugene V; Agbandje-McKenna, Mavis; Kotin, Robert M; Alexander, Ian E

2016-07-05

Germline endogenous viral elements (EVEs) genetically preserve viral nucleotide sequences useful to the study of viral evolution, gene mutation, and the phylogenetic relationships among host organisms. Here, we describe a lineage-specific, adeno-associated virus (AAV)-derived endogenous viral element (mAAV-EVE1) found within the germline of numerous closely related marsupial species. Molecular screening of a marsupial DNA panel indicated that mAAV-EVE1 occurs specifically within the marsupial suborder Macropodiformes (present-day kangaroos, wallabies, and related macropodoids), to the exclusion of other Diprotodontian lineages. Orthologous mAAV-EVE1 locus sequences from sixteen macropodoid species, representing a speciation history spanning an estimated 30 million years, facilitated compilation of an inferred ancestral sequence that recapitulates the genome of an ancient marsupial AAV that circulated among Australian metatherian fauna sometime during the late Eocene to early Oligocene. In silico gene reconstruction and molecular modelling indicate remarkable conservation of viral structure over a geologic timescale. Characterisation of AAV-EVE loci among disparate species affords insight into AAV evolution and, in the case of macropodoid species, may offer an additional genetic basis for assignment of phylogenetic relationships among the Macropodoidea. From an applied perspective, the identified AAV "fossils" provide novel capsid sequences for use in translational research and clinical applications.
In silico Derivation of HLA-Specific Alloreactivity Potential from Whole Exome Sequencing of Stem-Cell Transplant Donors and Recipients: Understanding the Quantitative Immunobiology of Allogeneic Transplantation

PubMed Central

Jameson-Lee, Max; Koparde, Vishal; Griffith, Phil; Scalora, Allison F.; Sampson, Juliana K.; Khalid, Haniya; Sheth, Nihar U.; Batalo, Michael; Serrano, Myrna G.; Roberts, Catherine H.; Hess, Michael L.; Buck, Gregory A.; Neale, Michael C.; Manjili, Masoud H.; Toor, Amir Ahmed

2014-01-01

Donor T-cell mediated graft versus host (GVH) effects may result from the aggregate alloreactivity to minor histocompatibility antigens (mHA) presented by the human leukocyte antigen (HLA) molecules in each donor–recipient pair undergoing stem-cell transplantation (SCT). Whole exome sequencing has previously demonstrated a large number of non-synonymous single nucleotide polymorphisms (SNP) present in HLA-matched recipients of SCT donors (GVH direction). The nucleotide sequence flanking each of these SNPs was obtained and the amino acid sequence determined. All the possible nonameric peptides incorporating the variant amino acid resulting from these SNPs were interrogated in silico for their likelihood to be presented by the HLA class I molecules using the Immune Epitope Database stabilized matrix method (SMM) and NetMHCpan algorithms. The SMM algorithm predicted that a median of 18,396 peptides weakly bound HLA class I molecules in individual SCT recipients, and 2,254 peptides displayed strong binding. A similar library of presented peptides was identified when the data were interrogated using the NetMHCpan algorithm. The bioinformatic algorithm presented here demonstrates that there may be a high level of mHA variation in HLA-matched individuals, constituting a HLA-specific alloreactivity potential. PMID:25414699
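The enumeration step described above, listing every nonameric peptide that contains the variant amino acid, amounts to a sliding window over the protein sequence. The sketch below is illustrative only: the function name and toy sequence are assumptions, and the binding prediction itself (SMM or NetMHCpan) is not reproduced here.

```python
def nonamers_with_variant(protein: str, variant_index: int, k: int = 9) -> list[str]:
    """Return every k-mer of `protein` that covers the 0-based `variant_index`."""
    if not 0 <= variant_index < len(protein):
        raise ValueError("variant_index is outside the sequence")
    # The earliest window that still covers the variant starts k-1 residues before it.
    first = max(0, variant_index - k + 1)
    # The latest window starts at the variant itself, unless the sequence ends sooner.
    last = min(variant_index, len(protein) - k)
    return [protein[start:start + k] for start in range(first, last + 1)]


# Hypothetical toy protein (20 distinct residues); the variant is 'M' at index 10.
toy_protein = "ACDEFGHIKLMNPQRSTVWY"
peptides = nonamers_with_variant(toy_protein, 10)
```

A variant far from either end of the protein yields nine nonamers, one for each position the variant can occupy within the peptide; variants near the termini yield fewer.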
In silico Derivation of HLA-Specific Alloreactivity Potential from Whole Exome Sequencing of Stem-Cell Transplant Donors and Recipients: Understanding the Quantitative Immunobiology of Allogeneic Transplantation.

PubMed

Jameson-Lee, Max; Koparde, Vishal; Griffith, Phil; Scalora, Allison F; Sampson, Juliana K; Khalid, Haniya; Sheth, Nihar U; Batalo, Michael; Serrano, Myrna G; Roberts, Catherine H; Hess, Michael L; Buck, Gregory A; Neale, Michael C; Manjili, Masoud H; Toor, Amir Ahmed

2014-01-01

Donor T-cell mediated graft versus host (GVH) effects may result from the aggregate alloreactivity to minor histocompatibility antigens (mHA) presented by the human leukocyte antigen (HLA) molecules in each donor-recipient pair undergoing stem-cell transplantation (SCT). Whole exome sequencing has previously demonstrated a large number of non-synonymous single nucleotide polymorphisms (SNP) present in HLA-matched recipients of SCT donors (GVH direction). The nucleotide sequence flanking each of these SNPs was obtained and the amino acid sequence determined. All the possible nonameric peptides incorporating the variant amino acid resulting from these SNPs were interrogated in silico for their likelihood to be presented by the HLA class I molecules using the Immune Epitope Database stabilized matrix method (SMM) and NetMHCpan algorithms. The SMM algorithm predicted that a median of 18,396 peptides weakly bound HLA class I molecules in individual SCT recipients, and 2,254 peptides displayed strong binding. A similar library of presented peptides was identified when the data were interrogated using the NetMHCpan algorithm. The bioinformatic algorithm presented here demonstrates that there may be a high level of mHA variation in HLA-matched individuals, constituting a HLA-specific alloreactivity potential.
Integrated in silico and biological validation of the blocking effect of Cot-1 DNA on Microarray-CGH.

PubMed

Kang, Seung-Hui; Park, Chan Hee; Jeung, Hei Cheul; Kim, Ki-Yeol; Rha, Sun Young; Chung, Hyun Cheol

2007-06-01

In array-CGH, various factors may act as variables influencing the result of experiments. Among them, Cot-1 DNA, which has been used as a repetitive sequence-blocking agent, may become an artifact-inducing factor in BAC array-CGH. To identify the effect of Cot-1 DNA on Microarray-CGH experiments, Cot-1 DNA was labeled directly and Microarray-CGH experiments were performed. The results confirmed that probes which hybridized more completely with Cot-1 DNA had a higher sequence similarity to the Alu element. Further, in the sex-mismatched Microarray-CGH experiments, the variation and intensity in the fluorescent signal were reduced in the high intensity probe group in which probes were better hybridized with Cot-1 DNA. Otherwise, those of the low intensity probe group showed no alterations regardless of Cot-1 DNA. These results confirmed by in silico methods that Cot-1 DNA could block repetitive sequences in gDNA and probes. In addition, it was confirmed biologically that the blocking effect of Cot-1 DNA could be presented via its repetitive sequences, especially Alu elements. Thus, in contrast to BAC-array CGH, the use of Cot-1 DNA is advantageous in controlling experimental variation in Microarray-CGH.
Passenger strand loading in overexpression experiments using microRNA mimics.

PubMed

Søkilde, Rolf; Newie, Inga; Persson, Helena; Borg, Åke; Rovira, Carlos

2015-01-01

MicroRNAs (miRNAs) are important regulators of gene function and manipulation of miRNAs is a central component of basic research. Modulation of gene expression by miRNA gain-of-function can be based on different approaches including transfection with miRNA mimics; artificial, chemically modified miRNA-like small RNAs. These molecules are intended to mimic the function of a miRNA guide strand while bypassing the maturation steps of endogenous miRNAs. Due to easy accessibility through commercial providers this approach has gained popularity, and accuracy is often assumed without prior independent testing. Our in silico analysis of over-represented sequence motifs in microarray expression data and sequencing of AGO-associated small RNAs indicate, however, that miRNA mimics may be associated with considerable side-effects due to the unwanted activity of the miRNA mimic complementary strand.
Comparative In silico Study of Sex-Determining Region Y (SRY) Protein Sequences Involved in Sex-Determining.

PubMed

Vakili Azghandi, Masoume; Nasiri, Mohammadreza; Shamsa, Ali; Jalali, Mohsen; Shariati, Mohammad Mahdi

2016-04-01

The SRY gene (SRY) provides instructions for making a transcription factor called the sex-determining region Y protein. The sex-determining region Y protein causes a fetus to develop as a male. In this study, SRY sequences from 15 species (human, chimpanzee, dog, pig, rat, cattle, buffalo, goat, sheep, horse, zebra, frog, urial, dolphin and killer whale) were used to determine bioinformatic differences. Nucleotide sequences of SRY were retrieved from the NCBI databank. Bioinformatic analysis of SRY was done with CLC Main Workbench version 5.5, ClustalW (http:/www.ebi.ac.uk/clustalw/) and MEGA6 software. The multiple sequence alignment results indicated that SRY protein sequences from Orcinus orca (killer whale) and Tursiops aduncus (dolphin) have the smallest genetic distance among these 15 species (0.33) and are 99.67% identical at the amino acid level. Homo sapiens and Pan troglodytes (chimpanzee) have the next lowest genetic distance (1.35) and are 98.65% identical at the amino acid level. These findings indicate that the SRY proteins are conserved in the 15 species, and their evolutionary relationships are similar.
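The reported pairs of values (distance 0.33 with 99.67% identity, distance 1.35 with 98.65% identity) each sum to 100, which is consistent with a distance expressed as the percentage of differing aligned positions. The sketch below shows that relationship under stated assumptions: the function name and toy alignment are hypothetical, gapped columns are simply skipped, and the tools named in the abstract may compute distances differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over aligned columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment, not real SRY sequences.
identity = percent_identity("MKVLA-T", "MKILAGT")
distance = 100.0 - identity  # percent of differing ungapped positions
```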
Hairpin structures with conserved sequence motifs determine the 3' ends of non-polyadenylated invertebrate iridovirus transcripts.

PubMed

İnce, İkbal Agah; Pijlman, Gorben P; Vlak, Just M; van Oers, Monique M

2017-11-01

Previously, we observed that the transcripts of Invertebrate iridescent virus 6 (IIV6) are not polyadenylated, in line with the absence of canonical poly(A) motifs (AATAAA) downstream of the open reading frames (ORFs) in the genome. Here, we determined the 3' ends of the transcripts of fifty-four IIV6 virion protein genes in infected Drosophila Schneider 2 (S2) cells. By using ligation-based amplification of cDNA ends (LACE) it was shown that the IIV6 mRNAs often ended with a CAUUA motif. In silico analysis showed that the 3'-untranslated regions of IIV6 genes have the ability to form hairpin structures (22-56 nt in length) and that for about half of all IIV6 genes these 3' sequences contained complementary TAATG and CATTA motifs. We also show that a hairpin in the 3' flanking region with conserved sequence motifs is a conserved feature in invertebrate-infecting iridoviruses (genus Iridovirus and Chloriridovirus). Copyright © 2017 Elsevier Inc. All rights reserved.
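TAATG and CATTA are reverse complements of each other, which is why the two motifs can base-pair to form a hairpin stem. A small sketch of how such motif pairs might be located follows; the function names, the minimum loop length and the toy sequence are illustrative assumptions, and no RNA folding energy is computed.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _positions(seq: str, motif: str) -> list[int]:
    """Return every start index of `motif` in `seq`, including overlapping hits."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def find_stem_pairs(seq: str, left: str = "TAATG", min_loop: int = 3) -> list[tuple[int, int]]:
    """Return (left_start, right_start) pairs that could form a hairpin stem.

    The right arm is the reverse complement of `left` and must lie at least
    `min_loop` bases downstream of the end of the left arm (assumed threshold).
    """
    seq = seq.upper()
    right = reverse_complement(left)
    return [
        (i, j)
        for i in _positions(seq, left)
        for j in _positions(seq, right)
        if j - (i + len(left)) >= min_loop
    ]


# Hypothetical toy 3' flanking sequence, not taken from the IIV6 genome.
pairs = find_stem_pairs("GGTAATGAAAACATTACC")
```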
Systems properties of the Haemophilus influenzae Rd metabolic genotype.

PubMed

Edwards, J S; Palsson, B O

1999-06-18

Haemophilus influenzae Rd was the first free-living organism for which the complete genomic sequence was established. The annotated sequence and known biochemical information was used to define the H. influenzae Rd metabolic genotype. This genotype contains 488 metabolic reactions operating on 343 metabolites. The stoichiometric matrix was used to determine the systems characteristics of the metabolic genotype and to assess the metabolic capabilities of H. influenzae. The need to balance cofactor and biosynthetic precursor production during growth on mixed substrates led to the definition of six different optimal metabolic phenotypes arising from the same metabolic genotype, each with different constraining features. The effects of variations in the metabolic genotype were also studied, and it was shown that the H. influenzae Rd metabolic genotype contains redundant functions under defined conditions. We thus show that the synthesis of in silico metabolic genotypes from annotated genome sequences is possible and that systems analysis methods are available that can be used to analyze and interpret phenotypic behavior of such genotypes.
In silico analysis of the polygalacturonase inhibiting protein 1 from apple, Malus domestica.

PubMed

Matsaunyane, Lerato Bt; Oelofse, Dean; Dubery, Ian A

2015-03-11

The Malus domestica polygalacturonase inhibiting protein 1 (MdPGIP1) gene, encoding the M. domestica polygalacturonase inhibiting protein 1 (MdPGIP1), was isolated from the Granny Smith apple cultivar (GenBank accession no. DQ185063). The gene was used to transform tobacco and potato for enhanced resistance against fungal diseases. Analysis of the MdPGIP1 nucleotide sequence revealed that the gene comprises 993 nucleotides that encode a 330 amino acid polypeptide. In silico characterization of the MdPGIP1 polypeptide revealed domains typical of PGIP proteins, which include a 24 amino acid putative signal peptide, a potential cleavage site [Alanine-Leucine-Serine (ALS)] for the signal peptide, a 238 amino acid leucine-rich repeat (LRR) domain, a 46 amino acid N-terminal domain and a 22 amino acid C-terminal domain. The hydropathic evaluation of MdPGIP1 indicated a repetitive hydrophobic motif in the LRR domain and a hydrophilic surface area consistent with a globular protein. The typical consensus glycosylation sequence of Asn-X-Ser/Thr was identified in MdPGIP1, indicating potential N-linked glycosylation of MdPGIP1. The molecular mass of non-glycosylated MdPGIP1 was calculated as 36.615 kDa and the theoretical isoelectric point as 6.98. Furthermore, the secondary and tertiary structure of MdPGIP1 was modelled, and revealed that MdPGIP1 is a curved and elongated molecule that contains sheet B1, sheet B2 and 3₁₀-helices on its LRR domain. The overall properties of the MdPGIP1 protein are similar to those of the prototypical Phaseolus vulgaris PGIP 2 (PvPGIP2), and the detected differences supported its use in biotechnological applications as an inhibitor of targeted fungal polygalacturonases (PGs).
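The Asn-X-Ser/Thr consensus mentioned above can be located with a simple pattern scan. The sketch below is illustrative: the function name and toy peptide are assumptions, and excluding proline at the X position is a commonly used convention that the abstract itself does not state.

```python
import re

# Asn, then any residue except Pro (assumed convention), then Ser or Thr.
# The lookahead lets overlapping sequons be reported.
_SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_glyc_sequons(protein: str) -> list[int]:
    """Return 0-based positions of Asn residues in Asn-X-Ser/Thr motifs."""
    return [m.start() for m in _SEQUON.finditer(protein.upper())]


# Hypothetical toy peptide, not the MdPGIP1 sequence.
sites = find_n_glyc_sequons("MNGSAANPTQNAT")
```

A match only marks a potential site; whether it is glycosylated in vivo depends on the cellular context and cannot be read from the sequence alone.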
Identification of Brucella melitensis Rev.1 vaccine-strain genetic markers: Towards understanding the molecular mechanism behind virulence attenuation.

PubMed

Issa, Mohammad Nouh; Ashhab, Yaqoub

2016-09-22

Brucella melitensis Rev.1 is an avirulent strain that is widely used as a live vaccine to control brucellosis in small ruminants. Although an assembled draft version of Rev.1 genome has been available since 2009, this genome has not been investigated to characterize this important vaccine. In the present work, we used the draft genome of Rev.1 to perform a thorough genomic comparison and sequence analysis to identify and characterize the panel of its unique genetic markers. The draft genome of Rev.1 was compared with genome sequences of 36 different Brucella melitensis strains from the Brucella project of the Broad Institute of MIT and Harvard. The comparative analyses revealed 32 genetic alterations (30 SNPs, 1 single-bp insertion and 1 single-bp deletion) that are exclusively present in the Rev.1 genome. In silico analyses showed that 9 out of the 17 non-synonymous mutations are deleterious. Three ABC transporters are among the disrupted genes that can be linked to virulence attenuation. Out of the 32 mutations, 11 Rev.1 specific markers were selected to test their potential to discriminate Rev.1 using a bi-directional allele-specific PCR assay. Six markers were able to distinguish between Rev.1 and a set of control strains. We succeeded in identifying a panel of 32 genome-specific markers of the B. melitensis Rev.1 vaccine strain. Extensive in silico analysis showed that a considerable number of these mutations could severely affect the function of the associated genes. In addition, some of the discovered markers were able to discriminate Rev.1 strain from a group of control strains using practical PCR tests that can be applied in resource-limited settings. Copyright © 2016 Elsevier Ltd. All rights reserved.

Calibration of Multiple In Silico Tools for Predicting Pathogenicity of Mismatch Repair Gene Missense Substitutions

PubMed Central

Thompson, Bryony A.; Greenblatt, Marc S.; Vallee, Maxime P.; Herkert, Johanna C.; Tessereau, Chloe; Young, Erin L.; Adzhubey, Ivan A.; Li, Biao; Bell, Russell; Feng, Bingjian; Mooney, Sean D.; Radivojac, Predrag; Sunyaev, Shamil R.; Frebourg, Thierry; Hofstra, Robert M.W.; Sijmons, Rolf H.; Boucher, Ken; Thomas, Alun; Goldgar, David E.; Spurdle, Amanda B.; Tavtigian, Sean V.

2015-01-01

Classification of rare missense substitutions observed during genetic testing for patient management is a considerable problem in clinical genetics. The Bayesian integrated evaluation of unclassified variants is a solution originally developed for BRCA1/2. Here, we take a step toward an analogous system for the mismatch repair (MMR) genes (MLH1, MSH2, MSH6, and PMS2) that confer colon cancer susceptibility in Lynch syndrome by calibrating in silico tools to estimate prior probabilities of pathogenicity for MMR gene missense substitutions. A qualitative five-class classification system was developed and applied to 143 MMR missense variants. This identified 74 missense substitutions suitable for calibration. These substitutions were scored using six different in silico tools (Align-Grantham Variation Grantham Deviation, multivariate analysis of protein polymorphisms [MAPP], Mut-Pred, PolyPhen-2.1, Sorting Intolerant From Tolerant, and Xvar), using curated MMR multiple sequence alignments where possible. The output from each tool was calibrated by regression against the classifications of the 74 missense substitutions; these calibrated outputs are interpretable as prior probabilities of pathogenicity. MAPP was the most accurate tool and MAPP + PolyPhen-2.1 provided the best-combined model (R² = 0.62 and area under receiver operating characteristic = 0.93). The MAPP + PolyPhen-2.1 output is sufficiently predictive to feed as a continuous variable into the quantitative Bayesian integrated evaluation for clinical classification of MMR gene missense substitutions. PMID:22949387
In vivo and in silico determination of essential genes of Campylobacter jejuni.

PubMed

Metris, Aline; Reuter, Mark; Gaskin, Duncan J H; Baranyi, Jozsef; van Vliet, Arnoud H M

2011-11-01

In the United Kingdom, the thermophilic Campylobacter species C. jejuni and C. coli are the most frequent causes of food-borne gastroenteritis in humans. While campylobacteriosis is usually a relatively mild infection, it has a significant public health and economic impact, and possible complications include reactive arthritis and the autoimmune disease Guillain-Barré syndrome. The rapid developments in "omics" technologies have resulted in the availability of diverse datasets allowing predictions of metabolism and physiology of pathogenic micro-organisms. When combined, these datasets may allow for the identification of potential weaknesses that can be used for development of new antimicrobials to reduce or eliminate C. jejuni and C. coli from the food chain. A metabolic model of C. jejuni was constructed using the annotation of the NCTC 11168 genome sequence, a published model of the related bacterium Helicobacter pylori, and extensive literature mining. Using this model, we have used in silico Flux Balance Analysis (FBA) to determine key metabolic routes that are essential for generating energy and biomass, thus creating a list of genes potentially essential for growth under laboratory conditions. To complement this in silico approach, candidate essential genes have been determined using a whole genome transposon mutagenesis method. FBA and transposon mutagenesis (both this study and a published study) predict a similar number of essential genes (around 200). The analysis of the intersection between the three approaches highlights the shikimate pathway where genes are predicted to be essential by one or more method, and tend to be network hubs, based on a previously published Campylobacter protein-protein interaction network, and could therefore be targets for novel antimicrobial therapy. We have constructed the first curated metabolic model for the food-borne pathogen Campylobacter jejuni and have presented the resulting metabolic insights. We have shown that the combination of in silico and in vivo approaches could point to non-redundant, indispensable genes associated with the well characterised shikimate pathway, and also genes of unknown function specific to C. jejuni, which are all potential novel Campylobacter intervention targets.
The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans.

PubMed

Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

2015-07-20

Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Genomewide Analysis of the Antimicrobial Peptides in Python bivittatus and Characterization of Cathelicidins with Potent Antimicrobial Activity and Low Cytotoxicity.

PubMed

Kim, Dayeong; Soundrarajan, Nagasundarapandian; Lee, Juyeon; Cho, Hye-Sun; Choi, Minkyeung; Cha, Se-Yeoun; Ahn, Byeongyong; Jeon, Hyoim; Le, Minh Thong; Song, Hyuk; Kim, Jin-Hoi; Park, Chankyu

2017-09-01

In this study, we sought to identify novel antimicrobial peptides (AMPs) in Python bivittatus through bioinformatic analyses of publicly available genome information and experimental validation. In our analysis of the python genome, we identified 29 AMP-related candidate sequences. Of these, we selected five cathelicidin-like sequences and subjected them to further in silico analyses. The results showed that these sequences likely have antimicrobial activity. The sequences were named Pb-CATH1 to Pb-CATH5 according to their sequence similarity to previously reported snake cathelicidins. We predicted their molecular structure and then chemically synthesized the mature peptide for three putative cathelicidins and subjected them to biological activity tests. Interestingly, all three peptides showed potent antimicrobial effects against Gram-negative bacteria but very weak activity against Gram-positive bacteria. Remarkably, ΔPb-CATH4 showed potent activity against antibiotic-resistant clinical isolates and also was observed to possess very low hemolytic activity and cytotoxicity. ΔPb-CATH4 also showed considerable serum stability. Electron microscopic analysis indicated that ΔPb-CATH4 exerts its effects via toroidal pore preformation. Structural comparison of the cathelicidins identified in this study to previously reported ones revealed that these Pb-CATHs are representatives of a new group of reptilian cathelicidins lacking the acidic connecting domain. Furthermore, Pb-CATH4 possesses a completely different mature peptide sequence from those of previously described reptilian cathelicidins. These new AMPs may be candidates for the development of alternatives to or complements of antibiotics to control multidrug-resistant pathogens. Copyright © 2017 American Society for Microbiology.
Genomewide Analysis of the Antimicrobial Peptides in Python bivittatus and Characterization of Cathelicidins with Potent Antimicrobial Activity and Low Cytotoxicity

PubMed Central

Kim, Dayeong; Soundrarajan, Nagasundarapandian; Lee, Juyeon; Cho, Hye-sun; Choi, Minkyeung; Cha, Se-Yeoun; Ahn, Byeongyong; Jeon, Hyoim; Le, Minh Thong; Song, Hyuk; Kim, Jin-Hoi

2017-01-01

In this study, we sought to identify novel antimicrobial peptides (AMPs) in Python bivittatus through bioinformatic analyses of publicly available genome information and experimental validation. In our analysis of the python genome, we identified 29 AMP-related candidate sequences. Of these, we selected five cathelicidin-like sequences and subjected them to further in silico analyses. The results showed that these sequences likely have antimicrobial activity. The sequences were named Pb-CATH1 to Pb-CATH5 according to their sequence similarity to previously reported snake cathelicidins. We predicted their molecular structure and then chemically synthesized the mature peptide for three putative cathelicidins and subjected them to biological activity tests. Interestingly, all three peptides showed potent antimicrobial effects against Gram-negative bacteria but very weak activity against Gram-positive bacteria. Remarkably, ΔPb-CATH4 showed potent activity against antibiotic-resistant clinical isolates and also was observed to possess very low hemolytic activity and cytotoxicity. ΔPb-CATH4 also showed considerable serum stability. Electron microscopic analysis indicated that ΔPb-CATH4 exerts its effects via toroidal pore preformation. Structural comparison of the cathelicidins identified in this study to previously reported ones revealed that these Pb-CATHs are representatives of a new group of reptilian cathelicidins lacking the acidic connecting domain. Furthermore, Pb-CATH4 possesses a completely different mature peptide sequence from those of previously described reptilian cathelicidins. These new AMPs may be candidates for the development of alternatives to or complements of antibiotics to control multidrug-resistant pathogens. PMID:28630199
Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

NASA Astrophysics Data System (ADS)

Ferreira, Pedro G.; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R.; Rivas, Manuel A.; Esteve-Codina, Anna; Estivill, Xavier; Guigó, Roderic; Dermitzakis, Emmanouil; Antonarakis, Stylianos; Meitinger, Thomas; Strom, Tim M.; Palotie, Aarno; François Deleuze, Jean; Sudbrak, Ralf; Lerach, Hans; Gut, Ivo; Syvänen, Ann-Christine; Gyllensten, Ulf; Schreiber, Stefan; Rosenstiel, Philip; Brunner, Han; Veltman, Joris; Hoen, Peter A. C. T.; Jan van Ommen, Gert; Carracedo, Angel; Brazma, Alvis; Flicek, Paul; Cambon-Thomsen, Anne; Mangion, Jonathan; Bentley, David; Hamosh, Ada; Rosenstiel, Philip; Strom, Tim M.; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

2016-09-01

Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.
Alt a 1 allergen homologs from Alternaria and related taxa: analysis of phylogenetic content and secondary structure.

PubMed

Hong, Soon Gyu; Cramer, Robert A; Lawrence, Christopher B; Pryor, Barry M

2005-02-01

A gene for the Alternaria major allergen, Alt a 1, was amplified from 52 species of Alternaria and related genera, and sequence information was used for phylogenetic study. Alt a 1 gene sequences evolved 3.8 times faster and contained 3.5 times more parsimony-informative sites than glyceraldehyde-3-phosphate dehydrogenase (gpd) sequences. Analyses of Alt a 1 gene and gpd exon sequences strongly supported grouping of Alternaria spp. and related taxa into several species-groups described in previous studies, especially the infectoria, alternata, porri, brassicicola, and radicina species-groups and the Embellisia group. The sonchi species-group was newly suggested in this study. Monophyly of the Nimbya group was moderately supported, and monophyly of the Ulocladium group was weakly supported. Relationships among species-groups and among closely related species of the same species-group were not fully resolved. However, higher resolution could be obtained using Alt a 1 sequences or a combined dataset than using gpd sequences alone. Despite high levels of variation in amino acid sequences, results of in silico prediction of protein secondary structure for Alt a 1 demonstrated a high degree of structural similarity for most of the species suggesting a conservation of function.
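The comparison of parsimony-informative sites between Alt a 1 and gpd can be illustrated with a minimal sketch. It uses the standard definition (a column is informative when at least two states each occur in at least two sequences); the function names and the toy alignment are hypothetical and are not taken from the study.

```python
from collections import Counter

def is_parsimony_informative(column):
    """True when at least two states each occur in two or more sequences.

    Gaps and the ambiguity codes N and ? are ignored (an assumption that
    suits nucleotide alignments, not protein ones).
    """
    counts = Counter(c.upper() for c in column if c.upper() not in "-N?")
    return sum(1 for n in counts.values() if n >= 2) >= 2

def count_informative_sites(alignment):
    """Count parsimony-informative columns in equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return sum(is_parsimony_informative(col) for col in zip(*alignment))

# Toy alignment (illustrative only, not real Alt a 1 data).
toy = ["ACGTA", "ACGTC", "ATGAC", "ATGAA"]
print(count_informative_sites(toy))
```

Running the same function on two alignments of the same taxa gives the kind of ratio the abstract reports, provided both alignments are processed with the same gap handling.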
CEQer: A Graphical Tool for Copy Number and Allelic Imbalance Detection from Whole-Exome Sequencing Data

PubMed Central

Piazza, Rocco; Magistroni, Vera; Pirola, Alessandra; Redaelli, Sara; Spinelli, Roberta; Redaelli, Serena; Galbiati, Marta; Valletta, Simona; Giudici, Giovanni; Cazzaniga, Giovanni; Gambacorti-Passerini, Carlo

2013-01-01

Copy number alterations (CNA) are common events occurring in leukaemias and solid tumors. Comparative Genome Hybridization (CGH) is currently the gold standard technique to analyze CNAs; however, CGH analysis requires dedicated instruments and is able to perform only low-resolution Loss of Heterozygosity (LOH) analyses. Here we present CEQer (Comparative Exome Quantification analyzer), a new graphical, event-driven tool for CNA/allelic-imbalance (AI) coupled analysis of exome sequencing data. By using case-control matched exome data, CEQer performs a comparative digital exonic quantification to generate CNA data and couples this information with exome-wide LOH and allelic imbalance detection. This data is used to build mixed statistical/heuristic models allowing the identification of CNA/AI events. To test our tool, we initially used in silico generated data, then we performed whole-exome sequencing from 20 leukemic specimens and corresponding matched controls and we analyzed the results using CEQer. Taken together, these analyses showed that the combined use of comparative digital exon quantification and LOH/AI allows the generation of very accurate CNA data. Therefore, we propose CEQer as an efficient, robust and user-friendly graphical tool for the identification of CNA/AI in the context of whole-exome sequencing data. PMID:24124457
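As a rough illustration of what a case-control exon quantification can look like, the sketch below computes a library-size-normalized log2 ratio per exon. This is a generic approach, not CEQer's actual statistical/heuristic model; the function name, the pseudocount and the toy read counts are all assumptions.

```python
import math

def log2_ratios(case_counts, control_counts, pseudocount=0.5):
    """Per-exon log2(case/control) after scaling each sample by its total reads.

    The pseudocount avoids division by zero for exons with no reads; its
    value here is an arbitrary illustrative choice.
    """
    case_total = sum(case_counts.values())
    control_total = sum(control_counts.values())
    ratios = {}
    for exon in case_counts.keys() & control_counts.keys():
        case_frac = (case_counts[exon] + pseudocount) / case_total
        control_frac = (control_counts[exon] + pseudocount) / control_total
        ratios[exon] = math.log2(case_frac / control_frac)
    return ratios

# Hypothetical read counts per exon; exon3 is over-represented in the case sample.
case = {"exon1": 100, "exon2": 100, "exon3": 200}
control = {"exon1": 100, "exon2": 100, "exon3": 100}
for exon, value in sorted(log2_ratios(case, control).items()):
    print(exon, round(value, 2))
```

Positive values suggest a relative gain and negative values a relative loss, but real tools also have to account for capture bias, tumor purity and ploidy, which this sketch ignores.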
Multilocus sequence analysis for assessment of phylogenetic diversity and biogeography in Thalassospira bacteria from diverse marine environments.

PubMed

Lai, Qiliang; Liu, Yang; Yuan, Jun; Du, Juan; Wang, Liping; Sun, Fengqin; Shao, Zongze

2014-01-01

Thalassospira bacteria are widespread and have been isolated from various marine environments. Less is known about their genetic diversity and biogeography, as well as their role in marine environments, and many of them cannot be discriminated using the 16S rRNA gene alone. To address these issues, in this report, the phylogenetic analysis of 58 strains from seawater and deep-sea sediments was carried out using multilocus sequence analysis (MLSA) based on the acsA, aroE, gyrB, mutL, rpoD and trpB genes, together with DNA-DNA hybridization (DDH) and average nucleotide identity (ANI) based on genome sequences. The MLSA demonstrated that the 58 strains were clearly separated into 15 lineages, corresponding to seven validly described species and eight potential novel species. The DDH and ANI values further confirmed the validity of the MLSA and of the eight potential novel species. The MLSA interspecies gap of the genus Thalassospira was determined to be 96.16-97.12% sequence identity on the basis of the combined analyses of the DDH and MLSA, while the ANIm interspecies gap was 95.76-97.20% based on the in silico DDH analysis. Meanwhile, phylogenetic analyses showed that the Thalassospira bacteria exhibited a distribution pattern that followed geographic regions to a certain degree. Moreover, they clustered together according to habitat depth. In short, the phylogenetic diversity and biogeography of the Thalassospira bacteria were systematically investigated for the first time. These results will be helpful for further exploring their ecological role and adaptive evolution in marine environments.
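The MLSA identity values quoted above can be made concrete with a small sketch that concatenates the six loci and computes pairwise percent identity. It assumes the per-locus sequences are already aligned to equal length; the function names, the toy sequences, and the reading of the 96.16-97.12% gap as a species boundary are illustrative assumptions, not the authors' pipeline.

```python
MLSA_LOCI = ("acsA", "aroE", "gyrB", "mutL", "rpoD", "trpB")  # loci named in the abstract

def concatenate_loci(loci, order=MLSA_LOCI):
    """Join the aligned per-locus sequences of one strain in a fixed order."""
    return "".join(loci[name] for name in order)

def percent_identity(a, b):
    """Percent identical positions, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared

def classify_pair(identity, lower=96.16, upper=97.12):
    """Interpret an identity value against the reported interspecies gap."""
    if identity >= upper:
        return "same species"
    if identity <= lower:
        return "different species"
    return "within gap"

# Toy strains (hypothetical 10-nt loci, one substitution in gyrB).
strain_a = {name: "ACGTACGTAC" for name in MLSA_LOCI}
strain_b = dict(strain_a, gyrB="ACGTACGTAA")
identity = percent_identity(concatenate_loci(strain_a), concatenate_loci(strain_b))
print(round(identity, 2), classify_pair(identity))
```

Real MLSA comparisons use full-length gene alignments, so a single substitution would shift the identity far less than in this 60-nt toy.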
Multilocus Sequence Analysis for Assessment of Phylogenetic Diversity and Biogeography in Thalassospira Bacteria from Diverse Marine Environments

PubMed Central

Yuan, Jun; Du, Juan; Wang, Liping; Sun, Fengqin; Shao, Zongze

2014-01-01

Thalassospira bacteria are widespread and have been isolated from various marine environments. Less is known about their genetic diversity and biogeography, as well as their role in marine environments, and many of them cannot be discriminated using the 16S rRNA gene alone. To address these issues, in this report, the phylogenetic analysis of 58 strains from seawater and deep-sea sediments was carried out using multilocus sequence analysis (MLSA) based on the acsA, aroE, gyrB, mutL, rpoD and trpB genes, together with DNA-DNA hybridization (DDH) and average nucleotide identity (ANI) based on genome sequences. The MLSA demonstrated that the 58 strains were clearly separated into 15 lineages, corresponding to seven validly described species and eight potential novel species. The DDH and ANI values further confirmed the validity of the MLSA and of the eight potential novel species. The MLSA interspecies gap of the genus Thalassospira was determined to be 96.16–97.12% sequence identity on the basis of the combined analyses of the DDH and MLSA, while the ANIm interspecies gap was 95.76–97.20% based on the in silico DDH analysis. Meanwhile, phylogenetic analyses showed that the Thalassospira bacteria exhibited a distribution pattern that followed geographic regions to a certain degree. Moreover, they clustered together according to habitat depth. In short, the phylogenetic diversity and biogeography of the Thalassospira bacteria were systematically investigated for the first time. These results will be helpful for further exploring their ecological role and adaptive evolution in marine environments. PMID:25198177
In silico assessment of primers for eDNA studies using PrimerTree and application to characterize the biodiversity surrounding the Cuyahoga River

NASA Astrophysics Data System (ADS)

Cannon, M. V.; Hester, J.; Shalkhauser, A.; Chan, E. R.; Logue, K.; Small, S. T.; Serre, D.

2016-03-01

Analysis of environmental DNA (eDNA) enables the detection of species of interest from water and soil samples, typically using species-specific PCR. Here, we describe a method to characterize the biodiversity of a given environment by amplifying eDNA using primer pairs targeting a wide range of taxa and high-throughput sequencing for species identification. We tested this approach on 91 water samples of 40 mL collected along the Cuyahoga River (Ohio, USA). We amplified eDNA using 12 primer pairs targeting mammals, fish, amphibians, birds, bryophytes, arthropods, copepods, plants and several microorganism taxa and sequenced all PCR products simultaneously by high-throughput sequencing. Overall, we identified DNA sequences from 15 species of fish, 17 species of mammals, 8 species of birds, 15 species of arthropods, one turtle and one salamander. Interestingly, in addition to aquatic and semi-aquatic animals, we identified DNA from terrestrial species that live near the Cuyahoga River. We also identified DNA from one Asian carp species invasive to the Great Lakes but that had not been previously reported in the Cuyahoga River. Our study shows that analysis of eDNA extracted from small water samples using wide-range PCR amplification combined with high-throughput sequencing can provide a broad perspective on biological diversity.
In silico assessment of primers for eDNA studies using PrimerTree and application to characterize the biodiversity surrounding the Cuyahoga River

PubMed Central

Cannon, M. V.; Hester, J.; Shalkhauser, A.; Chan, E. R.; Logue, K.; Small, S. T.; Serre, D.

2016-01-01

Analysis of environmental DNA (eDNA) enables the detection of species of interest from water and soil samples, typically using species-specific PCR. Here, we describe a method to characterize the biodiversity of a given environment by amplifying eDNA using primer pairs targeting a wide range of taxa and high-throughput sequencing for species identification. We tested this approach on 91 water samples of 40 mL collected along the Cuyahoga River (Ohio, USA). We amplified eDNA using 12 primer pairs targeting mammals, fish, amphibians, birds, bryophytes, arthropods, copepods, plants and several microorganism taxa and sequenced all PCR products simultaneously by high-throughput sequencing. Overall, we identified DNA sequences from 15 species of fish, 17 species of mammals, 8 species of birds, 15 species of arthropods, one turtle and one salamander. Interestingly, in addition to aquatic and semi-aquatic animals, we identified DNA from terrestrial species that live near the Cuyahoga River. We also identified DNA from one Asian carp species invasive to the Great Lakes but that had not been previously reported in the Cuyahoga River. Our study shows that analysis of eDNA extracted from small water samples using wide-range PCR amplification combined with high-throughput sequencing can provide a broad perspective on biological diversity. PMID:26965911
A Rapid Method of Genomic Array Analysis of Scaffold/Matrix Attachment Regions (S/MARs) Identifies a 2.5-Mb Region of Enhanced Scaffold/Matrix Attachment at a Human Neocentromere

PubMed Central

Sumer, Huseyin; Craig, Jeffrey M.; Sibson, Mandy; Choo, K.H. Andy

2003-01-01

Human neocentromeres are fully functional centromeres that arise at previously noncentromeric regions of the genome. We have tested a rapid procedure of genomic array analysis of chromosome scaffold/matrix attachment regions (S/MARs), involving the isolation of S/MAR DNA and hybridization of this DNA to a genomic BAC/PAC array. Using this procedure, we have defined a 2.5-Mb domain of S/MAR-enriched chromatin that fully encompasses a previously mapped centromere protein-A (CENP-A)-associated domain at a human neocentromere. We have independently verified this procedure using a previously established fluorescence in situ hybridization method on salt-treated metaphase chromosomes. In silico sequence analysis of the S/MAR-enriched and surrounding regions has revealed no outstanding sequence-related predisposition. This study defines the S/MAR-enriched domain of a higher eukaryotic centromere and provides a method that has broad application for the mapping of S/MAR attachment sites over large genomic regions or throughout a genome. PMID:12840048
Ketide Synthase (KS) Domain Prediction and Analysis of Iterative Type II PKS Gene in Marine Sponge-Associated Actinobacteria Producing Biosurfactants and Antimicrobial Agents

PubMed Central

Selvin, Joseph; Sathiyanarayanan, Ganesan; Lipton, Anuj N.; Al-Dhabi, Naif Abdullah; Valan Arasu, Mariadhas; Kiran, George S.

2016-01-01

The important biological macromolecules, such as lipopeptide and glycolipid biosurfactant producing marine actinobacteria were analyzed and their potential linkage between type II polyketide synthase (PKS) genes was explored. A unique feature of type II PKS genes is their high amino acid (AA) sequence homology and conserved gene organization. These enzymes mediate the biosynthesis of polyketide natural products with enormous structural complexity and chemical nature by combinatorial use of various domains. Therefore, deciphering the order of AA sequence encoded by PKS domains tailored the chemical structure of polyketide analogs still remains a great challenge. The present work deals with an in vitro and in silico analysis of PKS type II genes from five actinobacterial species to correlate KS domain architecture and structural features. Our present analysis reveals the unique protein domain organization of iterative type II PKS and KS domain of marine actinobacteria. The findings of this study would have implications in metabolic pathway reconstruction and design of semi-synthetic genomes to achieve rational design of novel natural products. PMID:26903957
Deep sequencing and in silico analysis of small RNA library reveals novel miRNA from leaf Persicaria minor transcriptome.

PubMed

Samad, Abdul Fatah A; Nazaruddin, Nazaruddin; Murad, Abdul Munir Abdul; Jani, Jaeyres; Zainal, Zamri; Ismail, Ismanizan

2018-03-01

In the current era, the majority of microRNAs (miRNAs) are being discovered through computational approaches, which are largely confined to model plants. Here, for the first time, we describe the identification and characterization of novel miRNAs in a non-model plant, Persicaria minor (P. minor), using a computational approach. Unannotated sequences from deep sequencing were analyzed based on previously well-established parameters. Around 24 putative novel miRNAs were identified from 6,417,780 reads of the unannotated sequence, which represented 11 unique putative miRNA sequences. The PsRobot target prediction tool was deployed to identify the target transcripts of the putative novel miRNAs. Most of the predicted target transcripts (mRNAs) were known to be involved in plant development and stress responses. Gene ontology showed that the majority of the putative novel miRNA targets were involved in cellular component (69.07%), followed by molecular function (30.08%) and biological process (0.85%). Out of 11 unique putative miRNAs, 7 were validated through semi-quantitative PCR. These novel miRNA discoveries in P. minor may extend and update the current public miRNA database.
i-rDNA: alignment-free algorithm for rapid in silico detection of ribosomal gene fragments from metagenomic sequence data sets.

PubMed

Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Chadaram, Sudha; Mande, Sharmila S

2011-11-30

Obtaining accurate estimates of microbial diversity using rDNA profiling is the first step in most metagenomics projects. Consequently, most metagenomic projects spend considerable amounts of time, money and manpower for experimentally cloning, amplifying and sequencing the rDNA content in a metagenomic sample. In the second step, the entire genomic content of the metagenome is extracted, sequenced and analyzed. Since DNA sequences obtained in this second step also contain rDNA fragments, rapid in silico identification of these rDNA fragments would drastically reduce the cost, time and effort of current metagenomic projects by entirely bypassing the experimental steps of primer based rDNA amplification, cloning and sequencing. In this study, we present an algorithm called i-rDNA that can facilitate the rapid detection of 16S rDNA fragments from amongst millions of sequences in metagenomic data sets with high detection sensitivity. Performance evaluation with data sets/database variants simulating typical metagenomic scenarios indicates the significantly high detection sensitivity of i-rDNA. Moreover, i-rDNA can process a million sequences in less than an hour on a simple desktop with modest hardware specifications. In addition to the speed of execution, high sensitivity and low false positive rate, the utility of the algorithmic approach discussed in this paper is immense given that it would help in bypassing the entire experimental step of primer-based rDNA amplification, cloning and sequencing. Application of this algorithmic approach would thus drastically reduce the cost, time and human efforts invested in all metagenomic projects. A web-server for the i-rDNA algorithm is available at http://metagenomics.atc.tcs.com/i-rDNA/
In Silico Analysis of Expression Data for Identification of Genes Involved in Spatial Accumulation of Calcium in Developing Seeds of Rice

PubMed Central

Goel, Anshita; Gaur, Vikram S.; Arora, Sandeep; Gupta, Sanjay

2012-01-01

Abstract The calcium (Ca2+) transporters, like Ca2+ channels, Ca2+ ATPases, and Ca2+ exchangers, are instrumental for signaling and transport. However, the mechanism by which they orchestrate the accumulation of Ca2+ in grain filling has not yet been investigated. Hence the present study was designed to identify the potential calcium transporter genes that may be responsible for the spatial accumulation of calcium during grain filling. In silico expression analyses were performed to identify Ca2+ transporters that predominantly express during the different developmental stages of Oryza sativa. A total of 13 unique calcium transporters (7 from massively parallel signature sequencing [MPSS] data analysis, and 9 from microarray analysis) were identified. Analysis of variance (ANOVA) revealed differential expression of the transporters across tissues, and principal component analysis (PCA) exhibited their seed-specific distinctive expression profile. Interestingly, Ca2+ exchanger genes are highly expressed in the initial stages, whereas some Ca2+ ATPase genes are highly expressed throughout seed development. Furthermore, analysis of the cis-elements located in the promoter region of the subset of 13 genes suggested that Dof proteins play essential roles in regulating the expression of Ca2+ transporter genes during rice seed development. Based on these results, we developed a hypothetical model explaining the transport and tissue specific distribution of calcium in developing cereal seeds. The model may be extrapolated to understand the mechanism behind the exceptionally high level of calcium accumulation seen in grains like finger millet. PMID:22734689
Transcriptomic analysis of Ruditapes philippinarum hemocytes reveals cytoskeleton disruption after in vitro Vibrio tapetis challenge.

PubMed

Brulle, Franck; Jeffroy, Fanny; Madec, Stéphanie; Nicolas, Jean-Louis; Paillard, Christine

2012-10-01

The Manila clam, Ruditapes philippinarum, is an economically important commercial shellfish; harvests are diminished in some European waters by a pathogenic bacterium, Vibrio tapetis, that causes Brown Ring disease. To identify molecular characteristics associated with susceptibility or resistance to Brown Ring disease, Suppression Subtractive Hybridization (SSH) analyses were performed to construct cDNA libraries enriched in up- or down-regulated transcripts from clam immune cells, hemocytes, after a 3-h in vitro challenge with cultured V. tapetis. Nine hundred and ninety-eight sequences from the two libraries were sequenced, and an in silico analysis identified 235 unique genes. BLAST and "Gene ontology" classification analyses revealed that 60.4% of the Expressed Sequence Tags (ESTs) have high similarities with genes involved in various physiological functions, such as immunity, apoptosis and cytoskeleton organization, whereas 39.6% remain unidentified. From the 235 unique genes, we selected 22 candidates based upon physiological function and redundancy in the libraries. Then, Real-Time PCR analysis identified 3 genes related to cytoskeleton organization showing significant variation in expression attributable to V. tapetis exposure. Disruption in regulation of these genes is consistent with the etiologic agent of Brown Ring disease in Manila clams. Copyright © 2012 Elsevier Ltd. All rights reserved.
In silico analysis of L-asparaginase from different source organisms.

PubMed

Dwivedi, Vivek Dhar; Mishra, Sarad Kumar

2014-06-01

L-asparaginases are enzymes widely distributed among plants, fungi and bacteria. This enzyme catalyzes the conversion of l-asparagine to l-aspartate and ammonia and, to a lesser extent, the formation of l-glutamate from l-glutamine. In the present study, forty-five full-length amino acid sequences of L-asparaginases from bacteria, fungi and plants were collected and subjected to multiple sequence alignment (MSA), domain identification, determination of individual amino acid composition, and phylogenetic tree construction. MSA revealed that two glycine residues were identically found in all analyzed species, two glycine residues were also identically found in all the fungal and bacterial sources and three glycine residues were identically found in all plant and bacterial sources, while no residue was identically found in plant and fungal L-asparaginases. Two major sequence clusters were constructed by phylogenetic analysis. One cluster contains eleven species of fungi, twelve species of bacteria, and one species of plant, whereas the other one contains fourteen species of plant, four species of fungi and three species of bacteria. The amino acid composition result revealed that the average frequency of the amino acid alanine is 10.77 percent, which is very high in comparison to other amino acids in all analyzed species.
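The amino acid composition step can be sketched in a few lines. The example below computes per-sequence residue percentages and then averages one residue across sequences, which is one plausible reading of "average frequency"; the function names and the toy peptides are hypothetical and are not the sequences analyzed in the study.

```python
from collections import Counter

def aa_composition(sequence):
    """Return each residue's share of the sequence as a percentage."""
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    counts = Counter(residues)
    return {aa: 100.0 * n / len(residues) for aa, n in counts.items()}

def mean_frequency(sequences, residue):
    """Average the percentage of one residue over several sequences."""
    values = [aa_composition(seq).get(residue, 0.0) for seq in sequences]
    return sum(values) / len(values)

# Toy peptides (illustrative only).
peptides = ["MAAG", "MAGL"]
print(mean_frequency(peptides, "A"))
```

Averaging per-sequence percentages weights every sequence equally; pooling all residues first would instead weight longer sequences more heavily, so the two conventions can give slightly different figures.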
Conservation of tubulin-binding sequences in TRPV1 throughout evolution.

PubMed

Sardar, Puspendu; Kumar, Abhishek; Bhandari, Anita; Goswami, Chandan

2012-01-01

Transient Receptor Potential Vanilloid subtype 1 (TRPV1), commonly known as the capsaicin receptor, can detect multiple stimuli, including noxious compounds, low pH, temperature, and electromagnetic waves at different ranges. In addition, this receptor is involved in multiple physiological and sensory processes. Therefore, the functions of TRPV1 directly influence adaptation and further evolution. The availability of various eukaryotic genomic sequences in the public domain facilitates the study of the molecular evolution of the TRPV1 protein and the conservation of certain domains, motifs and interacting regions that are functionally important. Using statistical and bioinformatics tools, our analysis reveals that TRPV1 evolved about 420 million years ago (MYA). Our analysis reveals that specific regions, domains and motifs of TRPV1 have gone through different selection pressures and thus have different levels of conservation. We found that, among all of them, the TRP box is the most conserved and thus has functional significance. Our results also indicate that the tubulin-binding sequences (TBS) have evolutionary significance, as these sequence stretches are more conserved than many other essential regions of TRPV1. The overall distribution of positively charged residues within the TBS motifs is conserved throughout evolution. In silico analysis reveals that TBS-1 and TBS-2 of TRPV1 can form helical structures and may play an important role in TRPV1 function. Our analysis identifies the regions of TRPV1 that are important for the structure-function relationship. This analysis indicates that tubulin-binding sequence 1 (TBS-1) near the TRP box forms a potential helix and that the tubulin interactions with TRPV1 via TBS-1 have evolutionary significance. This interaction may be required for proper channel function and regulation and may also have significance in the context of Taxol®-induced neuropathy.

Identification of Sinorhizobium (Ensifer) medicae based on a specific genomic sequence unveiled by M13-PCR fingerprinting.

PubMed

Dourado, Ana Catarina; Alves, Paula I L; Tenreiro, Tania; Ferreira, Eugénio M; Tenreiro, Rogério; Fareleira, Paula; Crespo, M Teresa Barreto

2009-12-01

A collection of nodule isolates from Medicago polymorpha obtained from southern and central Portugal was evaluated by M13-PCR fingerprinting and hierarchical cluster analysis. Several genomic clusters were obtained which, by 16S rRNA gene sequencing of selected representatives, were shown to be associated with particular taxonomic groups of rhizobia and other soil bacteria. The method provided a clear separation between rhizobia and co-isolated non-symbiotic soil contaminants. Ten M13-PCR groups were assigned to Sinorhizobium (Ensifer) medicae and included all isolates responsible for the formation of nitrogen-fixing nodules upon re-inoculation of M. polymorpha test-plants. In addition, enterobacterial repetitive intergenic consensus (ERIC)-PCR fingerprinting indicated a high genomic heterogeneity within the major M13- PCR clusters of S. medicae isolates. Based on nucleotide sequence data of an M13-PCR amplicon of ca. 1500 bp, observed only in S. medicae isolates and spanning locus Smed_3707 to Smed_3709 from the pSMED01 plasmid sequence of S. medicae WSM419 genome's sequence, a pair of PCR primers was designed and used for direct PCR amplification of a 1399-bp sequence within this fragment. Additional in silico and in vitro experiments, as well as phylogenetic analysis, confirmed the specificity of this primer combination and therefore the reliability of this approach in the prompt identification of S. medicae isolates and their distinction from other soil bacteria.
Chloroplast microsatellite markers for Artocarpus (Moraceae) developed from transcriptome sequences

USDA-ARS's Scientific Manuscript database

Premise of the study: Chloroplast microsatellite loci were characterized from transcriptomes of Artocarpus (A.) altilis (breadfruit) and A. camansi (breadnut). They were tested in A. odoratissimus (terap) and A. altilis and evaluated in silico for two congeners. Methods and Results: 15 simple seque...
In silico Analysis of 3′-End-Processing Signals in Aspergillus oryzae Using Expressed Sequence Tags and Genomic Sequencing Data

PubMed Central

Tanaka, Mizuki; Sakai, Yoshifumi; Yamada, Osamu; Shintani, Takahiro; Gomi, Katsuya

2011-01-01

To investigate 3′-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3′-untranslated region (3′ UTR) plus 100 nucleotides (nt) sequence downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3′ UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3′ UTR and 100 nt sequence downstream of the poly(A) site is notably U-rich, while the region located 15–30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We identified that putative 3′-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprised four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3′-end-processing signals are similar to those in yeast and plants, some notable differences exist between them. PMID:21586533
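The hexanucleotide survey of the A-rich region can be approximated with a short sketch that extracts the window 15-30 nt upstream of the poly(A) site and reports, for each hexamer, the fraction of sequences containing it. The coordinate convention (each input ends at the poly(A) site, so its last base is 1 nt upstream), the function names and the toy sequences are assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter

def upstream_window(utr, near=15, far=30):
    """Return the bases lying `near`..`far` nt upstream of the poly(A) site.

    Assumes `utr` ends at the poly(A) site; sequences shorter than `far`
    yield an empty string.
    """
    if len(utr) < far:
        return ""
    return utr[len(utr) - far : len(utr) - near + 1]

def hexamer_fractions(utrs, k=6):
    """Fraction of usable sequences whose window contains each k-mer at least once."""
    usable = 0
    seen_in = Counter()
    for utr in utrs:
        window = upstream_window(utr.upper().replace("T", "U"))
        if len(window) < k:
            continue  # too short to contribute a window
        usable += 1
        seen_in.update({window[i:i + k] for i in range(len(window) - k + 1)})
    return {kmer: n / usable for kmer, n in seen_in.items()} if usable else {}

# Toy 3' UTRs (hypothetical): only the first carries AAUGAA in the window.
toy_utrs = ["C" * 12 + "AATGAA" + "C" * 22, "C" * 40]
fractions = hexamer_fractions(toy_utrs)
print(fractions["AAUGAA"])
```

Counting each hexamer once per sequence matches a statement of the form "accounts for only 6% of all transcripts"; counting every occurrence would answer a different question.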
Development of Scoring Functions for Antibody Sequence Assessment and Optimization

PubMed Central

Seeliger, Daniel

2013-01-01

Antibody development is still associated with substantial risks and difficulties as single mutations can radically change molecule properties like thermodynamic stability, solubility or viscosity. Since antibody generation methodologies cannot select and optimize for molecule properties which are important for biotechnological applications, careful sequence analysis and optimization is necessary to develop antibodies that fulfil the ambitious requirements of future drugs. While efforts to grab the physical principles of undesired molecule properties from the very bottom are becoming increasingly powerful, the wealth of publically available antibody sequences provides an alternative way to develop early assessment strategies for antibodies using a statistical approach which is the objective of this paper. Here, publically available sequences were used to develop heuristic potentials for the framework regions of heavy and light chains of antibodies of human and murine origin. The potentials take into account position dependent probabilities of individual amino acids but also conditional probabilities which are inevitable for sequence assessment and optimization. It is shown that the potentials derived from human sequences clearly distinguish between human sequences and sequences from mice and, hence, can be used as a measure of humaness which compares a given sequence with the phenotypic pool of human sequences instead of comparing sequence identities to germline genes. Following this line, it is demonstrated that, using the developed potentials, humanization of an antibody can be described as a simple mathematical optimization problem and that the in-silico generated framework variants closely resemble native sequences in terms of predicted immunogenicity. PMID:24204701
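A position-dependent statistical potential of the kind described can be sketched as a sum of negative log-probabilities over aligned positions. This minimal version uses only single-position frequencies with a pseudocount and omits the conditional (pairwise) terms the abstract mentions; the function names, the pseudocount and the toy reference set are assumptions, not the published potentials.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def position_probabilities(aligned, pseudocount=1.0):
    """Per-position residue probabilities from gap-free, equal-length sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("reference sequences must have equal length")
    total = len(aligned) + pseudocount * len(ALPHABET)
    tables = []
    for column in zip(*aligned):
        counts = Counter(column)
        tables.append({aa: (counts.get(aa, 0) + pseudocount) / total for aa in ALPHABET})
    return tables

def potential(sequence, tables):
    """Sum of -log p over positions; lower means closer to the reference pool."""
    if len(sequence) != len(tables):
        raise ValueError("sequence length must match the reference alignment")
    return -sum(math.log(table[aa]) for aa, table in zip(sequence, tables))

# Toy framework fragments (hypothetical, not real antibody data).
reference = ["EVQL", "EVQL", "QVQL"]
tables = position_probabilities(reference)
print(potential("EVQL", tables) < potential("WWWW", tables))
```

With potentials built from human sequences, a lower score would indicate a more human-like framework, which is the sense in which humanization can be posed as minimizing such a score over allowed substitutions.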
In silico cloning and B/T cell epitope prediction of triosephosphate isomerase from Echinococcus granulosus.

PubMed

Wang, Fen; Ye, Bin

2016-10-01

Cystic echinococcosis is a worldwide zoonosis caused by Echinococcus granulosus. Because the methods of diagnosis and treatment for cystic echinococcosis are limited, it is still necessary to screen target proteins for the development of a new anti-hydatidosis vaccine. In this study, the triosephosphate isomerase gene of E. granulosus was cloned in silico. The B cell and T cell epitopes were predicted by bioinformatics methods. The cDNA sequence of EgTIM was composed of 1094 base pairs, with an open reading frame of 753 base pairs. The deduced amino acid sequence was composed of 250 amino acids. Five cross-reactive epitopes, located at 21aa-35aa, 43aa-57aa, 94aa-107aa, 115aa-129aa, and 164aa-183aa, could be expected to serve as candidate epitopes in the development of a vaccine against E. granulosus. These results could provide a basis for gene cloning, recombinant expression, and the design of an anti-hydatidosis vaccine.
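The reported lengths are mutually consistent if the 753-bp open reading frame is counted with its stop codon: 753 / 3 = 251 codons, of which 250 encode amino acids. The sketch below checks that arithmetic; the function name and the stop-codon assumption are illustrative and are not stated in the abstract.

```python
def protein_length_from_orf(orf_bp, includes_stop=True):
    """Number of encoded amino acids for an ORF of the given length in bp."""
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons

CDNA_BP = 1094  # full cDNA length from the abstract
ORF_BP = 753    # open reading frame length from the abstract

print(protein_length_from_orf(ORF_BP))  # amino acids, assuming the stop codon is counted
print(CDNA_BP - ORF_BP)                 # base pairs outside the ORF (untranslated regions)
```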
Inverted repeats in the promoter as an autoregulatory sequence for TcrX in Mycobacterium tuberculosis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bhattacharya, Monolekha; Das, Amit Kumar, E-mail: amitk@hijli.iitkgp.ernet.in

Highlights: The regulatory sequences recognized by TcrX have been identified. The regulatory region comprises inverted repeats separated by a 30 bp region. The mode of binding of TcrX with the regulatory sequence is unique. The in silico TcrX-DNA docked model binds one of the inverted repeats. Both phosphorylated and unphosphorylated TcrX bind the regulatory sequence in vitro. Abstract: TcrY, a histidine kinase, and TcrX, a response regulator, constitute a two-component system in Mycobacterium tuberculosis. tcrX, which is expressed during iron scarcity, is instrumental in the survival of iron-dependent M. tuberculosis. However, the regulator of tcrX/Y has not been fully characterized. Crosslinking studies of TcrX reveal that it can form oligomers in vitro. Electrophoretic mobility shift assays (EMSAs) show that TcrX recognizes two regions in the promoter that are comprised of inverted repeats separated by ~30 bp. The dimeric in silico model of TcrX predicts binding to one of these inverted repeat regions. Site-directed mutagenesis and radioactive phosphorylation indicate that D54 of TcrX is phosphorylated by H256 of TcrY. However, phosphorylated and unphosphorylated TcrX bind the regulatory sequence with equal efficiency, which was shown with an EMSA using the D54A TcrX mutant.
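Inverted repeats of the kind described above can be located in silico by looking for a sequence arm whose reverse complement occurs a short distance downstream. The following is a minimal sketch with exact matching only; the function names, arm length, and spacer range are illustrative assumptions and not the procedure used by the authors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_inverted_repeats(seq, arm_len, min_gap, max_gap):
    """Return (left_start, right_start, left_arm) for exact inverted repeats.

    The right arm must equal the reverse complement of the left arm and
    start between min_gap and max_gap bases after the left arm ends.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - arm_len + 1):
        left = seq[i:i + arm_len]
        target = reverse_complement(left)
        for gap in range(min_gap, max_gap + 1):
            j = i + arm_len + gap
            if j + arm_len > len(seq):
                break
            if seq[j:j + arm_len] == target:
                hits.append((i, j, left))
    return hits

# Toy promoter fragment (hypothetical): arm ACGTTG, 5-base spacer, arm CAACGT.
toy = "TT" + "ACGTTG" + "AAAAA" + "CAACGT" + "TT"
print(find_inverted_repeats(toy, arm_len=6, min_gap=5, max_gap=5))
# [(2, 13, 'ACGTTG')]
```

Real promoter scans usually tolerate mismatches within the arms, which this exact-match version does not.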
LongISLND: in silico sequencing of lengthy and noisy datatypes

PubMed Central

Lau, Bayo; Mohiyuddin, Marghoob; Mu, John C.; Fang, Li Tai; Bani Asadi, Narges; Dallett, Carolina; Lam, Hugo Y. K.

2016-01-01

Summary: LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. Availability and Implementation: LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd Contact: hugo.lam@roche.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27667791
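To make the idea of in silico sequencing concrete, the sketch below draws a read from a reference and injects substitutions, insertions, and deletions at fixed per-base rates. It is a deliberately simple toy and does not reproduce LongISLND's error model or output formats; all names and rates here are assumptions for illustration.

```python
import random

def simulate_read(reference, read_len, sub_rate, ins_rate, del_rate, rng):
    """Sample one noisy read; returns (start_position, read_sequence)."""
    if read_len > len(reference):
        raise ValueError("read_len exceeds reference length")
    start = rng.randrange(len(reference) - read_len + 1)
    out = []
    for base in reference[start:start + read_len]:
        r = rng.random()
        if r < del_rate:
            continue  # deletion: the reference base is skipped
        if r < del_rate + sub_rate:
            # substitution: emit any base other than the true one
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
        if rng.random() < ins_rate:
            out.append(rng.choice("ACGT"))  # insertion after this base
    return start, "".join(out)

rng = random.Random(0)  # fixed seed for reproducibility
reference = "ACGT" * 25
start, read = simulate_read(reference, 40, 0.01, 0.05, 0.03, rng)
print(start, len(read))
```

With all three rates set to zero the read is an exact copy of the sampled reference window, which is a convenient sanity check.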
Epilepsy-related sudden unexpected death: targeted molecular analysis of inherited heart disease genes using next-generation DNA sequencing.

PubMed

Hata, Yukiko; Yoshida, Koji; Kinoshita, Koshi; Nishida, Naoki

2017-05-01

Inherited heart disease causing electric instability in the heart has been suggested to be a risk factor for sudden unexpected death in epilepsy (SUDEP). The purpose of this study was to reveal the correlation between epilepsy-related sudden unexpected death (SUD) and inherited heart disease. Twelve epilepsy-related SUD cases (seven males and five females, aged 11-78 years) were examined. Nine cases fulfilled the criteria of SUDEP, and three cases died by drowning. In addition to examining three major epilepsy-related genes, we used next-generation sequencing (NGS) to examine 73 inherited heart disease-related genes. We detected both known pathogenic variants and rare variants with minor allele frequencies of <0.5%. The pathogenicity of these variants was evaluated and graded by eight in silico predictive algorithms. Six known and six potential rare variants were detected. Among these, three known variants of LDB3, DSC2 and KCNE1 and three potential rare variants of MYH6, DSP and DSG2 were predicted by in silico analysis as possibly highly pathogenic in three of the nine SUDEP cases. Two of three cases with desmosome-related variants showed mild but possibly significant right ventricular dysplasia-like pathology. A case with LDB3 and MYH6 variants showed hypertrabeculation of the left ventricle and severe fibrosis of the cardiac conduction system. In the three drowning death cases, one case with a mildly prolonged QT interval had two variants in ANK2. This study shows that inherited heart disease may be a significant risk factor for SUD in some epilepsy cases, even if pathological findings of the heart had not progressed to an advanced stage of the disease. A combination of detailed pathological examination of the heart and gene analysis using NGS may be useful for evaluating arrhythmogenic potential of epilepsy-related SUD. © 2016 International Society of Neuropathology.
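The two filtering steps mentioned above, a minor allele frequency cutoff followed by grading with several in silico predictors, can be sketched as follows. The 0.5% cutoff comes from the abstract; the record layout, the vote threshold, and the variant identifiers are hypothetical and chosen only for illustration.

```python
def prioritize(variants, maf_cutoff=0.005, min_votes=5):
    """Keep rare variants and rank them by the number of 'damaging' calls.

    Each variant is a dict with keys 'id', 'maf' (fraction, not percent)
    and 'predictions' (one boolean per in silico predictor).
    """
    selected = []
    for v in variants:
        if v["maf"] >= maf_cutoff:
            continue  # too common to count as a rare variant
        votes = sum(v["predictions"])
        if votes >= min_votes:
            selected.append((v["id"], votes))
    return sorted(selected, key=lambda item: -item[1])

# Toy records (hypothetical identifiers and values).
variants = [
    {"id": "var_a", "maf": 0.001, "predictions": [True] * 6 + [False] * 2},
    {"id": "var_b", "maf": 0.020, "predictions": [True] * 8},
    {"id": "var_c", "maf": 0.004, "predictions": [True] * 3 + [False] * 5},
    {"id": "var_d", "maf": 0.000, "predictions": [True] * 8},
]
print(prioritize(variants))  # [('var_d', 8), ('var_a', 6)]
```

How the study actually combined the eight predictors into a grade is not described in the abstract, so the simple vote count is only one possible choice.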
Accuracy of the high-throughput amplicon sequencing to identify species within the genus Aspergillus.

PubMed

Lee, Seungeun; Yamamoto, Naomichi

2015-12-01

This study characterized the accuracy of high-throughput amplicon sequencing to identify species within the genus Aspergillus. To this end, we sequenced the internal transcribed spacer 1 (ITS1), β-tubulin (BenA), and calmodulin (CaM) gene encoding sequences as DNA markers from eight reference Aspergillus strains with known identities using 300-bp sequencing on the Illumina MiSeq platform, and compared them with the BLASTn outputs. The identifications with the sequences longer than 250 bp were accurate at the section rank, with some ambiguities observed at the species rank due to mostly cross detection of sibling species. Additionally, in silico analysis was performed to predict the identification accuracy for all species in the genus Aspergillus, where 107, 210, and 187 species were predicted to be identifiable down to the species rank based on ITS1, BenA, and CaM, respectively. Finally, air filter samples were analysed to quantify the relative abundances of Aspergillus species in outdoor air. The results were reproducible across biological duplicates both at the species and section ranks, but not strongly correlated between ITS1 and BenA, suggesting the Aspergillus detection can be taxonomically biased depending on the selection of the DNA markers and/or primers. Copyright © 2015 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.
Draft genome sequence of a CTX-M-8, CTX-M-55 and FosA3 co-producing Escherichia coli ST117/B2 isolated from an asymptomatic carrier.

PubMed

Fernandes, Miriam R; Sellera, Fábio P; Moura, Quézia; Souza, Tiago A; Lincopan, Nilton

2018-03-01

Asymptomatic carriers can act as reservoirs of multidrug-resistant (MDR) bacteria. The aim of this study was to describe the draft genome sequence of an MDR Escherichia coli lineage recovered from a faecal sample of a healthy carrier. Genomic DNA was sequenced on an Illumina NextSeq platform. Sequence reads were de novo assembled using CLC Genomics Workbench and the whole genome sequence was evaluated through bioinformatics tools available from the Center of Genomic Epidemiology as well as additional in silico analysis. The genome size was calculated as 5178340 bp, with 5442 protein-coding sequences and 5492 total genes. Presence of the blaCTX-M-8, blaCTX-M-55 and fosA3 genes was detected in addition to other antimicrobial resistance genes. Interestingly, the strain was assigned to serotype O8:H4-fimH97 and was classified within the highly virulent phylogroup B2. This draft genome can provide helpful information to elucidate genetic features that contribute to colonisation and adaptation of MDR and virulent pathogens in asymptomatic carriers. Copyright © 2018 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
In silico pharmacology for drug discovery: applications to targets and beyond

PubMed Central

Ekins, S; Mestres, J; Testa, B

2007-01-01

Computational (in silico) methods have been developed and widely applied to pharmacology hypothesis development and testing. These in silico methods include databases, quantitative structure-activity relationships, similarity searching, pharmacophores, homology models and other molecular modeling, machine learning, data mining, network analysis tools and data analysis tools that use a computer. Such methods have seen frequent use in the discovery and optimization of novel molecules with affinity to a target, the clarification of absorption, distribution, metabolism, excretion and toxicity properties as well as physicochemical characterization. The first part of this review discussed the methods that have been used for virtual ligand and target-based screening and profiling to predict biological activity. The aim of this second part of the review is to illustrate some of the varied applications of in silico methods for pharmacology in terms of the targets addressed. We will also discuss some of the advantages and disadvantages of in silico methods with respect to in vitro and in vivo methods for pharmacology research. Our conclusion is that the in silico pharmacology paradigm is ongoing and presents a rich array of opportunities that will assist in expediting the discovery of new targets, and ultimately lead to compounds with predicted biological activity for these novel targets. PMID:17549046
Identification of the PLA2G6 c.1579G>A Missense Mutation in Papillon Dog Neuroaxonal Dystrophy Using Whole Exome Sequencing Analysis

PubMed Central

Tsuboi, Masaya; Watanabe, Manabu; Nibe, Kazumi; Yoshimi, Natsuko; Kato, Akihisa; Sakaguchi, Masahiro; Yamato, Osamu; Tanaka, Miyuu; Kuwamura, Mitsuru; Kushida, Kazuya; Harada, Tomoyuki; Chambers, James Kenn; Sugano, Sumio; Uchida, Kazuyuki; Nakayama, Hiroyuki

2017-01-01

Whole exome sequencing (WES) has become a common tool for identifying genetic causes of human inherited disorders, and it has also recently been applied to canine genome research. We conducted WES analysis of neuroaxonal dystrophy (NAD), a neurodegenerative disease that sporadically occurs worldwide in Papillon dogs. The disease is considered an autosomal recessive monogenic disease, which is histopathologically characterized by severe axonal swelling, known as “spheroids,” throughout the nervous system. By sequencing all eleven DNA samples from one NAD-affected Papillon dog and her parents, two unrelated NAD-affected Papillon dogs, and six unaffected control Papillon dogs, we identified 10 candidate mutations. Among them, three candidates were determined to be “deleterious” by in silico pathogenesis evaluation. By subsequent massive screening by TaqMan genotyping analysis, only the PLA2G6 c.1579G>A mutation had an association with the presence or absence of the disease, suggesting that it may be a causal mutation of canine NAD. As a human homologue of this gene is a causative gene for infantile neuroaxonal dystrophy, this canine phenotype may serve as a good animal model for human disease. The results of this study also indicate that WES analysis is a powerful tool for exploring canine hereditary diseases, especially in rare monogenic hereditary diseases. PMID:28107443
Chalcone synthase genes from milk thistle (Silybum marianum): isolation and expression analysis.

PubMed

Sanjari, Sepideh; Shobbar, Zahra Sadat; Ebrahimi, Mohsen; Hasanloo, Tahereh; Sadat-Noori, Seyed-Ahmad; Tirnaz, Soodeh

2015-12-01

Silymarin is a flavonoid compound derived from milk thistle (Silybum marianum) seeds which has several pharmacological applications. Chalcone synthase (CHS) is a key enzyme in the biosynthesis of flavonoids; thereby, the identification of CHS encoding genes in milk thistle plant can be of great importance. In the current research, fragments of CHS genes were amplified using degenerate primers based on the conserved parts of Asteraceae CHS genes, and then cloned and sequenced. Analysis of the resultant nucleotide and deduced amino acid sequences led to the identification of two different members of the CHS gene family, SmCHS1 and SmCHS2. The full-length cDNA of a third member (SmCHS3) was isolated by rapid amplification of cDNA ends (RACE); its open reading frame contained 1239 bp including exon 1 (190 bp) and exon 2 (1049 bp), encoding 63 and 349 amino acids, respectively. In silico analysis showed that the SmCHS3 sequence contains all the conserved CHS sites and shares high homology with CHS proteins from other plants. Real-time PCR analysis indicated that SmCHS1 and SmCHS3 had the highest transcript level in petals in the early flowering stage and in the stem of five upper leaves, followed by five upper leaves in the mid-flowering stage, which are most probably involved in anthocyanin and silymarin biosynthesis.
A novel ENU-induced mutation, peewee, causes dwarfism in the mouse

PubMed Central

Bon-Ryon, Lee; Kano, Kiyoshi; Young, Jay; John, Simon; Nishina, Patsy M; Naggert, Jurgen K; Naito, Kunihiko

2010-01-01

We identified a novel fertile, autosomal recessive mutation, called peewee, that results in dwarfing, in a region-specific ENU-induced mutagenesis. These mice at litter size were smaller than those of other strains. Histological analysis revealed that the major organs appear normal, but abnormalities in cellular proliferation were observed in bone, liver and testis. Haplotype analysis localized the peewee gene to a 3.3-Mb region between D5Mit83 and D5Mit356.3. There are 18 genes in this linkage area, and we also performed in silico mapping using the PosMed℠ program, which searches for connections among keywords and genes in an interval, but no similar phenotype descriptions were found for these genes. In the peewee mutant compared to the normal C57BL/6J mouse, only Slc10a4 expression was lower. Our preliminary mutation analysis examining the nucleotide sequence of three exons, two introns and an untranslated region of Slc10a4 did not find any sequence difference between the peewee mouse and the C57BL/6J mouse. Detailed analysis of peewee mice might provide novel molecular insights into the complex mechanisms regulating body growth. PMID:19513787
Comparative Genomic Analysis of Lactobacillus plantarum GB-LP1 Isolated from Traditional Korean Fermented Food.

PubMed

Yu, Jihyun; Ahn, Sojin; Kim, Kwondo; Caetano-Anolles, Kelsey; Lee, Chanho; Kang, Jungsun; Cho, Kyungjin; Yoon, Sook Hee; Kang, Dae-Kyung; Kim, Heebal

2017-08-28

As probiotics play an important role in maintaining a healthy gut flora environment through antitoxin activity and inhibition of pathogen colonization, they have been of interest to the medical research community for quite some time now. Probiotic bacteria such as Lactobacillus plantarum, which can be found in fermented food, are of particular interest given their easy accessibility. We performed whole-genome sequencing and genomic analysis on a GB-LP1 strain of L. plantarum isolated from Korean traditional fermented food; this strain is well known for its functions in immune response, suppression of pathogen growth, and antitoxin effects. The complete genome sequence of GB-LP1 is a single chromosome of 3,040,388 bp with 2,899 predicted open reading frames. Genomic analysis of GB-LP1 revealed two CRISPR regions and genes showing accelerated evolution, which may have antibiotic and antitoxin functions. The aim of the present study was to predict strain-specific genomic characteristics and assess the potential of this new strain as lactic acid bacteria at the genomic level using in silico analysis. These results provide insight into the L. plantarum species as well as confirm the possibility of its utility as a candidate probiotic.
Large scale analysis of the mutational landscape in HT-SELEX improves aptamer discovery

PubMed Central

Hoinka, Jan; Berezhnoy, Alexey; Dao, Phuong; Sauna, Zuben E.; Gilboa, Eli; Przytycka, Teresa M.

2015-01-01

High-Throughput (HT) SELEX combines SELEX (Systematic Evolution of Ligands by EXponential Enrichment), a method for aptamer discovery, with massively parallel sequencing technologies. This emerging technology provides data for a global analysis of the selection process and for simultaneous discovery of a large number of candidates but currently lacks dedicated computational approaches for their analysis. To close this gap, we developed novel in-silico methods to analyze HT-SELEX data and utilized them to study the emergence of polymerase errors during HT-SELEX. Rather than considering these errors as a nuisance, we demonstrated their utility for guiding aptamer discovery. Our approach builds on two main advancements in aptamer analysis: AptaMut—a novel technique allowing for the identification of polymerase errors conferring an improved binding affinity relative to the ‘parent’ sequence and AptaCluster—an aptamer clustering algorithm which is to our best knowledge, the only currently available tool capable of efficiently clustering entire aptamer pools. We applied these methods to an HT-SELEX experiment developing aptamers against Interleukin 10 receptor alpha chain (IL-10RA) and experimentally confirmed our predictions thus validating our computational methods. PMID:25870409
In silico lineage tracing through single cell transcriptomics identifies a neural stem cell population in planarians.

PubMed

Molinaro, Alyssa M; Pearson, Bret J

2016-04-27

The planarian Schmidtea mediterranea is a master regenerator with a large adult stem cell compartment. The lack of transgenic labeling techniques in this animal has hindered the study of lineage progression and has made understanding the mechanisms of tissue regeneration a challenge. However, recent advances in single-cell transcriptomics and analysis methods allow for the discovery of novel cell lineages as differentiation progresses from stem cell to terminally differentiated cell. Here we apply pseudotime analysis and single-cell transcriptomics to identify adult stem cells belonging to specific cellular lineages and identify novel candidate genes for future in vivo lineage studies. We purify 168 single stem and progeny cells from the planarian head, which were subjected to single-cell RNA sequencing (scRNAseq). Pseudotime analysis with Waterfall and gene set enrichment analysis predicts a molecularly distinct neoblast sub-population with neural character (νNeoblasts) as well as a novel alternative lineage. Using the predicted νNeoblast markers, we demonstrate that a novel proliferative stem cell population exists adjacent to the brain. scRNAseq coupled with in silico lineage analysis offers a new approach for studying lineage progression in planarians. The lineages identified here are extracted from a highly heterogeneous dataset with minimal prior knowledge of planarian lineages, demonstrating that lineage purification by transgenic labeling is not a prerequisite for this approach. The identification of the νNeoblast lineage demonstrates the usefulness of the planarian system for computationally predicting cellular lineages in an adult context coupled with in vivo verification.
SeqAPASS: Sequence alignment to predict across-species susceptibility

EPA Science Inventory

Efforts to shift the toxicity testing paradigm from whole organism studies to those focused on the initiation of toxicity and relevant pathways have led to increased utilization of in vitro and in silico methods. Hence the emergence of high through-put screening (HTS) programs, s...
Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding.

PubMed

Lan, Freeman; Demaree, Benjamin; Ahmed, Noorsher; Abate, Adam R

2017-07-01

The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the de-convolution of genetic heterogeneity in diverse cell populations.
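The in silico sorting step can be pictured as two operations: grouping reads that share a droplet barcode, then labelling each group by a characteristic sequence found among its reads. The sketch below assumes the barcode is a fixed-length read prefix and uses made-up marker sequences; neither detail is taken from the SiC-seq protocol.

```python
from collections import defaultdict

def group_by_barcode(reads, barcode_len):
    """Map each barcode (read prefix) to the list of insert sequences."""
    groups = defaultdict(list)
    for read in reads:
        groups[read[:barcode_len]].append(read[barcode_len:])
    return dict(groups)

def sort_groups_by_marker(groups, markers):
    """Label each barcode group by the first marker found in any of its reads."""
    labels = {}
    for barcode, inserts in groups.items():
        labels[barcode] = "unassigned"
        for name, marker in markers.items():
            if any(marker in insert for insert in inserts):
                labels[barcode] = name
                break
    return labels

# Toy reads: 4-base barcode followed by the genomic insert (hypothetical).
reads = ["AAAAGGCCTTAC", "AAAATTTTGGCC", "CCCCACGTACGT"]
groups = group_by_barcode(reads, barcode_len=4)
markers = {"taxon_x": "GGCC", "taxon_y": "ACGTACG"}
print(sort_groups_by_marker(groups, markers))
# {'AAAA': 'taxon_x', 'CCCC': 'taxon_y'}
```

In practice the labelling would rely on alignment to reference genomes or marker databases rather than exact substring matches.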
Characterization of irritans mariner-like elements in the olive fruit fly Bactrocera oleae (Diptera: Tephritidae): evolutionary implications.

PubMed

Ben Lazhar-Ajroud, Wafa; Caruso, Aurore; Mezghani, Maha; Bouallegue, Maryem; Tastard, Emmanuelle; Denis, Françoise; Rouault, Jacques-Deric; Makni, Hanem; Capy, Pierre; Chénais, Benoît; Makni, Mohamed; Casse, Nathalie

2016-08-01

Genomic variation among species is commonly driven by transposable element (TE) invasion; thus, the pattern of TEs in a genome allows drawing an evolutionary history of the studied species. This paper reports in vitro and in silico detection and characterization of irritans mariner-like elements (MLEs) in the genome and transcriptome of Bactrocera oleae (Rossi) (Diptera: Tephritidae). Eleven irritans MLE sequences have been isolated in vitro using terminal inverted repeats (TIRs) as primers, and 215 have been extracted in silico from the sequenced genome of B. oleae. Additionally, the sequenced genomes of Bactrocera tryoni (Froggatt) and Bactrocera cucurbitae (Diptera: Tephritidae) have been explored to identify irritans MLEs. A total of 129 sequences from B. tryoni have been extracted, while the genome of B. cucurbitae appears probably devoid of irritans MLEs. All detected irritans MLEs are defective due to several mutations and are clustered together in a monophyletic group suggesting a common ancestor. The evolutionary history and dynamics of these TEs are discussed in relation with the phylogenetic distribution of their hosts. The knowledge on the structure, distribution, dynamic, and evolution of irritans MLEs in Bactrocera species contributes to the understanding of both their evolutionary history and the invasion history of their hosts. This could also be the basis for genetic control strategies using transposable elements.

Quantitative Antisense Screening and Optimization for Exon 51 Skipping in Duchenne Muscular Dystrophy.

PubMed

Echigoya, Yusuke; Lim, Kenji Rowel Q; Trieu, Nhu; Bao, Bo; Miskew Nichols, Bailey; Vila, Maria Candida; Novak, James S; Hara, Yuko; Lee, Joshua; Touznik, Aleksander; Mamchaoui, Kamel; Aoki, Yoshitsugu; Takeda, Shin'ichi; Nagaraju, Kanneboyina; Mouly, Vincent; Maruyama, Rika; Duddy, William; Yokota, Toshifumi

2017-11-01

Duchenne muscular dystrophy (DMD), the most common lethal genetic disorder, is caused by mutations in the dystrophin (DMD) gene. Exon skipping is a therapeutic approach that uses antisense oligonucleotides (AOs) to modulate splicing and restore the reading frame, leading to truncated, yet functional protein expression. In 2016, the US Food and Drug Administration (FDA) conditionally approved the first phosphorodiamidate morpholino oligomer (morpholino)-based AO drug, eteplirsen, developed for DMD exon 51 skipping. Eteplirsen remains controversial with insufficient evidence of its therapeutic effect in patients. We recently developed an in silico tool to design antisense morpholino sequences for exon skipping. Here, we designed morpholino AOs targeting DMD exon 51 using the in silico tool and quantitatively evaluated the effects in immortalized DMD muscle cells in vitro. To our surprise, most of the newly designed morpholinos induced exon 51 skipping more efficiently compared with the eteplirsen sequence. The efficacy of exon 51 skipping and rescue of dystrophin protein expression were increased by up to more than 12-fold and 7-fold, respectively, compared with the eteplirsen sequence. Significant in vivo efficacy of the most effective morpholino, determined in vitro, was confirmed in mice carrying the human DMD gene. These findings underscore the importance of AO sequence optimization for exon skipping. Copyright © 2017 The American Society of Gene and Cell Therapy. Published by Elsevier Inc. All rights reserved.
Two combinatorial optimization problems for SNP discovery using base-specific cleavage and mass spectrometry.

PubMed

Chen, Xin; Wu, Qiong; Sun, Ruimin; Zhang, Louxin

2012-01-01

The discovery of single-nucleotide polymorphisms (SNPs) has important implications in a variety of genetic studies on human diseases and biological functions. One valuable approach proposed for SNP discovery is based on base-specific cleavage and mass spectrometry. However, it is still very challenging to achieve the full potential of this SNP discovery approach. In this study, we formulate two new combinatorial optimization problems. While both problems are aimed at reconstructing the sample sequence that would attain the minimum number of SNPs, they search over different candidate sequence spaces. The first problem, denoted as SNP - MSP, limits its search to sequences whose in silico predicted mass spectra have all their signals contained in the measured mass spectra. In contrast, the second problem, denoted as SNP - MSQ, limits its search to sequences whose in silico predicted mass spectra instead contain all the signals of the measured mass spectra. We present an exact dynamic programming algorithm for solving the SNP - MSP problem and also show that the SNP - MSQ problem is NP-hard by a reduction from a restricted variation of the 3-partition problem. We believe that an efficient solution to either problem above could offer a seamless integration of information in four complementary base-specific cleavage reactions, thereby improving the capability of the underlying biotechnology for sensitive and accurate SNP discovery.
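The difference between the two search spaces is a difference in the direction of set containment between predicted and measured signals. The sketch below illustrates this with a single cleavage reaction and uses fragment base composition as a stand-in for fragment mass; both simplifications, and all function names, are assumptions for illustration rather than the formulation used in the paper.

```python
def cleave(seq, base):
    """Cut the sequence after every occurrence of the given base."""
    fragments, start = [], 0
    for i, b in enumerate(seq):
        if b == base:
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])
    return fragments

def predicted_signals(seq, base):
    """Set of fragment compositions (sorted letters), a proxy for masses."""
    return {"".join(sorted(f)) for f in cleave(seq, base)}

def is_msp_candidate(seq, base, measured):
    """MSP-style constraint: every predicted signal appears in the measurement."""
    return predicted_signals(seq, base) <= measured

def is_msq_candidate(seq, base, measured):
    """MSQ-style constraint: every measured signal is predicted by the sequence."""
    return measured <= predicted_signals(seq, base)

seq = "ACGTTAGC"
print(cleave(seq, "T"))  # ['ACGT', 'T', 'AGC']
measured = {"ACGT", "T", "ACG", "CC"}
print(is_msp_candidate(seq, "T", measured))  # True
print(is_msq_candidate(seq, "T", measured))  # False
```

The optimization problems themselves additionally minimize the number of SNPs relative to a reference sequence, which this containment check does not attempt.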
SeqAPASS: Sequence alignment to predict across-species ...

EPA Pesticide Factsheets

Efforts to shift the toxicity testing paradigm from whole organism studies to those focused on the initiation of toxicity and relevant pathways have led to increased utilization of in vitro and in silico methods. Hence the emergence of high-throughput screening (HTS) programs, such as U.S. EPA ToxCast, and application of the adverse outcome pathway (AOP) framework for identifying and defining biological key events triggered upon perturbation of molecular initiating events and leading to adverse outcomes occurring at a level of organization relevant for risk assessment [1]. With these recent initiatives to harness the power of “the pathway” in describing and evaluating toxicity comes the need to extrapolate data beyond the model species. Sequence alignment to predict across-species susceptibility (SeqAPASS) is a web-based tool that allows the user to begin to understand how broadly HTS data or AOP constructs may plausibly be extrapolated across species, while describing the relative intrinsic susceptibility of different taxa to chemicals with known modes of action (e.g., pharmaceuticals and pesticides). The tool rapidly and strategically assesses available molecular target information to describe protein sequence similarity at the primary amino acid sequence, conserved domain, and individual amino acid residue levels. This in silico approach to species extrapolation was designed to automate and streamline the relatively complex and time-consuming process of co
Improving draft genome contiguity with reference-derived in silico mate-pair libraries.

PubMed

Grau, José Horacio; Hackl, Thomas; Koepfli, Klaus-Peter; Hofreiter, Michael

2018-05-01

Contiguous genome assemblies are a highly valued biological resource because of the higher number of completely annotated genes and genomic elements that are usable compared to fragmented draft genomes. Nonetheless, contiguity is difficult to obtain if only low coverage data and/or only distantly related reference genome assemblies are available. In order to improve genome contiguity, we have developed Cross-Species Scaffolding, a new pipeline that imports long-range distance information directly into the de novo assembly process by constructing mate-pair libraries in silico. We show how genome assembly metrics and gene prediction dramatically improve with our pipeline by assembling two primate genomes solely based on ∼30x coverage of shotgun sequencing data.
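The core idea of an in silico mate-pair library is to cut read pairs out of a reference at a known distance from each other, so that the pair carries long-range information. The sketch below does this on a plain string; the function names, the tiling step, and the read orientation (second read reverse-complemented) are assumptions for illustration and may differ from what the Cross-Species Scaffolding pipeline does.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def in_silico_mate_pairs(reference, read_len, insert_size, step):
    """Tile the reference and emit (left_read, right_read) pairs.

    Each pair comes from the two ends of a window of length insert_size;
    the right read is reported on the opposite strand.
    """
    if read_len * 2 > insert_size:
        raise ValueError("insert_size must hold two non-overlapping reads")
    pairs = []
    for start in range(0, len(reference) - insert_size + 1, step):
        window = reference[start:start + insert_size]
        left = window[:read_len]
        right = reverse_complement(window[-read_len:])
        pairs.append((left, right))
    return pairs

reference = "ACGTACGTAC" + "GGGGGGGGGG" + "TTTTTCCCCC"
print(in_silico_mate_pairs(reference, read_len=5, insert_size=20, step=10))
# [('ACGTA', 'CCCCC'), ('GGGGG', 'GGGGG')]
```

Several insert sizes are typically generated so that the assembler can bridge gaps of different lengths.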
Prevalence and Identification of Burkholderia pseudomallei and Near-Neighbor Species in the Malabar Coastal Region of India

PubMed Central

Peddayelachagiri, Bhavani V.; Paul, Soumya; Nagaraj, Sowmya; Gogoi, Madhurjya; Sripathy, Murali H.; Batra, Harsh V.

2016-01-01

Accurate identification of pathogens with biowarfare importance requires detection tools that specifically differentiate them from near-neighbor species. Burkholderia pseudomallei, the causative agent of the fatal disease melioidosis, is one such biothreat agent whose differentiation from its near-neighbor species is always a challenge. This is because of its phenotypic similarity with other Burkholderia species, which have a widespread geographical distribution with shared environmental niches. Melioidosis is a major public health concern in endemic regions including Southeast Asia and northern Australia. In India, the disease is still considered to be emerging. Prevalence surveys of this saprophytic bacterium in the environment are under-reported in the country. A major challenge in this case is the specific identification and differentiation of B. pseudomallei from the growing list of species of the Burkholderia genus. The objectives of this study included examining the prevalence of B. pseudomallei and near-neighbor species in the coastal region of South India and development of a novel detection tool for specific identification and differentiation of Burkholderia species. Briefly, we analyzed soil and water samples collected from the Malabar coastal region of Kerala, South India for prevalence of B. pseudomallei. The presumptive Burkholderia isolates were identified using a recA PCR assay. The recA PCR assay identified 22 of the total 40 presumptive isolates as Burkholderia strains (22.72% and 77.27% B. pseudomallei and non-pseudomallei Burkholderia, respectively). In order to identify each isolate screened, we performed recA and 16S rDNA sequencing. This two-gene sequencing revealed that the presumptive isolates included B. pseudomallei, non-pseudomallei Burkholderia as well as non-Burkholderia strains. Furthermore, a gene termed D-beta hydroxybutyrate dehydrogenase (bdha) was studied both in silico and in vitro for accurate detection of the Burkholderia genus. The optimized bdha-based PCR assay, when evaluated on the Burkholderia isolates of this study, was found to be highly specific (100%), and a detection sensitivity of 10 pg/μl of purified gDNA was recorded. Nucleotide sequence variation of bdha between species, as per in silico analysis, ranged from 8 to 29% within the target stretch of 730 bp, highlighting the potential utility of the bdha sequencing method in specific detection of Burkholderia species. Further, sequencing of the 730 bp bdha PCR amplicon of each Burkholderia strain isolated could differentiate the species, and the data were comparable with recA sequence data of the strains. All sequencing results obtained were submitted to the NCBI database. Bayesian phylogenetic analysis of bdha in comparison with recA and 16S rDNA showed that the bdha gene provided comparable identification of Burkholderia species. PMID:27632353
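An interspecies variation figure such as the 8 to 29% quoted for the bdha target can be computed from a pairwise alignment as the share of compared columns that differ. The helper below is a minimal sketch that assumes the two sequences are already aligned and skips gap columns; how the authors treated gaps is not stated, and the toy sequences are not real bdha data.

```python
def percent_difference(a, b, gap="-"):
    """Percentage of aligned, ungapped columns at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x != gap and y != gap]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    differences = sum(x != y for x, y in compared)
    return 100.0 * differences / len(compared)

# Toy aligned fragments (hypothetical).
print(percent_difference("ACGTACGTAC", "ACGTTCGTAA"))  # 20.0
```

For a 730 bp stretch, 8% and 29% variation correspond to roughly 58 and 212 differing positions, respectively.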
Development of a Prokaryotic Universal Primer for Simultaneous Analysis of Bacteria and Archaea Using Next-Generation Sequencing

PubMed Central

Takahashi, Shunsuke; Tomita, Junko; Nishioka, Kaori; Hisada, Takayoshi; Nishijima, Miyuki

2014-01-01

For the analysis of microbial community structure based on 16S rDNA sequence diversity, sensitive and robust PCR amplification of 16S rDNA is a critical step. To obtain accurate microbial composition data, PCR amplification must be free of bias; however, amplifying all 16S rDNA species with equal efficiency from a sample containing a large variety of microorganisms remains challenging. Here, we designed a universal primer based on the V3-V4 hypervariable region of prokaryotic 16S rDNA for the simultaneous detection of Bacteria and Archaea in fecal samples from crossbred pigs (Landrace×Large white×Duroc) using an Illumina MiSeq next-generation sequencer. In-silico analysis showed that the newly designed universal prokaryotic primers matched approximately 98.0% of Bacteria and 94.6% of Archaea rRNA gene sequences in the Ribosomal Database Project database. For each sequencing reaction performed with the prokaryotic universal primer, an average of 69,330 (±20,482) reads were obtained, of which archaeal rRNA genes comprised approximately 1.2% to 3.2% of all prokaryotic reads. In addition, the detection frequency of Bacteria belonging to the phylum Verrucomicrobia, including members of the classes Verrucomicrobiae and Opitutae, was higher in the NGS analysis using the prokaryotic universal primer than that performed with the bacterial universal primer. Importantly, this new prokaryotic universal primer set had markedly lower bias than that of most previously designed universal primers. Our findings demonstrate that the prokaryotic universal primer set designed in the present study will permit the simultaneous detection of Bacteria and Archaea, and will therefore allow for a more comprehensive understanding of microbial community structures in environmental samples. PMID:25144201
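The in silico coverage figures reported above are the fraction of database sequences that contain a site matching the primer. The sketch below shows one way to compute such a fraction for a degenerate primer using the standard IUPAC nucleotide codes; the toy primer and sequences are hypothetical, and the mismatch tolerance is an assumed parameter, not the setting used in the study.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_matches(primer, site, max_mismatches=0):
    """True if the site is compatible with the (possibly degenerate) primer."""
    if len(primer) != len(site):
        return False
    mismatches = sum(base not in IUPAC[code]
                     for code, base in zip(primer.upper(), site.upper()))
    return mismatches <= max_mismatches

def has_binding_site(primer, sequence, max_mismatches=0):
    """True if any window of the sequence matches the primer."""
    k = len(primer)
    return any(primer_matches(primer, sequence[i:i + k], max_mismatches)
               for i in range(len(sequence) - k + 1))

def coverage(primer, sequences, max_mismatches=0):
    """Fraction of sequences containing at least one matching site."""
    if not sequences:
        return 0.0
    hits = sum(has_binding_site(primer, s, max_mismatches) for s in sequences)
    return hits / len(sequences)

# Toy primer and database (hypothetical, forward strand only).
database = ["TTACAGCTT", "TTACTGTAA", "GGGGGGGG"]
print(round(coverage("ACNGY", database), 3))  # 0.667
```

A fuller check would also test the reverse primer on the opposite strand and require both sites within a plausible amplicon length.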
NAC transcription factor genes: genome-wide identification, phylogenetic, motif and cis-regulatory element analysis in pigeonpea (Cajanus cajan (L.) Millsp.).

PubMed

Satheesh, Viswanathan; Jagannadham, P Tej Kumar; Chidambaranathan, Parameswaran; Jain, P K; Srinivasan, R

2014-12-01

The NAC (NAM, ATAF and CUC) proteins are plant-specific transcription factors implicated in development and stress responses. In the present study 88 pigeonpea NAC genes were identified from the recently published draft genome of pigeonpea by using homology based and de novo prediction programmes. These sequences were further subjected to phylogenetic, motif and promoter analyses. In motif analysis, highly conserved motifs were identified in the NAC domain and also in the C-terminal region of the NAC proteins. A phylogenetic reconstruction using pigeonpea, Arabidopsis and soybean NAC genes revealed 33 putative stress-responsive pigeonpea NAC genes. Several stress-responsive cis-elements were identified through in silico analysis of the promoters of these putative stress-responsive genes. This analysis is the first report of NAC gene family in pigeonpea and will be useful for the identification and selection of candidate genes associated with stress tolerance.
Depletion of Unwanted Nucleic Acid Templates by Selective Cleavage: LNAzymes, Catalytically Active Oligonucleotides Containing Locked Nucleic Acids, Open a New Window for Detecting Rare Microbial Community Members

PubMed Central

Dolinšek, Jan; Dorninger, Christiane; Lagkouvardos, Ilias; Wagner, Michael

2013-01-01

Many studies of molecular microbial ecology rely on the characterization of microbial communities by PCR amplification, cloning, sequencing, and phylogenetic analysis of genes encoding rRNAs or functional marker enzymes. However, if the established clone libraries are dominated by one or a few sequence types, the cloned diversity is difficult to analyze by random clone sequencing. Here we present a novel approach to deplete unwanted sequence types from complex nucleic acid mixtures prior to cloning and downstream analyses. It employs catalytically active oligonucleotides containing locked nucleic acids (LNAzymes) for the specific cleavage of selected RNA targets. When combined with in vitro transcription and reverse transcriptase PCR, this LNAzyme-based technique can be used with DNA or RNA extracts from microbial communities. The simultaneous application of more than one specific LNAzyme allows the concurrent depletion of different sequence types from the same nucleic acid preparation. This new method was evaluated with defined mixtures of cloned 16S rRNA genes and then used to identify accompanying bacteria in an enrichment culture dominated by the nitrite oxidizer “Candidatus Nitrospira defluvii.” In silico analysis revealed that the majority of publicly deposited rRNA-targeted oligonucleotide probes may be used as specific LNAzymes with no or only minor sequence modifications. This efficient and cost-effective approach will greatly facilitate tasks such as the identification of microbial symbionts in nucleic acid preparations dominated by plastid or mitochondrial rRNA genes from eukaryotic hosts, the detection of contaminants in microbial cultures, and the analysis of rare organisms in microbial communities of highly uneven composition. PMID:23263968
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

PubMed Central

Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

2011-01-01

Background: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
Discovery of secondary metabolites from Bacillus spp. biocontrol strains using genome mining and mass spectroscopy

USDA-ARS's Scientific Manuscript database

Genome sequencing, data mining and mass spectrometry were used to identify secondary metabolites produced by several Bacillus spp. biocontrol strains. These biocontrol strains have shown promise in managing Fusarium head blight in wheat. Draft genomes were produced and screened in silico using genom...
Biology in 'silico': The Bioinformatics Revolution.

ERIC Educational Resources Information Center

Bloom, Mark

2001-01-01

Explains the Human Genome Project (HGP) and efforts to sequence the human genome. Describes the role of bioinformatics in the project and considers it the genetics Swiss Army Knife, which has many different uses, for use in forensic science, medicine, agriculture, and environmental sciences. Discusses the use of bioinformatics in the high school…
Bioinformatics: A History of Evolution "In Silico"

ERIC Educational Resources Information Center

Ondrej, Vladan; Dvorak, Petr

2012-01-01

Bioinformatics, biological databases, and the worldwide use of computers have accelerated biological research in many fields, such as evolutionary biology. Here, we describe a primer of nucleotide sequence management and the construction of a phylogenetic tree with two examples; the two selected are from completely different groups of organisms:…
Surface proteome mining for identification of potential vaccine candidates against Campylobacter jejuni: an in silico approach.

PubMed

Mehla, Kusum; Ramana, Jayashree

2017-01-01

Campylobacter jejuni remains a major cause of human gastroenteritis, with an estimated annual incidence of 450 million infections worldwide. C. jejuni is a major burden to public health in both socioeconomically developing and industrialized nations. Virulence determinants involved in C. jejuni pathogenesis are multifactorial in nature and not yet fully understood. Despite the completion of the first C. jejuni genome project in 2000, there are currently no vaccines on the market against this pathogen. The traditional vaccinology approach is an arduous and time-intensive task. Omics techniques coupled with sequencing data have drawn researchers' attention as a way to reduce the time and resources spent on vaccine development. Recently, there has been a remarkable increase in the development of in silico analysis tools for efficiently mining biological information obscured in the genome. In silico approaches have been crucial for combating infectious diseases by accelerating the pace of vaccine development. This study employed a range of bioinformatics approaches for proteome-scale identification of peptide vaccine candidates. The whole proteome of C. jejuni was investigated for varied properties such as antigenicity, allergenicity, major histocompatibility class (MHC)-peptide interaction, immune cell processivity, HLA distribution, conservancy, and population coverage. Predicted epitopes were further tested for binding in the MHC groove using computational docking studies. The predicted epitopes were conserved, covered more than 80% of the world population, and were presented by MHC-I supertypes. We conclude by underscoring that the predicted epitopes are believed to expedite the development of successful vaccines to control or prevent C. jejuni infections, although the results need to be experimentally validated.
RNA-Seq and Gene Network Analysis Uncover Activation of an ABA-Dependent Signalosome During the Cork Oak Root Response to Drought

PubMed Central

Magalhães, Alexandre P.; Verde, Nuno; Reis, Francisca; Martins, Inês; Costa, Daniela; Lino-Neto, Teresa; Castro, Pedro H.; Tavares, Rui M.; Azevedo, Herlânder

2016-01-01

Quercus suber (cork oak) is a West Mediterranean species of key economic interest, being extensively explored for its ability to generate cork. Like other Mediterranean plants, Q. suber is significantly threatened by climatic changes, imposing the need to quickly understand its physiological and molecular adaptability to drought stress imposition. In the present report, we uncovered the differential transcriptome of Q. suber roots exposed to long-term drought, using an RNA-Seq approach. 454-sequencing reads were used to de novo assemble a reference transcriptome, and mapping of reads allowed the identification of 546 differentially expressed unigenes. These were enriched in both effector genes (e.g., LEA, chaperones, transporters) as well as regulatory genes, including transcription factors (TFs) belonging to various different classes, and genes associated with protein turnover. To further extend functional characterization, we identified the orthologs of differentially expressed unigenes in the model species Arabidopsis thaliana, which then allowed us to perform in silico functional inference, including gene network analysis for protein function, protein subcellular localization and gene co-expression, and in silico enrichment analysis for TFs and cis-elements. Results indicated the existence of extensive transcriptional regulatory events, including activation of ABA-responsive genes and ABF-dependent signaling. We were then able to establish that a core ABA-signaling pathway involving PP2C-SnRK2-ABF components was induced in stressed Q. suber roots, identifying a key mechanism in this species’ response to drought. PMID:26793200
PNA-COMBO-FISH: From combinatorial probe design in silico to vitality compatible, specific labelling of gene targets in cell nuclei

DOE Office of Scientific and Technical Information (OSTI.GOV)

Müller, Patrick; Rößler, Jens; Schwarz-Finsterle, Jutta

Recently, advantages concerning targeting specificity of PCR constructed oligonucleotide FISH probes in contrast to established FISH probes, e.g. BAC clones, have been demonstrated. These techniques, however, are still using labelling protocols with DNA denaturing steps applying harsh heat treatment with or without further denaturing chemical agents. COMBO-FISH (COMBinatorial Oligonucleotide FISH) allows the design of specific oligonucleotide probe combinations in silico. Thus, being independent from primer libraries or PCR laboratory conditions, the probe sequences extracted by computer sequence data base search can also be synthesized as single stranded PNA-probes (Peptide Nucleic Acid probes). Gene targets can be specifically labelled with at least about 20 PNA-probes obtaining visibly background free specimens. By using appropriately designed triplex forming oligonucleotides, the denaturing procedures can completely be omitted. These results reveal a significant step towards oligonucleotide-FISH maintaining the 3D-nanostructure and even the viability of the cell target. The method is demonstrated with the detection of Her2/neu and GRB7 genes, which are indicators in breast cancer diagnosis and therapy. - Highlights: • Denaturation free protocols preserve 3D architecture of chromosomes and nuclei. • Labelling sets are determined in silico for duplex and triplex binding. • Probes are produced chemically with freely chosen backbones and base variants. • Peptide nucleic acid backbones reduce hindering charge interactions. • Intercalating side chains stabilize binding of short oligonucleotides.
Distinct profiling of antimicrobial peptide families

PubMed Central

Khamis, Abdullah M.; Essack, Magbubah; Gao, Xin; Bajic, Vladimir B.

2015-01-01

Motivation: The increased prevalence of multi-drug resistant (MDR) pathogens heightens the need to design new antimicrobial agents. Antimicrobial peptides (AMPs) exhibit broad-spectrum potent activity against MDR pathogens and kills rapidly, thus giving rise to AMPs being recognized as a potential substitute for conventional antibiotics. Designing new AMPs using current in-silico approaches is, however, challenging due to the absence of suitable models, large number of design parameters, testing cycles, production time and cost. To date, AMPs have merely been categorized into families according to their primary sequences, structures and functions. The ability to computationally determine the properties that discriminate AMP families from each other could help in exploring the key characteristics of these families and facilitate the in-silico design of synthetic AMPs. Results: Here we studied 14 AMP families and sub-families. We selected a specific description of AMP amino acid sequence and identified compositional and physicochemical properties of amino acids that accurately distinguish each AMP family from all other AMPs with an average sensitivity, specificity and precision of 92.88%, 99.86% and 95.96%, respectively. Many of our identified discriminative properties have been shown to be compositional or functional characteristics of the corresponding AMP family in literature. We suggest that these properties could serve as guides for in-silico methods in design of novel synthetic AMPs. The methodology we developed is generic and has a potential to be applied for characterization of any protein family. Contact: vladimir.bajic@kaust.edu.sa Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25388148
In Silico PCR Tools for a Fast Primer, Probe, and Advanced Searching.

PubMed

Kalendar, Ruslan; Muterko, Alexandr; Shamekova, Malika; Zhambakin, Kabyl

2017-01-01

The polymerase chain reaction (PCR) is fundamental to molecular biology and is the most important practical molecular technique for the research laboratory. The principle of this technique has been further used and applied in many other simple or complex nucleic acid amplification technologies (NAAT). In parallel to laboratory "wet bench" experiments for nucleic acid amplification technologies, in silico or virtual (bioinformatics) approaches have been developed, among them in silico PCR analysis. In silico NAAT analysis is a useful and efficient complementary method to ensure the specificity of primers or probes for an extensive range of PCR applications, including homologous gene discovery, molecular diagnosis, DNA fingerprinting, and repeat searching. Predicting sensitivity and specificity of primers and probes requires a search to determine whether they match a database with an optimal number of mismatches, similarity, and stability. In the development of in silico bioinformatics tools for nucleic acid amplification technologies, the prospects for the development of new NAAT or similar approaches should be taken into account, including forward-looking and comprehensive analysis that is not limited to only one PCR technique variant. The software FastPCR and the online Java web tool are integrated tools for in silico PCR of linear and circular DNA, multiple primer or probe searches in large or small databases and for advanced search. These tools are suitable for processing of batch files that are essential for automation when working with large amounts of data. The FastPCR software is available for download at http://primerdigital.com/fastpcr.html and the online Java version at http://primerdigital.com/tools/pcr.html.
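
The core idea of in silico PCR on a linear template (locate a forward primer site, locate the reverse primer's binding site downstream, report the product in between) can be sketched as below. This is an illustrative toy only and does not reproduce how FastPCR works; the function names, the mismatch threshold, and the `max_product` cutoff are assumptions, and circular templates, degenerate bases, and thermodynamic stability are ignored.

```python
def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(primer, template, max_mismatches):
    """Start positions where primer aligns to template within the mismatch budget."""
    n = len(primer)
    return [
        i for i in range(len(template) - n + 1)
        if sum(a != b for a, b in zip(primer, template[i:i + n])) <= max_mismatches
    ]


def in_silico_pcr(template, forward, reverse, max_mismatches=1, max_product=3000):
    """Return (start, end, product) tuples for predicted amplicons on the plus strand."""
    template = template.upper()
    forward = forward.upper()
    # The reverse primer anneals to the plus strand as its reverse complement.
    rev_target = revcomp(reverse.upper())
    products = []
    for f in find_sites(forward, template, max_mismatches):
        for r in find_sites(rev_target, template, max_mismatches):
            end = r + len(rev_target)
            # Require the reverse site downstream of the forward site.
            if r >= f + len(forward) and end - f <= max_product:
                products.append((f, end, template[f:end]))
    return products


# Hypothetical template and primers for illustration.
template = "GGG" + "ATGCCGTA" + "TTTTTTTTTT" + "CCGATAGC" + "AAA"
print(in_silico_pcr(template, "ATGCCGTA", "GCTATCGG", max_mismatches=0))
```

Batch use, as the abstract mentions for automation, would amount to looping this call over many primer pairs and templates.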
Assessment of the predictive accuracy of five in silico prediction tools, alone or in combination, and two metaservers to classify long QT syndrome gene mutations.

PubMed

Leong, Ivone U S; Stuckey, Alexander; Lai, Daniel; Skinner, Jonathan R; Love, Donald R

2015-05-13

Long QT syndrome (LQTS) is an autosomal dominant condition predisposing to sudden death from malignant arrhythmia. Genetic testing identifies many missense single nucleotide variants of uncertain pathogenicity. Establishing genetic pathogenicity is an essential prerequisite to family cascade screening. Many laboratories use in silico prediction tools, either alone or in combination, or metaservers, in order to predict pathogenicity; however, their accuracy in the context of LQTS is unknown. We evaluated the accuracy of five in silico programs and two metaservers in the analysis of LQTS 1-3 gene variants. The in silico tools SIFT, PolyPhen-2, PROVEAN, SNPs&GO and SNAP, either alone or in all possible combinations, and the metaservers Meta-SNP and PredictSNP, were tested on 312 KCNQ1, KCNH2 and SCN5A gene variants that have previously been characterised by either in vitro or co-segregation studies as either "pathogenic" (283) or "benign" (29). The accuracy, sensitivity, specificity and Matthews Correlation Coefficient (MCC) were calculated to determine the best combination of in silico tools for each LQTS gene, and when all genes are combined. The best combination of in silico tools for KCNQ1 is PROVEAN, SNPs&GO and SIFT (accuracy 92.7%, sensitivity 93.1%, specificity 100% and MCC 0.70). The best combination of in silico tools for KCNH2 is SIFT and PROVEAN or PROVEAN, SNPs&GO and SIFT. Both combinations have the same scores for accuracy (91.1%), sensitivity (91.5%), specificity (87.5%) and MCC (0.62). In the case of SCN5A, SNAP and PROVEAN provided the best combination (accuracy 81.4%, sensitivity 86.9%, specificity 50.0%, and MCC 0.32). When all three LQT genes are combined, SIFT, PROVEAN and SNAP is the combination with the best performance (accuracy 82.7%, sensitivity 83.0%, specificity 80.0%, and MCC 0.44). 
Both metaservers performed better than the single in silico tools; however, they did not perform better than the best performing combination of in silico tools. The combination of in silico tools with the best performance is gene-dependent. The in silico tools reported here may have some value in assessing variants in the KCNQ1 and KCNH2 genes, but caution should be taken when the analysis is applied to SCN5A gene variants.
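
The four performance measures reported in this abstract follow from a 2x2 confusion matrix. The sketch below uses the standard definitions; the counts in the example call are hypothetical and are not taken from the study. MCC is worth reporting alongside accuracy when classes are imbalanced, as they are in a set of 283 pathogenic versus 29 benign variants.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and Matthews Correlation Coefficient.

    tp/fn count pathogenic variants predicted pathogenic/benign;
    tn/fp count benign variants predicted benign/pathogenic.
    Assumes each true class contains at least one variant.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
print(classification_metrics(tp=90, fn=10, tn=8, fp=2))
```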
Sequence homology between HLA-bound cytomegalovirus and human peptides: A potential trigger for alloreactivity

PubMed Central

Koparde, Vishal N.; Jameson-Lee, Maximilian; Elnasseh, Abdelrhman G.; Scalora, Allison F.; Kobulnicky, David J.; Serrano, Myrna G.; Roberts, Catherine H.; Buck, Gregory A.; Neale, Michael C.; Nixon, Daniel E.; Toor, Amir A.

2017-01-01

Human cytomegalovirus (hCMV) reactivation may often coincide with the development of graft-versus-host-disease (GVHD) in stem cell transplantation (SCT). Seventy seven SCT donor-recipient pairs (DRP) (HLA matched unrelated donor (MUD), n = 50; matched related donor (MRD), n = 27) underwent whole exome sequencing to identify single nucleotide polymorphisms (SNPs) generating alloreactive peptide libraries for each DRP (9-mer peptide-HLA complexes); Human CMV CROSS (Cross-Reactive Open Source Sequence) database was compiled from NCBI; HLA class I binding affinity for each DRPs HLA was calculated by NetMHCpan 2.8 and hCMV- derived 9-mers algorithmically compared to the alloreactive peptide-HLA complex libraries. Short consecutive (≥6) amino acid (AA) sequence homology matching hCMV to recipient peptides was considered for HLA-bound-peptide (IC50<500nM) cross reactivity. Of the 70,686 hCMV 9-mers contained within the hCMV CROSS database, an average of 29,658 matched the MRD DRP alloreactive peptides and 52,910 matched MUD DRP peptides (p<0.001). In silico analysis revealed multiple high affinity, immunogenic CMV-Human peptide matches (IC50<500 nM) expressed in GVHD-affected tissue-specific manner. hCMV+GVHD was found in 18 patients, 13 developing hCMV viremia before GVHD onset. Analysis of patients with GVHD identified potential cross reactive peptide expression within affected organs. We propose that hCMV peptide sequence homology with human alloreactive peptides may contribute to the pathophysiology of GVHD. PMID:28800601
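
The matching step described here (viral 9-mers compared against recipient peptides, with a run of at least six consecutive identical amino acids counted as homology) might be sketched as follows. This is one possible reading of the criterion, not the authors' code: the function names and toy sequences are hypothetical, the shared run is allowed at any offset in either peptide, and the HLA binding affinity filter (IC50 < 500 nM) is left out.

```python
def kmers(protein, k=9):
    """All overlapping k-mers of a protein sequence."""
    return [protein[i:i + k] for i in range(len(protein) - k + 1)]


def longest_shared_run(a, b):
    """Length of the longest contiguous substring common to a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def cross_reactive_pairs(viral_protein, host_peptides, k=9, min_run=6):
    """Pairs of (viral k-mer, host peptide) sharing at least min_run consecutive residues."""
    return [
        (v, h)
        for v in kmers(viral_protein, k)
        for h in host_peptides
        if longest_shared_run(v, h) >= min_run
    ]


# Hypothetical sequences for illustration.
pairs = cross_reactive_pairs("MKTLLVAGSAW", ["QQTLLVAGQ", "AAAAAAAAA"])
print(pairs)
```

At the scale reported in the abstract (tens of thousands of 9-mers per pair), an index of 6-mers would be a more practical design than this all-against-all comparison.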
Sequence homology between HLA-bound cytomegalovirus and human peptides: A potential trigger for alloreactivity.

PubMed

Hall, Charles E; Koparde, Vishal N; Jameson-Lee, Maximilian; Elnasseh, Abdelrhman G; Scalora, Allison F; Kobulnicky, David J; Serrano, Myrna G; Roberts, Catherine H; Buck, Gregory A; Neale, Michael C; Nixon, Daniel E; Toor, Amir A

2017-01-01

Human cytomegalovirus (hCMV) reactivation may often coincide with the development of graft-versus-host-disease (GVHD) in stem cell transplantation (SCT). Seventy seven SCT donor-recipient pairs (DRP) (HLA matched unrelated donor (MUD), n = 50; matched related donor (MRD), n = 27) underwent whole exome sequencing to identify single nucleotide polymorphisms (SNPs) generating alloreactive peptide libraries for each DRP (9-mer peptide-HLA complexes); Human CMV CROSS (Cross-Reactive Open Source Sequence) database was compiled from NCBI; HLA class I binding affinity for each DRPs HLA was calculated by NetMHCpan 2.8 and hCMV- derived 9-mers algorithmically compared to the alloreactive peptide-HLA complex libraries. Short consecutive (≥6) amino acid (AA) sequence homology matching hCMV to recipient peptides was considered for HLA-bound-peptide (IC50<500nM) cross reactivity. Of the 70,686 hCMV 9-mers contained within the hCMV CROSS database, an average of 29,658 matched the MRD DRP alloreactive peptides and 52,910 matched MUD DRP peptides (p<0.001). In silico analysis revealed multiple high affinity, immunogenic CMV-Human peptide matches (IC50<500 nM) expressed in GVHD-affected tissue-specific manner. hCMV+GVHD was found in 18 patients, 13 developing hCMV viremia before GVHD onset. Analysis of patients with GVHD identified potential cross reactive peptide expression within affected organs. We propose that hCMV peptide sequence homology with human alloreactive peptides may contribute to the pathophysiology of GVHD.

Characterization of a novel ADAM protease expressed by Pneumocystis carinii.

PubMed

Kennedy, Cassie C; Kottom, Theodore J; Limper, Andrew H

2009-08-01

Pneumocystis species are opportunistic fungal pathogens that cause severe pneumonia in immunocompromised hosts. Recent evidence has suggested that unidentified proteases are involved in Pneumocystis life cycle regulation. Proteolytically active ADAM (named for "a disintegrin and metalloprotease") family molecules have been identified in some fungal organisms, such as Aspergillus fumigatus and Schizosaccharomyces pombe, and some have been shown to participate in life cycle regulation. Accordingly, we sought to characterize ADAM-like molecules in the fungal opportunistic pathogen, Pneumocystis carinii (PcADAM). After an in silico search of the P. carinii genomic sequencing project identified a 329-bp partial sequence with homology to known ADAM proteins, the full-length PcADAM sequence was obtained by PCR extension cloning, yielding a final coding sequence of 1,650 bp. Sequence analysis detected the presence of a typical ADAM catalytic active site (HEXXHXXGXXHD). Expression of PcADAM over the Pneumocystis life cycle was analyzed by Northern blot. Southern and contour-clamped homogenous electronic field blot analysis demonstrated its presence in the P. carinii genome. Expression of PcADAM was observed to be increased in Pneumocystis cysts compared to trophic forms. The full-length gene was subsequently cloned and heterologously expressed in Saccharomyces cerevisiae. Purified PcADAMp protein was proteolytically active in casein zymography, requiring divalent zinc. Furthermore, native PcADAMp extracted directly from freshly isolated Pneumocystis organisms also exhibited protease activity. This is the first report of protease activity attributable to a specific, characterized protein in the clinically important opportunistic fungal pathogen Pneumocystis.
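
The catalytic site pattern quoted in this abstract, HEXXHXXGXXHD with X standing for any residue, translates directly into a regular expression. The sketch below scans a protein sequence for it; the example sequence is invented for illustration and is not the PcADAM sequence.

```python
import re

# HEXXHXXGXXHD, with X as any amino acid letter. The lookahead lets
# overlapping occurrences be reported as well.
ADAM_SITE = re.compile(r"(?=(HE[A-Z]{2}H[A-Z]{2}G[A-Z]{2}HD))")


def find_adam_sites(protein):
    """Return (0-based position, matched motif) for each catalytic-site match."""
    return [(m.start(), m.group(1)) for m in ADAM_SITE.finditer(protein.upper())]


# Hypothetical sequence containing one motif instance.
print(find_adam_sites("MSTAHELGHNFGMNHDKLV"))
```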
Low Maternal Microbiota Sharing across Gut, Breast Milk and Vagina, as Revealed by 16S rRNA Gene and Reduced Metagenomic Sequencing.

PubMed

Avershina, Ekaterina; Angell, Inga Leena; Simpson, Melanie; Storrø, Ola; Øien, Torbjørn; Johnsen, Roar; Rudi, Knut

2018-05-01

The maternal microbiota plays an important role in infant gut colonization. In this work we have investigated which bacterial species are shared across the breast milk, vaginal and stool microbiotas of 109 women shortly before and after giving birth using 16S rRNA gene sequencing and a novel reduced metagenomic sequencing (RMS) approach in a subgroup of 16 women. All the species predicted by the 16S rRNA gene sequencing were also detected by RMS analysis and there was good correspondence between their relative abundances estimated by both approaches. Both approaches also demonstrate a low level of maternal microbiota sharing across the population and RMS analysis identified only two species common to most women and in all sample types (Bifidobacterium longum and Enterococcus faecalis). Breast milk was the only sample type that had significantly higher intra- than inter- individual similarity towards both vaginal and stool samples. We also searched our RMS dataset against an in silico generated reference database derived from bacterial isolates in the Human Microbiome Project. The use of this reference-based search enabled further separation of Bifidobacterium longum into Bifidobacterium longum ssp. longum and Bifidobacterium longum ssp. infantis. We also detected the Lactobacillus rhamnosus GG strain, which was used as a probiotic supplement by some women, demonstrating the potential of RMS approach for deeper taxonomic delineation and estimation.
Low Maternal Microbiota Sharing across Gut, Breast Milk and Vagina, as Revealed by 16S rRNA Gene and Reduced Metagenomic Sequencing

PubMed Central

Angell, Inga Leena; Storrø, Ola; Øien, Torbjørn; Johnsen, Roar; Rudi, Knut

2018-01-01

The maternal microbiota plays an important role in infant gut colonization. In this work we have investigated which bacterial species are shared across the breast milk, vaginal and stool microbiotas of 109 women shortly before and after giving birth using 16S rRNA gene sequencing and a novel reduced metagenomic sequencing (RMS) approach in a subgroup of 16 women. All the species predicted by the 16S rRNA gene sequencing were also detected by RMS analysis and there was good correspondence between their relative abundances estimated by both approaches. Both approaches also demonstrate a low level of maternal microbiota sharing across the population and RMS analysis identified only two species common to most women and in all sample types (Bifidobacterium longum and Enterococcus faecalis). Breast milk was the only sample type that had significantly higher intra- than inter- individual similarity towards both vaginal and stool samples. We also searched our RMS dataset against an in silico generated reference database derived from bacterial isolates in the Human Microbiome Project. The use of this reference-based search enabled further separation of Bifidobacterium longum into Bifidobacterium longum ssp. longum and Bifidobacterium longum ssp. infantis. We also detected the Lactobacillus rhamnosus GG strain, which was used as a probiotic supplement by some women, demonstrating the potential of RMS approach for deeper taxonomic delineation and estimation. PMID:29724017
Whole genome sequencing for deciphering the resistome of Chryseobacterium indologenes, an emerging multidrug-resistant bacterium isolated from a cystic fibrosis patient in Marseille, France.

PubMed

Cimmino, T; Rolain, J-M

2016-07-01

We decipher the resistome of Chryseobacterium indologenes MARS15, an emerging multidrug-resistant clinical strain, using the whole genome sequencing strategy. The bacterium was isolated from the sputum of a hospitalized patient with cystic fibrosis in the Timone Hospital in Marseille, France. Genome sequencing was done with Illumina MiSeq using a paired-end strategy. The in silico analysis was done by RAST, the resistome by the ARG-ANNOT database and detection of polyketide synthase (PKS) by antiSMASH. The genome size of C. indologenes MARS15 is 4 972 580 bp with 36.4% GC content. This multidrug-resistant bacterium was resistant to all β-lactams, including imipenem, and also to colistin. The resistome of C. indologenes MARS15 includes Ambler class A and B β-lactams encoding bla CIA and bla IND-2 genes and MBL (metallo-β-lactamase) genes, the CAT (chloramphenicol acetyltransferase) gene and the multidrug efflux pump AcrB. Specific features include the presence of a urease operon, an intact prophage and a carotenoid biosynthesis pathway. Interestingly, we report for the first time in C. indologenes a PKS cluster that might be responsible for secondary metabolite biosynthesis, similar to erythromycin. The whole genome sequence analysis provides insight into the resistome and the discovery of new details, such as the PKS cluster.
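
Summary figures such as the genome size and GC content quoted above are straightforward to compute from an assembly. A minimal sketch, assuming the sequence is already loaded as a plain string (the function name and the toy sequence are illustrative; ambiguous bases such as N are excluded from the denominator):

```python
def gc_content(sequence):
    """GC percentage over unambiguous bases (A, C, G, T) only."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    acgt = gc + sequence.count("A") + sequence.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


# Hypothetical toy contig; a real genome would be read from a FASTA file.
contig = "ATGCGCNNAT"
print(len(contig), round(gc_content(contig), 1))
```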
Discovery of Anti-Hypertensive Oligopeptides from Adlay Based on In Silico Proteolysis and Virtual Screening

PubMed Central

Qiao, Liansheng; Li, Bin; Chen, Yankun; Li, Lingling; Chen, Xi; Wang, Lingzhi; Lu, Fang; Luo, Ganggang; Li, Gongyu; Zhang, Yanling

2016-01-01

Adlay (Coix lachryma-jobi L.) is a commonly used Traditional Chinese Medicine (TCM) with a high content of seed storage protein. The hydrolyzed bioactive oligopeptides of adlay have been proven to be effective anti-hypertensive components. However, the structures and anti-hypertensive mechanism of bioactive oligopeptides from adlay were not clear. To discover the definite anti-hypertensive oligopeptides from adlay, in silico proteolysis and virtual screening were implemented to obtain potential oligopeptides, which were further identified by biochemistry assay and molecular dynamics simulation. In this paper, ten sequences of adlay prolamins were collected and in silico hydrolyzed to construct the oligopeptide library with 134 oligopeptides. This library was reverse screened by an anti-hypertensive pharmacophore database, which was constructed by our research team and contained ten anti-hypertensive targets. Angiotensin-I converting enzyme (ACE) was identified as the main potential target for the anti-hypertensive activity of adlay oligopeptides. Three crystal structures of ACE were utilized for docking studies and 19 oligopeptides were finally identified with potential ACE inhibitory activity. According to mapping features and evaluation indexes of pharmacophore and docking, three oligopeptides were selected for biochemistry assay. An oligopeptide sequence, NPATY (IC50 = 61.88 ± 2.77 µM), was identified as the ACE inhibitor by reverse-phase high performance liquid chromatography (RP-HPLC) assay. Molecular dynamics simulation of NPATY was further utilized to analyze interactive bonds and key residues. ALA354 was identified as a key residue of ACE inhibitors. Hydrophobic effect of VAL518 and electrostatic effects of HIS383, HIS387, HIS513 and Zn2+ were also regarded as playing a key role in inhibiting ACE activities. 
This study provides a research strategy to explore the pharmacological mechanism of Traditional Chinese Medicine (TCM) proteins based on in silico proteolysis and virtual screening, which could be beneficial to reveal the pharmacological action of TCM proteins and provide new lead compounds for peptides-based drug design. PMID:27983650
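As a minimal illustration of the in silico proteolysis step described in this abstract, the sketch below cuts a protein sequence into peptides using a simple cleavage rule. The abstract does not name the proteases that were simulated, so the trypsin-like rule (cleave after K or R unless the next residue is P), the function name `digest`, and the toy sequence are assumptions chosen purely for illustration.

```python
def digest(sequence, cleave_after="KR", blocked_by="P"):
    """Split a protein sequence after each residue in `cleave_after`,
    unless the following residue is in `blocked_by` (illustrative rule)."""
    cut_residues = set(cleave_after)
    blocking = set(blocked_by)
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        # Empty string at the end of the sequence never blocks cleavage.
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in cut_residues and next_residue not in blocking:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal remainder that has no trailing cleavage site.
        peptides.append(sequence[start:])
    return peptides


# Hypothetical toy sequence, not an adlay prolamin.
print(digest("MKRPAKNPATY"))  # ['MK', 'RPAK', 'NPATY']
```

Peptides produced this way could then be filtered by length to assemble an oligopeptide library for screening.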
Genomics of high molecular weight plasmids isolated from an on-farm biopurification system.

PubMed

Martini, María C; Wibberg, Daniel; Lozano, Mauricio; Torres Tejerizo, Gonzalo; Albicoro, Francisco J; Jaenicke, Sebastian; van Elsas, Jan Dirk; Petroni, Alejandro; Garcillán-Barcia, M Pilar; de la Cruz, Fernando; Schlüter, Andreas; Pühler, Alfred; Pistorio, Mariano; Lagares, Antonio; Del Papa, María F

2016-06-20

The use of biopurification systems (BPS) constitutes an efficient strategy to eliminate pesticides from polluted wastewaters from farm activities. BPS environments contain a high microbial density and diversity facilitating the exchange of information among bacteria, mediated by mobile genetic elements (MGEs), which play a key role in bacterial adaptation and evolution in such environments. Here we sequenced and characterized high-molecular-weight plasmids from a bacterial collection of an on-farm BPS. High-throughput sequencing of the plasmid pool yielded several Mb of sequence information. Assembly of the sequence data resulted in six complete replicons. Using in silico analyses we identified plasmid replication genes whose encoded proteins represent 13 different Pfam families, as well as proteins involved in plasmid conjugation, indicating a large diversity of plasmid replicons and suggesting the occurrence of horizontal gene transfer (HGT) events within the habitat analyzed. In addition, genes conferring resistance to 10 classes of antimicrobial compounds and those encoding enzymes potentially involved in pesticide and aromatic hydrocarbon degradation were found. Global analysis of the plasmid pool suggests that the analyzed BPS represents a key environment for further studies addressing the dissemination of MGEs carrying catabolic genes and pathway assembly regarding degradation capabilities.
Discovery and molecular characterization of a new cryptovirus dsRNA genome from Japanese persimmon through conventional cloning and high-throughput sequencing.

PubMed

Morelli, M; Chiumenti, M; De Stradis, A; La Notte, P; Minafra, A

2015-02-01

Through the application of next generation sequencing, in synergy with conventional cloning of DOP-PCR fragments, two double-stranded RNA (dsRNA) molecules of about 1.5 kbp in size were isolated from leaf tissue of a Japanese persimmon (accession SSPI) from Apulia (southern Italy) showing veinlet necrosis. High-throughput sequencing allowed whole genome sequence assembly, yielding contigs of 1,577 and 1,491 bp, identified as dsRNA-1 and dsRNA-2 of a previously undescribed virus, provisionally named Persimmon cryptic virus (PeCV). In silico analysis showed that both dsRNA fragments were monocistronic and comprised the RNA-dependent RNA polymerase (RdRp) and the capsid protein (CP) genes, respectively. Phylogenetic reconstruction revealed a close relationship of these dsRNAs with those of cryptoviruses described in woody and herbaceous hosts, recently gathered in genus Deltapartitivirus. Virus-specific primers for RT-PCR, designed in the CP cistron, also detected viral RNAs in symptomless persimmon trees sampled from the same geographical area as SSPI, suggesting that PeCV infection may be fairly common and presumably latent.
TLR and IMD signaling pathways from Caligus rogercresseyi (Crustacea: Copepoda): in silico gene expression and SNPs discovery.

PubMed

Valenzuela-Muñoz, V; Gallardo-Escárate, C

2014-02-01

The Toll and IMD signaling pathways represent one of the first lines of innate immune defense in invertebrates like Drosophila. However, for crustaceans like Caligus rogercresseyi, there is very little genomic information and, consequently, little understanding of immune mechanisms. Massive sequencing data obtained for three developmental stages of C. rogercresseyi were used to evaluate in silico the expression patterns and presence of SNP variants in genes involved in the Toll and IMD pathways. Through RNA-seq analysis, which used 20 contigs corresponding to relevant genes of the Toll and IMD pathways, overexpression of genes linked to the Toll pathway, such as toll3 and Dorsal, was observed in the copepod stage. For the chalimus and adult stages, overexpression of genes in both pathways, such as Akirin and Tollip and IAP and Toll9, respectively, was observed. On the other hand, PCA statistical analysis inferred that in the chalimus and adult stages, the immune response mechanism was more developed, as evidenced by a relation between these two stages and the genes of both pathways. Moreover, 136 SNPs were identified for 20 contigs in genes of the Toll and IMD pathways. This study provides transcriptomic information about the immune response mechanisms of Caligus, thus providing a foundation for the development of new control strategies through blocking the innate immune response. Copyright © 2013 Elsevier Ltd. All rights reserved.
In Silico Identification of Highly Conserved Epitopes of Influenza A H1N1, H2N2, H3N2, and H5N1 with Diagnostic and Vaccination Potential

PubMed Central

Muñoz-Medina, José Esteban; Sánchez-Vallejo, Carlos Javier; Méndez-Tenorio, Alfonso; Monroy-Muñoz, Irma Eloísa; Angeles-Martínez, Javier; Santos Coy-Arechavaleta, Andrea; Santacruz-Tinoco, Clara Esperanza; González-Ibarra, Joaquín; Anguiano-Hernández, Yu-Mei; González-Bonilla, César Raúl; Ramón-Gallegos, Eva; Díaz-Quiñonez, José Alberto

2015-01-01

The unpredictable, evolutionary nature of the influenza A virus (IAV) is the primary problem when generating a vaccine and when designing diagnostic strategies; thus, it is necessary to determine the constant regions in viral proteins. In this study, we completed an in silico analysis of the reported epitopes of the 4 IAV proteins that are antigenically most significant (HA, NA, NP, and M2) in the 3 strains with the greatest world circulation in the last century (H1N1, H2N2, and H3N2) and in one of the main avian subtypes responsible for zoonosis (H5N1). For this purpose, the HMMER program was used to align 3,016 epitopes reported in the Immune Epitope Database and Analysis Resource (IEDB) and distributed in 34,294 stored sequences in the Pfam database. Eighteen epitopes were identified: 8 in HA, 5 in NA, 3 in NP, and 2 in M2. These epitopes have remained constant since they were first identified (~91 years) and are present in strains that have circulated on 5 continents. These sites could be targets for vaccination design strategies based on epitopes and/or as markers in the implementation of diagnostic techniques. PMID:26346523
Identification of copy number variation in French dairy and beef breeds using next-generation sequencing.

PubMed

Letaief, Rabia; Rebours, Emmanuelle; Grohs, Cécile; Meersseman, Cédric; Fritz, Sébastien; Trouilh, Lidwine; Esquerré, Diane; Barbieri, Johanna; Klopp, Christophe; Philippe, Romain; Blanquet, Véronique; Boichard, Didier; Rocha, Dominique; Boussaha, Mekki

2017-10-24

Copy number variations (CNV) are known to play a major role in genetic variability and disease pathogenesis in several species including cattle. In this study, we report the identification and characterization of CNV in eight French beef and dairy breeds using whole-genome sequence data from 200 animals. Bioinformatics analyses to search for CNV were carried out using four different but complementary tools and we validated a subset of the CNV by both in silico and experimental approaches. We report the identification and localization of 4178 putative deletion-only, duplication-only and CNV regions, which cover 6% of the bovine autosomal genome; they were validated by two in silico approaches and/or experimentally validated using array-based comparative genomic hybridization and single nucleotide polymorphism genotyping arrays. The size of these variants ranged from 334 bp to 7.7 Mb, with an average size of ~ 54 kb. Of these 4178 variants, 3940 were deletions, 67 were duplications and 171 corresponded to both deletions and duplications, which were defined as potential CNV regions. Gene content analysis revealed that, among these variants, 1100 deletions and duplications encompassed 1803 known genes, which affect a wide spectrum of molecular functions, and 1095 overlapped with known QTL regions. Our study is a large-scale survey of CNV in eight French dairy and beef breeds. These CNV will be useful to study the link between genetic variability and economically important traits, and to improve our knowledge on the genomic architecture of cattle.
Genome mining reveals the genus Xanthomonas to be a promising reservoir for new bioactive non-ribosomally synthesized peptides

PubMed Central

2013-01-01

Background Various bacteria can use non-ribosomal peptide synthesis (NRPS) to produce peptides or other small molecules. Conserved features within the NRPS machinery allow the type, and sometimes even the structure, of the synthesized polypeptide to be predicted. Thus, bacterial genome mining via in silico analyses of NRPS genes offers an attractive opportunity to uncover new bioactive non-ribosomally synthesized peptides. Xanthomonas is a large genus of Gram-negative bacteria that cause disease in hundreds of plant species. To date, the only known small molecule synthesized by NRPS in this genus is albicidin produced by Xanthomonas albilineans. This study aims to estimate the biosynthetic potential of Xanthomonas spp. by in silico analyses of NRPS genes with unknown function recently identified in the sequenced genomes of X. albilineans and related species of Xanthomonas. Results We performed in silico analyses of NRPS genes present in all published genome sequences of Xanthomonas spp., as well as in unpublished draft genome sequences of Xanthomonas oryzae pv. oryzae strain BAI3 and Xanthomonas spp. strain XaS3. These two latter strains, together with X. albilineans strain GPE PC73 and X. oryzae pv. oryzae strains X8-1A and X11-5A, possess novel NRPS gene clusters and share related NRPS-associated genes such as those required for the biosynthesis of non-proteinogenic amino acids or the secretion of peptides. In silico prediction of peptide structures according to NRPS architecture suggests eight different peptides, each specific to its producing strain. Interestingly, these eight peptides cannot be assigned to any known gene cluster or related to known compounds from natural product databases. PCR screening of a collection of 94 plant pathogenic bacteria indicates that these novel NRPS gene clusters are specific to the genus Xanthomonas and are also present in Xanthomonas translucens and X. oryzae pv. oryzicola. 
Further genome mining revealed other novel NRPS genes specific to X. oryzae pv. oryzicola or Xanthomonas sacchari. Conclusions This study revealed the significant potential of the genus Xanthomonas to produce new non-ribosomally synthesized peptides. Interestingly, this biosynthetic potential seems to be specific to strains of Xanthomonas associated with monocotyledonous plants, suggesting a putative involvement of non-ribosomally synthesized peptides in plant-bacteria interactions. PMID:24069909
Splice-site mutations identified in PDE6A responsible for retinitis pigmentosa in consanguineous Pakistani families

PubMed Central

Khan, Shahid Y.; Ali, Shahbaz; Naeem, Muhammad Asif; Khan, Shaheen N.; Husnain, Tayyab; Butt, Nadeem H.; Qazi, Zaheeruddin A.; Akram, Javed; Riazuddin, Sheikh; Ayyagari, Radha; Hejtmancik, J. Fielding

2015-01-01

Purpose This study was conducted to localize and identify causal mutations associated with autosomal recessive retinitis pigmentosa (RP) in consanguineous familial cases of Pakistani origin. Methods Ophthalmic examinations that included funduscopy and electroretinography (ERG) were performed to confirm the affectation status. Blood samples were collected from all participating individuals, and genomic DNA was extracted. A genome-wide scan was performed, and two-point logarithm of odds (LOD) scores were calculated. Sanger sequencing was performed to identify the causative variants. Subsequently, we performed whole exome sequencing to rule out the possibility of a second causal variant within the linkage interval. Sequence conservation was assessed with alignment analyses of PDE6A orthologs, and in silico splicing analysis was completed with Human Splicing Finder version 2.4.1. Results A large multigenerational consanguineous family diagnosed with early-onset RP was ascertained. An ophthalmic clinical examination consisting of fundus photography and electroretinography confirmed the diagnosis of RP. A genome-wide scan was performed, and suggestive two-point LOD scores were observed with markers on chromosome 5q. Haplotype analyses identified the region; however, the region did not segregate with the disease phenotype in the family. Subsequently, we performed a second genome-wide scan that excluded the entire genome except the chromosome 5q region harboring PDE6A. Next-generation whole exome sequencing identified a splice acceptor site mutation in intron 16: c.2028-1G>A, which was completely conserved in PDE6A orthologs and was absent in 350 ethnically matched control chromosomes, the 1000 Genomes database, and the NHLBI Exome Sequencing Project. Subsequently, we investigated our entire cohort of RP familial cases and identified a second family who harbored a splice acceptor site mutation in intron 10: c.1408-2A>G.
In silico analysis suggested that these mutations will result in the elimination of wild-type splice acceptor sites that would result in either skipping of the respective exon or the creation of a new cryptic splice acceptor site; both possibilities would result in retinal photoreceptor cells that lack PDE6A wild-type protein. Conclusions We report two splice acceptor site variations in PDE6A in consanguineous Pakistani families who manifested cardinal symptoms of RP. Taken together with our previously published work, our data suggest that mutations in PDE6A account for about 2% of the total genetic load of RP in our cohort and possibly in the Pakistani population as well. PMID:26321862
In silico mapping of quantitative trait loci in maize.

PubMed

Parisseaux, B; Bernardo, R

2004-08-01

Quantitative trait loci (QTL) are most often detected through designed mapping experiments. An alternative approach is in silico mapping, whereby genes are detected using existing phenotypic and genomic databases. We explored the usefulness of in silico mapping via a mixed-model approach in maize (Zea mays L.). Specifically, our objective was to determine if the procedure gave results that were repeatable across populations. Multilocation data were obtained from the 1995-2002 hybrid testing program of Limagrain Genetics in Europe. Nine heterotic patterns comprised 22,774 single crosses. These single crosses were made from 1,266 inbreds that had data for 96 simple sequence repeat (SSR) markers. By a mixed-model approach, we estimated the general combining ability effects associated with marker alleles in each heterotic pattern. The numbers of marker loci with significant effects--37 for plant height, 24 for smut [Ustilago maydis (DC.) Cda.] resistance, and 44 for grain moisture--were consistent with previous results from designed mapping experiments. Each trait had many loci with small effects and few loci with large effects. For smut resistance, a marker in bin 8.05 on chromosome 8 had a significant effect in seven (out of a maximum of 18) instances. For this major QTL, the maximum effect of an allele substitution ranged from 5.4% to 41.9%, with an average of 22.0%. We conclude that in silico mapping via a mixed-model approach can detect associations that are repeatable across different populations. We speculate that in silico mapping will be more useful for gene discovery than for selection in plant breeding programs. Copyright 2004 Springer-Verlag
LongISLND: in silico sequencing of lengthy and noisy datatypes.

PubMed

Lau, Bayo; Mohiyuddin, Marghoob; Mu, John C; Fang, Li Tai; Bani Asadi, Narges; Dallett, Carolina; Lam, Hugo Y K

2016-12-15

LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd CONTACT: hugo.lam@roche.com Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
New universal ITS2 primers for high-resolution herbivory analyses using DNA metabarcoding in both tropical and temperate zones.

PubMed

Moorhouse-Gann, Rosemary J; Dunn, Jenny C; de Vere, Natasha; Goder, Martine; Cole, Nik; Hipperson, Helen; Symondson, William O C

2018-06-04

DNA metabarcoding is a rapidly growing technique for obtaining detailed dietary information. Current metabarcoding methods for herbivory, using a single locus, can lack taxonomic resolution for some applications. We present novel primers for the second internal transcribed spacer of nuclear ribosomal DNA (ITS2) designed for dietary studies in Mauritius and the UK, which have the potential to give unrivalled taxonomic coverage and resolution from a short-amplicon barcode. In silico testing used three databases of plant ITS2 sequences from UK and Mauritian floras (native and introduced) totalling 6561 sequences from 1790 species across 174 families. Our primers were well-matched in silico to 88% of species, providing taxonomic resolution of 86.1%, 99.4% and 99.9% at the species, genus and family levels, respectively. In vitro, the primers amplified 99% of Mauritian (n = 169) and 100% of UK (n = 33) species, and co-amplified multiple plant species from degraded faecal DNA from reptiles and birds in two case studies. For the ITS2 region, we advocate taxonomic assignment based on best sequence match instead of a clustering approach. With short amplicons of 187-387 bp, these primers are suitable for metabarcoding plant DNA from faecal samples, across a broad geographic range, whilst delivering unparalleled taxonomic resolution.
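The in silico primer test described in this abstract amounts to asking, for each reference sequence, whether the primer finds a binding site within a tolerated number of mismatches. The sketch below is a simplified illustration under stated assumptions: the primer, the reference sequences, the one-mismatch threshold, and the function names are hypothetical, only the forward strand is searched, and the authors' actual matching criteria are not given in the abstract.

```python
# IUPAC nucleotide codes mapped to the bases each code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def min_mismatches(primer, template):
    """Fewest mismatches between the primer and any same-length window
    of the template; None if the template is shorter than the primer."""
    width = len(primer)
    counts = [
        sum(base not in IUPAC[code]
            for code, base in zip(primer, template[i:i + width]))
        for i in range(len(template) - width + 1)
    ]
    return min(counts) if counts else None


def fraction_matched(primer, templates, max_mismatches=1):
    """Share of templates with a site within `max_mismatches` of the primer."""
    if not templates:
        return 0.0
    hits = 0
    for template in templates:
        mismatches = min_mismatches(primer, template)
        if mismatches is not None and mismatches <= max_mismatches:
            hits += 1
    return hits / len(templates)


# Hypothetical degenerate primer and toy reference sequences.
references = ["GGACGCTGG", "GGACGTTGG", "GGACAATGG"]
print(fraction_matched("ACGYT", references))
```

A fuller check would also search the reverse complement for the reverse primer and weight mismatches near the 3′ end more heavily.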
Species-Level Phylogeny and Polyploid Relationships in Hordeum (Poaceae) Inferred by Next-Generation Sequencing and In Silico Cloning of Multiple Nuclear Loci.

PubMed

Brassac, Jonathan; Blattner, Frank R

2015-09-01

Polyploidization is an important speciation mechanism in the barley genus Hordeum. To analyze evolutionary changes after allopolyploidization, knowledge of parental relationships is essential. One chloroplast and 12 nuclear single-copy loci were amplified by polymerase chain reaction (PCR) in all Hordeum plus six out-group species. Amplicons from each of 96 individuals were pooled, sheared, labeled with individual-specific barcodes and sequenced in a single run on a 454 platform. Reference sequences were obtained by cloning and Sanger sequencing of all loci for nine supplementary individuals. The 454 reads were assembled into contigs representing the 13 loci and, for polyploids, also homoeologues. Phylogenetic analyses were conducted for all loci separately and for a concatenated data matrix of all loci. For diploid taxa, a Bayesian concordance analysis and a coalescent-based dated species tree were inferred from all gene trees. Chloroplast matK was used to determine the maternal parent in allopolyploid taxa. The relative performance of different multilocus analyses in the presence of incomplete lineage sorting and hybridization was also assessed. The resulting multilocus phylogeny reveals for the first time species phylogeny and progenitor-derivative relationships of all di- and polyploid Hordeum taxa within a single analysis. Our study proves that it is possible to obtain a multilocus species-level phylogeny for di- and polyploid taxa by combining PCR with next-generation sequencing, without cloning and without creating a heavy load of sequence data. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
A novel site-specific recombination system derived from bacteriophage phiMR11.

PubMed

Rashel, Mohammad; Uchiyama, Jumpei; Ujihara, Takako; Takemura, Iyo; Hoshiba, Hiroshi; Matsuzaki, Shigenobu

2008-04-04

We report the identification of a novel site-specific DNA recombination system that functions both in vivo and in vitro, derived from the lysogenic Staphylococcus aureus phage phiMR11. In silico analysis of the phiMR11 genome indicated orf1 as a putative integrase gene. Phage and bacterial attachment sites (attP and attB, respectively) and attachment junctions were determined and their nucleotide sequences decoded. The sequences of attP and attB were mostly different from each other except for a 2-bp common core that was the crossover point. We found several inverted repeats adjacent to the core sequence of attP as potential protein binding sites. The precise and efficient integration properties of phiMR11 integrase were shown on attP and attB in Escherichia coli, and the minimum size of attP was found to be 34 bp. In in vitro assays using crude or purified integrase, only buffer and substrate DNAs were required for the recombination reaction, indicating that other bacterially encoded factors are not essential for activity.
Whole-exome sequencing revealed two novel mutations in Usher syndrome.

PubMed

Koparir, Asuman; Karatas, Omer Faruk; Atayoglu, Ali Timucin; Yuksel, Bayram; Sagiroglu, Mahmut Samil; Seven, Mehmet; Ulucan, Hakan; Yuksel, Adnan; Ozen, Mustafa

2015-06-01

Usher syndrome is a clinically and genetically heterogeneous autosomal recessive inherited disorder accompanied by hearing loss and retinitis pigmentosa (RP). Because the associated genes are numerous and quite large, we utilized whole-exome sequencing (WES) as a diagnostic tool to identify the molecular basis of Usher syndrome. DNA from a 12-year-old male diagnosed with Usher syndrome was analyzed by WES. Mutations detected were confirmed by Sanger sequencing. The pathogenicity of these mutations was determined by in silico analysis. A maternally inherited deleterious frameshift mutation, c.14439_14454del in exon 66, and a paternally inherited nonsense c.10830G>A stop-gain SNV in exon 55 of USH2A were found as two novel compound heterozygous mutations. Both of these mutations disrupt the C terminus of the USH2A protein. As a result, WES revealed two novel compound heterozygous mutations in a Turkish USH2A patient. This approach gave us an opportunity to make an appropriate diagnosis and provide genetic counseling to the family within a reasonable time. Copyright © 2015 Elsevier B.V. All rights reserved.
Diversity amongst trigeminal neurons revealed by high throughput single cell sequencing

PubMed Central

Nguyen, Minh Q.; Wu, Youmei; Bonilla, Lauren S.; von Buchholtz, Lars J.

2017-01-01

The trigeminal ganglion contains somatosensory neurons that detect a range of thermal, mechanical and chemical cues and innervate unique sensory compartments in the head and neck including the eyes, nose, mouth, meninges and vibrissae. We used single-cell sequencing and in situ hybridization to examine the cellular diversity of the trigeminal ganglion in mice, defining thirteen clusters of neurons. We show that clusters are well conserved in dorsal root ganglia suggesting they represent distinct functional classes of somatosensory neurons and not specialization associated with their sensory targets. Notably, functionally important genes (e.g. the mechanosensory channel Piezo2 and the capsaicin gated ion channel Trpv1) segregate into multiple clusters and often are expressed in subsets of cells within a cluster. Therefore, the 13 genetically-defined classes are likely to be physiologically heterogeneous rather than highly parallel (i.e., redundant) lines of sensory input. Our analysis harnesses the power of single-cell sequencing to provide a unique platform for in silico expression profiling that complements other approaches linking gene-expression with function and exposes unexpected diversity in the somatosensory system. PMID:28957441
T-Reg Comparator: an analysis tool for the comparison of position weight matrices

PubMed Central

Roepcke, Stefan; Grossmann, Steffen; Rahmann, Sven; Vingron, Martin

2005-01-01

T-Reg Comparator is a novel software tool designed to support research into transcriptional regulation. Sequence motifs representing transcription factor binding sites are usually encoded as position weight matrices. The user inputs a set of such weight matrices or binding site sequences and our program matches them against the T-Reg database, which is presently built on data from the Transfac [E. Wingender (2004) In Silico Biol., 4, 55–61] and Jaspar [A. Sandelin, W. Alkema, P. Engstrom, W. W. Wasserman and B. Lenhard (2004) Nucleic Acids Res., 32, D91–D94]. Our tool delivers a detailed report on similarities between user-supplied motifs and motifs in the database. Apart from simple one-to-one relationships, T-Reg Comparator is also able to detect similarities between submatrices. In addition, we provide a user interface to a program for sequence scanning with weight matrices. Typical areas of application for T-Reg Comparator are motif and regulatory module finding and annotation of regulatory genomic regions. T-Reg Comparator is available at http://treg.molgen.mpg.de. PMID:15980506

T-Reg Comparator: an analysis tool for the comparison of position weight matrices.

PubMed

Roepcke, Stefan; Grossmann, Steffen; Rahmann, Sven; Vingron, Martin

2005-07-01

T-Reg Comparator is a novel software tool designed to support research into transcriptional regulation. Sequence motifs representing transcription factor binding sites are usually encoded as position weight matrices. The user inputs a set of such weight matrices or binding site sequences and our program matches them against the T-Reg database, which is presently built on data from the Transfac [E. Wingender (2004) In Silico Biol., 4, 55-61] and Jaspar [A. Sandelin, W. Alkema, P. Engstrom, W. W. Wasserman and B. Lenhard (2004) Nucleic Acids Res., 32, D91-D94]. Our tool delivers a detailed report on similarities between user-supplied motifs and motifs in the database. Apart from simple one-to-one relationships, T-Reg Comparator is also able to detect similarities between submatrices. In addition, we provide a user interface to a program for sequence scanning with weight matrices. Typical areas of application for T-Reg Comparator are motif and regulatory module finding and annotation of regulatory genomic regions. T-Reg Comparator is available at http://treg.molgen.mpg.de.
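To make the idea of sequence scanning with weight matrices concrete, the sketch below builds a log-odds position weight matrix from per-position base counts and scores every window of a sequence. It is a generic illustration, not T-Reg Comparator's implementation: the function names, the pseudocount, the uniform background of 0.25, and the toy TATA-like counts are all assumptions.

```python
import math


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores
    against a uniform background (assumed for simplicity)."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]


# Hypothetical counts for a 4-bp TATA-like motif (10 aligned sites).
counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
pwm = pwm_from_counts(counts)
best_start, best_score = max(scan("GGTATAGG", pwm), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

In practice a score threshold, and a scan of the reverse strand, would decide which windows are reported as candidate binding sites.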
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery

PubMed Central

Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo

2012-01-01

Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases, our tool was able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2–ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data. PMID:22570408
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery.

PubMed

Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo

2012-09-01

Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases, our tool was able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2-ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data.
Sequence alterations in RX in patients with microphthalmia, anophthalmia, and coloboma

PubMed Central

London, Nikolas J.S.; Kessler, Patricia; Williams, Bryan; Pauer, Gayle J.; Hagstrom, Stephanie A.

2009-01-01

Purpose Microphthalmia, anophthalmia, and coloboma are ocular malformations with a significant genetic component. Rx is a homeobox gene expressed early in the developing retina and is important in retinal cell fate specification as well as stem cell proliferation. We screened a group of 24 patients with microphthalmia, coloboma, and/or anophthalmia for RX mutations. Methods We used standard PCR and automated sequencing techniques to amplify and sequence each of the three RX exons. Patients’ charts were reviewed for clinical information. The pathologic impact of the identified sequence variant was analyzed by computational methods using PolyPhen and PMut algorithms. Results In addition to the polymorphisms we identified a single patient with coloboma having a heterozygous nucleotide change (g.197G>C) in the first exon that results in a missense mutation of arginine to threonine at amino acid position 66 (R66T). In silico analysis predicted R66T to be a deleterious mutation. Conclusions Sequence variations in RX are uncommon in patients with congenital ocular malformations, but may play a role in disease pathogenesis. We observed a missense mutation in RX in a patient with a small, typical chorioretinal coloboma, and postulate that the mutation is responsible for the patient’s phenotype. PMID:19158959
In silico analysis of 16S ribosomal RNA gene sequencing‐based methods for identification of medically important anaerobic bacteria

PubMed Central

Woo, Patrick C Y; Chung, Liliane M W; Teng, Jade L L; Tse, Herman; Pang, Sherby S Y; Lau, Veronica Y T; Wong, Vanessa W K; Kam, Kwok‐ling; Lau, Susanna K P; Yuen, Kwok‐Yung

2007-01-01

This is the first study to provide guidelines to clinical microbiologists and technicians on the usefulness of full 16S rRNA sequencing, 5′‐end 527‐bp 16S rRNA sequencing and the existing MicroSeq full and 500 16S rDNA bacterial identification system (MicroSeq, Perkin‐Elmer Applied Biosystems Division, Foster City, California, USA) databases for the identification of all existing medically important anaerobic bacteria. Full and 527‐bp 16S rRNA sequencing are able to identify 52–63% of 130 Gram‐positive anaerobic rods, 72–73% of 86 Gram‐negative anaerobic rods and 78% of 23 anaerobic cocci. The existing MicroSeq databases are able to identify only 19–25% of 130 Gram‐positive anaerobic rods, 38% of 86 Gram‐negative anaerobic rods and 39% of 23 anaerobic cocci. These represent only 45–46% of those that should be confidently identified by full and 527‐bp 16S rRNA sequencing. To improve the usefulness of MicroSeq, bacterial species that should be confidently identified by full and/or 527‐bp 16S rRNA sequencing but are not included in the existing MicroSeq databases should be added. PMID:17046845
Preliminary Genomic Characterization of Ten Hardwood Tree Species from Multiplexed Low Coverage Whole Genome Sequencing

PubMed Central

Staton, Margaret; Best, Teodora; Khodwekar, Sudhir; Owusu, Sandra; Xu, Tao; Xu, Yi; Jennings, Tara; Cronn, Richard; Arumuganathan, A. Kathiravetpilla; Coggeshall, Mark; Gailing, Oliver; Liang, Haiying; Romero-Severson, Jeanne; Schlarbaum, Scott; Carlson, John E.

2015-01-01

Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low coverage whole genome sequencing from multiplex Illumina paired ends to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence. PMID:26698853
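Identifying simple sequence repeats (SSRs) in assembled or raw sequence, as described above, can be illustrated with a regular-expression scan. This is a minimal sketch and not the pipeline used in the study: the function name, the default unit length and the minimum repeat count of 6 are assumptions chosen for illustration.

```python
import re
from typing import List, Tuple


def find_ssrs(seq: str, unit_len: int = 2, min_repeats: int = 6) -> List[Tuple[int, str, int]]:
    """Return (0-based start, repeat unit, repeat count) for each perfect
    tandem repeat of `unit_len`-mers with at least `min_repeats` copies."""
    # One captured unit followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # Skip homopolymer runs such as AAAA..., which are not true
            # repeats of the requested unit length.
            continue
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


# Toy sequence (made up for illustration): an (AT)8 and a (GA)6 repeat.
toy = "GGC" + "AT" * 8 + "CCGT" + "GA" * 6 + "TTT"
for start, unit, count in find_ssrs(toy):
    print(start, unit, count)
```

Primer design would then target the sequence flanking each reported start and end coordinate; that step is outside the scope of this sketch.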
InSilico DB genomic datasets hub: an efficient starting point for analyzing genome-wide studies in GenePattern, Integrative Genomics Viewer, and R/Bioconductor.

PubMed

Coletta, Alain; Molter, Colin; Duqué, Robin; Steenhoff, David; Taminau, Jonatan; de Schaetzen, Virginie; Meganck, Stijn; Lazar, Cosmin; Venet, David; Detours, Vincent; Nowé, Ann; Bersini, Hugues; Weiss Solís, David Y

2012-11-18

Genomics datasets are increasingly useful for gaining biomedical insights, with adoption in the clinic underway. However, multiple hurdles related to data management stand in the way of their efficient large-scale utilization. The solution proposed is a web-based data storage hub. Having clear focus, flexibility and adaptability, InSilico DB seamlessly connects genomics dataset repositories to state-of-the-art and free GUI and command-line data analysis tools. The InSilico DB platform is a powerful collaborative environment, with advanced capabilities for biocuration, dataset sharing, and dataset subsetting and combination. InSilico DB is available from https://insilicodb.org.
Isolation and expression analysis of EcbZIP17 from different finger millet genotypes shows conserved nature of the gene.

PubMed

Chopperla, Ramakrishna; Singh, Sonam; Mohanty, Sasmita; Reddy, Nanja; Padaria, Jasdeep C; Solanke, Amolkumar U

2017-10-01

Basic leucine zipper (bZIP) transcription factors comprise one of the largest gene families in plants. They play a key role in almost every aspect of plant growth and development and also in biotic and abiotic stress tolerance. In this study, we report isolation and characterization of EcbZIP17, a group B bZIP transcription factor from a climate smart cereal, finger millet (Eleusine coracana L.). The genomic sequence of EcbZIP17 is 2662 bp long encompassing two exons and one intron with ORF of 1722 bp and peptide length of 573 aa. This gene is homologous to AtbZIP17 (Arabidopsis), ZmbZIP17 (maize) and OsbZIP60 (rice) which play a key role in endoplasmic reticulum (ER) stress pathway. In silico analysis confirmed the presence of basic leucine zipper (bZIP) and transmembrane (TM) domains in the EcbZIP17 protein. Allele mining of this gene in 16 different genotypes by Sanger sequencing revealed no variation in nucleotide sequence, including the 618 bp long intron. Expression analysis of EcbZIP17 under heat stress exhibited similar pattern of expression in all the genotypes across time intervals with highest upregulation after 4 h. The present study established the conserved nature of EcbZIP17 at nucleotide and expression level.
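The sequence lengths quoted above are mutually consistent, which a short calculation makes explicit. The sketch assumes that the 1722 bp ORF count includes the stop codon and that the 2662 bp genomic sequence consists only of the two exons and the single 618 bp intron; neither assumption is stated in the abstract.

```python
def peptide_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of `orf_bp` nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but encodes no residue.
    return codons - 1 if includes_stop else codons


genomic_bp, intron_bp, orf_bp = 2662, 618, 1722

exonic_bp = genomic_bp - intron_bp        # bases remaining after splicing
non_coding_exonic_bp = exonic_bp - orf_bp  # exonic bases outside the ORF

print(peptide_length(orf_bp))   # residues in the predicted peptide
print(exonic_bp, non_coding_exonic_bp)
```

With these assumptions, 1722 bp corresponds to 574 codons and therefore a 573-residue peptide, matching the reported length.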
In-Silico Identification Of Micro-Loops In Myelodysplastic Syndromes

NASA Astrophysics Data System (ADS)

Beck, Dominik; Brandl, Miriam; Pham, Tuan D.; Chang, Chung-Che; Zhou, Xiaobo

2011-06-01

Micro-loops are regulatory network motifs that leverage transcriptional and posttranscriptional control to effectively regulate the transcriptome. In this paper a regulatory network for Myelodysplastic Syndromes (MDSs) was constructed from the literature and publicly available data sources. The network was filtered using data from deep-sequencing of small RNAs, exon and microarrays. Motif discovery showed that micro-loops might exist in MDS. We further used the identified micro-loops and performed basic network analysis to identify the known disease gene RUNX1/AML, as well as miRNA family hsa-mir-181. This suggested that the concept of micro-loops can be applied to enhance disease gene identification and biomarker discovery.
Computational structural analysis of an anti-l-amino acid antibody and inversion of its stereoselectivity

PubMed Central

Ranieri, Daniel I.; Hofstetter, Heike; Hofstetter, Oliver

2009-01-01

The binding site of a monoclonal anti-l-amino acid antibody was modeled using the program SWISS-MODEL. Docking experiments with the enantiomers of phenylalanine revealed that the antibody interacts with l-phenylalanine via hydrogen bonds and hydrophobic contacts, whereas the d-enantiomer is rejected due to steric hindrance. Comparison of the sequences of this antibody and an anti-d-amino acid antibody indicates that both immunoglobulins derived from the same germline progenitor. Substitution of four amino acid residues, three in the framework and one in the complementarity determining regions, allowed in silico conversion of the anti-l-amino acid antibody into an antibody that stereoselectively binds d-phenylalanine. PMID:19472280
Characterization of tannase protein sequences of bacteria and fungi: an in silico study.

PubMed

Banerjee, Amrita; Jana, Arijit; Pati, Bikash R; Mondal, Keshab C; Das Mohapatra, Pradeep K

2012-04-01

The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from the NCBI database. Of these, only the 77 bacterial and 31 fungal tannase sequences with distinct amino acid compositions were retained. These sequences were analysed for physical and chemical properties, superfamily membership, multiple sequence alignment, phylogenetic tree construction and motif finding, in order to identify functional motifs and the evolutionary relationships among them. The superfamily search revealed the occurrence of the proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon-carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families and the alpha/beta hydrolase superfamily. Some bacterial and fungal sequences showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches, with maximum homology at amino acid residues 389-469 and 482-523, which could be used for designing degenerate primers or probes specific for tannase-producing bacterial and fungal species. The phylogenetic tree showed two clusters: one contains only bacteria and the other contains both fungi and bacteria, indicating some relationship between these different genera. In the second cluster, however, nearly all fungal species grouped together, which indicates sequence-level similarity among fungal genera. Analysis of the distribution of fourteen motifs revealed that Motif 1, with a signature sequence of 29 amino acids (GCSTGGREALKQAQRWPHDYDGIIANNPA), was uniformly observed in 83.3 % of the studied tannase sequences, suggesting its involvement in structure and enzymatic function.
In Silico Analysis of Correlations between Protein Disorder and Post-Translational Modifications in Algae

PubMed Central

Kurotani, Atsushi; Sakurai, Tetsuya

2015-01-01

Recent proteome analyses have reported that intrinsically disordered regions (IDRs) of proteins play important roles in biological processes. In higher plants whose genomes have been sequenced, the correlation between IDRs and post-translational modifications (PTMs) has been reported. The genomes of various eukaryotic algae as common ancestors of plants have also been sequenced. However, no analysis of the relationship to protein properties such as structure and PTMs in algae has been reported. Here, we describe correlations between IDR content and the number of PTM sites for phosphorylation, glycosylation, and ubiquitination, and between IDR content and regions rich in proline, glutamic acid, serine, and threonine (PEST) and transmembrane helices in the sequences of 20 algae proteomes. Phosphorylation, O-glycosylation, ubiquitination, and PEST preferentially occurred in disordered regions. In contrast, transmembrane helices were favored in ordered regions. N-glycosylation tended to occur in ordered regions in most of the studied algae; however, it correlated positively with disordered protein content in diatoms. Additionally, we observed that disordered protein content and the number of PTM sites were significantly increased in the species-specific protein clusters compared to common protein clusters among the algae. Moreover, there were specific relationships between IDRs and PTMs among the algae from different groups. PMID:26307970
Mapping the zebrafish brain methylome using reduced representation bisulfite sequencing

PubMed Central

Chatterjee, Aniruddha; Ozaki, Yuichi; Stockwell, Peter A; Horsfield, Julia A; Morison, Ian M; Nakagawa, Shinichi

2013-01-01

Reduced representation bisulfite sequencing (RRBS) has been used to profile DNA methylation patterns in mammalian genomes such as human, mouse and rat. The methylome of the zebrafish, an important animal model, has not yet been characterized at base-pair resolution using RRBS. Therefore, we evaluated the technique of RRBS in this model organism by generating four single-nucleotide resolution DNA methylomes of adult zebrafish brain. We performed several simulations to show the distribution of fragments and enrichment of CpGs in different in silico reduced representation genomes of zebrafish. Four RRBS brain libraries generated 98 million sequenced reads and had higher frequencies of multiple mapping than equivalent human RRBS libraries. The zebrafish methylome indicates there is higher global DNA methylation in the zebrafish genome compared with its equivalent human methylome. This observation was confirmed by RRBS of zebrafish liver. High coverage CpG dinucleotides are enriched in CpG island shores more than in the CpG island core. We found that 45% of the mapped CpGs reside in gene bodies, and 7% in gene promoters. This analysis provides a roadmap for generating reproducible base-pair level methylomes for zebrafish using RRBS and our results provide the first evidence that RRBS is a suitable technique for global methylation analysis in zebrafish. PMID:23975027
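An in silico reduced representation genome of the kind simulated above can be sketched as a restriction digest followed by fragment size selection. The example below is a hedged illustration rather than the authors' simulation: it assumes the MspI enzyme (recognition site CCGG, cutting after the first C), which is commonly used for RRBS but is not named in the abstract, and the 40 to 220 bp size window and all function names are placeholders.

```python
from typing import List


def digest(seq: str, site: str = "CCGG", cut_offset: int = 1) -> List[str]:
    """Cut `seq` at every occurrence of `site`, `cut_offset` bases into the
    site (C^CGG for the assumed MspI enzyme), and return the fragments."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def reduced_representation(seq: str, min_len: int = 40, max_len: int = 220) -> List[str]:
    """Keep only fragments inside the (assumed) size-selection window."""
    return [f for f in digest(seq) if min_len <= len(f) <= max_len]


def cpg_count(fragment: str) -> int:
    """Count CpG dinucleotides in a fragment."""
    return fragment.count("CG")


# Toy sequence (made up) with two CCGG sites; a small window is used so the
# short example produces a selected fragment.
toy = "TTCCGGAAAACGTTCCGGTT"
print(digest(toy))
selected = reduced_representation(toy, min_len=10, max_len=20)
print(selected, [cpg_count(f) for f in selected])
```

Running the same functions over each chromosome with different size windows would give the fragment length distribution and the CpG enrichment for each simulated reduced representation genome.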
Rhizobium etli asparaginase II: an alternative for acute lymphoblastic leukemia (ALL) treatment

PubMed Central

Huerta-Saquero, Alejandro; Evangelista-Martínez, Zahaed; Moreno-Enriquez, Angélica; Perez-Rueda, Ernesto

2013-01-01

Bacterial l-asparaginase has been a universal component of therapies for childhood acute lymphoblastic leukemia since the 1970s. Two principal enzymes derived from Escherichia coli and Erwinia chrysanthemi are the only options clinically approved to date. We recently reported a study of recombinant l-asparaginase (AnsA) from Rhizobium etli and described an increasing type of AnsA family members. Sequence analysis revealed four conserved motifs with notable differences with respect to the conserved regions of amino acid sequences of type I and type II l-asparaginases, particularly in comparison with therapeutic enzymes from E. coli and E. chrysanthemi. These differences suggested a distinct immunological specificity. Here, we report an in silico analysis that revealed immunogenic determinants of AnsA. Also, we used an extensive approach to compare the crystal structures of E. coli and E. chrysanthemi asparaginases with a computational model of AnsA and identified immunogenic epitopes. A three-dimensional model of AnsA revealed, as expected based on sequence dissimilarities, completely different folding and different immunogenic epitopes. This approach could be very useful in transcending the problem of immunogenicity in two major ways: by chemical modifications of epitopes to reduce drug immunogenicity, and by site-directed mutagenesis of amino acid residues to diminish immunogenicity without reduction of enzymatic activity. PMID:22895060
Phylogenetic profiles reveal structural/functional determinants of TRPC3 signal-sensing antennae

PubMed Central

Ko, Kyung Dae; Bhardwaj, Gaurav; Hong, Yoojin; Chang, Gue Su; Kiselyov, Kirill

2009-01-01

Biochemical assessment of channel structure/function is incredibly challenging. Developing computational tools that provide these data would enable translational research, accelerating mechanistic experimentation for the bench scientist studying ion channels. Starting with the premise that protein sequence encodes information about structure, function and evolution (SF&E), we developed a unified framework for inferring SF&E from sequence information using a knowledge-based approach. The Gestalt Domain Detection Algorithm-Basic Local Alignment Tool (GDDA-BLAST) provides phylogenetic profiles that can model, ab initio, SF&E relationships of biological sequences at the whole protein, single domain and single-amino acid level [1,2]. In our recent paper [4], we have applied GDDA-BLAST analysis to study canonical TRP (TRPC) channels [1] and empirically validated predicted lipid-binding and trafficking activities contained within the TRPC3 TRP_2 domain of unknown function. Overall, our in silico, in vitro, and in vivo experiments support a model in which TRPC3 has signal-sensing antennae which are adorned with lipid-binding, trafficking and calmodulin regulatory domains. In this Addendum, we correlate our functional domain analysis with the cryo-EM structure of TRPC3 [3]. In addition, we synthesize recent studies with our new findings to provide a refined model on the mechanism(s) of TRPC3 activation/deactivation. PMID:19704910
Cross-species extrapolation of mammalian-based ToxCast Data using Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS)

EPA Science Inventory

In vitro high-throughput screening (HTS) and in silico technologies have emerged as 21st century tools for chemical hazard identification. In 2007 the U.S. Environmental Protection Agency (EPA) launched the ToxCast Program, which has screened thousands of chemicals in hundreds of...
In Silico Prediction of Neuropeptides/Peptide Hormone Transcripts in the Cheilostome Bryozoan Bugula neritina

PubMed Central

Zhang, Gen; He, Li-Sheng; Qian, Pei-Yuan

2016-01-01

The bryozoan Bugula neritina has a biphasic life cycle that consists of a planktonic larval stage and a sessile juvenile/adult stage. The transition between these two stages is crucial for the development and recruitment of B. neritina. Metamorphosis in B. neritina is mediated by both the nervous system and the release of developmental signals. However, no research has been conducted to investigate the expression of neuropeptides (NP)/peptide hormones in B. neritina larvae. Here, we report a comprehensive study of the NP/peptide hormones in the marine bryozoan B. neritina based on in silico identification methods. We recovered 22 transcripts encompassing 11 NP/peptide hormone precursor transcript sequences. The transcript sequences of the 11 isolated NP precursors were validated by cDNA cloning using gene-specific primers. We also examined the expression of three peptide hormone precursor transcripts (BnFDSIG, BnILP1, BnGPB) in the coronate larvae of B. neritina, demonstrating their distinct expression patterns in the larvae. Overall, our findings serve as an important foundation for subsequent investigations of the peptidergic control of bryozoan larval behavior and settlement. PMID:27537380
Semi-Automatic In Silico Gap Closure Enabled De Novo Assembly of Two Dehalobacter Genomes from Metagenomic Data

PubMed Central

Tang, Shuiquan; Gong, Yunchen; Edwards, Elizabeth A.

2012-01-01

Typically, the assembly and closure of a complete bacterial genome requires substantial additional effort spent in a wet lab for gap resolution and genome polishing. Assembly is further confounded by subspecies polymorphism when starting from metagenome sequence data. In this paper, we describe an in silico gap-resolution strategy that can substantially improve assembly. This strategy resolves assembly gaps in scaffolds using pre-assembled contigs, followed by verification with read mapping. It is capable of resolving assembly gaps caused by repetitive elements and subspecies polymorphisms. Using this strategy, we realized the de novo assembly of the first two Dehalobacter genomes from the metagenomes of two anaerobic mixed microbial cultures capable of reductive dechlorination of chlorinated ethanes and chloroform. Only four additional PCR reactions were required even though the initial assembly with Newbler v. 2.5 produced 101 contigs within 9 scaffolds belonging to two Dehalobacter strains. By applying this strategy to the re-assembly of a recently published genome of Bacteroides, we demonstrate its potential utility for other sequencing projects, both metagenomic and genomic. PMID:23284863

Developing an in silico minimum inhibitory concentration panel test for Klebsiella pneumoniae

DOE PAGES

Nguyen, Marcus; Brettin, Thomas; Long, S. Wesley; ...

2018-01-11

Antimicrobial resistant infections are a serious public health threat worldwide. Whole genome sequencing approaches to rapidly identify pathogens and predict antibiotic resistance phenotypes are becoming more feasible and may offer a way to reduce clinical test turnaround times compared to conventional culture-based methods, and in turn, improve patient outcomes. In this study, we use whole genome sequence data from 1668 clinical isolates of Klebsiella pneumoniae to develop an XGBoost-based machine learning model that accurately predicts minimum inhibitory concentrations (MICs) for 20 antibiotics. The overall accuracy of the model, within ± 1 two-fold dilution factor, is 92%. Individual accuracies are >= 90% for 15/20 antibiotics. We show that the MICs predicted by the model correlate with known antimicrobial resistance genes. Importantly, the genome-wide approach described in this study offers a way to predict MICs for isolates without knowledge of the underlying gene content. This study shows that machine learning can be used to build a complete in silico MIC prediction panel for K. pneumoniae and provides a framework for building MIC prediction models for other pathogenic bacteria.
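The accuracy criterion quoted above, agreement within ± 1 two-fold dilution, can be made concrete with a short calculation on the log2 scale. The sketch covers only this evaluation metric, not the XGBoost model itself; the function names and the example MIC pairs are invented for illustration.

```python
import math
from typing import Iterable, Tuple


def within_one_dilution(pred_mic: float, true_mic: float, tolerance: float = 1.0) -> bool:
    """True if predicted and measured MICs differ by at most `tolerance`
    two-fold dilution steps, i.e. by at most `tolerance` on the log2 scale."""
    diff = abs(math.log2(pred_mic) - math.log2(true_mic))
    return diff <= tolerance + 1e-9  # small slack for floating-point error


def dilution_accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (predicted, measured) MIC pairs within +/- 1 dilution."""
    pairs = list(pairs)
    hits = sum(within_one_dilution(p, t) for p, t in pairs)
    return hits / len(pairs)


# Hypothetical (predicted, measured) MICs; not data from the study.
example = [(4, 4), (8, 4), (2, 4), (16, 4), (0.5, 0.25)]
print(dilution_accuracy(example))
```

In this toy set a prediction of 16 against a measured MIC of 4 is two dilution steps away and counts as an error, while 8 or 2 against 4 counts as correct.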
[In silico identification of molecular mimicry between T-cell epitopes of Neisseria meningitidis B and the human proteome].

PubMed

Batista-Duharte, Alexander; Téllez, Bruno; Tamayo, Maybia; Portuondo, Deivys; Cabrera, Osmir; Sierra, Gustavo; Pérez, Oliver

2013-07-01

The objective of the study was to determine the T-cell epitopes of four of the most frequent antigenic proteins of the outer membrane of Neisseria meningitidis B, and to identify the most relevant sites for molecular mimicry with T-cell epitopes in humans. In order to do so, an in silico study (a type of study that uses bioinformatic tools) was carried out using the SWISS-PROT/TrEMBL, SYFPEITHI and FASTA databases, which were used to determine the protein sequences, to predict CD4 and CD8 T-cell epitopes, and to assess molecular mimicry with humans, respectively. Molecular similarity was found in several human proteins present in different organs and tissues such as liver, skin and epithelial tissues, brain, lymphatic system and testicles. Of these, those found in testicles were the most similar, showing the highest frequency of mimetic sequences. This finding sheds light on the success of N. meningitidis B in colonizing human tissues and the failure of certain vaccines against this bacterium, and it even helps to explain possible autoimmune reactions associated with the infection or vaccination.
In silico Analysis of 2085 Clones from a Normalized Rat Vestibular Periphery 3′ cDNA Library

PubMed Central

Roche, Joseph P.; Cioffi, Joseph A.; Kwitek, Anne E.; Erbe, Christy B.; Popper, Paul

2005-01-01

The inserts from 2400 cDNA clones isolated from a normalized Rattus norvegicus vestibular periphery cDNA library were sequenced and characterized. The Wackym-Soares vestibular 3′ cDNA library was constructed from the saccular and utricular maculae, the ampullae of all three semicircular canals and Scarpa's ganglia containing the somata of the primary afferent neurons, microdissected from 104 male and female rats. The inserts from 2400 randomly selected clones were sequenced from the 5′ end. Each sequence was analyzed using the BLAST algorithm compared to the GenBank nonredundant, rat genome, mouse genome and human genome databases to search for high homology alignments. Of the initial 2400 clones, 315 (13%) were found to be of poor quality and did not yield useful information, and therefore were eliminated from the analysis. Of the remaining 2085 sequences, 918 (44%) were found to represent 758 unique genes having useful annotations that were identified in databases within the public domain or in the published literature; these sequences were designated as known characterized sequences. A further 1141 sequences (55%) aligned with 1011 unique sequences that had no useful annotations and were designated as known but uncharacterized sequences. Of the remaining 26 sequences (1%), 24 aligned with rat genomic sequences, but none matched previously described rat expressed sequence tags or mRNAs. No significant alignment to the rat or human genomic sequences could be found for the remaining 2 sequences. Of the 2085 sequences analyzed, 86% were singletons. The known, characterized sequences were analyzed with the FatiGO online data-mining tool (http://fatigo.bioinfo.cnio.es/) to identify level 5 biological process gene ontology (GO) terms for each alignment and to group alignments with similar or identical GO terms. Numerous genes were identified that have not been previously shown to be expressed in the vestibular system. 
Further characterization of the novel cDNA sequences may lead to the identification of genes with vestibular-specific functions. Continued analysis of the rat vestibular periphery transcriptome should provide new insights into vestibular function and generate new hypotheses. Physiological studies are necessary to further elucidate the roles of the identified genes and novel sequences in vestibular function. PMID:16103642
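The clone counts and percentages reported in this abstract can be reconciled with a short tally. This is only an arithmetic check of the published figures; the variable names and category labels are shorthand introduced here.

```python
def percent(part: int, whole: int) -> int:
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


total_clones = 2400
poor_quality = 315
usable = total_clones - poor_quality  # sequences retained for analysis

# Category counts as reported in the abstract.
categories = {
    "known characterized": 918,
    "known uncharacterized": 1141,
    "no EST or mRNA match": 26,
}

print(usable, percent(poor_quality, total_clones))
for label, count in categories.items():
    print(label, count, percent(count, usable))
print(sum(categories.values()) == usable)
```

The three categories sum to the 2085 usable sequences, and the rounded shares reproduce the 13%, 44%, 55% and 1% figures given in the text.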
Harnessing NGS and Big Data Optimally: Comparison of miRNA Prediction from Assembled versus Non-assembled Sequencing Data--The Case of the Grass Aegilops tauschii Complex Genome.

PubMed

Budak, Hikmet; Kantar, Melda

2015-07-01

MicroRNAs (miRNAs) are small, endogenous, non-coding RNA molecules that regulate gene expression at the post-transcriptional level. As high-throughput next generation sequencing (NGS) and Big Data rapidly accumulate for various species, efforts for in silico identification of miRNAs intensify. Surprisingly, the effect of the input genomics sequence on the robustness of miRNA prediction was not evaluated in detail to date. In the present study, we performed a homology-based miRNA and isomiRNA prediction of the 5D chromosome of bread wheat progenitor, Aegilops tauschii, using two distinct sequence data sets as input: (1) raw sequence reads obtained from 454-GS FLX Titanium sequencing platform and (2) an assembly constructed from these reads. We also compared this method with a number of available plant sequence datasets. We report here the identification of 62 and 22 miRNAs from raw reads and the assembly, respectively, of which 16 were predicted with high confidence from both datasets. While raw reads promoted sensitivity with the high number of miRNAs predicted, 55% (12 out of 22) of the assembly-based predictions were supported by previous observations, bringing specificity forward compared to the read-based predictions, of which only 37% were supported. Importantly, raw reads could identify several repeat-related miRNAs that could not be detected with the assembly. However, raw reads could not capture 6 miRNAs, for which the stem-loops could only be covered by the relatively longer sequences from the assembly. In summary, the comparison of miRNA datasets obtained by these two strategies revealed that utilization of raw reads, as well as assemblies for in silico prediction, have distinct advantages and disadvantages. Consideration of these important nuances can benefit future miRNA identification efforts in the current age of NGS and Big Data driven life sciences innovation.
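The overlap between the read-based and assembly-based predictions follows from simple set arithmetic on the counts given above. The sketch uses only the reported totals (62, 22 and 16 shared, with 12 assembly-based predictions previously supported); the function and key names are illustrative.

```python
from typing import Dict


def overlap_summary(raw_total: int, assembly_total: int, shared: int) -> Dict[str, int]:
    """Partition two prediction sets into exclusive and shared counts."""
    return {
        "raw_only": raw_total - shared,
        "assembly_only": assembly_total - shared,
        "shared": shared,
        "union": raw_total + assembly_total - shared,
    }


summary = overlap_summary(raw_total=62, assembly_total=22, shared=16)
print(summary)

# Share of assembly-based predictions supported by previous observations.
assembly_supported_pct = round(100 * 12 / 22)
print(assembly_supported_pct)
```

The 6 assembly-only predictions correspond to the 6 miRNAs that the abstract says raw reads could not capture.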
Assessment of cardiac time intervals using high temporal resolution real-time spiral phase contrast with UNFOLDed-SENSE.

PubMed

Kowalik, Grzegorz T; Knight, Daniel S; Steeden, Jennifer A; Tann, Oliver; Odille, Freddy; Atkinson, David; Taylor, Andrew; Muthurangu, Vivek

2015-02-01

To develop a real-time phase contrast MR sequence with high enough temporal resolution to assess cardiac time intervals. The sequence utilized spiral trajectories with an acquisition strategy that allowed a combination of temporal encoding (Unaliasing by Fourier-encoding the overlaps using the temporal dimension; UNFOLD) and parallel imaging (Sensitivity encoding; SENSE) to be used (UNFOLDed-SENSE). An in silico experiment was performed to determine the optimum UNFOLD filter. In vitro experiments were carried out to validate the accuracy of time intervals calculation and peak mean velocity quantification. In addition, 15 healthy volunteers were imaged with the new sequence, and cardiac time intervals were compared to reference standard Doppler echocardiography measures. For comparison, in silico, in vitro, and in vivo experiments were also carried out using sliding window reconstructions. The in vitro experiments demonstrated good agreement between real-time spiral UNFOLDed-SENSE phase contrast MR and the reference standard measurements of velocity and time intervals. The protocol was successfully performed in all volunteers. Subsequent measurement of time intervals produced values in keeping with literature values and good agreement with the gold standard echocardiography. Importantly, the proposed UNFOLDed-SENSE sequence outperformed the sliding window reconstructions. Cardiac time intervals can be successfully assessed with UNFOLDed-SENSE real-time spiral phase contrast. Real-time MR assessment of cardiac time intervals may be beneficial in assessment of patients with cardiac conditions such as diastolic dysfunction. © 2014 Wiley Periodicals, Inc.
Proteins and Their Interacting Partners: An Introduction to Protein-Ligand Binding Site Prediction Methods.

PubMed

Roche, Daniel Barry; Brackenridge, Danielle Allison; McGuffin, Liam James

2015-12-15

Elucidating the biological and biochemical roles of proteins, and subsequently determining their interacting partners, can be difficult and time consuming using in vitro and/or in vivo methods, and consequently the majority of newly sequenced proteins will have unknown structures and functions. However, in silico methods for predicting protein-ligand binding sites and protein biochemical functions offer an alternative practical solution. The characterisation of protein-ligand binding sites is essential for investigating new functional roles, which can impact the major biological research spheres of health, food, and energy security. In this review we discuss the role in silico methods play in 3D modelling of protein-ligand binding sites, along with their role in predicting biochemical functionality. In addition, we describe in detail some of the key alternative in silico prediction approaches that are available, as well as discussing the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) projects, and their impact on developments in the field. Furthermore, we discuss the importance of protein function prediction methods for tackling 21st century problems.
Using diverse U.S. beef cattle genomes to identify missense mutations in EPAS1, a gene associated with pulmonary hypertension

PubMed Central

Heaton, Michael P.; Smith, Timothy P.L.; Carnahan, Jacky K.; Basnayake, Veronica; Qiu, Jiansheng; Simpson, Barry; Kalbfleisch, Theodore S.

2016-01-01

The availability of whole genome sequence (WGS) data has made it possible to discover protein variants in silico. However, existing bovine WGS databases do not show data in a form conducive to protein variant analysis, and tend to underrepresent the breadth of genetic diversity in global beef cattle. Thus, our first aim was to use 96 beef sires, sharing minimal pedigree relationships, to create a searchable and publicly viewable set of mapped genomes relevant for 19 popular breeds of U.S. cattle. Our second aim was to identify protein variants encoded by the bovine endothelial PAS domain-containing protein 1 gene (EPAS1), a gene associated with pulmonary hypertension in Angus cattle. The identity and quality of genomic sequences were verified by comparing WGS genotypes to those derived from other methods. The average read depth, genotype scoring rate, and genotype accuracy exceeded 14, 99%, and 99%, respectively. The 96 genomes were used to discover four amino acid variants encoded by EPAS1 (E270Q, P362L, A671G, and L701F) and confirm two variants previously associated with disease (A606T and G610S). The six EPAS1 missense mutations were verified with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry assays, and their frequencies were estimated in a separate collection of 1154 U.S. cattle representing 46 breeds. A rooted phylogenetic tree of eight polypeptide sequences provided a framework for evaluating the likely order of mutations and potential impact of EPAS1 alleles on the adaptive response to chronic hypoxia in U.S. cattle. This public, whole genome resource facilitates in silico identification of protein variants in diverse types of U.S. beef cattle, and provides a means of translating WGS data into a practical biological and evolutionary context for generating and testing hypotheses. PMID:27746904
Complete Chloroplast Genome Sequence of Tartary Buckwheat (Fagopyrum tataricum) and Comparative Analysis with Common Buckwheat (F. esculentum)

PubMed Central

Cho, Kwang-Soo; Yun, Bong-Kyoung; Yoon, Young-Ho; Hong, Su-Young; Mekapogu, Manjulatha; Kim, Kyung-Hee; Yang, Tae-Jin

2015-01-01

We report the chloroplast (cp) genome sequence of tartary buckwheat (Fagopyrum tataricum) obtained by next-generation sequencing technology and compare it with the previously reported common buckwheat (F. esculentum ssp. ancestrale) cp genome. The cp genome of F. tataricum has a total sequence length of 159,272 bp, which is 327 bp shorter than the common buckwheat cp genome. The cp gene content, order, and orientation are similar to those of common buckwheat, but with some structural variation at tandem and palindromic repeat frequencies and junction areas. A total of seven InDels (around 100 bp) were found within the intergenic sequences and the ycf1 gene. The copy number of the 21-bp tandem repeat differed between F. tataricum (four repeats) and F. esculentum (one repeat), and the InDel of the ycf1 gene was 63 bp long. Coding sequences are highly conserved at both the nucleotide and amino acid levels, with about 98% homology, and four genes (rpoC2, ycf3, accD, and clpP) have high synonymous (Ks) values. PCR-based InDel markers were applied to diverse genetic resources of F. tataricum and F. esculentum, and the amplicon size was identical to that expected in silico. Therefore, these InDel markers are informative biomarkers to practically distinguish raw or processed buckwheat products derived from F. tataricum and F. esculentum. PMID:25966355
Barley whole exome capture: a tool for genomic research in the genus Hordeum and beyond

PubMed Central

Mascher, Martin; Richmond, Todd A; Gerhardt, Daniel J; Himmelbach, Axel; Clissold, Leah; Sampath, Dharanya; Ayling, Sarah; Steuernagel, Burkhard; Pfeifer, Matthias; D'Ascenzo, Mark; Akhunov, Eduard D; Hedley, Pete E; Gonzales, Ana M; Morrell, Peter L; Kilian, Benjamin; Blattner, Frank R; Scholz, Uwe; Mayer, Klaus FX; Flavell, Andrew J; Muehlbauer, Gary J; Waugh, Robbie; Jeddeloh, Jeffrey A; Stein, Nils

2013-01-01

Advanced resources for genome-assisted research in barley (Hordeum vulgare) including a whole-genome shotgun assembly and an integrated physical map have recently become available. These have made possible studies that aim to assess genetic diversity or to isolate single genes by whole-genome resequencing and in silico variant detection. However, such an approach remains expensive given the 5 Gb size of the barley genome. Targeted sequencing of the mRNA-coding exome reduces barley genomic complexity more than 50-fold, thus dramatically reducing this heavy sequencing and analysis load. We have developed and employed an in-solution hybridization-based sequence capture platform to selectively enrich for a 61.6 megabase coding sequence target that includes predicted genes from the genome assembly of the cultivar Morex as well as publicly available full-length cDNAs and de novo assembled RNA-Seq consensus sequence contigs. The platform provides a highly specific capture with substantial and reproducible enrichment of targeted exons, both for cultivated barley and related species. We show that this exome capture platform provides a clear path towards a broader and deeper understanding of the natural variation residing in the mRNA-coding part of the barley genome and will thus constitute a valuable resource for applications such as mapping-by-sequencing and genetic diversity analyses. PMID:23889683
Genomic cloning and promoter functional analysis of myostatin-2 in shi drum, Umbrina cirrosa: conservation of muscle-specific promoter activity.

PubMed

Nadjar-Boger, Elisabeth; Maccatrozzo, Lisa; Radaelli, Giuseppe; Funkenstein, Bruria

2013-02-01

Myostatin (MSTN) is a member of the transforming growth factor-β superfamily, known as a negative regulator of skeletal muscle development and growth in mammals. In contrast to mammals, fish possess at least two paralogs of MSTN: MSTN-1 and MSTN-2. Here we describe the cloning and sequence analysis of spliced and precursor (unspliced) transcripts as well as the 5' flanking region of MSTN-2 from the marine fish Umbrina cirrosa (ucMSTN-2). In silico analysis revealed numerous putative cis regulatory elements including several E-boxes known as binding sites for myogenic transcription factors. Transient transfection experiments using non-muscle and muscle cell lines showed high transcriptional activity in muscle cells and in differentiated neural cells, in accordance with our previous findings for the MSTN-2 promoter from Sparus aurata. Comparative informatics analysis of MSTN-2 from several fish species revealed high conservation of the predicted amino acid sequence as well as the gene structure (exon length), although intron length varied between species. The proximal promoter of the MSTN-2 gene was found to be conserved among Perciforms. In conclusion, this study reinforces our conclusion that the MSTN-2 promoter is a very strong promoter, especially in muscle cells. In addition, we show that the MSTN-2 gene structure is highly conserved among fishes, as is the predicted amino acid sequence of the peptide. Copyright © 2012 Elsevier Inc. All rights reserved.
Evolutionarily conserved ELOVL4 gene expression in the vertebrate retina.

PubMed

Lagali, Pamela S; Liu, Jiafan; Ambasudhan, Rajesh; Kakuk, Laura E; Bernstein, Steven L; Seigel, Gail M; Wong, Paul W; Ayyagari, Radha

2003-07-01

The gene elongation of very long chain fatty acids-4 (ELOVL4) has been shown to underlie phenotypically heterogeneous forms of autosomal dominant macular degeneration. In this study, the extent of evolutionary conservation and the existence and localization of retinal expression of this gene was investigated across a wide variety of species. Southern blot analysis of genomic DNA and bioinformatic analysis using the human ELOVL4 cDNA and protein sequences, respectively, were performed to identify species in which ELOVL4 orthologues and/or homologues are present. Retinal RNA and protein extracts derived from different species were assessed by Northern hybridization and immunoblot techniques to assess evolutionary conservation of gene expression. Immunohistochemical analysis of tissue sections prepared from various mammalian retinas was performed to determine the distribution of ELOVL4 and homologous proteins within specific retinal cell layers. The existence of ELOVL4 sequence orthologues and homologues was confirmed by both Southern blot analysis and in silico searches of protein sequence databases. Phylogenetic analysis places ELOVL4 among a large family of known and putative fatty acid elongase proteins. Northern blot analysis revealed the presence of multiple transcripts corresponding to ELOVL4 homologues expressed in the retina of several different mammalian species. Conserved proteins were also detected among retinal extracts of different mammals and were found to localize predominantly to the photoreceptor cell layer within retinal tissue preparations. The ELOVL4 gene is highly conserved throughout evolution and is expressed in the photoreceptor cells of the retina in a variety of different species, which suggests that it plays a critical role in retinal cell biology.
High-throughput SNP genotyping in the highly heterozygous genome of Eucalyptus: assay success, polymorphism and transferability across species

PubMed Central

2011-01-01

Background: High-throughput SNP genotyping has become an essential requirement for molecular breeding and population genomics studies in plant species. Large scale SNP developments have been reported for several mainstream crops. A growing interest now exists to expand the speed and resolution of genetic analysis to outbred species with highly heterozygous genomes. When nucleotide diversity is high, a refined diagnosis of the target SNP sequence context is needed to convert queried SNPs into high-quality genotypes using the Golden Gate Genotyping Technology (GGGT). This issue becomes exacerbated when attempting to transfer SNPs across species, a scarcely explored topic in plants, and likely to become significant for population genomics and interspecific breeding applications in less domesticated and less funded plant genera. Results: We have successfully developed the first set of 768 SNPs assayed by the GGGT for the highly heterozygous genome of Eucalyptus from a mixed Sanger/454 database with 1,164,695 ESTs and the preliminary 4.5X draft genome sequence for E. grandis. A systematic assessment of in silico SNP filtering requirements showed that stringent constraints on the SNP surrounding sequences have a significant impact on SNP genotyping performance and polymorphism. SNP assay success was high for the 288 SNPs selected with more rigorous in silico constraints; 93% of them provided high quality genotype calls and 71% of them were polymorphic in a diverse panel of 96 individuals of five different species. SNP reliability was high across nine Eucalyptus species belonging to three sections within subgenus Symphomyrtus and still satisfactory across species of two additional subgenera, although polymorphism declined as phylogenetic distance increased. Conclusions: This study indicates that the GGGT performs well both within and across species of Eucalyptus notwithstanding its nucleotide diversity ≥2%.
The development of a much larger array of informative SNPs across multiple Eucalyptus species is feasible, although strongly dependent on having a representative and sufficiently deep collection of sequences from many individuals of each target species. A higher density SNP platform will be instrumental to undertake genome-wide phylogenetic and population genomics studies and to implement molecular breeding by Genomic Selection in Eucalyptus. PMID:21492434
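The abstract above reports that stringent constraints on the sequence surrounding a SNP improve assay performance, but it does not give the exact criteria. The following is a minimal sketch of one plausible constraint, assuming that a candidate SNP is kept only when no other known polymorphism lies within a fixed flanking window. The function name `filter_by_flank` and the default window of 60 bp are hypothetical and are not taken from the study.

```python
def filter_by_flank(snp_positions, flank=60):
    """Keep SNPs that have no neighbouring SNP within `flank` bp on either side.

    `snp_positions` are positions on a single contig. The window size is an
    illustrative assumption, not the value used in the Eucalyptus study.
    """
    positions = sorted(snp_positions)
    kept = []
    for i, pos in enumerate(positions):
        # A SNP at either end of the list has no neighbour on that side.
        left_ok = i == 0 or pos - positions[i - 1] > flank
        right_ok = i == len(positions) - 1 or positions[i + 1] - pos > flank
        if left_ok and right_ok:
            kept.append(pos)
    return kept


candidates = [100, 130, 400, 1000, 1050]
print(filter_by_flank(candidates, flank=60))
print(filter_by_flank(candidates, flank=40))
```

Widening the window discards more candidates, which mirrors the trade-off described in the abstract between the number of SNPs retained and the quality of the resulting genotype calls.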
RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study.

PubMed

Berghoff, Bork A; Karlsson, Torgny; Källman, Thomas; Wagner, E Gerhart H; Grabherr, Manfred G

2017-01-01

Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods thus either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or that the effects of up- and down-regulated genes cancel each other out during the normalization. Here, we present a novel method, moose2, which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli, and show how moose2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) to subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/.
The proposed RNA-seq normalization method, moose2, is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variations between replicates.
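To illustrate what a quadratic normalization on a subset of invariant genes might look like, here is a minimal sketch. It is not the moose2 implementation: it assumes the invariant genes are already known, works on (log-scale) expression values, and fits a least-squares curve y = c0 + c1*x + c2*x^2 that maps a sample onto a reference using only those genes. All function names are hypothetical.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x^2 via the normal equations.

    Needs at least three distinct x values, otherwise the system is singular.
    """
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    return solve3(a, t)


def normalize(sample, reference, invariant_genes):
    """Fit the curve on the invariant genes only, then apply it to every gene."""
    xs = [sample[g] for g in invariant_genes]
    ys = [reference[g] for g in invariant_genes]
    c0, c1, c2 = fit_quadratic(xs, ys)
    return {g: c0 + c1 * v + c2 * v * v for g, v in sample.items()}


sample = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0, "g5": 2.5}
reference = {"g1": 1.65, "g2": 2.7, "g3": 3.65, "g4": 4.5}
normalized = normalize(sample, reference, ["g1", "g2", "g3", "g4"])
print(normalized)
```

Fitting on the invariant subset rather than on all genes is the design choice the abstract emphasizes: genes that truly change between conditions do not distort the correction curve.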
Cloning and Expression Analysis of Genes Encoding Lytic Endopeptidases L1 and L5 from Lysobacter sp. Strain XL1

PubMed Central

Lapteva, Y. S.; Zolova, O. E.; Shlyapnikov, M. G.; Tsfasman, I. M.; Muranova, T. A.; Stepnaya, O. A.; Kulaev, I. S.

2012-01-01

Lytic enzymes are the group of hydrolases that break down structural polymers of the cell walls of various microorganisms. In this work, we determined the nucleotide sequences of the Lysobacter sp. strain XL1 alpA and alpB genes, which code for, respectively, secreted lytic endopeptidases L1 (AlpA) and L5 (AlpB). In silico analysis of their amino acid sequences showed these endopeptidases to be homologous proteins synthesized as precursors similar in structural organization: the mature enzyme sequence is preceded by an N-terminal signal peptide and a pro region. On the basis of phylogenetic analysis, endopeptidases AlpA and AlpB were assigned to the S1E family [clan PA(S)] of serine peptidases. Expression of the alpA and alpB open reading frames (ORFs) in Escherichia coli confirmed that they code for functionally active lytic enzymes. Each ORF was predicted to have the Shine-Dalgarno sequence located at a canonical distance from the start codon and a potential Rho-independent transcription terminator immediately after the stop codon. The alpA and alpB mRNAs were experimentally found to be monocistronic; transcription start points were determined for both mRNAs. The synthesis of the alpA and alpB mRNAs was shown to occur predominantly in the late logarithmic growth phase. The amount of alpA mRNA in cells of Lysobacter sp. strain XL1 was much higher, which correlates with greater production of endopeptidase L1 than of L5. PMID:22865082
Cloning and expression analysis of genes encoding lytic endopeptidases L1 and L5 from Lysobacter sp. strain XL1.

PubMed

Lapteva, Y S; Zolova, O E; Shlyapnikov, M G; Tsfasman, I M; Muranova, T A; Stepnaya, O A; Kulaev, I S; Granovsky, I E

2012-10-01

Lytic enzymes are the group of hydrolases that break down structural polymers of the cell walls of various microorganisms. In this work, we determined the nucleotide sequences of the Lysobacter sp. strain XL1 alpA and alpB genes, which code for, respectively, secreted lytic endopeptidases L1 (AlpA) and L5 (AlpB). In silico analysis of their amino acid sequences showed these endopeptidases to be homologous proteins synthesized as precursors similar in structural organization: the mature enzyme sequence is preceded by an N-terminal signal peptide and a pro region. On the basis of phylogenetic analysis, endopeptidases AlpA and AlpB were assigned to the S1E family [clan PA(S)] of serine peptidases. Expression of the alpA and alpB open reading frames (ORFs) in Escherichia coli confirmed that they code for functionally active lytic enzymes. Each ORF was predicted to have the Shine-Dalgarno sequence located at a canonical distance from the start codon and a potential Rho-independent transcription terminator immediately after the stop codon. The alpA and alpB mRNAs were experimentally found to be monocistronic; transcription start points were determined for both mRNAs. The synthesis of the alpA and alpB mRNAs was shown to occur predominantly in the late logarithmic growth phase. The amount of alpA mRNA in cells of Lysobacter sp. strain XL1 was much higher, which correlates with greater production of endopeptidase L1 than of L5.
Blood-Borne Candidatus Borrelia algerica in a Patient with Prolonged Fever in Oran, Algeria

PubMed Central

Fotso Fotso, Aurélien; Angelakis, Emmanouil; Mouffok, Nadjet; Drancourt, Michel; Raoult, Didier

2015-01-01

To improve the knowledge base of Borrelia in north Africa, we tested 257 blood samples collected from febrile patients in Oran, Algeria, between January and December 2012 for Borrelia species using flagellin gene polymerase chain reaction sequencing. A sequence indicative of a new Borrelia sp. named Candidatus Borrelia algerica was detected in one blood sample. Further multispacer sequence typing indicated this Borrelia sp. had 97% similarity with Borrelia crocidurae, Borrelia duttonii, and Borrelia recurrentis. In silico comparison of Candidatus B. algerica spacer sequences with those of Borrelia hispanica and Borrelia garinii revealed 94% and 89% similarity, respectively. Candidatus B. algerica is a new relapsing fever Borrelia sp. detected in Oran. Further studies may help predict its epidemiological importance. PMID:26416117
In vitro fatigue tests and in silico finite element analysis of dental implants with different fixture/abutment joint types using computer-aided design models.

PubMed

Yamaguchi, Satoshi; Yamanishi, Yasufumi; Machado, Lucas S; Matsumoto, Shuji; Tovar, Nick; Coelho, Paulo G; Thompson, Van P; Imazato, Satoshi

2018-01-01

The aim of this study was to evaluate fatigue resistance of dental fixtures with two different fixture-abutment connections by in vitro fatigue testing and in silico three-dimensional finite element analysis (3D FEA) using original computer-aided design (CAD) models. Dental implant fixtures with external connection (EX) or internal connection (IN) abutments were fabricated from original CAD models using grade IV titanium and step-stress accelerated life testing was performed. Fatigue cycles and loads were assessed by Weibull analysis, and fatigue cracking was observed by micro-computed tomography and a stereomicroscope with high dynamic range software. Using the same CAD models, displacement vectors of implant components were also analyzed by 3D FEA. Angles of the fractured line occurring at fixture platforms in vitro and of displacement vectors corresponding to the fractured line in silico were compared by two-way ANOVA. Fatigue testing showed significantly greater reliability for IN than EX (p<0.001). Fatigue crack initiation was primarily observed at implant fixture platforms. FEA demonstrated that crack lines of both implant systems in vitro were observed in the same direction as displacement vectors of the implant fixtures in silico. In silico displacement vectors in the implant fixture are insightful for geometric development of dental implants to reduce complex interactions leading to fatigue failure. Copyright © 2017 Japan Prosthodontic Society. Published by Elsevier Ltd. All rights reserved.
An inventory of the Aspergillus niger secretome by combining in silico predictions with shotgun proteomics data.

PubMed

Braaksma, Machtelt; Martens-Uzunova, Elena S; Punt, Peter J; Schaap, Peter J

2010-10-19

The ecological niche occupied by a fungal species, its pathogenicity and its usefulness as a microbial cell factory to a large degree depend on its secretome. Protein secretion usually requires the presence of an N-terminal signal peptide (SP) and by scanning for this feature using available highly accurate SP-prediction tools, the fraction of potentially secreted proteins can be directly predicted. However, prediction of an SP does not guarantee that the protein is actually secreted and current in silico prediction methods suffer from gene-model errors introduced during genome annotation. A majority rule based classifier that also evaluates signal peptide predictions from the best homologs of three neighbouring Aspergillus species was developed to create an improved list of potential signal peptide containing proteins encoded by the Aspergillus niger genome. As a complement to these in silico predictions, the secretome associated with growth and upon carbon source depletion was determined using a shotgun proteomics approach. Overall, some 200 proteins with a predicted signal peptide were identified to be secreted proteins. Concordant changes in the secretome state were observed as a response to changes in growth/culture conditions. Additionally, two proteins secreted via a non-classical route operating in A. niger were identified. We were able to improve the in silico inventory of A. niger secretory proteins by combining different gene-model predictions from neighbouring Aspergilli and thereby avoiding prediction conflicts associated with inaccurate gene-models. The expected accuracy of signal peptide prediction for proteins that lack homologous sequences in the proteomes of related species is 85%. An experimental validation of the predicted proteome confirmed in silico predictions.
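The majority-rule idea described in this abstract can be sketched in a few lines. The version below is an assumption-laden illustration rather than the authors' classifier: it takes the signal peptide call for the A. niger protein plus the calls for its best homologs in three neighbouring species, ignores missing homologs, and (as a hypothetical tie-breaking rule not stated in the abstract) falls back to the A. niger call when the votes are evenly split.

```python
from typing import Optional, Sequence


def majority_rule_sp(query_has_sp: bool,
                     homolog_calls: Sequence[Optional[bool]]) -> bool:
    """Combine signal peptide (SP) calls by majority vote.

    `homolog_calls` holds one entry per neighbouring species: True or False
    for the SP call on the best homolog, or None when no homolog was found.
    """
    votes = [query_has_sp] + [call for call in homolog_calls if call is not None]
    yes = sum(votes)
    no = len(votes) - yes
    if yes == no:
        # Tie-breaking is an assumption made for this sketch.
        return query_has_sp
    return yes > no


# A gene-model error hides the SP in the query, but all homologs have one.
print(majority_rule_sp(False, [True, True, True]))
# No homologs available: the query prediction stands on its own.
print(majority_rule_sp(True, [None, None, None]))
```

When no homologs exist, the result rests on the single-sequence prediction alone, which is the case for which the abstract quotes an expected accuracy of 85%.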
An inventory of the Aspergillus niger secretome by combining in silico predictions with shotgun proteomics data

PubMed Central

2010-01-01

Background: The ecological niche occupied by a fungal species, its pathogenicity and its usefulness as a microbial cell factory to a large degree depend on its secretome. Protein secretion usually requires the presence of an N-terminal signal peptide (SP) and by scanning for this feature using available highly accurate SP-prediction tools, the fraction of potentially secreted proteins can be directly predicted. However, prediction of an SP does not guarantee that the protein is actually secreted and current in silico prediction methods suffer from gene-model errors introduced during genome annotation. Results: A majority rule based classifier that also evaluates signal peptide predictions from the best homologs of three neighbouring Aspergillus species was developed to create an improved list of potential signal peptide containing proteins encoded by the Aspergillus niger genome. As a complement to these in silico predictions, the secretome associated with growth and upon carbon source depletion was determined using a shotgun proteomics approach. Overall, some 200 proteins with a predicted signal peptide were identified to be secreted proteins. Concordant changes in the secretome state were observed as a response to changes in growth/culture conditions. Additionally, two proteins secreted via a non-classical route operating in A. niger were identified. Conclusions: We were able to improve the in silico inventory of A. niger secretory proteins by combining different gene-model predictions from neighbouring Aspergilli and thereby avoiding prediction conflicts associated with inaccurate gene-models. The expected accuracy of signal peptide prediction for proteins that lack homologous sequences in the proteomes of related species is 85%. An experimental validation of the predicted proteome confirmed in silico predictions. PMID:20959013

Sequence and expression variation in SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1): homeolog evolution in Indian Brassicas.

PubMed

Sri, Tanu; Mayee, Pratiksha; Singh, Anandita

2015-09-01

Whole genome sequence analyses allow unravelling of the evolutionary consequences of the meso-triplication event in Brassicaceae (∼14-20 million years ago (MYA)), such as differential gene fractionation and diversification in homeologous sub-genomes. This study presents a simple gene-centric approach involving microsynteny and natural genetic variation analysis for understanding SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1) homeolog evolution in Brassica. Analysis of microsynteny in Brassica rapa homeologous regions containing SOC1 revealed differential gene fractionation correlating to reported fractionation status of sub-genomes of origin, viz. least fractionated (LF), moderately fractionated 1 (MF1) and most fractionated (MF2), respectively. Screening 18 cultivars of 6 Brassica species led to the identification of 8 genomic and 27 transcript variants of SOC1, including splice-forms. Co-occurrence of both interrupted and intronless SOC1 genes was detected in a few Brassica species. In silico analysis characterised Brassica SOC1 as a MADS intervening, K-box, C-terminal (MIKC(C)) transcription factor, with highly conserved MADS and I domains relative to the K-box and C-terminal domains. Phylogenetic analyses and multiple sequence alignments depicting a shared pattern of silent/non-silent mutations assigned Brassica SOC1 homologs into groups based on shared diploid base genome. In addition, a sub-genome structure in uncharacterised Brassica genomes was inferred. Expression analysis of putative MF2 and LF (Brassica diploid base genome A (AA)) sub-genome-specific SOC1 homeologs of Brassica juncea revealed a near identical expression pattern. However, the MF2-specific homeolog exhibited significantly higher expression, implying regulatory diversification. In conclusion, evidence for polyploidy-induced sequence and regulatory evolution in Brassica SOC1 is presented, wherein differential homeolog expression is implied in functional diversification.
Screening for Antimicrobial Resistance Genes and Virulence Factors via Genome Sequencing

PubMed Central

Bennedsen, Mads; Stuer-Lauridsen, Birgitte; Danielsen, Morten; Johansen, Eric

2011-01-01

Second-generation genome sequencing and alignment of the resulting reads to in silico genomes containing antimicrobial resistance and virulence factor genes were used to screen for undesirable genes in 28 strains which could be used in human nutrition. No virulence factor genes were detected, while several isolates contained antimicrobial resistance genes. PMID:21335393
Lactobacillus micheneri sp. nov., Lactobacillus timberlakei sp. nov. and Lactobacillus quenuiae sp. nov., lactic acid bacteria isolated from wild bees and flowers.

PubMed

McFrederick, Quinn S; Vuong, Hoang Q; Rothman, Jason A

2018-06-01

Gram-stain-positive, rod-shaped, non-spore forming bacteria have been isolated from flowers and the guts of adult wild bees in the families Megachilidae and Halictidae. Phylogenetic analysis of the 16S rRNA gene indicated that these bacteria belong to the genus Lactobacillus, and are most closely related to the honey-bee associated bacteria Lactobacillus kunkeei (97.0 % sequence similarity) and Lactobacillus apinorum (97.0 % sequence similarity). Phylogenetic analyses of 16S rRNA genes and six single-copy protein coding genes, in situ and in silico DNA-DNA hybridization, and fatty-acid profiling differentiate the newly isolated bacteria as three novel Lactobacillus species: Lactobacillus micheneri sp. nov. with the type strain Hlig3T (=DSM 104126T =NRRL B-65473T), Lactobacillus timberlakei sp. nov. with the type strain HV_12T (=DSM 104128T =NRRL B-65472T), and Lactobacillus quenuiae sp. nov. with the type strain HV_6T (=DSM 104127T =NRRL B-65474T).
Identification of a mouse synaptic glycoprotein gene in cultured neurons.

PubMed

Yu, Albert Cheung-Hoi; Sun, Chun Xiao; Li, Qiang; Liu, Hua Dong; Wang, Chen Ran; Zhao, Guo Ping; Jin, Meilei; Lau, Lok Ting; Fung, Yin-Wan Wendy; Liu, Shuang

2005-10-01

Neuronal differentiation and aging are known to involve many genes, which may also be differentially expressed during these developmental processes. We have previously identified various differentially expressed gene transcripts in primary cultured cerebral cortical neurons using the technique of arbitrarily primed PCR (RAP-PCR). Among these transcripts, clone 0-2 was found to have high homology to rat and human synaptic glycoprotein. By in silico analysis using an EST database and the FACTURA software, the full-length sequence of 0-2 was assembled and the clone was named mouse synaptic glycoprotein homolog 2 (mSC2). DNA sequencing revealed that the mSC2 transcript is smaller than its human and rat homologs. RT-PCR indicated that mSC2 was expressed differentially at various culture days. The mSC2 transcript was detected in various tissues, with higher expression in brain, lung, and liver. The functions of mSC2 in neurons and other tissues remain elusive and will require more investigation.
A new missense mutation in the BCKDHB gene causes the classic form of maple syrup urine disease (MSUD).

PubMed

Miryounesi, Mohammad; Ghafouri-Fard, Soudeh; Goodarzi, Hamedreza; Fardaei, Majid

2015-05-01

Maple syrup urine disease (MSUD) is an autosomal recessive metabolic disease caused by mutations in the BCKDHA, BCKDHB, DBT and DLD genes, which encode the E1α, E1β, E2 and E3 subunits of the branched chain α ketoacid dehydrogenase (BCKD) complex, respectively. This complex is involved in the metabolism of branched-chain amino acids. In this study, we analyzed the DNA sequences of BCKDHA and BCKDHB genes in an infant who suffered from MSUD and died at the age of 6 months. We found a new missense mutation in exon 5 of BCKDHB gene (c.508C>T). The heterozygosity of the parents for the mentioned nucleotide change was confirmed by direct sequence analysis of the corresponding segment. Another missense mutation has been found in the same codon previously and shown by in silico analyses to be deleterious. This report provides further evidence that this amino acid change can cause classic MSUD.
Analysis of Draft Genome Sequence of Pseudomonas sp. QTF5 Reveals Its Benzoic Acid Degradation Ability and Heavy Metal Tolerance

PubMed Central

Li, Yang; Ren, Yi

2017-01-01

Pseudomonas sp. QTF5 was isolated from the continuous permafrost near the bitumen layers in the Qiangtang basin of the Qinghai-Tibetan Plateau in China (5,111 m above sea level). It is psychrotolerant, tolerates a wide range of heavy metals at high levels, and can metabolize benzoic acid and salicylic acid. To gain insight into the genetic basis for its adaptation, we performed whole genome sequencing and analyzed the resistance genes and metabolic pathways. Based on 120 published and annotated genomes representing 31 species in the genus Pseudomonas, in silico genomic DNA-DNA hybridization (<54%) and average nucleotide identity calculation (<94%) revealed that QTF5 is closest to Pseudomonas lini and should be classified as a novel species. This study provides the genetic basis for identifying the genes linked to its specific mechanisms of adaptation to extreme environments and for applying this microorganism in environmental conservation. PMID:29270429
RUCS: rapid identification of PCR primers for unique core sequences.

PubMed

Thomsen, Martin Christen Frølund; Hasman, Henrik; Westh, Henrik; Kaya, Hülya; Lund, Ole

2017-12-15

Designing PCR primers to target a specific selection of whole genome sequenced strains can be a long, arduous and sometimes impractical task. Such tasks would benefit greatly from an automated tool that both identifies unique targets and validates the vast number of potential primer pairs for those targets in silico. Here we present RUCS, a program that finds PCR primer pairs and probes for the unique core sequences of a positive genome dataset, that is, sequences shared by the positive genomes and absent from a negative genome dataset. Beyond simple selection, the resulting primer pairs and probes are also validated through a complex in silico PCR simulation. We compared our method, which identifies the unique core sequences, against an existing tool called ssGeneFinder, and found that our method was 6.5-20 times more sensitive. We used RUCS to design primer pairs that would target a set of genomes known to contain the mcr-1 colistin resistance gene. Three of the predicted pairs were chosen for experimental validation using PCR and gel electrophoresis. All three pairs successfully produced an amplicon of the target length for the samples containing mcr-1, and no amplification products were produced for the negative samples. The novel methods presented in this manuscript can reduce the time needed to identify target sequences, and provide a quick virtual PCR validation to eliminate time wasted on ambiguously binding primers. Source code is freely available at https://bitbucket.org/genomicepidemiology/rucs. The web service is freely available at https://cge.cbs.dtu.dk/services/RUCS. Contact: mcft@cbs.dtu.dk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
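The idea of validating a primer pair by virtual PCR can be illustrated with a deliberately small sketch. This is not the RUCS simulation, which the abstract describes only as "complex"; it assumes exact primer matches on a single linear template, and the function names, the `max_len` cutoff and the toy sequences are all hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_all(template: str, query: str) -> list:
    """Return the start index of every (possibly overlapping) exact match."""
    positions = []
    start = template.find(query)
    while start != -1:
        positions.append(start)
        start = template.find(query, start + 1)
    return positions


def in_silico_pcr(template: str, forward: str, reverse: str, max_len: int = 3000) -> list:
    """Predict amplicons for one primer pair on the given strand of a template.

    Assumption: primers anneal only at exact matches. The reverse primer
    binds the opposite strand, so we search for its reverse complement.
    """
    template = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse)
    amplicons = []
    for f in find_all(template, fwd):
        for r in find_all(template, rev_site):
            end = r + len(rev_site)
            # The reverse site must lie downstream of the forward primer.
            if r >= f + len(fwd) and end - f <= max_len:
                amplicons.append((f, end, template[f:end]))
    return amplicons


# Hypothetical positive and negative templates.
positive = "TTTT" + "ACGTAC" + "GGGGGGGG" + "CATTGA" + "AAAA"
negative = "TTTTACGTACGGGG"
print(in_silico_pcr(positive, "ACGTAC", "TCAATG"))
print(in_silico_pcr(negative, "ACGTAC", "TCAATG"))
```

A pair is a useful candidate in this toy model when it yields exactly one amplicon of the expected length on every positive template and none on any negative template; a realistic simulation would also allow mismatches and model annealing temperature.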
Amino acid signature enables proteins to recognize modified tRNA.

PubMed

Spears, Jessica L; Xiao, Xingqing; Hall, Carol K; Agris, Paul F

2014-02-25

Human tRNA(Lys3)UUU is the primer for HIV replication. The HIV-1 nucleocapsid protein, NCp7, facilitates htRNA(Lys3)UUU recruitment from the host cell by binding to and remodeling the tRNA structure. Human tRNA(Lys3)UUU is post-transcriptionally modified, but until recently, the importance of those modifications in tRNA recognition by NCp7 was unknown. Modifications such as the 5-methoxycarbonylmethyl-2-thiouridine at anticodon wobble position-34 and 2-methylthio-N(6)-threonylcarbamoyladenosine, adjacent to the anticodon at position-37, are important to the recognition of htRNA(Lys3)UUU by NCp7. Several short peptides selected from phage display libraries were found to also preferentially recognize these modifications. Evolutionary algorithms (Monte Carlo and self-consistent mean field) and assisted model building with energy refinement were used to optimize the peptide sequence in silico, while fluorescence assays were developed and conducted to verify the in silico results and elucidate a 15-amino acid signature sequence (R-W-Q/N-H-X2-F-Pho-X-G/A-W-R-X2-G, where X can be most amino acids, and Pho is hydrophobic) that recognized the tRNA's fully modified anticodon stem and loop domain, hASL(Lys3)UUU. Peptides of this sequence specifically recognized and bound modified htRNA(Lys3)UUU with an affinity 10-fold higher than that of the starting sequence. Thus, this approach provides an effective means of predicting sequences of RNA binding peptides that have better binding properties. Such peptides can be used in cell and molecular biology as well as biochemistry to explore RNA binding proteins and to inhibit those protein functions.
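The 15-residue signature quoted above can be written as a regular expression to screen candidate peptides. The sketch below is only an illustration under stated assumptions: X is treated as any of the 20 standard amino acids (the abstract says "most"), the hydrophobic set used for Pho is a guess, and the test peptides are invented, not taken from the study.

```python
import re

# Assumption: X may be any of the 20 standard amino acids.
AA = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: residues counted as hydrophobic (Pho); the source does not list them.
HYDROPHOBIC = "AVILMFWY"

# R-W-Q/N-H-X2-F-Pho-X-G/A-W-R-X2-G
SIGNATURE = re.compile(
    f"RW[QN]H[{AA}]{{2}}F[{HYDROPHOBIC}][{AA}][GA]WR[{AA}]{{2}}G"
)


def matches_signature(peptide: str) -> bool:
    """Return True if the whole peptide fits the 15-residue signature."""
    return SIGNATURE.fullmatch(peptide.upper()) is not None


# Hypothetical peptides for illustration only.
print(matches_signature("RWQHSSFLKGWRTAG"))  # fits the pattern
print(matches_signature("RWKHSSFLKGWRTAG"))  # K at position 3 violates Q/N
```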
Genome-guided exploration of metabolic features of Streptomyces peucetius ATCC 27952: past, current, and prospect.

PubMed

Thuan, Nguyen Huy; Dhakal, Dipesh; Pokhrel, Anaya Raj; Chu, Luan Luong; Van Pham, Thi Thuy; Shrestha, Anil; Sohng, Jae Kyung

2018-05-01

Streptomyces peucetius ATCC 27952 produces two major anthracyclines, doxorubicin (DXR) and daunorubicin (DNR), which are potent chemotherapeutic agents for the treatment of several cancers. To gain detailed insight into the genetics and biochemistry of the strain, the complete genome was determined and analyzed. The complete sequence contains 7187 protein-coding genes in a total of 8,023,114 bp, with protein-coding regions accounting for 87% of the genome. The genomic sequence included 18 rRNAs, 66 tRNAs, and 3 non-coding RNAs. In silico studies predicted ~68 biosynthetic gene clusters (BGCs) encoding diverse classes of secondary metabolites, including non-ribosomal peptide synthetase (NRPS), polyketide synthase (PKS I, II, and III), terpenes, and others. Detailed analysis of the genome sequence revealed versatile biocatalytic enzymes such as cytochrome P450 (CYP), electron transfer system (ETS) genes, methyltransferase (MT), and glycosyltransferase (GT). In addition, numerous functional genes (transporter gene, SOD, etc.) and regulatory genes (afsR-sp, metK-sp, etc.) involved in the regulation of secondary metabolites were found. This minireview summarizes the genome mining (GM) of diverse BGCs and the genome exploration (GE) of versatile biocatalytic enzymes and other enzymes involved in the maintenance and regulation of metabolism in S. peucetius. The detailed analysis of the genome sequence provides critically important knowledge for bioengineering the strain or for harnessing its catalytically efficient enzymes in biotechnological applications.
Proteins with an Euonymus lectin-like domain are ubiquitous in Embryophyta

PubMed Central

2009-01-01

Background: Cloning of the Euonymus lectin led to the discovery of a novel domain that also occurs in some stress-induced plant proteins. The distribution and the diversity of proteins with a Euonymus lectin (EUL) domain were investigated using detailed analysis of sequences in publicly accessible genome and transcriptome databases. Results: Comprehensive in silico analyses indicate that the recently identified Euonymus europaeus lectin domain represents a conserved structural unit of a novel family of putative carbohydrate-binding proteins, which will further be referred to as the Euonymus lectin (EUL) family. The EUL domain is widespread among plants. Analysis of retrieved sequences revealed that some sequences consist of a single EUL domain linked to an unrelated N-terminal domain, whereas others comprise two tandemly arrayed EUL domains. A new classification system for these lectins is proposed based on the overall domain architecture. Evolutionary relationships among the sequences with EUL domains are discussed. Conclusion: The identification of the EUL family provides the first evidence for the occurrence in terrestrial plants of a highly conserved plant-specific domain. The widespread distribution of the EUL domain contrasts strikingly with the more limited or even narrow distribution of most other lectin domains found in plants. The apparent omnipresence of the EUL domain is indicative of a universal role for this lectin domain in plants. Although there is unambiguous evidence that several EUL domains possess carbohydrate-binding activity, further research is required to corroborate the carbohydrate-binding properties of different members of the EUL family. PMID:19930663
Expressed sequence tag based identification and expression analysis of some cold inducible elements in seabuckthorn (Hippophae rhamnoides L.).

PubMed

Ghangal, Rajesh; Raghuvanshi, Saurabh; Sharma, Prakash C

2012-02-01

A cDNA library was constructed from the mature leaves of seabuckthorn (Hippophae rhamnoides). Expressed Sequence Tags (ESTs) were generated by single-pass sequencing of 4500 cDNA clones. We submitted 3412 ESTs to dbEST of NCBI. Clustering of these ESTs yielded 1665 unigenes comprising 345 contigs and 1320 singletons. Of the 1665 unigenes, 1278 were annotated by similarity search, while the remaining 387 unannotated unigenes were considered organism specific. Gene Ontology (GO) analysis of the unigene dataset showed 691 unigenes related to biological processes, 727 to molecular functions and 588 to the cellular component category. On the basis of similarity search and GO annotation, 43 unigenes were found to be responsive to biotic and abiotic stresses. To validate this observation, 13 genes known from previous studies in Arabidopsis to be associated with cold stress tolerance and 3 novel transcripts were examined by real-time RT-PCR to understand the change in expression pattern under cold/freeze stress. An in silico study of the occurrence of microsatellites in these ESTs revealed the presence of 62 Simple Sequence Repeats (SSRs), some of which are being explored to assess genetic diversity among seabuckthorn collections. This is the first report of transcriptome data providing information about genes involved in managing abiotic stress in seabuckthorn, a plant known for its enormous medicinal and ecological value. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
Methyltransferases acquired by lactococcal 936-type phage provide protection against restriction endonuclease activity.

PubMed

Murphy, James; Klumpp, Jochen; Mahony, Jennifer; O'Connell-Motherway, Mary; Nauta, Arjen; van Sinderen, Douwe

2014-10-01

So-called 936-type phages are among the most frequently isolated phages in dairy facilities utilising Lactococcus lactis starter cultures. Despite extensive efforts to control phage proliferation and decades of research, these phages continue to negatively impact cheese production in terms of the final product quality and consequently, monetary return. Whole genome sequencing and in silico analysis of three 936-type phage genomes identified several putative (orphan) methyltransferase (MTase)-encoding genes located within the packaging and replication regions of the genome. Utilising SMRT sequencing, methylome analysis was performed on all three phages, allowing the identification of adenine modifications consistent with N-6 methyladenine sequence methylation, which in some cases could be attributed to these phage-encoded MTases. Heterologous gene expression revealed that M.Phi145I/M.Phi93I and M.Phi93DAM, encoded by genes located within the packaging module, provide protection against the restriction enzymes HphI and DpnII, respectively, representing the first functional MTases identified in members of 936-type phages. SMRT sequencing technology enabled the identification of the target motifs of MTases encoded by the genomes of three lytic 936-type phages and these MTases represent the first functional MTases identified in this species of phage. The presence of these MTase-encoding genes on 936-type phage genomes is assumed to represent an adaptive response to circumvent host encoded restriction-modification systems thereby increasing the fitness of the phages in a dynamic dairy environment.
Evolutionary insight into the ionotropic glutamate receptor superfamily of photosynthetic organisms.

PubMed

De Bortoli, Sara; Teardo, Enrico; Szabò, Ildikò; Morosinotto, Tomas; Alboresi, Alessandro

2016-11-01

Photosynthetic eukaryotes have a complex evolutionary history shaped by multiple endosymbiosis events that required tight coordination between the organelles and the rest of the cell. Plant ionotropic glutamate receptors (iGLRs) form a large superfamily of proteins with a predicted or proven non-selective cation channel activity regulated by a broad range of amino acids. They are involved in different physiological processes such as C/N sensing, resistance against fungal infection, root and pollen tube growth and response to wounding and pathogens. Most of the present knowledge is limited to iGLRs located in plasma membranes. However, recent studies localized different iGLR isoforms to mitochondria and/or chloroplasts, suggesting the possibility that they play a specific role in bioenergetic processes. In this work, we performed a comparative analysis of GLR sequences from bacteria and various photosynthetic eukaryotes. In particular, novel types of selectivity filters in bacteria are reported, adding new examples of the great diversity of the GLR superfamily. The highest variability in GLR sequences was found among the algal sequences (cryptophytes, diatoms, brown and green algae). GLRs of land plants are not closely related to the GLRs of the green algae analyzed in this work. The GLR family underwent a great expansion in vascular plants. Among plant GLRs, Clade III includes sequences from Physcomitrella patens, Marchantia polymorpha and gymnosperms and can be considered the most ancient, while other clades likely emerged later. In silico analysis allowed the identification of sequences putatively targeted to organelles. Sequences with a predicted localization to mitochondria and chloroplasts are randomly distributed among different types of GLRs, suggesting that no compartment-related specific function has been maintained across the species. Copyright © 2016 Elsevier B.V. All rights reserved.
Reversal of the hair loss phenotype by modulating the estradiol-ANGPT2 axis in the mouse model of female pattern hair loss.

PubMed

Endo, Yujiro; Obayashi, Yuko; Ono, Tomoji; Serizawa, Tetsushi; Murakoshi, Michiaki; Ohyama, Manabu

2018-07-01

Despite high demand for a remedy, the treatment options for female pattern hair loss (FPHL) are limited. FPHL is frequent in postmenopausal women. In ovariectomized (OVX) mice, which lack β-estradiol (E2) and manifest hair loss mimicking FPHL, E2 supplementation has been shown to increase hair density. However, the mechanism by which E2 exhibits its biological activity remains elusive. This study aimed to identify the downstream targets of E2 in the context of FPHL pathophysiology and to discover a potential therapeutic agent for the E2-dependent subtype of FPHL. Human dermal papilla cells (hDPCs) were cultured with E2, and a microarray analysis was performed to identify the genes regulated by E2. The identified gene product was intradermally administered to OVX mice, and quantitative image analysis of hair density was then conducted. In silico analysis was performed to link E2 and the identified gene. Global gene expression and bioinformatics analyses revealed that the genes associated with the angiopoietin-2 (ANGPT2) pathway were upregulated by E2 in hDPCs. ANGPT2 was significantly downregulated in OVX mice compared with sham-operated mice (P < 0.01). Importantly, hair density was higher in OVX mice treated with ANGPT2 than in control mice (P < 0.05). In silico analysis showed DNA sequences with a high probability of estrogen receptor binding in the promoter region of ANGPT2. The E2-ANGPT2 axis is present in hair follicles. ANGPT2 provides a strategy for the management of E2-dependent and postmenopausal subsets of FPHL. Copyright © 2018 Japanese Society for Investigative Dermatology. Published by Elsevier B.V. All rights reserved.
Extensive in silico analysis of Mimivirus coded Rab GTPase homolog suggests a possible role in virion membrane biogenesis.

PubMed

Zade, Amrutraj; Sengupta, Malavi; Kondabagil, Kiran

2015-01-01

Rab GTPases are the key regulators of intracellular membrane trafficking in eukaryotes. Many viruses and intracellular bacterial pathogens have evolved to hijack the host Rab GTPase functions, mainly through activators and effector proteins, for their benefit. Acanthamoeba polyphaga mimivirus (APMV) is one of the largest viruses and belongs to the monophyletic clade of nucleo-cytoplasmic large DNA viruses (NCLDV). The inner membrane lining is integral to the APMV virion structure. APMV assembly involves extensive host membrane modifications, like vesicle budding and fusion, leading to the formation of a membrane sheet that is incorporated into the virion. Intriguingly, APMV and all group I members of the Mimiviridae family code for a putative Rab GTPase protein. APMV is the first reported virus to code for a Rab GTPase (encoded by R214 gene). Our thorough in silico analysis of the subfamily specific (SF) region of Mimiviridae Rab GTPase sequences suggests that they are related to Rab5, a member of the group II Rab GTPases, of lower eukaryotes. Because of their high divergence from the existing three isoforms, A, B, and C of the Rab5-family, we suggest that Mimiviridae Rabs constitute a new isoform, Rab5D. Phylogenetic analysis indicated probable horizontal acquisition from a lower eukaryotic ancestor followed by selection and divergence. Furthermore, interaction network analysis suggests that vps34 (a Class III PI3K homolog, coded by APMV L615), Atg-8 and dynamin (host proteins) are recruited by APMV Rab GTPase during capsid assembly. Based on these observations, we hypothesize that APMV Rab plays a role in the acquisition of inner membrane during virion assembly.
Comparison of secretory signal peptides for heterologous protein expression in microalgae: Expanding the secretion portfolio for Chlamydomonas reinhardtii

PubMed Central

de Carvalho, João Carlos Monteiro; Mayfield, Stephen Patrick

2018-01-01

Efficient protein secretion is a desirable trait for any recombinant protein expression system, together with simple, low-cost, and defined media, such as the typical media used for photosynthetic cultures of microalgae. However, low titers of secreted heterologous proteins are usually obtained, even with the most extensively studied microalga Chlamydomonas reinhardtii, preventing their industrial application. In this study, we aimed to expand and evaluate secretory signal peptides (SP) for heterologous protein secretion in C. reinhardtii by comparing previously described SP with untested sequences. We compared the SPs from arylsulfatase 1 and carbonic anhydrase 1, with those of untried SPs from binding protein 1, an ice-binding protein, and six sequences identified in silico. We identified over 2000 unique SPs using the SignalP 4.0 software. mCherry fluorescence was used to compare the protein secretion of up to 96 colonies for each construct, non-secretion construct, and parental wild-type cc1690 cells. Supernatant fluorescence varied according to the SP used, with a 10-fold difference observed between the highest and lowest secretors. Moreover, two SPs identified in silico secreted the highest amount of mCherry. Our results demonstrate that the SP should be carefully selected and that efficient sequences can be coded in the C. reinhardtii genome. The SPs described here expand the portfolio available for research on heterologous protein secretion and for biomanufacturing applications. PMID:29408937
Glucokinase gene mutations (MODY 2) in Asian Indians.

PubMed

Kanthimathi, Sekar; Jahnavi, Suresh; Balamurugan, Kandasamy; Ranjani, Harish; Sonya, Jagadesan; Goswami, Soumik; Chowdhury, Subhankar; Mohan, Viswanathan; Radha, Venkatesan

2014-03-01

Heterozygous inactivating mutations in the glucokinase (GCK) gene cause a hyperglycemic condition termed maturity-onset diabetes of the young (MODY) 2 or GCK-MODY. This is characterized by mild, stable, usually asymptomatic, fasting hyperglycemia that rarely requires pharmacological intervention. The aim of the present study was to screen for GCK gene mutations in Asian Indian subjects with mild hyperglycemia. Of the 1,517 children and adolescents of the population-based ORANGE study in Chennai, India, 49 were found to have hyperglycemia. These children, along with six patients referred to our center with mild hyperglycemia, were screened for MODY 2 mutations. The GCK gene was bidirectionally sequenced using BigDye® Terminator v3.1 (Applied Biosystems, Foster City, CA) chemistry. In silico predictions of pathogenicity were carried out using the online tools SIFT, Polyphen-2, and I-Mutant 2.0. Direct sequencing of the GCK gene in the patients referred to our center revealed one novel mutation, Thr206Ala (c.616A>G), in exon 6 and one previously described mutation, Met251Thr (c.752T>C), in exon 7. In silico analysis predicted the novel mutation to be pathogenic. The highly conserved nature and critical location of the residue Thr206, along with the clinical course, suggest that Thr206Ala is a MODY 2 mutation. However, we did not find any MODY 2 mutations in the 49 children selected from the population-based study. Hence, the prevalence of GCK mutations in Chennai is <1:1,517. This is the first study of MODY 2 mutations from India and confirms the importance of considering GCK gene mutation screening in patients with mild early-onset hyperglycemia who are negative for β-cell antibodies.
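The pairing of each coding-DNA change with its protein change, for example c.616A>G with Thr206Ala, can be sanity-checked with a few lines of arithmetic. The sketch below assumes that c. numbering starts at the A of the ATG start codon and that the standard genetic code applies; the function names are hypothetical. It does not know the actual GCK reference sequence, so it only asks whether any reference codon is consistent with the reported change.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_of(cdna_pos: int) -> tuple:
    """Map a 1-based coding position to (codon number, position within codon)."""
    index = cdna_pos - 1
    return index // 3 + 1, index % 3 + 1


def consistent_codons(ref_aa, alt_aa, pos_in_codon, ref_base, alt_base) -> list:
    """List reference codons that encode ref_aa, carry ref_base at the given
    position, and encode alt_aa once that base is replaced by alt_base."""
    i = pos_in_codon - 1
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa == ref_aa and codon[i] == ref_base:
            mutated = codon[:i] + alt_base + codon[i + 1:]
            if CODON_TABLE[mutated] == alt_aa:
                hits.append(codon)
    return hits


# c.616A>G reported as Thr206Ala; c.752T>C reported as Met251Thr.
print(codon_of(616), consistent_codons("T", "A", 1, "A", "G"))
print(codon_of(752), consistent_codons("M", "T", 2, "T", "C"))
```

Both variants pass: position 616 is the first base of codon 206 and position 752 is the second base of codon 251, and in each case at least one reference codon produces the stated amino acid substitution.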
Structural Diversity in the Dandelion (Taraxacum officinale) Polyphenol Oxidase Family Results in Different Responses to Model Substrates

PubMed Central

Dirks-Hofmeister, Mareike E.; Singh, Ratna; Leufken, Christine M.; Inlow, Jennifer K.; Moerschbacher, Bruno M.

2014-01-01

Polyphenol oxidases (PPOs) are ubiquitous type-3 copper enzymes that catalyze the oxygen-dependent conversion of o-diphenols to the corresponding quinones. In most plants, PPOs are present as multiple isoenzymes that probably serve distinct functions, although the precise relationship between sequence, structure and function has not been addressed in detail. We therefore compared the characteristics and activities of recombinant dandelion PPOs to gain insight into the structure–function relationships within the plant PPO family. Phylogenetic analysis resolved the 11 isoenzymes of dandelion into two evolutionary groups. More detailed in silico and in vitro analyses of four representative PPOs covering both phylogenetic groups were performed. Molecular modeling and docking predicted differences in enzyme-substrate interactions, providing a structure-based explanation for grouping. One amino acid side chain positioned at the entrance to the active site (position HB2+1) potentially acts as a “selector” for substrate binding. In vitro activity measurements with the recombinant, purified enzymes also revealed group-specific differences in kinetic parameters when the selected PPOs were presented with five model substrates. The combination of our enzyme kinetic measurements and the in silico docking studies therefore indicate that the physiological functions of individual PPOs might be defined by their specific interactions with different natural substrates. PMID:24918587
In silico comparative analysis of SSR markers in plants

PubMed Central

2011-01-01

Background: Adverse environmental conditions severely limit plant growth and development, restricting genetic potential and causing yield losses. The progress obtained by classic plant breeding methods aimed at increasing abiotic stress tolerance has not been enough to cope with increasing food demands. New target genes need to be identified to reach this goal, which requires extensive studies of the related biological mechanisms. Comparative analyses in ancestral plant groups can help to elucidate biological processes that are still unclear. Results: In this study, we surveyed the occurrence patterns of expressed sequence tag-derived microsatellite markers for model plants. A total of 13,133 SSR markers were discovered using the SSRLocator software in non-redundant EST databases built for all eleven species chosen for this study. Dimer motifs are more frequent in lower plant species, such as green algae and mosses, and trimer motifs are more frequent in the majority of higher plant groups, such as monocots and dicots. With this in silico study we confirm several microsatellite plant survey results obtained with available bioinformatics tools. Conclusions: Comparative studies of EST-SSR markers among all plant lineages are well suited for plant evolution studies as well as for future studies of the transferability of molecular markers. PMID:21247422
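Detecting dimer and trimer microsatellites in EST sequences can be sketched with a back-referencing regular expression. This is not the SSRLocator algorithm; the minimum repeat counts (6 for dimers, 5 for trimers), the function names and the toy sequence are assumptions chosen for illustration.

```python
import re


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(seq: str, min_repeats=None) -> list:
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    min_repeats maps motif length to the minimum number of repeats;
    the defaults below are assumed thresholds, not published ones.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(min_repeats.items()):
        # Capture a k-mer, then require at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # skip homopolymer runs such as AAAAAA
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical EST fragment containing one dimer and one trimer repeat.
est = "TTGAC" + "AG" * 6 + "CCTGA" + "CAG" * 5 + "TTAGC"
print(find_ssrs(est))
```

Counting hits by motif length across many ESTs would give the kind of dimer-versus-trimer frequency comparison the abstract describes.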
In silico assessment of the potential allergenicity of transgenes used for the development of GM food crops.

PubMed

Mishra, Ankita; Gaur, S N; Singh, B P; Arora, Naveen

2012-05-01

Genetically modified (GM) crops require allergenicity and toxicity assessment of the novel protein(s) to ensure complete safety for consumers. These assessments are performed in accordance with the guidelines proposed by Codex (2003) and ICMR (2008). The guidelines recommend sequence homology analysis as a preliminary step towards allergenicity prediction; in vitro experiments may be performed later to confirm allergenicity. In the present study, an in silico approach is employed to evaluate the allergenic potential of six transgenes routinely used for the development of GM food crops. Among the genes studied, manganese superoxide dismutase (MnSOD) and osmotin share greater than 90% identity with Hev b 10 and Cap a 1w, respectively. Chitinase shares greater than 70% identity with the allergens Pers a 1 and Hev b 11, and fungal chitinase showed significant IgE binding with 7 of 75 patients' sera positive to different food extracts. Glucanases (alfalfa, wheat) and the glycine betaine aldehyde dehydrogenase gene share 50% homology with allergens such as Ole e 9, Cla h 10 and Alt a 10. The results demonstrate the allergenic potential of six genes and can serve as a guide for the selection of transgenes to develop GM crops. Copyright © 2012 Elsevier Ltd. All rights reserved.

Structural elucidation of estrus urinary lipocalin protein (EULP) and evaluating binding affinity with pheromones using molecular docking and fluorescence study

PubMed Central

Rajesh, Durairaj; Muthukumar, Subramanian; Saibaba, Ganesan; Siva, Durairaj; Akbarsha, Mohammad Abdulkader; Gulyás, Balázs; Padmanabhan, Parasuraman; Archunan, Govindaraju

2016-01-01

Transportation of pheromones bound to carrier proteins belonging to the lipocalin superfamily is known to prolong chemo-signal communication between individuals of the same species. Members of the lipocalin family (MLF) have three structurally conserved motifs for delivery of hydrophobic molecules to the specific recognizer. However, computational analyses are critically required to validate and refine the sequence and structural annotation of MLF. This study aimed to elucidate the evolution, structure, stability and binding efficiency of estrus urinary lipocalin protein (EULP) with endogenous pheromones using in silico and fluorescence studies. The results revealed that: (i) evolutionary analysis suggests that EULP may have originated from fatty acid binding protein (FABP); (ii) dynamic simulation shows that EULP is highly stable, with a root mean square deviation (RMSD) below 0.45 Å; (iii) docking evaluation shows that EULP has higher binding energy with farnesol and 2-iso-butyl-3-methoxypyrazine (IBMP) than with 2-naphthol; and (iv) competitive binding and quenching assays revealed that purified EULP has good binding interaction with farnesol. Both in silico and experimental studies showed that EULP is an efficient binding partner for pheromones. The present study provides impetus to create a point mutation that increases the longevity of EULP in order to develop a pheromone trap for rodent pest management. PMID:27782155
Structural elucidation of estrus urinary lipocalin protein (EULP) and evaluating binding affinity with pheromones using molecular docking and fluorescence study.

PubMed

Rajesh, Durairaj; Muthukumar, Subramanian; Saibaba, Ganesan; Siva, Durairaj; Akbarsha, Mohammad Abdulkader; Gulyás, Balázs; Padmanabhan, Parasuraman; Archunan, Govindaraju

2016-10-26

Transportation of pheromones bound to carrier proteins belonging to the lipocalin superfamily is known to prolong chemo-signal communication between individuals of the same species. Members of the lipocalin family (MLF) have three structurally conserved motifs for delivery of hydrophobic molecules to the specific recognizer. However, computational analyses are critically required to validate and refine the sequence and structural annotation of MLF. This study aimed to elucidate the evolution, structure, stability and binding efficiency of estrus urinary lipocalin protein (EULP) with endogenous pheromones using in silico and fluorescence studies. The results revealed that: (i) evolutionary analysis suggests that EULP may have originated from fatty acid binding protein (FABP); (ii) dynamic simulation shows that EULP is highly stable, with a root mean square deviation (RMSD) below 0.45 Å; (iii) docking evaluation shows that EULP has higher binding energy with farnesol and 2-iso-butyl-3-methoxypyrazine (IBMP) than with 2-naphthol; and (iv) competitive binding and quenching assays revealed that purified EULP has good binding interaction with farnesol. Both in silico and experimental studies showed that EULP is an efficient binding partner for pheromones. The present study provides impetus to create a point mutation that increases the longevity of EULP in order to develop a pheromone trap for rodent pest management.
Structural diversity in the dandelion (Taraxacum officinale) polyphenol oxidase family results in different responses to model substrates.

PubMed

Dirks-Hofmeister, Mareike E; Singh, Ratna; Leufken, Christine M; Inlow, Jennifer K; Moerschbacher, Bruno M

2014-01-01

Polyphenol oxidases (PPOs) are ubiquitous type-3 copper enzymes that catalyze the oxygen-dependent conversion of o-diphenols to the corresponding quinones. In most plants, PPOs are present as multiple isoenzymes that probably serve distinct functions, although the precise relationship between sequence, structure and function has not been addressed in detail. We therefore compared the characteristics and activities of recombinant dandelion PPOs to gain insight into the structure-function relationships within the plant PPO family. Phylogenetic analysis resolved the 11 isoenzymes of dandelion into two evolutionary groups. More detailed in silico and in vitro analyses of four representative PPOs covering both phylogenetic groups were performed. Molecular modeling and docking predicted differences in enzyme-substrate interactions, providing a structure-based explanation for grouping. One amino acid side chain positioned at the entrance to the active site (position HB2+1) potentially acts as a "selector" for substrate binding. In vitro activity measurements with the recombinant, purified enzymes also revealed group-specific differences in kinetic parameters when the selected PPOs were presented with five model substrates. The combination of our enzyme kinetic measurements and the in silico docking studies therefore indicate that the physiological functions of individual PPOs might be defined by their specific interactions with different natural substrates.
In Silico/In Vivo Insights into the Functional and Evolutionary Pathway of Pseudomonas aeruginosa Oleate-Diol Synthase. Discovery of a New Bacterial Di-Heme Cytochrome C Peroxidase Subfamily

PubMed Central

Estupiñán, Mónica; Álvarez-García, Daniel; Barril, Xavier; Diaz, Pilar; Manresa, Angeles

2015-01-01

As previously reported, P. aeruginosa genes PA2077 and PA2078 code for 10S-DOX (10S-Dioxygenase) and 7,10-DS (7,10-Diol Synthase) enzymes involved in long-chain fatty acid oxygenation through the recently described oleate-diol synthase pathway. Analysis of the amino acid sequence of both enzymes revealed the presence of two heme-binding motifs (CXXCH) on each protein. Phylogenetic analysis showed the relation of both proteins to bacterial di-heme cytochrome c peroxidases (Ccps), similar to Xanthomonas sp. 35Y rubber oxidase RoxA. Structural homology modelling of PA2077 and PA2078 was achieved using RoxA (PDB 4b2n) as a template. The resulting 3D model revealed significant amino acid variations in the predicted heme environment. Moreover, the presence of palindromic repeats located in enzyme-coding regions, acting as protein evolution elements, is reported here for the first time in the P. aeruginosa genome. These observations and the constructed phylogenetic tree of the two proteins allow the proposal of an evolutionary pathway for the P. aeruginosa oleate-diol synthase operon. Taken together, the in silico and in vivo results lead us to conclude that enzymes PA2077 and PA2078 are the first described members of a new subfamily of bacterial peroxidases, designated as Fatty acid-di-heme Cytochrome c peroxidases (FadCcp). PMID:26154497
Retinitis Pigmentosa with EYS Mutations Is the Most Prevalent Inherited Retinal Dystrophy in Japanese Populations.

PubMed

Arai, Yuuki; Maeda, Akiko; Hirami, Yasuhiko; Ishigami, Chie; Kosugi, Shinji; Mandai, Michiko; Kurimoto, Yasuo; Takahashi, Masayo

2015-01-01

The aim of this study was to gain information about disease prevalence and to identify the responsible genes for inherited retinal dystrophies (IRD) in Japanese populations. Clinical and molecular evaluations were performed on 349 patients with IRD. For segregation analyses, 63 of their family members were employed. Bioinformatics data from 1,208 Japanese individuals were used as controls. Molecular diagnosis was obtained by direct sequencing in a stepwise fashion utilizing one or two panels of 15 and 27 genes for retinitis pigmentosa patients. If a specific clinical diagnosis was suspected, direct sequencing of disease-specific genes, that is, ABCA4 for Stargardt disease, was conducted. Limited availability of intrafamily information and decreasing family size hampered identifying inherited patterns. Differential disease profiles with lower prevalence of Stargardt disease from European and North American populations were obtained. We found 205 sequence variants in 159 of 349 probands with an identification rate of 45.6%. This study found 43 novel sequence variants. In silico analysis suggests that 20 of 25 novel missense variants are pathogenic. EYS mutations had the highest prevalence at 23.5%. c.4957_4958insA and c.8868C>A were the two major EYS mutations identified in this cohort. EYS mutations are the most prevalent among Japanese patients with IRD.
Strategies for high-altitude adaptation revealed from high-quality draft genome of non-violacein producing Janthinobacterium lividum ERGS5:01.

PubMed

Kumar, Rakshak; Acharya, Vishal; Singh, Dharam; Kumar, Sanjay

2018-01-01

A light pink coloured bacterial strain ERGS5:01 isolated from glacial stream water of Sikkim Himalaya was affiliated to Janthinobacterium lividum based on 16S rRNA gene sequence identity and phylogenetic clustering. Whole genome sequencing was performed for the strain to confirm its taxonomy, as it lacked the typical violet pigmentation of the genus, and also to decipher its survival strategy in the aquatic ecosystem of high elevation. PacBio RSII sequencing generated a genome of 5,168,928 bp with 4575 protein-coding genes and 118 RNA genes. Whole genome-based multilocus sequence analysis clustering, an in silico DDH similarity value of 95.1%, and an ANI value of 99.25% established the identity of the strain ERGS5:01 (MCC 2953) as a non-violacein producing J. lividum. The genome comparisons across genus Janthinobacterium revealed an open pan-genome with the scope of the addition of new orthologous clusters to complete the genomic inventory. The genomic insight provided the genetic basis of tolerance to freezing and frequent freeze-thaw cycles, and of industrially important enzymes. Extended insight into the genome provided clues of crucial genes associated with adaptation in the harsh aquatic ecosystem of high altitude.
Intrinsic challenges in ancient microbiome reconstruction using 16S rRNA gene amplification.

PubMed

Ziesemer, Kirsten A; Mann, Allison E; Sankaranarayanan, Krithivasan; Schroeder, Hannes; Ozga, Andrew T; Brandt, Bernd W; Zaura, Egija; Waters-Rist, Andrea; Hoogland, Menno; Salazar-García, Domingo C; Aldenderfer, Mark; Speller, Camilla; Hendy, Jessica; Weston, Darlene A; MacDonald, Sandy J; Thomas, Gavin H; Collins, Matthew J; Lewis, Cecil M; Hofman, Corinne; Warinner, Christina

2015-11-13

To date, characterization of ancient oral (dental calculus) and gut (coprolite) microbiota has been primarily accomplished through a metataxonomic approach involving targeted amplification of one or more variable regions in the 16S rRNA gene. Specifically, the V3 region (E. coli 341-534) of this gene has been suggested as an excellent candidate for ancient DNA amplification and microbial community reconstruction. However, in practice this metataxonomic approach often produces highly skewed taxonomic frequency data. In this study, we use non-targeted (shotgun metagenomics) sequencing methods to better understand skewed microbial profiles observed in four ancient dental calculus specimens previously analyzed by amplicon sequencing. Through comparisons of microbial taxonomic counts from paired amplicon (V3 U341F/534R) and shotgun sequencing datasets, we demonstrate that extensive length polymorphisms in the V3 region are a consistent and major cause of differential amplification leading to taxonomic bias in ancient microbiome reconstructions based on amplicon sequencing. We conclude that systematic amplification bias confounds attempts to accurately reconstruct microbiome taxonomic profiles from 16S rRNA V3 amplicon data generated using universal primers. Because in silico analysis indicates that alternative 16S rRNA hypervariable regions will present similar challenges, we advocate for the use of a shotgun metagenomics approach in ancient microbiome reconstructions.
Intrinsic challenges in ancient microbiome reconstruction using 16S rRNA gene amplification

PubMed Central

Ziesemer, Kirsten A.; Mann, Allison E.; Sankaranarayanan, Krithivasan; Schroeder, Hannes; Ozga, Andrew T.; Brandt, Bernd W.; Zaura, Egija; Waters-Rist, Andrea; Hoogland, Menno; Salazar-García, Domingo C.; Aldenderfer, Mark; Speller, Camilla; Hendy, Jessica; Weston, Darlene A.; MacDonald, Sandy J.; Thomas, Gavin H.; Collins, Matthew J.; Lewis, Cecil M.; Hofman, Corinne; Warinner, Christina

2015-01-01

To date, characterization of ancient oral (dental calculus) and gut (coprolite) microbiota has been primarily accomplished through a metataxonomic approach involving targeted amplification of one or more variable regions in the 16S rRNA gene. Specifically, the V3 region (E. coli 341–534) of this gene has been suggested as an excellent candidate for ancient DNA amplification and microbial community reconstruction. However, in practice this metataxonomic approach often produces highly skewed taxonomic frequency data. In this study, we use non-targeted (shotgun metagenomics) sequencing methods to better understand skewed microbial profiles observed in four ancient dental calculus specimens previously analyzed by amplicon sequencing. Through comparisons of microbial taxonomic counts from paired amplicon (V3 U341F/534R) and shotgun sequencing datasets, we demonstrate that extensive length polymorphisms in the V3 region are a consistent and major cause of differential amplification leading to taxonomic bias in ancient microbiome reconstructions based on amplicon sequencing. We conclude that systematic amplification bias confounds attempts to accurately reconstruct microbiome taxonomic profiles from 16S rRNA V3 amplicon data generated using universal primers. Because in silico analysis indicates that alternative 16S rRNA hypervariable regions will present similar challenges, we advocate for the use of a shotgun metagenomics approach in ancient microbiome reconstructions. PMID:26563586
Whole exome sequencing with genomic triangulation implicates CDH2-encoded N-cadherin as a novel pathogenic substrate for arrhythmogenic cardiomyopathy.

PubMed

Turkowski, Kari L; Tester, David J; Bos, J Martijn; Haugaa, Kristina H; Ackerman, Michael J

2017-03-01

Arrhythmogenic cardiomyopathy (ACM) is a heritable disease characterized by fibrofatty replacement of cardiomyocytes, has a prevalence of approximately 1 in 5000 individuals, and accounts for approximately 20% of sudden cardiac death in the young (≤35 years). ACM is most often inherited as an autosomal dominant trait with incomplete penetrance and variable expression. While mutations in several genes that encode key desmosomal proteins underlie about half of all ACM, the remainder is elusive genetically. Here, whole exome sequencing (WES) was performed with genomic triangulation in an effort to identify a novel explanation for a phenotype-positive, genotype-negative multi-generational pedigree with a presumed autosomal dominant, maternal inheritance of ACM. WES and genomic triangulation were performed on a symptomatic 14-year-old female proband, her affected mother and affected sister, and her unaffected father to elucidate a novel ACM-susceptibility gene for this pedigree. Following variant filtering using Ingenuity® Variant Analysis, gene priority ranking was performed on the candidate genes using ToppGene and Endeavour. The phylogenetic and physicochemical properties of candidate mutations were assessed further by 6 in silico prediction tools. Species alignment and amino acid conservation analysis was performed using the Uniprot Consortium. Tissue expression data was abstracted from Expression Atlas. Following WES and genomic triangulation, CDH2 emerged as a novel, autosomal dominant, ACM-susceptibility gene. The CDH2-encoded N-cadherin is a cell-cell adhesion protein predominately expressed in the heart. Cardiac dysfunction has been demonstrated in prior CDH2 knockout and over-expression animal studies. Further in silico mutation prediction, species conservation, and protein expression analysis supported the ultra-rare (minor allele frequency <0.005%) p.Asp407Asn-CDH2 variant as a likely pathogenic variant. Herein, it is demonstrated that genetic mutations in CDH2-encoded N-cadherin may represent a novel pathogenetic basis for ACM in humans. The prevalence of CDH2-mediated ACM in heretofore genetically elusive ACM remains to be determined. © 2017 Wiley Periodicals, Inc.
A Novel Loss-of-Sclerostin Function Mutation in a First Egyptian Family with Sclerosteosis

PubMed Central

Fayez, Alaaeldin; Aglan, Mona; Esmaiel, Nora; El Zanaty, Taher; Abdel Kader, Mohamed; El Ruby, Mona

2015-01-01

Sclerosteosis is a rare autosomal recessive condition characterized by increased bone density. Mutations in SOST gene coding for sclerostin are linked to sclerosteosis. Two Egyptian brothers with sclerosteosis and their apparently normal consanguineous parents were included in this study. Clinical evaluation and genomic sequencing of the SOST gene were performed followed by in silico analysis of the resulting variation. A novel homozygous frameshift mutation in the SOST gene, characterized as one nucleotide cytosine insertion that led to premature stop codon and loss of functional sclerostin, was identified in the two affected brothers. Their parents were heterozygous for the same mutation. To our knowledge this is the first Egyptian study of sclerosteosis and SOST gene causing mutation. PMID:25984533
Proposal of an in silico profiler for categorisation of repeat dose toxicity data of hair dyes.

PubMed

Nelms, M D; Ates, G; Madden, J C; Vinken, M; Cronin, M T D; Rogiers, V; Enoch, S J

2015-05-01

This study outlines the analysis of 94 chemicals with repeat dose toxicity data taken from Scientific Committee on Consumer Safety opinions for commonly used hair dyes in the European Union. Structural similarity was applied to group these chemicals into categories. Subsequent mechanistic analysis suggested that toxicity to mitochondria is potentially a key driver of repeat dose toxicity for chemicals within each of the categories. The mechanistic hypothesis allowed for an in silico profiler consisting of four mechanism-based structural alerts to be proposed. These structural alerts related to a number of important chemical classes such as quinones, anthraquinones, substituted nitrobenzenes and aromatic azos. This in silico profiler is intended for grouping chemicals into mechanism-based categories within the adverse outcome pathway paradigm.
MTHFR-Ala222Val and male infertility: a study in Iranian men, an updated meta-analysis and an in silico-analysis.

PubMed

Nikzad, Hossein; Karimian, Mohammad; Sareban, Kobra; Khoshsokhan, Maryam; Hosseinzadeh Colagar, Abasalt

2015-11-01

Methylenetetrahydrofolate reductase (MTHFR) functions as a main regulatory enzyme in folate metabolism. The association of MTHFR gene Ala222Val polymorphism with male infertility in an Iranian population was investigated by undertaking a meta-analysis and in-silico approach. A genetic association study included 497 men; 242 had unexplained infertility and 255 were healthy controls. Polymerase chain reaction restriction fragment length polymorphism was used for genotyping MTHFR-Ala222Val. OpenMeta[Analyst] software was used to conduct the analysis; 22 studies were identified by searching PubMed and the currently reported genetic association study. A novel in-silico approach was used to analyse the effects of Ala222Val substitution on the structure of mRNA and protein. Genetic association study revealed a significant association of MTHFR-222Val/Val genotype with oligozoospermia (OR 2.32; 95% CI, 1.12 to 4.78; P = 0.0451) and azoospermia (OR 2.59; 95% CI 1.09 to 6.17; P = 0.0314). Meta-analysis for allelic, dominant and codominant models showed a significant association between Ala222Val polymorphism and the risk of male infertility (P < 0.001). In silico-analysis showed MTHFR-Ala222Val affects enzyme structure and could also change the mRNA properties (P = 0.1641; P < 0.2 is significant). The meta-analysis suggested significant association of MTHFR-Ala222Val with risk of male infertility, especially in Asian populations. Copyright © 2015 Reproductive Healthcare Ltd. Published by Elsevier Ltd. All rights reserved.
Alkaline active cyanide dihydratase of Flavobacterium indicum MTCC 6936: Growth optimization, purification, characterization and in silico analysis.

PubMed

Kumar, Virender; Kumar, Vijay; Bhalla, Tek Chand

2018-05-15

The present work explores a rare cyanide dihydratase of Flavobacterium indicum MTCC 6936 for its potential of cyanide degradation. The enzyme was purified 12-fold with a yield of 76%. SDS and native-PAGE analysis revealed that the enzyme was a monomer of 40 kDa. The enzyme works well in the mesophilic range across a wide pH range. The thermostability profile of cyanide dihydratase revealed that the enzyme is quite stable at 30 °C and 35 °C, with half-lives of 6 h 30 min and 5 h, respectively. Km and Vmax for cyanide dihydratase of F. indicum were measured to be 4.76 mM and 45 U mg⁻¹, with kcat calculated to be 27.3 s⁻¹ and the specificity constant (kcat/Km) to be around 5.67 mM⁻¹ s⁻¹. MALDI-TOF analysis of the purified protein revealed that the amino acid sequence has 50% and 43% sequence identity with the putative amino acid sequence of F. indicum and the earlier reported cyanide dihydratase of Bacillus pumilus, respectively. Homology modeling studies of cyanide dihydratase of F. indicum predicted the catalytic triad of the enzyme, indicating Cys at position 164, Glu at position 46 and Lys at position 130. The purified enzyme has potential applications in bioremediation and the analytical sector. Copyright © 2018 Elsevier B.V. All rights reserved.
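As a quick arithmetic check on the kinetic constants reported in the abstract above, the specificity constant is simply kcat divided by Km, and the Michaelis-Menten equation gives the rate at any substrate concentration. The sketch below is illustrative only: the function names are hypothetical and not taken from the paper, and the only inputs are the rounded values quoted in the abstract.

```python
def specificity_constant(k_cat_per_s: float, k_m_mM: float) -> float:
    """Return kcat / Km in mM^-1 s^-1."""
    return k_cat_per_s / k_m_mM


def michaelis_menten_rate(v_max: float, k_m_mM: float, substrate_mM: float) -> float:
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return v_max * substrate_mM / (k_m_mM + substrate_mM)


# Rounded values quoted in the abstract.
K_M_MM = 4.76          # mM
K_CAT_PER_S = 27.3     # s^-1
V_MAX_U_PER_MG = 45.0  # U mg^-1

if __name__ == "__main__":
    print(round(specificity_constant(K_CAT_PER_S, K_M_MM), 2))
    # At [S] = Km the rate is half of Vmax by definition.
    print(michaelis_menten_rate(V_MAX_U_PER_MG, K_M_MM, K_M_MM))
```

Dividing the rounded figures gives roughly 5.74 mM⁻¹ s⁻¹, slightly above the "around 5.67" quoted; the small gap may simply reflect rounding of the reported inputs.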
A comparative molecular characterization of AMDV strains isolated from cases of clinical and subclinical infection.

PubMed

Kowalczyk, Marek; Jakubczak, Andrzej; Horecka, Beata; Kostro, Krzysztof

2018-05-29

The Aleutian mink disease virus (AMDV) is one of the most serious threats to modern mink breeding. The disease can have various courses, from progressive to subclinical infections. The objective of the study was to provide a comparative molecular characterization of isolates of AMDV from farms with a clinical and subclinical course of the disease. The qPCR analysis showed a difference of two orders of magnitude between the number of copies of the viral DNA on the farm with the clinical course of the disease (10⁵) and the farm with the subclinical course (10³). The sequencing results confirm a high level of homogeneity within each farm and variation between them. The phylogenetic analysis indicates that the variants belonging to different farms are closely related and occupy different branches of the same clade. The in silico analysis of the effect of differences in the sequence encoding the VP2 protein between the farms revealed no effect of the polymorphism on its functionality. The close phylogenetic relationship between the isolates from the two farms, the synonymous nature of most of the polymorphisms and the potentially minor effect on the functionality of the protein indicate that the differences in the clinical picture may be due not only to polymorphisms in the nucleotide and amino acid sequences, but also to the stage of infection on the farm and the degree of stabilization of the pathogen-host relationship.
Development and application of microsatellites in candidate genes related to wood properties in the Chinese white poplar (Populus tomentosa Carr.).

PubMed

Du, Qingzhang; Gong, Chenrui; Pan, Wei; Zhang, Deqiang

2013-02-01

Gene-derived simple sequence repeats (genic SSRs), also known as functional markers, are often preferred over random genomic markers because they represent variation in gene coding and/or regulatory regions. We characterized 544 genic SSR loci derived from 138 candidate genes involved in wood formation, distributed throughout the genome of Populus tomentosa, a key ecological and cultivated wood production species. Of these SSRs, three-quarters were located in the promoter or intron regions, and dinucleotide (59.7%) and trinucleotide repeat motifs (26.5%) predominated. By screening 15 wild P. tomentosa ecotypes, we identified 188 polymorphic genic SSRs with 861 alleles, 2-7 alleles for each marker. Transferability analysis of 30 random genic SSRs, testing whether these SSRs work in 26 genotypes of five genus Populus sections (outgroup, Salix matsudana), showed that 72% of the SSRs could be amplified in Turanga and 100% could be amplified in Leuce. Based on genotyping of these 26 genotypes, a neighbour-joining analysis showed the expected six phylogenetic groupings. In silico analysis of SSR variation in 220 sequences that are homologous between P. tomentosa and Populus trichocarpa suggested that genic SSR variations between relatives were predominantly affected by repeat motif variations or flanking sequence mutations. Inheritance tests and single-marker associations demonstrated the power of genic SSRs in family-based linkage mapping and candidate gene-based association studies, as well as marker-assisted selection and comparative genomic studies of P. tomentosa and related species.
Challenging a bioinformatic tool's ability to detect microbial contaminants using in silico whole genome sequencing data.

PubMed

Olson, Nathan D; Zook, Justin M; Morrow, Jayne B; Lin, Nancy J

2017-01-01

High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.
RNA-SSPT: RNA Secondary Structure Prediction Tools.

PubMed

Ahmad, Freed; Mahboob, Shahid; Gulzar, Tahsin; Din, Salah U; Hanif, Tanzeela; Ahmad, Hifza; Afzal, Muhammad

2013-01-01

The prediction of RNA structure is useful for understanding evolution for both in silico and in vitro studies. Physical methods like NMR studies to predict RNA secondary structure are expensive and difficult. Computational RNA secondary structure prediction is easier. Comparative sequence analysis provides the best solution. But secondary structure prediction of a single RNA sequence is challenging. RNA-SSPT is a tool that computationally predicts secondary structure of a single RNA sequence. Most of the RNA secondary structure prediction tools do not allow pseudoknots in the structure or are unable to locate them. The Nussinov dynamic programming algorithm has been implemented in RNA-SSPT. The current study shows that only the energetically most favorable secondary structure is required, and a modification of the algorithm is also available that produces base pairs to lower the total free energy of the secondary structure. For visualization of RNA secondary structure, NAVIEW, written in C, was adapted to C# to meet the tool's requirements. RNA-SSPT is built in C# using .NET 2.0 in Microsoft Visual Studio 2005 Professional edition. The accuracy of RNA-SSPT is tested in terms of Sensitivity and Positive Predictive Value. It is a tool which serves both secondary structure prediction and secondary structure visualization purposes.
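The abstract above names the Nussinov dynamic programming algorithm. As a rough illustration of that technique, the sketch below implements the textbook base-pair maximization recurrence with a traceback to dot-bracket notation. It is not the RNA-SSPT code (which the abstract says is written in C#); the function names, the minimum hairpin loop length of 3, and the inclusion of G-U wobble pairs are assumptions made for this example. Like the basic algorithm, it cannot represent pseudoknots.

```python
# Allowed base pairs (Watson-Crick plus G-U wobble; the wobble pairs are an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_table(seq: str, min_loop: int = 3) -> list[list[int]]:
    """Fill dp[i][j] = maximum number of nested base pairs in seq[i..j]."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])       # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)   # i pairs with j
            for k in range(i + 1, j):                    # bifurcation into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp


def traceback(seq: str, dp: list[list[int]], min_loop: int = 3) -> str:
    """Recover one optimal structure in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


def fold(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Return (number of pairs, dot-bracket string) for an RNA sequence."""
    seq = seq.upper()
    dp = nussinov_table(seq, min_loop)
    pairs = dp[0][-1] if seq else 0
    return pairs, traceback(seq, dp, min_loop)


if __name__ == "__main__":
    print(fold("GGGAAAUCC"))
```

For the toy hairpin `GGGAAAUCC`, `fold` returns three pairs with the structure `(((...)))`. Pair maximization ignores stacking energies, so real tools typically refine or replace this scoring with a free-energy model.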
RNA-SSPT: RNA Secondary Structure Prediction Tools

PubMed Central

Ahmad, Freed; Mahboob, Shahid; Gulzar, Tahsin; Din, Salah U; Hanif, Tanzeela; Ahmad, Hifza; Afzal, Muhammad

2013-01-01

The prediction of RNA structure is useful for understanding evolution for both in silico and in vitro studies. Physical methods like NMR studies to predict RNA secondary structure are expensive and difficult. Computational RNA secondary structure prediction is easier. Comparative sequence analysis provides the best solution. But secondary structure prediction of a single RNA sequence is challenging. RNA-SSPT is a tool that computationally predicts secondary structure of a single RNA sequence. Most of the RNA secondary structure prediction tools do not allow pseudoknots in the structure or are unable to locate them. The Nussinov dynamic programming algorithm has been implemented in RNA-SSPT. The current study shows that only the energetically most favorable secondary structure is required, and a modification of the algorithm is also available that produces base pairs to lower the total free energy of the secondary structure. For visualization of RNA secondary structure, NAVIEW, written in C, was adapted to C# to meet the tool's requirements. RNA-SSPT is built in C# using .NET 2.0 in Microsoft Visual Studio 2005 Professional edition. The accuracy of RNA-SSPT is tested in terms of Sensitivity and Positive Predictive Value. It is a tool which serves both secondary structure prediction and secondary structure visualization purposes. PMID:24250115
In silico analysis of β-mannanases and β-mannosidase from Aspergillus flavus and Trichoderma virens UKM1

NASA Astrophysics Data System (ADS)

Yee, Chai Sin; Murad, Abdul Munir Abdul; Bakar, Farah Diba Abu

2013-11-01

Genes encoding endo-β-1,4-mannanases from Trichoderma virens UKM1 (manTV) and Aspergillus flavus UKM1 (manAF) were analysed with bioinformatic tools. In addition, the A. flavus NRRL 3357 genome database was screened for a β-mannosidase gene, which was also analysed (mndA-AF). These three genes were analysed to understand their gene properties. manTV and manAF consist of 1,332 bp and 1,386 bp, encoding 443 and 461 amino acid residues, respectively. Both endo-β-1,4-mannanases belong to glycosyl hydrolase family 5 and contain a carbohydrate-binding module family 1 (CBM1). On the other hand, mndA-AF, a 2,745-bp gene, encodes a protein sequence of 914 amino acid residues. This β-mannosidase belongs to glycosyl hydrolase family 2. The predicted molecular weights of manTV, manAF and mndA-AF are 47.74 kDa, 49.71 kDa and 103 kDa, respectively. All three predicted protein sequences possess a signal peptide sequence and are highly conserved among other fungal β-mannanases and β-mannosidases.
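The gene lengths and residue counts quoted above are mutually consistent if each length is read as an open reading frame that includes the stop codon. The sketch below makes that check explicit; the function name is hypothetical, and the assumption that the quoted lengths are intron-free coding sequences including one stop codon is ours, not stated in the abstract.

```python
def residues_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acid residues encoded by an ORF of the given length.

    Assumes an intron-free coding sequence; one codon is subtracted when the
    length includes the stop codon.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    # Lengths (bp) quoted in the abstract.
    for name, length in [("manTV", 1332), ("manAF", 1386), ("mndA-AF", 2745)]:
        print(name, residues_from_orf(length))
```

Under that assumption the three lengths give 443, 461 and 914 residues, matching the abstract.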
In silico analysis of β-1,3-glucanase from a psychrophilic yeast, Glaciozyma antarctica PI12

NASA Astrophysics Data System (ADS)

Mohammadi, Salimeh; Bakar, Farah Diba Abu; Rabu, Amir; Murad, Abdul Munir Abdul

2014-09-01

1,3-beta-glucanase is an industrially important enzyme with a wide range of applications, especially in the food industry. It is crucial to gain an understanding of the structural and functional aspects of the various beta-1,3-glucanases produced from diverse sources. In this study, a cDNA encoding β-1,3-glucanase (GaExg55) was isolated from a psychrophilic yeast, Glaciozyma antarctica PI12. The cDNA sequence has been submitted to GenBank under accession number KJ436377. Subsequently, the predicted protein was analyzed using various bioinformatics tools to explore its properties. GaExg55 consists of 1,440 bp encoding 480 amino acid residues. Alignment of the deduced amino acid sequence of GaExg55 with other exo-β-1,3-glucanases available in the NCBI database indicates that the deduced amino acids share a consensus motif, NEP, which is a signature pattern of GH5 hydrolases. The predicted molecular weight of GaExg55 is 53.66 kDa. The GaExg55 sequence possesses a signal peptide and is highly conserved among fungal exo-β-1,3-glucanases.

Protospacer Adjacent Motif (PAM)-Distal Sequences Engage CRISPR Cas9 DNA Target Cleavage

PubMed Central

Ethier, Sylvain; Schmeing, T. Martin; Dostie, Josée; Pelletier, Jerry

2014-01-01

The clustered regularly interspaced short palindromic repeat (CRISPR)-associated enzyme Cas9 is an RNA-guided nuclease that has been widely adapted for genome editing in eukaryotic cells. However, the in vivo target specificity of Cas9 is poorly understood and most studies rely on in silico predictions to define the potential off-target editing spectrum. Using chromatin immunoprecipitation followed by sequencing (ChIP-seq), we delineate the genome-wide binding panorama of catalytically inactive Cas9 directed by two different single guide (sg) RNAs targeting the Trp53 locus. Cas9:sgRNA complexes are able to load onto multiple sites with short seed regions adjacent to 5′NGG3′ protospacer adjacent motifs (PAM). Yet among 43 ChIP-seq sites harboring seed regions analyzed for mutational status, we find editing only at the intended on-target locus and one off-target site. In vitro analysis of target site recognition revealed that interactions between the 5′ end of the guide and PAM-distal target sequences are necessary to efficiently engage Cas9 nucleolytic activity, providing an explanation for why off-target editing is significantly lower than expected from ChIP-seq data. PMID:25275497
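The abstract above notes that Cas9:sgRNA complexes can load onto sites where only a short seed region next to a 5′NGG3′ PAM matches. A minimal sketch of how such candidate sites might be enumerated in silico is given below. It is not the authors' pipeline: the function name, the seed length of 5, the toy sequences, and the restriction to the forward strand are all assumptions for illustration.

```python
import re


def find_seed_pam_sites(genome: str, guide: str, seed_len: int = 5) -> list[int]:
    """Return 0-based start positions of seed matches followed directly by an NGG PAM.

    The seed is taken as the last `seed_len` bases of the guide (the PAM-proximal
    end), written as DNA. Only the forward strand is scanned.
    """
    if seed_len <= 0 or len(guide) < seed_len:
        raise ValueError("guide must be at least seed_len bases long")
    seed = guide[-seed_len:].upper()
    # Zero-width lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(f"(?={re.escape(seed)}[ACGT]GG)")
    return [match.start() for match in pattern.finditer(genome.upper())]


if __name__ == "__main__":
    toy_genome = "TTACGTAAGGCCACGTATGGTT"  # hypothetical sequence
    toy_guide = "GGGGGACGTA"                # hypothetical guide; seed = ACGTA
    print(find_seed_pam_sites(toy_genome, toy_guide))
```

A fuller search would also scan the reverse complement and tolerate mismatches; as the abstract stresses, seed-plus-PAM matches of this kind mark potential binding sites, not necessarily sites that are cleaved.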
Using RNA-sequencing and in silico subtraction to identify resistance gene analog markers for Lr16 in wheat

USDA-ARS's Scientific Manuscript database

Leaf rust, caused by Puccinia triticina Eriks., is one of the most widespread diseases of wheat worldwide and breeding for resistance is one of the most effective methods of control. Lr16 is a wheat leaf rust resistance gene that provides resistance at both the seedling and adult stages. Simple s...
Draft Genome Sequence of Mycobacterium boenickei CIP 107829.

PubMed

Bouam, Amar; Robert, Catherine; Croce, Olivier; Levasseur, Anthony; Drancourt, Michel

2017-05-04

Mycobacterium boenickei is a rapidly growing mycobacterium isolated for the first time from a leg wound in the United States. Its 6,506,908-bp draft genome exhibits a 66.77% G+C content, 6,279 protein-coding genes, and 59 predicted RNA genes. In silico DNA-DNA hybridization confirms its assignment to the Mycobacterium fortuitum complex. Copyright © 2017 Bouam et al.
Using diverse U.S. beef cattle genomes to identify missense mutations in EPAS1, a gene associated with high-altitude pulmonary hypertension

USDA-ARS's Scientific Manuscript database

The availability of whole genome sequence (WGS) data has made it possible to discover protein variants in silico. However, bovine WGS databases comprised of related influential sires from relatively few breeds tend to under represent the breadth of genetic diversity in U.S. beef cattle. Thus, our ...
PNA-COMBO-FISH: From combinatorial probe design in silico to vitality compatible, specific labelling of gene targets in cell nuclei.

PubMed

Müller, Patrick; Rößler, Jens; Schwarz-Finsterle, Jutta; Schmitt, Eberhard; Hausmann, Michael

2016-07-01

Recently, advantages concerning targeting specificity of PCR constructed oligonucleotide FISH probes in contrast to established FISH probes, e.g. BAC clones, have been demonstrated. These techniques, however, are still using labelling protocols with DNA denaturing steps applying harsh heat treatment with or without further denaturing chemical agents. COMBO-FISH (COMBinatorial Oligonucleotide FISH) allows the design of specific oligonucleotide probe combinations in silico. Thus, being independent from primer libraries or PCR laboratory conditions, the probe sequences extracted by computer sequence data base search can also be synthesized as single stranded PNA-probes (Peptide Nucleic Acid probes) or TINA-DNA (Twisted Intercalating Nucleic Acids). Gene targets can be specifically labelled with at least about 20 probes obtaining visibly background free specimens. By using appropriately designed triplex forming oligonucleotides, the denaturing procedures can completely be omitted. These results reveal a significant step towards oligonucleotide-FISH maintaining the 3d-nanostructure and even the viability of the cell target. The method is demonstrated with the detection of Her2/neu and GRB7 genes, which are indicators in breast cancer diagnosis and therapy. Copyright © 2016. Published by Elsevier Inc.
Cloning and in-silico analysis of beta-1,3-xylanase from psychrophilic yeast, Glaciozyma antarctica PI12

NASA Astrophysics Data System (ADS)

Nor, Nooraisyah Mohamad; Bakar, Farah Diba Abu; Mahadi, Nor Muhammad; Murad, Abdul Munir Abdul

2015-09-01

A beta-1,3-xylanase (EC 3.2.1.32) gene from the psychrophilic yeast Glaciozyma antarctica has been identified via genome data mining. The enzyme was grouped into the GH26 family based on the Carbohydrate-Active enZYmes (CAZy) database. The molecular weight of this protein was predicted to be 42 kDa and it is expected to be soluble for expression. The presence of a signal peptide suggested that this enzyme may be released extracellularly into the marine environment of the host's habitat. This supports the theory that such enzymatic activity is required to degrade nutrients of polysaccharide origin into simpler carbohydrates outside the cell before they can be taken up. The sequence for this protein showed very little conservation (< 30%) with other beta-1,3-xylanases from available databases. Based on the phylogenetic analysis, this protein also showed a distant relationship to other xylanases of eukaryotic origin. The protein may have undergone major substitution in its gene sequence in order to adapt to the cold climate. This is the first report of a beta-1,3-xylanase gene isolated from a psychrophilic yeast.
Novel dehydrins lacking complete K-segments in Pinaceae. The exception rather than the rule

PubMed Central

Perdiguero, Pedro; Collada, Carmen; Soto, Álvaro

2014-01-01

Dehydrins are thought to play an essential role in the plant response, acclimation and tolerance to different abiotic stresses, such as cold and drought. These proteins contain conserved and repeated segments in their amino acid sequence, used for their classification. Thus, dehydrins from angiosperms present different repetitions of the segments Y, S, and K, while gymnosperm dehydrins show A, E, S, and K segments. The only fragment present in all the dehydrins described to date is the K-segment. Different works suggest the K-segment is involved in key protective functions during dehydration stress, mainly stabilizing membranes. In this work, we describe for the first time two Pinus pinaster proteins with truncated K-segments and a third one completely lacking K-segments, but whose sequence homology leads us to consider them still as dehydrins. qRT-PCR expression analysis shows a significant induction of these dehydrins during a severe and prolonged drought stress. By in silico analysis we confirmed the presence of these dehydrins in other Pinaceae species, breaking the convention regarding the compulsory presence of K-segments in these proteins. The mode of action of these unusual dehydrins remains unknown. PMID:25520734
Satellite DNA: An Evolving Topic

PubMed Central

Garrido-Ramos, Manuel A.

2017-01-01

Satellite DNA represents one of the most fascinating parts of the repetitive fraction of the eukaryotic genome. Since the discovery of highly repetitive tandem DNA in the 1960s, a lot of literature has extensively covered various topics related to the structure, organization, function, and evolution of such sequences. Today, with the advent of genomic tools, the study of satellite DNA has regained a great interest. Thus, Next-Generation Sequencing (NGS), together with high-throughput in silico analysis of the information contained in NGS reads, has revolutionized the analysis of the repetitive fraction of the eukaryotic genomes. The whole of the historical and current approaches to the topic gives us a broad view of the function and evolution of satellite DNA and its role in chromosomal evolution. Currently, we have extensive information on the molecular, chromosomal, biological, and population factors that affect the evolutionary fate of satellite DNA, knowledge that gives rise to a series of hypotheses that get on well with each other about the origin, spreading, and evolution of satellite DNA. In this paper, I review these hypotheses from a methodological, conceptual, and historical perspective and frame them in the context of chromosomal organization and evolution. PMID:28926993
No novel, high penetrant gene might remain to be found in Japanese patients with unknown MODY.

PubMed

Horikawa, Yukio; Hosomichi, Kazuyoshi; Enya, Mayumi; Ishiura, Hiroyuki; Suzuki, Yutaka; Tsuji, Shoji; Sugano, Sumio; Inoue, Ituro; Takeda, Jun

2018-07-01

MODY 5 and 6 have been shown to be low-penetrant MODYs. As the genetic background of unknown MODY is assumed to be similar, a new analytical strategy is applied here to elucidate genetic predispositions to unknown MODY. Using SNP linkage analysis in Japanese families, we examined whether major MODY gene loci remain to be identified. Whole-exome sequencing was performed in seven families with typical MODY. Candidates for novel MODY genes were examined in combination with in silico network analysis. Some peaks were found only in either parametric or non-parametric analysis; however, none of these peaks showed a LOD score greater than 3.7, which is the accepted significance threshold of evidence for linkage. Exome sequencing revealed that three mutated genes were common among three families and 42 mutated genes were common in two families. Only one of these genes, MYO5A, having rare amino acid mutations p.R849Q and p.V1601G, was involved in the biological network of known MODY genes through the intermediary of INS. Although only one promising candidate gene, MYO5A, was identified, there may be no novel, high-penetrant MODY genes remaining to be found in Japanese patients with MODY.
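The LOD threshold quoted above can be made concrete with a minimal sketch. It assumes the simplest textbook case (two-point linkage with phase-known meioses), which is not the parametric and non-parametric SNP analysis the authors ran; the function name `lod_score` and the meiosis counts are illustrative only.

```python
import math

def lod_score(n_meioses, n_recombinants, theta):
    """Two-point LOD score for phase-known meioses (illustrative simplification).

    LOD(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    where n is the number of informative meioses and k the recombinants.
    """
    k = n_recombinants
    n = n_meioses
    likelihood_linked = (theta ** k) * ((1.0 - theta) ** (n - k))
    likelihood_unlinked = 0.5 ** n
    if likelihood_linked == 0.0:
        return float("-inf")  # observed data impossible at this theta
    return math.log10(likelihood_linked / likelihood_unlinked)

# Hypothetical pedigrees: no recombinants observed, evaluated at theta = 0.
SIGNIFICANCE_THRESHOLD = 3.7  # threshold quoted in the abstract
for n in (10, 13):
    score = lod_score(n, 0, 0.0)
    print(n, round(score, 2), score > SIGNIFICANCE_THRESHOLD)
```

Under these assumptions each fully informative, non-recombinant meiosis adds log10(2), about 0.3, to the score, so a small set of families can easily fall short of 3.7.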
Isolation, characterization, and structure analysis of a vacuolar processing enzyme gene (MhVPEγ) from Malus hupehensis (Pamp) Rehd.

PubMed

Ran, Kun; Yang, Hongqiang; Sun, Xiaoli; Li, Qiang; Jiang, Qianqian; Zhang, Weiwei; Shen, Wei

2014-05-01

Vacuolar processing enzymes (VPEs) have received considerable attention recently, as they exhibit caspase-1-like cleavage activity and regulate the process of programmed cell death (PCD). However, knowledge about their detailed characteristics and structures is relatively limited. In this study, a gamma vacuolar processing enzyme gene, MhVPEγ, has been isolated from the leaves of Malus hupehensis (Pamp) Rehd. var pinyiensis Jiang. The protein sequence translated from MhVPEγ comprised 494 amino acids, with a signal peptide and a transmembrane helix at the N-terminus, a peptidase_C13 domain, and a vacuolar sorting signal at the C-terminus. Subsequently, a genome walking approach was used to isolate its upstream sequence. Computational analysis identified several promoter motifs with putative MeJA-, ABA-, and light-responsive characteristics, as well as typical elements commonly found in promoters, such as the TATA-box and CAAT-box. The MhVPEγ transcript level was enhanced during wounding treatment, and the WUN-motif, one of the cis-acting regulatory elements in the upstream sequence, perhaps regulates its expression. In silico-constructed 3D models revealed that MhCPYL interacts with MhVPEγ in a manner resembling the "Induced Fit-Lock and Key" model, providing molecular conformation evidence that CPY is a direct substrate of VPEγ. This study is a first step towards understanding the molecular mechanism of VPEγ and CPYL interactions.
Bioinformatics and peptidomics approaches to the discovery and analysis of food-derived bioactive peptides.

PubMed

Agyei, Dominic; Tsopmo, Apollinaire; Udenigwe, Chibuike C

2018-06-01

There are emerging advancements in the strategies used for the discovery and development of food-derived bioactive peptides because of their multiple food and health applications. Bioinformatics and peptidomics are two computational and analytical techniques that have the potential to speed up the development of bioactive peptides from bench to market. Structure-activity relationships observed in peptides form the basis for bioinformatics and in silico prediction of bioactive sequences encrypted in food proteins. Peptidomics, on the other hand, relies on "hyphenated" (liquid chromatography-mass spectrometry-based) techniques for the detection, profiling, and quantitation of peptides. Together, bioinformatics and peptidomics approaches provide a low-cost and effective means of predicting, profiling, and screening bioactive protein hydrolysates and peptides from food. This article discusses the basis, strengths, and limitations of bioinformatics and peptidomics approaches currently used for the discovery and analysis of food-derived bioactive peptides.
Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

PubMed Central

Rehm, Charlotte; Wurmthaler, Lena A.; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S.

2015-01-01

In prokaryotes, simple sequence repeats (SSRs) with unit sizes of 1–5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6–9 nt are rare. In particular, G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4-forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed, with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria. PMID:26695179
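A scan for tandem GGGNATC units of the kind described above might look like the following minimal sketch. It is not the authors' pipeline: the function names and the toy sequence are assumptions for illustration, and N is taken to mean any of A, C, G or T.

```python
import re

# IUPAC "N" in the GGGNATC consensus is treated as any unambiguous base.
UNIT = "GGG[ACGT]ATC"
UNIT_LENGTH = 7

def find_repeat_tracts(seq, min_units=2):
    """Return (start, end, unit_count) for each tandem run of GGGNATC units."""
    pattern = re.compile(r"(?:%s){%d,}" % (UNIT, min_units))
    tracts = []
    for match in pattern.finditer(seq.upper()):
        length = match.end() - match.start()
        tracts.append((match.start(), match.end(), length // UNIT_LENGTH))
    return tracts

def reverse_complement(seq):
    """Reverse complement, so the opposite strand can be scanned as well."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Made-up sequence: one tract of four units and one tract of two units.
toy = "TTAC" + "GGGAATC" * 4 + "CCGTA" + "GGGTATCGGGCATC" + "AT"
tracts = find_repeat_tracts(toy)
print(tracts)
print(find_repeat_tracts(reverse_complement(toy)))
```

Tallying the `unit_count` values over a whole genome would give the kind of repeat-number distribution in which the abstract reports a preference for four units.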
IMGMD: A platform for the integration and standardisation of In silico Microbial Genome-scale Metabolic Models.

PubMed

Ye, Chao; Xu, Nan; Dong, Chuan; Ye, Yuannong; Zou, Xuan; Chen, Xiulai; Guo, Fengbiao; Liu, Liming

2017-04-07

Genome-scale metabolic models (GSMMs) constitute a platform that combines genome sequences and detailed biochemical information to quantify microbial physiology at the system level. To improve the unity, integrity, correctness, and format of data in published GSMMs, a consensus IMGMD database was built in the LAMP (Linux + Apache + MySQL + PHP) system by integrating and standardizing 328 GSMMs constructed for 139 microorganisms. The IMGMD database can help microbial researchers download manually curated GSMMs, rapidly reconstruct standard GSMMs, design pathways, and identify metabolic targets for strategies on strain improvement. Moreover, the IMGMD database facilitates the integration of wet-lab and in silico data to gain an additional insight into microbial physiology. The IMGMD database is freely available, without any registration requirements, at http://imgmd.jiangnan.edu.cn/database.
A novel NOTCH3 mutation identified in patients with oral cancer by whole exome sequencing.

PubMed

Yi, Yanjun; Tian, Zhuowei; Ju, Houyu; Ren, Guoxin; Hu, Jingzhou

2017-06-01

Oral cancer is a serious disease caused by environmental factors and/or susceptibility genes. In the present study, in order to identify useful genetic biomarkers for cancer prediction and prevention, and for personalized treatment, we detected somatic mutations in 5 pairs of oral cancer tissues and blood samples using whole exome sequencing (WES). We confirmed a novel missense single-nucleotide polymorphism (SNP; chr19:15288426A>C) in the NOTCH3 gene with Sanger sequencing, which resulted in an N1438T mutation in the protein sequence. Using multiple in silico analyses, this variant was found to have mildly damaging effects on the NOTCH3 gene, which was supported by the results from analyses using PANTHER, SNAP and SNPs&GO. However, further analysis using Mutation Taster revealed that this SNP had a probability of 0.9997 to be 'disease causing'. In addition, we performed 3D structure simulation analysis and the results suggested that this variant had little effect on the solubility and hydrophobicity of the protein and thus on its function; however, it decreased the stability of the protein by increasing the total energy following minimization (-1,051.39 kcal/mol for the mutant and -1,229.84 kcal/mol for the native) and removing one stabilizing residue of the protein. Lower stability of the N1438T mutant was also supported by analysis using I-Mutant with a DDG value of -1.67. Overall, the present study identified and confirmed a novel mutation in the NOTCH3 gene, which may decrease the stability of NOTCH3, and may thus prove to be helpful in cancer prognosis.
In silico analysis of Mn transporters (NRAMP1) in various plant species.

PubMed

Vatansever, Recep; Filiz, Ertugrul; Ozyigit, Ibrahim Ilker

2016-03-01

Manganese (Mn) is an essential micronutrient in the plant life cycle. It may be involved in photosynthesis, carbohydrate and lipid biosynthesis, and oxidative stress protection. Mn deficiency inhibits plant growth and development, and causes various symptoms such as interveinal chlorosis and tissue necrosis. Despite its importance in the plant life cycle, we still have limited knowledge about Mn transporters in many plant species. Therefore, this study aimed to identify and characterize orthologs of the high-affinity Arabidopsis Mn root transporter NRAMP1 in 17 different plant species. Various in silico methods and digital gene expression data were used in identification and characterization of NRAMP1 homologs; physico-chemical properties of sequences were calculated, putative transmembrane domains (TMDs) and conserved motif signatures were determined, a phylogenetic tree was constructed, 3D models and an interactome map were generated, and gene expression data were analyzed. 49 NRAMP1 homologs were identified from proteome datasets of 17 plant species using AtNRAMP1 as query. Identified sequences were characterized by an NRAMP domain structure, 10-12 putative TMDs with cytosolic N- and C-termini, and 10-14 exons encoding a protein of 500-588 amino acids and 53.8-64.3 kDa molecular weight with basic characteristics. Consensus transport residues, GQSSTITGTYAGQY(/F)V(/I)MQGFLD(/E/N) between TMD-8 and 9 were identified in all sequences but putative N-linked glycosylation sites were not highly conserved. In phylogeny, NRAMP1 sequences demonstrated divergence in lower and higher plants as well as in monocots and dicots. Despite divergence of the lower plant Physcomitrella patens in phylogeny, it showed similarity in superposed 3D models.
Phylogenetic distribution of AtNRAMP1 and 6 homologs suggested a functional relationship to NRAMP6 sequences in Mn transport, while distribution of OsNRAMP1 and 5 homologs implicated an involvement of NRAMP1 sequences in Mn transport or a cross-talk in Fe-Mn homeostasis. Interactome analysis further confirmed this cross-talk between Mn and Fe pathways. The gene expression profile of AtNRAMP1 under Fe-, K-, P- and S-deficiencies, and cold, drought, heat and salt stresses revealed various proteins involved in transcription regulation, cofactor biosynthesis, diverse developmental roles, carbohydrate metabolism, oxidation-reduction reactions, cellular signaling and protein degradation pathways. Mn deficiency or toxicity could cause serious adverse effects in plants as well as in humans. Reducing these adverse effects relies mainly on understanding the molecular mechanisms underlying Mn uptake from the soil. However, we still have limited knowledge regarding the structural and functional roles of Mn transporters in many plant species. Therefore, identification and characterization of orthologs of the Mn root uptake transporter NRAMP1 in various plant species will provide valuable theoretical knowledge to better understand Mn transporters, and may provide insight for future studies aiming to develop genetically engineered and biofortified plants.
Miniprimer PCR, a New Lens for Viewing the Microbial World

PubMed Central

Isenbarger, Thomas A.; Finney, Michael; Ríos-Velázquez, Carlos; Handelsman, Jo; Ruvkun, Gary

2008-01-01

Molecular methods based on the 16S rRNA gene sequence are used widely in microbial ecology to reveal the diversity of microbial populations in environmental samples. Here we show that a new PCR method using an engineered polymerase and 10-nucleotide “miniprimers” expands the scope of detectable sequences beyond those detected by standard methods using longer primers and Taq polymerase. After testing the method in silico to identify divergent ribosomal genes in previously cloned environmental sequences, we applied the method to soil and microbial mat samples, which revealed novel 16S rRNA gene sequences that would not have been detected with standard primers. Deeply divergent sequences were discovered with high frequency and included representatives that define two new division-level taxa, designated CR1 and CR2, suggesting that miniprimer PCR may reveal new dimensions of microbial diversity. PMID:18083877
Construction of a high-density genetic map for grape using next generation restriction-site associated DNA sequencing

PubMed Central

2012-01-01

Background Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. Results An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variations. Conclusions The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison. PMID:22908993
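The in-silico digestion-site prediction mentioned in the Results can be illustrated with a minimal sketch. The abstract does not name the enzymes that were compared, so EcoRI (G^AATTC) and MseI (T^TAA) are used here purely as examples; the function names, the size window and the toy sequence are likewise assumptions.

```python
def digest(seq, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at every site occurrence.

    cut_offset is the position of the cut within the recognition site
    (1 for G^AATTC). Palindromic sites only need a top-strand scan.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

def count_in_window(fragments, shortest, longest):
    """Number of fragments inside a size-selection window (inclusive)."""
    return sum(1 for f in fragments if shortest <= f <= longest)

# Illustrative enzymes: name -> (recognition site, cut offset).
ENZYMES = {"EcoRI": ("GAATTC", 1), "MseI": ("TTAA", 1)}

toy_genome = "AAGAATTCCCCTTAAGGGGAATTCTT"  # made-up sequence
for name, (site, offset) in ENZYMES.items():
    fragments = digest(toy_genome, site, offset)
    print(name, fragments, count_in_window(fragments, 10, 20))
```

On a real reference genome the same count, taken over the library's size-selection window, gives a rough estimate of how many RAD loci each candidate enzyme would yield.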
An expressed sequence tag (EST) data mining strategy succeeding in the discovery of new G-protein coupled receptors.

PubMed

Wittenberger, T; Schaller, H C; Hellebrand, S

2001-03-30

We have developed a comprehensive expressed sequence tag database search method and used it for the identification of new members of the G-protein coupled receptor superfamily. Our approach proved to be especially useful for the detection of expressed sequence tag sequences that do not encode conserved parts of a protein, making it an ideal tool for the identification of members of divergent protein families or of protein parts without conserved domain structures in the expressed sequence tag database. At least 14 of the expressed sequence tags found with this strategy are promising candidates for new putative G-protein coupled receptors. Here, we describe the sequence and expression analysis of five new members of this receptor superfamily, namely GPR84, GPR86, GPR87, GPR90 and GPR91. We also studied the genomic structure and chromosomal localization of the respective genes applying in silico methods. A cluster of six closely related G-protein coupled receptors was found on the human chromosome 3q24-3q25. It consists of four orphan receptors (GPR86, GPR87, GPR91, and H963), the purinergic receptor P2Y1, and the uridine 5'-diphosphoglucose receptor KIAA0001. It seems likely that these receptors evolved from a common ancestor and therefore might have related ligands. In conclusion, we describe a data mining procedure that proved to be useful for the identification and first characterization of new genes and is well applicable for other gene families. Copyright 2001 Academic Press.
Whole genome sequencing distinguishes between relapse and reinfection in recurrent leprosy cases

PubMed Central

Bührer-Sékula, Samira; Benjak, Andrej; Loiseau, Chloé; Singh, Pushpendra; Pontes, Maria A. A.; Gonçalves, Heitor S.; Hungria, Emerith M.; Busso, Philippe; Piton, Jérémie; Silveira, Maria I. S.; Cruz, Rossilene; Schetinni, Antônio; Costa, Maurício B.; Virmond, Marcos C. L.; Diorio, Suzana M.; Dias-Baptista, Ida M. F.; Rosa, Patricia S.; Matsuoka, Masanori; Penna, Maria L. F.; Cole, Stewart T.; Penna, Gerson O.

2017-01-01

Background Since leprosy is both treated and controlled by multidrug therapy (MDT) it is important to monitor recurrent cases for drug resistance and to distinguish between relapse and reinfection as a means of assessing therapeutic efficacy. All three objectives can be reached with single nucleotide resolution using next generation sequencing and bioinformatics analysis of Mycobacterium leprae DNA present in human skin. Methodology DNA was isolated by means of optimized extraction and enrichment methods from samples from three recurrent cases in leprosy patients participating in an open-label, randomized, controlled clinical trial of uniform MDT in Brazil (U-MDT/CT-BR). Genome-wide sequencing of M. leprae was performed and the resultant sequence assemblies analyzed in silico. Principal findings In all three cases, no mutations responsible for resistance to rifampicin, dapsone and ofloxacin were found, thus eliminating drug resistance as a possible cause of disease recurrence. However, sequence differences were detected between the strains from the first and second disease episodes in all three patients. In one case, clear evidence was obtained for reinfection with an unrelated strain whereas in the other two cases, relapse appeared more probable. Conclusions/Significance This is the first report of using M. leprae whole genome sequencing to reveal that treated and cured leprosy patients who remain in endemic areas can be reinfected by another strain. Next generation sequencing can be applied reliably to M. leprae DNA extracted from biopsies to discriminate between cases of relapse and reinfection, thereby providing a powerful tool for evaluating different outcomes of therapeutic regimens and for following disease transmission. PMID:28617800

Efficient analysis of mouse genome sequences reveal many nonsense variants

PubMed Central

Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E.; Libert, Claude

2016-01-01

Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
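The stop-gain screen described above (substitute strain-specific bases into the reference coding sequence, translate, compare proteins) can be sketched as follows. This is a minimal illustration assuming SNPs only, 0-based CDS coordinates and the standard genetic code; the helper names and the toy CDS are hypothetical, not taken from the authors' tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate a coding sequence up to, but not including, the first stop."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def apply_snps(cds, snps):
    """Return the CDS with {0-based position: alternate base} substitutions."""
    bases = list(cds)
    for pos, alt in snps.items():
        bases[pos] = alt
    return "".join(bases)

def gains_stop(ref_cds, snps):
    """True if the variant protein is shorter than the reference protein."""
    return len(translate(apply_snps(ref_cds, snps))) < len(translate(ref_cds))

ref_cds = "ATGCAGAAATGGTAA"          # toy CDS encoding M-Q-K-W
print(translate(ref_cds))
print(gains_stop(ref_cds, {3: "T"}))  # CAG -> TAG introduces a stop
print(gains_stop(ref_cds, {5: "A"}))  # CAG -> CAA is synonymous
```

Indels, which the abstract also lists, would shift the reading frame and need coordinate handling that this sketch leaves out.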
Back to Basics--The Influence of DNA Extraction and Primer Choice on Phylogenetic Analysis of Activated Sludge Communities.

PubMed

Albertsen, Mads; Karst, Søren M; Ziegler, Anja S; Kirkegaard, Rasmus H; Nielsen, Per H

2015-01-01

DNA extraction and primer choice have a large effect on the observed community structure in all microbial amplicon sequencing analyses. Although the biases are well known, no comprehensive analysis has been conducted in activated sludge communities. In this study we systematically explored the impact of a number of parameters on the observed microbial community: bead beating intensity, primer choice, extracellular DNA removal, and various PCR settings. In total, 176 samples were subjected to 16S rRNA amplicon sequencing, and selected samples were investigated through metagenomics and metatranscriptomics. Quantitative fluorescence in situ hybridization was used as a DNA extraction-independent method for qualitative comparison. In general, an effect on the observed community was found for all parameters tested, although bead beating and primer choice had the largest effect. The effect of bead beating intensity correlated with cell-wall strength, as seen by a large increase in DNA from Gram-positive bacteria (up to 400%). However, significant differences were present at lower phylogenetic levels within the same phylum, suggesting that additional factors are at play. The best primer set based on in silico analysis was found to underestimate a number of important bacterial groups. For 16S rRNA gene analysis in activated sludge we recommend using the FastDNA SPIN Kit for Soil with four times the normal bead beating and V1-3 primers.
Effective Feature Selection for Classification of Promoter Sequences.

PubMed

K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish

2016-01-01

Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.
Whole genome sequencing for typing and characterisation of Listeria monocytogenes isolated in a rabbit meat processing plant.

PubMed

Palma, Federica; Pasquali, Frédérique; Lucchi, Alex; Cesare, Alessandra De; Manfreda, Gerardo

2017-08-16

Listeria monocytogenes is a food-borne pathogen able to survive and grow in different environments, including food processing plants, where it can persist for months or years. In the present study the discriminatory power of Whole Genome Sequencing (WGS)-based analysis (cgMLST) was compared to that of molecular typing methods on 34 L. monocytogenes isolates collected over one year in the same rabbit meat processing plant and belonging to three genotypes (ST14, ST121, ST224). Each genotype included isolates indistinguishable by standard molecular typing methods. The virulence potential of all isolates was assessed by Multi Virulence-Locus Sequence Typing (MVLST) and the investigation of a representative database of virulence determinant genes. The whole genome of each isolate was sequenced on a MiSeq platform. The cgMLST, MVLST, and in silico identification of virulence genes were performed using publicly available tools. Draft genomes included a number of contigs ranging from 13 to 28 and N50 ranging from 456298 to 580604. The coverage ranged from 41 to 187X. The cgMLST showed a significantly superior discriminatory power only in comparison to ribotyping; nevertheless, it allowed the detection of two singletons belonging to ST14 that were not observed by other molecular methods. All ST14 isolates belonged to VT107, whose 7-locus concatenated sequence differs by only 4 nucleotides from VT1 (Epidemic clone III). Analysis of virulence genes showed the presence of a full-length inlA version in all ST14 isolates and of a mutated version including a premature stop codon (PMSC) associated with attenuated virulence in all ST121 isolates.
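For readers unfamiliar with the N50 statistic quoted for the draft genomes, the sketch below computes it from a list of contig lengths. The lengths are invented for illustration and are not taken from the study.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total of lengths, taken
    from longest to shortest, first reaches half of the assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly

# Hypothetical contig lengths (bp) for a small draft assembly.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))
```

A higher N50 for the same total assembly size indicates that more of the genome sits in long contigs, which is why it is reported alongside contig count and coverage.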
Genome-wide identification and characterization of NB-ARC resistant genes in wheat (Triticum aestivum L.) and their expression during leaf rust infection.

PubMed

Chandra, Saket; Kazmi, Andaleeb Z; Ahmed, Zainab; Roychowdhury, Gargi; Kumari, Veena; Kumar, Manish; Mukhopadhyay, Kunal

2017-07-01

NB-ARC domain-containing resistance genes from the wheat genome were identified, characterized and localized on chromosome arms that displayed differential yet positive response during incompatible and compatible leaf rust interactions. Wheat (Triticum aestivum L.) is an important cereal crop; however, its production is affected severely by numerous diseases including rusts. An efficient, cost-effective and ecologically viable approach to control pathogens is through host resistance. In wheat, high numbers of resistance loci are present but only a few have been identified and cloned. A comprehensive analysis of the NB-ARC-containing genes in the complete wheat genome was accomplished in this study. Complete NB-ARC encoding genes were mined from the Ensembl Plants database to predict 604 NB-ARC containing sequences using the HMM approach. Genome-wide analysis of orthologous clusters in the NB-ARC-containing sequences of wheat and other members of the Poaceae family revealed maximum homology with Oryza sativa indica and Brachypodium distachyon. The identification of overlap between orthologous clusters enabled the elucidation of the function and evolution of resistance proteins. The distributions of the NB-ARC domain-containing sequences were found to be balanced among the three wheat sub-genomes. Wheat chromosome arms 4AL and 7BL had the most NB-ARC domain-containing contigs. The spatio-temporal expression profiling studies exemplified the positive role of these genes in resistant and susceptible wheat plants during incompatible and compatible interaction in response to the leaf rust pathogen Puccinia triticina. Two NB-ARC domain-containing sequences were modelled in silico, cloned and sequenced to analyze their fine structures. The data obtained in this study will augment isolation, characterization and application of NB-ARC resistance genes in marker-assisted selection based breeding programs for improving rust resistance in wheat.
Expanding the clinical and genetic spectrum of G6PD deficiency: The occurrence of BCGitis and novel missense mutation.

PubMed

Khan, Taj Ali; Mazhar, Humaira; Nawaz, Mehboob; Kalsoom, Kalsoom; Ishfaq, Muhammad; Asif, Huma; Rahman, Hazir; Qasim, Muhammad; Naz, Farkhanda; Hussain, Mubashir; Khattak, Baharullah; Ullah, Waheed; Cabral-Marques, Otavio; Butt, Jawad; Iqbal, Asif

2017-01-01

Glucose-6-phosphate dehydrogenase (G6PD) is a key enzyme in the pentose phosphate pathway that ensures sufficient production of coenzyme nicotinamide adenine dinucleotide phosphate (NADPH) by catalyzing the reduction of NADP+ to NADPH. Notably, the latter mediates the production of reactive oxygen species (ROS) by phagocytic cells such as neutrophils and monocytes. Therefore, patients with severe forms of G6PD deficiency may present impaired NADPH oxidase activity and become susceptible to recurrent infections. This highlights the importance of characterizing the immunopathologic mechanisms underlying the susceptibility to infections in patients with G6PD deficiency. Here we report the first two cases of G6PD deficiency with Bacille Calmette-Guérin (BCG) adverse effect, besides jaundice, hemolytic anemia and recurrent infections caused by Staphylococcus aureus. The qualitative G6PD screening was performed and followed by oxidative burst analysis using flow cytometry. Genetic and in silico analyses were carried out by Sanger sequencing and mutation pathogenicity predicted using bioinformatics tools, respectively. Activated neutrophils and monocytes from patients displayed impaired oxidative burst. The genetic analysis revealed the novel missense mutation c.1157T>A/p.L386Q in G6PD. In addition, in silico analysis indicated that this mutation is pathogenic, thereby hampering the oxidative burst of neutrophils and monocytes from patients. Our data expand the clinical and genetic spectrum of G6PD deficiency, and suggest that impaired oxidative burst in this severe primary immune deficiency is an underlying immunopathologic mechanism that predisposes to mycobacterial infections. Copyright © 2016 Elsevier Ltd. All rights reserved.
Structural and functional analysis of rare missense mutations in human chorionic gonadotrophin β-subunit

PubMed Central

Nagirnaja, Liina; Venclovas, Česlovas; Rull, Kristiina; Jonas, Kim C.; Peltoketo, Hellevi; Christiansen, Ole B.; Kairys, Visvaldas; Kivi, Gaily; Steffensen, Rudi; Huhtaniemi, Ilpo T.; Laan, Maris

2012-01-01

Heterodimeric hCG is one of the key hormones determining early pregnancy success. We have previously identified rare missense mutations in hCGβ genes with potential pathophysiological importance. The present study assessed the impact of these mutations on the structure and function of hCG by applying a combination of in silico (sequence and structure analysis, molecular dynamics) and in vitro (co-immunoprecipitation, immuno- and bioassays) approaches. The carrier status of each mutation was determined for 1086 North-Europeans [655 patients with recurrent miscarriage (RM)/431 healthy controls from Estonia, Finland and Denmark] using PCR-restriction fragment length polymorphism. The mutation CGB5 p.Val56Leu (rs72556325) was identified in a single heterozygous RM patient and caused a structural hindrance in the formation of the hCGα/β dimer. Although the amount of the mutant hCGβ assembled into secreted intact hCG was only 10% compared with the wild-type, a stronger signaling response was triggered upon binding to its receptor, thus compensating the effect of poor dimerization. The mutation CGB8 p.Pro73Arg (rs72556345) was found in five heterozygotes (three RM cases and two control individuals) and was inherited by two of seven studied live born children. The mutation caused ∼50% of secreted β-subunits to acquire an alternative conformation, but did not affect its biological activity. For the CGB8 p.Arg8Trp (rs72556341) substitution, the applied in vitro methods revealed no alterations in the assembly of intact hCG as also supported by an in silico analysis. In summary, the accumulated data indicate that only mutations with neutral or mild functional consequences might be tolerated in the major hCGβ genes CGB5 and CGB8. PMID:22554618
In silico analysis of a disease-causing mutation in PCDH15 gene in a consanguineous Pakistani family with Usher phenotype.

PubMed

Saleha, Shamim; Ajmal, Muhammad; Jamil, Muhammad; Nasir, Muhammad; Hameed, Abdul

2016-01-01

To map Usher phenotype in a consanguineous Pakistani family and identify disease-associated mutation in a causative gene to establish phenotype-genotype correlation. A consanguineous Pakistani family in which Usher phenotype was segregating as an autosomal recessive trait was ascertained. On the basis of results of clinical investigations of affected members of this family, the disease was diagnosed as Usher syndrome (USH). To identify the locus responsible for the Usher phenotype in this family, genomic DNA from a blood sample of each individual was genotyped using microsatellite Short Tandem Repeat (STR) markers for the known Usher syndrome loci. Then direct sequencing was performed to find disease-associated mutations in the candidate gene. By genetic linkage analysis, the USH phenotype of this family was mapped to the PCDH15 locus on chromosome 10q21.1. Three different point mutations in exon 11 of PCDH15 were identified and one of them, c.1304A>C, was found to be segregating with the disease phenotype in the Pakistani family with Usher phenotype. This c.1304A>C transversion mutation predicts an amino-acid substitution of aspartic acid with an alanine at residue number 435 (p.D435A) of its protein product. Moreover, in silico analysis revealed conservation of aspartic acid at position 435 and predicted this change as pathogenic. The identification of the c.1304A>C pathogenic mutation in the PCDH15 gene and its association with Usher syndrome in a consanguineous Pakistani family is the first example of a missense mutation of PCDH15 causing USH1 phenotype. In previous reports, it was hypothesized that severe mutations such as truncated protein of PCDH15 led to the Usher I phenotype and that missense variants are mainly responsible for non-syndromic hearing impairment.
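The link between the coding change c.1304A>C and the protein change p.D435A can be checked with simple codon arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the reference codon is not given in the abstract, and the code therefore tries both standard aspartic acid codons (GAT and GAC) as an assumption.

```python
# Subset of the standard genetic code needed for this check.
CODON_TO_AA = {"GAT": "D", "GAC": "D", "GCT": "A", "GCC": "A"}


def codon_position(cds_pos):
    """Map a 1-based coding-sequence position to
    (1-based codon number, 1-based position within that codon)."""
    if cds_pos < 1:
        raise ValueError("coding positions are 1-based")
    codon_number = (cds_pos - 1) // 3 + 1
    offset = (cds_pos - 1) % 3 + 1
    return codon_number, offset


def substitute(codon, offset, alt_base):
    """Return the codon with the base at the 1-based offset replaced."""
    return codon[:offset - 1] + alt_base + codon[offset:]


codon_number, offset = codon_position(1304)
print(codon_number, offset)  # codon 435, second base

# Assumption: the reference codon is one of the two Asp codons.
for ref_codon in ("GAT", "GAC"):
    alt_codon = substitute(ref_codon, offset, "C")
    print(ref_codon, CODON_TO_AA[ref_codon], "->", alt_codon, CODON_TO_AA[alt_codon])
```

Position 1304 falls on the second base of codon 435, and replacing the middle A of either aspartic acid codon with C yields an alanine codon, which agrees with the reported p.D435A.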
A case study of an integrative genomic and experimental therapeutic approach for rare tumors: identification of vulnerabilities in a pediatric poorly differentiated carcinoma.

PubMed

Dela Cruz, Filemon S; Diolaiti, Daniel; Turk, Andrew T; Rainey, Allison R; Ambesi-Impiombato, Alberto; Andrews, Stuart J; Mansukhani, Mahesh M; Nagy, Peter L; Alvarez, Mariano J; Califano, Andrea; Forouhar, Farhad; Modzelewski, Beata; Mitchell, Chelsey M; Yamashiro, Darrell J; Marks, Lianna J; Glade Bender, Julia L; Kung, Andrew L

2016-10-31

Precision medicine approaches are ideally suited for rare tumors where comprehensive characterization may have diagnostic, prognostic, and therapeutic value. We describe the clinical case and molecular characterization of an adolescent with metastatic poorly differentiated carcinoma (PDC). Given the rarity and poor prognosis associated with PDC in children, we utilized genomic analysis and preclinical models to validate oncogenic drivers and identify molecular vulnerabilities. We utilized whole exome sequencing (WES) and transcriptome analysis to identify germline and somatic alterations in the patient's tumor. In silico and in vitro studies were used to determine the functional consequences of genomic alterations. Primary tumor was used to generate a patient-derived xenograft (PDX) model, which was used for in vivo assessment of predicted therapeutic options. WES revealed a novel germline frameshift variant (p.E1554fs) in APC, establishing a diagnosis of Gardner syndrome, along with a somatic nonsense (p.R790*) APC mutation in the tumor. Somatic mutations in TP53, MAX, BRAF, ROS1, and RPTOR were also identified and transcriptome and immunohistochemical analyses suggested hyperactivation of the Wnt/β-catenin and AKT/mTOR pathways. In silico and biochemical assays demonstrated that the MAX p.R60Q and BRAF p.K483E mutations were activating mutations, whereas the ROS1 and RPTOR mutations were of lower utility for therapeutic targeting. Utilizing a patient-specific PDX model, we demonstrated in vivo activity of mTOR inhibition with temsirolimus and partial response to inhibition of MEK. This clinical case illustrates the depth of investigation necessary to fully characterize the functional significance of the breadth of alterations identified through genomic analysis.
Proteomic analysis of sweet Algerian apricot kernels (Prunus armeniaca L.) by combinatorial peptide ligand libraries and LC-MS/MS.

PubMed

Ghorab, Hamida; Lammi, Carmen; Arnoldi, Anna; Kabouche, Zahia; Aiello, Gilda

2018-01-15

An investigation of the proteome of the sweet kernel of apricot, based on equalisation with combinatorial peptide ligand libraries (CPLLs), SDS-PAGE, nLC-ESI-MS/MS, and database search, allowed the identification of 175 proteins. Gene ontology analysis indicated that their main molecular functions are in nucleotide binding (20.9%), hydrolase activities (10.6%), kinase activities (7%), and catalytic activity (5.6%). A protein-protein association network analysis using STRING software made it possible to build an interactomic map of all detected proteins, characterised by 34 interactions. In order to forecast the potential health benefits deriving from the consumption of these proteins, the two most abundant, i.e. Prunin 1 and 2, were enzymatically digested in silico, predicting 10 and 14 peptides, respectively. Searching their sequences in the database BIOPEP, it was possible to suggest a variety of bioactivities, including dipeptidyl peptidase-IV (DPP-IV) and angiotensin converting enzyme I (ACE) inhibition, glucose uptake stimulation and antioxidant properties. Copyright © 2017 Elsevier Ltd. All rights reserved.
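In silico enzymatic digestion amounts to applying a protease cleavage rule to a protein sequence. The abstract does not say which enzyme or software was used, so the sketch below assumes a trypsin-like rule (cleave after K or R unless the next residue is P) purely for illustration; the function name and the example sequence are hypothetical and are not Prunin sequences.

```python
def digest(protein, cleave_after="KR", blocked_by="P"):
    """Split a protein sequence into peptides.

    Assumed rule (trypsin-like): cut on the C-terminal side of any
    residue in `cleave_after`, unless the following residue is in
    `blocked_by`. Missed cleavages are not modelled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else None
        blocked = next_residue is not None and next_residue in blocked_by
        if residue in cleave_after and not blocked:
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Keep any C-terminal remainder that does not end in a cleavage site.
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


# Hypothetical sequence, for illustration only.
print(digest("MKRPAGKLLR"))
```

The resulting peptide list is what would then be searched against a bioactive-peptide database such as BIOPEP.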
Flux analysis and metabolomics for systematic metabolic engineering of microorganisms.

PubMed

Toya, Yoshihiro; Shimizu, Hiroshi

2013-11-01

Rational engineering of metabolism is important for bio-production using microorganisms. Metabolic design based on in silico simulations and experimental validation of the metabolic state in the engineered strain helps in accomplishing systematic metabolic engineering. Flux balance analysis (FBA) is a method for the prediction of metabolic phenotype, and many applications have been developed using FBA to design metabolic networks. Elementary mode analysis (EMA) and ensemble modeling techniques are also useful tools for in silico strain design. The metabolome and flux distribution of the metabolic pathways enable us to evaluate the metabolic state and provide useful clues to improve target productivity. Here, we reviewed several computational applications for metabolic engineering by using genome-scale metabolic models of microorganisms. We also discussed the recent progress made in the field of metabolomics and (13)C-metabolic flux analysis techniques, and reviewed these applications pertaining to bio-production development. Because these in silico or experimental approaches have their respective advantages and disadvantages, the combined usage of these methods is complementary and effective for metabolic engineering. Copyright © 2013 Elsevier Inc. All rights reserved.
Prospecting for pig single nucleotide polymorphisms in the human genome: have we struck gold?

PubMed

Grapes, L; Rudd, S; Fernando, R L; Megy, K; Rocha, D; Rothschild, M F

2006-06-01

Gene-to-gene variation in the frequency of single nucleotide polymorphisms (SNPs) has been observed in humans, mice, rats, primates and pigs, but a relationship across species in this variation has not been described. Here, the frequency of porcine coding SNPs (cSNPs) identified by in silico methods, and the frequency of murine cSNPs, were compared with the frequency of human cSNPs across homologous genes. From 150,000 porcine expressed sequence tag (EST) sequences, a total of 452 SNP-containing sequence clusters were found, totalling 1394 putative SNPs. All the clustered porcine EST annotations and SNP data have been made publicly available at http://sputnik.btk.fi/project?name=swine. Human and murine cSNPs were identified from dbSNP and were characterized as either validated or total number of cSNPs (validated plus non-validated) for comparison purposes. The correlation between in silico pig cSNP and validated human cSNP densities was found to be 0.77 (p < 0.00001) for a set of 25 homologous genes, while a correlation of 0.48 (p < 0.0005) was found for a primarily random sample of 50 homologous human and mouse genes. This is the first evidence of conserved gene-to-gene variability in cSNP frequency across species and indicates that site-directed screening of porcine genes that are homologous to cSNP-rich human genes may rapidly advance cSNP discovery in pigs.
In Silico Analysis of the Structural and Biochemical Features of the NMD Factor UPF1 in Ustilago maydis.

PubMed

Martínez-Montiel, Nancy; Morales-Lara, Laura; Hernández-Pérez, Julio M; Martínez-Contreras, Rebeca D

2016-01-01

The molecular mechanisms regulating the accuracy of gene expression are still not fully understood. Among these mechanisms, Nonsense-mediated Decay (NMD) is a quality control process that detects post-transcriptionally abnormal transcripts and leads them to degradation. The UPF1 protein lies at the heart of NMD as shown by several structural and functional features reported for this factor mainly for Homo sapiens and Saccharomyces cerevisiae. This process is highly conserved in eukaryotes but functional diversity can be observed in various species. Ustilago maydis is a basidiomycete and the best-known smut, which has become a model to study molecular and cellular eukaryotic mechanisms. In this study, we performed in silico analysis to investigate the structural and biochemical properties of the putative UPF1 homolog in Ustilago maydis. The putative homolog for UPF1 was recognized in the annotated genome for the basidiomycete, exhibiting 66% identity with its human counterpart at the protein level. The known structural and functional domains characteristic of UPF1 homologs were also found. Based on the crystal structures available for UPF1, we constructed different three-dimensional models for umUPF1 in order to analyze the secondary and tertiary structural features of this factor. Using these models, we studied the spatial arrangement of umUPF1 and its capability to interact with UPF2. Moreover, we identified the critical amino acids that mediate the interaction of umUPF1 with UPF2, ATP, RNA and with UPF1 itself. Mutating these amino acids in silico showed an important effect on the native structure. Finally, we performed molecular dynamic simulations for UPF1 proteins from H. sapiens and U. maydis and the results obtained show a similar behavior and physicochemical properties for the protein in both organisms. Overall, our results indicate that the putative UPF1 identified in U. maydis shows a very similar sequence, structural organization, mechanical stability, physicochemical properties and spatial organization in comparison to the NMD factor depicted for Homo sapiens. These observations strongly support the notion that human and fungal UPF1 could perform equivalent biological activities.
High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing approach and SNPlex technology

PubMed Central

Lijavetzky, Diego; Cabezas, José Antonio; Ibáñez, Ana; Rodríguez, Virginia; Martínez-Zapater, José M

2007-01-01

Background: Single-nucleotide polymorphisms (SNPs) are the most abundant type of DNA sequence polymorphisms. Their higher availability and stability when compared to simple sequence repeats (SSRs) provide enhanced possibilities for genetic and breeding applications such as cultivar identification, construction of genetic maps, the assessment of genetic diversity, the detection of genotype/phenotype associations, or marker-assisted breeding. In addition, the efficiency of these activities can be improved thanks to the ease with which SNP genotyping can be automated. Expressed sequence tags (EST) sequencing projects in grapevine are allowing for the in silico detection of multiple putative sequence polymorphisms within and among a reduced number of cultivars. In parallel, the sequence of the grapevine cultivar Pinot Noir is also providing thousands of polymorphisms present in this highly heterozygous genome. Still, the general application of those SNPs requires further validation since their use could be restricted to those specific genotypes. Results: In order to develop a large SNP set of wide application in grapevine we followed a systematic re-sequencing approach in a group of 11 grape genotypes corresponding to ancient unrelated cultivars as well as wild plants. Using this approach, we have sequenced 230 gene fragments, which represents the analysis of over 1 Mb of grape DNA sequence. This analysis has allowed the discovery of 1573 SNPs with an average of one SNP every 64 bp (one SNP every 47 bp in non-coding regions and every 69 bp in coding regions). Nucleotide diversity in grape (π = 0.0051) was found to be similar to values observed in highly polymorphic plant species such as maize. The average number of haplotypes per gene sequence was estimated as six, with three haplotypes representing over 83% of the analyzed sequences. Short-range linkage disequilibrium (LD) studies within the analyzed sequences indicate the existence of a rapid decay of LD within the selected grapevine genotypes. To validate the use of the detected polymorphisms in genetic mapping, cultivar identification and genetic diversity studies we have used the SNPlex™ genotyping technology in a sample of grapevine genotypes and segregating progenies. Conclusion: These results provide accurate values for nucleotide diversity in coding sequences and a first estimate of short-range LD in grapevine. Using SNPlex™ genotyping we have shown the application of a set of discovered SNPs as molecular markers for cultivar identification, linkage mapping and genetic diversity studies. Thus, the combination of a highly efficient re-sequencing approach and the SNPlex™ high throughput genotyping technology provides a powerful tool for grapevine genetic analysis. PMID:18021442
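Nucleotide diversity (π) is commonly defined as the average number of pairwise nucleotide differences per site among a set of aligned sequences. The following is a minimal sketch under stated assumptions: the function name and toy alignment are hypothetical, sequences are assumed to be pre-aligned and gap-free, and this is not necessarily the estimator or software the authors used.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Average pairwise differences per site for equal-length sequences."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(aligned_seqs, 2))
    # Count mismatching columns for every pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in pairs
    )
    return total_differences / (len(pairs) * length)


# Toy alignment (hypothetical, not grapevine data).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
print(round(nucleotide_diversity(toy), 4))
```

For the toy alignment the three pairs differ at 1, 1 and 2 of 10 sites, so π = 4/30. A value such as the reported π = 0.0051 corresponds to roughly five differences per kilobase between two randomly chosen sequences.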
IVS-II-648/649 (-T) (HBB: c.316-202del) Triggers a Novel β-Thalassemia Phenotype.

PubMed

Azimi, Azam; Alibakhshi, Reza; Hayati, Hasibeh; Tahmasebi, Soosan; Alimoradi, Sasan

2017-01-01

Thalassemia is the most common inherited disorder in Iran. There are approximately 800 different genomic alterations of the β-globin gene described in the HbVar database. In this study, we identified a novel mutation in a 21-year-old woman [IVS-II-648/649 (-T); HBB: c.316-202del] and describe its clinical implications. Two other members of this family, all with hematological and clinical features associated with β-thalassemia (β-thal), also carried this mutation. The molecular diagnosis of the β-globin gene mutation was performed by direct sequencing. Based on the observed β-thal phenotype and in silico analysis results, we concluded that this novel β-globin gene mutation was associated with the mild phenotype of β-thal.
URPD: a specific product primer design tool

PubMed Central

2012-01-01

Background: Polymerase chain reaction (PCR) plays an important role in molecular biology. Primer design fundamentally determines its results. Here, we present currently available software that is not aimed at analyzing large sequences but offers a rather straightforward way of visualizing the primer design process for infrequent users. Findings: URPD (yoUR Primer Design), a web-based specific product primer design tool, combines the NCBI Reference Sequences (RefSeq), UCSC In-Silico PCR, memetic algorithm (MA) and genetic algorithm (GA) primer design methods to obtain specific primer sets. A friendly user interface is accomplished by built-in parameter settings. The incorporated smooth pipeline operations effectively guide both occasional and advanced users. URPD contains an automated process, which produces feasible primer pairs that satisfy the specific needs of the experimental design with practical PCR amplifications. Visual virtual gel electrophoresis and in silico PCR provide a simulated PCR environment. Comparing practical gel electrophoresis with virtual gel electrophoresis facilitates and verifies the PCR experiment. Wet-laboratory validation proved that the system provides feasible primers. Conclusions: URPD is a user-friendly tool that provides specific primer design results. The pipeline design path makes it easy to operate for beginners. URPD also provides a high throughput primer design function. Moreover, the advanced parameter settings assist sophisticated researchers in performing experiential PCR. Several novel functions, such as a nucleotide accession number template sequence input, local and global specificity estimation, primer pair redesign, user-interactive sequence scale selection, and virtual and practical PCR gel electrophoresis discrepancies have been developed and integrated into URPD. The URPD program is implemented in JAVA and freely available at http://bio.kuas.edu.tw/urpd/. PMID:22713312
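The idea behind in silico PCR can be illustrated with a deliberately simplified sketch: locate the forward primer on the template, locate the reverse complement of the reverse primer downstream of it, and report the sequence in between as the predicted product. This is not URPD's or UCSC In-Silico PCR's algorithm; the function names, the exact-match requirement and the toy template are all assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the predicted amplicon, or None if either primer
    has no exact match in the expected orientation."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the
    # template strand we search for its reverse complement.
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Toy template and primers (hypothetical).
template = "TTGACCATGGCTAGCTAGGATCCGATCGTTAACGGCATTT"
amplicon = in_silico_pcr(template, "ACCATGG", "GCCGTTA")
print(amplicon, len(amplicon))
```

The predicted product length is the quantity that a virtual gel would display; real tools additionally tolerate mismatches and check specificity against a whole genome or transcriptome.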
Characterization of Chinese Haemophilus parasuis Isolates by Traditional Serotyping and Molecular Serotyping Methods

PubMed Central

Ma, Lina; Wang, Liyan; Chu, Yuefeng; Li, Xuerui; Cui, Yujun; Chen, Shengli; Zhou, Jianhua; Li, Chunling; Lu, Zhongxin; Liu, Jixing; Liu, Yongsheng

2016-01-01

Haemophilus parasuis is classified mainly through serotyping, but traditional serotyping always yields non-typable (NT) strains and unreliable results via cross-reactions. Here, we surveyed the serotype prevalence of Chinese H. parasuis isolates using traditional serotyping (gel immuno-diffusion test, GID) and molecular serotyping (multiplex PCR, mPCR). We also investigated why discrepant results between these methods were obtained, and investigated mPCR failure through whole-genome sequencing. Of the 100 isolates tested, 73 (73%) and 93 (93%) were serotyped by the GID test and mPCR, respectively, with a concordance rate of 66% (66/100). Additionally, mPCR reduced the number of NT isolates from 27 (27%) for the GID testing, to seven (7%). Eleven isolates were sequenced, including nine serotype-discrepant isolates from mPCR and GID typing (excluding strains that were NT by GID only) and two NT isolates from both methods, and their in silico serotypes were obtained from genome sequencing based on their capsule loci. The mPCR results were supported by the in silico serotyping of the seven serotype-discrepant isolates. The discrepant results and NT isolates determined by mPCR were attributed to deletions and unknown sequences in the serotype-specific region of each capsule locus. Compared with previous investigations, this study found a similar predominant serotype profile, but a different prevalence frequency for H. parasuis, and the five most prevalent serotypes or strain groups were serotypes 5, 4, NT, 7 and 13 for mPCR, and serotypes 5, NT, 4, 7 and 13/10/14 for GID. Additionally, serotype 7 was recognized as a principal serotype in this work. PMID:28005999
URPD: a specific product primer design tool.

PubMed

Chuang, Li-Yeh; Cheng, Yu-Huei; Yang, Cheng-Hong

2012-06-19

Polymerase chain reaction (PCR) plays an important role in molecular biology. Primer design fundamentally determines its results. Here, we present currently available software that is not aimed at analyzing large sequences but offers a rather straightforward way of visualizing the primer design process for infrequent users. URPD (yoUR Primer Design), a web-based specific product primer design tool, combines the NCBI Reference Sequences (RefSeq), UCSC In-Silico PCR, memetic algorithm (MA) and genetic algorithm (GA) primer design methods to obtain specific primer sets. A friendly user interface is accomplished by built-in parameter settings. The incorporated smooth pipeline operations effectively guide both occasional and advanced users. URPD contains an automated process, which produces feasible primer pairs that satisfy the specific needs of the experimental design with practical PCR amplifications. Visual virtual gel electrophoresis and in silico PCR provide a simulated PCR environment. Comparing practical gel electrophoresis with virtual gel electrophoresis facilitates and verifies the PCR experiment. Wet-laboratory validation proved that the system provides feasible primers. URPD is a user-friendly tool that provides specific primer design results. The pipeline design path makes it easy to operate for beginners. URPD also provides a high throughput primer design function. Moreover, the advanced parameter settings assist sophisticated researchers in performing experiential PCR. Several novel functions, such as a nucleotide accession number template sequence input, local and global specificity estimation, primer pair redesign, user-interactive sequence scale selection, and virtual and practical PCR gel electrophoresis discrepancies have been developed and integrated into URPD. The URPD program is implemented in JAVA and freely available at http://bio.kuas.edu.tw/urpd/.
Challenging a bioinformatic tool’s ability to detect microbial contaminants using in silico whole genome sequencing data

PubMed Central

Zook, Justin M.; Morrow, Jayne B.; Lin, Nancy J.

2017-01-01

High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods. PMID:28924496
Inadequate Reference Datasets Biased toward Short Non-epitopes Confound B-cell Epitope Prediction

PubMed Central

Rahman, Kh. Shamsur; Chowdhury, Erfan Ullah; Sachse, Konrad; Kaltenboeck, Bernhard

2016-01-01

X-ray crystallography has shown that an antibody paratope typically binds 15–22 amino acids (aa) of an epitope, of which 2–5 randomly distributed amino acids contribute most of the binding energy. In contrast, researchers typically choose for B-cell epitope mapping short peptide antigens in antibody binding assays. Furthermore, short 6–11-aa epitopes, and in particular non-epitopes, are over-represented in published B-cell epitope datasets that are commonly used for development of B-cell epitope prediction approaches from protein antigen sequences. We hypothesized that such suboptimal length peptides result in weak antibody binding and cause false-negative results. We tested the influence of peptide antigen length on antibody binding by analyzing data on more than 900 peptides used for B-cell epitope mapping of immunodominant proteins of Chlamydia spp. We demonstrate that short 7–12-aa peptides of B-cell epitopes bind antibodies poorly; thus, epitope mapping with short peptide antigens falsely classifies many B-cell epitopes as non-epitopes. We also show in published datasets of confirmed epitopes and non-epitopes a direct correlation between length of peptide antigens and antibody binding. Elimination of short, ≤11-aa epitope/non-epitope sequences improved datasets for evaluation of in silico B-cell epitope prediction. Achieving up to 86% accuracy, protein disorder tendency is the best indicator of B-cell epitope regions for chlamydial and published datasets. For B-cell epitope prediction, the most effective approach is plotting disorder of protein sequences with the IUPred-L scale, followed by antibody reactivity testing of 16–30-aa peptides from peak regions. This strategy overcomes the well known inaccuracy of in silico B-cell epitope prediction from primary protein sequences. PMID:27189949

In Silico Evaluation of the Potential Impact of Bioanalytical Bias Difference between Two Therapeutic Protein Formulations for Pharmacokinetic Assessment in a Biocomparability Study.

PubMed

Thway, Theingi M; Macaraeg, Chris; Eschenberg, Michael; Ma, Mark

2015-05-01

Formulation changes at later stages of biotherapeutics development require biocomparability (BC) assessment. Using simulation, this study aims to determine the potential effect of bias difference observed between the two formulations after spiking into serum in passing or failing of a critical BC study. An ELISA method with 20% total error was used to assess any bias differences between a reference (RF) and test formulations (TF) in serum. During bioanalytical comparison of these formulations, a 9% difference in bias was observed between the two formulations in sera. To determine acceptable level of bias difference between the RF and TF bioanalytically, two in silico simulations were performed. The in silico analysis showed that the likelihood of the study meeting the BC criteria was >90% when the bias difference between RF and TF in serum was 9% and the number of subjects was ≥20 per treatment arm. An additional simulation showed that when the bias difference was increased to 13% and the number of subjects was <40, the likelihood of meeting the BC criteria decreased to 80%. The result from in silico analysis allowed the bioanalytical laboratory to proceed with sample analysis using a single calibrator and quality controls made from the reference formulation. This modeling approach can be applied to other BC studies with similar situations.
Comparative Genomics of Oral Isolates of Streptococcus mutans by in silico Genome Subtraction Does Not Reveal Accessory DNA Associated with Severe Early Childhood Caries

PubMed Central

Argimón, Silvia; Konganti, Kranti; Chen, Hao; Alekseyenko, Alexander V.; Brown, Stuart; Caufield, Page W.

2014-01-01

Comparative genomics is a popular method for the identification of microbial virulence determinants, especially since the sequencing of a large number of whole bacterial genomes from pathogenic and non-pathogenic strains has become relatively inexpensive. The bioinformatics pipelines for comparative genomics usually include gene prediction and annotation and can require significant computer power. To circumvent this, we developed a rapid method for genome-scale in silico subtractive hybridization, based on blastn and independent of feature identification and annotation. Whole genome comparisons by in silico genome subtraction were performed to identify genetic loci specific to Streptococcus mutans strains associated with severe early childhood caries (S-ECC), compared to strains isolated from caries-free (CF) children. The genome similarity of the 20 S. mutans strains included in this study, calculated by Simrank k-mer sharing, ranged from 79.5 to 90.9%, confirming this is a genetically heterogeneous group of strains. We identified strain-specific genetic elements in 19 strains, with sizes ranging from 200 bp to 39 kb. These elements contained protein-coding regions with functions mostly associated with mobile DNA. We did not, however, identify any genetic loci consistently associated with dental caries, i.e., shared by all the S-ECC strains and absent in the CF strains. Conversely, we did not identify any genetic loci specific to the healthy group. Comparison of previously published genomes from pathogenic and carriage strains of Neisseria meningitidis with our in silico genome subtraction yielded the same set of genes specific to the pathogenic strains, thus validating our method. Our results suggest that S. mutans strains derived from caries-active or caries-free dentitions cannot be differentiated based on the presence or absence of specific genetic elements.
Our in silico genome subtraction method is available as the Microbial Genome Comparison (MGC) tool, with a user-friendly JAVA graphical interface. PMID:24291226
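
The idea of subtracting one genome from another can be illustrated with a toy k-mer sketch. This is not the MGC tool, which the abstract describes as blastn-based, and the sharing score is not Simrank's formula. The function names, the value of k, and the score definition are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(a, b, k=4):
    """Hypothetical similarity score: shared k-mers over the smaller k-mer set."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def unique_kmers(query, references, k=4):
    """k-mers present in query but absent from every reference sequence."""
    ref = set().union(*(kmers(r, k) for r in references))
    return kmers(query, k) - ref
```

For real genomes a much larger k and an alignment-based comparison would be needed; the sketch only shows the set-difference logic.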
Toxicological evaluation in silico and in vivo of secondary metabolites of Cissampelos sympodialis in Mus musculus mice following inhalation.

PubMed

Alves, Mateus Feitosa; Ferreira, Larissa Adilis Maria Paiva; Gadelha, Francisco Allysson Assis Ferreira; Ferreira, Laércia Karla Diega Paiva; Felix, Mayara Barbalho; Scotti, Marcus Tullius; Scotti, Luciana; de Oliveira, Kardilândia Mendes; Dos Santos, Sócrates Golzio; Diniz, Margareth de Fátima Formiga Melo

2017-12-04

The ethanolic extract of the leaves of Cissampelos sympodialis showed great pharmacological potential, with inflammatory and immunomodulatory activities; however, it also showed some toxicological effects. Therefore, this study aims to verify the toxicological potential of alkaloids of the genus Cissampelos through in silico methodologies, to develop an LC-MS/MS method verifying the presence of alkaloids in the infusion, and to evaluate the toxicity of the infusion of the leaves of C. sympodialis when inhaled by Swiss mice. In silico results showed that alkaloid 93 presented high toxicological potential along with the products of its metabolism. LC-MS/MS results showed that the infusion of the leaves of this plant contained the alkaloids warifteine and methylwarifteine. Finally, the in vivo toxicological analysis of the C. sympodialis infusion, covering biochemistry, organ weights and histological analysis, showed that the infusion of C. sympodialis leaves presents low toxicity.
In silico analysis of expressed sequence tags from Trichostrongylus vitrinus (Nematoda): comparison of the automated ESTExplorer workflow platform with conventional database searches.

PubMed

Nagaraj, Shivashankar H; Gasser, Robin B; Nisbet, Alasdair J; Ranganathan, Shoba

2008-01-01

The analysis of expressed sequence tags (EST) offers a rapid and cost effective approach to elucidate the transcriptome of an organism, but requires several computational methods for assembly and annotation. Researchers frequently analyse each step manually, which is laborious and time consuming. We have recently developed ESTExplorer, a semi-automated computational workflow system, in order to achieve the rapid analysis of EST datasets. In this study, we evaluated EST data analysis for the parasitic nematode Trichostrongylus vitrinus (order Strongylida) using ESTExplorer, compared with database matching alone. We functionally annotated 1776 ESTs obtained via suppressive-subtractive hybridisation from T. vitrinus, an important parasitic trichostrongylid of small ruminants. Cluster and comparative genomic analyses of the transcripts using ESTExplorer indicated that 290 (41%) sequences had homologues in Caenorhabditis elegans, 329 (42%) in parasitic nematodes, 202 (28%) in organisms other than nematodes, and 218 (31%) had no significant match to any sequence in the current databases. Of the C. elegans homologues, 90 were associated with 'non-wildtype' double-stranded RNA interference (RNAi) phenotypes, including embryonic lethality, maternal sterility, sterile progeny, larval arrest and slow growth. We could functionally classify 267 (38%) sequences using the Gene Ontologies (GO) and establish pathway associations for 230 (33%) sequences using the Kyoto Encyclopedia of Genes and Genomes (KEGG). Further examination of this EST dataset revealed a number of signalling molecules, proteases, protease inhibitors, enzymes, ion channels and immune-related genes. In addition, we identified 40 putative secreted proteins that could represent potential candidates for developing novel anthelmintics or vaccines. We further compared the automated EST sequence annotations, using ESTExplorer, with database search results for individual T. vitrinus ESTs. 
ESTExplorer reliably and rapidly annotated 301 ESTs, with pathway and GO information, eliminating 60 low quality hits from database searches. We evaluated the efficacy of ESTExplorer in analysing EST data, and demonstrate that computational tools can be used to accelerate the process of gene discovery in EST sequencing projects. The present study has elucidated sets of relatively conserved and potentially novel genes for biological investigation, and the annotated EST set provides further insight into the molecular biology of T. vitrinus, towards the identification of novel drug targets.
In silico analysis of the fucosylation-associated genome of the human blood fluke Schistosoma mansoni: cloning and characterization of the fucosyltransferase multigene family.

PubMed

Peterson, Nathan A; Anderson, Tavis K; Yoshino, Timothy P

2013-01-01

Fucosylated glycans of the parasitic flatworm Schistosoma mansoni play key roles in its development and immunobiology. In the present study we used a genome-wide homology-based bioinformatics approach to search for genes that contribute to fucosylated glycan expression in S. mansoni, specifically the α2-, α3-, and α6-fucosyltransferases (FucTs), which transfer L-fucose from a GDP-L-fucose donor to an oligosaccharide acceptor. We identified and in silico characterized several novel schistosome FucT homologs, including six α3-FucTs and six α6-FucTs, as well as two protein O-FucTs that catalyze the unrelated transfer of L-fucose to serine and threonine residues of epidermal growth factor- and thrombospondin-type repeats. No α2-FucTs were observed. Primary sequence analyses identified key conserved FucT motifs as well as characteristic transmembrane domains, consistent with their putative roles as fucosyltransferases. Most genes exhibit alternative splicing, with multiple transcript variants generated. A phylogenetic analysis demonstrated that schistosome α3- and α6-FucTs form monophyletic clades within their respective gene families, suggesting multiple gene duplications following the separation of the schistosome lineage from the main evolutionary tree. Quantitative decreases in steady-state transcript levels of some FucTs during early larval development suggest a possible mechanism for differential expression of fucosylated glycans in schistosomes. This study systematically identifies the complete repertoire of FucT homologs in S. mansoni and provides fundamental information regarding their genomic organization, genetic variation, developmental expression, and evolutionary history.
Exploiting rice-sorghum synteny for targeted development of EST-SSRs to enrich the sorghum genetic linkage map.

PubMed

Ramu, P; Kassahun, B; Senthilvel, S; Ashok Kumar, C; Jayashree, B; Folkertsma, R T; Reddy, L Ananda; Kuruvinashetti, M S; Haussmann, B I G; Hash, C T

2009-11-01

The sequencing and detailed comparative functional analysis of genomes of a number of select botanical models open new doors into comparative genomics among the angiosperms, with potential benefits for improvement of many orphan crops that feed large populations. In this study, a set of simple sequence repeat (SSR) markers was developed by mining the expressed sequence tag (EST) database of sorghum. Among the SSR-containing sequences, only those sharing considerable homology with rice genomic sequences across the lengths of the 12 rice chromosomes were selected. Thus, 600 SSR-containing sorghum EST sequences (50 homologous sequences on each of the 12 rice chromosomes) were selected, with the intention of providing coverage for corresponding homologous regions of the sorghum genome. Primer pairs were designed and polymorphism detection ability was assessed using parental pairs of two existing sorghum mapping populations. About 28% of these new markers detected polymorphism in this 4-entry panel. A subset of 55 polymorphic EST-derived SSR markers were mapped onto the existing skeleton map of a recombinant inbred population derived from cross N13 x E 36-1, which is segregating for Striga resistance and the stay-green component of terminal drought tolerance. These new EST-derived SSR markers mapped across all 10 sorghum linkage groups, mostly to regions expected based on prior knowledge of rice-sorghum synteny. The ESTs from which these markers were derived were then mapped in silico onto the aligned sorghum genome sequence, and 88% of the best hits corresponded to linkage-based positions. This study demonstrates the utility of comparative genomic information in targeted development of markers to fill gaps in linkage maps of related crop species for which sufficient genomic tools are not available.
Automated design of genomic Southern blot probes

PubMed Central

2010-01-01

Background Southern blotting is a DNA analysis technique that has found widespread application in molecular biology. It has been used for gene discovery and mapping and has diagnostic and forensic applications, including mutation detection in patient samples and DNA fingerprinting in criminal investigations. Southern blotting has been employed as the definitive method for detecting transgene integration, and successful homologous recombination in gene targeting experiments. The technique employs a labeled DNA probe to detect a specific DNA sequence in a complex DNA sample that has been separated by restriction-digest and gel electrophoresis. Critically, for the technique to succeed, the probe must be unique to the target locus so as not to cross-hybridize to other endogenous DNA within the sample. Investigators routinely employ a manual approach to probe design. A genome browser is used to extract DNA sequence from the locus of interest, which is searched against the target genome using a BLAST-like tool. Ideally a single perfect match is obtained to the target, with little cross-reactivity caused by homologous DNA sequence present in the genome and/or repetitive and low-complexity elements in the candidate probe. This is a labor-intensive process often requiring several attempts to find a suitable probe for laboratory testing. Results We have written an informatic pipeline to automatically design genomic Southern blot probes that specifically attempts to optimize the resultant probe, employing a brute-force strategy of generating many candidate probes of acceptable length in the user-specified design window, searching all against the target genome, then scoring and ranking the candidates by uniqueness and repetitive DNA element content. Using these in silico measures we can automatically design probes that we predict to perform as well as, or better than, our previous manual designs, while considerably reducing design time.
We went on to experimentally validate a number of these automated designs by Southern blotting. The majority of probes we tested performed well confirming our in silico prediction methodology and the general usefulness of the software for automated genomic Southern probe design. Conclusions Software and supplementary information are freely available at: http://www.genes2cognition.org/software/southern_blot PMID:20113467
Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh].

PubMed

Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram P; Gupta, Deepak K; Singh, Sangeeta; Dogra, Vivek; Gaikwad, Kishor; Sharma, Tilak R; Raje, Ranjeet S; Bandhopadhya, Tapas K; Datta, Subhojit; Singh, Mahendra N; Bashasab, Fakrudin; Kulwal, Pawan; Wanjari, K B; K Varshney, Rajeev; Cook, Douglas R; Singh, Nagendra K

2011-01-20

Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥ 18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. 
We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea.
Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh]

PubMed Central

2011-01-01

Background Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. 
Conclusion We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea. PMID:21251263
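
The SSR-mining step described in this abstract can be sketched as a scan for perfect repeats. The regular expression and the function name are assumptions for illustration, not the authors' pipeline. The sketch keeps di- to hexanucleotide motifs, skips homopolymers, and applies the ≥18 bp length cut-off mentioned in the abstract. It does not handle compound repeats, which the study also excluded.

```python
import re

# A motif of 2-6 bases followed by at least two further copies of itself.
# The lazy quantifier prefers the shortest motif, so ATATAT... is reported as "AT".
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{2,}")


def find_ssrs(seq, min_length=18):
    """Return (start, motif, repeat_length) for perfect SSRs of at least min_length bp."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        length = match.end() - match.start()
        if len(set(motif)) == 1:
            continue  # homopolymer run such as AAAA..., excluded here
        if length >= min_length:
            hits.append((match.start(), motif, length))
    return hits
```

Primer design around each hit would be a separate step and is not covered here.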
Verification of Ribosomal Proteins of Aspergillus fumigatus for Use as Biomarkers in MALDI-TOF MS Identification.

PubMed

Nakamura, Sayaka; Sato, Hiroaki; Tanaka, Reiko; Yaguchi, Takashi

2016-01-01

We have previously proposed a rapid identification method for bacterial strains based on the profiles of their ribosomal subunit proteins (RSPs), observed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). This method can perform phylogenetic characterization based on the mass of housekeeping RSP biomarkers, ideally calculated from amino acid sequence information registered in public protein databases. With the aim of extending its field of application to medical mycology, this study investigates the actual state of information of RSPs of eukaryotic fungi registered in public protein databases through the characterization of ribosomal protein fractions extracted from genome-sequenced Aspergillus fumigatus strains Af293 and A1163 as a model. In this process, we have found that the public protein databases harbor problems. The RSP names are inconsistent, so we have provisionally unified them using the yeast naming system. The most serious problem is that many incorrect sequences are registered in the public protein databases. Surprisingly, more than half of the sequences are incorrect, due chiefly to mis-annotation of exon/intron structures. These errors could be corrected by a combination of in silico inspection by sequence homology analysis and MALDI-TOF MS measurements. We were also able to confirm conserved post-translational modifications in eleven RSPs. After these verifications, the masses of 31 expressed RSPs under 20,000 Da could be accurately confirmed. These RSPs have the potential to be useful biomarkers for identifying clinical isolates of A. fumigatus.
Identification, characterization and expression analysis of pigeonpea miRNAs in response to Fusarium wilt.

PubMed

Hussain, Khalid; Mungikar, Kanak; Kulkarni, Abhijeet; Kamble, Avinash

2018-05-05

Upon confrontation with unfavourable conditions, plants invoke a very complex set of biochemical and physiological reactions and alter gene expression patterns to combat the situations. MicroRNAs (miRNAs), a class of small non-coding RNA, contribute extensively to the regulation of gene expression through translation inhibition or degradation of their target mRNAs during such conditions. Therefore, identification of miRNAs and their targets holds importance in understanding the regulatory networks triggered during stress. Structure and sequence similarity based in silico prediction of miRNAs in Cajanus cajan L. (Pigeonpea) draft genome sequence has been carried out earlier. These annotations also appear in related GenBank genome sequence entries. However, there are no reports available on context dependent miRNA expression and their targets in pigeonpea. Therefore, in the present study we addressed these questions computationally, using pigeonpea EST sequence information. We identified five novel pigeonpea miRNA precursors, their mature forms and targets. Interestingly, only one of these miRNAs (miR169i-3p) was identified earlier in draft genome sequence. We then experimentally validated expression of these miRNAs. It was also observed that these miRNAs show differential expression patterns in response to Fusarium inoculation, indicating their biotic stress responsive nature. Overall these results will help towards better understanding the regulatory network of defense during pigeonpea-pathogen interactions and the role of miRNAs in the process. Copyright © 2018 Elsevier B.V. All rights reserved.
Sturgeon conservation genomics: SNP discovery and validation using RAD sequencing.

PubMed

Ogden, R; Gharbi, K; Mugue, N; Martinsohn, J; Senn, H; Davey, J W; Pourkazemi, M; McEwing, R; Eland, C; Vidotto, M; Sergeev, A; Congiu, L

2013-06-01

Caviar-producing sturgeons belonging to the genus Acipenser are considered to be one of the most endangered species groups in the world. Continued overfishing in spite of increasing legislation, zero catch quotas and extensive aquaculture production have led to the collapse of wild stocks across Europe and Asia. The evolutionary relationships among Adriatic, Russian, Persian and Siberian sturgeons are complex because of past introgression events and remain poorly understood. Conservation management, traceability and enforcement suffer a lack of appropriate DNA markers for the genetic identification of sturgeon at the species, population and individual level. This study employed RAD sequencing to discover and characterize single nucleotide polymorphism (SNP) DNA markers for use in sturgeon conservation in these four tetraploid species over three biological levels, using a single sequencing lane. Four population meta-samples and eight individual samples from one family were barcoded separately before sequencing. Analysis of 14.4 Gb of paired-end RAD data focused on the identification of SNPs in the paired-end contig, with subsequent in silico and empirical validation of candidate markers. Thousands of putatively informative markers were identified including, for the first time, SNPs that show population-wide differentiation between Russian and Persian sturgeons, representing an important advance in our ability to manage these cryptic species. The results highlight the challenges of genotyping-by-sequencing in polyploid taxa, while establishing the potential genetic resources for developing a new range of caviar traceability and enforcement tools. © 2013 John Wiley & Sons Ltd.
Identification of Two Novel Amalgaviruses in the Common Eelgrass (Zostera marina) and in Silico Analysis of the Amalgavirus +1 Programmed Ribosomal Frameshifting Sites.

PubMed

Park, Dongbin; Goh, Chul Jun; Kim, Hyein; Hahn, Yoonsoo

2018-04-01

The genome sequences of two novel monopartite RNA viruses were identified in a common eelgrass (Zostera marina) transcriptome dataset. Sequence comparison and phylogenetic analyses revealed that these two novel viruses belong to the genus Amalgavirus in the family Amalgaviridae. They were named Zostera marina amalgavirus 1 (ZmAV1) and Zostera marina amalgavirus 2 (ZmAV2). Genomes of both ZmAV1 and ZmAV2 contain two overlapping open reading frames (ORFs). ORF1 encodes a putative replication factory matrix-like protein, while ORF2 encodes a RNA-dependent RNA polymerase (RdRp) domain. The fusion protein (ORF1+2) of ORF1 and ORF2, which mediates RNA replication, was produced using the +1 programmed ribosomal frameshifting (PRF) mechanism. The +1 PRF motif sequence, UUU_CGN, which is highly conserved among known amalgaviruses, was also found in ZmAV1 and ZmAV2. Multiple sequence alignment of the ORF1+2 fusion proteins from 24 amalgaviruses revealed that +1 PRF occurred only at three different positions within the 13-amino acid-long segment, which was surrounded by highly conserved regions on both sides. This suggested that the +1 PRF may be constrained by the structure of fusion proteins. Genome sequences of ZmAV1 and ZmAV2, which are the first viruses to be identified in common eelgrass, will serve as useful resources for studying evolution and diversity of amalgaviruses.
Identification of Two Novel Amalgaviruses in the Common Eelgrass (Zostera marina) and in Silico Analysis of the Amalgavirus +1 Programmed Ribosomal Frameshifting Sites

PubMed Central

Park, Dongbin; Goh, Chul Jun; Kim, Hyein; Hahn, Yoonsoo

2018-01-01

The genome sequences of two novel monopartite RNA viruses were identified in a common eelgrass (Zostera marina) transcriptome dataset. Sequence comparison and phylogenetic analyses revealed that these two novel viruses belong to the genus Amalgavirus in the family Amalgaviridae. They were named Zostera marina amalgavirus 1 (ZmAV1) and Zostera marina amalgavirus 2 (ZmAV2). Genomes of both ZmAV1 and ZmAV2 contain two overlapping open reading frames (ORFs). ORF1 encodes a putative replication factory matrix-like protein, while ORF2 encodes a RNA-dependent RNA polymerase (RdRp) domain. The fusion protein (ORF1+2) of ORF1 and ORF2, which mediates RNA replication, was produced using the +1 programmed ribosomal frameshifting (PRF) mechanism. The +1 PRF motif sequence, UUU_CGN, which is highly conserved among known amalgaviruses, was also found in ZmAV1 and ZmAV2. Multiple sequence alignment of the ORF1+2 fusion proteins from 24 amalgaviruses revealed that +1 PRF occurred only at three different positions within the 13-amino acid-long segment, which was surrounded by highly conserved regions on both sides. This suggested that the +1 PRF may be constrained by the structure of fusion proteins. Genome sequences of ZmAV1 and ZmAV2, which are the first viruses to be identified in common eelgrass, will serve as useful resources for studying evolution and diversity of amalgaviruses. PMID:29628822
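
A scan for the UUU_CGN motif named in this abstract can be sketched as follows. The sketch assumes that the underscore marks a codon boundary in the ORF1 reading frame, so a hit is kept only when UUU starts a codon in the chosen frame. The function name and the `frame` parameter are hypothetical, and a motif hit alone does not show that frameshifting occurs at that site.

```python
import re

# Lookahead so that overlapping occurrences are all reported.
PRF_MOTIF = re.compile(r"(?=UUUCG[ACGU])")


def find_prf_candidates(rna, frame=0):
    """Return 0-based positions where UUU is an in-frame codon followed by CGN."""
    rna = rna.upper().replace("T", "U")  # accept cDNA input as well
    return [m.start() for m in PRF_MOTIF.finditer(rna)
            if (m.start() - frame) % 3 == 0]
```

Passing `frame=1` or `frame=2` checks the other two forward reading frames.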
Quadruplexes in 'Dicty': crystal structure of a four-quartet G-quadruplex formed by G-rich motif found in the Dictyostelium discoideum genome.

PubMed

Guédin, Aurore; Lin, Linda Yingqi; Armane, Samir; Lacroix, Laurent; Mergny, Jean-Louis; Thore, Stéphane; Yatsunyk, Liliya A

2018-06-01

Guanine-rich DNA has the potential to fold into non-canonical G-quadruplex (G4) structures. Analysis of the genome of the social amoeba Dictyostelium discoideum indicates a low number of sequences with G4-forming potential (249-1055). Therefore, D. discoideum is a perfect model organism to investigate the relationship between the presence of G4s and their biological functions. As a first step in this investigation, we crystallized the dGGGGGAGGGGTACAGGGGTACAGGGG sequence from the putative promoter region of two divergent genes in D. discoideum. According to the crystal structure, this sequence folds into a four-quartet intramolecular antiparallel G4 with two lateral loops and one diagonal loop. The G-quadruplex core is further stabilized by a G-C Watson-Crick base pair and an A-T-A triad and displays high thermal stability (Tm > 90°C at 100 mM KCl). Biophysical characterization of the native sequence and loop mutants suggests that the DNA adopts the same structure in solution and in crystalline form, and that loop interactions are important for the G4 stability but not for its folding. Four-tetrad G4 structures are sparse. Thus, our work advances understanding of the structural diversity of G-quadruplexes and yields coordinates for in silico drug screening programs and G4 predictive tools.
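
A pattern-based scan for sequences with G4-forming potential can be sketched as below. The rule used here is a common heuristic and not necessarily the one behind the counts in this abstract: four G-runs of at least `min_run` bases, separated by loops of 1 to `max_loop` bases. The function name and both default values are assumptions. Only the G-rich strand is scanned.

```python
import re


def find_g4_candidates(seq, min_run=4, max_loop=7):
    """Return (start, end) spans matching four G-runs separated by short loops."""
    pattern = re.compile(
        r"G{%d,}(?:[ACGT]{1,%d}G{%d,}){3}" % (min_run, max_loop, min_run)
    )
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]
```

With these defaults the crystallized sequence quoted in the abstract matches as a single span covering all 26 bases. To cover the complementary strand, the same scan could be run on the reverse complement.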
Identification and Analysis of Novel Amino-Acid Sequence Repeats in Bacillus anthracis str. Ames Proteome Using Computational Tools

PubMed Central

Hemalatha, G. R.; Rao, D. Satyanarayana; Guruprasad, L.

2007-01-01

We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region comprising less than 55-amino-acid residues that occur more than once in the protein sequence and sometimes present in tandem. A “domain” corresponds to a conserved region with greater than 55-amino-acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure. PMID:17538688
Sequence characterization of S100A8 gene reveals structural differences of protein and transcriptional factor binding sites in water buffalo and yak.

PubMed

Kathiravan, P; Goyal, S; Kataria, R S; Mishra, B P; Jayakumar, S; Joshi, B K

2011-01-01

The present study was undertaken to characterize the structure of S100A8 gene and its promoter in water buffalo and yak. Sequence data of 2.067 kb, 2.071 kb, and 2.052 kb with respect to complete S100A8 gene including 5' flanking region was generated in river buffalo, swamp buffalo, and yak, respectively. BLAST analysis of coding DNA sequences (CDS) of S100A8 gene revealed 95% homology of buffalo sequence with cattle, 85% with pig and horse, 83% with dog, 72-73% with murines, and around 79% with primates and humans. Phylogenetic analysis of predicted CDS revealed distinct clustering of murines, primates, and domestic animals with bovines and bubalines forming a subcluster among farm animals. In silico translation of predicted CDS revealed a sequence of 89 amino acids with 7 amino acid changes between cattle and buffalo and 2 changes between cattle and yak. The search for Pfam family revealed the N-terminal calcium binding domain and the noncanonical EF hand domain in the carboxy terminus, with more variations being observed in the N-terminal domain among different species. Two amino acid changes observed in carboxy terminal EF hand domain resulted in altered secondary structure of yak S100A8 protein. Analysis of S100A8 gene promoter revealed 14 putative motifs for transcriptional factor binding sites. Two putative motifs viz. C/EBP and v-Myb were found to be absent in swamp buffalo as compared to river buffalo and cattle. Differences in the structure of S100A8 protein and the transcriptional factor binding sites identified in the present study need to be analyzed further for their functional significance in yak and swamp buffalo respectively. Copyright © Taylor & Francis Group, LLC
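
The in silico translation and the counting of amino acid changes mentioned in this abstract can be sketched with the standard genetic code. The function names are hypothetical. The comparison assumes two already aligned protein sequences of equal length, which is a simplification of how such differences would normally be called.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding DNA sequence up to, and excluding, the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def count_substitutions(protein_a, protein_b):
    """Count positions that differ between two equal-length protein sequences."""
    if len(protein_a) != len(protein_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(protein_a, protein_b))
```

Applied to two translated CDS of the same length, `count_substitutions` gives the kind of per-species difference count reported above.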
Digital Gene Expression Analysis Based on De Novo Transcriptome Assembly Reveals New Genes Associated with Floral Organ Differentiation of the Orchid Plant Cymbidium ensifolium

PubMed Central

Yang, Fengxi; Zhu, Genfa

2015-01-01

Cymbidium ensifolium belongs to the genus Cymbidium of the orchid family. Owing to its spectacular flower morphology, C. ensifolium has considerable ecological and cultural value. However, limited genetic data is available for this non-model plant, and the molecular mechanism underlying floral organ identity is still poorly understood. In this study, we characterize the floral transcriptome of C. ensifolium and present, for the first time, extensive sequence and transcript abundance data of individual floral organs. After sequencing, over 10 Gb clean sequence data were generated and assembled into 111,892 unigenes with an average length of 932.03 base pairs, including 1,227 clusters and 110,665 singletons. Assembled sequences were annotated with gene descriptions, gene ontology, clusters of orthologous group terms, the Kyoto Encyclopedia of Genes and Genomes, and the plant transcription factor database. From these annotations, 131 flowering-associated unigenes, 61 CONSTANS-LIKE (COL) unigenes and 90 floral homeotic genes were identified. In addition, four digital gene expression libraries were constructed for the sepal, petal, labellum and gynostemium, and 1,058 genes corresponding to individual floral organ development were identified. Among them, eight MADS-box genes were further investigated by full-length cDNA sequence analysis and expression validation, which revealed two APETALA1/AGL9-like MADS-box genes preferentially expressed in the sepal and petal, two AGAMOUS-like genes particularly restricted to the gynostemium, and four DEF-like genes distinctively expressed in different floral organs. The spatial expression of these genes varied distinctly in different floral mutant corresponding to different floral morphogenesis, which validated the specialized roles of them in floral patterning and further supported the effectiveness of our in silico analysis. 
This dataset generated in our study provides new insights into the molecular mechanisms underlying floral patterning of Cymbidium and supports a valuable resource for molecular breeding of the orchid plant. PMID:26580566
Genome-Wide Identification and Transferability of Microsatellite Markers between Palmae Species

PubMed Central

Xiao, Yong; Xia, Wei; Ma, Jianwei; Mason, Annaliese S.; Fan, Haikuo; Shi, Peng; Lei, Xintao; Ma, Zilong; Peng, Ming

2016-01-01

The Palmae family contains 202 genera and approximately 2800 species. Except for Elaeis guineensis and Phoenix dactylifera, almost no genetic and genomic information is available for Palmae species. Therefore, this is an obstacle to the conservation and genetic assessment of Palmae species, especially those that are currently endangered. The study was performed to develop a large number of microsatellite markers which can be used for genetic analysis in different Palmae species. Based on the assembled genome of E. guineensis and P. dactylifera, a total of 814 383 and 371 629 microsatellites were identified. Among these microsatellites identified in E. guineensis, 734 509 primer pairs could be designed from the flanking sequences of these microsatellites. The majority (618 762) of these designed primer pairs had in silico products in the genome of E. guineensis. These 618 762 primer pairs were subsequently used to in silico amplify the genome of P. dactylifera. A total of 7 265 conserved microsatellites were identified between E. guineensis and P. dactylifera. One hundred and thirty-five primer pairs flanking the conserved SSRs were stochastically selected and validated to have high cross-genera transferability, varying from 16.7 to 93.3% with an average of 73.7%. These genome-wide conserved microsatellite markers will provide a useful tool for genetic assessment and conservation of different Palmae species in the future. PMID:27826307
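As a rough illustration of how microsatellites (SSRs) can be located in an assembled sequence, the sketch below scans for perfect tandem repeats with a regular expression. The motif lengths (2 to 6 bp) and minimum repeat counts are assumptions chosen for the example; the abstract does not give the thresholds or the software used, and find_microsatellites is a hypothetical name.

```python
import re


def find_microsatellites(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    if min_repeats is None:
        # Assumed thresholds: motif length -> minimum number of tandem copies.
        min_repeats = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}
    seq = seq.upper()
    hits = []
    for unit_len, min_n in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_n - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (e.g. "ATAT"), so each locus is reported once.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)
```

Primer design and in silico amplification would then operate on the sequences flanking each reported start position; those steps are outside the scope of this sketch.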
Analysis of consequences of non-synonymous SNP in feed conversion ratio associated TGF-β receptor type 3 gene in chicken.

PubMed

Rasal, Kiran D; Shah, Tejas M; Vaidya, Megha; Jakhesara, Subhash J; Joshi, Chaitanya G

2015-06-01

The recent advances in high throughput sequencing technology accelerate possible ways for the study of genome wide variation in several organisms and associated consequences. In the present study, mutations in TGFBR3 showing significant association with FCR trait in chicken during exome sequencing were further analyzed. Out of four SNPs, one nsSNP p.Val451Leu was found in the coding region of TGFBR3. In silico tools such as SnpSift and PANTHER predicted it as deleterious (0.04) and to be tolerated, respectively, while I-Mutant revealed that protein stability decreased. The TGFBR3 I-TASSER model has a C-score of 0.85, which was validated using PROCHECK. Based on MD simulation, mutant protein structure deviated from native with RMSD 0.08 Å due to change in the H-bonding distances of mutant residue. The docking of TGFBR3 with interacting TGFBR2 inferred that mutant required more global energy. Therefore, the present study will provide useful information about functional SNPs that have an impact on FCR traits.

Bioinformatics and genomic analysis of transposable elements in eukaryotic genomes.

PubMed

Janicki, Mateusz; Rooke, Rebecca; Yang, Guojun

2011-08-01

A major portion of most eukaryotic genomes are transposable elements (TEs). During evolution, TEs have introduced profound changes to genome size, structure, and function. As integral parts of genomes, the dynamic presence of TEs will continue to be a major force in reshaping genomes. Early computational analyses of TEs in genome sequences focused on filtering out "junk" sequences to facilitate gene annotation. When the high abundance and diversity of TEs in eukaryotic genomes were recognized, these early efforts transformed into the systematic genome-wide categorization and classification of TEs. The availability of genomic sequence data reversed the classical genetic approaches to discovering new TE families and superfamilies. Curated TE databases and their accurate annotation of genome sequences in turn facilitated the studies on TEs in a number of frontiers including: (1) TE-mediated changes of genome size and structure, (2) the influence of TEs on genome and gene functions, (3) TE regulation by host, (4) the evolution of TEs and their population dynamics, and (5) genomic scale studies of TE activity. Bioinformatics and genomic approaches have become an integral part of large-scale studies on TEs to extract information with pure in silico analyses or to assist wet lab experimental studies. The current revolution in genome sequencing technology facilitates further progress in the existing frontiers of research and emergence of new initiatives. The rapid generation of large-sequence datasets at record low costs on a routine basis is challenging the computing industry on storage capacity and manipulation speed and the bioinformatics community for improvement in algorithms and their implementations.
Integrating in silico prediction methods, molecular docking, and molecular dynamics simulation to predict the impact of ALK missense mutations in structural perspective.

PubMed

Doss, C George Priya; Chakraborty, Chiranjib; Chen, Luonan; Zhu, Hailong

2014-01-01

Over the past decade, advancements in next generation sequencing technology have placed personalized genomic medicine on the horizon. Classifying disease-causing mutations in complex diseases as pathogenic or neutral remains a major task and is even impossible in the structural context because the experiments are time consuming and expensive. Among the various disease-causing mutations, single nucleotide polymorphisms (SNPs) play a vital role in defining an individual's susceptibility to disease and drug response. Understanding the genotype-phenotype relationship through SNPs is the first and most important step in drug research and development. Detailed understanding of the effect of SNPs on patient drug response is a key factor in the establishment of personalized medicine. In this paper, we present a computational pipeline in anaplastic lymphoma kinase (ALK) for SNP-centred study by the application of in silico prediction methods, molecular docking, and molecular dynamics simulation approaches. Combining computational methods provides a way of understanding the impact of deleterious mutations in altering protein drug targets, eventually leading to variable patient drug response. We hope this rapid and cost effective pipeline will also serve as a bridge to connect clinicians and in silico resources in tailoring treatments to patients' specific genotypes.
In vitro and in silico studies of 3-hydroxy-3-methyl-glutaryl coenzyme A reductase inhibitory activity of the cowpea Gln-Asp-Phe peptide.

PubMed

Silva, Mariana Barros de Cerqueira E; Souza, Caio Alexandre da Cruz; Philadelpho, Biane Oliveira; Cunha, Mariana Mota Novais da; Batista, Fabiana Pacheco Reis; Silva, Jaff Ribeiro da; Druzian, Janice Izabel; Castilho, Marcelo Santos; Cilli, Eduardo Maffud; Ferreira, Ederlan S

2018-09-01

Previous studies have shown that cowpea protein positively interferes with cholesterol metabolism. In this study, we evaluated the ability of the fraction containing peptides of <3 kDa, as well as that of the Gln-Asp-Phe (QDF) peptide, derived from cowpea β-vignin protein, to inhibit HMG-CoA reductase activity. We established isolation and chromatography procedures to effectively obtain the protein with a purity above 95%. In silico predictions were performed to identify peptide sequences capable of interacting with HMG-CoA reductase. In vitro experiments showed that the fraction containing peptides of <3 kDa displayed inhibition of HMG-CoA reductase activity. The tripeptide QDF inhibits HMG-CoA reductase (IC50 = 12.8 μM) in a dose-dependent manner. Furthermore, in silico studies revealed the binding profile of the QDF peptide and hinted at the molecular interactions that are responsible for its activity. Therefore, this study shows, for the first time, a peptide from cowpea β-vignin protein that inhibits HMG-CoA reductase and the chemical modifications that should be investigated to evaluate its binding profile. Copyright © 2018 Elsevier Ltd. All rights reserved.
In silico design of context-responsive mammalian promoters with user-defined functionality

PubMed Central

Gibson, Suzanne J.; Hatton, Diane

2017-01-01

Comprehensive de novo-design of complex mammalian promoters is restricted by unpredictable combinatorial interactions between constituent transcription factor regulatory elements (TFREs). In this study, we show that modular binding sites that do not function cooperatively can be identified by analyzing host cell transcription factor expression profiles, and subsequently testing cognate TFRE activities in varying homotypic and heterotypic promoter architectures. TFREs that displayed position-insensitive, additive function within a specific expression context could be rationally combined together in silico to create promoters with highly predictable activities. As TFRE order and spacing did not affect the performance of these TFRE-combinations, compositions could be specifically arranged to preclude the formation of undesirable sequence features. This facilitated simple in silico-design of promoters with context-required, user-defined functionalities. To demonstrate this, we de novo-created promoters for biopharmaceutical production in CHO cells that exhibited precisely designed activity dynamics and long-term expression-stability, without causing observable retroactive effects on cellular performance. The design process described can be utilized for applications requiring context-responsive, customizable promoter function, particularly where co-expression of synthetic TFs is not suitable. Although the synthetic promoter structure utilized does not closely resemble native mammalian architectures, our findings also provide additional support for a flexible billboard model of promoter regulation. PMID:28977454
Genotypic analysis of Xylella fastidiosa isolates from different hosts using sequences homologous to the Xanthomonas rpf genes.

PubMed

Meinhardt, Lyndel W; Ribeiro, Milena P M A; Coletta-Filho, Helvécio D; Dumenyo, C Korsi; Tsai, Sui M; De M Bellato, Cláudia

2003-09-01

This is the first report of a genotypic analysis of the phytopathogenic bacteria Xylella fastidiosa (Xf) using differences within intra- and intergenic regions of pathogenic genes. Orthologous sequences from the genome of Xf were identified for genes involved in the regulation of pathogenicity factors (rpf) from Xanthomonas campestris pv. campestris (Xcc). While the rpf genes were conserved, the chromosomal region revealed differences in gene sizes and intergenic spacings and a major translocational event when compared to Xcc. Primers were designed to amplify three regions: the intragenic region of rpfA (2354 bp), the intergenic region between rpfA and rpfB (5772 bp), and the intergenic region between rpfC and rpfF (2314 bp). Amplicons were obtained for all three regions from 32 of the 33 Xf isolates tested from citrus, grape, coffee, plum, hibiscus and periwinkle. Three Xcc isolates from cruciferous plants only generated PCR products for the rpfC-F region. Cleaved amplified polymorphic sequences (CAPS) (Taq(alpha)I) revealed differential banding profiles for the rpfA-B and rpfC-F regions. Xylella isolates were separated into seven groups via rpfA-B, of which five contained only citrus, while the other two had citrus, grape and coffee, and citrus, coffee, plum and hibiscus isolates. rpfC-F separated the isolates into three host-related groups. Citrus, coffee and hibiscus isolates formed one group, while the other two groups were comprised solely of grape and plum isolates. Xcc isolates formed an out-group. In silico analysis supports these results, which reveal the potential of the rpf genes for genotypic analysis of Xylella fastidiosa.
In silico mining and PCR-based approaches to transcription factor discovery in non-model plants: gene discovery of the WRKY transcription factors in conifers.

PubMed

Liu, Jun-Jun; Xiang, Yu

2011-01-01

WRKY transcription factors are key regulators of numerous biological processes in plant growth and development, as well as plant responses to abiotic and biotic stresses. Research on biological functions of plant WRKY genes has focused in the past on model plant species or species with largely characterized transcriptomes. However, a variety of non-model plants, such as forest conifers, are essential as feed, biofuel, and wood or for sustainable ecosystems. Identification of WRKY genes in these non-model plants is equally important for understanding the evolutionary and function-adaptive processes of this transcription factor family. Because of limited genomic information, the rarity of regulatory gene mRNAs in transcriptomes, and the sequence divergence to model organism genes, identification of transcription factors in non-model plants using methods similar to those generally used for model plants is difficult. This chapter describes a gene family discovery strategy for identification of WRKY transcription factors in conifers by a combination of in silico-based prediction and PCR-based experimental approaches. Compared to traditional cDNA library screening or EST sequencing at transcriptome scales, this integrated gene discovery strategy provides fast, simple, reliable, and specific methods to unveil the WRKY gene family at both genome and transcriptome levels in non-model plants.
Impact of novel miR-145-3p regulatory networks on survival in patients with castration-resistant prostate cancer.

PubMed

Goto, Yusuke; Kurozumi, Akira; Arai, Takayuki; Nohata, Nijiro; Kojima, Satoko; Okato, Atsushi; Kato, Mayuko; Yamazaki, Kazuto; Ishida, Yasuo; Naya, Yukio; Ichikawa, Tomohiko; Seki, Naohiko

2017-07-25

Despite recent advancements, metastatic castration-resistant prostate cancer (CRPC) is not considered curative. Novel approaches for identification of therapeutic targets of CRPC are needed. Next-generation sequencing revealed 945-1248 miRNAs from each lethal mCRPC sample. We constructed miRNA expression signatures of CRPC by comparing the expression of miRNAs between CRPC and normal prostate tissue or hormone-sensitive prostate cancer (HSPC). Genome-wide gene expression studies and in silico analyses were carried out to predict miRNA regulation and investigate the functional significance and clinical utility of the novel oncogenic pathways regulated by these miRNAs in prostate cancer (PCa). Based on the novel miRNA expression signature of CRPC, miR-145-5p and miR-145-3p were downregulated in CRPC. By focusing on miR-145-3p, which is a passenger strand and has not been well studied in previous reports, we showed that miR-145-3p targeted 4 key molecules, i.e., MELK, NCAPG, BUB1, and CDK1, in CRPC. These 4 genes significantly predicted survival in patients with PCa. Small RNA sequencing for lethal CRPC and in silico analyses provided novel therapeutic targets for CRPC.
Diversity and Antibiotic Susceptibility of Acinetobacter Strains From Milk Powder Produced in Germany

PubMed Central

Cho, Gyu-Sung; Li, Bo; Rostalsky, André; Fiedler, Gregor; Rösch, Niels; Igbinosa, Etinosa; Kabisch, Jan; Bockelmann, Wilhelm; Hammer, Philipp; Huys, Geert; Franz, Charles M. A. P.

2018-01-01

Forty-seven Acinetobacter spp. isolates from milk powder obtained from a powdered milk producer in Germany were investigated for their antibiotic susceptibilities, in order to assess whether strains from food harbor multiple antibiotic resistances and whether the food route is important for dissemination of resistance genes. The strains were identified by 16S rRNA and rpoB gene sequencing, as well as by whole genome sequencing of selected isolates and their in silico DNA-DNA hybridization (DDH). Furthermore, they were genotyped by rep-PCR together with reference strains of pan-European groups I, II, and III strains of Acinetobacter baumannii. Of the 47 strains, 42 were identified as A. baumannii, 4 as Acinetobacter pittii, and 1 as Acinetobacter calcoaceticus based on 16S rRNA gene sequencing. In silico DDH with the genome sequence data of selected strains and rpoB gene sequencing data suggested that the five non-A. baumannii strains all belonged to A. pittii, suggesting that the rpoB gene is more reliable than the 16S rRNA gene for species level identification in this genus. Rep-PCR genotyping of the A. baumannii strains showed that these could be grouped into four groups, and that some strains clustered together with reference strains of pan-European clinical group II and III strains. All strains in this study were intrinsically resistant toward chloramphenicol and oxacillin, but susceptible toward tetracycline, tobramycin, erythromycin, and ciprofloxacin. For cefotaxime, 43 strains (91.5%) were intermediate and 3 strains (6.4%) resistant, while 3 (6.4%) and 21 (44.7%) strains exhibited resistance to cefepime and streptomycin, respectively. Forty-six (97.9%) strains were susceptible to amikacin and ampicillin-sulbactam. 
Therefore, the strains in this study were generally not resistant to the clinically relevant antibiotics, especially tobramycin, ciprofloxacin, cefepime, and meropenem, suggesting that the food route probably poses only a low risk for multidrug resistant Acinetobacter strains or resistance genes. PMID:29636733
Comparative genomics of grass EST libraries reveals previously uncharacterized splicing events in crop plants.

PubMed

Chuang, Trees-Juen; Yang, Min-Yu; Lin, Chuang-Chieh; Hsieh, Ping-Hung; Hung, Li-Yuan

2015-02-05

Crop plants such as rice, maize and sorghum play economically-important roles as main sources of food, fuel, and animal feed. However, current genome annotations of crop plants still suffer false-positive predictions; a more comprehensive registry of alternative splicing (AS) events is also in demand. Comparative genomics of crop plants is largely unexplored. We performed a large-scale comparative analysis (ExonFinder) of the expressed sequence tag (EST) library from nine grass plants against three crop genomes (rice, maize, and sorghum) and identified 2,879 previously-unannotated exons (i.e., novel exons) in the three crops. We validated 81% of the tested exons by RT-PCR-sequencing, supporting the effectiveness of our in silico strategy. Evolutionary analysis reveals that the novel exons, comparing with their flanking annotated ones, are generally under weaker selection pressure at the protein level, but under stronger pressure at the RNA level, suggesting that most of the novel exons also represent novel alternatively spliced variants (ASVs). However, we also observed the consistency of evolutionary rates between certain novel exons and their flanking exons, which provided further evidence of their co-occurrence in the transcripts, suggesting that previously-annotated isoforms might be subject to erroneous predictions. Our validation showed that 54% of the tested genes expressed the newly-identified isoforms that contained the novel exons, rather than the previously-annotated isoforms that excluded them. The consistent results were steadily observed across cultivated (Oryza sativa and O. glaberrima) and wild (O. rufipogon and O. nivara) rice species, asserting the necessity of our curation of the crop genome annotations. Our comparative analyses also inferred the common ancestral transcriptome of grass plants and gain- and loss-of-ASV events. 
We have reannotated the rice, maize, and sorghum genomes, and showed that evolutionary rates might serve as an indicator for determining whether the identified exons were alternatively spliced. This study not only presents an effective in silico strategy for the improvement of plant annotations, but also provides further insights into the role of AS events in the evolution and domestication of crop plants. ExonFinder and the novel exons/ASVs identified are publicly accessible at http://exonfinder.sourceforge.net/ .
Systems-level effects of ectopic galectin-7 reconstitution in cervical cancer and its microenvironment.

PubMed

Higareda-Almaraz, Juan Carlos; Ruiz-Moreno, Juan S; Klimentova, Jana; Barbieri, Daniela; Salvador-Gallego, Raquel; Ly, Regina; Valtierra-Gutierrez, Ilse A; Dinsart, Christiane; Rabinovich, Gabriel A; Stulik, Jiri; Rösl, Frank; Rincon-Orozco, Bladimiro

2016-08-24

Galectin-7 (Gal-7) is negatively regulated in cervical cancer, and appears to be a link between the apoptotic response triggered by cancer and the anti-tumoral activity of the immune system. Our understanding of how cervical cancer cells and their molecular networks adapt in response to the expression of Gal-7 remains limited. Meta-analysis of Gal-7 expression was conducted in three cervical cancer cohort studies and TCGA. In silico prediction and bisulfite sequencing were performed to inquire epigenetic alterations. To study the effect of Gal-7 on cervical cancer, we ectopically re-expressed it in the HeLa and SiHa cervical cancer cell lines, and analyzed their transcriptome and SILAC-based proteome. We also examined the tumor and microenvironment host cell transcriptomes after xenotransplantation into immunocompromised mice. Differences between samples were assessed with the Kruskal-Wallis, Dunn's Multiple Comparison and T tests. Kaplan-Meier and log-rank tests were used to determine overall survival. Gal-7 was constantly downregulated in our meta-analysis (p < 0.0001). Tumors with combined high Gal-7 and low galectin-1 expression (p = 0.0001) presented significantly better prognoses (p = 0.005). In silico and bisulfite sequencing assays showed de novo methylation in the Gal-7 promoter and first intron. Cells re-expressing Gal-7 showed a high apoptosis ratio (p < 0.05) and their xenografts displayed strong growth retardation (p < 0.001). Multiple gene modules and transcriptional regulators were modulated in response to Gal-7 reconstitution, both in cervical cancer cells and their microenvironments (FDR < 0.05 %). Most of these genes and modules were associated with tissue morphogenesis, metabolism, transport, chemokine activity, and immune response. These functional modules could exert the same effects in vitro and in vivo, even despite different compositions between HeLa and SiHa samples. 
Gal-7 re-expression affects the regulation of molecular networks in cervical cancer that are involved in diverse cancer hallmarks, such as metabolism, growth control, invasion and evasion of apoptosis. The effect of Gal-7 extends to the microenvironment, where networks involved in its configuration and in immune surveillance are particularly affected.
Transcriptome-wide identification of Rauvolfia serpentina microRNAs and prediction of their potential targets.

PubMed

Prakash, Pravin; Rajakani, Raja; Gupta, Vikrant

2016-04-01

MicroRNAs (miRNAs) are small non-coding RNAs of ∼ 19-24 nucleotides (nt) in length and considered as potent regulators of gene expression at transcriptional and post-transcriptional levels. Here we report the identification and characterization of 15 conserved miRNAs belonging to 13 families from Rauvolfia serpentina through in silico analysis of available nucleotide dataset. The identified mature R. serpentina miRNAs (rse-miRNAs) ranged between 20 and 22nt in length, and the average minimal folding free energy index (MFEI) value of rse-miRNA precursor sequences was found to be -0.815 kcal/mol. Using the identified rse-miRNAs as query, their potential targets were predicted in R. serpentina and other plant species. Gene Ontology (GO) annotation showed that predicted targets of rse-miRNAs include transcription factors as well as genes involved in diverse biological processes such as primary and secondary metabolism, stress response, disease resistance, growth, and development. Few rse-miRNAs were predicted to target genes of pharmaceutically important secondary metabolic pathways such as alkaloids and anthocyanin biosynthesis. Phylogenetic analysis showed the evolutionary relationship of rse-miRNAs and their precursor sequences to homologous pre-miRNA sequences from other plant species. The findings of the present study, besides giving first-hand information about R. serpentina miRNAs and their targets, also contribute towards a better understanding of miRNA-mediated gene regulatory processes in plants. Copyright © 2015 Elsevier Ltd. All rights reserved.
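The abstract above reports an average MFEI for the precursor sequences without giving the formula. The sketch below assumes the definition commonly used in plant miRNA studies: the minimal folding free energy (MFE) is normalized to 100 nt (the adjusted MFE, AMFE) and then divided by the G+C percentage. The MFE itself is taken as an input that would come from an RNA folding program; the function names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def mfei(mfe_kcal_per_mol, seq):
    """Minimal folding free energy index, under the assumed definition.

    AMFE = MFE / length * 100
    MFEI = AMFE / (G+C)%
    """
    gc = gc_percent(seq)
    if gc == 0:
        raise ValueError("MFEI is undefined for a sequence with no G or C")
    amfe = 100.0 * mfe_kcal_per_mol / len(seq)
    return amfe / gc
```

For example, a 100-nt precursor with 40% G+C and an MFE of -40 kcal/mol has an AMFE of -40 and an MFEI of -1.0 under this definition.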
Analysis of large 16S rRNA Illumina data sets: Impact of singleton read filtering on microbial community description.

PubMed

Auer, Lucas; Mariadassou, Mahendra; O'Donohue, Michael; Klopp, Christophe; Hernandez-Raquet, Guillermina

2017-11-01

Next-generation sequencing technologies give access to large sets of data, which are extremely useful in the study of microbial diversity based on 16S rRNA gene. However, the production of such large data sets is not only marred by technical biases and sequencing noise but also increases computation time and disc space use. To improve the accuracy of OTU predictions and overcome both computations, storage and noise issues, recent studies and tools suggested removing all single reads and low abundant OTUs, considering them as noise. Although the effect of applying an OTU abundance threshold on α- and β-diversity has been well documented, the consequences of removing single reads have been poorly studied. Here, we test the effect of singleton read filtering (SRF) on microbial community composition using in silico simulated data sets as well as sequencing data from synthetic and real communities displaying different levels of diversity and abundance profiles. Scalability to large data sets is also assessed using a complete MiSeq run. We show that SRF drastically reduces the chimera content and computational time, enabling the analysis of a complete MiSeq run in just a few minutes. Moreover, SRF accurately determines the actual community diversity: the differences in α- and β-community diversity obtained with SRF and standard procedures are much smaller than the intrinsic variability of technical and biological replicates. © 2017 John Wiley & Sons Ltd.
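The core idea of singleton read filtering (SRF) described above can be sketched in a few lines of Python: reads are dereplicated by exact sequence, and any sequence observed only once in the data set is discarded before OTU clustering. This is an illustrative simplification, not the authors' pipeline; singleton_read_filter is a hypothetical name, and exact-match dereplication on already quality-filtered reads is an assumption.

```python
from collections import Counter


def singleton_read_filter(reads):
    """Dereplicate reads and drop sequences that occur exactly once.

    Returns a dict mapping each retained unique sequence to its abundance.
    """
    counts = Counter(read.upper() for read in reads)
    return {seq: n for seq, n in counts.items() if n > 1}
```

Because far fewer unique sequences remain after this step, later stages such as chimera detection and clustering have less data to process, which is consistent with the reduction in computation time reported in the abstract.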
High-resolution melting genotyping of Enterococcus faecium based on multilocus sequence typing derived single nucleotide polymorphisms.

PubMed

Tong, Steven Y C; Xie, Shirley; Richardson, Leisha J; Ballard, Susan A; Dakh, Farshid; Grabsch, Elizabeth A; Grayson, M Lindsay; Howden, Benjamin P; Johnson, Paul D R; Giffard, Philip M

2011-01-01

We have developed a single nucleotide polymorphism (SNP) nucleated high-resolution melting (HRM) technique to genotype Enterococcus faecium. Eight SNPs were derived from the E. faecium multilocus sequence typing (MLST) database and amplified fragments containing these SNPs were interrogated by HRM. We tested the HRM genotyping scheme on 85 E. faecium bloodstream isolates and compared the results with MLST, pulsed-field gel electrophoresis (PFGE) and an allele specific real-time PCR (AS kinetic PCR) SNP typing method. In silico analysis based on predicted HRM curves according to the G+C content of each fragment for all 567 sequence types (STs) in the MLST database together with empiric data from the 85 isolates demonstrated that HRM analysis resolves E. faecium into 231 "melting types" (MelTs) and provides a Simpson's Index of Diversity (D) of 0.991 with respect to MLST. This is a significant improvement on the AS kinetic PCR SNP typing scheme that resolves 61 SNP types with D of 0.95. The MelTs were concordant with the known ST of the isolates. For the 85 isolates, there were 13 PFGE patterns, 17 STs, 14 MelTs and eight SNP types. There was excellent concordance between PFGE, MLST and MelTs with Adjusted Rand Indices of PFGE to MelT 0.936 and ST to MelT 0.973. In conclusion, this HRM based method appears rapid and reproducible. The results are concordant with MLST and the MLST based population structure.
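The discriminatory power of a typing scheme, reported above as Simpson's Index of Diversity (D), can be computed directly from the list of types assigned to isolates. The sketch below assumes the Hunter-Gaston formulation that is widely used for typing methods, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N is the total; the abstract does not state which variant was applied, and the function name is hypothetical.

```python
from collections import Counter


def simpsons_index_of_diversity(type_labels):
    """Simpson's Index of Diversity, assumed Hunter-Gaston formulation.

    type_labels: one type assignment (e.g. a melting type) per isolate.
    """
    n_total = len(type_labels)
    if n_total < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(type_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))
```

A value near 1 means two randomly chosen isolates almost always fall into different types, while 0 means every isolate has the same type.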
MitoRes: a resource of nuclear-encoded mitochondrial genes and their products in Metazoa.

PubMed

Catalano, Domenico; Licciulli, Flavio; Turi, Antonio; Grillo, Giorgio; Saccone, Cecilia; D'Elia, Domenica

2006-01-24

Mitochondria are sub-cellular organelles that have a central role in energy production and in other metabolic pathways of all eukaryotic respiring cells. In the last few years, with more and more genomes being sequenced, a huge amount of data has been generated, providing an unprecedented opportunity to use the comparative analysis approach in studies of evolution and functional genomics with the aim of shedding light on molecular mechanisms regulating mitochondrial biogenesis and metabolism. In this context, the problem of the optimal extraction of representative datasets of genomic and proteomic data assumes a crucial importance. Specialised resources for nuclear-encoded mitochondria-related proteins already exist; however, no mitochondrial database is currently available with the same features as MitoRes, which is an update of the MitoNuc database extensively modified in its structure, data sources and graphical interface. It contains data on nuclear-encoded mitochondria-related products for any metazoan species for which this type of data is available and also provides comprehensive sequence datasets (gene, transcript and protein) as well as useful tools for their extraction and export. MitoRes (http://www2.ba.itb.cnr.it/MitoRes/) consolidates information from external public sources and automatically annotates it into a relational database. It also clusters proteins on the basis of their sequence similarity and interconnects them with genomic data. The search engine and sequence management tools allow the query/retrieval of the database content and the extraction and export of sequences (gene, transcript, protein) and related sub-sequences (intron, exon, UTR, CDS, signal peptide and gene flanking regions) ready to be used for in silico analysis. The tool we describe here has been developed to support lab scientists and bioinformaticians alike in the characterization of molecular features and evolution of mitochondrial targeting sequences. The way it provides for the retrieval and extraction of sequences allows the user to overcome the obstacles encountered in the integrative use of different bioinformatic resources, and the completeness of the sequence collection allows intra- and interspecies comparison at different biological levels (gene, transcript and protein).
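The kind of sub-sequence extraction described in this record (exons, introns and the spliced transcript cut out of a gene sequence) can be illustrated with a minimal Python sketch. The function name and the coordinate convention (0-based, half-open, sorted exon intervals) are assumptions made for illustration; this is not the MitoRes interface.

```python
def extract_features(gene_seq, exons):
    """Split a gene sequence into exons, introns and the spliced transcript.

    exons: list of (start, end) tuples, 0-based, half-open, sorted and
    non-overlapping (a hypothetical convention chosen for this sketch).
    """
    exon_seqs = [gene_seq[start:end] for start, end in exons]
    # An intron is the stretch between the end of one exon and the start of the next.
    intron_seqs = [
        gene_seq[exons[i][1]:exons[i + 1][0]] for i in range(len(exons) - 1)
    ]
    return {
        "exons": exon_seqs,
        "introns": intron_seqs,
        "transcript": "".join(exon_seqs),
    }


features = extract_features("AAACCCGGGTTT", [(0, 3), (6, 9)])
print(features["transcript"])
```

Returning a dictionary keeps the three feature types together, so an export step could write each of them to its own file.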
Acinetobacter lactucae sp. nov., isolated from iceberg lettuce (Asteraceae: Lactuca sativa).

PubMed

Rooney, Alejandro P; Dunlap, Christopher A; Flor-Weiler, Lina B

2016-09-01

Strain NRRL B-41902T and three closely related strains were isolated from iceberg lettuce. The strain was found to consist of strictly aerobic, Gram-stain-negative rods that formed cocci in late stationary phase. 16S rRNA gene sequence analysis showed that strain NRRL B-41902T was most closely related to species within the genus Acinetobacter, and that a grouping of it and the three other closely related strains was most closely related to the type strain of Acinetobacter pittii, which was also confirmed through a phylogenomic analysis. Moreover, in silico DNA-DNA hybridization analysis revealed a substantial amount of genomic divergence (39.1 %) between strain NRRL B-41902T and the type strain of A. pittii, which is expected if the strains represent distinct species. Further phenotypic analysis revealed that strain NRRL B-41902T was able to utilize a combination of l-serine, citraconic acid and citramalic acid, which differentiated it from other, closely related Acinetobacter species. Therefore, strain NRRL B-41902T (=CCUG 68785T) is proposed as the type strain of a novel species, Acinetobacter lactucae sp. nov.
Safety assessment and functional properties of four enterococci strains isolated from regional Argentinean cheese.

PubMed

Martino, Gabriela P; Espariz, Martín; Gallina Nizo, Gabriel; Esteban, Luis; Blancato, Víctor S; Magni, Christian

2018-07-20

The members of the Enterococcus genus are widely distributed in nature. Its strains have been extensively reported to be present in plant surfaces, soil, water and food. In an attempt to assess their potential application in food industry, four Enterococcus faecium group-strains recently isolated from Argentinean regional cheese products were evaluated using a combination of whole genome analyses and in vivo assays. In order to identify these microorganisms at species level, in silico analyses using their newly reported sequences were conducted. The average nucleotide identity (ANI), in silico DNA-DNA hybridization, and phylogenomic trees constructed using core genome data allowed IQ110, GM70 and GM75 strains to be classified as E. faecium while IQ23 strain was identified as E. durans. Besides their common origin, the strains showed differences in their genetic structure and mobile genetic element content. Furthermore, it was possible to determine the absence or presence of specific features related to growth in milk, cheese ripening, probiotic capability and gut adaptation including sugar, amino acid, and peptides utilization, flavor compound production, bile salt tolerance as well as biogenic amine production. Remarkably, all strains encoded for peptide permeases, maltose utilization, bile salt tolerance, diacetyl and tyramine production genes. On the other hand, some variability was observed regarding citrate and lactose utilization, esterase, and cell wall-associated proteinase. In addition, while strains were predicted to be non-human pathogens by the in silico inspection of pathogenicity and virulence factors, only the GM70 strain proved to be non-virulent in Galleria mellonella model. 
In conclusion, we propose that, in order to improve the rational selection of strains for industrial applications, a holistic approach involving a comparative genomic analysis of positive and negative features as well as in vivo evaluation of virulence behavior should be performed. Copyright © 2018 Elsevier B.V. All rights reserved.
Comparative sequence analysis of the X-inactivation center region in mouse, human, and bovine.

PubMed

Chureau, Corinne; Prissette, Marine; Bourdet, Agnès; Barbe, Valérie; Cattolico, Laurence; Jones, Louis; Eggen, André; Avner, Philip; Duret, Laurent

2002-06-01

We have sequenced to high levels of accuracy 714-kb and 233-kb regions of the mouse and bovine X-inactivation centers (Xic), respectively, centered on the Xist gene. This has provided the basis for a fully annotated comparative analysis of the mouse Xic with the 2.3-Mb orthologous region in human and has allowed a three-way species comparison of the core central region, including the Xist gene. These comparisons have revealed conserved genes, both coding and noncoding, conserved CpG islands and, more surprisingly, conserved pseudogenes. The distribution of repeated elements, especially LINE repeats, in the mouse Xic region when compared to the rest of the genome does not support the hypothesis of a role for these repeat elements in the spreading of X inactivation. Interestingly, an asymmetric distribution of LINE elements on the two DNA strands was observed in the three species, not only within introns but also in intergenic regions. This feature is suggestive of important transcriptional activity within these intergenic regions. In silico prediction followed by experimental analysis has allowed four new genes, Cnbp2, Ftx, Jpx, and Ppnx, to be identified and novel, widespread, complex, and apparently noncoding transcriptional activity to be characterized in a region 5' of Xist that was recently shown to attract histone modification early after the onset of X inactivation.
Comparative genome analysis of Alkhumra hemorrhagic fever virus with Kyasanur forest disease and tick-borne encephalitis viruses by the in silico approach.

PubMed

Palanisamy, Navaneethan; Akaberi, Dario; Lennerstrand, Johan; Lundkvist, Åke

2018-05-10

Alkhumra hemorrhagic fever virus (AHFV), a relatively new member of the Flaviviruses, was discovered in Saudi Arabia 23 years ago. AHFV is classified in the tick-borne encephalitis virus serocomplex, along with the Kyasanur forest disease virus (KFDV) and tick-borne encephalitis virus (TBEV). Currently, very little is known about the pathologies of AHFV. In this study, using the available genome information of AHFV, KFDV and TBEV, we have predicted and compared the following aspects of these viruses: evolution, nucleotide and protein compositions, recombination, codon frequency, substitution rate, N- and O-glycosylation sites, signal peptide and cleavage site, transmembrane region, secondary structure of 5' and 3' UTRs and RNA-RNA interactions. Additionally, we have modeled the 3D protease and RNA-dependent RNA polymerase structures for AHFV, KFDV and TBEV. Recombination analysis showed no evidence of recombination in the AHFV genome with that of either KFDV or TBEV, although single break point analysis showed that nucleotide position 7399 (in the NS4B) is a breakpoint location. AHFV, KFDV and TBEV are very similar in terms of codon frequency, the number of transmembrane regions, properties of the polyprotein, RNA-RNA interaction sequences, NS3 protease and NS5 polymerase structures and 5' UTR structure. Using genome sequences, we showed the similarities between these closely related viruses in several different areas.
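Codon frequency, one of the properties compared in this record, is straightforward to compute from a coding sequence. The sketch below is a generic illustration in Python, not the pipeline the authors used; the function name and the choice to ignore a trailing partial codon are assumptions.

```python
from collections import Counter


def codon_frequencies(cds):
    """Return the relative frequency of each codon in a coding sequence.

    RNA input is accepted (U is converted to T). A trailing partial codon
    is ignored, which is an assumption made for this sketch.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


print(codon_frequencies("ATGGCTGCTTAA"))
```

Computing the same table for each genome and comparing the resulting dictionaries gives a simple view of how similar the codon usage of related viruses is.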
The novel ER membrane protein PRO41 is essential for sexual development in the filamentous fungus Sordaria macrospora.

PubMed

Nowrousian, Minou; Frank, Sandra; Koers, Sandra; Strauch, Peter; Weitner, Thomas; Ringelberg, Carol; Dunlap, Jay C; Loros, Jennifer J; Kück, Ulrich

2007-05-01

The filamentous fungus Sordaria macrospora develops complex fruiting bodies (perithecia) to propagate its sexual spores. Here, we present an analysis of the sterile mutant pro41 that is unable to produce mature fruiting bodies. The mutant carries a deletion of 4 kb and is complemented by the pro41 open reading frame that is contained within the region deleted in the mutant. In silico analyses predict PRO41 to be an endoplasmic reticulum (ER) membrane protein, and a PRO41-EGFP fusion protein colocalizes with ER-targeted DsRED. Furthermore, Western blot analysis shows that the PRO41-EGFP fusion protein is present in the membrane fraction. A fusion of the predicted N-terminal signal sequence of PRO41 with EGFP is secreted out of the cell, indicating that the signal sequence is functional. pro41 transcript levels are upregulated during sexual development. This increase in transcript levels was not observed in the sterile mutant pro1 that lacks a transcription factor gene. Moreover, microarray analysis of gene expression in the mutants pro1, pro41 and the pro1/41 double mutant showed that pro41 is partly epistatic to pro1. Taken together, these data show that PRO41 is a novel ER membrane protein essential for fruiting body formation in filamentous fungi.
The novel ER membrane protein PRO41 is essential for sexual development in the filamentous fungus Sordaria macrospora

PubMed Central

Nowrousian, Minou; Frank, Sandra; Koers, Sandra; Strauch, Peter; Weitner, Thomas; Ringelberg, Carol; Dunlap, Jay C.; Loros, Jennifer J.; Kück, Ulrich

2013-01-01

The filamentous fungus Sordaria macrospora develops complex fruiting bodies (perithecia) to propagate its sexual spores. Here, we present an analysis of the sterile mutant pro41 that is unable to produce mature fruiting bodies. The mutant carries a deletion of 4 kb and is complemented by the pro41 open reading frame that is contained within the region deleted in the mutant. In silico analyses predict PRO41 to be an endoplasmic reticulum (ER) membrane protein, and a PRO41–EGFP fusion protein colocalizes with ER-targeted DsRED. Furthermore, Western blot analysis shows that the PRO41–EGFP fusion protein is present in the membrane fraction. A fusion of the predicted N-terminal signal sequence of PRO41 with EGFP is secreted out of the cell, indicating that the signal sequence is functional. pro41 transcript levels are upregulated during sexual development. This increase in transcript levels was not observed in the sterile mutant pro1 that lacks a transcription factor gene. Moreover, microarray analysis of gene expression in the mutants pro1, pro41 and the pro1/41 double mutant showed that pro41 is partly epistatic to pro1. Taken together, these data show that PRO41 is a novel ER membrane protein essential for fruiting body formation in filamentous fungi. PMID:17501918

Conserved antigenic sites between MERS-CoV and Bat-coronavirus are revealed through sequence analysis.

PubMed

Sharmin, Refat; Islam, Abul B M M K

2016-01-01

MERS-CoV is a newly emerged human coronavirus reported to be closely related to the HKU4 and HKU5 bat coronaviruses. Bat and MERS coronaviruses are structurally related. Therefore, it is of interest to estimate the degree of conserved antigenic sites among them. It is important to elucidate the shared antigenic sites and the extent of conservation between them to understand the evolutionary dynamics of MERS-CoV. Multiple sequence alignment of the spike (S), membrane (M), envelope (E) and nucleocapsid (N) proteins was employed to identify the sequence conservation among MERS and bat (HKU4, HKU5) coronaviruses. We used various in silico tools to predict the conserved antigenic sites. We found that MERS-CoV shared 30 % of its S protein antigenic sites with HKU4 and 70 % with HKU5 bat-CoV, whereas 100 % of the antigenic sites of its E, M and N proteins were conserved with those in HKU4 and HKU5. This sharing suggests that, in terms of pathogenicity, MERS-CoV is more closely related to HKU5 bat-CoV than to HKU4 bat-CoV. The conserved epitopes indicate their evolutionary relationship and ancestry of pathogenicity.
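Sequence conservation between two proteins taken from an alignment, as used in this record, is commonly summarised as percent identity. The Python sketch below is a generic illustration and not the authors' tool chain; the function name and the decision to skip any column containing a gap are assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two rows of the same alignment.

    Columns where either row has a gap ('-') are skipped, which is one of
    several possible conventions and is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("MK-LV", "MKALI"))
```

The same function can be restricted to the columns of a predicted antigenic site to ask whether that particular site is conserved between two viruses.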
Aspergillus Section Fumigati Typing by PCR-Restriction Fragment Polymorphism

PubMed Central

Staab, Janet F.; Balajee, S. Arunmozhi; Marr, Kieren A.

2009-01-01

Recent studies have shown that there are multiple clinically important members of the Aspergillus section Fumigati that are difficult to distinguish on the basis of morphological features (e.g., Aspergillus fumigatus, A. lentulus, and Neosartorya udagawae). Identification of these organisms may be clinically important, as some species vary in their susceptibilities to antifungal agents. In a prior study, we utilized multilocus sequence typing to describe A. lentulus as a species distinct from A. fumigatus. The sequence data show that the gene encoding β-tubulin, benA, has high interspecies variability at intronic regions but is conserved among isolates of the same species. These data were used to develop a PCR-restriction fragment length polymorphism (PCR-RFLP) method that rapidly and accurately distinguishes A. fumigatus, A. lentulus, and N. udagawae, three major species within the section Fumigati that have previously been implicated in disease. Digestion of the benA amplicon with BccI generated unique banding patterns; the results were validated by screening a collection of clinical strains and by in silico analysis of the benA sequences of Aspergillus spp. deposited in the GenBank database. PCR-RFLP of benA is a simple method for the identification of clinically important, similar morphotypes of Aspergillus spp. within the section Fumigati. PMID:19403766
Aspergillus section Fumigati typing by PCR-restriction fragment polymorphism.

PubMed

Staab, Janet F; Balajee, S Arunmozhi; Marr, Kieren A

2009-07-01

Recent studies have shown that there are multiple clinically important members of the Aspergillus section Fumigati that are difficult to distinguish on the basis of morphological features (e.g., Aspergillus fumigatus, A. lentulus, and Neosartorya udagawae). Identification of these organisms may be clinically important, as some species vary in their susceptibilities to antifungal agents. In a prior study, we utilized multilocus sequence typing to describe A. lentulus as a species distinct from A. fumigatus. The sequence data show that the gene encoding beta-tubulin, benA, has high interspecies variability at intronic regions but is conserved among isolates of the same species. These data were used to develop a PCR-restriction fragment length polymorphism (PCR-RFLP) method that rapidly and accurately distinguishes A. fumigatus, A. lentulus, and N. udagawae, three major species within the section Fumigati that have previously been implicated in disease. Digestion of the benA amplicon with BccI generated unique banding patterns; the results were validated by screening a collection of clinical strains and by in silico analysis of the benA sequences of Aspergillus spp. deposited in the GenBank database. PCR-RFLP of benA is a simple method for the identification of clinically important, similar morphotypes of Aspergillus spp. within the section Fumigati.
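The in silico validation mentioned in this record amounts to predicting the fragment lengths a restriction enzyme would produce from a known amplicon sequence. The following Python sketch shows the idea for a linear sequence. The function name and the way sites are passed in are hypothetical; each site is given as a motif plus the top-strand cut offset from the start of the motif. The example uses EcoRI (G^AATTC) only because its site is widely known; the BccI site and cut position used in the study would have to be taken from an enzyme database such as REBASE, and for a non-palindromic site both orientations would have to be supplied.

```python
def digest(seq, sites):
    """Return fragment lengths of a linear sequence after a simulated digest.

    sites: iterable of (motif, offset) pairs; a cut is placed on the top
    strand at match_start + offset. Only exact motif matches are found.
    """
    seq = seq.upper()
    cuts = set()
    for motif, offset in sites:
        start = seq.find(motif)
        while start != -1:
            cut = start + offset
            # Ignore cuts that would fall outside the molecule.
            if 0 < cut < len(seq):
                cuts.add(cut)
            start = seq.find(motif, start + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


# EcoRI cuts after the first G of GAATTC, so the offset is 1.
print(digest("AAAAGAATTCAAAA", [("GAATTC", 1)]))
```

Comparing the predicted fragment lengths for each species' sequence shows whether the banding patterns are expected to differ on a gel.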
Transcription of two adjacent carbohydrate utilization gene clusters in Bifidobacterium breve UCC2003 is controlled by LacI- and repressor open reading frame kinase (ROK)-type regulators.

PubMed

O'Connell, Kerry Joan; Motherway, Mary O'Connell; Liedtke, Andrea; Fitzgerald, Gerald F; Paul Ross, R; Stanton, Catherine; Zomer, Aldert; van Sinderen, Douwe

2014-06-01

Members of the genus Bifidobacterium are commonly found in the gastrointestinal tracts of mammals, including humans, where their growth is presumed to be dependent on various diet- and/or host-derived carbohydrates. To understand transcriptional control of bifidobacterial carbohydrate metabolism, we investigated two genetic carbohydrate utilization clusters dedicated to the metabolism of raffinose-type sugars and melezitose. Transcriptomic and gene inactivation approaches revealed that the raffinose utilization system is positively regulated by an activator protein, designated RafR. The gene cluster associated with melezitose metabolism was shown to be subject to direct negative control by a LacI-type transcriptional regulator, designated MelR1, in addition to apparent indirect negative control by means of a second LacI-type regulator, MelR2. In silico analysis, DNA-protein interaction, and primer extension studies revealed the MelR1 and MelR2 operator sequences, each of which is positioned just upstream of or overlapping the correspondingly regulated promoter sequences. Similar analyses identified the RafR binding operator sequence located upstream of the rafB promoter. This study indicates that transcriptional control of gene clusters involved in carbohydrate metabolism in bifidobacteria is subject to conserved regulatory systems, representing either positive or negative control.
Bacterial spoilers of food: behavior, fitness and functional properties.

PubMed

Remenant, Benoît; Jaffrès, Emmanuel; Dousset, Xavier; Pilet, Marie-France; Zagorec, Monique

2015-02-01

Most food products are highly perishable as they constitute a rich nutrient source for microbial development. Among the microorganisms contaminating food, some present metabolic activities leading to spoilage. In addition to hygienic rules to reduce contamination, various treatments are applied during production and storage to avoid the growth of unwanted microbes. The nature and appearance of spoilage therefore depend on the physiological state of spoilers and on their ability to resist the processing/storage conditions and flourish on the food matrix. Spoilage also relies on the interactions between the microorganisms composing the ecosystems encountered in food. The recent rapid increase in publicly available bacterial genome sequences, as well as the access to high-throughput methods, should lead to a better understanding of spoiler behavior and to the possibility of decreasing food spoilage. This review lists the main bacterial species identified as food spoilers, their ability to develop during storage and/or processing, and the functions potentially involved in spoilage. We have also compiled an inventory of the available genome sequences of species encompassing spoilage strains. Combining in silico analysis of genome sequences with experimental data is proposed in order to understand and thus control the bacterial spoilage of food better. Copyright © 2014 Elsevier Ltd. All rights reserved.
Conservative secondary structure motifs already present in early-stage folding (in silico) as found in serpines family.

PubMed

Brylinski, Michal; Konieczny, Leszek; Kononowicz, Andrzej; Roterman, Irena

2008-03-21

The well-known procedure implemented in ClustalW oriented on the sequence comparison was applied to structure comparison. The consensus sequence as well as consensus structure has been defined for proteins belonging to the serpine family. The structure of the early stage intermediate was the object for similarity search. The high values of W(sequence) appeared to be accordant with high values of W(structure), making possible structure comparison using common criteria for sequence and structure comparison. Since the early stage structural form has been created according to a limited conformational sub-space which does not include the beta-structure (this structure is mediated by the C7eq structural form), it is particularly important to see that the C7eq structural form may be treated as the seed for the beta-structure present in the final native structure of the protein. The applicability of the ClustalW procedure to structure comparison makes these two comparisons unified.
History and current status of wheat miRNAs using next-generation sequencing and their roles in development and stress.

PubMed

Budak, Hikmet; Khan, Zaeema; Kantar, Melda

2015-05-01

As small molecules that aid in posttranscriptional silencing, microRNA (miRNA) discovery and characterization have vastly benefited from the recent development and widespread application of next-generation sequencing (NGS) technologies. Several miRNAs were identified through sequencing of constructed small RNA libraries, whereas others were predicted by in silico methods using the recently accumulating sequence data. NGS was a major breakthrough in efforts to sequence and dissect the genomes of plants, including bread wheat and its progenitors, which have large, repetitive and complex genomes. Availability of survey sequences of wheat whole genome and its individual chromosomes enabled researchers to predict and assess wheat miRNAs both in the subgenomic and whole genome levels. Moreover, small RNA construction and sequencing-based studies identified several putative development- and stress-related wheat miRNAs, revealing their differential expression patterns in specific developmental stages and/or in response to stress conditions. With the vast amount of wheat miRNAs identified in recent years, we are approaching an overall knowledge of the wheat miRNA repertoire. In the following years, more comprehensive research in relation to miRNA conservation or divergence across wheat and its close relatives or progenitors should be performed. Results may prove valuable both in understanding the significant roles of species-specific miRNAs and in providing information on the dynamics between miRNAs and evolution in wheat. Furthermore, putative development- or stress-related miRNAs identified should be subjected to further functional analysis, which may be valuable in efforts to develop wheat with better resistance and/or yield. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Identification of miRNA from Bouteloua gracilis, a drought tolerant grass, by deep sequencing and their in silico analysis.

PubMed

Ordóñez-Baquera, Perla Lucía; González-Rodríguez, Everardo; Aguado-Santacruz, Gerardo Armando; Rascón-Cruz, Quintín; Conesa, Ana; Moreno-Brito, Verónica; Echavarria, Raquel; Dominguez-Viveros, Joel

2017-02-01

MicroRNAs (miRNAs) are small non-coding RNA molecules that regulate signal transduction, development, metabolism, and stress responses in plants through post-transcriptional degradation and/or translational repression of target mRNAs. Several studies have addressed the role of miRNAs in model plant species, but miRNA expression and function in economically important forage crops, such as Bouteloua gracilis (Poaceae), a high-quality and drought-resistant grass distributed in semiarid regions of the United States and northern Mexico, remain unknown. We applied high-throughput sequencing technology and bioinformatics analysis and identified 31 conserved miRNA families and 53 novel putative miRNAs with different abundance of reads in chlorophyllic cell cultures derived from B. gracilis. Some conserved miRNA families were highly abundant and possessed predicted targets involved in metabolism, plant growth and development, and stress responses. We also predicted additional identified novel miRNAs with specific targets, including B. gracilis ESTs, which were detected under drought stress conditions. Here we report 31 conserved miRNA families and 53 putative novel miRNAs in B. gracilis. Our results suggested the presence of regulatory miRNAs involved in modulating physiological and stress responses in this grass species. Copyright © 2016 Elsevier Ltd. All rights reserved.
Novel Mutation in the CASR Gene (p.Leu123Ser) in a Case of Autosomal Dominant Hypocalcemia

PubMed Central

Regala, Joana; Cavaco, Branca; Domingues, Rita; Limbert, Catarina; Lopes, Lurdes

2015-01-01

Autosomal dominant hypocalcemia, caused by activating mutations of the calcium-sensing receptor (CASR) gene, is characterized by hypocalcemia with an inappropriately low concentration of parathyroid hormone (PTH). In this report, we describe the identification of a novel missense mutation in the CASR gene, in a boy with autosomal dominant hypocalcemia. Polymerase chain reaction (PCR)–single strand and DNA sequencing revealed a heterozygous mutation in the CASR gene that causes a leucine-to-serine substitution at codon 123 (p.Leu123Ser). This mutation was absent in DNA from 50 control patients. In silico studies suggest that the identified variant was likely pathogenic. Sequencing analysis in the mother suggested mosaicism for the same variant, and she was clinically and biochemically unaffected. Clinical manifestations of the index case started with seizures at 14 months of age; cognitive impairment and several neuropsychological disabilities were noted during childhood. Extrapyramidal signs and basal ganglia calcification developed later, namely, hand tremor and rigidity at the age of 7 and 18 years, respectively. Laboratory analysis revealed hypocalcemia, hyperphosphatemia, and low serum PTH with hypomagnesemia and mild hypercalciuria. After 2 years of treatment with calcium supplements and calcitriol, some brief periods of clinical improvement were reported, as well as an absence of nephrocalcinosis. PMID:27617113
In Silico Analysis of Gene Expression Network Components Underlying Pigmentation Phenotypes in the Python Identified Evolutionarily Conserved Clusters of Transcription Factor Binding Sites

PubMed Central

2016-01-01

Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons. PMID:27698666
RNA-Seq mediated root transcriptome analysis of Chlorophytum borivilianum for identification of genes involved in saponin biosynthesis.

PubMed

Kumar, Sunil; Kalra, Shikha; Singh, Baljinder; Kumar, Avneesh; Kaur, Jagdeep; Singh, Kashmir

2016-01-01

Chlorophytum borivilianum is an important species of the Liliaceae family, owing to its vital medicinal properties. Plant roots are used for aphrodisiac, adaptogen, anti-aging, health-restorative and health-promoting purposes. Saponins are considered to be the principal bioactive components responsible for the wide variety of pharmacological properties of this plant. In the present study, we have performed de novo root transcriptome sequencing of C. borivilianum using the Illumina HiSeq 2000 platform, to gain molecular insight into saponin biosynthesis. A total of 33,963,356 high-quality reads were obtained after quality filtration. Sequences were assembled using various programs, which generated 97,344 transcripts with a size range of 100-5,216 bp and an N50 value of 342. Data were analyzed against non-redundant proteins, gene ontology (GO), and enzyme commission (EC) databases. All the genes involved in saponin biosynthesis, along with five full-length genes, namely farnesyl pyrophosphate synthase, cycloartenol synthase, β-amyrin synthase, cytochrome p450, and sterol-3-glucosyltransferase, were identified. Reads per exon kilobase per million (RPKM)-based comparative expression profiling was done to study the differential regulation of the genes. In silico expression analysis of seven selected genes of the saponin biosynthetic pathway was validated by qRT-PCR.
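Two of the summary statistics quoted in this record, the assembly N50 and RPKM expression values, follow standard definitions that are easy to state in code. The Python sketch below uses those textbook definitions; the function names are chosen for illustration and this is not the software used in the study.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


print(n50([2, 3, 4, 5, 6]))
print(rpkm(500, 2000, 10_000_000))
```

Normalising by both transcript length and library size is what makes RPKM values comparable between genes and between samples.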
In Silico Analysis of Gene Expression Network Components Underlying Pigmentation Phenotypes in the Python Identified Evolutionarily Conserved Clusters of Transcription Factor Binding Sites.

PubMed

Irizarry, Kristopher J L; Bryden, Randall L

2016-01-01

Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons.
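Grouping predicted binding sites into clusters that fit inside an interval of about 200 nucleotides, as described in this record, can be sketched with a simple greedy pass over sorted positions. The grouping rule, the function name and the `min_sites` threshold below are assumptions made for illustration; the record does not state how the authors defined their clusters.

```python
def cluster_sites(positions, window=200, min_sites=2):
    """Greedily group site positions so each group spans at most `window` nt.

    A new group starts when a site lies more than `window` nucleotides
    from the first site of the current group. Groups with fewer than
    `min_sites` members are dropped.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        if current and pos - current[0] > window:
            if len(current) >= min_sites:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        clusters.append(current)
    return clusters


# Hypothetical site positions within a 1500-nt upstream region.
print(cluster_sites([10, 50, 180, 600, 650, 1400]))
```

A greedy pass is the simplest choice; a sliding window that maximises the number of sites per interval would be an alternative when clusters lie close together.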
Genetic and bioinformatics analysis of four novel GCK missense variants detected in Caucasian families with GCK-MODY phenotype.

PubMed

Costantini, S; Malerba, G; Contreas, G; Corradi, M; Marin Vargas, S P; Giorgetti, A; Maffeis, C

2015-05-01

Heterozygous loss-of-function mutations in the glucokinase (GCK) gene cause maturity-onset diabetes of the young (MODY) subtype GCK (GCK-MODY/MODY2). GCK sequencing revealed 16 distinct mutations (13 missense, 1 nonsense, 1 splice site, and 1 frameshift-deletion) co-segregating with hyperglycaemia in 23 GCK-MODY families. Four missense substitutions (c.718A>G/p.Asn240Asp, c.757G>T/p.Val253Phe, c.872A>C/p.Lys291Thr, and c.1151C>T/p.Ala384Val) were novel and a founder effect for the nonsense mutation (c.76C>T/p.Gln26*) was supposed. We tested whether an accurate bioinformatics approach could strengthen family-genetic evidence for missense variant pathogenicity in routine diagnostics, where wet-lab functional assays are generally unviable. In silico analyses of the novel missense variants, including orthologous sequence conservation, amino acid substitution (AAS)-pathogenicity predictors, structural modeling and splicing predictors, suggested that the AASs and/or the underlying nucleotide changes are likely to be pathogenic. This study shows how a careful bioinformatics analysis could provide effective suggestions to help molecular-genetic diagnosis in absence of wet-lab validations. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
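Variant descriptions such as p.Asn240Asp or p.Gln26* in this record follow the three-letter protein-level notation, which can be split into reference residue, position and new residue before the variant is passed to a predictor. The Python sketch below handles only simple substitutions and stop-gain variants written in that form; the function name and the regular expression are assumptions for illustration and do not cover the full HGVS nomenclature.

```python
import re

# Three-letter reference residue, position, then a three-letter residue or '*' (stop).
_PROTEIN_CHANGE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2}|\*)$")


def parse_protein_change(description):
    """Split a simple protein-level substitution into (ref, position, alt)."""
    match = _PROTEIN_CHANGE.match(description)
    if match is None:
        raise ValueError(f"unsupported variant description: {description!r}")
    ref, position, alt = match.groups()
    return ref, int(position), alt


print(parse_protein_change("p.Asn240Asp"))
print(parse_protein_change("p.Gln26*"))
```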
Fungal proteomics: from identification to function.

PubMed

Doyle, Sean

2011-08-01

Some fungi cause disease in humans and plants, while others have demonstrable potential for the control of insect pests. In addition, fungi are also a rich reservoir of therapeutic metabolites and industrially useful enzymes. Detailed analysis of fungal biochemistry is now enabled by multiple technologies including protein mass spectrometry, genome and transcriptome sequencing and advances in bioinformatics. Yet, the assignment of function to fungal proteins, encoded either by in silico annotated, or unannotated genes, remains problematic. The purpose of this review is to describe the strategies used by many researchers to reveal protein function in fungi, and more importantly, to consolidate the nomenclature of 'unknown function protein' as opposed to 'hypothetical protein' - once any protein has been identified by protein mass spectrometry. A combination of approaches including comparative proteomics, pathogen-induced protein expression and immunoproteomics are outlined, which, when used in combination with a variety of other techniques (e.g. functional genomics, microarray analysis, immunochemical and infection model systems), appear to yield comprehensive and definitive information on protein function in fungi. The relative advantages of proteomic, as opposed to transcriptomic-only, analyses are also described. In the future, combined high-throughput, quantitative proteomics, allied to transcriptomic sequencing, are set to reveal much about protein function in fungi. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
BRCA1/2 missense mutations and the value of in-silico analyses.

PubMed

Sadowski, Carolin E; Kohlstedt, Daniela; Meisel, Cornelia; Keller, Katja; Becker, Kerstin; Mackenroth, Luisa; Rump, Andreas; Schröck, Evelin; Wimberger, Pauline; Kast, Karin

2017-11-01

The clinical implications of genetic variants in BRCA1/2 in healthy and affected individuals are considerable. Variant interpretation, however, is especially challenging for missense variants. The majority of them are classified as variants of unknown clinical significance (VUS). Computational (in-silico) predictive programs are easy to access, but represent only one tool out of a wide range of complementary approaches to classify VUS. With this single-center study, we aimed to evaluate the impact of in-silico analyses in a spectrum of different BRCA1/2 missense variants. We conducted mutation analysis of BRCA1/2 in 523 index patients with suspected hereditary breast and ovarian cancer (HBOC). Classification of the genetic variants was performed according to the German Consortium (GC)-HBOC database. Additionally, all missense variants were classified by the following three in-silico prediction tools: SIFT, Mutation Taster (MT2) and PolyPhen2 (PPH2). Overall, 201 different variants, 68 of which were missense variants, were ranked as pathogenic, neutral, or unknown. The classification of missense variants by in-silico tools resulted in a higher proportion of pathogenic mutations (25% vs. 13.2%) compared to the GC-HBOC classification. Altogether, more than fifty percent (38/68, 55.9%) of missense variants were ranked differently. Sensitivity of in-silico tools for mutation prediction was 88.9% (PPH2), 100% (SIFT) and 100% (MT2). We found a relevant discrepancy in variant classification by using in-silico prediction tools, resulting in potential overestimation and/or underestimation of cancer risk. More reliable, notably gene-specific, prediction tools and functional tests are needed to improve clinical counseling. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
In silico evidence for sequence-dependent nucleosome sliding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lequieu, Joshua; Schwartz, David C.; de Pablo, Juan J.

Nucleosomes represent the basic building block of chromatin and provide an important mechanism by which cellular processes are controlled. The locations of nucleosomes across the genome are not random but instead depend on both the underlying DNA sequence and the dynamic action of other proteins within the nucleus. These processes are central to cellular function, and the molecular details of the interplay between DNA sequence and nucleosome dynamics remain poorly understood. In this work, we investigate this interplay in detail by relying on a molecular model, which permits development of a comprehensive picture of the underlying free energy surfaces and the corresponding dynamics of nucleosome repositioning. The mechanism of nucleosome repositioning is shown to be strongly linked to DNA sequence and directly related to the binding energy of a given DNA sequence to the histone core. It is also demonstrated that chromatin remodelers can override DNA-sequence preferences by exerting torque, and the histone H4 tail is then identified as a key component by which DNA-sequence, histone modifications, and chromatin remodelers could in fact be coupled.
Phylogenetic reconstruction in the Order Nymphaeales: ITS2 secondary structure analysis and in silico testing of maturase k (matK) as a potential marker for DNA bar coding

PubMed Central

2012-01-01

Background The Nymphaeales (water lily and relatives) lineage has diverged as the second branch of basal angiosperms and comprises two families: Cabombaceae and Nymphaeaceae. The classification of Nymphaeales and phylogeny within the flowering plants are quite intriguing as several systems (Thorne system, Dahlgren system, Cronquist system, Takhtajan system and APG III system (Angiosperm Phylogeny Group III system)) have attempted to redefine the Nymphaeales taxonomy. There are also fossil records consisting especially of seeds, pollen, stems, leaves and flowers as early as the lower Cretaceous. Here we present an in silico study of the order Nymphaeales taking maturaseK (matK) and internal transcribed spacer (ITS2) as biomarkers for phylogeny reconstruction (using character-based methods and Bayesian approach) and identification of motifs for DNA barcoding. Results The Maximum Likelihood (ML) and Bayesian approach yielded congruent fully resolved and well-supported trees using a concatenated (ITS2+ matK) supermatrix aligned dataset. The taxon sampling corroborates the monophyly of Cabombaceae. Nuphar emerges as a monophyletic clade in the family Nymphaeaceae while there are slight discrepancies in the monophyletic nature of the genus Nymphaea owing to Victoria-Euryale and Ondinea grouping in the same node of Nymphaeaceae. ITS2 secondary structure alignment corroborates the primary sequence analysis. Hydatellaceae emerged as a sister clade to Nymphaeaceae and had a basal lineage amongst the water lily clades. Species from Cycas and Ginkgo were taken as outgroups and were rooted in the overall tree topology from various methods. Conclusions MatK genes are fast evolving highly variant regions of plant chloroplast DNA that can serve as potential biomarkers for DNA barcoding and also in generating primers for angiosperms with identification of unique motif regions. 
We have reported unique genus specific motif regions in the Order Nymphaeales from matK dataset which can be further validated for barcoding and designing of PCR primers. Our analysis using a novel approach of sequence-structure alignment and phylogenetic reconstruction using molecular morphometrics is congruent with the current placement of Hydatellaceae within the early-divergent angiosperm order Nymphaeales. The results underscore the fact that more diverse genera, if not fully resolved to be monophyletic, should be represented by all major lineages. PMID:23282079
A Caenorhabditis elegans Mass Spectrometric Resource for Neuropeptidomics

NASA Astrophysics Data System (ADS)

Van Bael, Sven; Zels, Sven; Boonen, Kurt; Beets, Isabel; Schoofs, Liliane; Temmerman, Liesbet

2018-05-01

Neuropeptides are important signaling molecules used by nervous systems to mediate and fine-tune neuronal communication. They can function as neurotransmitters or neuromodulators in neural circuits, or they can be released as neurohormones to target distant cells and tissues. Neuropeptides are typically cleaved from larger precursor proteins by the action of proteases and can be the subject of post-translational modifications. The short, mature neuropeptide sequences often entail the only evolutionarily reasonably conserved regions in these precursor proteins. Therefore, it is particularly challenging to predict all putative bioactive peptides through in silico mining of neuropeptide precursor sequences. Peptidomics is an approach that allows de novo characterization of peptides extracted from body fluids, cells, tissues, organs, or whole-body preparations. Mass spectrometry, often combined with on-line liquid chromatography, is a hallmark technique used in peptidomics research. Here, we used an acidified methanol extraction procedure and a quadrupole-Orbitrap LC-MS/MS pipeline to analyze the neuropeptidome of Caenorhabditis elegans. We identified an unprecedented number of 203 mature neuropeptides from C. elegans whole-body extracts, including 35 peptides from known, hypothetical, as well as from completely novel neuropeptide precursor proteins that have not been predicted in silico. This set of biochemically verified peptide sequences provides the most elaborate C. elegans reference neuropeptidome so far. To exploit this resource to the fullest, we make our in-house database of known and predicted neuropeptides available to the community as a valuable resource. We are providing these collective data to help the community progress, amongst others, by supporting future differential and/or functional studies.
A Caenorhabditis elegans Mass Spectrometric Resource for Neuropeptidomics

NASA Astrophysics Data System (ADS)

Van Bael, Sven; Zels, Sven; Boonen, Kurt; Beets, Isabel; Schoofs, Liliane; Temmerman, Liesbet

2018-01-01

Neuropeptides are important signaling molecules used by nervous systems to mediate and fine-tune neuronal communication. They can function as neurotransmitters or neuromodulators in neural circuits, or they can be released as neurohormones to target distant cells and tissues. Neuropeptides are typically cleaved from larger precursor proteins by the action of proteases and can be the subject of post-translational modifications. The short, mature neuropeptide sequences often entail the only evolutionarily reasonably conserved regions in these precursor proteins. Therefore, it is particularly challenging to predict all putative bioactive peptides through in silico mining of neuropeptide precursor sequences. Peptidomics is an approach that allows de novo characterization of peptides extracted from body fluids, cells, tissues, organs, or whole-body preparations. Mass spectrometry, often combined with on-line liquid chromatography, is a hallmark technique used in peptidomics research. Here, we used an acidified methanol extraction procedure and a quadrupole-Orbitrap LC-MS/MS pipeline to analyze the neuropeptidome of Caenorhabditis elegans. We identified an unprecedented number of 203 mature neuropeptides from C. elegans whole-body extracts, including 35 peptides from known, hypothetical, as well as from completely novel neuropeptide precursor proteins that have not been predicted in silico. This set of biochemically verified peptide sequences provides the most elaborate C. elegans reference neuropeptidome so far. To exploit this resource to the fullest, we make our in-house database of known and predicted neuropeptides available to the community as a valuable resource. We are providing these collective data to help the community progress, amongst others, by supporting future differential and/or functional studies.
An in silico DNA cloning experiment for the biochemistry laboratory.

PubMed

Elkins, Kelly M

2011-01-01

This laboratory exercise introduces students to concepts in recombinant DNA technology while accommodating a major semester project in protein purification, structure, and function in a biochemistry laboratory for junior- and senior-level undergraduate students. It is also suitable for forensic science courses focused on DNA biology and advanced high school biology classes. Students begin by examining a plasmid map with the goal of identifying which restriction enzymes may be used to clone a piece of foreign DNA containing a gene of interest into the vector. From the National Center for Biotechnology Information website, students are instructed to retrieve a protein sequence and use Expasy's Reverse Translate program to reverse translate the protein to cDNA. Students then use Integrated DNA Technologies' OligoAnalyzer to predict the complementary DNA strand and obtain DNA recognition sequences for the desired restriction enzymes from New England Biolabs' website. Students add the appropriate DNA restriction sequences to the double-stranded foreign DNA for cloning into the plasmid and infecting Escherichia coli cells. Students are introduced to computational biology tools, molecular biology terminology and the process of DNA cloning in this valuable single session, in silico experiment. This project develops students' understanding of the cloning process as a whole and contrasts with other laboratory and internship experiences in which the students may be involved in only a piece of the cloning process/techniques. Students interested in pursuing postgraduate study and research or employment in an academic biochemistry or molecular biology laboratory or industry will benefit most from this experience. Copyright © 2010 Wiley Periodicals, Inc.
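The steps of the exercise above (reverse translation, complementary strand, addition of restriction sites) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the codon choice, the function names and the example peptide are hypothetical, and the code does not reproduce the web tools named in the abstract. The EcoRI, BamHI and HindIII recognition sequences are the standard ones.

```python
# Illustrative sketch only; names and the example peptide are hypothetical.

# One arbitrary codon per amino acid. Reverse translation is degenerate,
# so many other DNA sequences would encode the same protein.
CODON_FOR = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "TCT", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTT",
}

# Standard recognition sequences (all three are palindromic).
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_translate(protein):
    """Return one possible coding DNA sequence for a protein string."""
    return "".join(CODON_FOR[aa] for aa in protein.upper())


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def add_cloning_sites(insert, left="EcoRI", right="BamHI"):
    """Flank an insert with two restriction sites.

    The insert must not contain either site internally, otherwise the
    enzyme would also cut inside the gene. Because the sites are
    palindromic, checking the top strand is sufficient.
    """
    for name in (left, right):
        if SITES[name] in insert:
            raise ValueError(f"insert contains an internal {name} site")
    return SITES[left] + insert + SITES[right]


if __name__ == "__main__":
    cds = reverse_translate("MKT")          # hypothetical tripeptide
    construct = add_cloning_sites(cds)
    print(cds, reverse_complement(cds), construct)
```

Using two different enzymes, as in this sketch, gives the insert a defined orientation in the vector; a single enzyme would allow ligation in either direction.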

A Caenorhabditis elegans Mass Spectrometric Resource for Neuropeptidomics.

PubMed

Van Bael, Sven; Zels, Sven; Boonen, Kurt; Beets, Isabel; Schoofs, Liliane; Temmerman, Liesbet

2018-05-01

Neuropeptides are important signaling molecules used by nervous systems to mediate and fine-tune neuronal communication. They can function as neurotransmitters or neuromodulators in neural circuits, or they can be released as neurohormones to target distant cells and tissues. Neuropeptides are typically cleaved from larger precursor proteins by the action of proteases and can be the subject of post-translational modifications. The short, mature neuropeptide sequences often entail the only evolutionarily reasonably conserved regions in these precursor proteins. Therefore, it is particularly challenging to predict all putative bioactive peptides through in silico mining of neuropeptide precursor sequences. Peptidomics is an approach that allows de novo characterization of peptides extracted from body fluids, cells, tissues, organs, or whole-body preparations. Mass spectrometry, often combined with on-line liquid chromatography, is a hallmark technique used in peptidomics research. Here, we used an acidified methanol extraction procedure and a quadrupole-Orbitrap LC-MS/MS pipeline to analyze the neuropeptidome of Caenorhabditis elegans. We identified an unprecedented number of 203 mature neuropeptides from C. elegans whole-body extracts, including 35 peptides from known, hypothetical, as well as from completely novel neuropeptide precursor proteins that have not been predicted in silico. This set of biochemically verified peptide sequences provides the most elaborate C. elegans reference neuropeptidome so far. To exploit this resource to the fullest, we make our in-house database of known and predicted neuropeptides available to the community as a valuable resource. We are providing these collective data to help the community progress, amongst others, by supporting future differential and/or functional studies.
Promoter analysis reveals cis-regulatory motifs associated with the expression of the WRKY transcription factor CrWRKY1 in Catharanthus roseus.

PubMed

Yang, Zhirong; Patra, Barunava; Li, Runzhi; Pattanaik, Sitakanta; Yuan, Ling

2013-12-01

WRKY transcription factors (TFs) are emerging as an important group of regulators of plant secondary metabolism. However, the cis-regulatory elements associated with their regulation have not been well characterized. We have previously demonstrated that CrWRKY1, a member of subgroup III of the WRKY TF family, regulates biosynthesis of terpenoid indole alkaloids in the ornamental and medicinal plant, Catharanthus roseus. Here, we report the isolation and functional characterization of the CrWRKY1 promoter. In silico analysis of the promoter sequence reveals the presence of several potential TF binding motifs, indicating the involvement of additional TFs in the regulation of the TIA pathway. The CrWRKY1 promoter can drive the expression of a β-glucuronidase (GUS) reporter gene in native (C. roseus protoplasts and transgenic hairy roots) and heterologous (transgenic tobacco seedlings) systems. Analysis of 5'- or 3'-end deletions indicates that the sequence located between positions -140 to -93 bp and -3 to +113 bp, relative to the transcription start site, is critical for promoter activity. Mutation analysis shows that two overlapping as-1 elements and a CT-rich motif contribute significantly to promoter activity. The CrWRKY1 promoter is induced in response to methyl jasmonate (MJ) treatment and the promoter region between -230 and -93 bp contains a putative MJ-responsive element. The CrWRKY1 promoter can potentially be used as a tool to isolate novel TFs involved in the regulation of the TIA pathway.
In silico Analysis of Toxins of Staphylococcus aureus for Validating Putative Drug Targets.

PubMed

Mohana, Ramadevi; Venugopal, Subhashree

2017-01-01

Toxins are among the numerous virulence factors produced by bacteria. They are powerful poisonous substances that enable the bacteria to counter the defense mechanisms of the human body. The pathogenic system of Staphylococcus aureus has evolved various exotoxins that cause detrimental effects on the human immune system. Four toxins, namely enterotoxin A, exfoliative toxin A, TSST-1 and γ-hemolysin, were downloaded from the UniProt database and analyzed to understand the nature of the toxins and for drug target validation. The results indicated that the toxins interact with many protein partners and that no homologous sequences were found in the human proteome; based on a similarity search in DrugBank, the targets were identified as novel drug targets. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Gene expression analysis of flax seed development

PubMed Central

2011-01-01

Background Flax, Linum usitatissimum L., is an important crop whose seed oil and stem fiber have multiple industrial applications. Flax seeds are also well-known for their nutritional attributes, viz., omega-3 fatty acids in the oil and lignans and mucilage from the seed coat. In spite of the importance of this crop, there are few molecular resources that can be utilized toward improving seed traits. Here, we describe flax embryo and seed development and generation of comprehensive genomic resources for the flax seed. Results We describe a large-scale generation and analysis of expressed sequences in various tissues. Collectively, the 13 libraries we have used provide a broad representation of genes active in developing embryos (globular, heart, torpedo, cotyledon and mature stages), seed coats (globular and torpedo stages) and endosperm (pooled globular to torpedo stages) and genes expressed in flowers, etiolated seedlings, leaves, and stem tissue. A total of 261,272 expressed sequence tags (EST) (GenBank accessions LIBEST_026995 to LIBEST_027011) were generated. These EST libraries included transcription factor genes that are typically expressed at low levels, indicating that the depth is adequate for in silico expression analysis. Assembly of the ESTs resulted in 30,640 unigenes and 82% of these could be identified on the basis of homology to known and hypothetical genes from other plants. When compared with fully sequenced plant genomes, the flax unigenes resembled poplar and castor bean more than grape, sorghum, rice or Arabidopsis. Nearly one-fifth of these (5,152) had no homologs in sequences reported for any organism, suggesting that this category represents genes that are likely unique to flax. Digital analyses revealed gene expression dynamics for the biosynthesis of a number of important seed constituents during seed development. 
Conclusions We have developed a foundational database of expressed sequences and collection of plasmid clones that comprise even low-expressed genes such as those encoding transcription factors. This has allowed us to delineate the spatio-temporal aspects of gene expression underlying the biosynthesis of a number of important seed constituents in flax. Flax belongs to a taxonomic group of diverse plants and the large sequence database will allow for evolutionary studies as well. PMID:21529361
A calmodulin-like protein (LCALA) is a new Leishmania amazonensis candidate for telomere end-binding protein.

PubMed

Morea, Edna G O; Viviescas, Maria Alejandra; Fernandes, Carlos A H; Matioli, Fabio F; Lira, Cristina B B; Fernandez, Maribel F; Moraes, Barbara S; da Silva, Marcelo S; Storti, Camila B; Fontes, Marcos R M; Cano, Maria Isabel N

2017-11-01

Leishmania spp. telomeres are composed of 5'-TTAGGG-3' repeats associated with proteins. We have previously identified LaRbp38 and LaRPA-1 as proteins that bind the G-rich telomeric strand. At that time, we had also partially characterized a protein:DNA complex, named LaGT1, but we could not identify its protein component. Using protein-DNA interaction and competition assays, we confirmed that LaGT1 is highly specific to the G-rich telomeric single-stranded DNA. Three protein bands, with LaGT1 activity, were isolated from affinity-purified protein extracts, in-gel digested, and sequenced de novo using mass spectrometry analysis. In silico analysis of the digested peptides identified them as a putative calmodulin with sequences identical to the T. cruzi calmodulin. In the Leishmania genome, the calmodulin ortholog is present in three identical copies. We cloned and sequenced one of the gene copies, named it LCalA, and obtained the recombinant protein. Multiple sequence alignment and molecular modeling showed that LCalA shares homology with most eukaryotic calmodulins. In addition, we demonstrated that LCalA is nuclear, partially co-localizes with telomeres and binds the G-rich telomeric strand in vivo. Recombinant LCalA can bind specifically and with relative affinity to the G-rich telomeric single-strand and to a 3'G-overhang, and DNA binding is calcium dependent. We have described a novel candidate component of Leishmania telomeres, LCalA, a nuclear calmodulin that binds the G-rich telomeric strand with high specificity and relative affinity, in a calcium-dependent manner. LCalA is the first reported calmodulin that binds telomeric DNA in vivo. Copyright © 2017 Elsevier B.V. All rights reserved.
Genome Sequence Analysis of New Isolates of the Winona Strain of Plum pox virus and the First Definitive Evidence of Intrastrain Recombination Events.

PubMed

James, Delano; Sanderson, Dan; Varga, Aniko; Sheveleva, Anna; Chirkov, Sergei

2016-04-01

Plum pox virus (PPV) is genetically diverse with nine different strains identified. Mutations, indel events, and interstrain recombination events are known to contribute to the genetic diversity of PPV. This is the first report of intrastrain recombination events that contribute to PPV's genetic diversity. Fourteen isolates of the PPV strain Winona (W) were analyzed including nine new strain W isolates sequenced completely in this study. Isolates of other strains of PPV with more than one isolate with the complete genome sequence available in GenBank were included also in this study for comparison and analysis. Five intrastrain recombination events were detected among the PPV W isolates, one among PPV C strain isolates, and one among PPV M strain isolates. Four (29%) of the PPV W isolates analyzed are recombinants; one of which (P2-1) is a mosaic, with three recombination events identified. A new interstrain recombinant event was identified between a strain M isolate and a strain Rec isolate, a known recombinant. In silico recombination studies and pairwise distance analyses of PPV strain D isolates indicate that a threshold of genetic diversity exists for the detectability of recombination events, in the range of approximately 0.78×10⁻² to 1.33×10⁻² mean pairwise distance. RDP4 analyses indicate that in the case of PPV Rec isolates there may be a recombinant breakpoint distinct from the obvious transition point of strain sequences. Evidence was obtained that indicates that the frequency of PPV recombination is underestimated, which may be true for other RNA viruses where low genetic diversity exists.
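A mean pairwise distance of the kind quoted above can be illustrated with a short Python sketch. The abstract does not state which distance measure was used, so the uncorrected p-distance (proportion of differing sites) is an assumption made here for simplicity, and the function names and toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_pairwise_distance(seqs):
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy alignment, not real PPV data.
    alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
    print(mean_pairwise_distance(alignment))
```

For closely related isolates the p-distance and model-corrected distances are nearly identical, which is why the simpler measure is adequate for illustrating a diversity threshold of the order of 10⁻².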
Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus

PubMed Central

Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

2012-01-01

Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Currently many projects are underway leading to the accumulation of voluminous genomic and expressed sequence tag sequences in public databases. The genetically characterized domains in these databases are limited due to non-availability of reliable molecular markers. A large body of EST sequence data is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. A total of 25,495 EST sequences were examined and assembled to obtain full-length sequences. Mononucleotide SSR motifs were the most frequent (60.44% in contigs and 62.16% in singletons), whereas the lowest frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). The most frequent trinucleotide motif codes for glutamic acid (GAA), while AT/TA was the most frequent dinucleotide repeat. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of SSR-containing sequences was done through gene ontology terms like biological process, cellular component and molecular function. PMID:22368382
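The SSR mining step described above can be sketched with regular expressions. The abstract does not name the tool or the repeat thresholds used, so the minimum repeat counts below are assumed values chosen for illustration, and the function names and test sequence are hypothetical.

```python
import re

# Assumed minimum number of repeat units per motif length (1 to 6 nt).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit.

    For example "ATAT" is just "AT" twice, so it should be reported as
    a dinucleotide repeat rather than a tetranucleotide one.
    """
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for SSRs in seq.

    This is a simple non-overlapping left-to-right scan per motif
    length; a production tool would handle edge cases more carefully.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not real hop EST data.
    est = "GG" + "GAA" * 6 + "CC" + "AT" * 7 + "G"
    for start, motif, count in find_ssrs(est):
        print(start, motif, count)
```

Primers would then be designed in the sequence flanking each reported start and end coordinate.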
Assessment of Epstein-Barr virus nucleic acids in gastric but not in breast cancer by next-generation sequencing of pooled Mexican samples

PubMed Central

Fuentes-Pananá, Ezequiel M; Larios-Serrato, Violeta; Méndez-Tenorio, Alfonso; Morales-Sánchez, Abigail; Arias, Carlos F; Torres, Javier

2016-01-01

Gastric (GC) and breast (BrC) cancer are two of the most common and deadly tumours. Different lines of evidence suggest a possible causative role of viral infections for both GC and BrC. Wide genome sequencing (WGS) technologies allow searching for viral agents in tissues of patients with cancer. These technologies have already contributed to establishing virus-cancer associations as well as to the discovery of new tumour viruses. The objective of this study was to document possible associations of viral infection with GC and BrC in Mexican patients. To gain an idea of cost-effective conditions for experimental sequencing, we first carried out an in silico simulation of WGS. The next-generation platform Illumina GAIIx was then used to sequence GC and BrC tumour samples. While we did not find viral sequences in tissues from BrC patients, multiple reads matching Epstein-Barr virus (EBV) sequences were found in GC tissues. An end-point polymerase chain reaction confirmed an enrichment of EBV sequences in one of the GC samples sequenced, validating the next-generation sequencing-bioinformatics pipeline. PMID:26910355
Assessment of Epstein-Barr virus nucleic acids in gastric but not in breast cancer by next-generation sequencing of pooled Mexican samples.

PubMed

Fuentes-Pananá, Ezequiel M; Larios-Serrato, Violeta; Méndez-Tenorio, Alfonso; Morales-Sánchez, Abigail; Arias, Carlos F; Torres, Javier

2016-03-01

Gastric (GC) and breast (BrC) cancer are two of the most common and deadly tumours. Different lines of evidence suggest a possible causative role of viral infections for both GC and BrC. Wide genome sequencing (WGS) technologies allow searching for viral agents in tissues of patients with cancer. These technologies have already contributed to establishing virus-cancer associations as well as to the discovery of new tumour viruses. The objective of this study was to document possible associations of viral infection with GC and BrC in Mexican patients. To gain an idea of cost-effective conditions for experimental sequencing, we first carried out an in silico simulation of WGS. The next-generation platform Illumina GAIIx was then used to sequence GC and BrC tumour samples. While we did not find viral sequences in tissues from BrC patients, multiple reads matching Epstein-Barr virus (EBV) sequences were found in GC tissues. An end-point polymerase chain reaction confirmed an enrichment of EBV sequences in one of the GC samples sequenced, validating the next-generation sequencing-bioinformatics pipeline.
Is plant mitochondrial RNA editing a source of phylogenetic incongruence? An answer from in silico and in vivo data sets.

PubMed

Picardi, Ernesto; Quagliariello, Carla

2008-03-26

In plant mitochondria, the post-transcriptional RNA editing process converts C to U at a number of specific sites of the mRNA sequence and usually restores phylogenetically conserved codons and the encoded amino acid residues. Sites undergoing RNA editing evolve at a higher rate than sites not modified by the process. As a result, editing sites strongly affect the evolution of plant mitochondrial genomes, representing an important source of sequence variability and potentially informative characters. To date no clear and convincing evidence has established whether or not editing sites really affect the topology of reconstructed phylogenetic trees. For this reason, we investigated the effect of RNA editing on the tree-building process using twenty different plant mitochondrial gene sequences and computer simulations. Based on our simulation study we suggest that the editing 'noise' in tree topology inference is mainly manifested at the cDNA level. In particular, editing sites tend to confuse tree topologies when artificial genomic and cDNA sequences are generated shorter than 500 bp and with an editing percentage higher than 5.0%. Similar results have also been obtained with genuine plant mitochondrial genes. In this latter instance, indeed, the topology incongruence increases when the editing percentage goes up from about 3.0 to 14.0%. However, when the average gene length is higher than 1,000 bp (rps3, matR and atp1) no differences in the comparison between inferred genomic and cDNA topologies could be detected. Our findings from the in silico and in vivo data sets reported here strongly suggest that editing sites contribute to the generation of misleading phylogenetic trees if the analyzed mitochondrial gene sequence is highly edited (higher than 3.0%) and reduced in length (shorter than 500 bp). 
In the absence of direct experimental evidence, the results presented here thus encourage the use of genomic mitochondrial rather than cDNA sequences for reconstructing phylogenetic events in land plants.
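The idea of generating an artificial cDNA from a genomic sequence at a chosen editing percentage can be sketched as follows. This is not the authors' simulation system: the definition of editing percentage as edited sites per 100 nucleotides of sequence, the random choice of sites and all names are assumptions made for illustration. C-to-U editing is written as C to T, as is usual for cDNA.

```python
import random


def simulate_editing(genomic, percent, rng):
    """Return (cdna, sites) after C-to-U editing at randomly chosen C's.

    percent is interpreted as edited sites per 100 nucleotides of the
    whole sequence (an assumption; the source does not define it).
    """
    c_positions = [i for i, base in enumerate(genomic) if base == "C"]
    wanted = round(len(genomic) * percent / 100)
    n_edit = min(wanted, len(c_positions))
    sites = sorted(rng.sample(c_positions, n_edit))
    cdna = list(genomic)
    for i in sites:
        cdna[i] = "T"  # U in the mRNA appears as T in the cDNA
    return "".join(cdna), sites


if __name__ == "__main__":
    rng = random.Random(1)      # fixed seed for reproducibility
    genomic = "ACGT" * 100      # toy 400 bp sequence, shorter than 500 bp
    cdna, sites = simulate_editing(genomic, 5.0, rng)
    print(len(sites), "edited sites in", len(genomic), "bp")
```

Building trees from many such genomic and cDNA pairs, and comparing the resulting topologies, is the kind of experiment the abstract describes; the tree-building step itself is outside the scope of this sketch.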
In silico identification and characterization of common epitope-based peptide vaccine for Nipah and Hendra viruses.

PubMed

Saha, Chayan Kumar; Mahbub Hasan, Md; Saddam Hossain, Md; Asraful Jahan, Md; Azad, Abul Kalam

2017-06-01

To explore a common B- and T-cell epitope-based vaccine that can elicit an immune response against the encephalitis-causing henipaviruses Hendra virus (HeV) and Nipah virus (NiV). Membrane proteins F, G and M of HeV and NiV were retrieved from the protein database and subjected to different bioinformatics tools to predict antigenic B-cell epitopes. Best B-cell epitopes were then analyzed to predict their T-cell antigenic potentiality. Antigenic B- and T-cell epitopes that shared maximum identity with HeV and NiV were selected. Stability of the selected epitopes was predicted. Finally, the selected epitopes were subjected to molecular docking simulation with HLA-DR to confirm their antigenic potentiality in silico. One epitope from G proteins, one from M proteins and none from F proteins were selected based on their antigenic potentiality. The epitope from the G proteins was stable whereas that from M was unstable. The M-epitope was made stable by adding flanking dipeptides. The 15-mer G-epitope (VDPLRVQWRNNSVIS) showed at least 66% identity with all NiV and HeV G protein sequences, while the 15-mer M-epitope (GKLEFRRNNAIAFKG) with the dipeptide flanking residues showed 73% identity with all NiV and HeV M protein sequences available in the database. Molecular docking simulation with most frequent MHC class-II (MHC II) and class-I (MHC I) molecules showed that these epitopes could bind within HLA binding grooves to elicit an immune response. Our data suggest that the epitopes from the G and M proteins might be targets for peptide-based subunit vaccine design against HeV and NiV. However, biochemical analysis is necessary to experimentally validate the interaction of epitopes individually with the MHC molecules through elucidation of immunity induction. Copyright © 2017 Hainan Medical University. Production and hosting by Elsevier B.V. All rights reserved.
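The percent identity figures quoted above can be illustrated with an ungapped sliding-window comparison between an epitope and a protein sequence. How identity was actually computed in the study is not stated, so this method is an assumption; the function name is hypothetical and the protein fragment in the example is made up (only the G-epitope string comes from the abstract).

```python
def best_window_identity(epitope, protein):
    """Return (percent_identity, start) of the best ungapped match.

    Every window of len(epitope) residues in the protein is compared
    position by position with the epitope; the first best window wins.
    """
    k = len(epitope)
    if k == 0 or len(protein) < k:
        raise ValueError("protein must be at least as long as the epitope")
    best_matches = -1
    best_start = 0
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        if matches > best_matches:
            best_matches = matches
            best_start = start
    return 100.0 * best_matches / k, best_start


if __name__ == "__main__":
    g_epitope = "VDPLRVQWRNNSVIS"            # from the abstract
    # Made-up fragment carrying one substitution (S to T) in the epitope.
    fragment = "MK" + "VDPLRVQWRNNTVIS" + "GG"
    identity, start = best_window_identity(g_epitope, fragment)
    print(round(identity, 1), start)
```

With a 15-mer, each mismatch lowers identity by about 6.7 percentage points, so the reported 66% and 73% correspond to roughly five and four differing positions respectively under this ungapped definition.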
Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

PubMed

Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

2013-03-15

The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. 
Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.
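The Promzea abstract compares tools by nucleotide-level sensitivity and specificity and combines motifs from several programs. The sketch below illustrates those two metrics on a toy example, using the standard definitions TP/(TP+FN) and TN/(TN+FP). The abstract does not specify how predictions are merged, so combining them as a simple union of positions is an assumption. All names and coordinates are hypothetical.

```python
def span(start, length):
    """Set of 0-based nucleotide positions covered by one site."""
    return set(range(start, start + length))


def nucleotide_metrics(predicted, known, seq_len):
    """Nucleotide-level sensitivity and specificity for one sequence.

    `predicted` and `known` are sets of positions within 0..seq_len-1.
    """
    tp = len(predicted & known)
    fp = len(predicted - known)
    fn = len(known - predicted)
    tn = seq_len - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


SEQ_LEN = 100          # hypothetical promoter length
known = span(10, 8)    # hypothetical true binding site, positions 10..17
tool_a = span(12, 8)   # hypothetical prediction from one program
tool_b = span(8, 6)    # hypothetical prediction from another program
combined = tool_a | tool_b  # assumed merge rule: union of positions

for label, pred in [("A", tool_a), ("B", tool_b), ("A+B", combined)]:
    print(label, nucleotide_metrics(pred, known, SEQ_LEN))
```

In this toy case the union recovers the whole site at the cost of a few extra false-positive positions. That trade-off is one reason a pipeline of this kind would filter predictions before combining them.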
Understanding mechanism of in vitro maturation, fertilization and culture of sheep embryoes through in silico analysis.

PubMed

Sreenivas, Dulam; Kaladhar, Dowluru Svgk; Samy, A Palni; Kumar, R Sangeeth

2012-01-01

An understanding of protein interactions is presently required to explain the mechanisms of in vitro maturation, fertilization and culture of sheep embryos through in silico analysis. The present work was conducted on TCM-199 supplemented with epidermal growth factor (EGF), fetal bovine serum (FBS) or wheat peptones. The maturation rate of oocytes was significantly higher in the FBS-supplemented group than in the BSA- and wheat peptone-supplemented groups. The in silico protein interaction studies showed that the proteins EGFR (epidermal growth factor receptor), CCK (cholecystokinin, a peptide hormone), Alb (serum albumin), ESR (estrogen receptor 1), TGFA (transforming growth factor), STAT (signal transducer) and FN1 (fibronectin 1) interact directly and promote cell growth in in vitro culture. Alb directly activates EGF and promotes MAPK3, which mediates diverse biological functions such as cell growth, adhesion and proliferation. Alb may also be involved in stress response signalling and possibly in cell cycle control.
In Silico Screening Based on Predictive Algorithms as a Design Tool for Exon Skipping Oligonucleotides in Duchenne Muscular Dystrophy

PubMed Central

Echigoya, Yusuke; Mouly, Vincent; Garcia, Luis; Yokota, Toshifumi; Duddy, William

2015-01-01

The use of antisense ‘splice-switching’ oligonucleotides to induce exon skipping represents a potential therapeutic approach to various human genetic diseases. It has achieved greatest maturity in exon skipping of the dystrophin transcript in Duchenne muscular dystrophy (DMD), for which several clinical trials are completed or ongoing, and a large body of data exists describing tested oligonucleotides and their efficacy. The rational design of an exon skipping oligonucleotide involves the choice of an antisense sequence, usually between 15 and 32 nucleotides, targeting the exon that is to be skipped. Although parameters describing the target site can be computationally estimated and several have been identified to correlate with efficacy, methods to predict efficacy are limited. Here, an in silico pre-screening approach is proposed, based on predictive statistical modelling. Previous DMD data were compiled together and, for each oligonucleotide, some 60 descriptors were considered. Statistical modelling approaches were applied to derive algorithms that predict exon skipping for a given target site. We confirmed (1) the binding energetics of the oligonucleotide to the RNA, and (2) the distance in bases of the target site from the splice acceptor site, as the two most predictive parameters, and we included these and several other parameters (while discounting many) into an in silico screening process, based on their capacity to predict high or low efficacy in either phosphorodiamidate morpholino oligomers (89% correctly predicted) and/or 2’O Methyl RNA oligonucleotides (76% correctly predicted). Predictions correlated strongly with in vitro testing for sixteen de novo PMO sequences targeting various positions on DMD exons 44 (R2 0.89) and 53 (R2 0.89), one of which represents a potential novel candidate for clinical trials. 
We provide these algorithms together with a computational tool that facilitates screening to predict exon skipping efficacy at each position of a target exon. PMID:25816009
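To make the screening idea concrete, the sketch below tiles a target exon with candidate sites of a fixed length and records, for each one, the antisense sequence and the distance in bases from the 5' end of the exon, used here as a stand-in for the splice acceptor site. It does not compute binding energetics or any of the other descriptors, and it is not the authors' tool. The exon sequence, the function names and the choice of a 25-mer are hypothetical.

```python
# Watson-Crick complement for an RNA alphabet.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense(rna_target):
    """Reverse complement of an RNA target, written 5'->3'."""
    return rna_target.translate(COMPLEMENT)[::-1]


def candidate_sites(exon, length):
    """Yield every target window of `length` bases along the exon.

    `acceptor_distance` is the 0-based offset of the window start from
    the first exon base (assumed to sit at the splice acceptor site).
    """
    exon = exon.upper().replace("T", "U")
    for offset in range(len(exon) - length + 1):
        target = exon[offset:offset + length]
        yield {
            "acceptor_distance": offset,
            "target": target,
            "oligo": antisense(target),
        }


# Hypothetical 33-base exon fragment, not a real DMD sequence.
toy_exon = "GCGAUUUGACAGAUCUGUUGAGAAAUGGCGGCG"

for site in candidate_sites(toy_exon, 25):
    print(site["acceptor_distance"], site["oligo"])
```

A predictive model of the kind described would then score each row. Here the rows are only enumerated, so that a score could be attached per position.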
hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms

PubMed Central

Xie, Qingjun; Tzfadia, Oren; Levy, Matan; Weithorn, Efrat; Peled-Zehavi, Hadas; Van Parys, Thomas; Van de Peer, Yves; Galili, Gad

2016-01-01

Most of the proteins that are specifically turned over by selective autophagy are recognized by the presence of short Atg8 interacting motifs (AIMs) that facilitate their association with the autophagy apparatus. Such AIMs can be identified by bioinformatics methods based on their defined degenerate consensus F/W/Y-X-X-L/I/V sequences in which X represents any amino acid. Achieving reliability and/or fidelity of the prediction of such AIMs on a genome-wide scale represents a major challenge. Here, we present a bioinformatics approach, high fidelity AIM (hfAIM), which uses additional sequence requirements (the presence of acidic amino acids and the absence of positively charged amino acids in certain positions) to reliably identify AIMs in proteins. We demonstrate that the use of the hfAIM method allows for in silico high fidelity prediction of AIMs in AIM-containing proteins (ACPs) on a genome-wide scale in various organisms. Furthermore, by using hfAIM to identify putative AIMs in the Arabidopsis proteome, we illustrate a potential contribution of selective autophagy to various biological processes. More specifically, we identified 9 peroxisomal PEX proteins that contain hfAIM motifs, among which AtPEX1, AtPEX6 and AtPEX10 possess evolutionary-conserved AIMs. Bimolecular fluorescence complementation (BiFC) results verified that AtPEX6 and AtPEX10 indeed interact with Atg8 in planta. In addition, we show that mutations occurring within or nearby hfAIMs in PEX1, PEX6 and PEX10 caused defects in the growth and development of various organisms. Taken together, the above results suggest that the hfAIM tool can be used to effectively perform genome-wide in silico screens of proteins that are potentially regulated by selective autophagy. The hfAIM system is a web tool that can be accessed at http://bioinformatics.psb.ugent.be/hfAIM/. PMID:27071037
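The degenerate core consensus quoted above, F/W/Y-X-X-L/I/V, maps directly onto a regular expression. The sketch below scans a protein sequence for that core pattern only. It does not implement hfAIM's additional requirements on acidic and positively charged residues, because the abstract does not state which positions they apply to. The function name and the toy sequence are hypothetical.

```python
import re

# Core AIM consensus: [F/W/Y]-X-X-[L/I/V]. The lookahead lets
# overlapping occurrences be reported as well.
AIM_CORE = re.compile(r"(?=([FWY]..[LIV]))")


def find_core_aims(protein):
    """Return (1-based position, 4-residue motif) for each core match.

    Assumes `protein` is a one-letter amino acid string without gaps.
    """
    return [(m.start() + 1, m.group(1))
            for m in AIM_CORE.finditer(protein.upper())]


toy_protein = "MDDWEELAAFKQI"  # hypothetical sequence
print(find_core_aims(toy_protein))  # [(4, 'WEEL'), (10, 'FKQI')]
```

Because the core pattern is only four residues long with two free positions, it matches frequently by chance. This is the fidelity problem that the extra sequence requirements described in the abstract are meant to address.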
Propagating annotations of molecular networks using in silico fragmentation

PubMed Central

da Silva, Ricardo R.; Wang, Mingxun; Fox, Evan; Balunas, Marcy J.; Klassen, Jonathan L.; Dorrestein, Pieter C.

2018-01-01

The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most of our biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. One of the alternative approaches used to annotate an unknown fragmentation mass spectrum is through the use of in silico predictions. One of the challenges of in silico annotation is the uncertainty around the correct structure among the predicted candidate lists. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to a MS/MS spectrum in spectral libraries. This is accomplished through creating a network consensus of re-ranked structural candidates using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web-platform https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp. PMID:29668671
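The re-ranking idea can be illustrated with a toy example: a candidate structure gains rank when it is structurally similar to structures already assigned to neighbouring nodes in the molecular network. The sketch below is not the NAP algorithm. The scoring formula, the `weight` parameter, the representation of structures as sets of feature labels and all names are assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rerank(candidates, neighbour_structures, weight=0.5):
    """Re-rank candidates for one network node.

    candidates: list of (name, in_silico_score, feature_set)
    neighbour_structures: feature sets of structures assigned to
        neighbouring nodes in the network
    weight: assumed mixing factor between the original score and the
        mean similarity to the neighbours
    """
    rescored = []
    for name, score, features in candidates:
        if neighbour_structures:
            support = sum(tanimoto(features, n)
                          for n in neighbour_structures)
            support /= len(neighbour_structures)
        else:
            support = 0.0
        rescored.append((name, (1 - weight) * score + weight * support))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical candidates: cand_A has the better in silico score, but
# cand_B resembles the structure assigned to a neighbouring node.
candidates = [
    ("cand_A", 0.60, {"f1", "f2"}),
    ("cand_B", 0.55, {"f3", "f4", "f5"}),
]
neighbours = [{"f3", "f4", "f5", "f6"}]

print(rerank(candidates, neighbours))
```

With these toy numbers cand_B overtakes cand_A, which mirrors the behaviour the abstract describes: network context can change which predicted candidate is ranked first.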
Propagating annotations of molecular networks using in silico fragmentation.

PubMed

da Silva, Ricardo R; Wang, Mingxun; Nothias, Louis-Félix; van der Hooft, Justin J J; Caraballo-Rodríguez, Andrés Mauricio; Fox, Evan; Balunas, Marcy J; Klassen, Jonathan L; Lopes, Norberto Peporine; Dorrestein, Pieter C

2018-04-01

The annotation of small molecules is one of the most challenging and important steps in untargeted mass spectrometry analysis, as most of our biological interpretations rely on structural annotations. Molecular networking has emerged as a structured way to organize and mine data from untargeted tandem mass spectrometry (MS/MS) experiments and has been widely applied to propagate annotations. However, propagation is done through manual inspection of MS/MS spectra connected in the spectral networks and is only possible when a reference library spectrum is available. One of the alternative approaches used to annotate an unknown fragmentation mass spectrum is through the use of in silico predictions. One of the challenges of in silico annotation is the uncertainty around the correct structure among the predicted candidate lists. Here we show how molecular networking can be used to improve the accuracy of in silico predictions through propagation of structural annotations, even when there is no match to a MS/MS spectrum in spectral libraries. This is accomplished through creating a network consensus of re-ranked structural candidates using the molecular network topology and structural similarity to improve in silico annotations. The Network Annotation Propagation (NAP) tool is accessible through the GNPS web-platform https://gnps.ucsd.edu/ProteoSAFe/static/gnps-theoretical.jsp.
Decreased HIV diversity after allogeneic stem cell transplantation of an HIV-1 infected patient: a case report

PubMed Central

2010-01-01

The human immunodeficiency virus type 1 (HIV-1) coreceptor use and viral evolution were analyzed in blood samples from an HIV-1 infected patient undergoing allogeneic stem cell transplantation (SCT). Coreceptor use was predicted in silico from sequence data obtained from the third variable loop region of the viral envelope gene with two software tools. Viral diversity and evolution were evaluated on the same samples by Bayesian inference and maximum likelihood methods. In addition, phenotypic analysis was done by comparison of viral growth in peripheral blood mononuclear cells and in a CCR5 (R5)-deficient T-cell line which was controlled by a reporter assay confirming viral tropism. In silico coreceptor predictions did not match experimental determinations, which showed a consistent R5 tropism. Anti-HIV directed antibodies could be detected before and after the SCT. These preexisting antibodies did not prevent viral rebound after the interruption of antiretroviral therapy during the SCT. Eventually, transplantation and readministration of antiretroviral drugs led to a sustained increase in CD4 counts and a decrease in viral load to undetectable levels. Unexpectedly, viral diversity decreased after successful SCT. Our data show that only R5-tropic virus was found in the patient before and after transplantation. Therefore, blocking the CCR5 receptor during stem cell transplantation might have had beneficial effects, and this might apply to more patients undergoing allogeneic stem cell transplantation. Furthermore, we revealed a scenario of HIV-1 dynamics different from those commonly described. Analysis of viral evolution shows a decrease in viral diversity even during episodes with bursts in viral load. PMID:20210988
An in silico approach to investigate the source of the controversial interpretations about the phenotypic results of the human AhR-gene G1661A polymorphism.

PubMed

Aftabi, Younes; Colagar, Abasalt Hosseinzadeh; Mehrnejad, Faramarz

2016-03-21

Aryl hydrocarbon receptor (AhR) acts as an enhancer binding ligand-activated intracellular receptor. Chromatin remodeling components and general transcription factors such as TATA-binding protein (TBP) are evoked on AhR-target genes by interaction with its flexible transactivation domain (TAD). AhR-G1661A single nucleotide polymorphism (SNP: rs2066853) causes an arginine to lysine substitution in the acidic sub-domain of TAD at position 554 (R554K). Although numerous studies associate the SNP with abnormalities such as cancer, other reliable investigations reject these associations. Consequently, the interpretation of the phenotypic results of the G1661A transition has been controversial. In this study, an in silico analysis was performed to investigate the possible effects of the transition on AhR mRNA, protein structure, interaction properties and modifications. The analysis revealed that the R554K substitution affects the secondary structure and solvent accessibility of adjacent residues. It also decreases AhR stability, alters the hydropathy features of the local sequence and changes the pattern of the residues at the binding site of the TAD acidic sub-domain. New sites for ubiquitination and acetylation were predicted for the AhR-K554 variant at positions 544 and 560, respectively. Our findings strengthen the idea that the AhR-G1661A transition may affect AhR-TAD interactions, especially with TBP, which influence the expression of AhR target genes. However, the previously reported flexibility of the modular TAD could act as an intervening factor that moderates the SNP effects and causes distinct outcomes in different individuals and tissues. Copyright © 2016 Elsevier Ltd. All rights reserved.
Elevated heart rate triggers action potential alternans and sudden death. Translational study of a homozygous KCNH2 mutation.

PubMed

Schweigmann, Ulrich; Biliczki, Peter; Ramirez, Rafael J; Marschall, Christoph; Takac, Ina; Brandes, Ralf P; Kotzot, Dieter; Girmatsion, Zenawit; Hohnloser, Stefan H; Ehrlich, Joachim R

2014-01-01

Long QT syndrome (LQTS) leads to arrhythmic events and increased risk for sudden cardiac death (SCD). Homozygous KCNH2 mutations underlying LQTS-2 have previously been termed "human HERG knockout" and typically express severe phenotypes. We studied genotype-phenotype correlations of an LQTS type 2 mutation identified in the homozygous index patient from a consanguineous Turkish family after his brother died suddenly during febrile illness. Clinical work-up, DNA sequencing, mutagenesis, cell culture, patch-clamp, in silico mathematical modelling, protein biochemistry and confocal microscopy were performed. Genetic analysis revealed a homozygous C-terminal KCNH2 mutation (p.R835Q) in the index patient (QTc ∼506 ms with notched T waves). The parents were first-degree cousins, both heterozygous for the mutation and clinically unremarkable (QTc ∼447 ms in the father and ∼396 ms in the mother). Heterologous expression of KCNH2-R835Q showed mildly reduced current amplitudes. Biophysical properties of ionic currents were also only nominally changed, with slight acceleration of deactivation and more negative V50 in R835Q currents. Protein biochemistry and confocal microscopy revealed similar expression patterns and trafficking of WT and R835Q, even at elevated temperature. In silico analysis demonstrated mildly prolonged ventricular action potential duration (APD) compared to WT at a cycle length of 1000 ms. At a cycle length of 350 ms, M-cell APD remained stable in WT but displayed APD alternans in R835Q. Kv11.1 channels affected by the C-terminal R835Q mutation display mildly modified biophysical properties, but the mutation leads to M-cell APD alternans at elevated heart rates and could precipitate SCD under specific clinical circumstances associated with high heart rates.
