Sample records for bioinformatics analysis predicted

  1. An integrated bioinformatics analysis to dissect kinase dependency in triple negative breast cancer.

    PubMed

    Ryall, Karen A; Kim, Jihye; Klauck, Peter J; Shin, Jimin; Yoo, Minjae; Ionkina, Anastasia; Pitts, Todd M; Tentler, John J; Diamond, Jennifer R; Eckhardt, S Gail; Heasley, Lynn E; Kang, Jaewoo; Tan, Aik Choon

    2015-01-01

    Triple-Negative Breast Cancer (TNBC) is an aggressive disease with a poor prognosis. Clinically, TNBC patients have limited treatment options besides chemotherapy. The goal of this study was to determine the kinase dependency in TNBC cell lines and to predict compounds that could inhibit these kinases using integrative bioinformatics analysis. We integrated publicly available gene expression data, high-throughput pharmacological profiling data, and quantitative in vitro kinase binding data to determine the kinase dependency in 12 TNBC cell lines. We employed Kinase Addiction Ranker (KAR), a novel bioinformatics approach, which integrated these data sources to dissect kinase dependency in TNBC cell lines. We then used the kinase dependency predicted by KAR for each TNBC cell line to query K-Map for compounds targeting these kinases. We validated our predictions using published and new experimental data. In summary, we implemented an integrative bioinformatics analysis that determines kinase dependency in TNBC. Our analysis revealed candidate kinases as potential targets in TNBC for further pharmacological and biological studies.

  2. Bioinformatics and peptidomics approaches to the discovery and analysis of food-derived bioactive peptides.

    PubMed

    Agyei, Dominic; Tsopmo, Apollinaire; Udenigwe, Chibuike C

    2018-06-01

    There are emerging advancements in the strategies used for the discovery and development of food-derived bioactive peptides because of their multiple food and health applications. Bioinformatics and peptidomics are two computational and analytical techniques that have the potential to speed up the development of bioactive peptides from bench to market. Structure-activity relationships observed in peptides form the basis for bioinformatics and in silico prediction of bioactive sequences encrypted in food proteins. Peptidomics, on the other hand, relies on "hyphenated" (liquid chromatography-mass spectrometry-based) techniques for the detection, profiling, and quantitation of peptides. Together, bioinformatics and peptidomics approaches provide a low-cost and effective means of predicting, profiling, and screening bioactive protein hydrolysates and peptides from food. This article discuses the basis, strengths, and limitations of bioinformatics and peptidomics approaches currently used for the discovery and analysis of food-derived bioactive peptides.

  3. Prediction of Acute Mountain Sickness using a Blood-Based Test

    DTIC Science & Technology

    2016-01-01

    2015): In quarter 17 we focused on two major tasks: getting the RNA purified and ready for chip analysis and working on the bioinformatics ... bioinformatics organization of all the data we will examine for this study. To remind the reviewer, we have a primary dataset of ~120 subjects who were studied...companion study, AltitudeOmics, to the database of gene studies to be analyzed for AMS prediction • expansion of a bioinformatics team to include an

  4. Taking Bioinformatics to Systems Medicine.

    PubMed

    van Kampen, Antoine H C; Moerland, Perry D

    2016-01-01

    Systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. In this chapter we discuss how bioinformatics critically contributes to systems medicine. First, we explain the role of bioinformatics in the management and analysis of data. In particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. Second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiological molecular mechanisms, and facilitate personalized medicine. Third, we focus on network analysis and discuss how gene networks can be constructed from omics data and how these networks can be decomposed into smaller modules. We discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, and lead to predictive models. Throughout, we provide several examples demonstrating how bioinformatics contributes to systems medicine and discuss future challenges in bioinformatics that need to be addressed to enable the advancement of systems medicine.

  5. LXtoo: an integrated live Linux distribution for the bioinformatics community

    PubMed Central

    2012-01-01

    Background Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. Conclusions LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356

  6. LXtoo: an integrated live Linux distribution for the bioinformatics community.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu

    2012-07-19

    Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.

  7. [Construction and application of bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer].

    PubMed

    Fang, Xiang; Li, Ning-qiu; Fu, Xiao-zhe; Li, Kai-bin; Lin, Qiang; Liu, Li-hui; Shi, Cun-bin; Wu, Shu-qin

    2015-07-01

    As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the requirement of high-performance computers rather than common personal computers for constructing a bioinformatics platform significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules, including genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analysis on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via Blast searches, GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on out membrane protein A of Aeromonas hydrophila, and the changes of system temperature, total energy, root mean square deviation and conformation of the loops during equilibration were also observed. These results showed that the bioinformatic analysis platform for aquatic pathogen has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platform for other subjects.

  8. ENFIN--A European network for integrative systems biology.

    PubMed

    Kahlem, Pascal; Clegg, Andrew; Reisinger, Florian; Xenarios, Ioannis; Hermjakob, Henning; Orengo, Christine; Birney, Ewan

    2009-11-01

    Integration of biological data of various types and the development of adapted bioinformatics tools represent critical objectives to enable research at the systems level. The European Network of Excellence ENFIN is engaged in developing an adapted infrastructure to connect databases, and platforms to enable both the generation of new bioinformatics tools and the experimental validation of computational predictions. With the aim of bridging the gap existing between standard wet laboratories and bioinformatics, the ENFIN Network runs integrative research projects to bring the latest computational techniques to bear directly on questions dedicated to systems biology in the wet laboratory environment. The Network maintains internally close collaboration between experimental and computational research, enabling a permanent cycling of experimental validation and improvement of computational prediction methods. The computational work includes the development of a database infrastructure (EnCORE), bioinformatics analysis methods and a novel platform for protein function analysis FuncNet.

  9. Quantitative Analysis of the Trends Exhibited by the Three Interdisciplinary Biological Sciences: Biophysics, Bioinformatics, and Systems Biology.

    PubMed

    Kang, Jonghoon; Park, Seyeon; Venkat, Aarya; Gopinath, Adarsh

    2015-12-01

    New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology.

  10. The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis

    PubMed Central

    Rampp, Markus; Soddemann, Thomas; Lederer, Hermann

    2006-01-01

    We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980

  11. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.

  12. BIAS: Bioinformatics Integrated Application Software.

    PubMed

    Finak, G; Godin, N; Hallett, M; Pepin, F; Rajabi, Z; Srivastava, V; Tang, Z

    2005-04-15

    We introduce a development platform especially tailored to Bioinformatics research and software development. BIAS (Bioinformatics Integrated Application Software) provides the tools necessary for carrying out integrative Bioinformatics research requiring multiple datasets and analysis tools. It follows an object-relational strategy for providing persistent objects, allows third-party tools to be easily incorporated within the system and supports standards and data-exchange protocols common to Bioinformatics. BIAS is an OpenSource project and is freely available to all interested users at http://www.mcb.mcgill.ca/~bias/. This website also contains a paper containing a more detailed description of BIAS and a sample implementation of a Bayesian network approach for the simultaneous prediction of gene regulation events and of mRNA expression from combinations of gene regulation events. hallett@mcb.mcgill.ca.

  13. Making authentic science accessible—the benefits and challenges of integrating bioinformatics into a high-school science curriculum

    PubMed Central

    Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat

    2017-01-01

    Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled ‘Bioinformatics in the Service of Biotechnology’. Students’ learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students’ difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students’ cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students’ scientific ‘toolbox’. For students, questions stemming from the ‘old world’ biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers’ prediction. Analysis of students’ affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher’s role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. PMID:26801769

  14. Making authentic science accessible-the benefits and challenges of integrating bioinformatics into a high-school science curriculum.

    PubMed

    Machluf, Yossy; Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat

    2017-01-01

    Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled 'Bioinformatics in the Service of Biotechnology'. Students' learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students' difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students' cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students' scientific 'toolbox'. For students, questions stemming from the 'old world' biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers' prediction. Analysis of students' affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher's role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. © The Author 2016. Published by Oxford University Press.

  15. An "in silico" Bioinformatics Laboratory Manual for Bioscience Departments: "Prediction of Glycosylation Sites in Phosphoethanolamine Transferases"

    ERIC Educational Resources Information Center

    Alyuruk, Hakan; Cavas, Levent

    2014-01-01

    Genomics and proteomics projects have produced a huge amount of raw biological data including DNA and protein sequences. Although these data have been stored in data banks, their evaluation is strictly dependent on bioinformatics tools. These tools have been developed by multidisciplinary experts for fast and robust analysis of biological data.…

  16. Combining medical informatics and bioinformatics toward tools for personalized medicine.

    PubMed

    Sarachan, B D; Simmons, M K; Subramanian, P; Temkin, J M

    2003-01-01

    Key bioinformatics and medical informatics research areas need to be identified to advance knowledge and understanding of disease risk factors and molecular disease pathology in the 21 st century toward new diagnoses, prognoses, and treatments. Three high-impact informatics areas are identified: predictive medicine (to identify significant correlations within clinical data using statistical and artificial intelligence methods), along with pathway informatics and cellular simulations (that combine biological knowledge with advanced informatics to elucidate molecular disease pathology). Initial predictive models have been developed for a pilot study in Huntington's disease. An initial bioinformatics platform has been developed for the reconstruction and analysis of pathways, and work has begun on pathway simulation. A bioinformatics research program has been established at GE Global Research Center as an important technology toward next generation medical diagnostics. We anticipate that 21 st century medical research will be a combination of informatics tools with traditional biology wet lab research, and that this will translate to increased use of informatics techniques in the clinic.

  17. A bioinformatics prediction approach towards analyzing the glycosylation, co-expression and interaction patterns of epithelial membrane antigen (EMA/MUC1)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kalra, Rajkumar S., E-mail: renu-wadhwa@aist.go.jp; Wadhwa, Renu, E-mail: renu-wadhwa@aist.go.jp

    2015-02-27

    Epithelial membrane antigen (EMA or MUC1) is a heavily glycosylated, type I transmembrane glycoprotein commonly expressed by epithelial cells of duct organs. It has been shown to be aberrantly glycosylated in several diseases including cancer. Protein sequence based annotation and analysis of glycosylation profile of glycoproteins by robust computational and comprehensive algorithms provides possible insights to the mechanism(s) of anomalous glycosylation. In present report, by using a number of bioinformatics applications we studied EMA/MUC1 and explored its trans-membrane structural domain sequence that is widely subjected to glycosylation. Exploration of different extracellular motifs led to prediction of N and O-linked glycosylationmore » target sites. Based on the putative O-linked target sites, glycosylated moieties and pathways were envisaged. Furthermore, Protein network analysis demonstrated physical interaction of EMA with a number of proteins and confirmed its functional involvement in cell growth and proliferation pathways. Gene Ontology analysis suggested an involvement of EMA in a number of functions including signal transduction, protein binding, processing and transport along with glycosylation. Thus, present study explored potential of bioinformatics prediction approach in analyzing glycosylation, co-expression and interaction patterns of EMA/MUC1 glycoprotein.« less

  18. Scalable web services for the PSIPRED Protein Analysis Workbench.

    PubMed

    Buchan, Daniel W A; Minneci, Federico; Nugent, Tim C O; Bryson, Kevin; Jones, David T

    2013-07-01

    Here, we present the new UCL Bioinformatics Group's PSIPRED Protein Analysis Workbench. The Workbench unites all of our previously available analysis methods into a single web-based framework. The new web portal provides a greatly streamlined user interface with a number of new features to allow users to better explore their results. We offer a number of additional services to enable computationally scalable execution of our prediction methods; these include SOAP and XML-RPC web server access and new HADOOP packages. All software and services are available via the UCL Bioinformatics Group website at http://bioinf.cs.ucl.ac.uk/.

  19. Bioinformatics and the allergy assessment of agricultural biotechnology products: industry practices and recommendations.

    PubMed

    Ladics, Gregory S; Cressman, Robert F; Herouet-Guicheney, Corinne; Herman, Rod A; Privalle, Laura; Song, Ping; Ward, Jason M; McClain, Scott

    2011-06-01

    Bioinformatic tools are being increasingly utilized to evaluate the degree of similarity between a novel protein and known allergens within the context of a larger allergy safety assessment process. Importantly, bioinformatics is not a predictive analysis that can determine if a novel protein will ''become" an allergen, but rather a tool to assess whether the protein is a known allergen or is potentially cross-reactive with an existing allergen. Bioinformatic tools are key components of the 2009 CodexAlimentarius Commission's weight-of-evidence approach, which encompasses a variety of experimental approaches for an overall assessment of the allergenic potential of a novel protein. Bioinformatic search comparisons between novel protein sequences, as well as potential novel fusion sequences derived from the genome and transgene, and known allergens are required by all regulatory agencies that assess the safety of genetically modified (GM) products. The objective of this paper is to identify opportunities for consensus in the methods of applying bioinformatics and to outline differences that impact a consistent and reliable allergy safety assessment. The bioinformatic comparison process has some critical features, which are outlined in this paper. One of them is a curated, publicly available and well-managed database with known allergenic sequences. In this paper, the best practices, scientific value, and food safety implications of bioinformatic analyses, as they are applied to GM food crops are discussed. Recommendations for conducting bioinformatic analysis on novel food proteins for potential cross-reactivity to known allergens are also put forth. Copyright © 2011 Elsevier Inc. All rights reserved.

  20. Robust enzyme design: bioinformatic tools for improved protein stability.

    PubMed

    Suplatov, Dmitry; Voevodin, Vladimir; Švedas, Vytas

    2015-03-01

    The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. [Bioinformatics analysis of mosquito densovirus nostructure protein NS1].

    PubMed

    Dong, Yun-qiao; Ma, Wen-li; Gu, Jin-bao; Zheng, Wen-ling

    2009-12-01

    To analyze and predict the structure and function of mosquito densovirus (MDV) nostructual protein1 (NS1). Using different bioinformatics software, the EXPASY pmtparam tool, ClustalX1.83, Bioedit, MEGA3.1, ScanProsite, and Motifscan, respectively to comparatively analyze and predict the physic-chemical parameters, homology, evolutionary relation, secondary structure and main functional motifs of NS1. MDV NS1 protein was a unstable hydrophilic protein and the amino acid sequence was highly conserved which had a relatively closer evolutionary distance with infectious hypodermal and hematopoietic necrosis virus (IHHNV). MDV NS1 has a specific domain of superfamily 3 helicase of small DNA viruses. This domain contains the NTP-binding region with a metal ion-dependent ATPase activity. A virus replication roller rolling-circle replication(RCR) initiation domain was found near the N terminal of this protein. This protien has the biological function of single stranded incision enzyme. The bioinformatics prediction results suggest that MDV NS1 protein plays a key role in viral replication, packaging, and the other stages of viral life.

  2. MLH1 Promoter Methylation and Prediction/Prognosis of Gastric Cancer: A Systematic Review and Meta and Bioinformatic Analysis.

    PubMed

    Shen, Shixuan; Chen, Xiaohui; Li, Hao; Sun, Liping; Yuan, Yuan

    2018-01-01

    Background: The promoter methylation of MLH1 gene and gastric cancer (GC)has been investigated previously. To get a more credible conclusion, we performed a systematic review and meta and bioinformatic analysis to clarify the role of MLH1 methylation in the prediction and prognosis of GC. Methods: Eligible studies were targeted after searching the PubMed, Web of Science, Embase, BIOSIS, CNKI and Wanfang Data to collect the information of MLH1 methylation and GC. The link strength between the two was estimated by odds ratio with its 95% confidence interval. The Newcastle-Ottawa scale was used for quantity assessment . Subgroup and sensitivity analysis were conducted to explore sources of heterogeneity. The Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) were employed for bioinformatics analysis on the correlation between MLH1 methylation and GC risk, clinicopathological behavior as well as prognosis. Results: 2365 GC and 1563 controls were included in the meta-analysis. The pooled OR of MLH1 methylation in GC was 4.895 (95% CI: 3.149-7.611, P<0.001), which considerably associated with increased GC risk. No significant difference was found in relation to Lauren classification, tumor invasion, lymph node/distant metastasis and tumor stage in GC. Analysis based on GEO and TCGA showed that high MLH1 methylation enhanced GC risk but might not related with GC clinicopathological features and prognosis. Conclusion: MLH1 methylation is an alive biomarker for the prediction of GC and it might not affect GC behavior. Further study could be conducted to verify the impact of MLH1 methylation on GC prognosis.

  3. MLH1 Promoter Methylation and Prediction/Prognosis of Gastric Cancer: A Systematic Review and Meta and Bioinformatic Analysis

    PubMed Central

    Shen, Shixuan; Chen, Xiaohui; Li, Hao; Sun, Liping; Yuan, Yuan

    2018-01-01

    Background: The promoter methylation of MLH1 gene and gastric cancer (GC)has been investigated previously. To get a more credible conclusion, we performed a systematic review and meta and bioinformatic analysis to clarify the role of MLH1 methylation in the prediction and prognosis of GC. Methods: Eligible studies were targeted after searching the PubMed, Web of Science, Embase, BIOSIS, CNKI and Wanfang Data to collect the information of MLH1 methylation and GC. The link strength between the two was estimated by odds ratio with its 95% confidence interval. The Newcastle-Ottawa scale was used for quantity assessment. Subgroup and sensitivity analysis were conducted to explore sources of heterogeneity. The Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) were employed for bioinformatics analysis on the correlation between MLH1 methylation and GC risk, clinicopathological behavior as well as prognosis. Results: 2365 GC and 1563 controls were included in the meta-analysis. The pooled OR of MLH1 methylation in GC was 4.895 (95% CI: 3.149-7.611, P<0.001), which considerably associated with increased GC risk. No significant difference was found in relation to Lauren classification, tumor invasion, lymph node/distant metastasis and tumor stage in GC. Analysis based on GEO and TCGA showed that high MLH1 methylation enhanced GC risk but might not related with GC clinicopathological features and prognosis. Conclusion: MLH1 methylation is an alive biomarker for the prediction of GC and it might not affect GC behavior. Further study could be conducted to verify the impact of MLH1 methylation on GC prognosis. PMID:29896277

  4. Special Focus

    PubMed Central

    Nawrocki, Eric P.; Burge, Sarah W.

    2013-01-01

    The development of RNA bioinformatic tools began more than 30 y ago with the description of the Nussinov and Zuker dynamic programming algorithms for single sequence RNA secondary structure prediction. Since then, many tools have been developed for various RNA sequence analysis problems such as homology search, multiple sequence alignment, de novo RNA discovery, read-mapping, and many more. In this issue, we have collected a sampling of reviews and original research that demonstrate some of the many ways bioinformatics is integrated with current RNA biology research. PMID:23948768

  5. Accuracy of different bioinformatics methods in detecting antibiotic resistance and virulence factors from Staphylococcus aureus whole genome sequences.

    PubMed

    Mason, Amy; Foster, Dona; Bradley, Phelim; Golubchik, Tanya; Doumith, Michel; Gordon, N Claire; Pichon, Bruno; Iqbal, Zamin; Staves, Peter; Crook, Derrick; Walker, A Sarah; Kearns, Angela; Peto, Tim

    2018-06-20

    Background : In principle, whole genome sequencing (WGS) can predict phenotypic resistance directly from genotype, replacing laboratory-based tests. However, the contribution of different bioinformatics methods to genotype-phenotype discrepancies has not been systematically explored to date. Methods : We compared three WGS-based bioinformatics methods (Genefinder (read-based), Mykrobe (de Bruijn graph-based) and Typewriter (BLAST-based)) for predicting presence/absence of 83 different resistance determinants and virulence genes, and overall antimicrobial susceptibility, in 1379 Staphylococcus aureus isolates previously characterised by standard laboratory methods (disc diffusion, broth and/or agar dilution and PCR). Results : 99.5% (113830/114457) of individual resistance-determinant/virulence gene predictions were identical between all three methods, with only 627 (0.5%) discordant predictions, demonstrating high overall agreement (Fliess-Kappa=0.98, p<0.0001). Discrepancies when identified were in only one of the three methods for all genes except the cassette recombinase, ccrC(b ). Genotypic antimicrobial susceptibility prediction matched laboratory phenotype in 98.3% (14224/14464) cases (2720 (18.8%) resistant, 11504 (79.5%) susceptible). There was greater disagreement between the laboratory phenotypes and the combined genotypic predictions (97 (0.7%) phenotypically-susceptible but all bioinformatic methods reported resistance; 89 (0.6%) phenotypically-resistant, but all bioinformatics methods reported susceptible) than within the three bioinformatics methods (54 (0.4%) cases, 16 phenotypically-resistant, 38 phenotypically-susceptible). However, in 36/54 (67%), the consensus genotype matched the laboratory phenotype. Conclusions : In this study, the choice between these three specific bioinformatic methods to identify resistance-determinants or other genes in S. aureus did not prove critical, with all demonstrating high concordance with each other and phenotypic/molecular methods. However, each has some limitations and therefore consensus methods provide some assurance. Copyright © 2018 American Society for Microbiology.

  6. Analysis of functional redundancies within the Arabidopsis TCP transcription factor family.

    PubMed

    Danisman, Selahattin; van Dijk, Aalt D J; Bimbo, Andrea; van der Wal, Froukje; Hennig, Lars; de Folter, Stefan; Angenent, Gerco C; Immink, Richard G H

    2013-12-01

    Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein-protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein-protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family.

  7. Analysis of functional redundancies within the Arabidopsis TCP transcription factor family

    PubMed Central

    Danisman, Selahattin; de Folter, Stefan; Immink, Richard G. H.

    2013-01-01

    Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein–protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein–protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family. PMID:24129704

  8. Influenza Research Database: An integrated bioinformatics resource for influenza virus research.

    PubMed

    Zhang, Yun; Aevermann, Brian D; Anderson, Tavis K; Burke, David F; Dauphin, Gwenaelle; Gu, Zhiping; He, Sherry; Kumar, Sanjeev; Larsen, Christopher N; Lee, Alexandra J; Li, Xiaomei; Macken, Catherine; Mahaffey, Colin; Pickett, Brett E; Reardon, Brian; Smith, Thomas; Stewart, Lucy; Suloway, Christian; Sun, Guangyu; Tong, Lei; Vincent, Amy L; Walters, Bryan; Zaremba, Sam; Zhao, Hongtao; Zhou, Liwei; Zmasek, Christian; Klem, Edward B; Scheuermann, Richard H

    2017-01-04

    The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics and therapeutics against influenza virus by providing a comprehensive collection of influenza-related data integrated from various sources, a growing suite of analysis and visualization tools for data mining and hypothesis generation, personal workbench spaces for data storage and sharing, and active user community support. Here, we describe the recent improvements in IRD including the use of cloud and high performance computing resources, analysis and visualization of user-provided sequence data with associated metadata, predictions of novel variant proteins, annotations of phenotype-associated sequence markers and their predicted phenotypic effects, hemagglutinin (HA) clade classifications, an automated tool for HA subtype numbering conversion, linkouts to disease event data and the addition of host factor and antiviral drug components. All data and tools are freely available without restriction from the IRD website at https://www.fludb.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Bioinformatics Identification of Modules of Transcription Factor Binding Sites in Alzheimer's Disease-Related Genes by In Silico Promoter Analysis and Microarrays

    PubMed Central

    Augustin, Regina; Lichtenthaler, Stefan F.; Greeff, Michael; Hansen, Jens; Wurst, Wolfgang; Trümbach, Dietrich

    2011-01-01

    The molecular mechanisms and genetic risk factors underlying Alzheimer's disease (AD) pathogenesis are only partly understood. To identify new factors, which may contribute to AD, different approaches are taken including proteomics, genetics, and functional genomics. Here, we used a bioinformatics approach and found that distinct AD-related genes share modules of transcription factor binding sites, suggesting a transcriptional coregulation. To detect additional coregulated genes, which may potentially contribute to AD, we established a new bioinformatics workflow with known multivariate methods like support vector machines, biclustering, and predicted transcription factor binding site modules by using in silico analysis and over 400 expression arrays from human and mouse. Two significant modules are composed of three transcription factor families: CTCF, SP1F, and EGRF/ZBPF, which are conserved between human and mouse APP promoter sequences. The specific combination of in silico promoter and multivariate analysis can identify regulation mechanisms of genes involved in multifactorial diseases. PMID:21559189

  10. Agonist Binding to Chemosensory Receptors: A Systematic Bioinformatics Analysis

    PubMed Central

    Fierro, Fabrizio; Suku, Eda; Alfonso-Prieto, Mercedes; Giorgetti, Alejandro; Cichon, Sven; Carloni, Paolo

    2017-01-01

    Human G-protein coupled receptors (hGPCRs) constitute a large and highly pharmaceutically relevant membrane receptor superfamily. About half of the hGPCRs' family members are chemosensory receptors, involved in bitter taste and olfaction, along with a variety of other physiological processes. Hence these receptors constitute promising targets for pharmaceutical intervention. Molecular modeling has been so far the most important tool to get insights on agonist binding and receptor activation. Here we investigate both aspects by bioinformatics-based predictions across all bitter taste and odorant receptors for which site-directed mutagenesis data are available. First, we observe that state-of-the-art homology modeling combined with previously used docking procedures turned out to reproduce only a limited fraction of ligand/receptor interactions inferred by experiments. This is most probably caused by the low sequence identity with available structural templates, which limits the accuracy of the protein model and in particular of the side-chains' orientations. Methods which transcend the limited sampling of the conformational space of docking may improve the predictions. As an example corroborating this, we review here multi-scale simulations from our lab and show that, for the three complexes studied so far, they significantly enhance the predictive power of the computational approach. Second, our bioinformatics analysis provides support to previous claims that several residues, including those at positions 1.50, 2.50, and 7.52, are involved in receptor activation. PMID:28932739

  11. The Topology Prediction of Membrane Proteins: A Web-Based Tutorial.

    PubMed

    Kandemir-Cavas, Cagin; Cavas, Levent; Alyuruk, Hakan

    2018-06-01

    There is a great need for development of educational materials on the transfer of current bioinformatics knowledge to undergraduate students in bioscience departments. In this study, it is aimed to prepare an example in silico laboratory tutorial on the topology prediction of membrane proteins by bioinformatics tools. This laboratory tutorial is prepared for biochemistry lessons at bioscience departments (biology, chemistry, biochemistry, molecular biology and genetics, and faculty of medicine). The tutorial is intended for students who have not taken a bioinformatics course yet or already have taken a course as an introduction to bioinformatics. The tutorial is based on step-by-step explanations with illustrations. It can be applied under supervision of an instructor in the lessons, or it can be used as a self-study guide by students. In the tutorial, membrane-spanning regions and α-helices of membrane proteins were predicted by internet-based bioinformatics tools. According to the results achieved from internet-based bioinformatics tools, the algorithms and parameters used were effective on the accuracy of prediction. The importance of this laboratory tutorial lies on the facts that it provides an introduction to the bioinformatics and that it also demonstrates an in silico laboratory application to the students at natural sciences. The presented example education material is applicable easily at all departments that have internet connection. This study presents an alternative education material to the students in biochemistry laboratories in addition to classical laboratory experiments.

  12. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

    PubMed

    Huang, Ying; Chen, Shi-Yi; Deng, Feilong

    2016-01-01

    In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have largely facilitated our understanding on a variety of biological questions. Although the computational prediction of protein-coding genes has already been well-established, we are also facing challenges to robustly find the non-coding RNA genes, such as miRNA and lncRNA. Two main aspects of ab initio gene prediction include the computed values for describing sequence features and used algorithm for training the discriminant function, and by which different combinations are employed into various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and applications to ab initio gene prediction. The main purpose of this article is to provide an overview to beginners who aim to develop the related bioinformatic tools.

  13. Small-angle X-Ray analysis of macromolecular structure: the structure of protein NS2 (NEP) in solution

    NASA Astrophysics Data System (ADS)

    Shtykova, E. V.; Bogacheva, E. N.; Dadinova, L. A.; Jeffries, C. M.; Fedorova, N. V.; Golovko, A. O.; Baratova, L. A.; Batishchev, O. V.

    2017-11-01

    A complex structural analysis of nuclear export protein NS2 (NEP) of influenza virus A has been performed using bioinformatics predictive methods and small-angle X-ray scattering data. The behavior of NEP molecules in a solution (their aggregation, oligomerization, and dissociation, depending on the buffer composition) has been investigated. It was shown that stable associates are formed even in a conventional aqueous salt solution at physiological pH value. For the first time we have managed to get NEP dimers in solution, to analyze their structure, and to compare the models obtained using the method of the molecular tectonics with the spatial protein structure predicted by us using the bioinformatics methods. The results of the study provide a new insight into the structural features of nuclear export protein NS2 (NEP) of the influenza virus A, which is very important for viral infection development.

  14. The construction and assessment of a statistical model for the prediction of protein assay data.

    PubMed

    Pittman, J; Sacks, J; Young, S Stanley

    2002-01-01

    The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparitive methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion a synopsis of the results of these experiments and their implications for the analysis of bioinformatic databases in general is presented.

  15. A comparative proteomic strategy for subcellular proteome research: ICAT approach coupled with bioinformatics prediction to ascertain rat liver mitochondrial proteins and indication of mitochondrial localization for catalase.

    PubMed

    Jiang, Xiao-Sheng; Dai, Jie; Sheng, Quan-Hu; Zhang, Lei; Xia, Qi-Chang; Wu, Jia-Rui; Zeng, Rong

    2005-01-01

    Subcellular proteomics, as an important step to functional proteomics, has been a focus in proteomic research. However, the co-purification of "contaminating" proteins has been the major problem in all the subcellular proteomic research including all kinds of mitochondrial proteome research. It is often difficult to conclude whether these "contaminants" represent true endogenous partners or artificial associations induced by cell disruption or incomplete purification. To solve such a problem, we applied a high-throughput comparative proteome experimental strategy, ICAT approach performed with two-dimensional LC-MS/MS analysis, coupled with combinational usage of different bioinformatics tools, to study the proteome of rat liver mitochondria prepared with traditional centrifugation (CM) or further purified with a Nycodenz gradient (PM). A total of 169 proteins were identified and quantified convincingly in the ICAT analysis, in which 90 proteins have an ICAT ratio of PM:CM>1.0, while another 79 proteins have an ICAT ratio of PM:CM<1.0. Almost all the proteins annotated as mitochondrial according to Swiss-Prot annotation, bioinformatics prediction, and literature reports have a ratio of PM:CM>1.0, while proteins annotated as extracellular or secreted, cytoplasmic, endoplasmic reticulum, ribosomal, and so on have a ratio of PM:CM<1.0. Catalase and AP endonuclease 1, which have been known as peroxisomal and nuclear, respectively, have shown a ratio of PM:CM>1.0, confirming the reports about their mitochondrial location. Moreover, the 125 proteins with subcellular location annotation have been used as a testing dataset to evaluate the efficiency for ascertaining mitochondrial proteins by ICAT analysis and the bioinformatics tools such as PSORT, TargetP, SubLoc, MitoProt, and Predotar. The results indicated that ICAT analysis coupled with combinational usage of different bioinformatics tools could effectively ascertain mitochondrial proteins and distinguish contaminant proteins and even multilocation proteins. Using such a strategy, many novel proteins, known proteins without subcellular location annotation, and even known proteins that have been annotated as other locations have been strongly indicated for their mitochondrial location.

  16. Bioinformatics analysis of the predicted polyprenol reductase genes in higher plants

    NASA Astrophysics Data System (ADS)

    Basyuni, M.; Wati, R.

    2018-03-01

    The present study evaluates the bioinformatics methods to analyze twenty-four predicted polyprenol reductase genes from higher plants on GenBank as well as predicted the structure, composition, similarity, subcellular localization, and phylogenetic. The physicochemical properties of plant polyprenol showed diversity among the observed genes. The percentage of the secondary structure of plant polyprenol genes followed the ratio order of α helix > random coil > extended chain structure. The values of chloroplast but not signal peptide were too low, indicated that few chloroplast transit peptide in plant polyprenol reductase genes. The possibility of the potential transit peptide showed variation among the plant polyprenol reductase, suggested the importance of understanding the variety of peptide components of plant polyprenol genes. To clarify this finding, a phylogenetic tree was drawn. The phylogenetic tree shows several branches in the tree, suggested that plant polyprenol reductase genes grouped into divergent clusters in the tree.

  17. A bioinformatics approach to identify patients with symptomatic peanut allergy using peptide microarray immunoassay

    PubMed Central

    Lin, Jing; Bruni, Francesca M.; Fu, Zhiyan; Maloney, Jennifer; Bardina, Ludmilla; Boner, Attilio L.; Gimenez, Gustavo; Sampson, Hugh A.

    2013-01-01

    Background Peanut allergy is relatively common, typically permanent, and often severe. Double-blind, placebo-controlled food challenge is considered the gold standard for the diagnosis of food allergy–related disorders. However, the complexity and potential of double-blind, placebo-controlled food challenge to cause life-threatening allergic reactions affects its clinical application. A laboratory test that could accurately diagnose symptomatic peanut allergy would greatly facilitate clinical practice. Objective We sought to develop an allergy diagnostic method that could correctly predict symptomatic peanut allergy by using peptide microarray immunoassays and bioinformatic methods. Methods Microarray immunoassays were performed by using the sera from 62 patients (31 with symptomatic peanut allergy and 31 who had outgrown their peanut allergy or were sensitized but were clinically tolerant to peanut). Specific IgE and IgG4 binding to 419 overlapping peptides (15 mers, 3 offset) covering the amino acid sequences of Ara h 1, Ara h 2, and Ara h 3 were measured by using a peptide microarray immunoassay. Bioinformatic methods were applied for data analysis. Results Individuals with peanut allergy showed significantly greater IgE binding and broader epitope diversity than did peanut-tolerant individuals. No significant difference in IgG4 binding was found between groups. By using machine learning methods, 4 peptide biomarkers were identified and prediction models that can predict the outcome of double-blind, placebo-controlled food challenges with high accuracy were developed by using a combination of the biomarkers. Conclusions In this study, we developed a novel diagnostic approach that can predict peanut allergy with high accuracy by combining the results of a peptide microarray immunoassay and bioinformatic methods. Further studies are needed to validate the efficacy of this assay in clinical practice. PMID:22444503

  18. Bioinformatics analysis and characteristics of VP23 encoded by the newly identified UL18 gene of duck enteritis virus

    NASA Astrophysics Data System (ADS)

    Chen, Xiwen; Cheng, Anchun; Wang, Mingshu; Xiang, Jun

    2011-10-01

    In this study, the predicted information about structures and functions of VP23 encoded by the newly identified DEV UL18 gene through bioinformatics softwares and tools. The DEV UL18 was predicted to encode a polypeptide with 322 amino acids, termed VP23, with a putative molecular mass of 35.250 kDa and a predicted isoelectric point (PI) of 8.37, no signal peptide and transmembrane domain in the polypeptide. The prediction of subcellular localization showed that the DEV-VP23 located at endoplasmic reticulum with 33.3%, mitochondrial with 22.2%, extracellular, including cell wall with 11.1%, vesicles of secretory system with 11.1%, Golgi with 11.1%, and plasma membrane with 11.1%. The acid sequence of analysis showed that the potential antigenic epitopes are situated in 45-47, 53-60, 102-105, 173-180, 185-189, 260-265, 267-271, and 292-299 amino acids. All the consequences inevitably provide some insights for further research about the DEV-VP23 and also provide a fundament for further study on the the new type clinical diagnosis of DEV and can be used for the development of new DEV vaccine.

  19. High-throughput protein analysis integrating bioinformatics and experimental assays

    PubMed Central

    del Val, Coral; Mehrle, Alexander; Falkenhahn, Mechthild; Seiler, Markus; Glatting, Karl-Heinz; Poustka, Annemarie; Suhai, Sandor; Wiemann, Stefan

    2004-01-01

    The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administrating relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins. PMID:14762202

  20. Prediction of the in planta Phakopsora pachyrhizi secretome and potential effector families.

    PubMed

    de Carvalho, Mayra C da C G; Costa Nascimento, Leandro; Darben, Luana M; Polizel-Podanosqui, Adriana M; Lopes-Caitar, Valéria S; Qi, Mingsheng; Rocha, Carolina S; Carazzolle, Marcelo Falsarella; Kuwahara, Márcia K; Pereira, Goncalo A G; Abdelnoor, Ricardo V; Whitham, Steven A; Marcelino-Guimarães, Francismar C

    2017-04-01

    Asian soybean rust (ASR), caused by the obligate biotrophic fungus Phakopsora pachyrhizi, can cause losses greater than 80%. Despite its economic importance, there is no soybean cultivar with durable ASR resistance. In addition, the P. pachyrhizi genome is not yet available. However, the availability of other rust genomes, as well as the development of sample enrichment strategies and bioinformatics tools, has improved our knowledge of the ASR secretome and its potential effectors. In this context, we used a combination of laser capture microdissection (LCM), RNAseq and a bioinformatics pipeline to identify a total of 36 350 P. pachyrhizi contigs expressed in planta and a predicted secretome of 851 proteins. Some of the predicted secreted proteins had characteristics of candidate effectors: small size, cysteine rich, do not contain PFAM domains (except those associated with pathogenicity) and strongly expressed in planta. A comparative analysis of the predicted secreted proteins present in Pucciniales species identified new members of soybean rust and new Pucciniales- or P. pachyrhizi-specific families (tribes). Members of some families were strongly up-regulated during early infection, starting with initial infection through haustorium formation. Effector candidates selected from two of these families were able to suppress immunity in transient assays, and were localized in the plant cytoplasm and nuclei. These experiments support our bioinformatics predictions and show that these families contain members that have functions consistent with P. pachyrhizi effectors. © 2016 BSPP AND JOHN WILEY & SONS LTD.

  1. Bioinformatics of cardiovascular miRNA biology.

    PubMed

    Kunz, Meik; Xiao, Ke; Liang, Chunguang; Viereck, Janika; Pachel, Christina; Frantz, Stefan; Thum, Thomas; Dandekar, Thomas

    2015-12-01

    MicroRNAs (miRNAs) are small ~22 nucleotide non-coding RNAs and are highly conserved among species. Moreover, miRNAs regulate gene expression of a large number of genes associated with important biological functions and signaling pathways. Recently, several miRNAs have been found to be associated with cardiovascular diseases. Thus, investigating the complex regulatory effect of miRNAs may lead to a better understanding of their functional role in the heart. To achieve this, bioinformatics approaches have to be coupled with validation and screening experiments to understand the complex interactions of miRNAs with the genome. This will boost the subsequent development of diagnostic markers and our understanding of the physiological and therapeutic role of miRNAs in cardiac remodeling. In this review, we focus on and explain different bioinformatics strategies and algorithms for the identification and analysis of miRNAs and their regulatory elements to better understand cardiac miRNA biology. Starting with the biogenesis of miRNAs, we present approaches such as LocARNA and miRBase for combining sequence and structure analysis including phylogenetic comparisons as well as detailed analysis of RNA folding patterns, functional target prediction, signaling pathway as well as functional analysis. We also show how far bioinformatics helps to tackle the unprecedented level of complexity and systemic effects by miRNA, underlining the strong therapeutic potential of miRNA and miRNA target structures in cardiovascular disease. In addition, we discuss drawbacks and limitations of bioinformatics algorithms and the necessity of experimental approaches for miRNA target identification. This article is part of a Special Issue entitled 'Non-coding RNAs'. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. Improved, ACMG-Compliant, in silico prediction of pathogenicity for missense substitutions encoded by TP53 variants.

    PubMed

    Fortuno, Cristina; James, Paul A; Young, Erin L; Feng, Bing; Olivier, Magali; Pesaran, Tina; Tavtigian, Sean V; Spurdle, Amanda B

    2018-05-18

    Clinical interpretation of germline missense variants represents a major challenge, including those in the TP53 Li-Fraumeni syndrome gene. Bioinformatic prediction is a key part of variant classification strategies. We aimed to optimize the performance of the Align-GVGD tool used for p53 missense variant prediction, and compare its performance to other bioinformatic tools (SIFT, PolyPhen-2) and ensemble methods (REVEL, BayesDel). Reference sets of assumed pathogenic and assumed benign variants were defined using functional and/or clinical data. Area under the curve and Matthews correlation coefficient (MCC) values were used as objective functions to select an optimized protein multi-sequence alignment with best performance for Align-GVGD. MCC comparison of tools using binary categories showed optimized Align-GVGD (C15 cut-off) combined with BayesDel (0.16 cut-off), or with REVEL (0.5 cut-off), to have the best overall performance. Further, a semi-quantitative approach using multiple tiers of bioinformatic prediction, validated using an independent set of non-functional and functional variants, supported use of Align-GVGD and BayesDel prediction for different strength of evidence levels in ACMG/AMP rules. We provide rationale for bioinformatic tool selection for TP53 variant classification, and have also computed relevant bioinformatic predictions for every possible p53 missense variant to facilitate their use by the scientific and medical community. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.

  3. SU-D-204-06: Integration of Machine Learning and Bioinformatics Methods to Analyze Genome-Wide Association Study Data for Rectal Bleeding and Erectile Dysfunction Following Radiotherapy in Prostate Cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oh, J; Deasy, J; Kerns, S

    Purpose: We investigated whether integration of machine learning and bioinformatics techniques on genome-wide association study (GWAS) data can improve the performance of predictive models in predicting the risk of developing radiation-induced late rectal bleeding and erectile dysfunction in prostate cancer patients. Methods: We analyzed a GWAS dataset generated from 385 prostate cancer patients treated with radiotherapy. Using genotype information from these patients, we designed a machine learning-based predictive model of late radiation-induced toxicities: rectal bleeding and erectile dysfunction. The model building process was performed using 2/3 of samples (training) and the predictive model was tested with 1/3 of samples (validation).more » To identify important single nucleotide polymorphisms (SNPs), we computed the SNP importance score, resulting from our random forest regression model. We performed gene ontology (GO) enrichment analysis for nearby genes of the important SNPs. Results: After univariate analysis on the training dataset, we filtered out many SNPs with p>0.001, resulting in 749 and 367 SNPs that were used in the model building process for rectal bleeding and erectile dysfunction, respectively. On the validation dataset, our random forest regression model achieved the area under the curve (AUC)=0.70 and 0.62 for rectal bleeding and erectile dysfunction, respectively. We performed GO enrichment analysis for the top 25%, 50%, 75%, and 100% SNPs out of the select SNPs in the univariate analysis. When we used the top 50% SNPs, more plausible biological processes were obtained for both toxicities. An additional test with the top 50% SNPs improved predictive power with AUC=0.71 and 0.65 for rectal bleeding and erectile dysfunction. A better performance was achieved with AUC=0.67 when age and androgen deprivation therapy were added to the model for erectile dysfunction. Conclusion: Our approach that combines machine learning and bioinformatics techniques enabled designing better models and identifying more plausible biological processes associated with the outcomes.« less

  4. Bioinformatics approaches for structural and functional analysis of proteins in secondary metabolism in Withania somnifera.

    PubMed

    Sanchita; Singh, Swati; Sharma, Ashok

    2014-11-01

    Withania somnifera (Ashwagandha) is an affluent storehouse of large number of pharmacologically active secondary metabolites known as withanolides. These secondary metabolites are produced by withanolide biosynthetic pathway. Very less information is available on structural and functional aspects of enzymes involved in withanolides biosynthetic pathways of Withiana somnifera. We therefore performed a bioinformatics analysis to look at functional and structural properties of these important enzymes. The pathway enzymes taken for this study were 3-Hydroxy-3-methylglutaryl coenzyme A reductase, 1-Deoxy-D-xylulose-5-phosphate synthase, 1-Deoxy-D-xylulose-5-phosphate reductase, farnesyl pyrophosphate synthase, squalene synthase, squalene epoxidase, and cycloartenol synthase. The prediction of secondary structure was performed for basic structural information. Three-dimensional structures for these enzymes were predicted. The physico-chemical properties such as pI, AI, GRAVY and instability index were also studied. The current information will provide a platform to know the structural attributes responsible for the function of these protein until experimental structures become available.

  5. Bioinformatics analysis reveals biophysical and evolutionary insights into the 3-nitrotyrosine post-translational modification in the human proteome

    PubMed Central

    Ng, John Y.; Boelen, Lies; Wong, Jason W. H.

    2013-01-01

    Protein 3-nitrotyrosine is a post-translational modification that commonly arises from the nitration of tyrosine residues. This modification has been detected under a wide range of pathological conditions and has been shown to alter protein function. Whether 3-nitrotyrosine is important in normal cellular processes or is likely to affect specific biological pathways remains unclear. Using GPS-YNO2, a recently described 3-nitrotyrosine prediction algorithm, a set of predictions for nitrated residues in the human proteome was generated. In total, 9.27 per cent of the proteome was predicted to be nitratable (27 922/301 091). By matching the predictions against a set of curated and experimentally validated 3-nitrotyrosine sites in human proteins, it was found that GPS-YNO2 is able to predict 73.1 per cent (404/553) of these sites. Furthermore, of these sites, 42 have been shown to be nitrated endogenously, with 85.7 per cent (36/42) of these predicted to be nitrated. This demonstrates the feasibility of using the predicted dataset for a whole proteome analysis. A comprehensive bioinformatics analysis was subsequently performed on predicted and all experimentally validated nitrated tyrosine. This found mild but specific biophysical constraints that affect the susceptibility of tyrosine to nitration, and these may play a role in increasing the likelihood of 3-nitrotyrosine to affect processes, including phosphorylation and DNA binding. Furthermore, examining the evolutionary conservation of predicted 3-nitrotyrosine showed that, relative to non-nitrated tyrosine residues, 3-nitrotyrosine residues are generally less conserved. This suggests that, at least in the majority of cases, 3-nitrotyrosine is likely to have a deleterious effect on protein function and less likely to be important in normal cellular function. PMID:23389939

  6. eMolTox: prediction of molecular toxicity with confidence.

    PubMed

    Ji, Changge; Svensson, Fredrik; Zoufir, Azedine; Bender, Andreas

    2018-03-07

    In this work we present eMolTox, a web server for the prediction of potential toxicity associated with a given molecule. 174 toxicology-related in vitro/vivo experimental datasets were used for model construction and Mondrian conformal prediction was used to estimate the confidence of the resulting predictions. Toxic substructure analysis is also implemented in eMolTox. eMolTox predicts and displays a wealth of information of potential molecular toxicities for safety analysis in drug development. The eMolTox Server is freely available for use on the web at http://xundrug.cn/moltox. chicago.ji@gmail.com or ab454@cam.ac.uk. Supplementary data are available at Bioinformatics online.

  7. IN SILICO APPROACHES TO MECHANISTIC AND PREDICTIVE TOXICOLOGY: AN INTRODUCTION TO BIOINFORMATICS FOR TOXICOLOGISTS. (R827402)

    EPA Science Inventory

    Abstract

    Bioinformatics, or in silico biology, is a rapidly growing field that encompasses the theory and application of computational approaches to model, predict, and explain biological function at the molecular level. This information rich field requires new ...

  8. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology.

    PubMed

    Cock, Peter J A; Grüning, Björn A; Paszkiewicz, Konrad; Pritchard, Leighton

    2013-01-01

    The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).

  9. In the loop: promoter–enhancer interactions and bioinformatics

    PubMed Central

    Mora, Antonio; Sandve, Geir Kjetil; Gabrielsen, Odd Stokke

    2016-01-01

    Enhancer–promoter regulation is a fundamental mechanism underlying differential transcriptional regulation. Spatial chromatin organization brings remote enhancers in contact with target promoters in cis to regulate gene expression. There is considerable evidence for promoter–enhancer interactions (PEIs). In the recent years, genome-wide analyses have identified signatures and mapped novel enhancers; however, being able to precisely identify their target gene(s) requires massive biological and bioinformatics efforts. In this review, we give a short overview of the chromatin landscape and transcriptional regulation. We discuss some key concepts and problems related to chromatin interaction detection technologies, and emerging knowledge from genome-wide chromatin interaction data sets. Then, we critically review different types of bioinformatics analysis methods and tools related to representation and visualization of PEI data, raw data processing and PEI prediction. Lastly, we provide specific examples of how PEIs have been used to elucidate a functional role of non-coding single-nucleotide polymorphisms. The topic is at the forefront of epigenetic research, and by highlighting some future bioinformatics challenges in the field, this review provides a comprehensive background for future PEI studies. PMID:26586731

  10. Compare the Difference of B-cell Epitopes of EgAgB1 and EgAgB3 Proteins Selected through Bioinformatic Analysis

    NASA Astrophysics Data System (ADS)

    An, Mengting; Zhang, Fengbo; Zhu, Yuejie; Zhao, Xiao; Ding, Jianbing

    2018-01-01

    Cystic echinococcosis, as a zoonosis, seriously endangers humans and animals, so early diagnosis of this disease is particularly important. Therefore, this study is to predict B-cell epitopes of EgAgB1 and EgAgB3 proteins by bioinformatics software. B-cell epitopes of EgAgB1 and EgAgB3 proteins are predicted using DNAStar and IEDB software. The results suggest that there are two potential B-cell epitopes in EgAgB1, which located in the 8-15 and 31-37 amino acid residue segments. And two potential B-cell epitopes in EgAgB2, located in the 20∼27 and 47∼53 amino acid residue segments. This study predicted the B-cell epitopes of EgAgB1 and EgAgB3 proteins, which laid the foundation for the early diagnosis of Cystic echinococcosis.

  11. Genotypic analysis of human immunodeficiency virus type 1 env V3 loop sequences: bioinformatics prediction of coreceptor usage among 28 infected mother-infant pairs in a drug-naive population.

    PubMed

    Duri, Kerina; Soko, White; Gumbo, Felicity; Kristiansen, Knut; Mapingure, Munyaradzi; Stray-Pedersen, Babill; Muller, Fredrik

    2011-04-01

    We sought to predict virus coreceptor utilization using a simple bioinformatics method based on genotypic analysis of human immunodeficiency virus types 1 (HIV-1) env V3 loop sequences of 28 infected but drug-naive women during pregnancy and their infected infants and to better understand coreceptor usage in vertical transmission dynamics. The HIV-1 env V3 loop was sequenced from plasma samples and analyzed for viral coreceptor usage and subtype in a cohort of HIV-1-infected pregnant women. Predicted maternal frequencies of the X4, R5X4, and R5 genotypes were 7%, 11%, and 82%, respectively. Antenatal plasma viral load was higher, with a mean log(10) (SD) of 4.8 (1.6) and 3.6 (1.2) for women with the X4 and R5 genotypes, respectively, p = 0.078. Amino acid substitution from the conserved V3 loop crown motif GPGQ to GPGR and lymphadenopathy were associated with the X4 genotype, p = 0.031 and 0.043, respectively. The maternal viral coreceptor genotype was generally preserved in vertical transmission and was predictive of the newborn's viral genotype. Infants born to mothers with X4 genotypes were more likely to have lower birth weights relative to those born to mothers with the R5 genotype, with a mean weight (SD) of 2870 (±332) and 3069 (±300) g, respectively. These data show that at least in HIV-1 subtype C, R5 coreceptor usage is the most predominant genotype, which is generally preserved following vertical transmission and is associated with the V3 GPGQ crown motif. Therefore, antiretroviral-naive pregnant women and their infants can benefit from ARV combination therapies that include R5 entry inhibitors following prediction of their coreceptor genotype using simple bioinformatics methods.

  12. Ready to use bioinformatics analysis as a tool to predict immobilisation strategies for protein direct electron transfer (DET).

    PubMed

    Cazelles, R; Lalaoui, N; Hartmann, T; Leimkühler, S; Wollenberger, U; Antonietti, M; Cosnier, S

    2016-11-15

    Direct electron transfer (DET) to proteins is of considerable interest for the development of biosensors and bioelectrocatalysts. While protein structure is mainly used as a method of attaching the protein to the electrode surface, we employed bioinformatics analysis to predict the suitable orientation of the enzymes to promote DET. Structure similarity and secondary structure prediction were combined underlying localized amino-acids able to direct one of the enzyme's electron relays toward the electrode surface by creating a suitable bioelectrocatalytic nanostructure. The electro-polymerization of pyrene pyrrole onto a fluorine-doped tin oxide (FTO) electrode allowed the targeted orientation of the formate dehydrogenase enzyme from Rhodobacter capsulatus (RcFDH) by means of hydrophobic interactions. Its electron relays were directed to the FTO surface, thus promoting DET. The reduction of nicotinamide adenine dinucleotide (NAD(+)) generating a maximum current density of 1μAcm(-2) with 10mM NAD(+) leads to a turnover number of 0.09electron/s/molRcFDH. This work represents a practical approach to evaluate electrode surface modification strategies in order to create valuable bioelectrocatalysts. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.

    PubMed

    Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M

    2009-07-01

    Next-generation sequencing allows now the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand of bioinformatics tools to cope with the several gigabytes of sequence data generated in each single deep-sequencing experiment. Given this scene, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and its copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.

  14. Cloning and bioinformatic analysis of lovastatin biosynthesis regulatory gene lovE.

    PubMed

    Huang, Xin; Li, Hao-ming

    2009-08-05

    Lovastatin is an effective drug for treatment of hyperlipidemia. This study aimed to clone lovastatin biosynthesis regulatory gene lovE and analyze the structure and function of its encoding protein. According to the lovastatin synthase gene sequence from genebank, primers were designed to amplify and clone the lovastatin biosynthesis regulatory gene lovE from Aspergillus terrus genomic DNA. Bioinformatic analysis of lovE and its encoding animo acid sequence was performed through internet resources and software like DNAMAN. Target fragment lovE, almost 1500 bp in length, was amplified from Aspergillus terrus genomic DNA and the secondary and three-dimensional structures of LovE protein were predicted. In the lovastatin biosynthesis process lovE is a regulatory gene and LovE protein is a GAL4-like transcriptional factor.

  15. Bioinformatic Analysis of Strawberry GSTF12 Gene

    NASA Astrophysics Data System (ADS)

    Wang, Xiran; Jiang, Leiyu; Tang, Haoru

    2018-01-01

    GSTF12 has always been known as a key factor of proanthocyanins accumulate in plant testa. Through bioinformatics analysis of the nucleotide and encoded protein sequence of GSTF12, it is more advantageous to the study of genes related to anthocyanin biosynthesis accumulation pathway. Therefore, we chosen GSTF12 gene of 11 kinds species, downloaded their nucleotide and protein sequence from NCBI as the research object, found strawberry GSTF12 gene via bioinformation analyse, constructed phylogenetic tree. At the same time, we analysed the strawberry GSTF12 gene of physical and chemical properties and its protein structure and so on. The phylogenetic tree showed that Strawberry and petunia were closest relative. By the protein prediction, we found that the protein owed one proper signal peptide without obvious transmembrane regions.

  16. Cancer Bioinformatics for Updating Anticancer Drug Developments and Personalized Therapeutics.

    PubMed

    Lu, Da-Yong; Qu, Rong-Xin; Lu, Ting-Ren; Wu, Hong-Ying

    2017-01-01

    Last two to three decades, this world witnesses a rapid progress of biomarkers and bioinformatics technologies. Cancer bioinformatics is one of such important omics branches for experimental/clinical studies and applications. Same as other biological techniques or systems, bioinformatics techniques will be widely used. But they are presently not omni-potent. Despite great popularity and improvements, cancer bioinformatics has its own limitations and shortcomings at this stage of technical advancements. This article will offer a panorama of bioinformatics in cancer researches and clinical therapeutic applications-possible advantages and limitations relating to cancer therapeutics. A lot of beneficial capabilities and outcomes have been described. As a result, a successful new era for cancer bioinformatics is waiting for us if we can adhere on scientific studies of cancer bioinformatics in malignant- origin mining, medical verifications and clinical diagnostic applications. Cancer bioinformatics gave a great significance in disease diagnosis and therapeutic predictions. Many creative ideas and future perspectives are highlighted. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  17. Integration of biological data by kernels on graph nodes allows prediction of new genes involved in mitotic chromosome condensation

    PubMed Central

    Hériché, Jean-Karim; Lees, Jon G.; Morilla, Ian; Walter, Thomas; Petrova, Boryana; Roberti, M. Julia; Hossain, M. Julius; Adler, Priit; Fernández, José M.; Krallinger, Martin; Haering, Christian H.; Vilo, Jaak; Valencia, Alfonso; Ranea, Juan A.; Orengo, Christine; Ellenberg, Jan

    2014-01-01

    The advent of genome-wide RNA interference (RNAi)–based screens puts us in the position to identify genes for all functions human cells carry out. However, for many functions, assay complexity and cost make genome-scale knockdown experiments impossible. Methods to predict genes required for cell functions are therefore needed to focus RNAi screens from the whole genome on the most likely candidates. Although different bioinformatics tools for gene function prediction exist, they lack experimental validation and are therefore rarely used by experimentalists. To address this, we developed an effective computational gene selection strategy that represents public data about genes as graphs and then analyzes these graphs using kernels on graph nodes to predict functional relationships. To demonstrate its performance, we predicted human genes required for a poorly understood cellular function—mitotic chromosome condensation—and experimentally validated the top 100 candidates with a focused RNAi screen by automated microscopy. Quantitative analysis of the images demonstrated that the candidates were indeed strongly enriched in condensation genes, including the discovery of several new factors. By combining bioinformatics prediction with experimental validation, our study shows that kernels on graph nodes are powerful tools to integrate public biological data and predict genes involved in cellular functions of interest. PMID:24943848

  18. Bioinformatics tools in predictive ecology: applications to fisheries

    PubMed Central

    Tucker, Allan; Duplisea, Daniel

    2012-01-01

    There has been a huge effort in the advancement of analytical techniques for molecular biological data over the past decade. This has led to many novel algorithms that are specialized to deal with data associated with biological phenomena, such as gene expression and protein interactions. In contrast, ecological data analysis has remained focused to some degree on off-the-shelf statistical techniques though this is starting to change with the adoption of state-of-the-art methods, where few assumptions can be made about the data and a more explorative approach is required, for example, through the use of Bayesian networks. In this paper, some novel bioinformatics tools for microarray data are discussed along with their ‘crossover potential’ with an application to fisheries data. In particular, a focus is made on the development of models that identify functionally equivalent species in different fish communities with the aim of predicting functional collapse. PMID:22144390

  19. Bioinformatics tools in predictive ecology: applications to fisheries.

    PubMed

    Tucker, Allan; Duplisea, Daniel

    2012-01-19

    There has been a huge effort in the advancement of analytical techniques for molecular biological data over the past decade. This has led to many novel algorithms that are specialized to deal with data associated with biological phenomena, such as gene expression and protein interactions. In contrast, ecological data analysis has remained focused to some degree on off-the-shelf statistical techniques though this is starting to change with the adoption of state-of-the-art methods, where few assumptions can be made about the data and a more explorative approach is required, for example, through the use of Bayesian networks. In this paper, some novel bioinformatics tools for microarray data are discussed along with their 'crossover potential' with an application to fisheries data. In particular, a focus is made on the development of models that identify functionally equivalent species in different fish communities with the aim of predicting functional collapse.

  20. Bioinformatics analysis of the phytoene synthase gene in cabbage (Brassica oleracea var. capitata)

    NASA Astrophysics Data System (ADS)

    Sun, Bo; Jiang, Min; Xue, Shengling; Zheng, Aihong; Zhang, Fen; Tang, Haoru

    2018-04-01

    Phytoene Synthase (PSY) is an important enzyme in carotenoid biosynthesis. Here, the Brassica oleracea var. capitata PSY (BocPSY) gene sequences were obtained from Brassica database (BRAD), and preformed for bioinformatics analysis. The BocPSY1, BocPSY2 and BocPSY3 genes mapped to chromosomes 2,3 and 9, and contains an open reading frame of 1,248 bp, 1,266 bp and 1,275 bp that encodes a 415, 421, 424 amino acid protein, respectively. Subcellular localization predicted all BocPSY genes were in the chloroplast. The conserved domain of the BocPSY protein is PLN02632. Homology analysis indicates that the levels of identity among BocPSYs were all more than 85%, and the PSY protein is apparently conserved during plant evolution. The findings of the present study provide a molecular basis for the elucidation of PSY gene function in cabbage.

  1. SoMART, a web server for miRNA, tasiRNA and target gene analysis in Solanaceae plants

    USDA-ARS?s Scientific Manuscript database

    Plant micro(mi)RNAs and trans-acting small interfering (tasi)RNAs mediate posttranscriptional silencing of genes and play important roles in a variety of biological processes. Although bioinformatics prediction and small (s)RNA cloning are the key approaches used for identification of miRNAs, tasiRN...

  2. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology

    PubMed Central

    Grüning, Björn A.; Paszkiewicz, Konrad; Pritchard, Leighton

    2013-01-01

    The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of “effector” proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen’s predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu). PMID:24109552

  3. Quantitative proteome-based systematic identification of SIRT7 substrates.

    PubMed

    Zhang, Chaohua; Zhai, Zichao; Tang, Ming; Cheng, Zhongyi; Li, Tingting; Wang, Haiying; Zhu, Wei-Guo

    2017-07-01

    SIRT7 is a class III histone deacetylase that is involved in numerous cellular processes. Only six substrates of SIRT7 have been reported thus far, so we aimed to systematically identify SIRT7 substrates using stable-isotope labeling with amino acids in cell culture (SILAC) coupled with quantitative mass spectrometry (MS). Using SIRT7 +/+ and SIRT7 -/- mouse embryonic fibroblasts as our model system, we identified and quantified 1493 acetylation sites in 789 proteins, of which 261 acetylation sites in 176 proteins showed ≥2-fold change in acetylation state between SIRT7 -/- and SIRT7 +/+ cells. These proteins were considered putative SIRT7 substrates and were carried forward for further analysis. We then validated the predictive efficiency of the SILAC-MS experiment by assessing substrate acetylation status in vitro in six predicted proteins. We also performed a bioinformatic analysis of the MS data, which indicated that many of the putative protein substrates were involved in metabolic processes. Finally, we expanded our list of candidate substrates by performing a bioinformatics-based prediction analysis of putative SIRT7 substrates, using our list of putative substrates as a positive training set, and again validated a subset of the proteins in vitro. In summary, we have generated a comprehensive list of SIRT7 candidate substrates. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Survey of Natural Language Processing Techniques in Bioinformatics.

    PubMed

    Zeng, Zhiqiang; Shi, Hua; Wu, Yun; Hong, Zhiling

    2015-01-01

    Informatics methods, such as text mining and natural language processing, are always involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we aim to search for knowledge on biology, retrieve references using text mining methods, and reconstruct databases. For example, protein-protein interactions and gene-disease relationship can be mined from PubMed. Then, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function, detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

  5. Analysis of microRNA profile of Anopheles sinensis by deep sequencing and bioinformatic approaches.

    PubMed

    Feng, Xinyu; Zhou, Xiaojian; Zhou, Shuisen; Wang, Jingwen; Hu, Wei

    2018-03-12

    microRNAs (miRNAs) are small non-coding RNAs widely identified in many mosquitoes. They are reported to play important roles in development, differentiation and innate immunity. However, miRNAs in Anopheles sinensis, one of the Chinese malaria mosquitoes, remain largely unknown. We investigated the global miRNA expression profile of An. sinensis using Illumina Hiseq 2000 sequencing. Meanwhile, we applied a bioinformatic approach to identify potential miRNAs in An. sinensis. The identified miRNA profiles were compared and analyzed by two approaches. The selected miRNAs from the sequencing result and the bioinformatic approach were confirmed with qRT-PCR. Moreover, target prediction, GO annotation and pathway analysis were carried out to understand the role of miRNAs in An. sinensis. We identified 49 conserved miRNAs and 12 novel miRNAs by next-generation high-throughput sequencing technology. In contrast, 43 miRNAs were predicted by the bioinformatic approach, of which two were assigned as novel. Comparative analysis of miRNA profiles by two approaches showed that 21 miRNAs were shared between them. Twelve novel miRNAs did not match any known miRNAs of any organism, indicating that they are possibly species-specific. Forty miRNAs were found in many mosquito species, indicating that these miRNAs are evolutionally conserved and may have critical roles in the process of life. Both the selected known and novel miRNAs (asi-miR-281, asi-miR-184, asi-miR-14, asi-miR-nov5, asi-miR-nov4, asi-miR-9383, and asi-miR-2a) could be detected by quantitative real-time PCR (qRT-PCR) in the sequenced sample, and the expression patterns of these miRNAs measured by qRT-PCR were in concordance with the original miRNA sequencing data. The predicted targets for the known and the novel miRNAs covered many important biological roles and pathways indicating the diversity of miRNA functions. We also found 21 conserved miRNAs and eight counterparts of target immune pathway genes in An. sinensis based on the analysis of An. gambiae. Our results provide the first lead to the elucidation of the miRNA profile in An. sinensis. Unveiling the roles of mosquito miRNAs will undoubtedly lead to a better understanding of mosquito biology and mosquito-pathogen interactions. This work lays the foundation for the further functional study of An. sinensis miRNAs and will facilitate their application in vector control.

  6. [Bioinformatics Analysis of Clustered Regularly Interspaced Short Palindromic Repeats in the Genomes of Shigella].

    PubMed

    Wang, Pengfei; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Guo, Xiangjiao; Yang, Haiyan; Xi, Yuanlin

    2015-04-01

    This study was aimed to explore the features of clustered regularly interspaced short palindromic repeats (CRISPR) structures in Shigella by using bioinformatics. We used bioinformatics methods, including BLAST, alignment and RNA structure prediction, to analyze the CRISPR structures of Shigella genomes. The results showed that the CRISPRs existed in the four groups of Shigella, and the flanking sequences of upstream CRISPRs could be classified into the same group with those of the downstream. We also found some relatively conserved palindromic motifs in the leader sequences. Repeat sequences had the same group with corresponding flanking sequences, and could be classified into two different types by their RNA secondary structures, which contain "stem" and "ring". Some spacers were found to homologize with part sequences of plasmids or phages. The study indicated that there were correlations between repeat sequences and flanking sequences, and the repeats might act as a kind of recognition mechanism to mediate the interaction between foreign genetic elements and Cas proteins.

  7. Harnessing pain heterogeneity and RNA transcriptome to identify blood–based pain biomarkers: a novel correlational study design and bioinformatics approach in a graded chronic constriction injury model

    PubMed Central

    Grace, Peter M.; Hurley, Daniel; Barratt, Daniel T.; Tsykin, Anna; Watkins, Linda R.; Rolan, Paul E.; Hutchinson, Mark R.

    2017-01-01

    A quantitative, peripherally accessible biomarker for neuropathic pain has great potential to improve clinical outcomes. Based on the premise that peripheral and central immunity contribute to neuropathic pain mechanisms, we hypothesized that biomarkers could be identified from the whole blood of adult male rats, by integrating graded chronic constriction injury (CCI), ipsilateral lumbar dorsal quadrant (iLDQ) and whole blood transcriptomes, and pathway analysis with pain behavior. Correlational bioinformatics identified a range of putative biomarker genes for allodynia intensity, many encoding for proteins with a recognized role in immune/nociceptive mechanisms. A selection of these genes was validated in a separate replication study. Pathway analysis of the iLDQ transcriptome identified Fcγ and Fcε signaling pathways, among others. This study is the first to employ the whole blood transcriptome to identify pain biomarker panels. The novel correlational bioinformatics, developed here, selected such putative biomarkers based on a correlation with pain behavior and formation of signaling pathways with iLDQ genes. Future studies may demonstrate the predictive ability of these biomarker genes across other models and additional variables. PMID:22697386

  8. MetaDP: a comprehensive web server for disease prediction of 16S rRNA metagenomic datasets.

    PubMed

    Xu, Xilin; Wu, Aiping; Zhang, Xinlei; Su, Mingming; Jiang, Taijiao; Yuan, Zhe-Ming

    2016-01-01

    High-throughput sequencing-based metagenomics has garnered considerable interest in recent years. Numerous methods and tools have been developed for the analysis of metagenomic data. However, it is still a daunting task to install a large number of tools and complete a complicated analysis, especially for researchers with minimal bioinformatics backgrounds. To address this problem, we constructed an automated software named MetaDP for 16S rRNA sequencing data analysis, including data quality control, operational taxonomic unit clustering, diversity analysis, and disease risk prediction modeling. Furthermore, a support vector machine-based prediction model for intestinal bowel syndrome (IBS) was built by applying MetaDP to microbial 16S sequencing data from 108 children. The success of the IBS prediction model suggests that the platform may also be applied to other diseases related to gut microbes, such as obesity, metabolic syndrome, or intestinal cancer, among others (http://metadp.cn:7001/).

  9. [Evaluation of performance of five bioinformatics software for the prediction of missense mutations].

    PubMed

    Chen, Qianting; Dai, Congling; Zhang, Qianjun; Du, Juan; Li, Wen

    2016-10-01

    To study the prediction performance evaluation with five kinds of bioinformatics software (SIFT, PolyPhen2, MutationTaster, Provean, MutationAssessor). From own database for genetic mutations collected over the past five years, Chinese literature database, Human Gene Mutation Database, and dbSNP, 121 missense mutations confirmed by functional studies, and 121 missense mutations suspected to be pathogenic by pedigree analysis were used as positive gold standard, while 242 missense mutations with minor allele frequency (MAF)>5% in dominant hereditary diseases were used as negative gold standard. The selected mutations were predicted with the five software. Based on the results, the performance of the five software was evaluated for their sensitivity, specificity, positive predict value, false positive rate, negative predict value, false negative rate, false discovery rate, accuracy, and receiver operating characteristic curve (ROC). In terms of sensitivity, negative predictive value and false negative rate, the rank was MutationTaster, PolyPhen2, Provean, SIFT, and MutationAssessor. For specificity and false positive rate, the rank was MutationTaster, Provean, MutationAssessor, SIFT, and PolyPhen2. For positive predict value and false discovery rate, the rank was MutationTaster, Provean, MutationAssessor, PolyPhen2, and SIFT. For area under the ROC curve (AUC) and accuracy, the rank was MutationTaster, Provean, PolyPhen2, MutationAssessor, and SIFT. The prediction performance of software may be different when using different parameters. Among the five software, MutationTaster has the best prediction performance.

  10. ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis

    PubMed Central

    Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas

    2016-01-01

    Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons are dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, or nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/. PMID:26882475

  11. Prediction of target genes for miR-140-5p in pulmonary arterial hypertension using bioinformatics methods.

    PubMed

    Li, Fangwei; Shi, Wenhua; Wan, Yixin; Wang, Qingting; Feng, Wei; Yan, Xin; Wang, Jian; Chai, Limin; Zhang, Qianqian; Li, Manxiang

    2017-12-01

    The expression of microRNA (miR)-140-5p is known to be reduced in both pulmonary arterial hypertension (PAH) patients and monocrotaline-induced PAH models in rat. Identification of target genes for miR-140-5p with bioinformatics analysis may reveal new pathways and connections in PAH. This study aimed to explore downstream target genes and relevant signaling pathways regulated by miR-140-5p to provide theoretical evidences for further researches on role of miR-140-5p in PAH. Multiple downstream target genes and upstream transcription factors (TFs) of miR-140-5p were predicted in the analysis. Gene ontology (GO) enrichment analysis indicated that downstream target genes of miR-140-5p were enriched in many biological processes, such as biological regulation, signal transduction, response to chemical stimulus, stem cell proliferation, cell surface receptor signaling pathways. Kyoto Encyclopedia of Genes and Genome (KEGG) pathway analysis found that downstream target genes were mainly located in Notch, TGF-beta, PI3K/Akt, and Hippo signaling pathway. According to TF-miRNA-mRNA network, the important downstream target genes of miR-140-5p were PPI, TGF-betaR1, smad4, JAG1, ADAM10, FGF9, PDGFRA, VEGFA, LAMC1, TLR4, and CREB. After thoroughly reviewing published literature, we found that 23 target genes and seven signaling pathways were truly inhibited by miR-140-5p in various tissues or cells; most of these verified targets were in accordance with our present prediction. Other predicted targets still need further verification in vivo and in vitro .

  12. DNA mimic proteins: functions, structures, and bioinformatic analysis.

    PubMed

    Wang, Hao-Ching; Ho, Chun-Han; Hsu, Kai-Cheng; Yang, Jinn-Moon; Wang, Andrew H-J

    2014-05-13

    DNA mimic proteins have DNA-like negative surface charge distributions, and they function by occupying the DNA binding sites of DNA binding proteins to prevent these sites from being accessed by DNA. DNA mimic proteins control the activities of a variety of DNA binding proteins and are involved in a wide range of cellular mechanisms such as chromatin assembly, DNA repair, transcription regulation, and gene recombination. However, the sequences and structures of DNA mimic proteins are diverse, making them difficult to predict by bioinformatic search. To date, only a few DNA mimic proteins have been reported. These DNA mimics were not found by searching for functional motifs in their sequences but were revealed only by structural analysis of their charge distribution. This review highlights the biological roles and structures of 16 reported DNA mimic proteins. We also discuss approaches that might be used to discover new DNA mimic proteins.

  13. Public data and open source tools for multi-assay genomic investigation of disease.

    PubMed

    Kannan, Lavanya; Ramos, Marcel; Re, Angela; El-Hachem, Nehme; Safikhani, Zhaleh; Gendoo, Deena M A; Davis, Sean; Gomez-Cabrero, David; Castelo, Robert; Hansen, Kasper D; Carey, Vincent J; Morgan, Martin; Culhane, Aedín C; Haibe-Kains, Benjamin; Waldron, Levi

    2016-07-01

    Molecular interrogation of a biological sample through DNA sequencing, RNA and microRNA profiling, proteomics and other assays, has the potential to provide a systems level approach to predicting treatment response and disease progression, and to developing precision therapies. Large publicly funded projects have generated extensive and freely available multi-assay data resources; however, bioinformatic and statistical methods for the analysis of such experiments are still nascent. We review multi-assay genomic data resources in the areas of clinical oncology, pharmacogenomics and other perturbation experiments, population genomics and regulatory genomics and other areas, and tools for data acquisition. Finally, we review bioinformatic tools that are explicitly geared toward integrative genomic data visualization and analysis. This review provides starting points for accessing publicly available data and tools to support development of needed integrative methods. © The Author 2015. Published by Oxford University Press.

  14. In Silico Prediction and Validation of Gfap as an miR-3099 Target in Mouse Brain.

    PubMed

    Abidin, Shahidee Zainal; Leong, Jia-Wen; Mahmoudi, Marzieh; Nordin, Norshariza; Abdullah, Syahril; Cheah, Pike-See; Ling, King-Hwa

    2017-08-01

    MicroRNAs are small non-coding RNAs that play crucial roles in the regulation of gene expression and protein synthesis during brain development. MiR-3099 is highly expressed throughout embryogenesis, especially in the developing central nervous system. Moreover, miR-3099 is also expressed at a higher level in differentiating neurons in vitro, suggesting that it is a potential regulator during neuronal cell development. This study aimed to predict the target genes of miR-3099 via in-silico analysis using four independent prediction algorithms (miRDB, miRanda, TargetScan, and DIANA-micro-T-CDS) with emphasis on target genes related to brain development and function. Based on the analysis, a total of 3,174 miR-3099 target genes were predicted. Those predicted by at least three algorithms (324 genes) were subjected to DAVID bioinformatics analysis to understand their overall functional themes and representation. The analysis revealed that nearly 70% of the target genes were expressed in the nervous system and a significant proportion were associated with transcriptional regulation and protein ubiquitination mechanisms. Comparison of in situ hybridization (ISH) expression patterns of miR-3099 in both published and in-house-generated ISH sections with the ISH sections of target genes from the Allen Brain Atlas identified 7 target genes (Dnmt3a, Gabpa, Gfap, Itga4, Lxn, Smad7, and Tbx18) having expression patterns complementary to miR-3099 in the developing and adult mouse brain samples. Of these, we validated Gfap as a direct downstream target of miR-3099 using the luciferase reporter gene system. In conclusion, we report the successful prediction and validation of Gfap as an miR-3099 target gene using a combination of bioinformatics resources with enrichment of annotations based on functional ontologies and a spatio-temporal expression dataset.

  15. Biomarker microRNAs for prostate cancer metastasis: screened with a network vulnerability analysis model.

    PubMed

    Lin, Yuxin; Chen, Feifei; Shen, Li; Tang, Xiaoyu; Du, Cui; Sun, Zhandong; Ding, Huijie; Chen, Jiajia; Shen, Bairong

    2018-05-21

    Prostate cancer (PCa) is a fatal malignant tumor among males in the world and the metastasis is a leading cause for PCa death. Biomarkers are therefore urgently needed to detect PCa metastatic signature at the early time. MicroRNAs are small non-coding RNAs with the potential to be biomarkers for disease prediction. In addition, computer-aided biomarker discovery is now becoming an attractive paradigm for precision diagnosis and prognosis of complex diseases. In this study, we identified key microRNAs as biomarkers for predicting PCa metastasis based on network vulnerability analysis. We first extracted microRNAs and mRNAs that were differentially expressed between primary PCa and metastatic PCa (MPCa) samples. Then we constructed the MPCa-specific microRNA-mRNA network and screened microRNA biomarkers by a novel bioinformatics model. The model emphasized the characterization of systems stability changes and the network vulnerability with three measurements, i.e. the structurally single-line regulation, the functional importance of microRNA targets and the percentage of transcription factor genes in microRNA unique targets. With this model, we identified five microRNAs as putative biomarkers for PCa metastasis. Among them, miR-101-3p and miR-145-5p have been previously reported as biomarkers for PCa metastasis and the remaining three, i.e. miR-204-5p, miR-198 and miR-152, were screened as novel biomarkers for PCa metastasis. The results were further confirmed by the assessment of their predictive power and biological function analysis. Five microRNAs were identified as candidate biomarkers for predicting PCa metastasis based on our network vulnerability analysis model. The prediction performance, literature exploration and functional enrichment analysis convinced our findings. This novel bioinformatics model could be applied to biomarker discovery for other complex diseases.

  16. Bioinformatics functional analysis of let-7a, miR-34a, and miR-199a/b reveals novel insights into immune system pathways and cancer hallmarks for hepatocellular carcinoma.

    PubMed

    Soliman, Bangly; Salem, Ahmed; Ghazy, Mohamed; Abu-Shahba, Nourhan; El Hefnawi, Mahmoud

    2018-05-01

    Let-7a, miR-34a, and miR-199 a/b have gained a great attention as master regulators for cellular processes. In particular, these three micro-RNAs act as potential onco-suppressors for hepatocellular carcinoma. Bioinformatics can reveal the functionality of these micro-RNAs through target prediction and functional annotation analysis. In the current study, in silico analysis using innovative servers (miRror Suite, DAVID, miRGator V3.0, GeneTrail) has demonstrated the combinatorial and the individual target genes of these micro-RNAs and further explored their roles in hepatocellular carcinoma progression. There were 87 common target messenger RNAs (p ≤ 0.05) that were predicted to be regulated by the three micro-RNAs using miRror 2.0 target prediction tool. In addition, the functional enrichment analysis of these targets that was performed by DAVID functional annotation and REACTOME tools revealed two major immune-related pathways, eight hepatocellular carcinoma hallmarks-linked pathways, and two pathways that mediate interconnected processes between immune system and hepatocellular carcinoma hallmarks. Moreover, protein-protein interaction network for the predicted common targets was obtained by using STRING database. The individual analysis of target genes and pathways for the three micro-RNAs of interest using miRGator V3.0 and GeneTrail servers revealed some novel predicted target oncogenes such as SOX4, which we validated experimentally, in addition to some regulated pathways of immune system and hepatocarcinogenesis such as insulin signaling pathway and adipocytokine signaling pathway. In general, our results demonstrate that let-7a, miR-34a, and miR-199 a/b have novel interactions in different immune system pathways and major hepatocellular carcinoma hallmarks. Thus, our findings shed more light on the roles of these miRNAs as cancer silencers.

  17. Bioinformatics analysis of the ς-carotene desaturase gene in cabbage (Brassica oleracea var. capitata)

    NASA Astrophysics Data System (ADS)

    Sun, Bo; Zheng, Aihong; Jiang, Min; Xue, Shengling; Zhang, Fen; Tang, Haoru

    2018-04-01

    ς-carotene desaturase (ZDS) is an important enzyme in carotenoid biosynthesis. Here, the Brassica oleracea var. capitata ZDS (BocZDS) gene sequences were obtained from Brassica database (BRAD), and preformed for bioinformatics analysis. The BocZDS gene mapped to Scaffold000363, and contains an open reading frame of 1,686 bp that encodes a 561-amino acid protein with a calculated molecular mass of 62.00 kD and an isoelectric point (pI) of 8.2. Subcellular localization predicted the BocZDS gene was in the chloroplast. The conserved domain of the BocZDS protein is PLN02487, indicating that it belongs the member of zeta-carotene desaturase. Homology analysis indicates that the ZDS protein is apparently conserved during plant evolution and is most closely related to B. oleracea var. oleracea, B. napus, and B. rapa. The findings of the present study provide a molecular basis for the elucidation of ZDS gene function in cabbage.

  18. Bioinformatics in translational drug discovery.

    PubMed

    Wooller, Sarah K; Benstead-Hume, Graeme; Chen, Xiangrong; Ali, Yusuf; Pearl, Frances M G

    2017-08-31

    Bioinformatics approaches are becoming ever more essential in translational drug discovery both in academia and within the pharmaceutical industry. Computational exploitation of the increasing volumes of data generated during all phases of drug discovery is enabling key challenges of the process to be addressed. Here, we highlight some of the areas in which bioinformatics resources and methods are being developed to support the drug discovery pipeline. These include the creation of large data warehouses, bioinformatics algorithms to analyse 'big data' that identify novel drug targets and/or biomarkers, programs to assess the tractability of targets, and prediction of repositioning opportunities that use licensed drugs to treat additional indications. © 2017 The Author(s).

  19. Web tools for predictive toxicology model building.

    PubMed

    Jeliazkova, Nina

    2012-07-01

    The development and use of web tools in chemistry has accumulated more than 15 years of history already. Powered by the advances in the Internet technologies, the current generation of web systems are starting to expand into areas, traditional for desktop applications. The web platforms integrate data storage, cheminformatics and data analysis tools. The ease of use and the collaborative potential of the web is compelling, despite the challenges. The topic of this review is a set of recently published web tools that facilitate predictive toxicology model building. The focus is on software platforms, offering web access to chemical structure-based methods, although some of the frameworks could also provide bioinformatics or hybrid data analysis functionalities. A number of historical and current developments are cited. In order to provide comparable assessment, the following characteristics are considered: support for workflows, descriptor calculations, visualization, modeling algorithms, data management and data sharing capabilities, availability of GUI or programmatic access and implementation details. The success of the Web is largely due to its highly decentralized, yet sufficiently interoperable model for information access. The expected future convergence between cheminformatics and bioinformatics databases provides new challenges toward management and analysis of large data sets. The web tools in predictive toxicology will likely continue to evolve toward the right mix of flexibility, performance, scalability, interoperability, sets of unique features offered, friendly user interfaces, programmatic access for advanced users, platform independence, results reproducibility, curation and crowdsourcing utilities, collaborative sharing and secure access.

  20. PRGdb: a bioinformatics platform for plant resistance gene analysis

    PubMed Central

    Sanseverino, Walter; Roma, Guglielmo; De Simone, Marco; Faino, Luigi; Melito, Sara; Stupka, Elia; Frusciante, Luigi; Ercolano, Maria Raffaella

    2010-01-01

    PRGdb is a web accessible open-source (http://www.prgdb.org) database that represents the first bioinformatic resource providing a comprehensive overview of resistance genes (R-genes) in plants. PRGdb holds more than 16 000 known and putative R-genes belonging to 192 plant species challenged by 115 different pathogens and linked with useful biological information. The complete database includes a set of 73 manually curated reference R-genes, 6308 putative R-genes collected from NCBI and 10463 computationally predicted putative R-genes. Thanks to a user-friendly interface, data can be examined using different query tools. A home-made prediction pipeline called Disease Resistance Analysis and Gene Orthology (DRAGO), based on reference R-gene sequence data, was developed to search for plant resistance genes in public datasets such as Unigene and Genbank. New putative R-gene classes containing unknown domain combinations were discovered and characterized. The development of the PRG platform represents an important starting point to conduct various experimental tasks. The inferred cross-link between genomic and phenotypic information allows access to a large body of information to find answers to several biological questions. The database structure also permits easy integration with other data types and opens up prospects for future implementations. PMID:19906694

  1. [Two novel pathogenic mutations of GAN gene identified in a patient with giant axonal neuropathy].

    PubMed

    Wang, Juan; Ma, Qingwen; Cai, Qin; Liu, Yanna; Wang, Wei; Ren, Zhaorui

    2016-06-01

    To explore the disease-causing mutations in a patient suspected for giant axonal neuropathy(GAN). Target sequence capture sequencing was used to screen potential mutations in genomic DNA extracted from peripheral blood sample of the patient. Sanger sequencing was applied to confirm the detected mutation. The mutation was verified among 400 GAN alleles from 200 healthy individuals by Sanger sequencing. The function of the mutations was predicted by bioinformatics analysis. The patient was identified as a compound heterozygote carrying two novel pathogenic GAN mutations, i.e., c.778G>T (p.Glu260Ter) and c.277G>A (p.Gly93Arg). Sanger sequencing confirmed that the c.778G>T (p.Glu260Ter) mutation was inherited from his father, while c.277G>A (p.Gly93Arg) was inherited from his mother. The same mutations was not found in the 200 healthy individuals. Bioinformatics analysis predicted that the two mutations probably caused functional abnormality of gigaxonin. Two novel GAN mutations were detected in a patient with GAN. Both mutations are pathogenic and can cause abnormalities of gigaxonin structure and function, leading to pathogenesis of GAN. The results may also offer valuable information for similar diseases.

  2. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315

  3. BioWarehouse: a bioinformatics database warehouse toolkit.

    PubMed

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

    2006-03-23

    This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.

  4. CircRNA-0004904, CircRNA-0001855, and PAPP-A: Potential Novel Biomarkers for the Prediction of Preeclampsia.

    PubMed

    Jiang, Min; Lash, Gendie E; Zhao, Xueqing; Long, Yan; Guo, Caijiao; Yang, Hongling

    2018-05-07

    Circular RNAs (circRNAs) are transcribed prevalently in the genome; however, their potential roles in multiple cardiovascular diseases, particularly preeclampsia (PE), are not yet well understood. This study investigated the expression profiles of circRNAs and explored circRNA-mediated pregnancy-associated plasma protein A (PAPP-A) expression as a potential biomarker for PE before 20 weeks of pregnancy. A nested case-control two-phase screening/validation study was performed in pregnant women before 20 weeks of gestation (before clinical diagnosis) at Guangzhou Women and Children's Medical Center from 2012 to 2015. In the screening phase, circRNA expression profiles of blood cells were assessed using a human circRNA microarray, which was designed to detect simultaneously 5396 circRNAs, in 5 patients with PE and 5 age- and gestational week-matched controls. In the validation phase, 18 circRNAs in blood cells predicted by bioinformatics tools were validated by quantitative reverse transcription PCR in a cohort of 60 patients (PE and age-, gestational week-, and sample storage time-matched controls). Then, we examined the involvement of circRNAs in PE-related pathways via interactions with miRNAs by multiple bioinformatics approaches. Bioinformatics analysis predicted that hsa_circ_0004904 and hsa_circ_0001855 miRNA sponges directly target PAPP-A. PAPP-A was verified in the serum of the same cohort of patients using an enzyme-linked immunosorbent assay. Finally, we combined PAPP-A with circRNAs to create a novel preclinical diagnostic model for PE with logistic regression and evaluated the efficiency of this model with receiver operating curve analysis. Volcano plot analysis using various parameters showed that circRNAs were differentially expressed among both groups (P < 0.01, fold change > 2). In the screening phase, we found that 2178 circRNAs were differentially expressed between the control and PE groups, in which 884 circRNAs were downregulated and 1294 circRNAs were upregulated in the PE group compared with the control group. In the validation phase, two circRNAs, hsa_circ_0004904 and hsa_circ_0001855, were significantly upregulated in PE patients compared with healthy pregnant women (P < 0.05). PAPP-A expression levels, related to the two circRNAs based on bioinformatics prediction, were increased in the PE group compared with the control group. The area under the curve of the combined model was 0.94 in the predicted PE subjects. This is the first study to report circRNA profiling in patients with PE prior to the onset of symptoms. Our data suggested that hsa_circ_0004904 and hsa_circ_0001855 combined with PAPP-A might be promising biomarkers for the detection of PE. Moreover, circRNAs may provide new insights into the potential mechanisms underlying the pathophysiology of PE. © 2018 The Author(s). Published by S. Karger AG, Basel.

  5. Structure prediction, expression, and antigenicity of c-terminal of GRP78.

    PubMed

    Aghamollaei, Hossein; Mousavi Gargari, Seyed Latif; Ghanei, Mostafa; Rasaee, Mohamad Javad; Amani, Jafar; Bakherad, Hamid; Farnoosh, Gholamreza

    2017-01-01

    Glucose-regulated protein 78 (GRP78) is a typical endoplasmic reticulum luminal chaperone having a main role in the activation of the unfolded protein response. Because of hypoxia and nutrient deprivation in the tumor microenvironment, expression of GRP78 in these cells becomes higher than the native cells, which makes it a suitable candidate for cancer targeting. Suppression of survival signals by antibody production against C-terminal domain of GR78 (CGRP) can induce apoptosis of cancer cells. The aim of this study was in silico analysis, recombinant production, and characterization of CGRP in Escherichia coli. Structural prediction of CGRP by bioinformatics tools was done and the construct containing optimized sequence was transferred to E. coli T7 shuffle. Expression was induced by isopropyl-β-d-thiogalactoside, and recombinant protein was purified by Ni-NTA agarose resin. The content of secondary structures was obtained by circular dichroism (CD) spectrum. CGRP immunogenicity was evaluated from the immunized mouse sera. SDS-PAGE analysis showed CGRP expression in E. coli. CD spectrum also confirmed prediction of structures by bioinformatics tools. The enzyme-linked immunosorbent assay using sera from immunized mice revealed CGRP as a good immunogen. The results obtained in this study showed that the structure of truncated CGRP is very similar to its structure in the whole protein context. This protein can be used in cancer researches. © 2015 International Union of Biochemistry and Molecular Biology, Inc.

  6. Great interactions: How binding incorrect partners can teach us about protein recognition and function.

    PubMed

    Vamparys, Lydie; Laurent, Benoist; Carbone, Alessandra; Sacquin-Mora, Sophie

    2016-10-01

    Protein-protein interactions play a key part in most biological processes and understanding their mechanism is a fundamental problem leading to numerous practical applications. The prediction of protein binding sites in particular is of paramount importance since proteins now represent a major class of therapeutic targets. Amongst others methods, docking simulations between two proteins known to interact can be a useful tool for the prediction of likely binding patches on a protein surface. From the analysis of the protein interfaces generated by a massive cross-docking experiment using the 168 proteins of the Docking Benchmark 2.0, where all possible protein pairs, and not only experimental ones, have been docked together, we show that it is also possible to predict a protein's binding residues without having any prior knowledge regarding its potential interaction partners. Evaluating the performance of cross-docking predictions using the area under the specificity-sensitivity ROC curve (AUC) leads to an AUC value of 0.77 for the complete benchmark (compared to the 0.5 AUC value obtained for random predictions). Furthermore, a new clustering analysis performed on the binding patches that are scattered on the protein surface show that their distribution and growth will depend on the protein's functional group. Finally, in several cases, the binding-site predictions resulting from the cross-docking simulations will lead to the identification of an alternate interface, which corresponds to the interaction with a biomolecular partner that is not included in the original benchmark. Proteins 2016; 84:1408-1421. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.

  7. Bioinformatics programs are 31-fold over-represented among the highest impact scientific papers of the past two decades.

    PubMed

    Wren, Jonathan D

    2016-09-01

    To analyze the relative proportion of bioinformatics papers and their non-bioinformatics counterparts in the top 20 most cited papers annually for the past two decades. When defining bioinformatics papers as encompassing both those that provide software for data analysis or methods underlying data analysis software, we find that over the past two decades, more than a third (34%) of the most cited papers in science were bioinformatics papers, which is approximately a 31-fold enrichment relative to the total number of bioinformatics papers published. More than half of the most cited papers during this span were bioinformatics papers. Yet, the average 5-year JIF of top 20 bioinformatics papers was 7.7, whereas the average JIF for top 20 non-bioinformatics papers was 25.8, significantly higher (P < 4.5 × 10(-29)). The 20-year trend in the average JIF between the two groups suggests the gap does not appear to be significantly narrowing. For a sampling of the journals producing top papers, bioinformatics journals tended to have higher Gini coefficients, suggesting that development of novel bioinformatics resources may be somewhat 'hit or miss'. That is, relative to other fields, bioinformatics produces some programs that are extremely widely adopted and cited, yet there are fewer of intermediate success. jdwren@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks.

    PubMed

    Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S

    2017-01-01

    Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.

  9. Personalized medicine: challenges and opportunities for translational bioinformatics

    PubMed Central

    Overby, Casey Lynnette; Tarczy-Hornoch, Peter

    2013-01-01

    Personalized medicine can be defined broadly as a model of healthcare that is predictive, personalized, preventive and participatory. Two US President’s Council of Advisors on Science and Technology reports illustrate challenges in personalized medicine (in a 2008 report) and in use of health information technology (in a 2010 report). Translational bioinformatics is a field that can help address these challenges and is defined by the American Medical Informatics Association as “the development of storage, analytic and interpretive methods to optimize the transformation of increasing voluminous biomedical data into proactive, predictive, preventative and participatory health.” This article discusses barriers to implementing genomics applications and current progress toward overcoming barriers, describes lessons learned from early experiences of institutions engaged in personalized medicine and provides example areas for translational bioinformatics research inquiry. PMID:24039624

  10. Harnessing pain heterogeneity and RNA transcriptome to identify blood-based pain biomarkers: a novel correlational study design and bioinformatics approach in a graded chronic constriction injury model.

    PubMed

    Grace, Peter M; Hurley, Daniel; Barratt, Daniel T; Tsykin, Anna; Watkins, Linda R; Rolan, Paul E; Hutchinson, Mark R

    2012-09-01

    A quantitative, peripherally accessible biomarker for neuropathic pain has great potential to improve clinical outcomes. Based on the premise that peripheral and central immunity contribute to neuropathic pain mechanisms, we hypothesized that biomarkers could be identified from the whole blood of adult male rats, by integrating graded chronic constriction injury (CCI), ipsilateral lumbar dorsal quadrant (iLDQ) and whole blood transcriptomes, and pathway analysis with pain behavior. Correlational bioinformatics identified a range of putative biomarker genes for allodynia intensity, many encoding for proteins with a recognized role in immune/nociceptive mechanisms. A selection of these genes was validated in a separate replication study. Pathway analysis of the iLDQ transcriptome identified Fcγ and Fcε signaling pathways, among others. This study is the first to employ the whole blood transcriptome to identify pain biomarker panels. The novel correlational bioinformatics, developed here, selected such putative biomarkers based on a correlation with pain behavior and formation of signaling pathways with iLDQ genes. Future studies may demonstrate the predictive ability of these biomarker genes across other models and additional variables. © 2012 The Authors. Journal of Neurochemistry © 2012 International Society for Neurochemistry.

  11. Application of bioinformatics tools and databases in microbial dehalogenation research (a review).

    PubMed

    Satpathy, R; Konkimalla, V B; Ratha, J

    2015-01-01

    Microbial dehalogenation is a biochemical process in which the halogenated substances are catalyzed enzymatically in to their non-halogenated form. The microorganisms have a wide range of organohalogen degradation ability both explicit and non-specific in nature. Most of these halogenated organic compounds being pollutants need to be remediated; therefore, the current approaches are to explore the potential of microbes at a molecular level for effective biodegradation of these substances. Several microorganisms with dehalogenation activity have been identified and characterized. In this aspect, the bioinformatics plays a key role to gain deeper knowledge in this field of dehalogenation. To facilitate the data mining, many tools have been developed to annotate these data from databases. Therefore, with the discovery of a microorganism one can predict a gene/protein, sequence analysis, can perform structural modelling, metabolic pathway analysis, biodegradation study and so on. This review highlights various methods of bioinformatics approach that describes the application of various databases and specific tools in the microbial dehalogenation fields with special focus on dehalogenase enzymes. Attempts have also been made to decipher some recent applications of in silico modeling methods that comprise of gene finding, protein modelling, Quantitative Structure Biodegradibility Relationship (QSBR) study and reconstruction of metabolic pathways employed in dehalogenation research area.

  12. Confirmation of a new conserved linear epitope of Lyssavirus nucleoprotein.

    PubMed

    Xinjun, Lv; Xuejun, Ma; Lihua, Wang; Hao, Li; Xinxin, Shen; Pengcheng, Yu; Qing, Tang; Guodong, Liang

    2012-05-01

    Bioinformatics analysis was used to predict potential epitopes of Lyssavirus nucleoprotein and highlighted some distinct differences in the quantity and localization of the epitopes disclosed by epitope analysis of monoclonal antibodies against Lyssavirus nucleoprotein. Bioinformatics analysis showed that the domain containing residues 152-164 of Lyssavirus nucleoprotein was a conserved linear epitope that had not been reported previously. Immunization of two rabbits with the corresponding synthetic peptide conjugated to the Keyhole Limpe hemocyanin (KLH) macromolecule resulted in a titer of anti-peptide antibody above 1:200,000 in rabbit sera as detected by indirect enzyme-linked immunosorbent assay (ELISA). Western blot analysis demonstrated that the anti-peptide antibody recognized denatured Lyssavirus nucleoprotein in sodium dodecylsulfonate-polyacrylate gel electrophoresis (SDS-PAGE). Affinity chromatography purification and FITC-labeling of the anti-peptide antibody in rabbit sera was performed. FITC-labeled anti-peptide antibody could recognize Lyssavirus nucleoprotein in BSR cells and canine brain tissues even at a 1:200 dilution. Residues 152-164 of Lyssavirus nucleoprotein were verified as a conserved linear epitope in Lyssavirus. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. In Silico Analysis for the Study of Botulinum Toxin Structure

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2010-01-01

    Protein-protein interactions play many important roles in biological function. Knowledge of protein-protein complex structure is required for understanding the function. The determination of protein-protein complex structure by experimental studies remains difficult, therefore computational prediction of protein structures by structure modeling and docking studies is valuable method. In addition, MD simulation is also one of the most popular methods for protein structure modeling and characteristics. Here, we attempt to predict protein-protein complex structure and property using some of bioinformatic methods, and we focus botulinum toxin complex as target structure.

  14. Quokka: a comprehensive tool for rapid and accurate prediction of kinase family-specific phosphorylation sites in the human proteome.

    PubMed

    Li, Fuyi; Li, Chen; Marquez-Lago, Tatiana T; Leier, André; Akutsu, Tatsuya; Purcell, Anthony W; Smith, A Ian; Lithgow, Trevor; Daly, Roger J; Song, Jiangning; Chou, Kuo-Chen

    2018-06-27

    Kinase-regulated phosphorylation is a ubiquitous type of post-translational modification (PTM) in both eukaryotic and prokaryotic cells. Phosphorylation plays fundamental roles in many signalling pathways and biological processes, such as protein degradation and protein-protein interactions. Experimental studies have revealed that signalling defects caused by aberrant phosphorylation are highly associated with a variety of human diseases, especially cancers. In light of this, a number of computational methods aiming to accurately predict protein kinase family-specific or kinase-specific phosphorylation sites have been established, thereby facilitating phosphoproteomic data analysis. In this work, we present Quokka, a novel bioinformatics tool that allows users to rapidly and accurately identify human kinase family-regulated phosphorylation sites. Quokka was developed by using a variety of sequence scoring functions combined with an optimized logistic regression algorithm. We evaluated Quokka based on well-prepared up-to-date benchmark and independent test datasets, curated from the Phospho.ELM and UniProt databases, respectively. The independent test demonstrates that Quokka improves the prediction performance compared with state-of-the-art computational tools for phosphorylation prediction. In summary, our tool provides users with high-quality predicted human phosphorylation sites for hypothesis generation and biological validation. The Quokka webserver and datasets are freely available at http://quokka.erc.monash.edu/. Supplementary data are available at Bioinformatics online.

  15. MetCCS predictor: a web server for predicting collision cross-section values of metabolites in ion mobility-mass spectrometry based metabolomics.

    PubMed

    Zhou, Zhiwei; Xiong, Xin; Zhu, Zheng-Jiang

    2017-07-15

    In metabolomics, rigorous structural identification of metabolites presents a challenge for bioinformatics. The use of collision cross-section (CCS) values of metabolites derived from ion mobility-mass spectrometry effectively increases the confidence of metabolite identification, but this technique suffers from the limit number of available CCS values. Currently, there is no software available for rapidly generating the metabolites' CCS values. Here, we developed the first web server, namely, MetCCS Predictor, for predicting CCS values. It can predict the CCS values of metabolites using molecular descriptors within a few seconds. Common users with limited background on bioinformatics can benefit from this software and effectively improve the metabolite identification in metabolomics. The web server is freely available at: http://www.metabolomics-shanghai.org/MetCCS/ . jiangzhu@sioc.ac.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  16. Bioinformatics/biostatistics: microarray analysis.

    PubMed

    Eichler, Gabriel S

    2012-01-01

    The quantity and complexity of the molecular-level data generated in both research and clinical settings require the use of sophisticated, powerful computational interpretation techniques. It is for this reason that bioinformatic analysis of complex molecular profiling data has become a fundamental technology in the development of personalized medicine. This chapter provides a high-level overview of the field of bioinformatics and outlines several, classic bioinformatic approaches. The highlighted approaches can be aptly applied to nearly any sort of high-dimensional genomic, proteomic, or metabolomic experiments. Reviewed technologies in this chapter include traditional clustering analysis, the Gene Expression Dynamics Inspector (GEDI), GoMiner (GoMiner), Gene Set Enrichment Analysis (GSEA), and the Learner of Functional Enrichment (LeFE).

  17. A rapid and accurate approach for prediction of interactomes from co-elution data (PrInCE).

    PubMed

    Stacey, R Greg; Skinnider, Michael A; Scott, Nichollas E; Foster, Leonard J

    2017-10-23

    An organism's protein interactome, or complete network of protein-protein interactions, defines the protein complexes that drive cellular processes. Techniques for studying protein complexes have traditionally applied targeted strategies such as yeast two-hybrid or affinity purification-mass spectrometry to assess protein interactions. However, given the vast number of protein complexes, more scalable methods are necessary to accelerate interaction discovery and to construct whole interactomes. We recently developed a complementary technique based on the use of protein correlation profiling (PCP) and stable isotope labeling in amino acids in cell culture (SILAC) to assess chromatographic co-elution as evidence of interacting proteins. Importantly, PCP-SILAC is also capable of measuring protein interactions simultaneously under multiple biological conditions, allowing the detection of treatment-specific changes to an interactome. Given the uniqueness and high dimensionality of co-elution data, new tools are needed to compare protein elution profiles, control false discovery rates, and construct an accurate interactome. Here we describe a freely available bioinformatics pipeline, PrInCE, for the analysis of co-elution data. PrInCE is a modular, open-source library that is computationally inexpensive, able to use label and label-free data, and capable of detecting tens of thousands of protein-protein interactions. Using a machine learning approach, PrInCE offers greatly reduced run time, more predicted interactions at the same stringency, prediction of protein complexes, and greater ease of use over previous bioinformatics tools for co-elution data. PrInCE is implemented in Matlab (version R2017a). Source code and standalone executable programs for Windows and Mac OSX are available at https://github.com/fosterlab/PrInCE , where usage instructions can be found. An example dataset and output are also provided for testing purposes. PrInCE is the first fast and easy-to-use data analysis pipeline that predicts interactomes and protein complexes from co-elution data. PrInCE allows researchers without bioinformatics expertise to analyze high-throughput co-elution datasets.

  18. Meta-learning framework applied in bioinformatics inference system design.

    PubMed

    Arredondo, Tomás; Ormazábal, Wladimir

    2015-01-01

    This paper describes a meta-learner inference system development framework which is applied and tested in the implementation of bioinformatic inference systems. These inference systems are used for the systematic classification of the best candidates for inclusion in bacterial metabolic pathway maps. This meta-learner-based approach utilises a workflow where the user provides feedback with final classification decisions which are stored in conjunction with analysed genetic sequences for periodic inference system training. The inference systems were trained and tested with three different data sets related to the bacterial degradation of aromatic compounds. The analysis of the meta-learner-based framework involved contrasting several different optimisation methods with various different parameters. The obtained inference systems were also contrasted with other standard classification methods with accurate prediction capabilities observed.

  19. PredictProtein—an open resource for online prediction of protein structural and functional features

    PubMed Central

    Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

    2014-01-01

    PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431

  20. TAPIR, a web server for the prediction of plant microRNA targets, including target mimics.

    PubMed

    Bonnet, Eric; He, Ying; Billiau, Kenny; Van de Peer, Yves

    2010-06-15

    We present a new web server called TAPIR, designed for the prediction of plant microRNA targets. The server offers the possibility to search for plant miRNA targets using a fast and a precise algorithm. The precise option is much slower but guarantees to find less perfectly paired miRNA-target duplexes. Furthermore, the precise option allows the prediction of target mimics, which are characterized by a miRNA-target duplex having a large loop, making them undetectable by traditional tools. The TAPIR web server can be accessed at: http://bioinformatics.psb.ugent.be/webtools/tapir. Supplementary data are available at Bioinformatics online.

  1. Some of the most interesting CASP11 targets through the eyes of their authors.

    PubMed

    Kryshtafovych, Andriy; Moult, John; Baslé, Arnaud; Burgin, Alex; Craig, Timothy K; Edwards, Robert A; Fass, Deborah; Hartmann, Marcus D; Korycinski, Mateusz; Lewis, Richard J; Lorimer, Donald; Lupas, Andrei N; Newman, Janet; Peat, Thomas S; Piepenbrink, Kurt H; Prahlad, Janani; van Raaij, Mark J; Rohwer, Forest; Segall, Anca M; Seguritan, Victor; Sundberg, Eric J; Singh, Abhimanyu K; Wilson, Mark A; Schwede, Torsten

    2016-09-01

    The Critical Assessment of protein Structure Prediction (CASP) experiment would not have been possible without the prediction targets provided by the experimental structural biology community. In this article, selected crystallographers providing targets for the CASP11 experiment discuss the functional and biological significance of the target proteins, highlight their most interesting structural features, and assess whether these features were correctly reproduced in the predictions submitted to CASP11. Proteins 2016; 84(Suppl 1):34-50. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.

  2. Applying Instructional Design Theories to Bioinformatics Education in Microarray Analysis and Primer Design Workshops

    ERIC Educational Resources Information Center

    Shachak, Aviv; Ophir, Ron; Rubin, Eitan

    2005-01-01

    The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of…

  3. Identification of legionella effectors using bioinformatic approaches.

    PubMed

    Segal, Gil

    2013-01-01

    Legionella pneumophila the causative agent of Legionnaires' disease, actively manipulates host cell processes to establish a replication niche inside host cells. The establishment of its replication niche requires a functional Icm/Dot type IV secretion system which translocates about 300 effector proteins into host cells during infection. Many of these effectors were first identified as effector candidates by several bioinformatic approaches, and these predicted effectors were later examined experimentally for translocation and a large number of which were validated as effector proteins. Here, I summarized the bioinformatic approaches that were used to identify these effectors.

  4. A Web-based assessment of bioinformatics end-user support services at US universities.

    PubMed

    Messersmith, Donna J; Benson, Dennis A; Geer, Renata C

    2006-07-01

    This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group.

  5. Application of machine learning methods in bioinformatics

    NASA Astrophysics Data System (ADS)

    Yang, Haoyu; An, Zheng; Zhou, Haotian; Hou, Yawen

    2018-05-01

    Faced with the development of bioinformatics, high-throughput genomic technology have enabled biology to enter the era of big data. [1] Bioinformatics is an interdisciplinary, including the acquisition, management, analysis, interpretation and application of biological information, etc. It derives from the Human Genome Project. The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets.[2]. This paper analyzes and compares various algorithms of machine learning and their applications in bioinformatics.

  6. 5th HUPO BPP Bioinformatics Meeting at the European Bioinformatics Institute in Hinxton, UK--Setting the analysis frame.

    PubMed

    Stephan, Christian; Hamacher, Michael; Blüggel, Martin; Körting, Gerhard; Chamrad, Daniel; Scheer, Christian; Marcus, Katrin; Reidegeld, Kai A; Lohaus, Christiane; Schäfer, Heike; Martens, Lennart; Jones, Philip; Müller, Michael; Auyeung, Kevin; Taylor, Chris; Binz, Pierre-Alain; Thiele, Herbert; Parkinson, David; Meyer, Helmut E; Apweiler, Rolf

    2005-09-01

    The Bioinformatics Committee of the HUPO Brain Proteome Project (HUPO BPP) meets regularly to execute the post-lab analyses of the data produced in the HUPO BPP pilot studies. On July 7, 2005 the members came together for the 5th time at the European Bioinformatics Institute (EBI) in Hinxton, UK, hosted by Rolf Apweiler. As a main result, the parameter set of the semi-automated data re-analysis of MS/MS spectra has been elaborated and the subsequent work steps have been defined.

  7. A class-based link prediction using Distance Dependent Chinese Restaurant Process

    NASA Astrophysics Data System (ADS)

    Andalib, Azam; Babamir, Seyed Morteza

    2016-08-01

    One of the important tasks in relational data analysis is link prediction which has been successfully applied on many applications such as bioinformatics, information retrieval, etc. The link prediction is defined as predicting the existence or absence of edges between nodes of a network. In this paper, we propose a novel method for link prediction based on Distance Dependent Chinese Restaurant Process (DDCRP) model which enables us to utilize the information of the topological structure of the network such as shortest path and connectivity of the nodes. We also propose a new Gibbs sampling algorithm for computing the posterior distribution of the hidden variables based on the training data. Experimental results on three real-world datasets show the superiority of the proposed method over other probabilistic models for link prediction problem.

  8. Bringing Web 2.0 to bioinformatics.

    PubMed

    Zhang, Zhang; Cheung, Kei-Hoi; Townsend, Jeffrey P

    2009-01-01

    Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline featuring web services for computer-to-computer data exchange as users add value. This pipeline aims to simplify data integration and creation, to realize automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.

  9. Analysis of mass spectrometry data from the secretome of an explant model of articular cartilage exposed to pro-inflammatory and anti-inflammatory stimuli using machine learning

    PubMed Central

    2013-01-01

    Background Osteoarthritis (OA) is an inflammatory disease of synovial joints involving the loss and degeneration of articular cartilage. The gold standard for evaluating cartilage loss in OA is the measurement of joint space width on standard radiographs. However, in most cases the diagnosis is made well after the onset of the disease, when the symptoms are well established. Identification of early biomarkers of OA can facilitate earlier diagnosis, improve disease monitoring and predict responses to therapeutic interventions. Methods This study describes the bioinformatic analysis of data generated from high throughput proteomics for identification of potential biomarkers of OA. The mass spectrometry data was generated using a canine explant model of articular cartilage treated with the pro-inflammatory cytokine interleukin 1 β (IL-1β). The bioinformatics analysis involved the application of machine learning and network analysis to the proteomic mass spectrometry data. A rule based machine learning technique, BioHEL, was used to create a model that classified the samples into their relevant treatment groups by identifying those proteins that separated samples into their respective groups. The proteins identified were considered to be potential biomarkers. Protein networks were also generated; from these networks, proteins pivotal to the classification were identified. Results BioHEL correctly classified eighteen out of twenty-three samples, giving a classification accuracy of 78.3% for the dataset. The dataset included the four classes of control, IL-1β, carprofen, and IL-1β and carprofen together. This exceeded the other machine learners that were used for a comparison, on the same dataset, with the exception of another rule-based method, JRip, which performed equally well. The proteins that were most frequently used in rules generated by BioHEL were found to include a number of relevant proteins including matrix metalloproteinase 3, interleukin 8 and matrix gla protein. Conclusions Using this protocol, combining an in vitro model of OA with bioinformatics analysis, a number of relevant extracellular matrix proteins were identified, thereby supporting the application of these bioinformatics tools for analysis of proteomic data from in vitro models of cartilage degradation. PMID:24330474

  10. Screening circular RNA related to chemotherapeutic resistance in breast cancer.

    PubMed

    Gao, Danfeng; Zhang, Xiufen; Liu, Beibei; Meng, Dong; Fang, Kai; Guo, Zijian; Li, Lihua

    2017-09-01

    We aimed to identify circular RNAs (circRNAs) associated with breast cancer chemoresistance. CircRNA microarray expression profiles were obtained from Adriamycin (ADM) resistant MCF-7 breast cancer cells (MCF-7/ADM) and parental MCF-7 cells and were validated using quantitative real-time reverse transcription PCR. The expression data were analyzed bioinformatically. We detected 3093 circRNAs and identified 18 circRNAs that are differentially expressed between MCF-7/ADM and MCF-7 cells; after validating by quantitative real-time reverse transcription PCR, we predicted the possible miRNAs and potential target genes of the seven upregulated circRNAs using TargetScan and miRanda. The bioinformatics analysis revealed several target genes related to cancer-related signaling pathways. Additionally, we discovered a regulatory role of the circ_0006528-miR-7-5p-Raf1 axis in ADM-resistant breast cancer. These results revealed that circRNAs may play a role in breast cancer chemoresistance and that hsa_circ_0006528 might be a promising candidate for further functional analysis.

  11. Identifying functionally informative evolutionary sequence profiles.

    PubMed

    Gil, Nelson; Fiser, Andras

    2018-04-15

    Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.

  12. Xuefu zhuyu decoction improves cognitive impairment in experimental traumatic brain injury via synaptic regulation

    PubMed Central

    Zhou, Jing; Liu, Tao; Cui, Hanjin; Fan, Rong; Zhang, Chunhu; Peng, Weijun; Yang, Ali; Zhu, Lin; Wang, Yang; Tang, Tao

    2017-01-01

    An overarching consequence of traumatic brain injury (TBI) is the cognitive impairment. It may hinder individual performance of daily tasks and determine people's subjective well-being. The damage to synaptic plasticity, one of the key mechanisms of cognitive dysfunction, becomes the potential therapeutic strategy of TBI. In this study, we aimed to investigate whether Xuefu Zhuyu Decoction (XFZYD), a traditional Chinese medicine, provided a synaptic regulation to improve cognitive disorder following TBI. Morris water maze and modified neurological severity scores were performed to assess the neurological and cognitive abilities. The PubChem Compound IDs of the major compounds of XFZYD were submitted into BATMAN-TCM, an online bioinformatics analysis tool, to predict the druggable targets related to synaptic function. Furthermore, we validated the prediction through immunohistochemical, RT-PCR and western blot analyses. We found that XFZYD enhanced neuroprotection, simultaneously improved learning and memory performances in controlled cortical impact rats. Bioinformatics analysis revealed that the improvements of XFZYD implied the Long-term potentiation relative proteins including NMDAR1, CaMKII and GAP-43. The further confirmation of molecular biological studies confirmed that XFZYD upregulated the mRNA and protein levels of NMDAR1, CaMKII and GAP-43. Pharmacological synaptic regulation of XFZYD could provide a novel therapeutic strategy for cognitive impairment following TBI. PMID:29069769

  13. Nesprin-2 epsilon: A novel nesprin isoform expressed in human ovary and Ntera-2 cells

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lam, Le Thanh; Boehm, Sabrina V.; Roberts, Roland G.

    2011-08-26

    Highlights: {yields} A novel epsilon isoform of nesprin-2 has been discovered. {yields} This 120 kDa protein was predicted by bioinformatic analysis, but has not previously been observed. {yields} It is the main isoform expressed in a teratocarcinoma cell line and is also found in ovary. {yields} Like other nesprins, it is located at the nuclear envelope. {yields} We suggest it may have a role in very early development or in some ovary-specific function. -- Abstract: The nuclear envelope-associated cytoskeletal protein, nesprin-2, is encoded by a large gene containing several internal promoters that produce shorter isoforms. In a study of Ntera-2more » teratocarcinoma cells, a novel isoform, nesprin-2-epsilon, was found to be the major mRNA and protein product of the nesprin-2 gene. Its existence was predicted by bioinformatic analysis, but this is the first direct demonstration of both the mRNA and the 120 kDa protein which is located at the nuclear envelope. In a panel of 21 adult and foetal human tissues, the nesprin-2-epsilon mRNA was strongly expressed in ovary but was a minor isoform elsewhere. The expression pattern suggests a possible link with very early development and a likely physiological role in ovary.« less

  14. Metabolomics, Standards, and Metabolic Modeling for Synthetic Biology in Plants

    PubMed Central

    Hill, Camilla Beate; Czauderna, Tobias; Klapperstück, Matthias; Roessner, Ute; Schreiber, Falk

    2015-01-01

    Life on earth depends on dynamic chemical transformations that enable cellular functions, including electron transfer reactions, as well as synthesis and degradation of biomolecules. Biochemical reactions are coordinated in metabolic pathways that interact in a complex way to allow adequate regulation. Biotechnology, food, biofuel, agricultural, and pharmaceutical industries are highly interested in metabolic engineering as an enabling technology of synthetic biology to exploit cells for the controlled production of metabolites of interest. These approaches have only recently been extended to plants due to their greater metabolic complexity (such as primary and secondary metabolism) and highly compartmentalized cellular structures and functions (including plant-specific organelles) compared with bacteria and other microorganisms. Technological advances in analytical instrumentation in combination with advances in data analysis and modeling have opened up new approaches to engineer plant metabolic pathways and allow the impact of modifications to be predicted more accurately. In this article, we review challenges in the integration and analysis of large-scale metabolic data, present an overview of current bioinformatics methods for the modeling and visualization of metabolic networks, and discuss approaches for interfacing bioinformatics approaches with metabolic models of cellular processes and flux distributions in order to predict phenotypes derived from specific genetic modifications or subjected to different environmental conditions. PMID:26557642

  15. 2010 Translational bioinformatics year in review

    PubMed Central

    Miller, Katharine S

    2011-01-01

    A review of 2010 research in translational bioinformatics provides much to marvel at. We have seen notable advances in personal genomics, pharmacogenetics, and sequencing. At the same time, the infrastructure for the field has burgeoned. While acknowledging that, according to researchers, the members of this field tend to be overly optimistic, the authors predict a bright future. PMID:21672905

  16. Pse-Analysis: a python package for DNA/RNA and protein/ peptide sequence analysis based on pseudo components and kernel methods.

    PubMed

    Liu, Bin; Wu, Hao; Zhang, Deyuan; Wang, Xiaolong; Chou, Kuo-Chen

    2017-02-21

    To expedite the pace in conducting genome/proteome analysis, we have developed a Python package called Pse-Analysis. The powerful package can automatically complete the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross validation, and (5) evaluating prediction quality. All the work a user needs to do is to input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis will automatically construct an ideal predictor, followed by yielding the predicted results for the submitted query samples. All the aforementioned tedious jobs can be automatically done by the computer. Moreover, the multiprocessing technique was adopted to enhance computational speed by about 6 folds. The Pse-Analysis Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/Pse-Analysis/, and can be directly run on Windows, Linux, and Unix.

  17. In Silico Detection of Sequence Variations Modifying Transcriptional Regulation

    PubMed Central

    Andersen, Malin C; Engström, Pär G; Lithwick, Stuart; Arenillas, David; Eriksson, Per; Lenhard, Boris; Wasserman, Wyeth W; Odeberg, Jacob

    2008-01-01

    Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation. PMID:18208319

  18. Characterization and prediction of residues determining protein functional specificity.

    PubMed

    Capra, John A; Singh, Mona

    2008-07-01

    Within a homologous protein family, proteins may be grouped into subtypes that share specific functions that are not common to the entire family. Often, the amino acids present in a small number of sequence positions determine each protein's particular functional specificity. Knowledge of these specificity determining positions (SDPs) aids in protein function prediction, drug design and experimental analysis. A number of sequence-based computational methods have been introduced for identifying SDPs; however, their further development and evaluation have been hindered by the limited number of known experimentally determined SDPs. We combine several bioinformatics resources to automate a process, typically undertaken manually, to build a dataset of SDPs. The resulting large dataset, which consists of SDPs in enzymes, enables us to characterize SDPs in terms of their physicochemical and evolutionary properties. It also facilitates the large-scale evaluation of sequence-based SDP prediction methods. We present a simple sequence-based SDP prediction method, GroupSim, and show that, surprisingly, it is competitive with a representative set of current methods. We also describe ConsWin, a heuristic that considers sequence conservation of neighboring amino acids, and demonstrate that it improves the performance of all methods tested on our large dataset of enzyme SDPs. Datasets and GroupSim code are available online at http://compbio.cs.princeton.edu/specificity/. Supplementary data are available at Bioinformatics online.

  19. Indentification and Analysis of Occludin Phosphosites: A Combined Mass Spectroscoy and Bioinformatics Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sundstrom, J.; Tash, B; Murakami, T

    2009-01-01

    The molecular function of occludin, an integral membrane component of tight junctions, remains unclear. VEGF-induced phosphorylation sites were mapped on occludin by combining MS data analysis with bioinformatics. In vivo phosphorylation of Ser490 was validated and protein interaction studies combined with crystal structure analysis suggest that Ser490 phosphorylation attenuates the interaction between occludin and ZO-1. This study demonstrates that combining MS data and bioinformatics can successfully identify novel phosphorylation sites from limiting samples.

  20. A Web-based assessment of bioinformatics end-user support services at US universities

    PubMed Central

    Messersmith, Donna J.; Benson, Dennis A.; Geer, Renata C.

    2006-01-01

    Objectives: This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Methods: Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Results: Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. Conclusions: This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group. PMID:16888663

  1. Progress and challenges in bioinformatics approaches for enhancer identification

    PubMed Central

    Kleftogiannis, Dimitrios; Kalnis, Panos

    2016-01-01

    Enhancers are cis-acting DNA elements that play critical roles in distal regulation of gene expression. Identifying enhancers is an important step for understanding distinct gene expression programs that may reflect normal and pathogenic cellular conditions. Experimental identification of enhancers is constrained by the set of conditions used in the experiment. This requires multiple experiments to identify enhancers, as they can be active under specific cellular conditions but not in different cell types/tissues or cellular states. This has opened prospects for computational prediction methods that can be used for high-throughput identification of putative enhancers to complement experimental approaches. Potential functions and properties of predicted enhancers have been catalogued and summarized in several enhancer-oriented databases. Because the current methods for the computational prediction of enhancers produce significantly different enhancer predictions, it will be beneficial for the research community to have an overview of the strategies and solutions developed in this field. In this review, we focus on the identification and analysis of enhancers by bioinformatics approaches. First, we describe a general framework for computational identification of enhancers, present relevant data types and discuss possible computational solutions. Next, we cover over 30 existing computational enhancer identification methods that were developed since 2000. Our review highlights advantages, limitations and potentials, while suggesting pragmatic guidelines for development of more efficient computational enhancer prediction methods. Finally, we discuss challenges and open problems of this topic, which require further consideration. PMID:26634919

  2. Nano-LC-ESI MS/MS analysis of proteins in dried sea dragon Solenognathus hardwickii and bioinformatic analysis of its protein expression profiling.

    PubMed

    Zhang, Dong-Mei; Feng, Li-Xing; Li, Lu; Liu, Miao; Jiang, Bao-Hong; Yang, Min; Li, Guo-Qiang; Wu, Wan-Ying; Guo, De-An; Liu, Xuan

    2016-09-01

    The sea dragon Solenognathus hardwickii has long been used as a traditional Chinese medicine for the treatment of various diseases, such as male impotency. To gain a comprehensive insight into the protein components of the sea dragon, shotgun proteomic analysis of its protein expression profiling was conducted in the present study. Proteins were extracted from dried sea dragon using a trichloroacetic acid/acetone precipitation method and then separated by SDS-PAGE. The protein bands were cut from the gel and digested by trypsin to generate peptide mixture. The peptide fragments were then analyzed using nano liquid chromatography tandem mass spectrometry (nano-LC-ESI MS/MS). 810 proteins and 1 577 peptides were identified in the dried sea dragon. The identified proteins exhibited molecular weight values ranging from 1 900 to 3 516 900 Da and pI values from 3.8 to 12.18. Bioinformatic analysis was conducted using the DAVID Bioinformatics Resources 6.7 Gene Ontology (GO) analysis tool to explore possible functions of the identified proteins. Ascribed functions of the proteins mainly included intracellular non-membrane-bound organelle, non-membrane-bounded organelle, cytoskeleton, structural molecule activity, calcium ion binding and etc. Furthermore, possible signal networks of the identified proteins were predicted using STRING (Search Tool for the Retrieval of Interacting Genes) database. Ribosomal protein synthesis was found to play an important role in the signal network. The results of this study, to best of our knowledge, were the first to provide a reference proteome profile for the sea dragon, and would aid in the understanding of the expression and functions of the identified proteins. Copyright © 2016 China Pharmaceutical University. Published by Elsevier B.V. All rights reserved.

  3. Promoting synergistic research and education in genomics and bioinformatics.

    PubMed

    Yang, Jack Y; Yang, Mary Qu; Zhu, Mengxia Michelle; Arabnia, Hamid R; Deng, Youping

    2008-01-01

    Bioinformatics and Genomics are closely related disciplines that hold great promises for the advancement of research and development in complex biomedical systems, as well as public health, drug design, comparative genomics, personalized medicine and so on. Research and development in these two important areas are impacting the science and technology.High throughput sequencing and molecular imaging technologies marked the beginning of a new era for modern translational medicine and personalized healthcare. The impact of having the human sequence and personalized digital images in hand has also created tremendous demands of developing powerful supercomputing, statistical learning and artificial intelligence approaches to handle the massive bioinformatics and personalized healthcare data, which will obviously have a profound effect on how biomedical research will be conducted toward the improvement of human health and prolonging of human life in the future. The International Society of Intelligent Biological Medicine (http://www.isibm.org) and its official journals, the International Journal of Functional Informatics and Personalized Medicine (http://www.inderscience.com/ijfipm) and the International Journal of Computational Biology and Drug Design (http://www.inderscience.com/ijcbdd) in collaboration with International Conference on Bioinformatics and Computational Biology (Biocomp), touch tomorrow's bioinformatics and personalized medicine throughout today's efforts in promoting the research, education and awareness of the upcoming integrated inter/multidisciplinary field. The 2007 international conference on Bioinformatics and Computational Biology (BIOCOMP07) was held in Las Vegas, the United States of American on June 25-28, 2007. The conference attracted over 400 papers, covering broad research areas in the genomics, biomedicine and bioinformatics. The Biocomp 2007 provides a common platform for the cross fertilization of ideas, and to help shape knowledge and scientific achievements by bridging these two very important disciplines into an interactive and attractive forum. Keeping this objective in mind, Biocomp 2007 aims to promote interdisciplinary and multidisciplinary education and research. 25 high quality peer-reviewed papers were selected from 400+ submissions for this supplementary issue of BMC Genomics. Those papers contributed to a wide-range of important research fields including gene expression data analysis and applications, high-throughput genome mapping, sequence analysis, gene regulation, protein structure prediction, disease prediction by machine learning techniques, systems biology, database and biological software development. We always encourage participants submitting proposals for genomics sessions, special interest research sessions, workshops and tutorials to Professor Hamid R. Arabnia (hra@cs.uga.edu) in order to ensure that Biocomp continuously plays the leadership role in promoting inter/multidisciplinary research and education in the fields. Biocomp received top conference ranking with a high score of 0.95/1.00. Biocomp is academically co-sponsored by the International Society of Intelligent Biological Medicine and the Research Laboratories and Centers of Harvard University--Massachusetts Institute of Technology, Indiana University--Purdue University, Georgia Tech--Emory University, UIUC, UCLA, Columbia University, University of Texas at Austin and University of Iowa etc. Biocomp--Worldcomp brings leading scientists together across the nation and all over the world and aims to promote synergistic components such as keynote lectures, special interest sessions, workshops and tutorials in response to the advances of cutting-edge research.

  4. The Prediction of Botulinum Toxin Structure Based on in Silico and in Vitro Analysis

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2011-01-01

    Many of biological system mediated through protein-protein interactions. Knowledge of protein-protein complex structure is required for understanding the function. The determination of huge size and flexible protein-protein complex structure by experimental studies remains difficult, costly and five-consuming, therefore computational prediction of protein structures by homolog modeling and docking studies is valuable method. In addition, MD simulation is also one of the most powerful methods allowing to see the real dynamics of proteins. Here, we predict protein-protein complex structure of botulinum toxin to analyze its property. These bioinformatics methods are useful to report the relation between the flexibility of backbone structure and the activity.

  5. Preliminary Study of Bioinformatics Patents and Their Classifications Registered in the KIPRIS Database.

    PubMed

    Park, Hyun-Seok

    2012-12-01

    Whereas a vast amount of new information on bioinformatics is made available to the public through patents, only a small set of patents are cited in academic papers. A detailed analysis of registered bioinformatics patents, using the existing patent search system, can provide valuable information links between science and technology. However, it is extremely difficult to select keywords to capture bioinformatics patents, reflecting the convergence of several underlying technologies. No single word or even several words are sufficient to identify such patents. The analysis of patent subclasses can provide valuable information. In this paper, I did a preliminary study of the current status of bioinformatics patents and their International Patent Classification (IPC) groups registered in the Korea Intellectual Property Rights Information Service (KIPRIS) database.

  6. Development of Bioinformatics Infrastructure for Genomics Research.

    PubMed

    Mulder, Nicola J; Adebiyi, Ezekiel; Adebiyi, Marion; Adeyemi, Seun; Ahmed, Azza; Ahmed, Rehab; Akanle, Bola; Alibi, Mohamed; Armstrong, Don L; Aron, Shaun; Ashano, Efejiro; Baichoo, Shakuntala; Benkahla, Alia; Brown, David K; Chimusa, Emile R; Fadlelmola, Faisal M; Falola, Dare; Fatumo, Segun; Ghedira, Kais; Ghouila, Amel; Hazelhurst, Scott; Isewon, Itunuoluwa; Jung, Segun; Kassim, Samar Kamal; Kayondo, Jonathan K; Mbiyavanga, Mamana; Meintjes, Ayton; Mohammed, Somia; Mosaku, Abayomi; Moussa, Ahmed; Muhammd, Mustafa; Mungloo-Dilmohamud, Zahra; Nashiru, Oyekanmi; Odia, Trust; Okafor, Adaobi; Oladipo, Olaleye; Osamor, Victor; Oyelade, Jellili; Sadki, Khalid; Salifu, Samson Pandam; Soyemi, Jumoke; Panji, Sumir; Radouani, Fouzia; Souiai, Oussama; Tastan Bishop, Özlem

    2017-06-01

    Although pockets of bioinformatics excellence have developed in Africa, generally, large-scale genomic data analysis has been limited by the availability of expertise and infrastructure. H3ABioNet, a pan-African bioinformatics network, was established to build capacity specifically to enable H3Africa (Human Heredity and Health in Africa) researchers to analyze their data in Africa. Since the inception of the H3Africa initiative, H3ABioNet's role has evolved in response to changing needs from the consortium and the African bioinformatics community. H3ABioNet set out to develop core bioinformatics infrastructure and capacity for genomics research in various aspects of data collection, transfer, storage, and analysis. Various resources have been developed to address genomic data management and analysis needs of H3Africa researchers and other scientific communities on the continent. NetMap was developed and used to build an accurate picture of network performance within Africa and between Africa and the rest of the world, and Globus Online has been rolled out to facilitate data transfer. A participant recruitment database was developed to monitor participant enrollment, and data is being harmonized through the use of ontologies and controlled vocabularies. The standardized metadata will be integrated to provide a search facility for H3Africa data and biospecimens. Because H3Africa projects are generating large-scale genomic data, facilities for analysis and interpretation are critical. H3ABioNet is implementing several data analysis platforms that provide a large range of bioinformatics tools or workflows, such as Galaxy, the Job Management System, and eBiokits. A set of reproducible, portable, and cloud-scalable pipelines to support the multiple H3Africa data types are also being developed and dockerized to enable execution on multiple computing infrastructures. In addition, new tools have been developed for analysis of the uniquely divergent African data and for downstream interpretation of prioritized variants. To provide support for these and other bioinformatics queries, an online bioinformatics helpdesk backed by broad consortium expertise has been established. Further support is provided by means of various modes of bioinformatics training. For the past 4 years, the development of infrastructure support and human capacity through H3ABioNet, have significantly contributed to the establishment of African scientific networks, data analysis facilities, and training programs. Here, we describe the infrastructure and how it has affected genomics and bioinformatics research in Africa. Copyright © 2017 World Heart Federation (Geneva). Published by Elsevier B.V. All rights reserved.

  7. [Application of bioinformatics in researches of industrial biocatalysis].

    PubMed

    Yu, Hui-Min; Luo, Hui; Shi, Yue; Sun, Xu-Dong; Shen, Zhong-Yao

    2004-05-01

    Industrial biocatalysis is currently attracting much attention to rebuild or substitute traditional producing process of chemicals and drugs. One of key focuses in industrial biocatalysis is biocatalyst, which is usually one kind of microbial enzyme. In the recent, new technologies of bioinformatics have played and will continue to play more and more significant roles in researches of industrial biocatalysis in response to the waves of genomic revolution. One of the key applications of bioinformatics in biocatalysis is the discovery and identification of the new biocatalyst through advanced DNA and protein sequence search, comparison and analyses in Internet database using different algorithm and software. The unknown genes of microbial enzymes can also be simply harvested by primer design on the basis of bioinformatics analyses. The other key applications of bioinformatics in biocatalysis are the modification and improvement of existing industrial biocatalyst. In this aspect, bioinformatics is of great importance in both rational design and directed evolution of microbial enzymes. Based on the successful prediction of tertiary structures of enzymes using the tool of bioinformatics, the undermentioned experiments, i.e. site-directed mutagenesis, fusion protein construction, DNA family shuffling and saturation mutagenesis, etc, are usually of very high efficiency. On all accounts, bioinformatics will be an essential tool for either biologist or biological engineer in the future researches of industrial biocatalysis, due to its significant function in guiding and quickening the step of discovery and/or improvement of novel biocatalysts.

  8. LDAP: a web server for lncRNA-disease association prediction.

    PubMed

    Lan, Wei; Li, Min; Zhao, Kaijie; Liu, Jin; Wu, Fang-Xiang; Pan, Yi; Wang, Jianxin

    2017-02-01

    Increasing evidences have demonstrated that long noncoding RNAs (lncRNAs) play important roles in many human diseases. Therefore, predicting novel lncRNA-disease associations would contribute to dissect the complex mechanisms of disease pathogenesis. Some computational methods have been developed to infer lncRNA-disease associations. However, most of these methods infer lncRNA-disease associations only based on single data resource. In this paper, we propose a new computational method to predict lncRNA-disease associations by integrating multiple biological data resources. Then, we implement this method as a web server for lncRNA-disease association prediction (LDAP). The input of the LDAP server is the lncRNA sequence. The LDAP predicts potential lncRNA-disease associations by using a bagging SVM classifier based on lncRNA similarity and disease similarity. The web server is available at http://bioinformatics.csu.edu.cn/ldap jxwang@mail.csu.edu.cn. Supplementary data are available at Bioinformatics online.

  9. Buying in to bioinformatics: an introduction to commercial sequence analysis software

    PubMed Central

    2015-01-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. PMID:25183247

  10. Buying in to bioinformatics: an introduction to commercial sequence analysis software.

    PubMed

    Smith, David Roy

    2015-07-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. © The Author 2014. Published by Oxford University Press.

  11. Identification of genetic variants predictive of early onset pancreatic cancer through a population science analysis of functional genomic datasets

    PubMed Central

    Chen, Jinyun; Wu, Xifeng; Huang, Yujing; Chen, Wei; Brand, Randall E.; Killary, Ann M.; Sen, Subrata; Frazier, Marsha L.

    2016-01-01

    Biomarkers are critically needed for the early detection of pancreatic cancer (PC) are urgently needed. Our purpose was to identify a panel of genetic variants that, combined, can predict increased risk for early-onset PC and thereby identify individuals who should begin screening at an early age. Previously, we identified genes using a functional genomic approach that were aberrantly expressed in early pathways to PC tumorigenesis. We now report the discovery of single nucleotide polymorphisms (SNPs) in these genes associated with early age at diagnosis of PC using a two-phase study design. In silico and bioinformatics tools were used to examine functional relevance of the identified SNPs. Eight SNPs were consistently associated with age at diagnosis in the discovery phase, validation phase and pooled analysis. Further analysis of the joint effects of these 8 SNPs showed that, compared to participants carrying none of these unfavorable genotypes (median age at PC diagnosis 70 years), those carrying 1–2, 3–4, or 5 or more unfavorable genotypes had median ages at diagnosis of 64, 63, and 62 years, respectively (P = 3.0E–04). A gene-dosage effect was observed, with age at diagnosis inversely related to number of unfavorable genotypes (Ptrend = 1.0E–04). Using bioinformatics tools, we found that all of the 8 SNPs were predicted to play functional roles in the disruption of transcription factor and/or enhancer binding sites and most of them were expression quantitative trait loci (eQTL) of the target genes. The panel of genetic markers identified may serve as susceptibility markers for earlier PC diagnosis. PMID:27486767

  12. Studying epigenetic DNA modifications in undergraduate laboratories using complementary bioinformatic and molecular approaches.

    PubMed

    Militello, Kevin T

    2013-01-01

    Epigenetic inheritance is the inheritance of genetic information that is not based on DNA sequence alone. One type of epigenetic information that has come to the forefront in the last few years is modified DNA bases. The most common modified DNA base in nature is 5-methylcytosine. Herein, we describe a laboratory experiment that combines bioinformatic and molecular approaches to study the presence and abundance of 5-methylcytosine in different organisms. Students were originally provided with the protein sequence of the Xenopus laevis DNMT1 cytosine-5 DNA methyltransferase and used BLASTP searches to detect the presence of protein orthologs in the genomes of several organisms including Homo sapiens, Mus musculus, Plasmodium falciparum, Drosophila melanogaster, Saccharomyces cerevisiae, Arabidopsis thaliana, and Caenorhabditis elegans. Students generated hypotheses regarding the presence and abundance of 5-methylcytosine in these organisms based on their bioinformatics data, and directly tested their predictions on a subset of DNAs using restriction enzyme isoschizomer assays. A southern blotting assay to answer the same question is also presented. In addition to exposure to the field of epigenetics, the strengths of the laboratory are students are able to make predictions using bioinformatic tools and quickly test them in the laboratory. In addition, students are exposed to two potential misinterpretations of bioinformatic search data. The laboratory is easily modified to incorporate outside research interests in epigenetics. © 2013 by The International Union of Biochemistry and Molecular Biology.

  13. Is there room for ethics within bioinformatics education?

    PubMed

    Taneri, Bahar

    2011-07-01

    When bioinformatics education is considered, several issues are addressed. At the undergraduate level, the main issue revolves around conveying information from two main and different fields: biology and computer science. At the graduate level, the main issue is bridging the gap between biology students and computer science students. However, there is an educational component that is rarely addressed within the context of bioinformatics education: the ethics component. Here, a different perspective is provided on bioinformatics education, and the current status of ethics is analyzed within the existing bioinformatics programs. Analysis of the existing undergraduate and graduate programs, in both Europe and the United States, reveals the minimal attention given to ethics within bioinformatics education. Given that bioinformaticians speedily and effectively shape the biomedical sciences and hence their implications for society, here redesigning of the bioinformatics curricula is suggested in order to integrate the necessary ethics education. Unique ethical problems awaiting bioinformaticians and bioinformatics ethics as a separate field of study are discussed. In addition, a template for an "Ethics in Bioinformatics" course is provided.

  14. Edge Bioinformatics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, Chien-Chi

    2015-08-03

    Edge Bioinformatics is a developmental bioinformatics and data management platform which seeks to supply laboratories with bioinformatics pipelines for analyzing data associated with common samples case goals. Edge Bioinformatics enables sequencing as a solution and forward-deployed situations where human-resources, space, bandwidth, and time are limited. The Edge bioinformatics pipeline was designed based on following USE CASES and specific to illumina sequencing reads. 1. Assay performance adjudication (PCR): Analysis of an existing PCR assay in a genomic context, and automated design of a new assay to resolve conflicting results; 2. Clinical presentation with extreme symptoms: Characterization of a known pathogen ormore » co-infection with a. Novel emerging disease outbreak or b. Environmental surveillance« less

  15. Bioinformatic analysis of Msx1 and Msx2 involved in craniofacial development.

    PubMed

    Dai, Jiewen; Mou, Zhifang; Shen, Shunyao; Dong, Yuefu; Yang, Tong; Shen, Steve Guofang

    2014-01-01

    Msx1 and Msx2 were revealed to be candidate genes for some craniofacial deformities, such as cleft lip with/without cleft palate (CL/P) and craniosynostosis. Many other genes were demonstrated to have a cross-talk with MSX genes in causing these defects. However, there is no systematic evaluation for these MSX gene-related factors. In this study, we performed systematic bioinformatic analysis for MSX genes by combining using GeneDecks, DAVID, and STRING database, and the results showed that there were numerous genes related to MSX genes, such as Irf6, TP63, Dlx2, Dlx5, Pax3, Pax9, Bmp4, Tgf-beta2, and Tgf-beta3 that have been demonstrated to be involved in CL/P, and Fgfr2, Fgfr1, Fgfr3, and Twist1 that were involved in craniosynostosis. Many of these genes could be enriched into different gene groups involved in different signaling ways, different craniofacial deformities, and different biological process. These findings could make us analyze the function of MSX gens in a gene network. In addition, our findings showed that Sumo, a novel gene whose polymorphisms were demonstrated to be associated with nonsyndromic CL/P by genome-wide association study, has protein-protein interaction with MSX1, which may offer us an alternative method to perform bioinformatic analysis for genes found by genome-wide association study and can make us predict the disrupted protein function due to the mutation in a gene DNA sequence. These findings may guide us to perform further functional studies in the future.

  16. AnaBench: a Web/CORBA-based workbench for biomolecular sequence analysis

    PubMed Central

    Badidi, Elarbi; De Sousa, Cristina; Lang, B Franz; Burger, Gertraud

    2003-01-01

    Background Sequence data analyses such as gene identification, structure modeling or phylogenetic tree inference involve a variety of bioinformatics software tools. Due to the heterogeneity of bioinformatics tools in usage and data requirements, scientists spend much effort on technical issues including data format, storage and management of input and output, and memorization of numerous parameters and multi-step analysis procedures. Results In this paper, we present the design and implementation of AnaBench, an interactive, Web-based bioinformatics Analysis workBench allowing streamlined data analysis. Our philosophy was to minimize the technical effort not only for the scientist who uses this environment to analyze data, but also for the administrator who manages and maintains the workbench. With new bioinformatics tools published daily, AnaBench permits easy incorporation of additional tools. This flexibility is achieved by employing a three-tier distributed architecture and recent technologies including CORBA middleware, Java, JDBC, and JSP. A CORBA server permits transparent access to a workbench management database, which stores information about the users, their data, as well as the description of all bioinformatics applications that can be launched from the workbench. Conclusion AnaBench is an efficient and intuitive interactive bioinformatics environment, which offers scientists application-driven, data-driven and protocol-driven analysis approaches. The prototype of AnaBench, managed by a team at the Université de Montréal, is accessible on-line at: . Please contact the authors for details about setting up a local-network AnaBench site elsewhere. PMID:14678565

  17. Green Fluorescent Protein-Focused Bioinformatics Laboratory Experiment Suitable for Undergraduates in Biochemistry Courses

    ERIC Educational Resources Information Center

    Rowe, Laura

    2017-01-01

    An introductory bioinformatics laboratory experiment focused on protein analysis has been developed that is suitable for undergraduate students in introductory biochemistry courses. The laboratory experiment is designed to be potentially used as a "stand-alone" activity in which students are introduced to basic bioinformatics tools and…

  18. Multifactorial Likelihood Assessment of BRCA1 and BRCA2 Missense Variants Confirms That BRCA1:c.122A>G(p.His41Arg) Is a Pathogenic Mutation

    PubMed Central

    Whiley, Phillip J.; Parsons, Michael T.; Leary, Jennifer; Tucker, Kathy; Warwick, Linda; Dopita, Belinda; Thorne, Heather; Lakhani, Sunil R.; Goldgar, David E.; Brown, Melissa A.; Spurdle, Amanda B.

    2014-01-01

    Rare exonic, non-truncating variants in known cancer susceptibility genes such as BRCA1 and BRCA2 are problematic for genetic counseling and clinical management of relevant families. This study used multifactorial likelihood analysis and/or bioinformatically-directed mRNA assays to assess pathogenicity of 19 BRCA1 or BRCA2 variants identified following patient referral to clinical genetic services. Two variants were considered to be pathogenic (Class 5). BRCA1:c.4484G> C(p.Arg1495Thr) was shown to result in aberrant mRNA transcripts predicted to encode truncated proteins. The BRCA1:c.122A>G(p.His41Arg) RING-domain variant was found from multifactorial likelihood analysis to have a posterior probability of pathogenicity of 0.995, a result consistent with existing protein functional assay data indicating lost BARD1 binding and ubiquitin ligase activity. Of the remaining variants, seven were determined to be not clinically significant (Class 1), nine were likely not pathogenic (Class 2), and one was uncertain (Class 3).These results have implications for genetic counseling and medical management of families carrying these specific variants. They also provide additional multifactorial likelihood variant classifications as reference to evaluate the sensitivity and specificity of bioinformatic prediction tools and/or functional assay data in future studies. PMID:24489791

  19. Bioinformatic identification and experimental validation of miRNAs from foxtail millet (Setaria italica).

    PubMed

    Han, Jun; Xie, Hao; Sun, Qingpeng; Wang, Jun; Lu, Min; Wang, Weixiang; Guo, Erhu; Pan, Jinbao

    2014-08-10

    MiRNAs are a novel group of non-coding small RNAs that negatively regulate gene expression. Many miRNAs have been identified and investigated extensively in plant species with sequenced genomes. However, few miRNAs have been identified in foxtail millet (Setaria italica), which is an ancient cereal crop of great importance for dry land agriculture. In this study, 271 foxtail millet miRNAs belonging to 44 families were identified using a bioinformatics approach. Twenty-three pairs of sense/antisense miRNAs belonging to 13 families, and 18 miRNA clusters containing members of 8 families were discovered in foxtail millet. We identified 432 potential targets for 38 miRNA families, most of which were predicted to be involved in plant development, signal transduction, metabolic pathways, disease resistance, and environmental stress responses. Gene ontology (GO) analysis revealed that 101, 56, and 23 target genes were involved in molecular functions, biological processes, and cellular components, respectively. We investigated the expression patterns of 43 selected miRNAs using qRT-PCR analysis. All of the miRNAs were expressed ubiquitously with many exhibiting different expression levels in different tissues. We validated five predicted targets of four miRNAs using the RNA ligase mediated rapid amplification of cDNA end (5'-RLM-RACE) method. Copyright © 2014 Elsevier B.V. All rights reserved.

  20. Bio-knowledge based filters improve residue-residue contact prediction accuracy.

    PubMed

    Wozniak, P P; Pelc, J; Skrzypecki, M; Vriend, G; Kotulska, M

    2018-05-29

    Residue-residue contact prediction through direct coupling analysis has reached impressive accuracy, but yet higher accuracy will be needed to allow for routine modelling of protein structures. One way to improve the prediction accuracy is to filter predicted contacts using knowledge about the particular protein of interest or knowledge about protein structures in general. We focus on the latter and discuss a set of filters that can be used to remove false positive contact predictions. Each filter depends on one or a few cut-off parameters for which the filter performance was investigated. Combining all filters while using default parameters resulted for a test-set of 851 protein domains in the removal of 29% of the predictions of which 92% were indeed false positives. All data and scripts are available from http://comprec-lin.iiar.pwr.edu.pl/FPfilter/. malgorzata.kotulska@pwr.edu.pl. Supplementary data are available at Bioinformatics online.

  1. Genome-wide analysis of regulatory proteases sequences identified through bioinformatics data mining in Taenia solium.

    PubMed

    Yan, Hong-Bin; Lou, Zhong-Zi; Li, Li; Brindley, Paul J; Zheng, Yadong; Luo, Xuenong; Hou, Junling; Guo, Aijiang; Jia, Wan-Zhong; Cai, Xuepeng

    2014-06-04

    Cysticercosis remains a major neglected tropical disease of humanity in many regions, especially in sub-Saharan Africa, Central America and elsewhere. Owing to the emerging drug resistance and the inability of current drugs to prevent re-infection, identification of novel vaccines and chemotherapeutic agents against Taenia solium and related helminth pathogens is a public health priority. The T. solium genome and the predicted proteome were reported recently, providing a wealth of information from which new interventional targets might be identified. In order to characterize and classify the entire repertoire of protease-encoding genes of T. solium, which act fundamental biological roles in all life processes, we analyzed the predicted proteins of this cestode through a combination of bioinformatics tools. Functional annotation was performed to yield insights into the signaling processes relevant to the complex developmental cycle of this tapeworm and to highlight a suite of the proteases as potential intervention targets. Within the genome of this helminth parasite, we identified 200 open reading frames encoding proteases from five clans, which correspond to 1.68% of the 11,902 protein-encoding genes predicted to be present in its genome. These proteases include calpains, cytosolic, mitochondrial signal peptidases, ubiquitylation related proteins, and others. Many not only show significant similarity to proteases in the Conserved Domain Database but have conserved active sites and catalytic domains. KEGG Automatic Annotation Server (KAAS) analysis indicated that ~60% of these proteases share strong sequence identities with proteins of the KEGG database, which are involved in human disease, metabolic pathways, genetic information processes, cellular processes, environmental information processes and organismal systems. Also, we identified signal peptides and transmembrane helices through comparative analysis with classes of important regulatory proteases. Phylogenetic analysis using Bayes approach provided support for inferring functional divergence among regulatory cysteine and serine proteases. Numerous putative proteases were identified for the first time in T. solium, and important regulatory proteases have been predicted. This comprehensive analysis not only complements the growing knowledge base of proteolytic enzymes, but also provides a platform from which to expand knowledge of cestode proteases and to explore their biochemistry and potential as intervention targets.

  2. Content of intrinsic disorder influences the outcome of cell-free protein synthesis.

    PubMed

    Tokmakov, Alexander A; Kurotani, Atsushi; Ikeda, Mariko; Terazawa, Yumiko; Shirouzu, Mikako; Stefanov, Vasily; Sakurai, Tetsuya; Yokoyama, Shigeyuki

    2015-09-11

    Cell-free protein synthesis is used to produce proteins with various structural traits. Recent bioinformatics analyses indicate that more than half of eukaryotic proteins possess long intrinsically disordered regions. However, no systematic study concerning the connection between intrinsic disorder and expression success of cell-free protein synthesis has been presented until now. To address this issue, we examined correlations of the experimentally observed cell-free protein expression yields with the contents of intrinsic disorder bioinformatically predicted in the expressed sequences. This analysis revealed strong relationships between intrinsic disorder and protein amenability to heterologous cell-free expression. On the one hand, elevated disorder content was associated with the increased ratio of soluble expression. On the other hand, overall propensity for detectable protein expression decreased with disorder content. We further demonstrated that these tendencies are rooted in some distinct features of intrinsically disordered regions, such as low hydrophobicity, elevated surface accessibility and high abundance of sequence motifs for proteolytic degradation, including sites of ubiquitination and PEST sequences. Our findings suggest that identification of intrinsically disordered regions in the expressed amino acid sequences can be of practical use for predicting expression success and optimizing cell-free protein synthesis.

  3. Contribution of bioinformatics prediction in microRNA-based cancer therapeutics.

    PubMed

    Banwait, Jasjit K; Bastola, Dhundy R

    2015-01-01

    Despite enormous efforts, cancer remains one of the most lethal diseases in the world. With the advancement of high throughput technologies massive amounts of cancer data can be accessed and analyzed. Bioinformatics provides a platform to assist biologists in developing minimally invasive biomarkers to detect cancer, and in designing effective personalized therapies to treat cancer patients. Still, the early diagnosis, prognosis, and treatment of cancer are an open challenge for the research community. MicroRNAs (miRNAs) are small non-coding RNAs that serve to regulate gene expression. The discovery of deregulated miRNAs in cancer cells and tissues has led many to investigate the use of miRNAs as potential biomarkers for early detection, and as a therapeutic agent to treat cancer. Here we describe advancements in computational approaches to predict miRNAs and their targets, and discuss the role of bioinformatics in studying miRNAs in the context of human cancer. Published by Elsevier B.V.

  4. Bioinformatics tools for the analysis of NMR metabolomics studies focused on the identification of clinically relevant biomarkers.

    PubMed

    Puchades-Carrasco, Leonor; Palomino-Schätzlein, Martina; Pérez-Rambla, Clara; Pineda-Lucena, Antonio

    2016-05-01

    Metabolomics, a systems biology approach focused on the global study of the metabolome, offers a tremendous potential in the analysis of clinical samples. Among other applications, metabolomics enables mapping of biochemical alterations involved in the pathogenesis of diseases, and offers the opportunity to noninvasively identify diagnostic, prognostic and predictive biomarkers that could translate into early therapeutic interventions. Particularly, metabolomics by Nuclear Magnetic Resonance (NMR) has the ability to simultaneously detect and structurally characterize an abundance of metabolic components, even when their identities are unknown. Analysis of the data generated using this experimental approach requires the application of statistical and bioinformatics tools for the correct interpretation of the results. This review focuses on the different steps involved in the metabolomics characterization of biofluids for clinical applications, ranging from the design of the study to the biological interpretation of the results. Particular emphasis is devoted to the specific procedures required for the processing and interpretation of NMR data with a focus on the identification of clinically relevant biomarkers. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  5. TnpPred: A Web Service for the Robust Prediction of Prokaryotic Transposases

    PubMed Central

    Riadi, Gonzalo; Medina-Moenne, Cristobal; Holmes, David S.

    2012-01-01

    Transposases (Tnps) are enzymes that participate in the movement of insertion sequences (ISs) within and between genomes. Genes that encode Tnps are amongst the most abundant and widely distributed genes in nature. However, they are difficult to predict bioinformatically and given the increasing availability of prokaryotic genomes and metagenomes, it is incumbent to develop rapid, high quality automatic annotation of ISs. This need prompted us to develop a web service, termed TnpPred for Tnp discovery. It provides better sensitivity and specificity for Tnp predictions than given by currently available programs as determined by ROC analysis. TnpPred should be useful for improving genome annotation. The TnpPred web service is freely available for noncommercial use. PMID:23251097

  6. Bioinformatic Analyses of Unique (Orphan) Core Genes of the Genus Acidithiobacillus: Functional Inferences and Use As Molecular Probes for Genomic and Metagenomic/Transcriptomic Interrogation

    PubMed Central

    González, Carolina; Lazcano, Marcelo; Valdés, Jorge; Holmes, David S.

    2016-01-01

    Using phylogenomic and gene compositional analyses, five highly conserved gene families have been detected in the core genome of the phylogenetically coherent genus Acidithiobacillus of the class Acidithiobacillia. These core gene families are absent in the closest extant genus Thermithiobacillus tepidarius that subtends the Acidithiobacillus genus and roots the deepest in this class. The predicted proteins encoded by these core gene families are not detected by a BLAST search in the NCBI non-redundant database of more than 90 million proteins using a relaxed cut-off of 1.0e−5. None of the five families has a clear functional prediction. However, bioinformatic scrutiny, using pI prediction, motif/domain searches, cellular location predictions, genomic context analyses, and chromosome topology studies together with previously published transcriptomic and proteomic data, suggests that some may have functions associated with membrane remodeling during cell division perhaps in response to pH stress. Despite the high level of amino acid sequence conservation within each family, there is sufficient nucleotide variation of the respective genes to permit the use of the DNA sequences to distinguish different species of Acidithiobacillus, making them useful additions to the armamentarium of tools for phylogenetic analysis. Since the protein families are unique to the Acidithiobacillus genus, they can also be leveraged as probes to detect the genus in environmental metagenomes and metatranscriptomes, including industrial biomining operations, and acid mine drainage (AMD). PMID:28082953

  7. Bioinformatic Analyses of Unique (Orphan) Core Genes of the Genus Acidithiobacillus: Functional Inferences and Use As Molecular Probes for Genomic and Metagenomic/Transcriptomic Interrogation.

    PubMed

    González, Carolina; Lazcano, Marcelo; Valdés, Jorge; Holmes, David S

    2016-01-01

    Using phylogenomic and gene compositional analyses, five highly conserved gene families have been detected in the core genome of the phylogenetically coherent genus Acidithiobacillus of the class Acidithiobacillia . These core gene families are absent in the closest extant genus Thermithiobacillus tepidarius that subtends the Acidithiobacillus genus and roots the deepest in this class. The predicted proteins encoded by these core gene families are not detected by a BLAST search in the NCBI non-redundant database of more than 90 million proteins using a relaxed cut-off of 1.0e -5 . None of the five families has a clear functional prediction. However, bioinformatic scrutiny, using pI prediction, motif/domain searches, cellular location predictions, genomic context analyses, and chromosome topology studies together with previously published transcriptomic and proteomic data, suggests that some may have functions associated with membrane remodeling during cell division perhaps in response to pH stress. Despite the high level of amino acid sequence conservation within each family, there is sufficient nucleotide variation of the respective genes to permit the use of the DNA sequences to distinguish different species of Acidithiobacillus , making them useful additions to the armamentarium of tools for phylogenetic analysis. Since the protein families are unique to the Acidithiobacillus genus, they can also be leveraged as probes to detect the genus in environmental metagenomes and metatranscriptomes, including industrial biomining operations, and acid mine drainage (AMD).

  8. First venom gland transcriptomic analysis of Iranian yellow scorpion "Odonthubuthus doriae" with some new findings.

    PubMed

    NaderiSoorki, Maryam; Galehdari, Hamid; Baradaran, Masomeh; Jalali, Amir

    2016-09-15

    Scorpion venom contains mixture of biologic molecules including selective toxins with medical capability. Odonthubuthus doriae (O. doriae) belonged to Buthidae family of scorpions and gained more interest among Iranian dangerous scorpion since 2005. We constructed the first cDNA library to explore the transcriptomic composition of this Iranian scorpiontelson. Then by used of bioinformatic software each expression sequence taq (EST) from the library analyzed and its quiddity was clear. Analysis showed that toxins (42%) had more venom transcript than other component such as antimicrobial peptides, venom peptides and cell proteins. Over 16% of transcripts didn't have any open reading frames (ORF), however their sequences showed similarity by other scorpion sequences. One EST didn't have any similarity by known scorpion peptides. For the first time; we report a comprehensive study of an Iranian scorpion with interesting and novel findings. We characterized a new putative sodium channel modifier in scorpions by some bioinformatics software, and then predicted its structure and function. Copyright © 2016. Published by Elsevier Ltd.

  9. Isothiocyanates from Broccolini seeds induce apoptosis in human colon cancer cells: proteomic and bioinformatic analyses.

    PubMed

    Yang, Yanjing; Yan, Huidan; Li, Yuqin; Yang, Shang-Tian; Zhang, Xuewu

    2011-05-01

    Isothiocyanates (ITCs) have been shown to possess antitumor activity in colon cancer, however, the detailed mechanism is still unclear. The objective of this study was to investigate apoptosis-inducing activity of ITCs from Broccolini seeds and proteomic changes in SW480 cells, and to identify the molecular pathways responsible for the anticancer action of ITCs. We found that ITCs induces SW480 cells apoptosis in a dose-dependent manner by using MTT assay, phase contrast microscope and flow cytometry, and the IC50 was calculated to be 77.72 microg/ml, superior to the chemotherapeutical drug 5-flurouracil. Subsequently, 15 altered proteins in ITCs treated SW480 cells were identified. Further bioinformatics analysis predicted the potential pathways for ITCs to induce apoptosis of SW480 cells. In conclusion, this is the first report to investigate anticancer activity of ITCs from Broccolini seeds and its mechanism of action by proteomics analysis. Our observations provide potential therapeutic targets for colon cancer inhibitor intervention and implicate the development of novel anti-cancer therapeutic strategies.

  10. Teaching bioinformatics and neuroinformatics by using free web-based tools.

    PubMed

    Grisham, William; Schottler, Natalie A; Valli-Marill, Joanne; Beck, Lisa; Beatty, Jackson

    2010-01-01

    This completely computer-based module's purpose is to introduce students to bioinformatics resources. We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with anatomy (Mouse Brain Library), quantitative trait locus analysis (WebQTL from GeneNetwork), bioinformatics and gene expression analyses (University of California, Santa Cruz Genome Browser, National Center for Biotechnology Information's Entrez Gene, and the Allen Brain Atlas), and information resources (PubMed). Instructors can use these various websites in concert to teach genetics from the phenotypic level to the molecular level, aspects of neuroanatomy and histology, statistics, quantitative trait locus analysis, and molecular biology (including in situ hybridization and microarray analysis), and to introduce bioinformatic resources. Students use these resources to discover 1) the region(s) of chromosome(s) influencing the phenotypic trait, 2) a list of candidate genes-narrowed by expression data, 3) the in situ pattern of a given gene in the region of interest, 4) the nucleotide sequence of the candidate gene, and 5) articles describing the gene. Teaching materials such as a detailed student/instructor's manual, PowerPoints, sample exams, and links to free Web resources can be found at http://mdcune.psych.ucla.edu/modules/bioinformatics.

  11. Analysis of requirements for teaching materials based on the course bioinformatics for plant metabolism

    NASA Astrophysics Data System (ADS)

    Balqis, Widodo, Lukiati, Betty; Amin, Mohamad

    2017-05-01

    A way to improve the quality of learning in the course of Plant Metabolism in the Department of Biology, State University of Malang, is to develop teaching materials. This research evaluates the needs of bioinformatics-based teaching material in the course Plant Metabolism by the Analyze, Design, Develop, Implement, and Evaluate (ADDIE) development model. Data were collected through questionnaires distributed to the students in the Plant Metabolism course of the Department of Biology, University of Malang, and analysis of the plan of lectures semester (RPS). Learning gains of this course show that it is not yet integrated into the field of bioinformatics. All respondents stated that plant metabolism books do not include bioinformatics and fail to explain the metabolism of a chemical compound of a local plant in Indonesia. Respondents thought that bioinformatics can explain examples and metabolism of a secondary metabolite analysis techniques and discuss potential medicinal compounds from local plants. As many as 65% of the respondents said that the existing metabolism book could not be used to understand secondary metabolism in lectures of plant metabolism. Therefore, the development of teaching materials including plant metabolism-based bioinformatics is important to improve the understanding of the lecture material in plant metabolism.

  12. Experimental identification of specificity determinants in the domain linker of a LacI/GalR protein: bioinformatics-based predictions generate true positives and false negatives.

    PubMed

    Meinhardt, Sarah; Swint-Kruse, Liskin

    2008-12-01

    In protein families, conserved residues often contribute to a common general function, such as DNA-binding. However, unique attributes for each homolog (e.g. recognition of alternative DNA sequences) must arise from variation in other functionally-important positions. The locations of these "specificity determinant" positions are obscured amongst the background of varied residues that do not make significant contributions to either structure or function. To isolate specificity determinants, a number of bioinformatics algorithms have been developed. When applied to the LacI/GalR family of transcription regulators, several specificity determinants are predicted in the 18 amino acids that link the DNA-binding and regulatory domains. However, results from alternative algorithms are only in partial agreement with each other. Here, we experimentally evaluate these predictions using an engineered repressor comprising the LacI DNA-binding domain, the LacI linker, and the GalR regulatory domain (LLhG). "Wild-type" LLhG has altered DNA specificity and weaker lacO(1) repression compared to LacI or a similar LacI:PurR chimera. Next, predictions of linker specificity determinants were tested, using amino acid substitution and in vivo repression assays to assess functional change. In LLhG, all predicted sites are specificity determinants, as well as three sites not predicted by any algorithm. Strategies are suggested for diminishing the number of false negative predictions. Finally, individual substitutions at LLhG specificity determinants exhibited a broad range of functional changes that are not predicted by bioinformatics algorithms. Results suggest that some variants have altered affinity for DNA, some have altered allosteric response, and some appear to have changed specificity for alternative DNA ligands.

  13. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

    PubMed Central

    2010-01-01

    Background Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. Description An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Conclusions Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms. PMID:21210976

  14. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics.

    PubMed

    Taylor, Ronald C

    2010-12-21

    Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms.

  15. A Bioinformatics Approach to the Structure, Function, and Evolution of the Nucleoprotein of the Order Mononegavirales

    PubMed Central

    Cleveland, Sean B.; Davies, John; McClure, Marcella A.

    2011-01-01

    The goal of this Bioinformatic study is to investigate sequence conservation in relation to evolutionary function/structure of the nucleoprotein of the order Mononegavirales. In the combined analysis of 63 representative nucleoprotein (N) sequences from four viral families (Bornaviridae, Filoviridae, Rhabdoviridae, and Paramyxoviridae) we predict the regions of protein disorder, intra-residue contact and co-evolving residues. Correlations between location and conservation of predicted regions illustrate a strong division between families while high- lighting conservation within individual families. These results suggest the conserved regions among the nucleoproteins, specifically within Rhabdoviridae and Paramyxoviradae, but also generally among all members of the order, reflect an evolutionary advantage in maintaining these sites for the viral nucleoprotein as part of the transcription/replication machinery. Results indicate conservation for disorder in the C-terminus region of the representative proteins that is important for interacting with the phosphoprotein and the large subunit polymerase during transcription and replication. Additionally, the C-terminus region of the protein preceding the disordered region, is predicted to be important for interacting with the encapsidated genome. Portions of the N-terminus are responsible for N∶N stability and interactions identified by the presence or lack of co-evolving intra-protein contact predictions. The validation of these prediction results by current structural information illustrates the benefits of the Disorder, Intra-residue contact and Compensatory mutation Correlator (DisICC) pipeline as a method for quickly characterizing proteins and providing the most likely residues and regions necessary to target for disruption in viruses that have little structural information available. PMID:21559282

  16. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    ERIC Educational Resources Information Center

    Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…

  17. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    PubMed Central

    Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students’ attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484

  18. Bioinformatics.

    PubMed

    Moore, Jason H

    2007-11-01

    Bioinformatics is an interdisciplinary field that blends computer science and biostatistics with biological and biomedical sciences such as biochemistry, cell biology, developmental biology, genetics, genomics, and physiology. An important goal of bioinformatics is to facilitate the management, analysis, and interpretation of data from biological experiments and observational studies. The goal of this review is to introduce some of the important concepts in bioinformatics that must be considered when planning and executing a modern biological research study. We review database resources as well as data mining software tools.

  19. Improving protein disorder prediction by deep bidirectional long short-term memory recurrent neural networks.

    PubMed

    Hanson, Jack; Yang, Yuedong; Paliwal, Kuldip; Zhou, Yaoqi

    2017-03-01

    Capturing long-range interactions between structural but not sequence neighbors of proteins is a long-standing challenging problem in bioinformatics. Recently, long short-term memory (LSTM) networks have significantly improved the accuracy of speech and image classification problems by remembering useful past information in long sequential events. Here, we have implemented deep bidirectional LSTM recurrent neural networks in the problem of protein intrinsic disorder prediction. The new method, named SPOT-Disorder, has steadily improved over a similar method using a traditional, window-based neural network (SPINE-D) in all datasets tested without separate training on short and long disordered regions. Independent tests on four other datasets including the datasets from critical assessment of structure prediction (CASP) techniques and >10 000 annotated proteins from MobiDB, confirmed SPOT-Disorder as one of the best methods in disorder prediction. Moreover, initial studies indicate that the method is more accurate in predicting functional sites in disordered regions. These results highlight the usefulness combining LSTM with deep bidirectional recurrent neural networks in capturing non-local, long-range interactions for bioinformatics applications. SPOT-disorder is available as a web server and as a standalone program at: http://sparks-lab.org/server/SPOT-disorder/index.php . j.hanson@griffith.edu.au or yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au. Supplementary data is available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  20. RNA sequencing uncovers antisense RNAs and novel small RNAs in Streptococcus pyogenes.

    PubMed

    Le Rhun, Anaïs; Beer, Yan Yan; Reimegård, Johan; Chylinski, Krzysztof; Charpentier, Emmanuelle

    2016-01-01

    Streptococcus pyogenes is a human pathogen responsible for a wide spectrum of diseases ranging from mild to life-threatening infections. During the infectious process, the temporal and spatial expression of pathogenicity factors is tightly controlled by a complex network of protein and RNA regulators acting in response to various environmental signals. Here, we focus on the class of small RNA regulators (sRNAs) and present the first complete analysis of sRNA sequencing data in S. pyogenes. In the SF370 clinical isolate (M1 serotype), we identified 197 and 428 putative regulatory RNAs by visual inspection and bioinformatics screening of the sequencing data, respectively. Only 35 from the 197 candidates identified by visual screening were assigned a predicted function (T-boxes, ribosomal protein leaders, characterized riboswitches or sRNAs), indicating how little is known about sRNA regulation in S. pyogenes. By comparing our list of predicted sRNAs with previous S. pyogenes sRNA screens using bioinformatics or microarrays, 92 novel sRNAs were revealed, including antisense RNAs that are for the first time shown to be expressed in this pathogen. We experimentally validated the expression of 30 novel sRNAs and antisense RNAs. We show that the expression profile of 9 sRNAs including 2 predicted regulatory elements is affected by the endoribonucleases RNase III and/or RNase Y, highlighting the critical role of these enzymes in sRNA regulation.

  1. A label distance maximum-based classifier for multi-label learning.

    PubMed

    Liu, Xiaoli; Bao, Hang; Zhao, Dazhe; Cao, Peng

    2015-01-01

    Multi-label classification is useful in many bioinformatics tasks such as gene function prediction and protein site localization. This paper presents an improved neural network algorithm, Max Label Distance Back Propagation Algorithm for Multi-Label Classification. The method was formulated by modifying the total error function of the standard BP by adding a penalty term, which was realized by maximizing the distance between the positive and negative labels. Extensive experiments were conducted to compare this method against state-of-the-art multi-label methods on three popular bioinformatic benchmark datasets. The results illustrated that this proposed method is more effective for bioinformatic multi-label classification compared to commonly used techniques.

  2. Bioinformatics clouds for big data manipulation.

    PubMed

    Dai, Lin; Gao, Xin; Guo, Yan; Xiao, Jingfa; Zhang, Zhang

    2012-11-28

    As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.

  3. Neonatal Informatics: Transforming Neonatal Care Through Translational Bioinformatics

    PubMed Central

    Palma, Jonathan P.; Benitz, William E.; Tarczy-Hornoch, Peter; Butte, Atul J.; Longhurst, Christopher A.

    2012-01-01

    The future of neonatal informatics will be driven by the availability of increasingly vast amounts of clinical and genetic data. The field of translational bioinformatics is concerned with linking and learning from these data and applying new findings to clinical care to transform the data into proactive, predictive, preventive, and participatory health. As a result of advances in translational informatics, the care of neonates will become more data driven, evidence based, and personalized. PMID:22924023

  4. eSBMTools 1.0: enhanced native structure-based modeling tools.

    PubMed

    Lutz, Benjamin; Sinner, Claude; Heuermann, Geertje; Verma, Abhinav; Schug, Alexander

    2013-11-01

    Molecular dynamics simulations provide detailed insights into the structure and function of biomolecular systems. Thus, they complement experimental measurements by giving access to experimentally inaccessible regimes. Among the different molecular dynamics techniques, native structure-based models (SBMs) are based on energy landscape theory and the principle of minimal frustration. Typically used in protein and RNA folding simulations, they coarse-grain the biomolecular system and/or simplify the Hamiltonian resulting in modest computational requirements while achieving high agreement with experimental data. eSBMTools streamlines running and evaluating SBM in a comprehensive package and offers high flexibility in adding experimental- or bioinformatics-derived restraints. We present a software package that allows setting up, modifying and evaluating SBM for both RNA and proteins. The implemented workflows include predicting protein complexes based on bioinformatics-derived inter-protein contact information, a standardized setup of protein folding simulations based on the common PDB format, calculating reaction coordinates and evaluating the simulation by free-energy calculations with weighted histogram analysis method or by phi-values. The modules interface with the molecular dynamics simulation program GROMACS. The package is open source and written in architecture-independent Python2. http://sourceforge.net/projects/esbmtools/. alexander.schug@kit.edu. Supplementary data are available at Bioinformatics online.

  5. An integrative computational approach for prioritization of genomic variants

    DOE PAGES

    Dubchak, Inna; Balasubramanian, Sandhya; Wang, Sheng; ...

    2014-12-15

    An essential step in the discovery of molecular mechanisms contributing to disease phenotypes and efficient experimental planning is the development of weighted hypotheses that estimate the functional effects of sequence variants discovered by high-throughput genomics. With the increasing specialization of the bioinformatics resources, creating analytical workflows that seamlessly integrate data and bioinformatics tools developed by multiple groups becomes inevitable. Here we present a case study of a use of the distributed analytical environment integrating four complementary specialized resources, namely the Lynx platform, VISTA RViewer, the Developmental Brain Disorders Database (DBDB), and the RaptorX server, for the identification of high-confidence candidatemore » genes contributing to pathogenesis of spina bifida. The analysis resulted in prediction and validation of deleterious mutations in the SLC19A placental transporter in mothers of the affected children that causes narrowing of the outlet channel and therefore leads to the reduced folate permeation rate. The described approach also enabled correct identification of several genes, previously shown to contribute to pathogenesis of spina bifida, and suggestion of additional genes for experimental validations. This study demonstrates that the seamless integration of bioinformatics resources enables fast and efficient prioritization and characterization of genomic factors and molecular networks contributing to the phenotypes of interest.« less

  6. Combining Flux Balance and Energy Balance Analysis for Large-Scale Metabolic Network: Biochemical Circuit Theory for Analysis of Large-Scale Metabolic Networks

    NASA Technical Reports Server (NTRS)

    Beard, Daniel A.; Liang, Shou-Dan; Qian, Hong; Biegel, Bryan (Technical Monitor)

    2001-01-01

    Predicting behavior of large-scale biochemical metabolic networks represents one of the greatest challenges of bioinformatics and computational biology. Approaches, such as flux balance analysis (FBA), that account for the known stoichiometry of the reaction network while avoiding implementation of detailed reaction kinetics are perhaps the most promising tools for the analysis of large complex networks. As a step towards building a complete theory of biochemical circuit analysis, we introduce energy balance analysis (EBA), which compliments the FBA approach by introducing fundamental constraints based on the first and second laws of thermodynamics. Fluxes obtained with EBA are thermodynamically feasible and provide valuable insight into the activation and suppression of biochemical pathways.

  7. Bioinformatics Approaches to Classifying Allergens and Predicting Cross-Reactivity

    PubMed Central

    Schein, Catherine H.; Ivanciuc, Ovidiu; Braun, Werner

    2007-01-01

    The major advances in understanding why patients respond to several seemingly different stimuli have been through the isolation, sequencing and structural analysis of proteins that induce an IgE response. The most significant finding is that allergenic proteins from very different sources can have nearly identical sequences and structures, and that this similarity can account for clinically observed cross-reactivity. The increasing amount of information on the sequence, structure and IgE epitopes of allergens is now available in several databases and powerful bioinformatics search tools allow user access to relevant information. Here, we provide an overview of these databases and describe state-of-the art bioinformatics tools to identify the common proteins that may be at the root of multiple allergy syndromes. Progress has also been made in quantitatively defining characteristics that discriminate allergens from non-allergens. Search and software tools for this purpose have been developed and implemented in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/). SDAP contains information for over 800 allergens and extensive bibliographic references in a relational database with links to other publicly available databases. SDAP is freely available on the Web to clinicians and patients, and can be used to find structural and functional relations among known allergens and to identify potentially cross-reacting antigens. Here we illustrate how these bioinformatics tools can be used to group allergens, and to detect areas that may account for common patterns of IgE binding and cross-reactivity. Such results can be used to guide treatment regimens for allergy sufferers. PMID:17276876

  8. A longitudinal social network analysis of the editorial boards of medical informatics and bioinformatics journals.

    PubMed

    Malin, Bradley; Carley, Kathleen

    2007-01-01

    The goal of this research is to learn how the editorial staffs of bioinformatics and medical informatics journals provide support for cross-community exposure. Models such as co-citation and co-author analysis measure the relationships between researchers; but they do not capture how environments that support knowledge transfer across communities are organized. In this paper, we propose a social network analysis model to study how editorial boards integrate researchers from disparate communities. We evaluate our model by building relational networks based on the editorial boards of approximately 40 journals that serve as research outlets in medical informatics and bioinformatics. We track the evolution of editorial relationships through a longitudinal investigation over the years 2000 through 2005. Our findings suggest that there are research journals that support the collocation of editorial board members from the bioinformatics and medical informatics communities. Network centrality metrics indicate that editorial board members are located in the intersection of the communities and that the number of individuals in the intersection is growing with time. Social network analysis methods provide insight into the relationships between the medical informatics and bioinformatics communities. The number of editorial board members facilitating the publication intersection of the communities has grown, but the intersection remains dependent on a small group of individuals and fragile.

  9. Bioinformatic mapping and production of recombinant N-terminal domains of human cardiac ryanodine receptor 2

    PubMed Central

    Bauerová-Hlinková, Vladena; Hostinová, Eva; Gašperík, Juraj; Beck, Konrad; Borko, Ľubomír; Lai, F. Anthony; Zahradníková, Alexandra; Ševčík, Jozef

    2010-01-01

    We report the domain analysis of the N-terminal region (residues 1–759) of the human cardiac ryanodine receptor (RyR2) that encompasses one of the discrete RyR2 mutation clusters associated with catecholaminergic polymorphic ventricular tachycardia (CPVT1) and arrhythmogenic right ventricular dysplasia (ARVD2). Our strategy utilizes a bioinformatics approach complemented by protein expression, solubility analysis and limited proteolytic digestion. Based on the bioinformatics analysis, we designed a series of specific RyR2 N-terminal fragments for cloning and overexpression in Escherichia coli. High yields of soluble proteins were achieved for fragments RyR21–606·His6, RyR2391–606·His6, RyR2409–606·His6, Trx·RyR2384–606·His6, Trx·RyR2391-606·His6 and Trx·RyR2409–606·His6. The folding of RyR21–606·His6 was analyzed by circular dichroism spectroscopy resulting in α-helix and β-sheet content of ∼23% and ∼29%, respectively, at temperatures up to 35 °C, which is in agreement with sequence based secondary structure predictions. Tryptic digestion of the largest recombinant protein, RyR21–606·His6, resulted in the appearance of two specific subfragments of ∼40 and 25 kDa. The 25 kDa fragment exhibited greater stability. Hybridization with anti-His6·Tag antibody indicated that RyR21–606·His6 is cleaved from the N-terminus and amino acid sequencing of the proteolytic fragments revealed that digestion occurred after residues 259 and 384, respectively. PMID:20045464

  10. Support vector machines for prediction and analysis of beta and gamma-turns in proteins.

    PubMed

    Pham, Tho Hoan; Satou, Kenji; Ho, Tu Bao

    2005-04-01

    Tight turns have long been recognized as one of the three important features of proteins, together with alpha-helix and beta-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% tight turns are beta-turns and most of the rest are gamma-turns. Analysis and prediction of beta-turns and gamma-turns is very useful for design of new molecules such as drugs, pesticides, and antigens. In this paper we investigated two aspects of applying support vector machine (SVM), a promising machine learning method for bioinformatics, to prediction and analysis of beta-turns and gamma-turns. First, we developed two SVM-based methods, called BTSVM and GTSVM, which predict beta-turns and gamma-turns in a protein from its sequence. When compared with other methods, BTSVM has a superior performance and GTSVM is competitive. Second, we used SVMs with a linear kernel to estimate the support of amino acids for the formation of beta-turns and gamma-turns depending on their position in a protein. Our analysis results are more comprehensive and easier to use than the previous results in designing turns in proteins.

  11. Analysis and Prediction of Exon Skipping Events from RNA-Seq with Sequence Information Using Rotation Forest.

    PubMed

    Du, Xiuquan; Hu, Changlin; Yao, Yu; Sun, Shiwei; Zhang, Yanping

    2017-12-12

    In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a solution has yet to be found. In this study, given the limitations of machine learning algorithms with RNA-Seq data or genome sequences, a new feature, called RS (RNA-seq and sequence) features, was constructed. These features include RNA-Seq features derived from the RNA-Seq data and sequence features derived from genome sequences. We propose a novel Rotation Forest classifier to predict ES events with the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, a dataset from two human tissues was used, and RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%. When compared to the other available methods, the results indicate that RotaF-RSES is efficient and can predict ES events with RS features.

  12. No-boundary thinking in bioinformatics research

    PubMed Central

    2013-01-01

    Currently there are definitions from many agencies and research societies defining “bioinformatics” as deriving knowledge from computational analysis of large volumes of biological and biomedical data. Should this be the bioinformatics research focus? We will discuss this issue in this review article. We would like to promote the idea of supporting human-infrastructure (HI) with no-boundary thinking (NT) in bioinformatics (HINT). PMID:24192339

  13. Applying Instructional Design Theories to Bioinformatics Education in Microarray Analysis and Primer Design Workshops

    PubMed Central

    2005-01-01

    The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of Gagne's Conditions of Learning instructional design theory. This theory, although first published in the early 1970s, is still fundamental in instructional design and instructional technology. First, top-level as well as prerequisite learning objectives for a microarray analysis workshop and a primer design workshop were defined. Then a hierarchy of objectives for each workshop was created. Hands-on tutorials were designed to meet these objectives. Finally, events of learning proposed by Gagne's theory were incorporated into the hands-on tutorials. The resultant manuals were tested on a small number of trainees, revised, and applied in 1-day bioinformatics workshops. Based on this experience and on observations made during the workshops, we conclude that Gagne's Conditions of Learning instructional design theory provides a useful framework for developing bioinformatics training, but may not be optimal as a method for teaching it. PMID:16220141

  14. Revealing biological information using data structuring and automated learning.

    PubMed

    Mohorianu, Irina; Moulton, Vincent

    2010-11-01

    The intermediary steps between a biological hypothesis, concretized in the input data, and meaningful results, validated using biological experiments, commonly employ bioinformatics tools. Starting with storage of the data and ending with a statistical analysis of the significance of the results, every step in a bioinformatics analysis has been intensively studied and the resulting methods and models patented. This review summarizes the bioinformatics patents that have been developed mainly for the study of genes, and points out the universal applicability of bioinformatics methods to other related studies such as RNA interference. More specifically, we overview the steps undertaken in the majority of bioinformatics analyses, highlighting, for each, various approaches that have been developed to reveal details from different perspectives. First we consider data warehousing, the first task that has to be performed efficiently, optimizing the structure of the database, in order to facilitate both the subsequent steps and the retrieval of information. Next, we review data mining, which occupies the central part of most bioinformatics analyses, presenting patents concerning differential expression, unsupervised and supervised learning. Last, we discuss how networks of interactions of genes or other players in the cell may be created, which help draw biological conclusions and have been described in several patents.

  15. Web-based bioinformatics workflows for end-to-end RNA-seq data computation and analysis in agricultural animal species

    USDA-ARS?s Scientific Manuscript database

    Remarkable advances in next-generation sequencing (NGS) technologies, bioinformatics algorithms, and computational technologies have significantly accelerated genomic research. However, complicated NGS data analysis still remains as a major bottleneck. RNA-seq, as one of the major area in the NGS fi...

  16. Assessing an effective undergraduate module teaching applied bioinformatics to biology students

    PubMed Central

    2018-01-01

    Applied bioinformatics skills are becoming ever more indispensable for biologists, yet incorporation of these skills into the undergraduate biology curriculum is lagging behind, in part due to a lack of instructors willing and able to teach basic bioinformatics in classes that don’t specifically focus on quantitative skill development, such as statistics or computer sciences. To help undergraduate course instructors who themselves did not learn bioinformatics as part of their own education and are hesitant to plunge into teaching big data analysis, a module was developed that is written in plain-enough language, using publicly available computing tools and data, to allow novice instructors to teach next-generation sequence analysis to upper-level undergraduate students. To determine if the module allowed students to develop a better understanding of and appreciation for applied bioinformatics, various tools were developed and employed to assess the impact of the module. This article describes both the module and its assessment. Students found the activity valuable for their education and, in focus group discussions, emphasized that they saw a need for more and earlier instruction of big data analysis as part of the undergraduate biology curriculum. PMID:29324777

  17. Bioinformatics clouds for big data manipulation

    PubMed Central

    2012-01-01

    Abstract As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475

  18. Prediction of anticancer peptides against MCF-7 breast cancer cells from the peptidomes of Achatina fulica mucus fractions.

    PubMed

    E-Kobon, Teerasak; Thongararm, Pennapa; Roytrakul, Sittiruk; Meesuk, Ladda; Chumnanpuen, Pramote

    2016-01-01

    Several reports have shown antimicrobial and anticancer activities of mucous glycoproteins extracted from the giant African snail Achatina fulica. Anticancer properties of the snail mucous peptides remain incompletely revealed. The aim of this study was to predict anticancer peptides from A. fulica mucus. Two of HPLC-separated mucous fractions (F2 and F5) showed in vitro cytotoxicity against the breast cancer cell line (MCF-7) and normal epithelium cell line (Vero). According to the mass spectrometric analysis, 404 and 424 peptides from the F2 and F5 fractions were identified. Our comprehensive bioinformatics workflow predicted 16 putative cationic and amphipathic anticancer peptides with diverse structures from these two peptidome data. These peptides would be promising molecules for new anti-breast cancer drug development.

  19. Advantages of mixing bioinformatics and visualization approaches for analyzing sRNA-mediated regulatory bacterial networks

    PubMed Central

    Bourqui, Romain; Benchimol, William; Gaspin, Christine; Sirand-Pugnet, Pascal; Uricaru, Raluca; Dutour, Isabelle

    2015-01-01

    The revolution in high-throughput sequencing technologies has enabled the acquisition of gigabytes of RNA sequences in many different conditions and has highlighted an unexpected number of small RNAs (sRNAs) in bacteria. Ongoing exploitation of these data enables numerous applications for investigating bacterial transacting sRNA-mediated regulation networks. Focusing on sRNAs that regulate mRNA translation in trans, recent works have noted several sRNA-based regulatory pathways that are essential for key cellular processes. Although the number of known bacterial sRNAs is increasing, the experimental validation of their interactions with mRNA targets remains challenging and involves expensive and time-consuming experimental strategies. Hence, bioinformatics is crucial for selecting and prioritizing candidates before designing any experimental work. However, current software for target prediction produces a prohibitive number of candidates because of the lack of biological knowledge regarding the rules governing sRNA–mRNA interactions. Therefore, there is a real need to develop new approaches to help biologists focus on the most promising predicted sRNA–mRNA interactions. In this perspective, this review aims at presenting the advantages of mixing bioinformatics and visualization approaches for analyzing predicted sRNA-mediated regulatory bacterial networks. PMID:25477348

  20. Advantages of mixing bioinformatics and visualization approaches for analyzing sRNA-mediated regulatory bacterial networks.

    PubMed

    Thébault, Patricia; Bourqui, Romain; Benchimol, William; Gaspin, Christine; Sirand-Pugnet, Pascal; Uricaru, Raluca; Dutour, Isabelle

    2015-09-01

    The revolution in high-throughput sequencing technologies has enabled the acquisition of gigabytes of RNA sequences in many different conditions and has highlighted an unexpected number of small RNAs (sRNAs) in bacteria. Ongoing exploitation of these data enables numerous applications for investigating bacterial transacting sRNA-mediated regulation networks. Focusing on sRNAs that regulate mRNA translation in trans, recent works have noted several sRNA-based regulatory pathways that are essential for key cellular processes. Although the number of known bacterial sRNAs is increasing, the experimental validation of their interactions with mRNA targets remains challenging and involves expensive and time-consuming experimental strategies. Hence, bioinformatics is crucial for selecting and prioritizing candidates before designing any experimental work. However, current software for target prediction produces a prohibitive number of candidates because of the lack of biological knowledge regarding the rules governing sRNA-mRNA interactions. Therefore, there is a real need to develop new approaches to help biologists focus on the most promising predicted sRNA-mRNA interactions. In this perspective, this review aims at presenting the advantages of mixing bioinformatics and visualization approaches for analyzing predicted sRNA-mediated regulatory bacterial networks. © The Author 2014. Published by Oxford University Press.

  1. SYMBIOmatics: synergies in Medical Informatics and Bioinformatics--exploring current scientific literature for emerging topics.

    PubMed

    Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Coatrieux, Jean-Louis; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Porro, Ivan; Beltrame, Francesco; Tollis, Ioannis; Van der Lei, Johan

    2007-03-08

    The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to http://www.symbiomatics.org). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000-2005 ("recent") and 1990-1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science.

  2. SYMBIOmatics: Synergies in Medical Informatics and Bioinformatics – exploring current scientific literature for emerging topics

    PubMed Central

    Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Coatrieux, Jean-Louis; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Porro, Ivan; Beltrame, Francesco; Tollis, Ioannis; Van der Lei, Johan

    2007-01-01

    Background The SYMBIOmatics Specific Support Action (SSA) is "an information gathering and dissemination activity" that seeks "to identify synergies between the bioinformatics and the medical informatics" domain to improve collaborative progress between both domains (ref. to ). As part of the project experts in both research fields will be identified and approached through a survey. To provide input to the survey, the scientific literature was analysed to extract topics relevant to both medical informatics and bioinformatics. Results This paper presents results of a systematic analysis of the scientific literature from medical informatics research and bioinformatics research. In the analysis pairs of words (bigrams) from the leading bioinformatics and medical informatics journals have been used as indication of existing and emerging technologies and topics over the period 2000–2005 ("recent") and 1990–1990 ("past"). We identified emerging topics that were equally important to bioinformatics and medical informatics in recent years such as microarray experiments, ontologies, open source, text mining and support vector machines. Emerging topics that evolved only in bioinformatics were system biology, protein interaction networks and statistical methods for microarray analyses, whereas emerging topics in medical informatics were grid technology and tissue microarrays. Conclusion We conclude that although both fields have their own specific domains of interest, they share common technological developments that tend to be initiated by new developments in biotechnology and computer science. PMID:17430562

  3. Analyzing the field of bioinformatics with the multi-faceted topic modeling technique.

    PubMed

    Heo, Go Eun; Kang, Keun Young; Song, Min; Lee, Jeong-Hoon

    2017-05-31

    Bioinformatics is an interdisciplinary field at the intersection of molecular biology and computing technology. To characterize the field as convergent domain, researchers have used bibliometrics, augmented with text-mining techniques for content analysis. In previous studies, Latent Dirichlet Allocation (LDA) was the most representative topic modeling technique for identifying topic structure of subject areas. However, as opposed to revealing the topic structure in relation to metadata such as authors, publication date, and journals, LDA only displays the simple topic structure. In this paper, we adopt the Tang et al.'s Author-Conference-Topic (ACT) model to study the field of bioinformatics from the perspective of keyphrases, authors, and journals. The ACT model is capable of incorporating the paper, author, and conference into the topic distribution simultaneously. To obtain more meaningful results, we use journals and keyphrases instead of conferences and bag-of-words.. For analysis, we use PubMed to collected forty-six bioinformatics journals from the MEDLINE database. We conducted time series topic analysis over four periods from 1996 to 2015 to further examine the interdisciplinary nature of bioinformatics. We analyze the ACT Model results in each period. Additionally, for further integrated analysis, we conduct a time series analysis among the top-ranked keyphrases, journals, and authors according to their frequency. We also examine the patterns in the top journals by simultaneously identifying the topical probability in each period, as well as the top authors and keyphrases. The results indicate that in recent years diversified topics have become more prevalent and convergent topics have become more clearly represented. The results of our analysis implies that overtime the field of bioinformatics becomes more interdisciplinary where there is a steady increase in peripheral fields such as conceptual, mathematical, and system biology. These results are confirmed by integrated analysis of topic distribution as well as top ranked keyphrases, authors, and journals.

  4. Specific regions of the brain are capable of fructose metabolism.

    PubMed

    Oppelt, Sarah A; Zhang, Wanming; Tolan, Dean R

    2017-02-15

    High fructose consumption in the Western diet correlates with disease states such as obesity and metabolic syndrome complications, including type II diabetes, chronic kidney disease, and non-alcoholic fatty acid liver disease. Liver and kidneys are responsible for metabolism of 40-60% of ingested fructose, while the physiological fate of the remaining fructose remains poorly understood. The primary metabolic pathway for fructose includes the fructose-transporting solute-like carrier transport proteins 2a (SLC2a or GLUT), including GLUT5 and GLUT9, ketohexokinase (KHK), and aldolase. Bioinformatic analysis of gene expression encoding these proteins (glut5, glut9, khk, and aldoC, respectively) identifies other organs capable of this fructose metabolism. This analysis predicts brain, lymphoreticular tissue, placenta, and reproductive tissues as possible additional organs for fructose metabolism. While expression of these genes is highest in liver, the brain is predicted to have expression levels of these genes similar to kidney. RNA in situ hybridization of coronal slices of adult mouse brains validate the in silico expression of glut5, glut9, khk, and aldoC, and show expression across many regions of the brain, with the most notable expression in the cerebellum, hippocampus, cortex, and olfactory bulb. Dissected samples of these brain regions show KHK and aldolase enzyme activity 5-10 times the concentration of that in liver. Furthermore, rates of fructose oxidation in these brain regions are 15-150 times that of liver slices, confirming the bioinformatics prediction and in situ hybridization data. This suggests that previously unappreciated regions across the brain can use fructose, in addition to glucose, for energy production. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Specific regions of the brain are capable of fructose metabolism

    PubMed Central

    Oppelt, Sarah A.; Zhang, Wanming; Tolan, Dean R.

    2017-01-01

    High fructose consumption in the Western diet correlates with disease states such as obesity and metabolic syndrome complications, including type II diabetes, chronic kidney disease, and nonalcoholic fatty acid liver disease. Liver and kidneys are responsible for metabolism of 40–60% of ingested fructose, while the physiological fate of the remaining fructose remains poorly understood. The primary metabolic pathway for fructose includes the fructose-transporting solute-like carrier transport proteins 2a (SLC2a or GLUT), including GLUT5 and GLUT9, ketohexokinase (KHK), and aldolase. Bioinformatic analysis of gene expression encoding these proteins (glut5, glut9, khk, and aldoC, respectively) identifies other organs capable of this fructose metabolism. This analysis predicts brain, lymphoreticular tissue, placenta, and reproductive tissues as possible additional organs for fructose metabolism. While expression of these genes is highest in liver, the brain is predicted to have expression levels of these genes similar to kidney. RNA in situ hybridization of coronal slices of adult mouse brains validate the in silico expression of glut5, glut9, khk, and aldoC, and show expression across many regions of the brain, with the most notable expression in the cerebellum, hippocampus, cortex, and olfactory bulb. Dissected samples of these brain regions show KHK and aldolase enzyme activity 5–10 times the concentration of that in liver. Furthermore, rates of fructose oxidation in these brain regions are 15–150 times that of liver slices, confirming the bioinformatics prediction and in situ hybridization data. This suggests that previously unappreciated regions across the brain can use fructose, in addition to glucose, for energy production. PMID:28034722

  6. Bioinformatic Analysis of Pathogenic Missense Mutations of Activin Receptor Like Kinase 1 Ectodomain

    PubMed Central

    Scotti, Claudia; Olivieri, Carla; Boeri, Laura; Canzonieri, Cecilia; Ornati, Federica; Buscarini, Elisabetta; Pagella, Fabio; Danesino, Cesare

    2011-01-01

    Activin A receptor, type II-like kinase 1 (also called ALK1), is a serine-threonine kinase predominantly expressed on endothelial cells surface. Mutations in its ACVRL1 encoding gene (12q11-14) cause type 2 Hereditary Haemorrhagic Telangiectasia (HHT2), an autosomal dominant multisystem vascular dysplasia. The study of the structural effects of mutations is crucial to understand their pathogenic mechanism. However, while an X-ray structure of ALK1 intracellular domain has recently become available (PDB ID: 3MY0), structure determination of ALK1 ectodomain (ALK1EC) has been elusive so far. We here describe the building of a homology model for ALK1EC, followed by an extensive bioinformatic analysis, based on a set of 38 methods, of the effect of missense mutations at the sequence and structural level. ALK1EC potential interaction mode with its ligand BMP9 was then predicted combining modelling and docking data. The calculated model of the ALK1EC allowed mapping and a preliminary characterization of HHT2 associated mutations. Major structural changes and loss of stability of the protein were predicted for several mutations, while others were found to interfere mainly with binding to BMP9 or other interactors, like Endoglin (CD105), whose encoding ENG gene (9q34) mutations are known to cause type 1 HHT. This study gives a preliminary insight into the potential structure of ALK1EC and into the structural effects of HHT2 associated mutations, which can be useful to predict the potential effect of each single mutation, to devise new biological experiments and to interpret the biological significance of new mutations, private mutations, or non-synonymous polymorphisms. PMID:22028876

  7. Bioinformatics analysis of single and multi-hybrid epitopes of GRA-1, GRA-4, GRA-6 and GRA-7 proteins to improve DNA vaccine design against Toxoplasma gondii.

    PubMed

    Shaddel, Minoo; Ebrahimi, Mansour; Tabandeh, Mohammad Reza

    2018-06-01

    Toxoplasma gondii , is a causative agent of morbidity and mortality in immunocompromised and congenitally-infected individuals. Attempts to construct DNA vaccines against T. gondii using surface proteins are increasing. The dense granule antigens are highly expressed in the acute and chronic phases of T. gondii infection and considered as suitable DNA vaccine candidates to control toxoplasmosis. In the present study, bioinformatics tools and online software were used to predict, analyze and compare the structural, physical and chemical characters and immunogenicity of the GRA-1, GRA-4, GRA-6 and GRA-7 proteins. Sequence alignment results indicated that the GRA-1, GRA-4, GRA-6 and GRA-7 proteins had low similarity. The secondary structure prediction demonstrated that among the four proteins, GRA-1 and GRA-6 had similar secondary structure except for a little discrepancy. Hydrophilicity/hydrophobicity analysis showed multiple hydrophilic regions and some classical high hydrophilic domains for each protein sequence. Immunogenic epitope prediction results demonstrated that the GRA-1 and GRA-4 epitopes were stable and GRA-4 showed the highest degree of antigenicity. Although the GRA-7 epitope had the highest score of immunogenicity, this epitope was instable and had the lowest degree of antigenicity and half-time in eukaryotic cell. Also, the results indicated that GRA4-GRA7 epitope and GRA6-GRA7 had the highest degree of antigenicity and immunogenicity among multi-hybrid epitopes, respectively. Totally, in the present study, single epitopes showed the highest degree of antigenicity compared with multi-hybrid epitopes. Given the results, it can be concluded that GRA-4 and GRA-7 can be powerful DNA vaccine candidates against T. gondii .

  8. Bayesian and Phylogenic Approaches for Studying Relationships among Table Olive Cultivars.

    PubMed

    Ben Ayed, Rayda; Ennouri, Karim; Ben Amar, Fathi; Moreau, Fabienne; Triki, Mohamed Ali; Rebai, Ahmed

    2017-08-01

    To enhance table olive tree authentication, relationship, and productivity, we consider the analysis of 18 worldwide table olive cultivars (Olea europaea L.) based on morphological, biological, and physicochemical markers analyzed by bioinformatic and biostatistic tools. Accordingly, we assess the relationships between the studied varieties, on the one hand, and the potential productivity-quantitative parameter links on the other hand. The bioinformatic analysis based on the graphical representation of the matrix of Euclidean distances, the principal components analysis, unweighted pair group method with arithmetic mean, and principal coordinate analysis (PCoA) revealed three major clusters which were not correlated with the geographic origin. The statistical analysis based on Kendall's and Spearman correlation coefficients suggests two highly significant associations with both fruit color and pollinization and the productivity character. These results are confirmed by the multiple linear regression prediction models. In fact, based on the coefficient of determination (R 2 ) value, the best model demonstrated the power of the pollinization on the tree productivity (R 2  = 0.846). Moreover, the derived directed acyclic graph showed that only two direct influences are detected: effect of tolerance on fruit and stone symmetry on side and effect of tolerance on stone form and oil content on the other side. This work provides better understanding of the diversity available in worldwide table olive cultivars and supplies an important contribution for olive breeding and authenticity.

  9. MEGANTE: A Web-Based System for Integrated Plant Genome Annotation

    PubMed Central

    Numa, Hisataka; Itoh, Takeshi

    2014-01-01

    The recent advancement of high-throughput genome sequencing technologies has resulted in a considerable increase in demands for large-scale genome annotation. While annotation is a crucial step for downstream data analyses and experimental studies, this process requires substantial expertise and knowledge of bioinformatics. Here we present MEGANTE, a web-based annotation system that makes plant genome annotation easy for researchers unfamiliar with bioinformatics. Without any complicated configuration, users can perform genomic sequence annotations simply by uploading a sequence and selecting the species to query. MEGANTE automatically runs several analysis programs and integrates the results to select the appropriate consensus exon–intron structures and to predict open reading frames (ORFs) at each locus. Functional annotation, including a similarity search against known proteins and a functional domain search, are also performed for the predicted ORFs. The resultant annotation information is visualized with a widely used genome browser, GBrowse. For ease of analysis, the results can be downloaded in Microsoft Excel format. All of the query sequences and annotation results are stored on the server side so that users can access their own data from virtually anywhere on the web. The current release of MEGANTE targets 24 plant species from the Brassicaceae, Fabaceae, Musaceae, Poaceae, Salicaceae, Solanaceae, Rosaceae and Vitaceae families, and it allows users to submit a sequence up to 10 Mb in length and to save up to 100 sequences with the annotation information on the server. The MEGANTE web service is available at https://megante.dna.affrc.go.jp/. PMID:24253915

  10. Computational prediction and biochemical characterization of novel RNA aptamers to Rift Valley fever virus nucleocapsid protein.

    PubMed

    Ellenbecker, Mary; St Goddard, Jeremy; Sundet, Alec; Lanchy, Jean-Marc; Raiford, Douglas; Lodmell, J Stephen

    2015-10-01

    Rift Valley fever virus (RVFV) is a potent human and livestock pathogen endemic to sub-Saharan Africa and the Arabian Peninsula that has potential to spread to other parts of the world. Although there is no proven effective and safe treatment for RVFV infections, a potential therapeutic target is the virally encoded nucleocapsid protein (N). During the course of infection, N binds to viral RNA, and perturbation of this interaction can inhibit viral replication. To gain insight into how N recognizes viral RNA specifically, we designed an algorithm that uses a distance matrix and multidimensional scaling to compare the predicted secondary structures of known N-binding RNAs, or aptamers, that were isolated and characterized in previous in vitro evolution experiment. These aptamers did not exhibit overt sequence or predicted structure similarity, so we employed bioinformatic methods to propose novel aptamers based on analysis and clustering of secondary structures. We screened and scored the predicted secondary structures of novel randomly generated RNA sequences in silico and selected several of these putative N-binding RNAs whose secondary structures were similar to those of known N-binding RNAs. We found that overall the in silico generated RNA sequences bound well to N in vitro. Furthermore, introduction of these RNAs into cells prior to infection with RVFV inhibited viral replication in cell culture. This proof of concept study demonstrates how the predictive power of bioinformatics and the empirical power of biochemistry can be jointly harnessed to discover, synthesize, and test new RNA sequences that bind tightly to RVFV N protein. The approach would be easily generalizable to other applications. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. In silico analysis of subtilisin from Glaciozyma antarctica PI12

    NASA Astrophysics Data System (ADS)

    Mustafha, Siti Mardhiah; Murad, Abdul Munir Abdul; Mahadi, Nor Muhammad; Kamaruddin, Shazilah; Bakar, Farah Diba Abu

    2015-09-01

    Subtilisin constitute as a major player in industrial enzymes that has a wide range of application especially in the detergent industry. In this study, a cDNA encoding for subtilisin (GaSUBT) was extracted from the psychrophilic yeast, Glaciozyma antarctica PI12, PCR amplified and sequenced. Various bioinformatics tools were used to characterize the GaSUBT. GaSUBT contains 1587 bp nucleotides encoding for 529 amino acids. The predicted molecular weight of the deduced protein is 55.34 kDa with an isoelectric point of 6.25. GaSUBT was predicted to possess a signal peptide and pro-peptide consisting of a peptidase inhibitor I9 sequence. From the sequence alignment analysis of deduced amino acids with other subtilisins in the NCBI database showed that the sequences surrounding the catalytic triad that forms the catalytic domain are well conserved.

  12. "Plasmo2D": an ancillary proteomic tool to aid identification of proteins from Plasmodium falciparum.

    PubMed

    Khachane, Amit; Kumar, Ranjit; Jain, Sanyam; Jain, Samta; Banumathy, Gowrishankar; Singh, Varsha; Nagpal, Saurabh; Tatu, Utpal

    2005-01-01

    Bioinformatics tools to aid gene and protein sequence analysis have become an integral part of biology in the post-genomic era. Release of the Plasmodium falciparum genome sequence has allowed biologists to define the gene and the predicted protein content as well as their sequences in the parasite. Using pI and molecular weight as characteristics unique to each protein, we have developed a bioinformatics tool to aid identification of proteins from Plasmodium falciparum. The tool makes use of a Virtual 2-DE generated by plotting all of the proteins from the Plasmodium database on a pI versus molecular weight scale. Proteins are identified by comparing the position of migration of desired protein spots from an experimental 2-DE and that on a virtual 2-DE. The procedure has been automated in the form of user-friendly software called "Plasmo2D". The tool can be downloaded from http://144.16.89.25/Plasmo2D.zip.

  13. Two interactive Bioinformatics courses at the Bielefeld University Bioinformatics Server.

    PubMed

    Sczyrba, Alexander; Konermann, Susanne; Giegerich, Robert

    2008-05-01

    Conferences in computational biology continue to provide tutorials on classical and new methods in the field. This can be taken as an indicator that education is still a bottleneck in our field's process of becoming an established scientific discipline. Bielefeld University has been one of the early providers of bioinformatics education, both locally and via the internet. The Bielefeld Bioinformatics Server (BiBiServ) offers a variety of older and new materials. Here, we report on two online courses made available recently, one introductory and one on the advanced level: (i) SADR: Sequence Analysis with Distributed Resources (http://bibiserv.techfak.uni-bielefeld.de/sadr/) and (ii) ADP: Algebraic Dynamic Programming in Bioinformatics (http://bibiserv.techfak.uni-bielefeld.de/dpcourse/).

  14. Bioinformatics-based tools in drug discovery: the cartography from single gene to integrative biological networks.

    PubMed

    Ramharack, Pritika; Soliman, Mahmoud E S

    2018-06-01

    Originally developed for the analysis of biological sequences, bioinformatics has advanced into one of the most widely recognized domains in the scientific community. Despite this technological evolution, there is still an urgent need for nontoxic and efficient drugs. The onus now falls on the 'omics domain to meet this need by implementing bioinformatics techniques that will allow for the introduction of pioneering approaches in the rational drug design process. Here, we categorize an updated list of informatics tools and explore the capabilities of integrative bioinformatics in disease control. We believe that our review will serve as a comprehensive guide toward bioinformatics-oriented disease and drug discovery research. Copyright © 2018 Elsevier Ltd. All rights reserved.

  15. Cellular automata and its applications in protein bioinformatics.

    PubMed

    Xiao, Xuan; Wang, Pu; Chou, Kuo-Chen

    2011-09-01

    With the explosion of protein sequences generated in the postgenomic era, it is highly desirable to develop high-throughput tools for rapidly and reliably identifying various attributes of uncharacterized proteins based on their sequence information alone. The knowledge thus obtained can help us timely utilize these newly found protein sequences for both basic research and drug discovery. Many bioinformatics tools have been developed by means of machine learning methods. This review is focused on the applications of a new kind of science (cellular automata) in protein bioinformatics. A cellular automaton (CA) is an open, flexible and discrete dynamic model that holds enormous potentials in modeling complex systems, in spite of the simplicity of the model itself. Researchers, scientists and practitioners from different fields have utilized cellular automata for visualizing protein sequences, investigating their evolution processes, and predicting their various attributes. Owing to its impressive power, intuitiveness and relative simplicity, the CA approach has great potential for use as a tool for bioinformatics.

  16. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Taylor, Ronald C.

    Bioinformatics researchers are increasingly confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBasemore » project are defined, and current bioinformatics software that employ Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date.« less

  17. From cheminformatics to structure-based design: Web services and desktop applications based on the NAOMI library.

    PubMed

    Bietz, Stefan; Inhester, Therese; Lauck, Florian; Sommer, Kai; von Behren, Mathias M; Fährrolfes, Rainer; Flachsenberg, Florian; Meyder, Agnes; Nittinger, Eva; Otto, Thomas; Hilbig, Matthias; Schomburg, Karen T; Volkamer, Andrea; Rarey, Matthias

    2017-11-10

    Nowadays, computational approaches are an integral part of life science research. Problems related to interpretation of experimental results, data analysis, or visualization tasks highly benefit from the achievements of the digital era. Simulation methods facilitate predictions of physicochemical properties and can assist in understanding macromolecular phenomena. Here, we will give an overview of the methods developed in our group that aim at supporting researchers from all life science areas. Based on state-of-the-art approaches from structural bioinformatics and cheminformatics, we provide software covering a wide range of research questions. Our all-in-one web service platform ProteinsPlus (http://proteins.plus) offers solutions for pocket and druggability prediction, hydrogen placement, structure quality assessment, ensemble generation, protein-protein interaction classification, and 2D-interaction visualization. Additionally, we provide a software package that contains tools targeting cheminformatics problems like file format conversion, molecule data set processing, SMARTS editing, fragment space enumeration, and ligand-based virtual screening. Furthermore, it also includes structural bioinformatics solutions for inverse screening, binding site alignment, and searching interaction patterns across structure libraries. The software package is available at http://software.zbh.uni-hamburg.de. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  18. BATMAN-TCM: a Bioinformatics Analysis Tool for Molecular mechANism of Traditional Chinese Medicine.

    PubMed

    Liu, Zhongyang; Guo, Feifei; Wang, Yong; Li, Chun; Zhang, Xinlei; Li, Honglei; Diao, Lihong; Gu, Jiangyong; Wang, Wei; Li, Dong; He, Fuchu

    2016-02-16

    Traditional Chinese Medicine (TCM), with a history of thousands of years of clinical practice, is gaining more and more attention and application worldwide. And TCM-based new drug development, especially for the treatment of complex diseases is promising. However, owing to the TCM's diverse ingredients and their complex interaction with human body, it is still quite difficult to uncover its molecular mechanism, which greatly hinders the TCM modernization and internationalization. Here we developed the first online Bioinformatics Analysis Tool for Molecular mechANism of TCM (BATMAN-TCM). Its main functions include 1) TCM ingredients' target prediction; 2) functional analyses of targets including biological pathway, Gene Ontology functional term and disease enrichment analyses; 3) the visualization of ingredient-target-pathway/disease association network and KEGG biological pathway with highlighted targets; 4) comparison analysis of multiple TCMs. Finally, we applied BATMAN-TCM to Qishen Yiqi dripping Pill (QSYQ) and combined with subsequent experimental validation to reveal the functions of renin-angiotensin system responsible for QSYQ's cardioprotective effects for the first time. BATMAN-TCM will contribute to the understanding of the "multi-component, multi-target and multi-pathway" combinational therapeutic mechanism of TCM, and provide valuable clues for subsequent experimental validation, accelerating the elucidation of TCM's molecular mechanism. BATMAN-TCM is available at http://bionet.ncpsb.org/batman-tcm.

  19. Genome analysis of Excretory/Secretory proteins in Taenia solium reveals their Abundance of Antigenic Regions (AAR).

    PubMed

    Gomez, Sandra; Adalid-Peralta, Laura; Palafox-Fonseca, Hector; Cantu-Robles, Vito Adrian; Soberón, Xavier; Sciutto, Edda; Fragoso, Gladis; Bobes, Raúl J; Laclette, Juan P; Yauner, Luis del Pozo; Ochoa-Leyva, Adrián

    2015-05-19

    Excretory/Secretory (ES) proteins play an important role in the host-parasite interactions. Experimental identification of ES proteins is time-consuming and expensive. Alternative bioinformatics approaches are cost-effective and can be used to prioritize the experimental analysis of therapeutic targets for parasitic diseases. Here we predicted and functionally annotated the ES proteins in T. solium genome using an integration of bioinformatics tools. Additionally, we developed a novel measurement to evaluate the potential antigenicity of T. solium secretome using sequence length and number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. AAR value for secretome showed a similar value to that obtained for a set of experimentally determined antigenic proteins and was different to the calculated value for the non-ES proteins of T. solium genome. Furthermore, we calculated the AAR values for known helminth secretomes and they were similar to that obtained for T. solium. The results reveal the utility of AAR value as a novel genomic measurement to evaluate the potential antigenicity of secretomes. This comprehensive analysis of T. solium secretome provides functional information for future experimental studies, including the identification of novel ES proteins of therapeutic, diagnosis and immunological interest.

  20. Tcof1-Related Molecular Networks in Treacher Collins Syndrome.

    PubMed

    Dai, Jiewen; Si, Jiawen; Wang, Minjiao; Huang, Li; Fang, Bing; Shi, Jun; Wang, Xudong; Shen, Guofang

    2016-09-01

    Treacher Collins syndrome (TCS) is a rare, autosomal-dominant disorder characterized by craniofacial deformities, and is primarily caused by mutations in the Tcof1 gene. This article was aimed to perform a comprehensive literature review and systematic bioinformatic analysis of Tcof1-related molecular networks in TCS. First, the up- and down-regulated genes in Tcof1 heterozygous haploinsufficient mutant mice embryos and Tcof1 knockdown and Tcof1 over-expressed neuroblastoma N1E-115 cells were obtained from the Gene Expression Omnibus database. The GeneDecks database was used to calculate the 500 genes most closely related to Tcof1. Then, the relationships between 4 gene sets (a predicted set and sets comparing the wildtype with the 3 Gene Expression Omnibus datasets) were analyzed using the DAVID, GeneMANIA and STRING databases. The analysis results showed that the Tcof1-related genes were enriched in various biological processes, including cell proliferation, apoptosis, cell cycle, differentiation, and migration. They were also enriched in several signaling pathways, such as the ribosome, p53, cell cycle, and WNT signaling pathways. Additionally, these genes clearly had direct or indirect interactions with Tcof1 and between each other. Literature review and bioinformatic analysis finds imply that special attention should be given to these pathways, as they may offer target points for TCS therapies.

  1. Genome analysis of Excretory/Secretory proteins in Taenia solium reveals their Abundance of Antigenic Regions (AAR)

    PubMed Central

    Gomez, Sandra; Adalid-Peralta, Laura; Palafox-Fonseca, Hector; Cantu-Robles, Vito Adrian; Soberón, Xavier; Sciutto, Edda; Fragoso, Gladis; Bobes, Raúl J.; Laclette, Juan P.; Yauner, Luis del Pozo; Ochoa-Leyva, Adrián

    2015-01-01

    Excretory/Secretory (ES) proteins play an important role in the host-parasite interactions. Experimental identification of ES proteins is time-consuming and expensive. Alternative bioinformatics approaches are cost-effective and can be used to prioritize the experimental analysis of therapeutic targets for parasitic diseases. Here we predicted and functionally annotated the ES proteins in T. solium genome using an integration of bioinformatics tools. Additionally, we developed a novel measurement to evaluate the potential antigenicity of T. solium secretome using sequence length and number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. AAR value for secretome showed a similar value to that obtained for a set of experimentally determined antigenic proteins and was different to the calculated value for the non-ES proteins of T. solium genome. Furthermore, we calculated the AAR values for known helminth secretomes and they were similar to that obtained for T. solium. The results reveal the utility of AAR value as a novel genomic measurement to evaluate the potential antigenicity of secretomes. This comprehensive analysis of T. solium secretome provides functional information for future experimental studies, including the identification of novel ES proteins of therapeutic, diagnosis and immunological interest. PMID:25989346

  2. Lost in Translation: Bioinformatic Analysis of Variations Affecting the Translation Initiation Codon in the Human Genome.

    PubMed

    Abad, Francisco; de la Morena-Barrio, María Eugenia; Fernández-Breis, Jesualdo Tomás; Corral, Javier

    2018-06-01

    Translation is a key biological process controlled in eukaryotes by the initiation AUG codon. Variations affecting this codon may have pathological consequences by disturbing the correct initiation of translation. Unfortunately, there is no systematic study describing these variations in the human genome. Moreover, we aimed to develop new tools for in silico prediction of the pathogenicity of gene variations affecting AUG codons, because to date, these gene defects have been wrongly classified as missense. Whole-exome analysis revealed the mean of 12 gene variations per person affecting initiation codons, mostly with high (> 0:01) minor allele frequency (MAF). Moreover, analysis of Ensembl data (December 2017) revealed 11,261 genetic variations affecting the initiation AUG codon of 7,205 genes. Most of these variations (99.5%) have low or unknown MAF, probably reflecting deleterious consequences. Only 62 variations had high MAF. Genetic variations with high MAF had closer alternative AUG downstream codons than did those with low MAF. Besides, the high-MAF group better maintained both the signal peptide and reading frame. These differentiating elements could help to determine the pathogenicity of this kind of variation. Data and scripts in Perl and R are freely available at https://github.com/fanavarro/hemodonacion. jfernand@um.es. Supplementary data are available at Bioinformatics online.

  3. Bioinformatic analysis of phage AB3, a phiKMV-like virus infecting Acinetobacter baumannii.

    PubMed

    Zhang, J; Liu, X; Li, X-J

    2015-01-16

    The phages of Acinetobacter baumannii has drawn increasing attention because of the multi-drug resistance of A. baumanni. The aim of this study was to sequence Acinetobacter baumannii phage AB3 and conduct bioinformatic analysis to lay a foundation for genome remodeling and phage therapy. We isolated and sequenced A. baumannii phage AB3 and attempted to annotate and analyze its genome. The results showed that the genome is a double-stranded DNA with a total length of 31,185 base pairs (bp) and 97 open reading frames greater than 100 bp. The genome includes 28 predicted genes, of which 24 are homologous to phage AB1. The entire coding sequence is located on the negative strand, representing 90.8% of the total length. The G+C mol% was 39.18%, without areas of high G+C content over 200 bp in length. No GC island, tRNA gene, or repeated sequence was identified. Gene lengths were 120-3099 bp, with an average of 1011 bp. Six genes were found to be greater than 2000 bp in length. Genomic alignment and phylogenetic analysis of the RNA polymerase gene showed that similar to phage AB1, phage AB3 is a phiKMV-like virus in the T7 phage family.

  4. ENFIN a network to enhance integrative systems biology.

    PubMed

    Kahlem, Pascal; Birney, Ewan

    2007-12-01

    Integration of biological data of various types and development of adapted bioinformatics tools represent critical objectives to enable research at the systems level. The European Network of Excellence ENFIN is engaged in developing both an adapted infrastructure to connect databases and platforms to enable the generation of new bioinformatics tools as well as the experimental validation of computational predictions. We will give an overview of the projects tackled within ENFIN and discuss the challenges associated with integration for systems biology.

  5. Design and Development of ChemInfoCloud: An Integrated Cloud Enabled Platform for Virtual Screening.

    PubMed

    Karthikeyan, Muthukumarasamy; Pandit, Deepak; Bhavasar, Arvind; Vyas, Renu

    2015-01-01

    The power of cloud computing and distributed computing has been harnessed to handle vast and heterogeneous data required to be processed in any virtual screening protocol. A cloud computing platorm ChemInfoCloud was built and integrated with several chemoinformatics and bioinformatics tools. The robust engine performs the core chemoinformatics tasks of lead generation, lead optimisation and property prediction in a fast and efficient manner. It has also been provided with some of the bioinformatics functionalities including sequence alignment, active site pose prediction and protein ligand docking. Text mining, NMR chemical shift (1H, 13C) prediction and reaction fingerprint generation modules for efficient lead discovery are also implemented in this platform. We have developed an integrated problem solving cloud environment for virtual screening studies that also provides workflow management, better usability and interaction with end users using container based virtualization, OpenVz.

  6. From Marine Venoms to Drugs: Efficiently Supported by a Combination of Transcriptomics and Proteomics

    PubMed Central

    Xie, Bing; Huang, Yu; Baumann, Kate; Fry, Bryan Grieg; Shi, Qiong

    2017-01-01

    The potential of marine natural products to become new drugs is vast; however, research is still in its infancy. The chemical and biological diversity of marine toxins is immeasurable and as such an extraordinary resource for the discovery of new drugs. With the rapid development of next-generation sequencing (NGS) and liquid chromatography–tandem mass spectrometry (LC-MS/MS), it has been much easier and faster to identify more toxins and predict their functions with bioinformatics pipelines, which pave the way for novel drug developments. Here we provide an overview of related bioinformatics pipelines that have been supported by a combination of transcriptomics and proteomics for identification and function prediction of novel marine toxins. PMID:28358320

  7. From Marine Venoms to Drugs: Efficiently Supported by a Combination of Transcriptomics and Proteomics.

    PubMed

    Xie, Bing; Huang, Yu; Baumann, Kate; Fry, Bryan Grieg; Shi, Qiong

    2017-03-30

    The potential of marine natural products to become new drugs is vast; however, research is still in its infancy. The chemical and biological diversity of marine toxins is immeasurable and as such an extraordinary resource for the discovery of new drugs. With the rapid development of next-generation sequencing (NGS) and liquid chromatography-tandem mass spectrometry (LC-MS/MS), it has been much easier and faster to identify more toxins and predict their functions with bioinformatics pipelines, which pave the way for novel drug developments. Here we provide an overview of related bioinformatics pipelines that have been supported by a combination of transcriptomics and proteomics for identification and function prediction of novel marine toxins.

  8. miRToolsGallery: a tag-based and rankable microRNA bioinformatics resources database portal

    PubMed Central

    Chen, Liang; Heikkinen, Liisa; Wang, ChangLiang; Yang, Yang; Knott, K Emily

    2018-01-01

    Abstract Hundreds of bioinformatics tools have been developed for MicroRNA (miRNA) investigations including those used for identification, target prediction, structure and expression profile analysis. However, finding the correct tool for a specific application requires the tedious and laborious process of locating, downloading, testing and validating the appropriate tool from a group of nearly a thousand. In order to facilitate this process, we developed a novel database portal named miRToolsGallery. We constructed the portal by manually curating > 950 miRNA analysis tools and resources. In the portal, a query to locate the appropriate tool is expedited by being searchable, filterable and rankable. The ranking feature is vital to quickly identify and prioritize the more useful from the obscure tools. Tools are ranked via different criteria including the PageRank algorithm, date of publication, number of citations, average of votes and number of publications. miRToolsGallery provides links and data for the comprehensive collection of currently available miRNA tools with a ranking function which can be adjusted using different criteria according to specific requirements. Database URL: http://www.mirtoolsgallery.org PMID:29688355

  9. MicroRNAs as Potential Regulators of Glutathione Peroxidases Expression and Their Role in Obesity and Related Pathologies.

    PubMed

    Matoušková, Petra; Hanousková, Barbora; Skálová, Lenka

    2018-04-14

    Glutathione peroxidases (GPxs) belong to the eight-member family of phylogenetically related enzymes with different cellular localization, but distinct antioxidant function. Several GPxs are important selenoproteins. Dysregulated GPx expression is connected with severe pathologies, including obesity and diabetes. We performed a comprehensive bioinformatic analysis using the programs miRDB, miRanda, TargetScan, and Diana in the search for hypothetical microRNAs targeting 3'untranslated regions (3´UTR) of GPxs. We cross-referenced the literature for possible intersections between our results and available reports on identified microRNAs, with a special focus on the microRNAs related to oxidative stress, obesity, and related pathologies. We identified many microRNAs with an association with oxidative stress and obesity as putative regulators of GPxs. In particular, miR-185-5p was predicted by a larger number of programs to target six GPxs and thus could play the role as their master regulator. This microRNA was altered by selenium deficiency and can play a role as a feedback control of selenoproteins' expression. Through the bioinformatics analysis we revealed the potential connection of microRNAs, GPxs, obesity, and other redox imbalance related diseases.

  10. Organellar proteome analyses of ricin toxin-treated HeLa cells.

    PubMed

    Liao, Peng; Li, Yunhu; Li, Hongyang; Liu, Wensen

    2016-07-01

    Apoptosis triggered by ricin toxin (RT) has previously been associated with certain cellular organellar compartments, but the diversity in the composition of the organellar proteins remains unclear. Here, we applied a shotgun proteomics strategy to examine the differential expression of proteins in the mitochondria, nuclei, and cytoplasm of HeLa cells treated and not treated with RT. Data were combined with a global bioinformatics analysis and experimental confirmations. A total of 3107 proteins were identified. Bioinformatics predictors (Proteome Analyst, WoLF PSORT, TargetP, MitoPred, Nucleo, MultiLoc, and k-nearest neighbor) and a Bayesian model that integrated these predictors were used to predict the locations of 1349 distinct organellar proteins. Our data indicate that the Bayesian model was more efficient than the individual implementation of these predictors. Additionally, a Biomolecular Interaction Network (BIN) analysis was used to identify 149 BIN subnetworks. Our experimental confirmations indicate that certain apoptosis-related proteins (e.g. cytochrome c, enolase, lamin B, Bax, and Drp1) were found to be translocated and had variable expression levels. These results provide new insights for the systematic understanding of RT-induced apoptosis responses. © The Author(s) 2014.

  11. Functional analysis of the Arabidopsis PHT4 family of intracellular phosphate transporters.

    PubMed

    Guo, B; Jin, Y; Wussler, C; Blancaflor, E B; Motes, C M; Versaw, W K

    2008-01-01

    The transport of phosphate (Pi) between subcellular compartments is central to metabolic regulation. Although some of the transporters involved in controlling the intracellular distribution of Pi have been identified in plants, others are predicted from genetic, biochemical and bioinformatics studies. Heterologous expression in yeast, and gene expression and localization in plants were used to characterize all six members of an Arabidopsis thaliana membrane transporter family designated here as PHT4. PHT4 proteins share similarity with SLC17/type I Pi transporters, a diverse group of animal proteins involved in the transport of Pi, organic anions and chloride. All of the PHT4 proteins mediate Pi transport in yeast with high specificity. Bioinformatic analysis and localization of PHT4-GFP fusion proteins indicate that five of the proteins are targeted to the plastid envelope, and the sixth resides in the Golgi apparatus. PHT4 genes are expressed in both roots and leaves, although two of the genes are expressed predominantly in leaves and one mostly in roots. These expression patterns, together with Pi transport activities and subcellular locations, suggest roles for PHT4 proteins in the transport of Pi between the cytosol and chloroplasts, heterotrophic plastids and the Golgi apparatus.

  12. RNA sequencing uncovers antisense RNAs and novel small RNAs in Streptococcus pyogenes

    PubMed Central

    Le Rhun, Anaïs; Beer, Yan Yan; Reimegård, Johan; Chylinski, Krzysztof; Charpentier, Emmanuelle

    2016-01-01

    ABSTRACT Streptococcus pyogenes is a human pathogen responsible for a wide spectrum of diseases ranging from mild to life-threatening infections. During the infectious process, the temporal and spatial expression of pathogenicity factors is tightly controlled by a complex network of protein and RNA regulators acting in response to various environmental signals. Here, we focus on the class of small RNA regulators (sRNAs) and present the first complete analysis of sRNA sequencing data in S. pyogenes. In the SF370 clinical isolate (M1 serotype), we identified 197 and 428 putative regulatory RNAs by visual inspection and bioinformatics screening of the sequencing data, respectively. Only 35 from the 197 candidates identified by visual screening were assigned a predicted function (T-boxes, ribosomal protein leaders, characterized riboswitches or sRNAs), indicating how little is known about sRNA regulation in S. pyogenes. By comparing our list of predicted sRNAs with previous S. pyogenes sRNA screens using bioinformatics or microarrays, 92 novel sRNAs were revealed, including antisense RNAs that are for the first time shown to be expressed in this pathogen. We experimentally validated the expression of 30 novel sRNAs and antisense RNAs. We show that the expression profile of 9 sRNAs including 2 predicted regulatory elements is affected by the endoribonucleases RNase III and/or RNase Y, highlighting the critical role of these enzymes in sRNA regulation. PMID:26580233

  13. Differentiating Human Multipotent Mesenchymal Stromal Cells Regulate microRNAs: Prediction of microRNA Regulation by PDGF During Osteogenesis

    PubMed Central

    Goff, Loyal A.; Boucher, Shayne; Ricupero, Christopher L.; Fenstermacher, Sara; Swerdel, Mavis; Chase, Lucas; Adams, Christopher; Chesnut, Jonathan; Lakshmipathy, Uma; Hart, Ronald P.

    2009-01-01

    Objective Human multipotent mesenchymal stromal cells (MSC) have the potential to differentiate into multiple cell types, although little is known about factors that control their fate. Differentiation-specific microRNAs may play a key role in stem cell self renewal and differentiation. We propose that specific intracellular signalling pathways modulate gene expression during differentiation by regulating microRNA expression. Methods Illumina mRNA and NCode microRNA expression analyses were performed on MSC and their differentiated progeny. A combination of bioinformatic prediction and pathway inhibition was used to identify microRNAs associated with PDGF signalling. Results The pattern of microRNA expression in MSC is distinct from that in pluripotent stem cells such as human embryonic stem cells. Specific populations of microRNAs are regulated in MSC during differentiation targeted towards specific cell types. Complementary mRNA expression analysis increases the pool of markers characteristic of MSC or differentiated progeny. To identify microRNA expression patterns affected by signalling pathways, we examined the PDGF pathway found to be regulated during osteogenesis by microarray studies. A set of microRNAs bioinformatically predicted to respond to PDGF signalling was experimentally confirmed by direct PDGF inhibition. Conclusion Our results demonstrate that a subset of microRNAs regulated during osteogenic differentiation of MSCs is responsive to perturbation of the PDGF pathway. This approach not only identifies characteristic classes of differentiation-specific mRNAs and microRNAs, but begins to link regulated molecules with specific cellular pathways. PMID:18657893

  14. Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator.

    PubMed

    Garcia Castro, Alexander; Thoraval, Samuel; Garcia, Leyla J; Ragan, Mark A

    2005-04-07

    Computational methods for problem solving need to interleave information access and algorithm execution in a problem-specific workflow. The structures of these workflows are defined by a scaffold of syntactic, semantic and algebraic objects capable of representing them. Despite the proliferation of GUIs (Graphic User Interfaces) in bioinformatics, only some of them provide workflow capabilities; surprisingly, no meta-analysis of workflow operators and components in bioinformatics has been reported. We present a set of syntactic components and algebraic operators capable of representing analytical workflows in bioinformatics. Iteration, recursion, the use of conditional statements, and management of suspend/resume tasks have traditionally been implemented on an ad hoc basis and hard-coded; by having these operators properly defined it is possible to use and parameterize them as generic re-usable components. To illustrate how these operations can be orchestrated, we present GPIPE, a prototype graphic pipeline generator for PISE that allows the definition of a pipeline, parameterization of its component methods, and storage of metadata in XML formats. This implementation goes beyond the macro capacities currently in PISE. As the entire analysis protocol is defined in XML, a complete bioinformatic experiment (linked sets of methods, parameters and results) can be reproduced or shared among users. http://if-web1.imb.uq.edu.au/Pise/5.a/gpipe.html (interactive), ftp://ftp.pasteur.fr/pub/GenSoft/unix/misc/Pise/ (download). From our meta-analysis we have identified syntactic structures and algebraic operators common to many workflows in bioinformatics. The workflow components and algebraic operators can be assimilated into re-usable software components. GPIPE, a prototype implementation of this framework, provides a GUI builder to facilitate the generation of workflows and integration of heterogeneous analytical tools.

  15. Evaluation of Bioinformatic Programmes for the Analysis of Variants within Splice Site Consensus Regions

    PubMed Central

    Tang, Rongying; Prosser, Debra O.; Love, Donald R.

    2016-01-01

    The increasing diagnostic use of gene sequencing has led to an expanding dataset of novel variants that lie within consensus splice junctions. The challenge for diagnostic laboratories is the evaluation of these variants in order to determine if they affect splicing or are merely benign. A common evaluation strategy is to use in silico analysis, and it is here that a number of programmes are available online; however, currently, there are no consensus guidelines on the selection of programmes or protocols to interpret the prediction results. Using a collection of 222 pathogenic mutations and 50 benign polymorphisms, we evaluated the sensitivity and specificity of four in silico programmes in predicting the effect of each variant on splicing. The programmes comprised Human Splice Finder (HSF), Max Entropy Scan (MES), NNSplice, and ASSP. The MES and ASSP programmes gave the highest performance based on Receiver Operator Curve analysis, with an optimal cut-off of score reduction of 10%. The study also showed that the sensitivity of prediction is affected by the level of conservation of individual positions, with in silico predictions for variants at positions −4 and +7 within consensus splice sites being largely uninformative. PMID:27313609

  16. Microarray‑based bioinformatics analysis of the prospective target gene network of key miRNAs influenced by long non‑coding RNA PVT1 in HCC.

    PubMed

    Zhang, Yu; Mo, Wei-Jia; Wang, Xiao; Zhang, Tong-Tong; Qin, Yuan; Wang, Han-Lin; Chen, Gang; Wei, Dan-Ming; Dang, Yi-Wu

    2018-05-02

    The long non‑coding RNA (lncRNA) PVT1 plays vital roles in the tumorigenesis and development of various types of cancer. However, the potential expression profiling, functions and pathways of PVT1 in HCC remain unknown. PVT1 was knocked down in SMMC‑7721 cells, and a miRNA microarray analysis was performed to detect the differentially expressed miRNAs. Twelve target prediction algorithms were used to predict the underlying targets of these differentially expressed miRNAs. Bioinformatics analysis was performed to explore the underlying functions, pathways and networks of the targeted genes. Furthermore, the relationship between PVT1 and the clinical parameters in HCC was confirmed based on the original data in the TCGA database. Among the differentially expressed miRNAs, the top two upregulated and downregulated miRNAs were selected for further analysis based on the false discovery rate (FDR), fold‑change (FC) and P‑values. Based on the TCGA database, PVT1 was obviously highly expressed in HCC, and a statistically higher PVT1 expression was found for sex (male), ethnicity (Asian) and pathological grade (G3+G4) compared to the control groups (P<0.05). Furthermore, Gene Ontology (GO) analysis revealed that the target genes were involved in complex cellular pathways, such as the macromolecule biosynthetic process, compound metabolic process, and transcription. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed that the MAPK and Wnt signaling pathways may be correlated with the regulation of the four candidate miRNAs. The results therefore provide significant information on the differentially expressed miRNAs associated with PVT1 in HCC, and we hypothesized that PVT1 may play vital roles in HCC by regulating different miRNAs or target gene expression (particularly MAPK8) via the MAPK or Wnt signaling pathways. Thus, further investigation of the molecular mechanism of PVT1 in HCC is needed.

  17. A Trans-omics Mathematical Analysis Reveals Novel Functions of the Ornithine Metabolic Pathway in Cancer Stem Cells

    NASA Astrophysics Data System (ADS)

    Koseki, Jun; Matsui, Hidetoshi; Konno, Masamitsu; Nishida, Naohiro; Kawamoto, Koichi; Kano, Yoshihiro; Mori, Masaki; Doki, Yuichiro; Ishii, Hideshi

    2016-02-01

    Bioinformatics and computational modelling are expected to offer innovative approaches in human medical science. In the present study, we performed computational analyses and made predictions using transcriptome and metabolome datasets obtained from fluorescence-based visualisations of chemotherapy-resistant cancer stem cells (CSCs) in the human oesophagus. This approach revealed an uncharacterized role for the ornithine metabolic pathway in the survival of chemotherapy-resistant CSCs. The present study fastens this rationale for further characterisation that may lead to the discovery of innovative drugs against robust CSCs.

  18. Novel approaches for bioinformatic analysis of salivary RNA sequencing data for development.

    PubMed

    Kaczor-Urbanowicz, Karolina Elzbieta; Kim, Yong; Li, Feng; Galeev, Timur; Kitchen, Rob R; Gerstein, Mark; Koyano, Kikuye; Jeong, Sung-Hee; Wang, Xiaoyan; Elashoff, David; Kang, So Young; Kim, Su Mi; Kim, Kyoung; Kim, Sung; Chia, David; Xiao, Xinshu; Rozowsky, Joel; Wong, David T W

    2018-01-01

    Analysis of RNA sequencing (RNA-Seq) data in human saliva is challenging. Lack of standardization and unification of the bioinformatic procedures undermines saliva's diagnostic potential. Thus, it motivated us to perform this study. We applied principal pipelines for bioinformatic analysis of small RNA-Seq data of saliva of 98 healthy Korean volunteers including either direct or indirect mapping of the reads to the human genome using Bowtie1. Analysis of alignments to exogenous genomes by another pipeline revealed that almost all of the reads map to bacterial genomes. Thus, salivary exRNA has fundamental properties that warrant the design of unique additional steps while performing the bioinformatic analysis. Our pipelines can serve as potential guidelines for processing of RNA-Seq data of human saliva. Processing and analysis results of the experimental data generated by the exceRpt (v4.6.3) small RNA-seq pipeline (github.gersteinlab.org/exceRpt) are available from exRNA atlas (exrna-atlas.org). Alignment to exogenous genomes and their quantification results were used in this paper for the analyses of small RNAs of exogenous origin. dtww@ucla.edu. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  19. Technosciences in Academia: Rethinking a Conceptual Framework for Bioinformatics Undergraduate Curricula

    NASA Astrophysics Data System (ADS)

    Symeonidis, Iphigenia Sofia

    This paper aims to elucidate guiding concepts for the design of powerful undergraduate bioinformatics degrees which will lead to a conceptual framework for the curriculum. "Powerful" here should be understood as having truly bioinformatics objectives rather than enrichment of existing computer science or life science degrees on which bioinformatics degrees are often based. As such, the conceptual framework will be one which aims to demonstrate intellectual honesty in regards to the field of bioinformatics. A synthesis/conceptual analysis approach was followed as elaborated by Hurd (1983). The approach takes into account the following: bioinfonnatics educational needs and goals as expressed by different authorities, five undergraduate bioinformatics degrees case-studies, educational implications of bioinformatics as a technoscience and approaches to curriculum design promoting interdisciplinarity and integration. Given these considerations, guiding concepts emerged and a conceptual framework was elaborated. The practice of bioinformatics was given a closer look, which led to defining tool-integration skills and tool-thinking capacity as crucial areas of the bioinformatics activities spectrum. It was argued, finally, that a process-based curriculum as a variation of a concept-based curriculum (where the concepts are processes) might be more conducive to the teaching of bioinformatics given a foundational first year of integrated science education as envisioned by Bialek and Botstein (2004). Furthermore, the curriculum design needs to define new avenues of communication and learning which bypass the traditional disciplinary barriers of academic settings as undertaken by Tador and Tidmor (2005) for graduate studies.

  20. Bioinformatics core competencies for undergraduate life sciences education.

    PubMed

    Wilson Sayres, Melissa A; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G; Smith, Todd M; Triplett, Eric W; Williams, Jason J; Dinsdale, Elizabeth; Morgan, William R; Burnette, James M; Donovan, Samuel S; Drew, Jennifer C; Elgin, Sarah C R; Fowlks, Edison R; Galindo-Gonzalez, Sebastian; Goodman, Anya L; Grandgenett, Nealy F; Goller, Carlos C; Jungck, John R; Newman, Jeffrey D; Pearson, William; Ryder, Elizabeth F; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C; Toro-Martínez, Arlín; Welch, Lonnie R; Wright, Robin; Barone, Lindsay; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C; Pauley, Mark A

    2018-01-01

    Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent's degree of training, time since degree earned, and/or the Carnegie Classification of the respondent's institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula.

  1. Bioinformatics core competencies for undergraduate life sciences education

    PubMed Central

    Wilson Sayres, Melissa A.; Hauser, Charles; Sierk, Michael; Robic, Srebrenka; Rosenwald, Anne G.; Smith, Todd M.; Triplett, Eric W.; Williams, Jason J.; Dinsdale, Elizabeth; Morgan, William R.; Burnette, James M.; Donovan, Samuel S.; Drew, Jennifer C.; Elgin, Sarah C. R.; Fowlks, Edison R.; Galindo-Gonzalez, Sebastian; Goodman, Anya L.; Grandgenett, Nealy F.; Goller, Carlos C.; Jungck, John R.; Newman, Jeffrey D.; Pearson, William; Ryder, Elizabeth F.; Tosado-Acevedo, Rafael; Tapprich, William; Tobin, Tammy C.; Toro-Martínez, Arlín; Welch, Lonnie R.; Wright, Robin; Ebenbach, David; McWilliams, Mindy; Olney, Kimberly C.

    2018-01-01

    Although bioinformatics is becoming increasingly central to research in the life sciences, bioinformatics skills and knowledge are not well integrated into undergraduate biology education. This curricular gap prevents biology students from harnessing the full potential of their education, limiting their career opportunities and slowing research innovation. To advance the integration of bioinformatics into life sciences education, a framework of core bioinformatics competencies is needed. To that end, we here report the results of a survey of biology faculty in the United States about teaching bioinformatics to undergraduate life scientists. Responses were received from 1,260 faculty representing institutions in all fifty states with a combined capacity to educate hundreds of thousands of students every year. Results indicate strong, widespread agreement that bioinformatics knowledge and skills are critical for undergraduate life scientists as well as considerable agreement about which skills are necessary. Perceptions of the importance of some skills varied with the respondent’s degree of training, time since degree earned, and/or the Carnegie Classification of the respondent’s institution. To assess which skills are currently being taught, we analyzed syllabi of courses with bioinformatics content submitted by survey respondents. Finally, we used the survey results, the analysis of the syllabi, and our collective research and teaching expertise to develop a set of bioinformatics core competencies for undergraduate biology students. These core competencies are intended to serve as a guide for institutions as they work to integrate bioinformatics into their life sciences curricula. PMID:29870542

  2. Dissecting whole-genome sequencing-based online tools for predicting resistance in Mycobacterium tuberculosis: can we use them for clinical decision guidance?

    PubMed

    Macedo, Rita; Nunes, Alexandra; Portugal, Isabel; Duarte, Sílvia; Vieira, Luís; Gomes, João Paulo

    2018-05-01

    Whole-genome sequencing (WGS)-based bioinformatics platforms for the rapid prediction of resistance will soon be implemented in the Tuberculosis (TB) laboratory, but their accuracy assessment still needs to be strengthened. Here, we fully-sequenced a total of 54 multidrug-resistant (MDR) and five susceptible TB strains and performed, for the first time, a simultaneous evaluation of the major four free online platforms (TB Profiler, PhyResSE, Mykrobe Predictor and TGS-TB). Overall, the sensitivity of resistance prediction ranged from 84.3% using Mykrobe predictor to 95.2% using TB profiler, while specificity was higher and homogeneous among platforms. TB profiler revealed the best performance robustness (sensitivity, specificity, PPV and NPV above 95%), followed by TGS-TB (all parameters above 90%). We also observed a few discrepancies between phenotype and genotype, where, in some cases, it was possible to pin-point some "candidate" mutations (e.g., in the rpsL promoter region) highlighting the need for their confirmation through mutagenesis assays and potential review of the anti-TB genetic databases. The rampant development of the bioinformatics algorithms and the tremendously reduced time-frame until the clinician may decide for a definitive and most effective treatment will certainly trigger the technological transition where WGS-based bioinformatics platforms could replace phenotypic drug susceptibility testing for TB. Copyright © 2018 Elsevier Ltd. All rights reserved.

  3. Online Tools for Bioinformatics Analyses in Nutrition Sciences12

    PubMed Central

    Malkaram, Sridhar A.; Hassan, Yousef I.; Zempleni, Janos

    2012-01-01

    Recent advances in “omics” research have resulted in the creation of large datasets that were generated by consortiums and centers, small datasets that were generated by individual investigators, and bioinformatics tools for mining these datasets. It is important for nutrition laboratories to take full advantage of the analysis tools to interrogate datasets for information relevant to genomics, epigenomics, transcriptomics, proteomics, and metabolomics. This review provides guidance regarding bioinformatics resources that are currently available in the public domain, with the intent to provide a starting point for investigators who want to take advantage of the opportunities provided by the bioinformatics field. PMID:22983844

  4. India's Computational Biology Growth and Challenges.

    PubMed

    Chakraborty, Chiranjib; Bandyopadhyay, Sanghamitra; Agoramoorthy, Govindasamy

    2016-09-01

    India's computational science is growing swiftly due to the outburst of internet and information technology services. The bioinformatics sector of India has been transforming rapidly by creating a competitive position in global bioinformatics market. Bioinformatics is widely used across India to address a wide range of biological issues. Recently, computational researchers and biologists are collaborating in projects such as database development, sequence analysis, genomic prospects and algorithm generations. In this paper, we have presented the Indian computational biology scenario highlighting bioinformatics-related educational activities, manpower development, internet boom, service industry, research activities, conferences and trainings undertaken by the corporate and government sectors. Nonetheless, this new field of science faces lots of challenges.

  5. Crysalis: an integrated server for computational analysis and design of protein crystallization.

    PubMed

    Wang, Huilin; Feng, Liubin; Zhang, Ziding; Webb, Geoffrey I; Lin, Donghai; Song, Jiangning

    2016-02-24

    The failure of multi-step experimental procedures to yield diffraction-quality crystals is a major bottleneck in protein structure determination. Accordingly, several bioinformatics methods have been successfully developed and employed to select crystallizable proteins. Unfortunately, the majority of existing in silico methods only allow the prediction of crystallization propensity, seldom enabling computational design of protein mutants that can be targeted for enhancing protein crystallizability. Here, we present Crysalis, an integrated crystallization analysis tool that builds on support-vector regression (SVR) models to facilitate computational protein crystallization prediction, analysis, and design. More specifically, the functionality of this new tool includes: (1) rapid selection of target crystallizable proteins at the proteome level, (2) identification of site non-optimality for protein crystallization and systematic analysis of all potential single-point mutations that might enhance protein crystallization propensity, and (3) annotation of target protein based on predicted structural properties. We applied the design mode of Crysalis to identify site non-optimality for protein crystallization on a proteome-scale, focusing on proteins currently classified as non-crystallizable. Our results revealed that site non-optimality is based on biases related to residues, predicted structures, physicochemical properties, and sequence loci, which provides in-depth understanding of the features influencing protein crystallization. Crysalis is freely available at http://nmrcen.xmu.edu.cn/crysalis/.

  6. Crysalis: an integrated server for computational analysis and design of protein crystallization

    PubMed Central

    Wang, Huilin; Feng, Liubin; Zhang, Ziding; Webb, Geoffrey I.; Lin, Donghai; Song, Jiangning

    2016-01-01

    The failure of multi-step experimental procedures to yield diffraction-quality crystals is a major bottleneck in protein structure determination. Accordingly, several bioinformatics methods have been successfully developed and employed to select crystallizable proteins. Unfortunately, the majority of existing in silico methods only allow the prediction of crystallization propensity, seldom enabling computational design of protein mutants that can be targeted for enhancing protein crystallizability. Here, we present Crysalis, an integrated crystallization analysis tool that builds on support-vector regression (SVR) models to facilitate computational protein crystallization prediction, analysis, and design. More specifically, the functionality of this new tool includes: (1) rapid selection of target crystallizable proteins at the proteome level, (2) identification of site non-optimality for protein crystallization and systematic analysis of all potential single-point mutations that might enhance protein crystallization propensity, and (3) annotation of target protein based on predicted structural properties. We applied the design mode of Crysalis to identify site non-optimality for protein crystallization on a proteome-scale, focusing on proteins currently classified as non-crystallizable. Our results revealed that site non-optimality is based on biases related to residues, predicted structures, physicochemical properties, and sequence loci, which provides in-depth understanding of the features influencing protein crystallization. Crysalis is freely available at http://nmrcen.xmu.edu.cn/crysalis/. PMID:26906024

  7. miR-425 inhibits melanoma metastasis through repression of PI3K-Akt pathway by targeting IGF-1.

    PubMed

    Liu, Pei; Hu, Yaotian; Ma, Ling; Du, Min; Xia, Lin; Hu, Zhensheng

    2015-10-01

    miR-425 is a potential tumor suppressor in cancer, but its role in melanoma is still unknown. We aim to investigate miR-425 expression in melanoma tissues and cell lines. Next, cell proliferation, cell cycle, apoptosis and metastasis will be studied using lentivirus-mediated gain-of-function studies. The predicted results are stable miR-425 inhibits cell proliferation and metastasis and induced cell apoptosis. It is predicted that IGF-1 is a potential target gene of miR-495 by bioinformatics analysis. Then luciferase assay analysis identifies IGF-1 as a new direct target gene of miR-425 and miR-425 inhibits melanoma cancer progression via IGF-1. Collectively, our findings suggested that miR-425 may function as a tumor suppressor in melanoma by targeting IGF-1. Copyright © 2015 Elsevier Masson SAS. All rights reserved.

  8. Development of a Web-Enabled Informatics Platform for Manipulation of Gene Expression Data

    DTIC Science & Technology

    2004-12-01

    genomic platforms such as metabolomics and proteomics , and to federated databases for knowledge management. A successful SBIR Phase I completed...measurements that require sophisticated bioinformatic platforms for data archival, management, integration, and analysis if researchers are to derive...web-enabled bioinformatic platform consisting of a Laboratory Information Management System (LIMS), an Analysis Information Management System (AIMS

  9. Evolving Strategies for the Incorporation of Bioinformatics Within the Undergraduate Cell Biology Curriculum

    PubMed Central

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489

  10. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows.

    PubMed

    Paraskevopoulou, Maria D; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A G

    2013-07-01

    MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis, and it is being widely used from the scientific community, since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA-gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned, to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines.

  11. DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows

    PubMed Central

    Paraskevopoulou, Maria D.; Georgakilas, Georgios; Kostoulas, Nikos; Vlachos, Ioannis S.; Vergoulis, Thanasis; Reczko, Martin; Filippidis, Christos; Dalamagas, Theodore; Hatzigeorgiou, A.G.

    2013-01-01

    MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate gene expression through mRNA degradation and/or translation repression, affecting many biological processes. DIANA-microT web server (http://www.microrna.gr/webServer) is dedicated to miRNA target prediction/functional analysis, and it is being widely used from the scientific community, since its initial launch in 2009. DIANA-microT v5.0, the new version of the microT server, has been significantly enhanced with an improved target prediction algorithm, DIANA-microT-CDS. It has been updated to incorporate miRBase version 18 and Ensembl version 69. The in silico-predicted miRNA–gene interactions in Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans exceed 11 million in total. The web server was completely redesigned, to host a series of sophisticated workflows, which can be used directly from the on-line web interface, enabling users without the necessary bioinformatics infrastructure to perform advanced multi-step functional miRNA analyses. For instance, one available pipeline performs miRNA target prediction using different thresholds and meta-analysis statistics, followed by pathway enrichment analysis. DIANA-microT web server v5.0 also supports a complete integration with the Taverna Workflow Management System (WMS), using the in-house developed DIANA-Taverna Plug-in. This plug-in provides ready-to-use modules for miRNA target prediction and functional analysis, which can be used to form advanced high-throughput analysis pipelines. PMID:23680784

  12. What is bioinformatics? A proposed definition and overview of the field.

    PubMed

    Luscombe, N M; Greenbaum, D; Gerstein, M

    2001-01-01

    The recent flood of data from genome sequences and functional genomics has given rise to new field, bioinformatics, which combines elements of biology and computer science. Here we propose a definition for this new field and review some of the research that is being pursued, particularly in relation to transcriptional regulatory systems. Our definition is as follows: Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical-chemistry) and then applying "informatics" techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale. Analyses in bioinformatics predominantly focus on three types of large datasets available in molecular biology: macromolecular structures, genome sequences, and the results of functional genomics experiments (e.g. expression data). Additional information includes the text of scientific papers and "relationship data" from metabolic pathways, taxonomy trees, and protein-protein interaction networks. Bioinformatics employs a wide range of computational techniques including sequence and structural alignment, database design and data mining, macromolecular geometry, phylogenetic tree construction, prediction of protein structure and function, gene finding, and expression data clustering. The emphasis is on approaches integrating a variety of computational methods and heterogeneous data sources. Finally, bioinformatics is a practical discipline. We survey some representative applications, such as finding homologues, designing drugs, and performing large-scale censuses. Additional information pertinent to the review is available over the web at http://bioinfo.mbb.yale.edu/what-is-it.

  13. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa

    PubMed Central

    Mulder, Nicola J.; Adebiyi, Ezekiel; Alami, Raouf; Benkahla, Alia; Brandful, James; Doumbia, Seydou; Everett, Dean; Fadlelmola, Faisal M.; Gaboun, Fatima; Gaseitsiwe, Simani; Ghazal, Hassan; Hazelhurst, Scott; Hide, Winston; Ibrahimi, Azeddine; Jaufeerally Fakim, Yasmina; Jongeneel, C. Victor; Joubert, Fourie; Kassim, Samar; Kayondo, Jonathan; Kumuthini, Judit; Lyantagaye, Sylvester; Makani, Julie; Mansour Alzohairy, Ahmed; Masiga, Daniel; Moussa, Ahmed; Nash, Oyekanmi; Ouwe Missi Oukem-Boyer, Odile; Owusu-Dabo, Ellis; Panji, Sumir; Patterton, Hugh; Radouani, Fouzia; Sadki, Khalid; Seghrouchni, Fouad; Tastan Bishop, Özlem; Tiffin, Nicki; Ulenga, Nzovu

    2016-01-01

    The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet. PMID:26627985

  14. Planning bioinformatics workflows using an expert system.

    PubMed

    Chen, Xiaoling; Chang, Jeffrey T

    2017-04-15

    Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software is explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. https://github.com/jefftc/changlab. jeffrey.t.chang@uth.tmc.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  15. Planning bioinformatics workflows using an expert system

    PubMed Central

    Chen, Xiaoling; Chang, Jeffrey T.

    2017-01-01

    Abstract Motivation: Bioinformatic analyses are becoming formidably more complex due to the increasing number of steps required to process the data, as well as the proliferation of methods that can be used in each step. To alleviate this difficulty, pipelines are commonly employed. However, pipelines are typically implemented to automate a specific analysis, and thus are difficult to use for exploratory analyses requiring systematic changes to the software or parameters used. Results: To automate the development of pipelines, we have investigated expert systems. We created the Bioinformatics ExperT SYstem (BETSY) that includes a knowledge base where the capabilities of bioinformatics software is explicitly and formally encoded. BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the richness of biological data, and an inference engine that reasons on the knowledge base to produce workflows. Currently, the knowledge base is populated with rules to analyze microarray and next generation sequencing data. We evaluated BETSY and found that it could generate workflows that reproduce and go beyond previously published bioinformatics results. Finally, a meta-investigation of the workflows generated from the knowledge base produced a quantitative measure of the technical burden imposed by each step of bioinformatics analyses, revealing the large number of steps devoted to the pre-processing of data. In sum, an expert system approach can facilitate exploratory bioinformatic analysis by automating the development of workflows, a task that requires significant domain expertise. Availability and Implementation: https://github.com/jefftc/changlab Contact: jeffrey.t.chang@uth.tmc.edu PMID:28052928

  16. NGS for the Masses: Empowering Biologists to Improve Bioinformatics Productivity ( 7th Annual SFAF Meeting, 2012)

    ScienceCinema

    Qaadri, Kashef [Biomatters Inc., San Francisco, CA (United States)

    2018-05-21

    Kashef Qaadri on "NGS for the Masses: Empowering biologists to improve bioinformatic productivity" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  17. NGS for the Masses: Empowering Biologists to Improve Bioinformatics Productivity ( 7th Annual SFAF Meeting, 2012)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Qaadri, Kashef

    2012-06-01

    Kashef Qaadri on "NGS for the Masses: Empowering biologists to improve bioinformatic productivity" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  18. Advanced colorectal adenoma related gene expression signature may predict prognostic for colorectal cancer patients with adenoma-carcinoma sequence.

    PubMed

    Li, Bing; Shi, Xiao-Yu; Liao, Dai-Xiang; Cao, Bang-Rong; Luo, Cheng-Hua; Cheng, Shu-Jun

    2015-01-01

    There are still no absolute parameters predicting progression of adenoma into cancer. The present study aimed to characterize functional differences on the multistep carcinogenetic process from the adenoma-carcinoma sequence. All samples were collected and mRNA expression profiling was performed by using Agilent Microarray high-throughput gene-chip technology. Then, the characteristics of mRNA expression profiles of adenoma-carcinoma sequence were described with bioinformatics software, and we analyzed the relationship between gene expression profiles of adenoma-adenocarcinoma sequence and clinical prognosis of colorectal cancer. The mRNA expressions of adenoma-carcinoma sequence were significantly different between high-grade intraepithelial neoplasia group and adenocarcinoma group. The biological process of gene ontology function enrichment analysis on differentially expressed genes between high-grade intraepithelial neoplasia group and adenocarcinoma group showed that genes enriched in the extracellular structure organization, skeletal system development, biological adhesion and itself regulated growth regulation, with the P value after FDR correction of less than 0.05. In addition, IPR-related protein mainly focused on the insulin-like growth factor binding proteins. The variable trends of gene expression profiles for adenoma-carcinoma sequence were mainly concentrated in high-grade intraepithelial neoplasia and adenocarcinoma. The differentially expressed genes are significantly correlated between high-grade intraepithelial neoplasia group and adenocarcinoma group. Bioinformatics analysis is an effective way to study the gene expression profiles in the adenoma-carcinoma sequence, and may provide an effective tool to involve colorectal cancer research strategy into colorectal adenoma or advanced adenoma.

  19. Human L-DOPA decarboxylase mRNA is a target of miR-145: A prediction to validation workflow.

    PubMed

    Papadopoulos, Emmanuel I; Fragoulis, Emmanuel G; Scorilas, Andreas

    2015-01-10

    l-DOPA decarboxylase (DDC) is a multiply-regulated gene which encodes the enzyme that catalyzes the biosynthesis of dopamine in humans. MicroRNAs comprise a novel class of endogenously transcribed small RNAs that can post-transcriptionally regulate the expression of various genes. Given that the mechanism of microRNA target recognition remains elusive, several genes, including DDC, have not yet been identified as microRNA targets. Nevertheless, a number of specifically designed bioinformatic algorithms provide candidate miRNAs for almost every gene, but still their results exhibit moderate accuracy and should be experimentally validated. Motivated by the above, we herein sought to discover a microRNA that regulates DDC expression. By using the current algorithms according to bibliographic recommendations we found that miR-145 could be predicted with high specificity as a candidate regulatory microRNA for DDC expression. Thus, a validation experiment followed by firstly transfecting an appropriate cell culture system with a synthetic miR-145 sequence and sequentially assessing the mRNA and protein levels of DDC via real-time PCR and Western blotting, respectively. Our analysis revealed that miR-145 had no significant impact on protein levels of DDC but managed to dramatically downregulate its mRNA expression. Overall, the experimental and bioinformatic analysis conducted herein indicate that miR-145 has the ability to regulate DDC mRNA expression and potentially this occurs by recognizing its mRNA as a target. Copyright © 2014. Published by Elsevier B.V.

  20. PSMB5 plays a dual role in cancer development and immunosuppression

    PubMed Central

    Wang, Chih-Yang; Li, Chung-Yen; Hsu, Hui-Ping; Cho, Chien-Yu; Yen, Meng-Chi; Weng, Tzu-Yang; Chen, Wei-Ching; Hung, Yu-Hsuan; Lee, Kuo-Ting; Hung, Jui-Hsiang; Chen, Yi-Ling; Lai, Ming-Derg

    2017-01-01

    Tumor progression and metastasis are dependent on the intrinsic properties of tumor cells and the influence of microenvironment including the immune system. It would be important to identify target drug that can inhibit cancer cell and activate immune cells. Proteasome β subunits (PSMB) family, one component of the ubiquitin-proteasome system, has been demonstrated to play an important role in tumor cells and immune cells. Therefore, we used a bioinformatics approach to examine the potential role of PSMB family. Analysis of breast TCGA and METABRIC database revealed that high expression of PSMB5 was observed in breast cancer tissue and that high expression of PSMB5 predicted worse survival. In addition, high expression of PSMB5 was observed in M2 macrophages. Based on our bioinformatics analysis, we hypothesized that PSMB5 contained immunosuppressive and oncogenic characteristics. To study the effects of PSMB5 on the cancer cell and macrophage in vitro, we silenced PSMB5 expression with shRNA in THP-1 monocytes and MDA-MB-231 cells respectively. Knockdown of PSMB5 promoted human THP-1 monocyte differentiation into M1 macrophage. On the other hand, knockdown PSMB5 gene expression inhibited MDA-MB-231 cell growth and migration by colony formation assay and boyden chamber. Collectively, our data demonstrated that delivery of PSMB5 shRNA suppressed cell growth and activated defensive M1 macrophages in vitro. Furthermore, lentiviral delivery of PSMB5 shRNA significantly decreased tumor growth in a subcutaneous mouse model. In conclusion, our bioinformatics study and functional experiments revealed that PSMB5 served as novel cancer therapeutic targets. These results also demonstrated a novel translational approach to improve cancer immunotherapy. PMID:29218236

  1. Modern Computational Techniques for the HMMER Sequence Analysis

    PubMed Central

    2013-01-01

    This paper focuses on the latest research and critical reviews on modern computing architectures, software and hardware accelerated algorithms for bioinformatics data analysis with an emphasis on one of the most important sequence analysis applications—hidden Markov models (HMM). We show the detailed performance comparison of sequence analysis tools on various computing platforms recently developed in the bioinformatics society. The characteristics of the sequence analysis, such as data and compute-intensive natures, make it very attractive to optimize and parallelize by using both traditional software approach and innovated hardware acceleration technologies. PMID:25937944

  2. Cloning and expression analysis of FaPR-1 gene in strawberry

    NASA Astrophysics Data System (ADS)

    Mo, Fan; Luo, Ya; Ge, Cong; Mo, Qin; Ling, Yajie; Luo, Shu; Tang, Haoru

    2018-04-01

    The FaPR-1 gene was cloned by RT-PCR from `Benihoppe' strawberry and its bioinformatics analysis was conducted. The results showed that the open reading frame was 483 bp encoding encoding l60 amino acids which protein molecular weight and theoretical isoelectricity were 17854.17 and 8.72 respectively. Subcellular localization prediction shows that this gene is located extracellularly. By comparing strawberry FaPR-l and other plant Pathogenesis-related protein, homology and phylogenetic tree construction showed that the homology with grapes, peach is relatively close. In the treatments of ABA, sucrose and the mixture of the two, the expression of FaPR-1 in strawberry fruit were significantly increased.

  3. Prediction and Dissection of Protein-RNA Interactions by Molecular Descriptors.

    PubMed

    Liu, Zhi-Ping; Chen, Luonan

    2016-01-01

    Protein-RNA interactions play crucial roles in numerous biological processes. However, detecting the interactions and binding sites between protein and RNA by traditional experiments is still time consuming and labor costing. Thus, it is of importance to develop bioinformatics methods for predicting protein-RNA interactions and binding sites. Accurate prediction of protein-RNA interactions and recognitions will highly benefit to decipher the interaction mechanisms between protein and RNA, as well as to improve the RNA-related protein engineering and drug design. In this work, we summarize the current bioinformatics strategies of predicting protein-RNA interactions and dissecting protein-RNA interaction mechanisms from local structure binding motifs. In particular, we focus on the feature-based machine learning methods, in which the molecular descriptors of protein and RNA are extracted and integrated as feature vectors of representing the interaction events and recognition residues. In addition, the available methods are classified and compared comprehensively. The molecular descriptors are expected to elucidate the binding mechanisms of protein-RNA interaction and reveal the functional implications from structural complementary perspective.

  4. Cake: a bioinformatics pipeline for the integrated analysis of somatic variants in cancer genomes

    PubMed Central

    Rashid, Mamunur; Robles-Espinoza, Carla Daniela; Rust, Alistair G.; Adams, David J.

    2013-01-01

    Summary: We have developed Cake, a bioinformatics software pipeline that integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants with higher sensitivity and accuracy than any one algorithm alone. Cake can be run on a high-performance computer cluster or used as a stand-alone application. Availabilty: Cake is open-source and is available from http://cakesomatic.sourceforge.net/ Contact: da1@sanger.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:23803469

  5. BATMAN-TCM: a Bioinformatics Analysis Tool for Molecular mechANism of Traditional Chinese Medicine

    NASA Astrophysics Data System (ADS)

    Liu, Zhongyang; Guo, Feifei; Wang, Yong; Li, Chun; Zhang, Xinlei; Li, Honglei; Diao, Lihong; Gu, Jiangyong; Wang, Wei; Li, Dong; He, Fuchu

    2016-02-01

    Traditional Chinese Medicine (TCM), with a history of thousands of years of clinical practice, is gaining more and more attention and application worldwide. And TCM-based new drug development, especially for the treatment of complex diseases is promising. However, owing to the TCM’s diverse ingredients and their complex interaction with human body, it is still quite difficult to uncover its molecular mechanism, which greatly hinders the TCM modernization and internationalization. Here we developed the first online Bioinformatics Analysis Tool for Molecular mechANism of TCM (BATMAN-TCM). Its main functions include 1) TCM ingredients’ target prediction; 2) functional analyses of targets including biological pathway, Gene Ontology functional term and disease enrichment analyses; 3) the visualization of ingredient-target-pathway/disease association network and KEGG biological pathway with highlighted targets; 4) comparison analysis of multiple TCMs. Finally, we applied BATMAN-TCM to Qishen Yiqi dripping Pill (QSYQ) and combined with subsequent experimental validation to reveal the functions of renin-angiotensin system responsible for QSYQ’s cardioprotective effects for the first time. BATMAN-TCM will contribute to the understanding of the “multi-component, multi-target and multi-pathway” combinational therapeutic mechanism of TCM, and provide valuable clues for subsequent experimental validation, accelerating the elucidation of TCM’s molecular mechanism. BATMAN-TCM is available at http://bionet.ncpsb.org/batman-tcm.

  6. Novel EDA mutation in X-linked hypohidrotic ectodermal dysplasia and genotype-phenotype correlation.

    PubMed

    Zeng, B; Lu, H; Xiao, X; Zhou, L; Lu, J; Zhu, L; Yu, D; Zhao, W

    2015-11-01

    X-linked hypohidrotic ectodermal dysplasia (XLHED) is characterized by abnormalities of hair, teeth, and sweat glands, while non-syndromic hypodontia (NSH) affects only teeth. Mutations in Ectodysplasin A (EDA) underlie both XLHED and NSH. This study investigated the genetic causes of six hypohidrotic ectodermal dysplasia (HED) patients and genotype-phenotype correlation. The EDA gene of six patients with HED was sequenced. Bioinformatics analysis and structural modeling for the mutations were performed. The records of 134 patients with XLHED and EDA-related NSH regarding numbers of missing permanent teeth from this study and 20 articles were reviewed. Nonparametric tests were used to analyze genotype-phenotype correlations. In four of the six patients, we identified a novel mutation c.852T>G (p.Phe284Leu) and three reported mutations: c.467G>A (p.Arg156His), c.776C>A (p.Ala259Glu), and c.871G>A (p.Gly291Arg). They were predicted to be pathogenic by bioinformatics analysis and structural modeling. Genotype-phenotype correlation analysis revealed that truncating mutations were associated with more missing teeth. Missense mutations and the mutations affecting the TNF homology domain were correlated with fewer missing teeth. This study extended the mutation spectrum of XLHED and revealed the relationship between genotype and the number of missing permanent teeth. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  7. BATMAN-TCM: a Bioinformatics Analysis Tool for Molecular mechANism of Traditional Chinese Medicine

    PubMed Central

    Liu, Zhongyang; Guo, Feifei; Wang, Yong; Li, Chun; Zhang, Xinlei; Li, Honglei; Diao, Lihong; Gu, Jiangyong; Wang, Wei; Li, Dong; He, Fuchu

    2016-01-01

    Traditional Chinese Medicine (TCM), with a history of thousands of years of clinical practice, is gaining more and more attention and application worldwide. And TCM-based new drug development, especially for the treatment of complex diseases is promising. However, owing to the TCM’s diverse ingredients and their complex interaction with human body, it is still quite difficult to uncover its molecular mechanism, which greatly hinders the TCM modernization and internationalization. Here we developed the first online Bioinformatics Analysis Tool for Molecular mechANism of TCM (BATMAN-TCM). Its main functions include 1) TCM ingredients’ target prediction; 2) functional analyses of targets including biological pathway, Gene Ontology functional term and disease enrichment analyses; 3) the visualization of ingredient-target-pathway/disease association network and KEGG biological pathway with highlighted targets; 4) comparison analysis of multiple TCMs. Finally, we applied BATMAN-TCM to Qishen Yiqi dripping Pill (QSYQ) and combined with subsequent experimental validation to reveal the functions of renin-angiotensin system responsible for QSYQ’s cardioprotective effects for the first time. BATMAN-TCM will contribute to the understanding of the “multi-component, multi-target and multi-pathway” combinational therapeutic mechanism of TCM, and provide valuable clues for subsequent experimental validation, accelerating the elucidation of TCM’s molecular mechanism. BATMAN-TCM is available at http://bionet.ncpsb.org/batman-tcm. PMID:26879404

  8. Confirmation of the pathogenicity of a mutation p.G337C in the COL1A2 gene associated with osteogenesis imperfecta

    PubMed Central

    Jia, Mingrui; Shi, Ranran; Zhao, Xuli; Fu, Zhijian; Bai, Zhijing; Sun, Tao; Zhao, Xuejun; Wang, Wenbo; Xu, Chao; Yan, Fang

    2017-01-01

    Abstract Mutation analysis as the gold standard is particularly important in diagnosis of osteogenesis imperfecta (OI) and it may be preventable upon early diagnosis. In this study, we aimed to analyze the clinical and genetic materials of an OI pedigree as well as to confirm the deleterious property of the mutation. A pedigree with OI was identified. All family members received careful clinical examinations and blood was drawn for genetic analyses. Genes implicated in OI were screened for mutation. The function and structure of the mutant protein were predicted using bioinformatics analysis. The proband, a 9-month fetus, showed abnormal sonographic images. Disproportionately short and triangular face with blue sclera was noticed at birth. She can barely walk and suffered multiple fractures till 2-year old. Her mother appeared small stature, frequent fractures, blue sclera, and deformity of extremities. A heterozygous missense mutation c.1009G>T (p.G337C) in the COL1A2 gene was identified in her mother and her. Bioinformatics analysis showed p.G337 was well-conserved among multiple species and the mutation probably changed the structure and damaged the function of collagen. We suggest that the mutation p.G337C in the COL1A2 gene is pathogenic for OI by affecting the protein structure and the function of collagen. PMID:28953610

  9. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters.

    PubMed

    Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily.

  10. BGDMdocker: a Docker workflow for data mining and visualization of bacterial pan-genomes and biosynthetic gene clusters

    PubMed Central

    Cheng, Gong; Zhang, Guocai; Xu, Liang

    2017-01-01

    Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily. PMID:29204317

  11. NNAlign: A Web-Based Prediction Method Allowing Non-Expert End-User Discovery of Sequence Motifs in Quantitative Peptide Data

    PubMed Central

    Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole; Buus, Søren; Nielsen, Morten

    2011-01-01

    Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new “omics”-based approaches towards the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenges researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools allowing non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can be used as prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets including peptide microarray-derived sets containing more than 100,000 data points. NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign. PMID:22073191

  12. NNAlign: a web-based prediction method allowing non-expert end-user discovery of sequence motifs in quantitative peptide data.

    PubMed

    Andreatta, Massimo; Schafer-Nielsen, Claus; Lund, Ole; Buus, Søren; Nielsen, Morten

    2011-01-01

    Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale thereby enabling entirely new "omics"-based approaches towards the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenges researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools allowing non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that subsequently can be used as prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets including peptide microarray-derived sets containing more than 100,000 data points. NNAlign is available online at http://www.cbs.dtu.dk/services/NNAlign.

  13. Bioinformatic analysis for allergenicity assessment of Bacillus thuringiensis Cry proteins expressed in insect-resistant food crops.

    PubMed

    Randhawa, Gurinder Jit; Singh, Monika; Grover, Monendra

    2011-02-01

    The novel proteins introduced into the genetically modified (GM) crops need to be evaluated for the potential allergenicity before their introduction into the food chain to address the safety concerns of consumers. At present, there is no single definitive test that can be relied upon to predict allergic response in humans to a new protein; hence a composite approach to allergic response prediction is described in this study. The present study reports on the evaluation of the Cry proteins, encoded by cry1Ac, cry1Ab, cry2Ab, cry1Ca, cry1Fa/cry1Ca hybrid, being expressed in Bt food crops that are under field trials in India, for potential allergenic cross-reactivity using bioinformatics search tools. The sequence identity of amino acids was analyzed using FASTA3 of AllergenOnline version 10.0 and BLASTX of NCBI Entrez to identify any potential sequence matches to allergen proteins. As a step further in the detection of allergens, an independent database of domains in the allergens available in the AllergenOnline database was also developed. The results indicated no significant alignment and similarity of Cry proteins at domain level with any of the known allergens revealing that there is no potential risk of allergenic cross-reactivity. Copyright © 2010 Elsevier Ltd. All rights reserved.

  14. Analysis of Bioactive Amino Acids from Fish Hydrolysates with a New Bioinformatic Intelligent System Approach.

    PubMed

    Elaziz, Mohamed Abd; Hemdan, Ahmed Monem; Hassanien, AboulElla; Oliva, Diego; Xiong, Shengwu

    2017-09-07

    The current economics of the fish protein industry demand rapid, accurate and expressive prediction algorithms at every step of protein production especially with the challenge of global climate change. This help to predict and analyze functional and nutritional quality then consequently control food allergies in hyper allergic patients. As, it is quite expensive and time-consuming to know these concentrations by the lab experimental tests, especially to conduct large-scale projects. Therefore, this paper introduced a new intelligent algorithm using adaptive neuro-fuzzy inference system based on whale optimization algorithm. This algorithm is used to predict the concentration levels of bioactive amino acids in fish protein hydrolysates at different times during the year. The whale optimization algorithm is used to determine the optimal parameters in adaptive neuro-fuzzy inference system. The results of proposed algorithm are compared with others and it is indicated the higher performance of the proposed algorithm.

  15. Modeling positional effects of regulatory sequences with spline transformations increases prediction accuracy of deep neural networks

    PubMed Central

    Avsec, Žiga; Cheng, Jun; Gagneur, Julien

    2018-01-01

    Abstract Motivation Regulatory sequences are not solely defined by their nucleic acid sequence but also by their relative distances to genomic landmarks such as transcription start site, exon boundaries or polyadenylation site. Deep learning has become the approach of choice for modeling regulatory sequences because of its strength to learn complex sequence features. However, modeling relative distances to genomic landmarks in deep neural networks has not been addressed. Results Here we developed spline transformation, a neural network module based on splines to flexibly and robustly model distances. Modeling distances to various genomic landmarks with spline transformations significantly increased state-of-the-art prediction accuracy of in vivo RNA-binding protein binding sites for 120 out of 123 proteins. We also developed a deep neural network for human splice branchpoint based on spline transformations that outperformed the current best, already distance-based, machine learning model. Compared to piecewise linear transformation, as obtained by composition of rectified linear units, spline transformation yields higher prediction accuracy as well as faster and more robust training. As spline transformation can be applied to further quantities beyond distances, such as methylation or conservation, we foresee it as a versatile component in the genomics deep learning toolbox. Availability and implementation Spline transformation is implemented as a Keras layer in the CONCISE python package: https://github.com/gagneurlab/concise. Analysis code is available at https://github.com/gagneurlab/Manuscript_Avsec_Bioinformatics_2017. Contact avsec@in.tum.de or gagneur@in.tum.de Supplementary information Supplementary data are available at Bioinformatics online. PMID:29155928

  16. Bioinformatics and the Undergraduate Curriculum

    ERIC Educational Resources Information Center

    Maloney, Mark; Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael

    2010-01-01

    Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of…

  17. Bioinformatics approaches for cross-species liver cancer analysis based on microarray gene expression profiling

    PubMed Central

    Fang, H; Tong, W; Perkins, R; Shi, L; Hong, H; Cao, X; Xie, Q; Yim, SH; Ward, JM; Pitot, HC; Dragan, YP

    2005-01-01

    Background The completion of the sequencing of human, mouse and rat genomes and knowledge of cross-species gene homologies enables studies of differential gene expression in animal models. These types of studies have the potential to greatly enhance our understanding of diseases such as liver cancer in humans. Genes co-expressed across multiple species are most likely to have conserved functions. We have used various bioinformatics approaches to examine microarray expression profiles from liver neoplasms that arise in albumin-SV40 transgenic rats to elucidate genes, chromosome aberrations and pathways that might be associated with human liver cancer. Results In this study, we first identified 2223 differentially expressed genes by comparing gene expression profiles for two control, two adenoma and two carcinoma samples using an F-test. These genes were subsequently mapped to the rat chromosomes using a novel visualization tool, the Chromosome Plot. Using the same plot, we further mapped the significant genes to orthologous chromosomal locations in human and mouse. Many genes expressed in rat 1q that are amplified in rat liver cancer map to the human chromosomes 10, 11 and 19 and to the mouse chromosomes 7, 17 and 19, which have been implicated in studies of human and mouse liver cancer. Using Comparative Genomics Microarray Analysis (CGMA), we identified regions of potential aberrations in human. Lastly, a pathway analysis was conducted to predict altered human pathways based on statistical analysis and extrapolation from the rat data. All of the identified pathways have been known to be important in the etiology of human liver cancer, including cell cycle control, cell growth and differentiation, apoptosis, transcriptional regulation, and protein metabolism. Conclusion The study demonstrates that the hepatic gene expression profiles from the albumin-SV40 transgenic rat model revealed genes, pathways and chromosome alterations consistent with experimental and clinical research in human liver cancer. The bioinformatics tools presented in this paper are essential for cross species extrapolation and mapping of microarray data, its analysis and interpretation. PMID:16026603

  18. Using Kepler for Tool Integration in Microarray Analysis Workflows.

    PubMed

    Gan, Zhuohui; Stowe, Jennifer C; Altintas, Ilkay; McCulloch, Andrew D; Zambon, Alexander C

    Increasing numbers of genomic technologies are leading to massive amounts of genomic data, all of which requires complex analysis. More and more bioinformatics analysis tools are being developed by scientist to simplify these analyses. However, different pipelines have been developed using different software environments. This makes integrations of these diverse bioinformatics tools difficult. Kepler provides an open source environment to integrate these disparate packages. Using Kepler, we integrated several external tools including Bioconductor packages, AltAnalyze, a python-based open source tool, and R-based comparison tool to build an automated workflow to meta-analyze both online and local microarray data. The automated workflow connects the integrated tools seamlessly, delivers data flow between the tools smoothly, and hence improves efficiency and accuracy of complex data analyses. Our workflow exemplifies the usage of Kepler as a scientific workflow platform for bioinformatics pipelines.

  19. Scalability and Validation of Big Data Bioinformatics Software.

    PubMed

    Yang, Andrian; Troup, Michael; Ho, Joshua W K

    2017-01-01

    This review examines two important aspects that are central to modern big data bioinformatics analysis - software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability for a program to scale based on workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless the surge of volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.

  20. StrBioLib: a Java library for development of custom computational structural biology applications.

    PubMed

    Chandonia, John-Marc

    2007-08-01

    StrBioLib is a library of Java classes useful for developing software for computational structural biology research. StrBioLib contains classes to represent and manipulate protein structures, biopolymer sequences, sets of biopolymer sequences, and alignments between biopolymers based on either sequence or structure. Interfaces are provided to interact with commonly used bioinformatics applications, including (psi)-blast, modeller, muscle and Primer3, and tools are provided to read and write many file formats used to represent bioinformatic data. The library includes a general-purpose neural network object with multiple training algorithms, the Hooke and Jeeves non-linear optimization algorithm, and tools for efficient C-style string parsing and formatting. StrBioLib is the basis for the Pred2ary secondary structure prediction program, is used to build the astral compendium for sequence and structure analysis, and has been extensively tested through use in many smaller projects. Examples and documentation are available at the site below. StrBioLib may be obtained under the terms of the GNU LGPL license from http://strbio.sourceforge.net/

  1. CscoreTool: fast Hi-C compartment analysis at high resolution.

    PubMed

    Zheng, Xiaobin; Zheng, Yixian

    2018-05-01

    The genome-wide chromosome conformation capture (Hi-C) has revealed that the eukaryotic genome can be partitioned into A and B compartments that have distinctive chromatin and transcription features. Current Principle Component Analyses (PCA)-based method for the A/B compartment prediction based on Hi-C data requires substantial CPU time and memory. We report the development of a method, CscoreTool, which enables fast and memory-efficient determination of A/B compartments at high resolution even in datasets with low sequencing depth. https://github.com/scoutzxb/CscoreTool. xzheng@carnegiescience.edu. Supplementary data are available at Bioinformatics online.

  2. Computational method and system for modeling, analyzing, and optimizing DNA amplification and synthesis

    DOEpatents

    Vandersall, Jennifer A.; Gardner, Shea N.; Clague, David S.

    2010-05-04

    A computational method and computer-based system of modeling DNA synthesis for the design and interpretation of PCR amplification, parallel DNA synthesis, and microarray chip analysis. The method and system include modules that address the bioinformatics, kinetics, and thermodynamics of DNA amplification and synthesis. Specifically, the steps of DNA selection, as well as the kinetics and thermodynamics of DNA hybridization and extensions, are addressed, which enable the optimization of the processing and the prediction of the products as a function of DNA sequence, mixing protocol, time, temperature and concentration of species.

  3. Generalized Centroid Estimators in Bioinformatics

    PubMed Central

    Hamada, Michiaki; Kiryu, Hisanori; Iwasaki, Wataru; Asai, Kiyoshi

    2011-01-01

    In a number of estimation problems in bioinformatics, accuracy measures of the target problem are usually given, and it is important to design estimators that are suitable to those accuracy measures. However, there is often a discrepancy between an employed estimator and a given accuracy measure of the problem. In this study, we introduce a general class of efficient estimators for estimation problems on high-dimensional binary spaces, which represent many fundamental problems in bioinformatics. Theoretical analysis reveals that the proposed estimators generally fit with commonly-used accuracy measures (e.g. sensitivity, PPV, MCC and F-score) as well as it can be computed efficiently in many cases, and cover a wide range of problems in bioinformatics from the viewpoint of the principle of maximum expected accuracy (MEA). It is also shown that some important algorithms in bioinformatics can be interpreted in a unified manner. Not only the concept presented in this paper gives a useful framework to design MEA-based estimators but also it is highly extendable and sheds new light on many problems in bioinformatics. PMID:21365017

  4. Phosphoproteomics and Bioinformatics Analyses of Spinal Cord Proteins in Rats with Morphine Tolerance

    PubMed Central

    Liaw, Wen-Jinn; Tsao, Cheng-Ming; Huang, Go-Shine; Wu, Chin-Chen; Ho, Shung-Tai; Wang, Jhi-Joung; Tao, Yuan-Xiang; Shui, Hao-Ai

    2014-01-01

    Introduction Morphine is the most effective pain-relieving drug, but it can cause unwanted side effects. Direct neuraxial administration of morphine to spinal cord not only can provide effective, reliable pain relief but also can prevent the development of supraspinal side effects. However, repeated neuraxial administration of morphine may still lead to morphine tolerance. Methods To better understand the mechanism that causes morphine tolerance, we induced tolerance in rats at the spinal cord level by giving them twice-daily injections of morphine (20 µg/10 µL) for 4 days. We confirmed tolerance by measuring paw withdrawal latencies and maximal possible analgesic effect of morphine on day 5. We then carried out phosphoproteomic analysis to investigate the global phosphorylation of spinal proteins associated with morphine tolerance. Finally, pull-down assays were used to identify phosphorylated types and sites of 14-3-3 proteins, and bioinformatics was applied to predict biological networks impacted by the morphine-regulated proteins. Results Our proteomics data showed that repeated morphine treatment altered phosphorylation of 10 proteins in the spinal cord. Pull-down assays identified 2 serine/threonine phosphorylated sites in 14-3-3 proteins. Bioinformatics further revealed that morphine impacted on cytoskeletal reorganization, neuroplasticity, protein folding and modulation, signal transduction and biomolecular metabolism. Conclusions Repeated morphine administration may affect multiple biological networks by altering protein phosphorylation. These data may provide insight into the mechanism that underlies the development of morphine tolerance. PMID:24392096

  5. HoloVir: A Workflow for Investigating the Diversity and Function of Viruses in Invertebrate Holobionts

    PubMed Central

    Laffy, Patrick W.; Wood-Charlson, Elisha M.; Turaev, Dmitrij; Weynberg, Karen D.; Botté, Emmanuelle S.; van Oppen, Madeleine J. H.; Webster, Nicole S.; Rattei, Thomas

    2016-01-01

    Abundant bioinformatics resources are available for the study of complex microbial metagenomes, however their utility in viral metagenomics is limited. HoloVir is a robust and flexible data analysis pipeline that provides an optimized and validated workflow for taxonomic and functional characterization of viral metagenomes derived from invertebrate holobionts. Simulated viral metagenomes comprising varying levels of viral diversity and abundance were used to determine the optimal assembly and gene prediction strategy, and multiple sequence assembly methods and gene prediction tools were tested in order to optimize our analysis workflow. HoloVir performs pairwise comparisons of single read and predicted gene datasets against the viral RefSeq database to assign taxonomy and additional comparison to phage-specific and cellular markers is undertaken to support the taxonomic assignments and identify potential cellular contamination. Broad functional classification of the predicted genes is provided by assignment of COG microbial functional category classifications using EggNOG and higher resolution functional analysis is achieved by searching for enrichment of specific Swiss-Prot keywords within the viral metagenome. Application of HoloVir to viral metagenomes from the coral Pocillopora damicornis and the sponge Rhopaloeides odorabile demonstrated that HoloVir provides a valuable tool to characterize holobiont viral communities across species, environments, or experiments. PMID:27375564

  6. Deletion of the transcriptional coactivator PGC1α in skeletal muscles is associated with reduced expression of genes related to oxidative muscle function

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hatazawa, Yukino; Research Fellow of Japan Society for the Promotion of Science, Tokyo; Minami, Kimiko

    The expression of the transcriptional coactivator PGC1α is increased in skeletal muscles during exercise. Previously, we showed that increased PGC1α leads to prolonged exercise performance (the duration for which running can be continued) and, at the same time, increases the expression of branched-chain amino acid (BCAA) metabolism-related enzymes and genes that are involved in supplying substrates for the TCA cycle. We recently created mice with PGC1α knockout specifically in the skeletal muscles (PGC1α KO mice), which show decreased mitochondrial content. In this study, global gene expression (microarray) analysis was performed in the skeletal muscles of PGC1α KO mice compared withmore » that of wild-type control mice. As a result, decreased expression of genes involved in the TCA cycle, oxidative phosphorylation, and BCAA metabolism were observed. Compared with previously obtained microarray data on PGC1α-overexpressing transgenic mice, each gene showed the completely opposite direction of expression change. Bioinformatic analysis of the promoter region of genes with decreased expression in PGC1α KO mice predicted the involvement of several transcription factors, including a nuclear receptor, ERR, in their regulation. As PGC1α KO microarray data in this study show opposing findings to the PGC1α transgenic data, a loss-of-function experiment, as well as a gain-of-function experiment, revealed PGC1α’s function in the oxidative energy metabolism of skeletal muscles. - Highlights: • Microarray analysis was performed in the skeletal muscle of PGC1α KO mice. • Expression of genes in the oxidative energy metabolism was decreased. • Bioinformatic analysis of promoter region of the genes predicted involvement of ERR. • PGC1α KO microarray data in this study show the mirror image of transgenic data.« less

  7. VerSeDa: vertebrate secretome database

    PubMed Central

    Cortazar, Ana R.; Oguiza, José A.

    2017-01-01

    Based on the current tools, de novo secretome (full set of proteins secreted by an organism) prediction is a time consuming bioinformatic task that requires a multifactorial analysis in order to obtain reliable in silico predictions. Hence, to accelerate this process and offer researchers a reliable repository where secretome information can be obtained for vertebrates and model organisms, we have developed VerSeDa (Vertebrate Secretome Database). This freely available database stores information about proteins that are predicted to be secreted through the classical and non-classical mechanisms, for the wide range of vertebrate species deposited at the NCBI, UCSC and ENSEMBL sites. To our knowledge, VerSeDa is the only state-of-the-art database designed to store secretome data from multiple vertebrate genomes, thus, saving an important amount of time spent in the prediction of protein features that can be retrieved from this repository directly. Database URL: VerSeDa is freely available at http://genomics.cicbiogune.es/VerSeDa/index.php PMID:28365718

  8. VerSeDa: vertebrate secretome database.

    PubMed

    Cortazar, Ana R; Oguiza, José A; Aransay, Ana M; Lavín, José L

    2017-01-01

    Based on the current tools, de novo secretome (full set of proteins secreted by an organism) prediction is a time consuming bioinformatic task that requires a multifactorial analysis in order to obtain reliable in silico predictions. Hence, to accelerate this process and offer researchers a reliable repository where secretome information can be obtained for vertebrates and model organisms, we have developed VerSeDa (Vertebrate Secretome Database). This freely available database stores information about proteins that are predicted to be secreted through the classical and non-classical mechanisms, for the wide range of vertebrate species deposited at the NCBI, UCSC and ENSEMBL sites. To our knowledge, VerSeDa is the only state-of-the-art database designed to store secretome data from multiple vertebrate genomes, thus, saving an important amount of time spent in the prediction of protein features that can be retrieved from this repository directly. VerSeDa is freely available at http://genomics.cicbiogune.es/VerSeDa/index.php. © The Author(s) 2017. Published by Oxford University Press.

  9. Interoperability of GADU in using heterogeneous Grid resources for bioinformatics applications.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sulakhe, D.; Rodriguez, A.; Wilde, M.

    2008-03-01

    Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The genome analysis and database update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing the Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated scalable system such as GADU that can run jobs on different Grids. The paper describes the resource-independent configuration of GADU using the Pegasus-based virtual datamore » system that makes high-throughput computational tools interoperable on heterogeneous Grid resources. The paper also highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper will not go into the details of problems involved or the lessons learned in using individual Grid resources as it has already been published in our paper on genome analysis research environment (GNARE) and will focus primarily on the architecture that makes GADU resource independent and interoperable across heterogeneous Grid resources.« less

  10. PipeCraft: Flexible open-source toolkit for bioinformatics analysis of custom high-throughput amplicon sequencing data.

    PubMed

    Anslan, Sten; Bahram, Mohammad; Hiiesalu, Indrek; Tedersoo, Leho

    2017-11-01

    High-throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in public and private sector. These methods provide vast amount of data, which need to be analysed in several steps. Although the bioinformatics may be applied using several public tools, many analytical pipelines allow too few options for the optimal analysis for more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user-friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing the data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable. © 2017 John Wiley & Sons Ltd.

  11. Survey of MapReduce frame operation in bioinformatics.

    PubMed

    Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke

    2014-07-01

    Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  12. A novel gene expression-based prognostic scoring system to predict survival in gastric cancer

    DOE PAGES

    Wang, Pin; Wang, Yunshan; Hang, Bo; ...

    2016-07-11

    Analysis of gene expression patterns in gastric cancer (GC) can help to identify a comprehensive panel of gene biomarkers for predicting clinical outcomes and to discover potential new therapeutic targets. Here, a multi-step bioinformatics analytic approach was developed to establish a novel prognostic scoring system for GC. We first identified 276 genes that were robustly differentially expressed between normal and GC tissues, of which, 249 were found to be significantly associated with overall survival (OS) by univariate Cox regression analysis. The biological functions of 249 genes are related to cell cycle, RNA/ncRNA process, acetylation and extracellular matrix organization. A networkmore » was generated for view of the gene expression architecture of 249 genes in 265 GCs. Finally, we applied a canonical discriminant analysis approach to identify a 53-gene signature and a prognostic scoring system was established based on a canonical discriminant function of 53 genes. The prognostic scores strongly predicted patients with GC to have either a poor or good OS. Our study raises the prospect that the practicality of GC patient prognosis can be assessed by this prognostic scoring system.« less

  13. Isolation and in silico analysis of a novel H+-pyrophosphatase gene orthologue from the halophytic grass Leptochloa fusca

    NASA Astrophysics Data System (ADS)

    Rauf, Muhammad; Saeed, Nasir A.; Habib, Imran; Ahmed, Moddassir; Shahzad, Khurram; Mansoor, Shahid; Ali, Rashid

    2017-02-01

    Structure prediction can provide information about function and active sites of protein which helps to design new functional proteins. H+-pyrophosphatase is transmembrane protein involved in establishing proton motive force for active transport of Na+ across membrane by Na+/H+ antiporters. A full length novel H+-pyrophosphatase gene was isolated from halophytic grass Leptochloa fusca using RT-PCR and RACE method. Full length LfVP1 gene sequence of 2292 nucleotides encodes protein of 764 amino acids. DNA and protein sequences were used for characterization using bioinformatics tools. Various important potential sites were predicted by PROSITE webserver. Primary structural analysis showed LfVP1 as stable protein and Grand average hydropathy (GRAVY) indicated that LfVP1 protein has good hydrosolubility. Secondary structure analysis showed that LfVP1 protein sequence contains significant proportion of alpha helix and random coil. Protein membrane topology suggested the presence of 14 transmembrane domains and presence of catalytic domain in TM3. Three dimensional structure from LfVP1 protein sequence also indicated the presence of 14 transmembrane domains and hydrophobicity surface model showed amino acid hydrophobicity. Ramachandran plot showed that 98% amino acid residues were predicted in the favored region.

  14. A novel gene expression-based prognostic scoring system to predict survival in gastric cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Pin; Wang, Yunshan; Hang, Bo

    Analysis of gene expression patterns in gastric cancer (GC) can help to identify a comprehensive panel of gene biomarkers for predicting clinical outcomes and to discover potential new therapeutic targets. Here, a multi-step bioinformatics analytic approach was developed to establish a novel prognostic scoring system for GC. We first identified 276 genes that were robustly differentially expressed between normal and GC tissues, of which, 249 were found to be significantly associated with overall survival (OS) by univariate Cox regression analysis. The biological functions of 249 genes are related to cell cycle, RNA/ncRNA process, acetylation and extracellular matrix organization. A networkmore » was generated for view of the gene expression architecture of 249 genes in 265 GCs. Finally, we applied a canonical discriminant analysis approach to identify a 53-gene signature and a prognostic scoring system was established based on a canonical discriminant function of 53 genes. The prognostic scores strongly predicted patients with GC to have either a poor or good OS. Our study raises the prospect that the practicality of GC patient prognosis can be assessed by this prognostic scoring system.« less

  15. Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease.

    PubMed

    Azuaje, Francisco; Zheng, Huiru; Camargo, Anyela; Wang, Haiying

    2011-08-01

    The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstration of both their classification power and reproducibility across independent datasets are essential requirements to assess their potential clinical relevance. Small datasets and multiplicity of putative biomarker sets may explain lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have been mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently-generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed lack of classification reproducibility using independent datasets. Partial overlaps between our putative sets of biomarkers and the primary studies exist. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically-relevant underlying molecular mechanisms. Copyright © 2011 Elsevier Inc. All rights reserved.

  16. A case study of tuning MapReduce for efficient Bioinformatics in the cloud

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, Lizhen; Wang, Zhong; Yu, Weikuan

    The combination of the Hadoop MapReduce programming model and cloud computing allows biological scientists to analyze next-generation sequencing (NGS) data in a timely and cost-effective manner. Cloud computing platforms remove the burden of IT facility procurement and management from end users and provide ease of access to Hadoop clusters. However, biological scientists are still expected to choose appropriate Hadoop parameters for running their jobs. More importantly, the available Hadoop tuning guidelines are either obsolete or too general to capture the particular characteristics of bioinformatics applications. In this paper, we aim to minimize the cloud computing cost spent on bioinformatics datamore » analysis by optimizing the extracted significant Hadoop parameters. When using MapReduce-based bioinformatics tools in the cloud, the default settings often lead to resource underutilization and wasteful expenses. We choose k-mer counting, a representative application used in a large number of NGS data analysis tools, as our study case. Experimental results show that, with the fine-tuned parameters, we achieve a total of 4× speedup compared with the original performance (using the default settings). Finally, this paper presents an exemplary case for tuning MapReduce-based bioinformatics applications in the cloud, and documents the key parameters that could lead to significant performance benefits.« less

  17. Visualising "Junk" DNA through Bioinformatics

    ERIC Educational Resources Information Center

    Elwess, Nancy L.; Latourelle, Sandra M.; Cauthorn, Olivia

    2005-01-01

    One of the hottest areas of science today is the field in which biology, information technology,and computer science are merged into a single discipline called bioinformatics. This field enables the discovery and analysis of biological data, including nucleotide and amino acid sequences that are easily accessed through the use of computers. As…

  18. Bioinformatics and Microarray Data Analysis on the Cloud.

    PubMed

    Calabrese, Barbara; Cannataro, Mario

    2016-01-01

    High-throughput platforms such as microarray, mass spectrometry, and next-generation sequencing are producing an increasing volume of omics data that needs large data storage and computing power. Cloud computing offers massive scalable computing and storage, data sharing, on-demand anytime and anywhere access to resources and applications, and thus, it may represent the key technology for facing those issues. In fact, in the recent years it has been adopted for the deployment of different bioinformatics solutions and services both in academia and in the industry. Although this, cloud computing presents several issues regarding the security and privacy of data, that are particularly important when analyzing patients data, such as in personalized medicine. This chapter reviews main academic and industrial cloud-based bioinformatics solutions; with a special focus on microarray data analysis solutions and underlines main issues and problems related to the use of such platforms for the storage and analysis of patients data.

  19. An automated benchmarking platform for MHC class II binding prediction methods.

    PubMed

    Andreatta, Massimo; Trolle, Thomas; Yan, Zhen; Greenbaum, Jason A; Peters, Bjoern; Nielsen, Morten

    2018-05-01

    Computational methods for the prediction of peptide-MHC binding have become an integral and essential component for candidate selection in experimental T cell epitope discovery studies. The sheer amount of published prediction methods-and often discordant reports on their performance-poses a considerable quandary to the experimentalist who needs to choose the best tool for their research. With the goal to provide an unbiased, transparent evaluation of the state-of-the-art in the field, we created an automated platform to benchmark peptide-MHC class II binding prediction tools. The platform evaluates the absolute and relative predictive performance of all participating tools on data newly entered into the Immune Epitope Database (IEDB) before they are made public, thereby providing a frequent, unbiased assessment of available prediction tools. The benchmark runs on a weekly basis, is fully automated, and displays up-to-date results on a publicly accessible website. The initial benchmark described here included six commonly used prediction servers, but other tools are encouraged to join with a simple sign-up procedure. Performance evaluation on 59 data sets composed of over 10 000 binding affinity measurements suggested that NetMHCIIpan is currently the most accurate tool, followed by NN-align and the IEDB consensus method. Weekly reports on the participating methods can be found online at: http://tools.iedb.org/auto_bench/mhcii/weekly/. mniel@bioinformatics.dtu.dk. Supplementary data are available at Bioinformatics online.

  20. KBWS: an EMBOSS associated package for accessing bioinformatics web services.

    PubMed

    Oshita, Kazuki; Arakawa, Kazuharu; Tomita, Masaru

    2011-04-29

    The availability of bioinformatics web-based services is rapidly proliferating, for their interoperability and ease of use. The next challenge is in the integration of these services in the form of workflows, and several projects are already underway, standardizing the syntax, semantics, and user interfaces. In order to deploy the advantages of web services with locally installed tools, here we describe a collection of proxy client tools for 42 major bioinformatics web services in the form of European Molecular Biology Open Software Suite (EMBOSS) UNIX command-line tools. EMBOSS provides sophisticated means for discoverability and interoperability for hundreds of tools, and our package, named the Keio Bioinformatics Web Service (KBWS), adds functionalities of local and multiple alignment of sequences, phylogenetic analyses, and prediction of cellular localization of proteins and RNA secondary structures. This software implemented in C is available under GPL from http://www.g-language.org/kbws/ and GitHub repository http://github.com/cory-ko/KBWS. Users can utilize the SOAP services implemented in Perl directly via WSDL file at http://soap.g-language.org/kbws.wsdl (RPC Encoded) and http://soap.g-language.org/kbws_dl.wsdl (Document/literal).

  1. Quantifying the Relationships among Drug Classes

    PubMed Central

    Hert, Jérôme; Keiser, Michael J.; Irwin, John J.; Oprea, Tudor I.; Shoichet, Brian K.

    2009-01-01

    The similarity of drug targets is typically measured using sequence or structural information. Here, we consider chemo-centric approaches that measure target similarity on the basis of their ligands, asking how chemoinformatics similarities differ from those derived bioinformatically, how stable the ligand networks are to changes in chemoinformatics metrics, and which network is the most reliable for prediction of pharmacology. We calculated the similarities between hundreds of drug targets and their ligands and mapped the relationship between them in a formal network. Bioinformatics networks were based on the BLAST similarity between sequences, while chemoinformatics networks were based on the ligand-set similarities calculated with either the Similarity Ensemble Approach (SEA) or a method derived from Bayesian statistics. By multiple criteria, bioinformatics and chemoinformatics networks differed substantially, and only occasionally did a high sequence similarity correspond to a high ligand-set similarity. In contrast, the chemoinformatics networks were stable to the method used to calculate the ligand-set similarities and to the chemical representation of the ligands. Also, the chemoinformatics networks were more natural and more organized, by network theory, than their bioinformatics counterparts: ligand-based networks were found to be small-world and broad-scale. PMID:18335977

  2. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

    PubMed Central

    2010-01-01

    Background The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data. PMID:20122245

  3. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships.

    PubMed

    Seok, Junhee; Kaushal, Amit; Davis, Ronald W; Xiao, Wenzhong

    2010-01-18

    The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.

  4. RNA Sequencing and Bioinformatics Analysis Implicate the Regulatory Role of a Long Noncoding RNA-mRNA Network in Hepatic Stellate Cell Activation.

    PubMed

    Guo, Can-Jie; Xiao, Xiao; Sheng, Li; Chen, Lili; Zhong, Wei; Li, Hai; Hua, Jing; Ma, Xiong

    2017-01-01

    To analyze the long noncoding (lncRNA)-mRNA expression network and potential roles in rat hepatic stellate cells (HSCs) during activation. LncRNA expression was analyzed in quiescent and culture-activated HSCs by RNA sequencing, and differentially expressed lncRNAs verified by quantitative reverse transcription polymerase chain reaction (qRT-PCR) were subjected to bioinformatics analysis. In vivo analyses of differential lncRNA-mRNA expression were performed on a rat model of liver fibrosis. We identified upregulation of 12 lncRNAs and 155 mRNAs and downregulation of 12 lncRNAs and 374 mRNAs in activated HSCs. Additionally, we identified the differential expression of upregulated lncRNAs (NONRATT012636.2, NONRATT016788.2, and NONRATT021402.2) and downregulated lncRNAs (NONRATT007863.2, NONRATT019720.2, and NONRATT024061.2) in activated HSCs relative to levels observed in quiescent HSCs, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analyses showed that changes in lncRNAs associated with HSC activation revealed 11 significantly enriched pathways according to their predicted targets. Moreover, based on the predicted co-expression network, the relative dynamic levels of NONRATT013819.2 and lysyl oxidase (Lox) were compared during HSC activation both in vitro and in vivo. Our results confirmed the upregulation of lncRNA NONRATT013819.2 and Lox mRNA associated with the extracellular matrix (ECM)-related signaling pathway in HSCs and fibrotic livers. Our results detailing a dysregulated lncRNA-mRNA network might provide new treatment strategies for hepatic fibrosis based on findings indicating potentially critical roles for NONRATT013819.2 and Lox in ECM remodeling during HSC activation. © 2017 The Author(s). Published by S. Karger AG, Basel.

  5. Thorough analysis of unorthodox ABO deletions called by the 1000 Genomes project.

    PubMed

    Möller, M; Hellberg, Å; Olsson, M L

    2018-02-01

    ABO remains the clinically most important blood group system, but despite earlier extensive research, significant findings are still being made. The vast majority of catalogued ABO null alleles are based on the c.261delG polymorphism. Apart from c.802G>A, other mechanisms for O alleles are rare. While analysing the data set from the 1000 Genomes (1000G) project, we encountered two previously uncharacterized deletions, which needed further exploration. The Erythrogene database, complemented with bioinformatics software, was used to analyse ABO in 2504 individuals from 1000G. DNA samples from selected 1000G donors and African blood donors were examined by allele-specific PCR and Sanger sequencing to characterize predicted deletions. A 5821-bp deletion encompassing exons 5-7 was called in twenty 1000G individuals, predominantly Africans. This allele was confirmed and its exact deletion point defined by bioinformatic analyses and in vitro experiments. A PCR assay was developed, and screening of African samples revealed three donors heterozygous for this deletion, which was thereby phenotypically established as an O allele. Analysis of upstream genetic markers indicated an ancestral origin from ABO*O.01.02. We estimate this deletion as the 3rd most common mechanism behind O alleles. A 24-bp deletion was called in nine individuals and showed greater diversity regarding ethnic distribution and allelic background. It could neither be confirmed by in silico nor in vitro experiments. A previously uncharacterized ABO deletion among Africans was comprehensively mapped and a genotyping strategy devised. The false prediction of another deletion emphasizes the need for cautious interpretation of NGS data and calls for strict validation routines. © 2017 International Society of Blood Transfusion.

  6. Bioinformatics: Cheap and robust method to explore biomaterial from Indonesia biodiversity

    NASA Astrophysics Data System (ADS)

    Widodo

    2015-02-01

    Indonesia has a huge amount of biodiversity, which may contain many biomaterials for pharmaceutical application. These resources potency should be explored to discover new drugs for human wealth. However, the bioactive screening using conventional methods is very expensive and time-consuming. Therefore, we developed a methodology for screening the potential of natural resources based on bioinformatics. The method is developed based on the fact that organisms in the same taxon will have similar genes, metabolism and secondary metabolites product. Then we employ bioinformatics to explore the potency of biomaterial from Indonesia biodiversity by comparing species with the well-known taxon containing the active compound through published paper or chemical database. Then we analyze drug-likeness, bioactivity and the target proteins of the active compound based on their molecular structure. The target protein was examined their interaction with other proteins in the cell to determine action mechanism of the active compounds in the cellular level, as well as to predict its side effects and toxicity. By using this method, we succeeded to screen anti-cancer, immunomodulators and anti-inflammation from Indonesia biodiversity. For example, we found anticancer from marine invertebrate by employing the method. The anti-cancer was explore based on the isolated compounds of marine invertebrate from published article and database, and then identified the protein target, followed by molecular pathway analysis. The data suggested that the active compound of the invertebrate able to kill cancer cell. Further, we collect and extract the active compound from the invertebrate, and then examined the activity on cancer cell (MCF7). The MTT result showed that the methanol extract of marine invertebrate was highly potent in killing MCF7 cells. Therefore, we concluded that bioinformatics is cheap and robust way to explore bioactive from Indonesia biodiversity for source of drug and another pharmaceutical material.

  7. Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics

    PubMed Central

    Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.

    2012-01-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849

  8. Agents in bioinformatics, computational and systems biology.

    PubMed

    Merelli, Emanuela; Armano, Giuliano; Cannata, Nicola; Corradini, Flavio; d'Inverno, Mark; Doms, Andreas; Lord, Phillip; Martin, Andrew; Milanesi, Luciano; Möller, Steffen; Schroeder, Michael; Luck, Michael

    2007-01-01

    The adoption of agent technologies and multi-agent systems constitutes an emerging area in bioinformatics. In this article, we report on the activity of the Working Group on Agents in Bioinformatics (BIOAGENTS) founded during the first AgentLink III Technical Forum meeting on the 2nd of July, 2004, in Rome. The meeting provided an opportunity for seeding collaborations between the agent and bioinformatics communities to develop a different (agent-based) approach of computational frameworks both for data analysis and management in bioinformatics and for systems modelling and simulation in computational and systems biology. The collaborations gave rise to applications and integrated tools that we summarize and discuss in context of the state of the art in this area. We investigate on future challenges and argue that the field should still be explored from many perspectives ranging from bio-conceptual languages for agent-based simulation, to the definition of bio-ontology-based declarative languages to be used by information agents, and to the adoption of agents for computational grids.

  9. A Scientific Software Product Line for the Bioinformatics domain.

    PubMed

    Costa, Gabriella Castro B; Braga, Regina; David, José Maria N; Campos, Fernanda

    2015-08-01

    Most specialized users (scientists) that use bioinformatics applications do not have suitable training on software development. Software Product Line (SPL) employs the concept of reuse considering that it is defined as a set of systems that are developed from a common set of base artifacts. In some contexts, such as in bioinformatics applications, it is advantageous to develop a collection of related software products, using SPL approach. If software products are similar enough, there is the possibility of predicting their commonalities, differences and then reuse these common features to support the development of new applications in the bioinformatics area. This paper presents the PL-Science approach which considers the context of SPL and ontology in order to assist scientists to define a scientific experiment, and to specify a workflow that encompasses bioinformatics applications of a given experiment. This paper also focuses on the use of ontologies to enable the use of Software Product Line in biological domains. In the context of this paper, Scientific Software Product Line (SSPL) differs from the Software Product Line due to the fact that SSPL uses an abstract scientific workflow model. This workflow is defined according to a scientific domain and using this abstract workflow model the products (scientific applications/algorithms) are instantiated. Through the use of ontology as a knowledge representation model, we can provide domain restrictions as well as add semantic aspects in order to facilitate the selection and organization of bioinformatics workflows in a Scientific Software Product Line. The use of ontologies enables not only the expression of formal restrictions but also the inferences on these restrictions, considering that a scientific domain needs a formal specification. This paper presents the development of the PL-Science approach, encompassing a methodology and an infrastructure, and also presents an approach evaluation. This evaluation presents case studies in bioinformatics, which were conducted in two renowned research institutions in Brazil. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. The eBioKit, a stand-alone educational platform for bioinformatics.

    PubMed

    Hernández-de-Diego, Rafael; de Villiers, Etienne P; Klingström, Tomas; Gourlé, Hadrien; Conesa, Ana; Bongcam-Rudloff, Erik

    2017-09-01

    Bioinformatics skills have become essential for many research areas; however, the availability of qualified researchers is usually lower than the demand and training to increase the number of able bioinformaticians is an important task for the bioinformatics community. When conducting training or hands-on tutorials, the lack of control over the analysis tools and repositories often results in undesirable situations during training, as unavailable online tools or version conflicts may delay, complicate, or even prevent the successful completion of a training event. The eBioKit is a stand-alone educational platform that hosts numerous tools and databases for bioinformatics research and allows training to take place in a controlled environment. A key advantage of the eBioKit over other existing teaching solutions is that all the required software and databases are locally installed on the system, significantly reducing the dependence on the internet. Furthermore, the architecture of the eBioKit has demonstrated itself to be an excellent balance between portability and performance, not only making the eBioKit an exceptional educational tool but also providing small research groups with a platform to incorporate bioinformatics analysis in their research. As a result, the eBioKit has formed an integral part of training and research performed by a wide variety of universities and organizations such as the Pan African Bioinformatics Network (H3ABioNet) as part of the initiative Human Heredity and Health in Africa (H3Africa), the Southern Africa Network for Biosciences (SAnBio) initiative, the Biosciences eastern and central Africa (BecA) hub, and the International Glossina Genome Initiative.

  11. The eBioKit, a stand-alone educational platform for bioinformatics

    PubMed Central

    Conesa, Ana; Bongcam-Rudloff, Erik

    2017-01-01

    Bioinformatics skills have become essential for many research areas; however, the availability of qualified researchers is usually lower than the demand and training to increase the number of able bioinformaticians is an important task for the bioinformatics community. When conducting training or hands-on tutorials, the lack of control over the analysis tools and repositories often results in undesirable situations during training, as unavailable online tools or version conflicts may delay, complicate, or even prevent the successful completion of a training event. The eBioKit is a stand-alone educational platform that hosts numerous tools and databases for bioinformatics research and allows training to take place in a controlled environment. A key advantage of the eBioKit over other existing teaching solutions is that all the required software and databases are locally installed on the system, significantly reducing the dependence on the internet. Furthermore, the architecture of the eBioKit has demonstrated itself to be an excellent balance between portability and performance, not only making the eBioKit an exceptional educational tool but also providing small research groups with a platform to incorporate bioinformatics analysis in their research. As a result, the eBioKit has formed an integral part of training and research performed by a wide variety of universities and organizations such as the Pan African Bioinformatics Network (H3ABioNet) as part of the initiative Human Heredity and Health in Africa (H3Africa), the Southern Africa Network for Biosciences (SAnBio) initiative, the Biosciences eastern and central Africa (BecA) hub, and the International Glossina Genome Initiative. PMID:28910280

  12. Open discovery: An integrated live Linux platform of Bioinformatics tools.

    PubMed

    Vetrivel, Umashankar; Pilla, Kalabharath

    2008-01-01

    Historically, live linux distributions for Bioinformatics have paved way for portability of Bioinformatics workbench in a platform independent manner. Moreover, most of the existing live Linux distributions limit their usage to sequence analysis and basic molecular visualization programs and are devoid of data persistence. Hence, open discovery - a live linux distribution has been developed with the capability to perform complex tasks like molecular modeling, docking and molecular dynamics in a swift manner. Furthermore, it is also equipped with complete sequence analysis environment and is capable of running windows executable programs in Linux environment. Open discovery portrays the advanced customizable configuration of fedora, with data persistency accessible via USB drive or DVD. The Open Discovery is distributed free under Academic Free License (AFL) and can be downloaded from http://www.OpenDiscovery.org.in.

  13. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package

    PubMed Central

    2012-01-01

    Background Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org. PMID:23281941

  14. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package.

    PubMed

    El-Kalioby, Mohamed; Abouelhoda, Mohamed; Krüger, Jan; Giegerich, Robert; Sczyrba, Alexander; Wall, Dennis P; Tonellato, Peter

    2012-01-01

    Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.

  15. Vision & Strategy: Predictive Ecotoxicology in the 21st Century

    DTIC Science & Technology

    2011-01-01

    their relative abundance or modifications QSARs —Correlation of ecological or toxicological activity with chemical structure to understand or predict...data and collection methods. The dramatic increase in the amount of toxicological data we can collect and analyze is complemented by our improved...diverse disciplines such as biochemistry, ecology, molecular biology, toxicology , bioinformatics, and health and environmental risk assess- ment

  16. A Critical Analysis of Assessment Quality in Genomics and Bioinformatics Education Research

    ERIC Educational Resources Information Center

    Campbell, Chad E.; Nehm, Ross H.

    2013-01-01

    The growing importance of genomics and bioinformatics methods and paradigms in biology has been accompanied by an explosion of new curricula and pedagogies. An important question to ask about these educational innovations is whether they are having a meaningful impact on students' knowledge, attitudes, or skills. Although assessments are…

  17. Designing and benchmarking the MULTICOM protein structure prediction system

    PubMed Central

    2013-01-01

    Background Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor. Results Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction. Conclusions Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:23442819

  18. PEA: an integrated R toolkit for plant epitranscriptome analysis.

    PubMed

    Zhai, Jingjing; Song, Jie; Cheng, Qian; Tang, Yunjia; Ma, Chuang

    2018-05-29

    The epitranscriptome, also known as chemical modifications of RNA (CMRs), is a newly discovered layer of gene regulation, the biological importance of which emerged through analysis of only a small fraction of CMRs detected by high-throughput sequencing technologies. Understanding of the epitranscriptome is hampered by the absence of computational tools for the systematic analysis of epitranscriptome sequencing data. In addition, no tools have yet been designed for accurate prediction of CMRs in plants, or to extend epitranscriptome analysis from a fraction of the transcriptome to its entirety. Here, we introduce PEA, an integrated R toolkit to facilitate the analysis of plant epitranscriptome data. The PEA toolkit contains a comprehensive collection of functions required for read mapping, CMR calling, motif scanning and discovery, and gene functional enrichment analysis. PEA also takes advantage of machine learning technologies for transcriptome-scale CMR prediction, with high prediction accuracy, using the Positive Samples Only Learning algorithm, which addresses the two-class classification problem by using only positive samples (CMRs), in the absence of negative samples (non-CMRs). Hence PEA is a versatile epitranscriptome analysis pipeline covering CMR calling, prediction, and annotation, and we describe its application to predict N6-methyladenosine (m6A) modifications in Arabidopsis thaliana. Experimental results demonstrate that the toolkit achieved 71.6% sensitivity and 73.7% specificity, which is superior to existing m6A predictors. PEA is potentially broadly applicable to the in-depth study of epitranscriptomics. PEA Docker image is available at https://hub.docker.com/r/malab/pea, source codes and user manual are available at https://github.com/cma2015/PEA. chuangma2006@gmail.com. Supplementary data are available at Bioinformatics online.

  19. A Survey of Computational Intelligence Techniques in Protein Function Prediction

    PubMed Central

    Tiwari, Arvind Kumar; Srivastava, Rajeev

    2014-01-01

    During the past, there was a massive growth of knowledge of unknown proteins with the advancement of high throughput microarray technologies. Protein function prediction is the most challenging problem in bioinformatics. In the past, the homology based approaches were used to predict the protein function, but they failed when a new protein was different from the previous one. Therefore, to alleviate the problems associated with homology based traditional approaches, numerous computational intelligence techniques have been proposed in the recent past. This paper presents a state-of-the-art comprehensive review of various computational intelligence techniques for protein function predictions using sequence, structure, protein-protein interaction network, and gene expression data used in wide areas of applications such as prediction of DNA and RNA binding sites, subcellular localization, enzyme functions, signal peptides, catalytic residues, nuclear/G-protein coupled receptors, membrane proteins, and pathway analysis from gene expression datasets. This paper also summarizes the result obtained by many researchers to solve these problems by using computational intelligence techniques with appropriate datasets to improve the prediction performance. The summary shows that ensemble classifiers and integration of multiple heterogeneous data are useful for protein function prediction. PMID:25574395

  20. Computational intelligence techniques in bioinformatics.

    PubMed

    Hassanien, Aboul Ella; Al-Shammari, Eiman Tamah; Ghali, Neveen I

    2013-12-01

    Computational intelligence (CI) is a well-established paradigm with current systems having many of the characteristics of biological computers and capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state-of-the-art in CI applications to bioinformatics and motivate research in new trend-setting directions. In this article, we present an overview of the CI techniques in bioinformatics. We will show how CI techniques including neural networks, restricted Boltzmann machine, deep belief network, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines, could be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function prediction and its structure. We discuss some representative methods to provide inspiring examples to illustrate how CI can be utilized to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented and an extensive bibliography is included. Copyright © 2013 Elsevier Ltd. All rights reserved.

  1. Systematic prediction of control proteins and their DNA binding sites

    PubMed Central

    Sorokin, Valeriy; Severinov, Konstantin; Gelfand, Mikhail S.

    2009-01-01

    We present here the results of a systematic bioinformatics analysis of control (C) proteins, a class of DNA-binding regulators that control time-delayed transcription of their own genes as well as restriction endonuclease genes in many type II restriction-modification systems. More than 290 C protein homologs were identified and DNA-binding sites for ∼70% of new and previously known C proteins were predicted by a combination of phylogenetic footprinting and motif searches in DNA upstream of C protein genes. Additional analysis revealed that a large proportion of C protein genes are translated from leaderless RNA, which may contribute to time-delayed nature of genetic switches operated by these proteins. Analysis of genetic contexts of newly identified C protein genes revealed that they are not exclusively associated with restriction-modification genes; numerous instances of associations with genes originating from mobile genetic elements were observed. These instances might be vestiges of ancient horizontal transfers and indicate that during evolution ancestral restriction-modification system genes were the sites of mobile elements insertions. PMID:19056824

  2. Predicted molecular signaling guiding photoreceptor cell migration following transplantation into damaged retina

    NASA Astrophysics Data System (ADS)

    Unachukwu, Uchenna John; Warren, Alice; Li, Ze; Mishra, Shawn; Zhou, Jing; Sauane, Moira; Lim, Hyungsik; Vazquez, Maribel; Redenti, Stephen

    2016-03-01

    To replace photoreceptors lost to disease or trauma and restore vision, laboratories around the world are investigating photoreceptor replacement strategies using subretinal transplantation of photoreceptor precursor cells (PPCs) and retinal progenitor cells (RPCs). Significant obstacles to advancement of photoreceptor cell-replacement include low migration rates of transplanted cells into host retina and an absence of data describing chemotactic signaling guiding migration of transplanted cells in the damaged retinal microenvironment. To elucidate chemotactic signaling guiding transplanted cell migration, bioinformatics modeling of PPC transplantation into light-damaged retina was performed. The bioinformatics modeling analyzed whole-genome expression data and matched PPC chemotactic cell-surface receptors to cognate ligands expressed in the light-damaged retinal microenvironment. A library of significantly predicted chemotactic ligand-receptor pairs, as well as downstream signaling networks was generated. PPC and RPC migration in microfluidic ligand gradients were analyzed using a highly predicted ligand-receptor pair, SDF-1α - CXCR4, and both PPCs and RPCs exhibited significant chemotaxis. This work present a systems level model and begins to elucidate molecular mechanisms involved in PPC and RPC migration within the damaged retinal microenvironment.

  3. sRNAdb: A small non-coding RNA database for gram-positive bacteria

    PubMed Central

    2012-01-01

    Background The class of small non-coding RNA molecules (sRNA) regulates gene expression by different mechanisms and enables bacteria to mount a physiological response due to adaptation to the environment or infection. Over the last decades the number of sRNAs has been increasing rapidly. Several databases like Rfam or fRNAdb were extended to include sRNAs as a class of its own. Furthermore new specialized databases like sRNAMap (gram-negative bacteria only) and sRNATarBase (target prediction) were established. To the best of the authors’ knowledge no database focusing on sRNAs from gram-positive bacteria is publicly available so far. Description In order to understand sRNA’s functional and phylogenetic relationships we have developed sRNAdb and provide tools for data analysis and visualization. The data compiled in our database is assembled from experiments as well as from bioinformatics analyses. The software enables comparison and visualization of gene loci surrounding the sRNAs of interest. To accomplish this, we use a client–server based approach. Offline versions of the database including analyses and visualization tools can easily be installed locally on the user’s computer. This feature facilitates customized local addition of unpublished sRNA candidates and related information such as promoters or terminators using tab-delimited files. Conclusion sRNAdb allows a user-friendly and comprehensive comparative analysis of sRNAs from available sequenced gram-positive prokaryotic replicons. Offline versions including analysis and visualization tools facilitate complex user specific bioinformatics analyses. PMID:22883983

  4. DWARF – a data warehouse system for analyzing protein families

    PubMed Central

    Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen

    2006-01-01

    Background The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus allows to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from public available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies 103 homologous families. Conclusion DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801

  5. Identification of a novel MIP frameshift mutation associated with congenital cataract in a Chinese family by whole-exome sequencing and functional analysis.

    PubMed

    Long, Xigui; Huang, Yanru; Tan, Hu; Li, Zhuo; Zhang, Rui; Linpeng, Siyuan; Lv, Weigang; Cao, Yingxi; Li, Haoxian; Liang, Desheng; Wu, Lingqian

    2018-04-26

    To detect the underlying pathogenesis of congenital cataract in a four-generation Chinese family. Whole-exome sequencing (WES) of family members (III:4, IV:4, and IV:6) was performed. Sanger sequencing and bioinformatics analysis were subsequently conducted. Full-length WT-MIP or K228fs-MIP fused to HA markers at the N-terminal was transfected into HeLa cells. Next, quantitative real-time PCR, western blotting and immunofluorescence confocal laser scanning were performed. The age of onset for nonsyndromic cataracts in male patients was by 1-year old, earlier than for female patients, who exhibited onset at adulthood. A novel c.682_683delAA (p.K228fs230X) mutation in main intrinsic protein (MIP) cosegregated with the cataract phenotype. The instability index and unfolded states for truncated MIP were predicted to increase by bioinformatics analysis. The mRNA transcription level of K228fs-MIP was reduced compared with that of WT-MIP, and K228fs-MIP protein expression was also lower than that of WT-MIP. Immunofluorescence images showed that WT-MIP principally localized to the plasma membrane, whereas the mutant protein was trapped in the cytoplasm. Our study generated genetic and primary functional evidence for a novel c.682_683delAA mutation in MIP that expands the variant spectrum of MIP and help us better understand the molecular basis of cataract.

  6. KDE Bioscience: platform for bioinformatics analysis workflows.

    PubMed

    Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue

    2006-08-01

    Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards combined with unfriendly interfaces make it difficult for biologists to learn how to use these tools and to translate the data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or 3rd-party viewers are built-in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.

  7. Comprehensive analysis of the N-glycan biosynthetic pathway using bioinformatics to generate UniCorn: A theoretical N-glycan structure database.

    PubMed

    Akune, Yukie; Lin, Chi-Hung; Abrahams, Jodie L; Zhang, Jingyu; Packer, Nicolle H; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P

    2016-08-05

    Glycan structures attached to proteins are comprised of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined, glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures comprised of 15 or less monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. JABAWS 2.2 distributed web services for Bioinformatics: protein disorder, conservation and RNA secondary structure.

    PubMed

    Troshin, Peter V; Procter, James B; Sherstnev, Alexander; Barton, Daniel L; Madeira, Fábio; Barton, Geoffrey J

    2018-06-01

    JABAWS 2.2 is a computational framework that simplifies the deployment of web services for Bioinformatics. In addition to the five multiple sequence alignment (MSA) algorithms in JABAWS 1.0, JABAWS 2.2 includes three additional MSA programs (Clustal Omega, MSAprobs, GLprobs), four protein disorder prediction methods (DisEMBL, IUPred, Ronn, GlobPlot), 18 measures of protein conservation as implemented in AACon, and RNA secondary structure prediction by the RNAalifold program. JABAWS 2.2 can be deployed on a variety of in-house or hosted systems. JABAWS 2.2 web services may be accessed from the Jalview multiple sequence analysis workbench (Version 2.8 and later), as well as directly via the JABAWS command line interface (CLI) client. JABAWS 2.2 can be deployed on a local virtual server as a Virtual Appliance (VA) or simply as a Web Application Archive (WAR) for private use. Improvements in JABAWS 2.2 also include simplified installation and a range of utility tools for usage statistics collection, and web services querying and monitoring. The JABAWS CLI client has been updated to support all the new services and allow integration of JABAWS 2.2 services into conventional scripts. A public JABAWS 2 server has been in production since December 2011 and served over 800 000 analyses for users worldwide. JABAWS 2.2 is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws. g.j.barton@dundee.ac.uk.

  9. Mapping the tumour human leukocyte antigen (HLA) ligandome by mass spectrometry.

    PubMed

    Freudenmann, Lena Katharina; Marcu, Ana; Stevanović, Stefan

    2018-07-01

    The entirety of human leukocyte antigen (HLA)-presented peptides is referred to as the HLA ligandome of a cell or tissue, in tumours often termed immunopeptidome. Mapping the tumour immunopeptidome by mass spectrometry (MS) comprehensively views the pathophysiologically relevant antigenic signature of human malignancies. MS is an unbiased approach stringently filtering the candidates to be tested as opposed to epitope prediction algorithms. In the setting of peptide-specific immunotherapies, MS-based strategies significantly diminish the risk of lacking clinical benefit, as they yield highly enriched amounts of truly presented peptides. Early immunopeptidomic efforts were severely limited by technical sensitivity and manual spectra interpretation. The technological progress with development of orbitrap mass analysers and enhanced chromatographic performance led to vast improvements in mass accuracy, sensitivity, resolution, and speed. Concomitantly, bioinformatic tools were developed to process MS data, integrate sequencing results, and deconvolute multi-allelic datasets. This enabled the immense advancement of tumour immunopeptidomics. Studying the HLA-presented peptide repertoire bears high potential for both answering basic scientific questions and translational application. Mapping the tumour HLA ligandome has started to significantly contribute to target identification for the design of peptide-specific cancer immunotherapies in clinical trials and compassionate need treatments. In contrast to prediction algorithms, rare HLA allotypes and HLA class II can be adequately addressed when choosing MS-guided target identification platforms. Herein, we review the identification of tumour HLA ligands focusing on sources, methods, bioinformatic data analysis, translational application, and provide an outlook on future developments. © 2018 John Wiley & Sons Ltd.

  10. A Python Analytical Pipeline to Identify Prohormone Precursors and Predict Prohormone Cleavage Sites

    PubMed Central

    Southey, Bruce R.; Sweedler, Jonathan V.; Rodriguez-Zas, Sandra L.

    2008-01-01

    Neuropeptides and hormones are signaling molecules that support cell–cell communication in the central nervous system. Experimentally characterizing neuropeptides requires significant efforts because of the complex and variable processing of prohormone precursor proteins into neuropeptides and hormones. We demonstrate the power and flexibility of the Python language to develop components of an bioinformatic analytical pipeline to identify precursors from genomic data and to predict cleavage as these precursors are en route to the final bioactive peptides. We identified 75 precursors in the rhesus genome, predicted cleavage sites using support vector machines and compared the rhesus predictions to putative assignments based on homology to human sequences. The correct classification rate of cleavage using the support vector machines was over 97% for both human and rhesus data sets. The functionality of Python has been important to develop and maintain NeuroPred (http://neuroproteomics.scs.uiuc.edu/neuropred.html), a user-centered web application for the neuroscience community that provides cleavage site prediction from a wide range of models, precision and accuracy statistics, post-translational modifications, and the molecular mass of potential peptides. The combined results illustrate the suitability of the Python language to implement an all-inclusive bioinformatics approach to predict neuropeptides that encompasses a large number of interdependent steps, from scanning genomes for precursor genes to identification of potential bioactive neuropeptides. PMID:19169350

  11. Data mining in bioinformatics using Weka.

    PubMed

    Frank, Eibe; Hall, Mark; Trigg, Len; Holmes, Geoffrey; Witten, Ian H

    2004-10-12

    The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection-common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it. http://www.cs.waikato.ac.nz/ml/weka.

  12. Detecting circular RNAs: bioinformatic and experimental challenges

    PubMed Central

    Szabo, Linda; Salzman, Julia

    2017-01-01

    The pervasive expression of circular RNAs (circRNAs) is a recently discovered feature of gene expression in highly diverged eukaryotes. Numerous algorithms that are used to detect genome-wide circRNA expression from RNA sequencing (RNA-seq) data have been developed in the past few years, but there is little overlap in their predictions and no clear gold-standard method to assess the accuracy of these algorithms. We review sources of experimental and bioinformatic biases that complicate the accurate discovery of circRNAs and discuss statistical approaches to address these biases. We conclude with a discussion of the current experimental progress on the topic. PMID:27739534

  13. MICRA: an automatic pipeline for fast characterization of microbial genomes from high-throughput sequencing data.

    PubMed

    Caboche, Ségolène; Even, Gaël; Loywick, Alexandre; Audebert, Christophe; Hot, David

    2017-12-19

    The increase in available sequence data has advanced the field of microbiology; however, making sense of these data without bioinformatics skills is still problematic. We describe MICRA, an automatic pipeline, available as a web interface, for microbial identification and characterization through reads analysis. MICRA uses iterative mapping against reference genomes to identify genes and variations. Additional modules allow prediction of antibiotic susceptibility and resistance and comparing the results of several samples. MICRA is fast, producing few false-positive annotations and variant calls compared to current methods, making it a tool of great interest for fully exploiting sequencing data.

  14. Identification of unique repeated patterns, location of mutation in DNA finger printing using artificial intelligence technique.

    PubMed

    Mukunthan, B; Nagaveni, N

    2014-01-01

    In genetic engineering, conventional techniques and algorithms employed by forensic scientists to assist in identification of individuals on the basis of their respective DNA profiles involves more complex computational steps and mathematical formulae, also the identification of location of mutation in a genomic sequence in laboratories is still an exigent task. This novel approach provides ability to solve the problems that do not have an algorithmic solution and the available solutions are also too complex to be found. The perfect blend made of bioinformatics and neural networks technique results in efficient DNA pattern analysis algorithm with utmost prediction accuracy.

  15. Novel strategy for protein exploration: high-throughput screening assisted with fuzzy neural network.

    PubMed

    Kato, Ryuji; Nakano, Hideo; Konishi, Hiroyuki; Kato, Katsuya; Koga, Yuchi; Yamane, Tsuneo; Kobayashi, Takeshi; Honda, Hiroyuki

    2005-08-19

    To engineer proteins with desirable characteristics from a naturally occurring protein, high-throughput screening (HTS) combined with directed evolutional approach is the essential technology. However, most HTS techniques are simple positive screenings. The information obtained from the positive candidates is used only as results but rarely as clues for understanding the structural rules, which may explain the protein activity. In here, we have attempted to establish a novel strategy for exploring functional proteins associated with computational analysis. As a model case, we explored lipases with inverted enantioselectivity for a substrate p-nitrophenyl 3-phenylbutyrate from the wild-type lipase of Burkhorderia cepacia KWI-56, which is originally selective for (S)-configuration of the substrate. Data from our previous work on (R)-enantioselective lipase screening were applied to fuzzy neural network (FNN), bioinformatic algorithm, to extract guidelines for screening and engineering processes to be followed. FNN has an advantageous feature of extracting hidden rules that lie between sequences of variants and their enzyme activity to gain high prediction accuracy. Without any prior knowledge, FNN predicted a rule indicating that "size at position L167," among four positions (L17, F119, L167, and L266) in the substrate binding core region, is the most influential factor for obtaining lipase with inverted (R)-enantioselectivity. Based on the guidelines obtained, newly engineered novel variants, which were not found in the actual screening, were experimentally proven to gain high (R)-enantioselectivity by engineering the size at position L167. We also designed and assayed two novel variants, namely FIGV (L17F, F119I, L167G, and L266V) and FFGI (L17F, L167G, and L266I), which were compatible with the guideline obtained from FNN analysis, and confirmed that these designed lipases could acquire high inverted enantioselectivity. The results have shown that with the aid of bioinformatic analysis, high-throughput screening can expand its potential for exploring vast combinatorial sequence spaces of proteins.

  16. Phylogenetic and Structural Analysis of the Pluripotency Factor Sex-Determining Region Y box2 Gene of Camelus dromedarius (cSox2).

    PubMed

    Alawad, Abdullah; Alharbi, Sultan; Alhazzaa, Othman; Alagrafi, Faisal; Alkhrayef, Mohammed; Alhamdan, Ziyad; Alenazi, Abdullah; Al-Johi, Hasan; Alanazi, Ibrahim O; Hammad, Mohamed

    2016-01-01

    Although the sequencing information of Sox2 cDNA for many mammalian is available, the Sox2 cDNA of Camelus dromedaries has not yet been characterized. The objective of this study was to sequence and characterize Sox2 cDNA from the brain of C. dromedarius (also known as Arabian camel). A full coding sequence of the Sox2 gene from the brain of C. dromedarius was amplified by reverse transcription PCRjmc and then sequenced using the 3730XL series platform Sequencer (Applied Biosystem) for the first time. The cDNA sequence displayed an open reading frame of 822 nucleotides, encoding a protein of 273 amino acids. The molecular weight and the isoelectric point of the translated protein were calculated as 29.825 kDa and 10.11, respectively, using bioinformatics analysis. The predicted cSox2 protein sequence exhibited high identity: 99% for Homo sapiens, Mus musculus, Bos taurus, and Vicugna pacos; 98% for Sus scrofa and 93% for Camelus ferus. A 3D structure was built based on the available crystal structure of the HMG-box domain of human stem cell transcription factor Sox2 (PDB: 2 LE4) with 81 residues and predicting bioinformatics software for 273 amino acid residues. The comparison confirms the presence of the HMG-box domain in the cSox2 protein. The orthologous phylogenetic analysis showed that the Sox2 isoform from C. dromedarius was grouped with humans, alpacas, cattle, and pigs. We believe that this genetic and structural information will be a helpful source for the annotation. Furthermore, Sox2 is one of the transcription factors that contributes to the generation-induced pluripotent stem cells (iPSCs), which in turn will probably help generate camel induced pluripotent stem cells (CiPSCs).

  17. Bioinformatics analysis of ROP8 protein to improve vaccine design against Toxoplasma gondii.

    PubMed

    Foroutan, Masoud; Ghaffarifar, Fatemeh; Sharifi, Zohreh; Dalimi, Abdolhosein; Pirestani, Majid

    2018-04-26

    Rhoptry proteins (ROPs) are involved in the different stages of Toxoplasma gondii (T. gondii) invasion and are also critical for survival within host cells. ROP8 is expressed in the early stages of infection and have a key role in the parasitophorous vacuole (PV) formation. In this paper, we have combined several bioinformatics online servers for immunogenicity prediction of ROP8 protein. In this study, several bioinformatics approaches were used to analyze the different aspects of ROP8 protein, including the physico-chemical properties, transmembrane domain, subcellular localization, secondary and tertiary structure, B and T-cell potential epitopes, and other important characteristics of this protein. The findings showed that ROP8 protein had 60 potential post-translational modification sites. Also, only one transmembrane domain was recognized for this protein. The secondary structure of ROP8 protein comprises 33.04% alpha-helix, 18.26% extended strand, and 48.70% random coil. Moreover, several potential B and T-cell epitopes were identified for ROP8. In addition, the obtained findings from antigenicity and allergenicity evaluation remarked that this protein is immunogenic and non-allergen. Based on the results of Ramachandran plot, 94.8%, 4.1%, and 1.1% of amino acid residues were incorporated in the favored, allowed, and outlier regions, respectively. This paper provides a foundation for further investigations, and laid a theoretical basis for the development of an appropriate vaccine against toxoplasmosis. More studies are needed experimentally using the ROP8 alone or in combination with other antigens in the future. Copyright © 2018 Elsevier B.V. All rights reserved.

  18. Immunoproteomic and bioinformatic approaches to identify secreted Leishmania amazonensis, L. braziliensis, and L. infantum proteins with specific reactivity using canine serum.

    PubMed

    Lima, B S S; Fialho, L C; Pires, S F; Tafuri, W L; Andrade, H M

    2016-06-15

    Leishmania spp have a wide range of hosts, and each host can harbor several Leishmania species. Dogs, for example, are frequently infected by Leishmania infantum, where they constitute its main reservoir, but they also serve as hosts for L. braziliensis and L. amazonensis. Serological tests for antibody detection are valuable tools for diagnosis of L. infantum infection due to the high levels of antibodies induced, unlike what is observed in L. amazonensis and L. braziliensis infections. Likewise, serology-based antigen-detection can be useful as an approach to diagnose any Leishmania species infection using different corporal fluid samples. Immunogenic and secreted proteins constitute powerful targets for diagnostic methods in antigen detection. As such, we performed immunoproteomic (2-DE, western blot and mass spectrometry) and bioinformatic screening to search for reactive and secreted proteins from L. amazonensis, L. braziliensis, and L. infantum. Twenty-eight non-redundant proteins were identified, among which, six were reactive only in L. amazonensis extracts, 10 in L. braziliensis extracts, and seven in L. infantum extracts. After bioinformatic analysis, seven proteins were predicted to be secreted, two of which were reactive only in L. amazonensis extracts (52kDa PDI and the glucose-regulated protein 78), one in L. braziliensis extracts (pyruvate dehydrogenase E1 beta subunit) and three in L. infantum extracts (two conserved hypothetical proteins and elongation factor 1-beta). We propose that proteins can be suitable targets for diagnostic methods based on antigen detection. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Analyses of Brucella Pathogenesis, Host Immunity, and Vaccine Targets using Systems Biology and Bioinformatics

    PubMed Central

    He, Yongqun

    2011-01-01

    Brucella is a Gram-negative, facultative intracellular bacterium that causes zoonotic brucellosis in humans and various animals. Out of 10 classified Brucella species, B. melitensis, B. abortus, B. suis, and B. canis are pathogenic to humans. In the past decade, the mechanisms of Brucella pathogenesis and host immunity have been extensively investigated using the cutting edge systems biology and bioinformatics approaches. This article provides a comprehensive review of the applications of Omics (including genomics, transcriptomics, and proteomics) and bioinformatics technologies for the analysis of Brucella pathogenesis, host immune responses, and vaccine targets. Based on more than 30 sequenced Brucella genomes, comparative genomics is able to identify gene variations among Brucella strains that help to explain host specificity and virulence differences among Brucella species. Diverse transcriptomics and proteomics gene expression studies have been conducted to analyze gene expression profiles of wild type Brucella strains and mutants under different laboratory conditions. High throughput Omics analyses of host responses to infections with virulent or attenuated Brucella strains have been focused on responses by mouse and cattle macrophages, bovine trophoblastic cells, mouse and boar splenocytes, and ram buffy coat. Differential serum responses in humans and rams to Brucella infections have been analyzed using high throughput serum antibody screening technology. The Vaxign reverse vaccinology has been used to predict many Brucella vaccine targets. More than 180 Brucella virulence factors and their gene interaction networks have been identified using advanced literature mining methods. The recent development of community-based Vaccine Ontology and Brucellosis Ontology provides an efficient way for Brucella data integration, exchange, and computer-assisted automated reasoning. PMID:22919594

  20. Analyses of Brucella pathogenesis, host immunity, and vaccine targets using systems biology and bioinformatics.

    PubMed

    He, Yongqun

    2012-01-01

    Brucella is a Gram-negative, facultative intracellular bacterium that causes zoonotic brucellosis in humans and various animals. Out of 10 classified Brucella species, B. melitensis, B. abortus, B. suis, and B. canis are pathogenic to humans. In the past decade, the mechanisms of Brucella pathogenesis and host immunity have been extensively investigated using the cutting edge systems biology and bioinformatics approaches. This article provides a comprehensive review of the applications of Omics (including genomics, transcriptomics, and proteomics) and bioinformatics technologies for the analysis of Brucella pathogenesis, host immune responses, and vaccine targets. Based on more than 30 sequenced Brucella genomes, comparative genomics is able to identify gene variations among Brucella strains that help to explain host specificity and virulence differences among Brucella species. Diverse transcriptomics and proteomics gene expression studies have been conducted to analyze gene expression profiles of wild type Brucella strains and mutants under different laboratory conditions. High throughput Omics analyses of host responses to infections with virulent or attenuated Brucella strains have been focused on responses by mouse and cattle macrophages, bovine trophoblastic cells, mouse and boar splenocytes, and ram buffy coat. Differential serum responses in humans and rams to Brucella infections have been analyzed using high throughput serum antibody screening technology. The Vaxign reverse vaccinology has been used to predict many Brucella vaccine targets. More than 180 Brucella virulence factors and their gene interaction networks have been identified using advanced literature mining methods. The recent development of community-based Vaccine Ontology and Brucellosis Ontology provides an efficient way for Brucella data integration, exchange, and computer-assisted automated reasoning.

  1. Bioinformatic prediction of leader genes in human periodontitis.

    PubMed

    Covani, Ugo; Marconcini, Simone; Giacomelli, Luca; Sivozhelevov, Victor; Barone, Antonio; Nicolini, Claudio

    2008-10-01

    Genes involved in different biologic processes form complex interaction networks. However, only a few have a high number of interactions with the other genes in the network. In previous bioinformatics and experimental studies concerning the T lymphocyte cell cycle, these genes were identified and termed "leader genes." In this work, genes involved in human periodontitis were tentatively identified and ranked according to their number of interactions to obtain a preliminary, broader view of molecular mechanisms of periodontitis and plan targeted experimentation. Genes were identified with interrelated queries of several databases. The interactions among these genes were mapped and given a significance score. The weighted number of links (weighted sum of scores for every interaction in which the given gene is involved) was calculated for each gene. Genes were clustered according to this parameter. The genes in the highest cluster were termed leader genes. Sixty-one genes involved or potentially involved in periodontitis were identified. Only five were identified as leader genes, whereas 12 others were ranked in an immediately lower cluster. For 10 of 17 genes there is evidence of involvement in periodontitis; seven new genes that are potentially involved in this disease were identified. The involvement in periodontitis has been completely established for only two leader genes. We applied a validated bioinformatics algorithm to increase our knowledge of molecular mechanisms of periodontitis. Even with the limitations of this ab initio analysis, this theoretical study can suggest ad hoc experimentation targeted on significant genes and, therefore, simpler than mass-scale molecular genomics. Moreover, the identification of leader genes might suggest new potential risk factors and therapeutic targets.

  2. BioContainers: an open-source and community-driven framework for software standardization.

    PubMed

    da Veiga Leprevost, Felipe; Grüning, Björn A; Alves Aflitos, Saulo; Röst, Hannes L; Uszkoreit, Julian; Barsnes, Harald; Vaudel, Marc; Moreno, Pablo; Gatto, Laurent; Weber, Jonas; Bai, Mingze; Jimenez, Rafael C; Sachsenberg, Timo; Pfeuffer, Julianus; Vera Alvarez, Roberto; Griss, Johannes; Nesvizhskii, Alexey I; Perez-Riverol, Yasset

    2017-08-15

    BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on popular open-source projects Docker and rkt frameworks, that allow software to be installed and executed under an isolated and controlled environment. Also, it provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). The software is freely available at github.com/BioContainers/. yperez@ebi.ac.uk. © The Author(s) 2017. Published by Oxford University Press.

  3. BioContainers: an open-source and community-driven framework for software standardization

    PubMed Central

    da Veiga Leprevost, Felipe; Grüning, Björn A.; Alves Aflitos, Saulo; Röst, Hannes L.; Uszkoreit, Julian; Barsnes, Harald; Vaudel, Marc; Moreno, Pablo; Gatto, Laurent; Weber, Jonas; Bai, Mingze; Jimenez, Rafael C.; Sachsenberg, Timo; Pfeuffer, Julianus; Vera Alvarez, Roberto; Griss, Johannes; Nesvizhskii, Alexey I.; Perez-Riverol, Yasset

    2017-01-01

    Abstract Motivation BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on popular open-source projects Docker and rkt frameworks, that allow software to be installed and executed under an isolated and controlled environment. Also, it provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). Availability and Implementation The software is freely available at github.com/BioContainers/. Contact yperez@ebi.ac.uk PMID:28379341

  4. Intrageneric Primer Design: Bringing Bioinformatics Tools to the Class

    ERIC Educational Resources Information Center

    Lima, Andre O. S.; Garces, Sergio P. S.

    2006-01-01

    Bioinformatics is one of the fastest growing scientific areas over the last decade. It focuses on the use of informatics tools for the organization and analysis of biological data. An example of their importance is the availability nowadays of dozens of software programs for genomic and proteomic studies. Thus, there is a growing field (private…

  5. Open discovery: An integrated live Linux platform of Bioinformatics tools

    PubMed Central

    Vetrivel, Umashankar; Pilla, Kalabharath

    2008-01-01

    Historically, live linux distributions for Bioinformatics have paved way for portability of Bioinformatics workbench in a platform independent manner. Moreover, most of the existing live Linux distributions limit their usage to sequence analysis and basic molecular visualization programs and are devoid of data persistence. Hence, open discovery ‐ a live linux distribution has been developed with the capability to perform complex tasks like molecular modeling, docking and molecular dynamics in a swift manner. Furthermore, it is also equipped with complete sequence analysis environment and is capable of running windows executable programs in Linux environment. Open discovery portrays the advanced customizable configuration of fedora, with data persistency accessible via USB drive or DVD. Availability The Open Discovery is distributed free under Academic Free License (AFL) and can be downloaded from http://www.OpenDiscovery.org.in PMID:19238235

  6. Omics AnalySIs System for PRecision Oncology (OASISPRO): A Web-based Omics Analysis Tool for Clinical Phenotype Prediction.

    PubMed

    Yu, Kun-Hsing; Fitzpatrick, Michael R; Pappas, Luke; Chan, Warren; Kung, Jessica; Snyder, Michael

    2017-09-12

    Precision oncology is an approach that accounts for individual differences to guide cancer management. Omics signatures have been shown to predict clinical traits for cancer patients. However, the vast amount of omics information poses an informatics challenge in systematically identifying patterns associated with health outcomes, and no general-purpose data-mining tool exists for physicians, medical researchers, and citizen scientists without significant training in programming and bioinformatics. To bridge this gap, we built the Omics AnalySIs System for PRecision Oncology (OASISPRO), a web-based system to mine the quantitative omics information from The Cancer Genome Atlas (TCGA). This system effectively visualizes patients' clinical profiles, executes machine-learning algorithms of choice on the omics data, and evaluates the prediction performance using held-out test sets. With this tool, we successfully identified genes strongly associated with tumor stage, and accurately predicted patients' survival outcomes in many cancer types, including mesothelioma and adrenocortical carcinoma. By identifying the links between omics and clinical phenotypes, this system will facilitate omics studies on precision cancer medicine and contribute to establishing personalized cancer treatment plans. This web-based tool is available at http://tinyurl.com/oasispro ;source codes are available at http://tinyurl.com/oasisproSourceCode . © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  7. CBB Portal @ PNNL

    Science.gov Websites

    Search PNNL Home About Research Publications Jobs News Contacts Computational Biology and Bioinformatics , and engineering to transform the data into knowledge. This new quantitative, predictive biology is to empirical modeling and physics-based simulations. CBB research seeks to: Understand. Understanding

  8. Multifunctionality and diversity of GDSL esterase/lipase gene family in rice (Oryza sativa L. japonica) genome: new insights from bioinformatics analysis

    PubMed Central

    2012-01-01

    Background GDSL esterases/lipases are a newly discovered subclass of lipolytic enzymes that are very important and attractive research subjects because of their multifunctional properties, such as broad substrate specificity and regiospecificity. Compared with the current knowledge regarding these enzymes in bacteria, our understanding of the plant GDSL enzymes is very limited, although the GDSL gene family in plant species include numerous members in many fully sequenced plant genomes. Only two genes from a large rice GDSL esterase/lipase gene family were previously characterised, and the majority of the members remain unknown. In the present study, we describe the rice OsGELP (Oryza sativa GDSL esterase/lipase protein) gene family at the genomic and proteomic levels, and use this knowledge to provide insights into the multifunctionality of the rice OsGELP enzymes. Results In this study, an extensive bioinformatics analysis identified 114 genes in the rice OsGELP gene family. A complete overview of this family in rice is presented, including the chromosome locations, gene structures, phylogeny, and protein motifs. Among the OsGELPs and the plant GDSL esterase/lipase proteins of known functions, 41 motifs were found that represent the core secondary structure elements or appear specifically in different phylogenetic subclades. The specification and distribution of identified putative conserved clade-common and -specific peptide motifs, and their location on the predicted protein three dimensional structure may possibly signify their functional roles. Potentially important regions for substrate specificity are highlighted, in accordance with protein three-dimensional model and location of the phylogenetic specific conserved motifs. The differential expression of some representative genes were confirmed by quantitative real-time PCR. The phylogenetic analysis, together with protein motif architectures, and the expression profiling were analysed to predict the possible biological functions of the rice OsGELP genes. Conclusions Our current genomic analysis, for the first time, presents fundamental information on the organization of the rice OsGELP gene family. With combination of the genomic, phylogenetic, microarray expression, protein motif distribution, and protein structure analyses, we were able to create supported basis for the functional prediction of many members in the rice GDSL esterase/lipase family. The present study provides a platform for the selection of candidate genes for further detailed functional study. PMID:22793791

  9. Systematic bioinformatics and experimental validation of yeast complexes reduces the rate of attrition during structural investigations.

    PubMed

    Brooks, Mark A; Gewartowski, Kamil; Mitsiki, Eirini; Létoquart, Juliette; Pache, Roland A; Billier, Ysaline; Bertero, Michela; Corréa, Margot; Czarnocki-Cieciura, Mariusz; Dadlez, Michal; Henriot, Véronique; Lazar, Noureddine; Delbos, Lila; Lebert, Dorothée; Piwowarski, Jan; Rochaix, Pascal; Böttcher, Bettina; Serrano, Luis; Séraphin, Bertrand; van Tilbeurgh, Herman; Aloy, Patrick; Perrakis, Anastassis; Dziembowski, Andrzej

    2010-09-08

    For high-throughput structural studies of protein complexes of composition inferred from proteomics data, it is crucial that candidate complexes are selected accurately. Herein, we exemplify a procedure that combines a bioinformatics tool for complex selection with in vivo validation, to deliver structural results in a medium-throughout manner. We have selected a set of 20 yeast complexes, which were predicted to be feasible by either an automated bioinformatics algorithm, by manual inspection of primary data, or by literature searches. These complexes were validated with two straightforward and efficient biochemical assays, and heterologous expression technologies of complex components were then used to produce the complexes to assess their feasibility experimentally. Approximately one-half of the selected complexes were useful for structural studies, and we detail one particular success story. Our results underscore the importance of accurate target selection and validation in avoiding transient, unstable, or simply nonexistent complexes from the outset. Copyright © 2010 Elsevier Ltd. All rights reserved.

  10. iEnhancer-EL: Identifying enhancers and their strength with ensemble learning approach.

    PubMed

    Liu, Bin; Li, Kai; Huang, De-Shuang; Chou, Kuo-Chen

    2018-06-07

    Identification of enhancers and their strength is important because they play a critical role in controlling gene expression. Although some bioinformatics tools were developed, they are limited in discriminating enhancers from non-enhancers only. Recently, a two-layer predictor called "iEnhancer-2L" was developed that can be used to predict the enhancer's strength as well. However, its prediction quality needs further improvement to enhance the practical application value. A new predictor called "iEnhancer-EL" was proposed that contains two layer predictors: the first one (for identifying enhancers) is formed by fusing an array of six key individual classifiers, and the second one (for their strength) formed by fusing an array of ten key individual classifiers. All these key classifiers were selected from 171 elementary classifiers formed by SVM (Support Vector Machine) based on kmer, subsequence profile, and PseKNC (Pseudo K-tuple Nucleotide Composition), respectively. Rigorous cross-validations have indicated that the proposed predictor is remarkably superior to the existing state-of-the-art one in this area. A web server for the iEnhancer-EL has been established at http://bioinformatics.hitsz.edu.cn/iEnhancer-EL/, by which users can easily get their desired results without the need to go through the mathematical details. bliu@hit.edu.cn, dshuang@tongji.edu.cn or kcchou@gordonlifescience.org. Supplementary data are available at Bioinformatics online.

  11. Pancreas Cancer Precision Treatment Using Avatar Mice from a Bioinformatics Perspective.

    PubMed

    Perales-Patón, Javier; Piñeiro-Yañez, Elena; Tejero, Héctor; López-Casas, Pedro P; Hidalgo, Manuel; Gómez-López, Gonzalo; Al-Shahrour, Fátima

    2017-01-01

    Pancreatic ductal adenocarcinoma (PDAC) is a leading cause of cancer-related death among solid malignancies. Unfortunately, PDAC lethality has not substantially decreased over the past 20 years. This aggressiveness is related to the genomic complexity and heterogeneity of PDAC, but also to the absence of an effective screening for the detection of early-stage tumors and a lack of efficient therapeutic options. Therefore, there is an urgent need to improve the arsenal of anti-PDAC drugs for an effective treatment of these patients. Patient-derived xenograft (PDX) mouse models represent a promising strategy to personalize PDAC treatment, offering a bench testing of candidate treatments and helping to select empirical treatments in PDAC patients with no therapeutic targets. Moreover, bioinformatics-based approaches have the potential to offer systematic insights into PDAC etiology predicting putatively actionable tumor-specific genomic alterations, identifying novel biomarkers and generating disease-associated gene expression signatures. This review focuses on recent efforts to individualize PDAC treatments using PDX models. Additionally, we discuss the current understanding of the PDAC genomic landscape and the putative druggable targets derived from mutational studies. PDAC molecular subclassifications and gene expression profiling studies are reviewed as well. Finally, latest bioinformatics methodologies based on somatic variant detection and prioritization, in silico drug response prediction, and drug repositioning to improve the treatment of advanced PDAC tumors are also covered. © 2017 S. Karger AG, Basel.

  12. In Silico Prediction Analysis of Idiotope-Driven T–B Cell Collaboration in Multiple Sclerosis

    PubMed Central

    Høglund, Rune A.; Lossius, Andreas; Johansen, Jorunn N.; Homan, Jane; Benth, Jūratė Šaltytė; Robins, Harlan; Bogen, Bjarne; Bremel, Robert D.; Holmøy, Trygve

    2017-01-01

    Memory B cells acting as antigen-presenting cells are believed to be important in multiple sclerosis (MS), but the antigen they present remains unknown. We hypothesized that B cells may activate CD4+ T cells in the central nervous system of MS patients by presenting idiotopes from their own immunoglobulin variable regions on human leukocyte antigen (HLA) class II molecules. Here, we use bioinformatics prediction analysis of B cell immunoglobulin variable regions from 11 MS patients and 6 controls with other inflammatory neurological disorders (OINDs), to assess whether the prerequisites for such idiotope-driven T–B cell collaboration are present. Our findings indicate that idiotopes from the complementarity determining region (CDR) 3 of MS patients on average have high predicted affinities for disease associated HLA-DRB1*15:01 molecules and are predicted to be endosomally processed by cathepsin S and L in positions that allows such HLA binding to occur. Additionally, complementarity determining region 3 sequences from cerebrospinal fluid (CSF) B cells from MS patients contain on average more rare T cell-exposed motifs that could potentially escape tolerance and stimulate CD4+ T cells than CSF B cells from OIND patients. Many of these features were associated with preferential use of the IGHV4 gene family by CSF B cells from MS patients. This is the first study to combine high-throughput sequencing of patient immune repertoires with large-scale prediction analysis and provides key indicators for future in vitro and in vivo analyses. PMID:29038659

  13. In Silico Prediction Analysis of Idiotope-Driven T-B Cell Collaboration in Multiple Sclerosis.

    PubMed

    Høglund, Rune A; Lossius, Andreas; Johansen, Jorunn N; Homan, Jane; Benth, Jūratė Šaltytė; Robins, Harlan; Bogen, Bjarne; Bremel, Robert D; Holmøy, Trygve

    2017-01-01

    Memory B cells acting as antigen-presenting cells are believed to be important in multiple sclerosis (MS), but the antigen they present remains unknown. We hypothesized that B cells may activate CD4 + T cells in the central nervous system of MS patients by presenting idiotopes from their own immunoglobulin variable regions on human leukocyte antigen (HLA) class II molecules. Here, we use bioinformatics prediction analysis of B cell immunoglobulin variable regions from 11 MS patients and 6 controls with other inflammatory neurological disorders (OINDs), to assess whether the prerequisites for such idiotope-driven T-B cell collaboration are present. Our findings indicate that idiotopes from the complementarity determining region (CDR) 3 of MS patients on average have high predicted affinities for disease associated HLA-DRB1*15:01 molecules and are predicted to be endosomally processed by cathepsin S and L in positions that allows such HLA binding to occur. Additionally, complementarity determining region 3 sequences from cerebrospinal fluid (CSF) B cells from MS patients contain on average more rare T cell-exposed motifs that could potentially escape tolerance and stimulate CD4 + T cells than CSF B cells from OIND patients. Many of these features were associated with preferential use of the IGHV4 gene family by CSF B cells from MS patients. This is the first study to combine high-throughput sequencing of patient immune repertoires with large-scale prediction analysis and provides key indicators for future in vitro and in vivo analyses.

  14. RImmPort: an R/Bioconductor package that enables ready-for-analysis immunology research data.

    PubMed

    Shankar, Ravi D; Bhattacharya, Sanchita; Jujjavarapu, Chethan; Andorf, Sandra; Wiser, Jeffery A; Butte, Atul J

    2017-04-01

    : Open access to raw clinical and molecular data related to immunological studies has created a tremendous opportunity for data-driven science. We have developed RImmPort that prepares NIAID-funded research study datasets in ImmPort (immport.org) for analysis in R. RImmPort comprises of three main components: (i) a specification of R classes that encapsulate study data, (ii) foundational methods to load data of a specific study and (iii) generic methods to slice and dice data across different dimensions in one or more studies. Furthermore, RImmPort supports open formalisms, such as CDISC standards on the open source bioinformatics platform Bioconductor, to ensure that ImmPort curated study datasets are seamlessly accessible and ready for analysis, thus enabling innovative bioinformatics research in immunology. RImmPort is available as part of Bioconductor (bioconductor.org/packages/RImmPort). rshankar@stanford.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  15. Mapping the Extracellular and Membrane Proteome Associated with the Vasculature and the Stroma in the Embryo*

    PubMed Central

    Soulet, Fabienne; Kilarski, Witold W.; Roux-Dalvai, Florence; Herbert, John M. J.; Sacewicz, Izabela; Mouton-Barbosa, Emmanuelle; Bicknell, Roy; Lalor, Patricia; Monsarrat, Bernard; Bikfalvi, Andreas

    2013-01-01

    In order to map the extracellular or membrane proteome associated with the vasculature and the stroma in an embryonic organism in vivo, we developed a biotinylation technique for chicken embryo and combined it with mass spectrometry and bioinformatic analysis. We also applied this procedure to implanted tumors growing on the chorioallantoic membrane or after the induction of granulation tissue. Membrane and extracellular matrix proteins were the most abundant components identified. Relative quantitative analysis revealed differential protein expression patterns in several tissues. Through a bioinformatic approach, we determined endothelial cell protein expression signatures, which allowed us to identify several proteins not yet reported to be associated with endothelial cells or the vasculature. This is the first study reported so far that applies in vivo biotinylation, in combination with robust label-free quantitative proteomics approaches and bioinformatic analysis, to an embryonic organism. It also provides the first description of the vascular and matrix proteome of the embryo that might constitute the starting point for further developments. PMID:23674615

  16. Bioinformatics analysis and construction of phylogenetic tree of aquaporins from Echinococcus granulosus.

    PubMed

    Wang, Fen; Ye, Bin

    2016-09-01

    Cyst echinococcosis caused by the matacestodal larvae of Echinococcus granulosus (Eg), is a chronic, worldwide, and severe zoonotic parasitosis. The treatment of cyst echinococcosis is still difficult since surgery cannot fit the needs of all patients, and drugs can lead to serious adverse events as well as resistance. The screen of target proteins interacted with new anti-hydatidosis drugs is urgently needed to meet the prevailing challenges. Here, we analyzed the sequences and structure properties, and constructed a phylogenetic tree by bioinformatics methods. The MIP family signature and Protein kinase C phosphorylation sites were predicted in all nine EgAQPs. α-helix and random coil were the main secondary structures of EgAQPs. The numbers of transmembrane regions were three to six, which indicated that EgAQPs contained multiple hydrophobic regions. A neighbor-joining tree indicated that EgAQPs were divided into two branches, seven EgAQPs formed a clade with AQP1 from human, a "strict" aquaporins, other two EgAQPs formed a clade with AQP9 from human, an aquaglyceroporins. Unfortunately, homology modeling of EgAQPs was aborted. These results provide a foundation for understanding and researches of the biological function of E. granulosus.

  17. SBMLmod: a Python-based web application and web service for efficient data integration and model simulation.

    PubMed

    Schäuble, Sascha; Stavrum, Anne-Kristin; Bockwoldt, Mathias; Puntervoll, Pål; Heiland, Ines

    2017-06-24

    Systems Biology Markup Language (SBML) is the standard model representation and description language in systems biology. Enriching and analysing systems biology models by integrating the multitude of available data, increases the predictive power of these models. This may be a daunting task, which commonly requires bioinformatic competence and scripting. We present SBMLmod, a Python-based web application and service, that automates integration of high throughput data into SBML models. Subsequent steady state analysis is readily accessible via the web service COPASIWS. We illustrate the utility of SBMLmod by integrating gene expression data from different healthy tissues as well as from a cancer dataset into a previously published model of mammalian tryptophan metabolism. SBMLmod is a user-friendly platform for model modification and simulation. The web application is available at http://sbmlmod.uit.no , whereas the WSDL definition file for the web service is accessible via http://sbmlmod.uit.no/SBMLmod.wsdl . Furthermore, the entire package can be downloaded from https://github.com/MolecularBioinformatics/sbml-mod-ws . We envision that SBMLmod will make automated model modification and simulation available to a broader research community.

  18. Navigating through the Jungle of Allergens: Features and Applications of Allergen Databases.

    PubMed

    Radauer, Christian

    2017-01-01

    The increasing number of available data on allergenic proteins demanded the establishment of structured, freely accessible allergen databases. In this review article, features and applications of 6 of the most widely used allergen databases are discussed. The WHO/IUIS Allergen Nomenclature Database is the official resource of allergen designations. Allergome is the most comprehensive collection of data on allergens and allergen sources. AllergenOnline is aimed at providing a peer-reviewed database of allergen sequences for prediction of allergenicity of proteins, such as those planned to be inserted into genetically modified crops. The Structural Database of Allergenic Proteins (SDAP) provides a database of allergen sequences, structures, and epitopes linked to bioinformatics tools for sequence analysis and comparison. The Immune Epitope Database (IEDB) is the largest repository of T-cell, B-cell, and major histocompatibility complex protein epitopes including epitopes of allergens. AllFam classifies allergens into families of evolutionarily related proteins using definitions from the Pfam protein family database. These databases contain mostly overlapping data, but also show differences in terms of their targeted users, the criteria for including allergens, data shown for each allergen, and the availability of bioinformatics tools. © 2017 S. Karger AG, Basel.

  19. Plant metabolic modeling: achieving new insight into metabolism and metabolic engineering.

    PubMed

    Baghalian, Kambiz; Hajirezaei, Mohammad-Reza; Schreiber, Falk

    2014-10-01

    Models are used to represent aspects of the real world for specific purposes, and mathematical models have opened up new approaches in studying the behavior and complexity of biological systems. However, modeling is often time-consuming and requires significant computational resources for data development, data analysis, and simulation. Computational modeling has been successfully applied as an aid for metabolic engineering in microorganisms. But such model-based approaches have only recently been extended to plant metabolic engineering, mainly due to greater pathway complexity in plants and their highly compartmentalized cellular structure. Recent progress in plant systems biology and bioinformatics has begun to disentangle this complexity and facilitate the creation of efficient plant metabolic models. This review highlights several aspects of plant metabolic modeling in the context of understanding, predicting and modifying complex plant metabolism. We discuss opportunities for engineering photosynthetic carbon metabolism, sucrose synthesis, and the tricarboxylic acid cycle in leaves and oil synthesis in seeds and the application of metabolic modeling to the study of plant acclimation to the environment. The aim of the review is to offer a current perspective for plant biologists without requiring specialized knowledge of bioinformatics or systems biology. © 2014 American Society of Plant Biologists. All rights reserved.

  20. Plant Metabolic Modeling: Achieving New Insight into Metabolism and Metabolic Engineering

    PubMed Central

    Baghalian, Kambiz; Hajirezaei, Mohammad-Reza; Schreiber, Falk

    2014-01-01

    Models are used to represent aspects of the real world for specific purposes, and mathematical models have opened up new approaches in studying the behavior and complexity of biological systems. However, modeling is often time-consuming and requires significant computational resources for data development, data analysis, and simulation. Computational modeling has been successfully applied as an aid for metabolic engineering in microorganisms. But such model-based approaches have only recently been extended to plant metabolic engineering, mainly due to greater pathway complexity in plants and their highly compartmentalized cellular structure. Recent progress in plant systems biology and bioinformatics has begun to disentangle this complexity and facilitate the creation of efficient plant metabolic models. This review highlights several aspects of plant metabolic modeling in the context of understanding, predicting and modifying complex plant metabolism. We discuss opportunities for engineering photosynthetic carbon metabolism, sucrose synthesis, and the tricarboxylic acid cycle in leaves and oil synthesis in seeds and the application of metabolic modeling to the study of plant acclimation to the environment. The aim of the review is to offer a current perspective for plant biologists without requiring specialized knowledge of bioinformatics or systems biology. PMID:25344492

  1. Virtual Interactomics of Proteins from Biochemical Standpoint

    PubMed Central

    Kubrycht, Jaroslav; Sigler, Karel; Souček, Pavel

    2012-01-01

    Virtual interactomics represents a rapidly developing scientific area on the boundary line of bioinformatics and interactomics. Protein-related virtual interactomics then comprises instrumental tools for prediction, simulation, and networking of the majority of interactions important for structural and individual reproduction, differentiation, recognition, signaling, regulation, and metabolic pathways of cells and organisms. Here, we describe the main areas of virtual protein interactomics, that is, structurally based comparative analysis and prediction of functionally important interacting sites, mimotope-assisted and combined epitope prediction, molecular (protein) docking studies, and investigation of protein interaction networks. Detailed information about some interesting methodological approaches and online accessible programs or databases is displayed in our tables. Considerable part of the text deals with the searches for common conserved or functionally convergent protein regions and subgraphs of conserved interaction networks, new outstanding trends and clinically interesting results. In agreement with the presented data and relationships, virtual interactomic tools improve our scientific knowledge, help us to formulate working hypotheses, and they frequently also mediate variously important in silico simulations. PMID:22928109

  2. MINS2: revisiting the molecular code for transmembrane-helix recognition by the Sec61 translocon.

    PubMed

    Park, Yungki; Helms, Volkhard

    2008-08-15

    To be fully functional, membrane proteins should not only fold, but also get inserted into the membrane, which is mediated by the Sec61 translocon. Recent experimental studies have attempted to elucidate how the Sec61 translocon accomplishes this delicate task by measuring the translocon-mediated membrane insertion free energies of 357 systematically designed peptides. On the basis of this data set, we have developed MINS2, a novel sequence-based computational method for predicting the membrane insertion free energies of protein sequences. A benchmark analysis of MINS2 shows that MINS2 signi.cantly outperforms previously proposed methods. Importantly, the application of MINS2 to known membrane protein structures shows that a better prediction of membrane insertion free energies does not lead to a better prediction of transmembrane segments of polytopic membrane proteins. A web server for MINS2 is publicly available at http://service.bioinformatik.uni-saarland.de/mins. Supplementary data are available at Bioinformatics online.

  3. Computational Analysis of Uncharacterized Proteins of Environmental Bacterial Genome

    NASA Astrophysics Data System (ADS)

    Coxe, K. J.; Kumar, M.

    2017-12-01

    Betaproteobacteria strain CB is a gram-negative bacterium in the phylum Proteobacteria and are found naturally in soil and water. In this complex environment, bacteria play a key role in efficiently eliminating the organic material and other pollutants from wastewater. To investigate the process of pollutant removal from wastewater using bacteria, it is important to characterize the proteins encoded by the bacterial genome. Our study combines a number of bioinformatics tools to predict the function of unassigned proteins in the bacterial genome. The genome of Betaproteobacteria strain CB contains 2,112 proteins in which function of 508 proteins are unknown, termed as uncharacterized proteins (UPs). The localization of the UPs with in the cell was determined and the structure of 38 UPs was accurately predicted. These UPs were predicted to belong to various classes of proteins such as enzymes, transporters, binding proteins, signal peptides, transmembrane proteins and other proteins. The outcome of this work will help better understand wastewater treatment mechanism.

  4. The Predicted Secretome of the Plant Pathogenic Fungus Fusarium graminearum: A Refined Comparative Analysis

    PubMed Central

    Brown, Neil A.; Antoniw, John; Hammond-Kosack, Kim E.

    2012-01-01

    The fungus Fusarium graminearum forms an intimate association with the host species wheat whilst infecting the floral tissues at anthesis. During the prolonged latent period of infection, extracellular communication between live pathogen and host cells must occur, implying a role for secreted fungal proteins. The wheat cells in contact with fungal hyphae subsequently die and intracellular hyphal colonisation results in the development of visible disease symptoms. Since the original genome annotation analysis was done in 2007, which predicted the secretome using TargetP, the F. graminearum gene call has changed considerably through the combined efforts of the BROAD and MIPS institutes. As a result of the modifications to the genome and the recent findings that suggested a role for secreted proteins in virulence, the F. graminearum secretome was revisited. In the current study, a refined F. graminearum secretome was predicted by combining several bioinformatic approaches. This strategy increased the probability of identifying truly secreted proteins. A secretome of 574 proteins was predicted of which 99% was supported by transcriptional evidence. The function of the annotated and unannotated secreted proteins was explored. The potential role(s) of the annotated proteins including, putative enzymes, phytotoxins and antifungals are discussed. Characterisation of the unannotated proteins included the analysis of Pfam domains and features associated with known fungal effectors, for example, small size, cysteine-rich and containing internal amino acid repeats. A comprehensive comparative genomic analysis involving 57 fungal and oomycete genomes revealed that only a small number of the predicted F. graminearum secreted proteins can be considered to be either species or sequenced strain specific. PMID:22493673

  5. Label-free quantitative proteomic analysis of human plasma-derived microvesicles to find protein signatures of abdominal aortic aneurysms.

    PubMed

    Martinez-Pinna, Roxana; Gonzalez de Peredo, Anne; Monsarrat, Bernard; Burlet-Schiltz, Odile; Martin-Ventura, Jose Luis

    2014-08-01

    To find potential biomarkers of abdominal aortic aneurysms (AAA), we performed a differential proteomic study based on human plasma-derived microvesicles. Exosomes and microparticles isolated from plasma of AAA patients and control subjects (n = 10 each group) were analyzed by a label-free quantitative MS-based strategy. Homemade and publicly available software packages have been used for MS data analysis. The application of two kinds of bioinformatic tools allowed us to find differential protein profiles from AAA patients. Some of these proteins found by the two analysis methods belong to main pathological mechanisms of AAA such as oxidative stress, immune-inflammation, and thrombosis. Data analysis from label-free MS-based experiments requires the use of sophisticated bioinformatic approaches to perform quantitative studies from complex protein mixtures. The application of two of these bioinformatic tools provided us a preliminary list of differential proteins found in plasma-derived microvesicles not previously associated to AAA, which could help us to understand the pathological mechanisms related to this disease. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Relax with CouchDB--into the non-relational DBMS era of bioinformatics.

    PubMed

    Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R

    2012-07-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. Copyright © 2012 Elsevier Inc. All rights reserved.

  7. Bioinformatics analysis and detection of gelatinase encoded gene in Lysinibacillussphaericus

    NASA Astrophysics Data System (ADS)

    Repin, Rul Aisyah Mat; Mutalib, Sahilah Abdul; Shahimi, Safiyyah; Khalid, Rozida Mohd.; Ayob, Mohd. Khan; Bakar, Mohd. Faizal Abu; Isa, Mohd Noor Mat

    2016-11-01

    In this study, we performed bioinformatics analysis toward genome sequence of Lysinibacillussphaericus (L. sphaericus) to determine gene encoded for gelatinase. L. sphaericus was isolated from soil and gelatinase species-specific bacterium to porcine and bovine gelatin. This bacterium offers the possibility of enzymes production which is specific to both species of meat, respectively. The main focus of this research is to identify the gelatinase encoded gene within the bacteria of L. Sphaericus using bioinformatics analysis of partially sequence genome. From the research study, three candidate gene were identified which was, gelatinase candidate gene 1 (P1), NODE_71_length_93919_cov_158.931839_21 which containing 1563 base pair (bp) in size with 520 amino acids sequence; Secondly, gelatinase candidate gene 2 (P2), NODE_23_length_52851_cov_190.061386_17 which containing 1776 bp in size with 591 amino acids sequence; and Thirdly, gelatinase candidate gene 3 (P3), NODE_106_length_32943_cov_169.147919_8 containing 1701 bp in size with 566 amino acids sequence. Three pairs of oligonucleotide primers were designed and namely as, F1, R1, F2, R2, F3 and R3 were targeted short sequences of cDNA by PCR. The amplicons were reliably results in 1563 bp in size for candidate gene P1 and 1701 bp in size for candidate gene P3. Therefore, the results of bioinformatics analysis of L. Sphaericus resulting in gene encoded gelatinase were identified.

  8. Clustering and Network Analysis of Reverse Phase Protein Array Data.

    PubMed

    Byron, Adam

    2017-01-01

    Molecular profiling of proteins and phosphoproteins using a reverse phase protein array (RPPA) platform, with a panel of target-specific antibodies, enables the parallel, quantitative proteomic analysis of many biological samples in a microarray format. Hence, RPPA analysis can generate a high volume of multidimensional data that must be effectively interrogated and interpreted. A range of computational techniques for data mining can be applied to detect and explore data structure and to form functional predictions from large datasets. Here, two approaches for the computational analysis of RPPA data are detailed: the identification of similar patterns of protein expression by hierarchical cluster analysis and the modeling of protein interactions and signaling relationships by network analysis. The protocols use freely available, cross-platform software, are easy to implement, and do not require any programming expertise. Serving as data-driven starting points for further in-depth analysis, validation, and biological experimentation, these and related bioinformatic approaches can accelerate the functional interpretation of RPPA data.

  9. In silico study of breast cancer associated gene 3 using LION Target Engine and other tools.

    PubMed

    León, Darryl A; Cànaves, Jaume M

    2003-12-01

    Sequence analysis of individual targets is an important step in annotation and validation. As a test case, we investigated human breast cancer associated gene 3 (BCA3) with LION Target Engine and with other bioinformatics tools. LION Target Engine confirmed that the BCA3 gene is located on 11p15.4 and that the two most likely splice variants (lacking exon 3 and exons 3 and 5, respectively) exist. Based on our manual curation of sequence data, it is proposed that an additional variant (missing only exon 5) published in a public sequence repository, is a prediction artifact. A significant number of new orthologs were also identified, and these were the basis for a high-quality protein secondary structure prediction. Moreover, our research confirmed several distinct functional domains as described in earlier reports. Sequence conservation from multiple sequence alignments, splice variant identification, secondary structure predictions, and predicted phosphorylation sites suggest that the removal of interaction sites through alternative splicing might play a modulatory role in BCA3. This in silico approach shows the depth and relevance of an analysis that can be accomplished by including a variety of publicly available tools with an integrated and customizable life science informatics platform.

  10. Proteomics, bioinformatics and targeted gene expression analysis reveals up-regulation of cochlin and identifies other potential biomarkers in the mouse model for deafness in usher syndrome type 1F

    PubMed Central

    Chance, Mark R.; Chang, Jinsook; Liu, Shuqing; Gokulrangan, Giridharan; Chen, Daniel H.-C.; Lindsay, Aaron; Geng, Ruishuang; Zheng, Qing Y.; Alagramam, Kumar

    2010-01-01

    Proteins and protein networks associated with cochlear pathogenesis in the Ames waltzer (av) mouse, a model for deafness in Usher syndrome 1F (USH1F), were identified. Cochlear protein from wild-type and av mice at postnatal day 30, a time point in which cochlear pathology is well established, was analyzed by quantitative 2D gel electrophoresis followed by mass spectrometry (MS). The analytic gel resolved 2270 spots; 69 spots showed significant changes in intensity in the av cochlea compared with the control. The cochlin protein was identified in 20 peptide spots, most of which were up-regulated, while a few were down-regulated. Analysis of MS sequence data showed that, in the av cochlea, a set of full-length isoforms of cochlin was up-regulated, while isoforms missing the N-terminal FCH/LCCL domain were down-regulated. Protein interaction network analysis of all differentially expressed proteins was performed with Metacore software. That analysis revealed a number of statistically significant candidate protein networks predicted to be altered in the affected cochlea. Quantitative PCR (qPCR) analysis of select candidates from the proteomic and bioinformatic investigations showed up-regulation of Coch mRNA and those of p53, Brn3a and Nrf2, transcription factors linked to stress response and survival. Increased mRNA of Brn3a and Nrf2 has previously been associated with increased expression of cochlin in human glaucomatous trabecular meshwork. Our report strongly suggests that increased level of cochlin is an important etiologic factor leading to the degeneration of cochlear neuroepithelia in the USH1F model. PMID:20097680

  11. SeWeR: a customizable and integrated dynamic HTML interface to bioinformatics services.

    PubMed

    Basu, M K

    2001-06-01

    Sequence analysis using Web Resources (SeWeR) is an integrated, Dynamic HTML (DHTML) interface to commonly used bioinformatics services available on the World Wide Web. It is highly customizable, extendable, platform neutral, completely server-independent and can be hosted as a web page as well as being used as stand-alone software running within a web browser.

  12. Supercomputing with toys: harnessing the power of NVIDIA 8800GTX and playstation 3 for bioinformatics problem.

    PubMed

    Wilson, Justin; Dai, Manhong; Jakupovic, Elvis; Watson, Stanley; Meng, Fan

    2007-01-01

    Modern video cards and game consoles typically have much better performance to price ratios than that of general purpose CPUs. The parallel processing capabilities of game hardware are well-suited for high throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.

  13. VLSI Microsystem for Rapid Bioinformatic Pattern Recognition

    NASA Technical Reports Server (NTRS)

    Fang, Wai-Chi; Lue, Jaw-Chyng

    2009-01-01

    A system comprising very-large-scale integrated (VLSI) circuits is being developed as a means of bioinformatics-oriented analysis and recognition of patterns of fluorescence generated in a microarray in an advanced, highly miniaturized, portable genetic-expression-assay instrument. Such an instrument implements an on-chip combination of polymerase chain reactions and electrochemical transduction for amplification and detection of deoxyribonucleic acid (DNA).

  14. Bioinformatic pipelines in Python with Leaf

    PubMed Central

    2013-01-01

    Background An incremental, loosely planned development approach is often used in bioinformatic studies when dealing with custom data analysis in a rapidly changing environment. Unfortunately, the lack of a rigorous software structuring can undermine the maintainability, communicability and replicability of the process. To ameliorate this problem we propose the Leaf system, the aim of which is to seamlessly introduce the pipeline formality on top of a dynamical development process with minimum overhead for the programmer, thus providing a simple layer of software structuring. Results Leaf includes a formal language for the definition of pipelines with code that can be transparently inserted into the user’s Python code. Its syntax is designed to visually highlight dependencies in the pipeline structure it defines. While encouraging the developer to think in terms of bioinformatic pipelines, Leaf supports a number of automated features including data and session persistence, consistency checks between steps of the analysis, processing optimization and publication of the analytic protocol in the form of a hypertext. Conclusions Leaf offers a powerful balance between plan-driven and change-driven development environments in the design, management and communication of bioinformatic pipelines. Its unique features make it a valuable alternative to other related tools. PMID:23786315

  15. Shotgun proteomic analysis of Emiliania huxleyi, a marine phytoplankton species of major biogeochemical importance.

    PubMed

    Jones, Bethan M; Edwards, Richard J; Skipp, Paul J; O'Connor, C David; Iglesias-Rodriguez, M Debora

    2011-06-01

    Emiliania huxleyi is a unicellular marine phytoplankton species known to play a significant role in global biogeochemistry. Through the dual roles of photosynthesis and production of calcium carbonate (calcification), carbon is transferred from the atmosphere to ocean sediments. Almost nothing is known about the molecular mechanisms that control calcification, a process that is tightly regulated within the cell. To initiate proteomic studies on this important and phylogenetically remote organism, we have devised efficient protein extraction protocols and developed a bioinformatics pipeline that allows the statistically robust assignment of proteins from MS/MS data using preexisting EST sequences. The bioinformatics tool, termed BUDAPEST (Bioinformatics Utility for Data Analysis of Proteomics using ESTs), is fully automated and was used to search against data generated from three strains. BUDAPEST increased the number of identifications over standard protein database searches from 37 to 99 proteins when data were amalgamated. Proteins involved in diverse cellular processes were uncovered. For example, experimental evidence was obtained for a novel type I polyketide synthase and for various photosystem components. The proteomic and bioinformatic approaches developed in this study are of wider applicability, particularly to the oceanographic community where genomic sequence data for species of interest are currently scarce.

  16. Towards big data science in the decade ahead from ten years of InCoB and the 1st ISCB-Asia Joint Conference

    PubMed Central

    2011-01-01

    The 2011 International Conference on Bioinformatics (InCoB) conference, which is the annual scientific conference of the Asia-Pacific Bioinformatics Network (APBioNet), is hosted by Kuala Lumpur, Malaysia, is co-organized with the first ISCB-Asia conference of the International Society for Computational Biology (ISCB). InCoB and the sequencing of the human genome are both celebrating their tenth anniversaries and InCoB’s goalposts for the next decade, implementing standards in bioinformatics and globally distributed computational networks, will be discussed and adopted at this conference. Of the 49 manuscripts (selected from 104 submissions) accepted to BMC Genomics and BMC Bioinformatics conference supplements, 24 are featured in this issue, covering software tools, genome/proteome analysis, systems biology (networks, pathways, bioimaging) and drug discovery and design. PMID:22372736

  17. Identification of novel candidate maternal serum protein markers for Down syndrome by integrated proteomic and bioinformatic analysis.

    PubMed

    Kang, Yuan; Dong, Xinran; Zhou, Qiongjie; Zhang, Ying; Cheng, Yan; Hu, Rong; Su, Cuihong; Jin, Hong; Liu, Xiaohui; Ma, Duan; Tian, Weidong; Li, Xiaotian

    2012-03-01

    This study aimed to identify candidate protein biomarkers from maternal serum for Down syndrome (DS) by integrated proteomic and bioinformatics analysis. A pregnancy DS group of 18 women and a control group with the same number were prepared, and the maternal serum proteins were analyzed by isobaric tags for relative and absolute quantitation and mass spectrometry, to identify DS differentially expressed maternal serum proteins (DS-DEMSPs). Comprehensive bioinformatics analysis was then employed to analyze DS-DEMSPs both in this paper and seven related publications. Down syndrome differentially expressed maternal serum proteins from different studies are significantly enriched with common Gene Ontology functions, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, transcription factor binding sites, and Pfam protein domains, However, the DS-DEMSPs are less functionally related to known DS-related genes. These evidences suggest that common molecular mechanisms induced by secondary effects may be present upon DS carrying. A simple scoring scheme revealed Alpha-2-macroglobulin, Apolipoprotein A1, Apolipoprotein E, Complement C1s subcomponent, Complement component 5, Complement component 8, alpha polypeptide, Complement component 8, beta polypeptide and Fibronectin as potential DS biomarkers. The integration of proteomics and bioinformatics studies provides a novel approach to develop new prenatal screening methods for noninvasive yet accurate diagnosis of DS. Copyright © 2012 John Wiley & Sons, Ltd.

  18. Crimean-Congo Hemorrhagic Fever Virus Gn Bioinformatic Analysis and Construction of a Recombinant Bacmid in Order to Express Gn by Baculovirus Expression System.

    PubMed

    Rahpeyma, Mehdi; Fotouhi, Fatemeh; Makvandi, Manouchehr; Ghadiri, Ata; Samarbaf-Zadeh, Alireza

    2015-11-01

    Crimean-Congo hemorrhagic fever virus (CCHFV) is a member of the nairovirus, a genus in the Bunyaviridae family, which causes a life threatening disease in human. Currently, there is no vaccine against CCHFV and detailed structural analysis of CCHFV proteins remains undefined. The CCHFV M RNA segment encodes two viral surface glycoproteins known as Gn and Gc. Viral glycoproteins can be considered as key targets for vaccine development. The current study aimed to investigate structural bioinformatics of CCHFV Gn protein and design a construct to make a recombinant bacmid to express by baculovirus system. To express the Gn protein in insect cells that can be used as antigen in animal model vaccine studies. Bioinformatic analysis of CCHFV Gn protein was performed and designed a construct and cloned into pFastBacHTb vector and a recombinant Gn-bacmid was generated by Bac to Bac system. Primary, secondary, and 3D structure of CCHFV Gn were obtained and PCR reaction with M13 forward and reverse primers confirmed the generation of recombinant bacmid DNA harboring Gn coding region under polyhedron promoter. Characterization of the detailed structure of CCHFV Gn by bioinformatics software provides the basis for development of new experiments and construction of a recombinant bacmid harboring CCHFV Gn, which is valuable for designing a recombinant vaccine against deadly pathogens like CCHFV.

  19. Bioinformatics prediction and experimental validation of microRNA-20a targeting Cyclin D1 in hepatocellular carcinoma.

    PubMed

    Karimkhanloo, Hamzeh; Mohammadi-Yeganeh, Samira; Ahsani, Zeinab; Paryan, Mahdi

    2017-04-01

    Hepatocellular carcinoma is the major form of primary liver cancer, which is the second and sixth leading cause of cancer-related death in men and women, respectively. Extensive research indicates that Wnt/β-catenin signaling pathway, which plays a pivotal role in growth, development, and differentiation of hepatocellular carcinoma, is one of the major signaling pathways that is dysregulated in hepatocellular carcinoma. Cyclin D1 is a proto-oncogene and is one of the major regulators of Wnt signaling pathway, and its overexpression has been detected in various types of cancers including hepatocellular carcinoma. Using several validated bioinformatic databases, we predicted that the microRNAs are capable of targeting 3'-untranslated region of Cyclin D1 messenger RNA. According to the results, miR-20a was selected as the highest ranking microRNA targeting Cyclin D1 messenger RNA. Luciferase assay was recruited to confirm bioinformatic prediction results. Cyclin D1 expression was first assessed by quantitative real-time polymerase chain reaction in HepG2 cell line. Afterward, HepG2 cells were transduced by lentiviruses containing miR-20a. Then, the expression of miR-20a and Cyclin D1 was evaluated. The results of luciferase assay demonstrated targeting of 3'-untranslated region of Cyclin D1 messenger RNA by miR-20a. Furthermore, 238-fold decline in Cyclin D1 expression was observed after lentiviral induction of miR-20a in HepG2 cells. The results highlighted a considerable effect of miRNA-20a induction on the down-regulation of Cyclin D1 gene. Our results suggest that miR-20a can be used as a novel candidate for therapeutic purposes and a biomarker for hepatocellular carcinoma diagnosis.

  20. SOBA: sequence ontology bioinformatics analysis.

    PubMed

    Moore, Barry; Fan, Guozhen; Eilbeck, Karen

    2010-07-01

    The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multi-national sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience can be daunting. Fortunately, there exist software libraries and pipelines designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, genome comparison and for use by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.

  1. Freshwater Metaviromics and Bacteriophages: A Current Assessment of the State of the Art in Relation to Bioinformatic Challenges

    PubMed Central

    Bruder, Katherine; Malki, Kema; Cooper, Alexandria; Sible, Emily; Shapiro, Jason W.; Watkins, Siobhan C.; Putonti, Catherine

    2016-01-01

    Advances in bioinformatics and sequencing technologies have allowed for the analysis of complex microbial communities at an unprecedented rate. While much focus is often placed on the cellular members of these communities, viruses play a pivotal role, particularly bacteria-infecting viruses (bacteriophages); phages mediate global biogeochemical processes and drive microbial evolution through bacterial grazing and horizontal gene transfer. Despite their importance and ubiquity in nature, very little is known about the diversity and structure of viral communities. Though the need for culture-based methods for viral identification has been somewhat circumvented through metagenomic techniques, the analysis of metaviromic data is marred with many unique issues. In this review, we examine the current bioinformatic approaches for metavirome analyses and the inherent challenges facing the field as illustrated by the ongoing efforts in the exploration of freshwater phage populations. PMID:27375355

  2. Structure prediction of Fe(II) 2-oxoglutarate dioxygenase from a psychrophilic yeast Glaciozyma antarctica PI12

    NASA Astrophysics Data System (ADS)

    Yusof, Nik Yusnoraini; Bakar, Farah Diba Abu; Mahadi, Nor Muhammad; Raih, Mohd Firdaus; Murad, Abdul Munir Abdul

    2015-09-01

    A cDNA encoding Fe(II) 2-oxoglutarate (2OG) dependent dioxygenases was isolated from psychrophilic yeast, Glaciozyma antarctica PI12. We have successfully amplified 1,029 bp cDNA sequence that encodes 342 amino acid with predicted molecular weight 38 kDa. The prediction protein was analysed using various bioinformatics tools to explore the properties of the protein. Based on a BLAST search analysis, the Fe2OX amino acid sequence showed 61% identity to the sequence of oxoglutarate/iron-dependent oxygenase from Rhodosporidium toruloides NP11. SignalP prediction showed that the Fe2OX protein contains no putative signal peptide, which suggests that this enzyme most probably localised intracellularly.The structure of Fe2OX was predicted by homology modelling using MODELLER9v11. The model with the lowest objective function was selected from hundred models generated using MODELLER9v11. Analysis of the structure revealed the longer loop at Fe2OX from G.antarctica that might be responsible for the flexibility of the structure, which contributes to its adaptation to low temperatures. Fe2OX hold a highly conserved Fe(II) binding HXD/E…H triad motif. The binding site for 2-oxoglutarate was found conserved for Arg280 among reported studies, however the Phe268 was found to be different in Fe2OX.

  3. Standardized data collection to build prediction models in oncology: a prototype for rectal cancer.

    PubMed

    Meldolesi, Elisa; van Soest, Johan; Damiani, Andrea; Dekker, Andre; Alitto, Anna Rita; Campitelli, Maura; Dinapoli, Nicola; Gatta, Roberto; Gambacorta, Maria Antonietta; Lanzotti, Vito; Lambin, Philippe; Valentini, Vincenzo

    2016-01-01

    The advances in diagnostic and treatment technology are responsible for a remarkable transformation in the internal medicine concept with the establishment of a new idea of personalized medicine. Inter- and intra-patient tumor heterogeneity and the clinical outcome and/or treatment's toxicity's complexity, justify the effort to develop predictive models from decision support systems. However, the number of evaluated variables coming from multiple disciplines: oncology, computer science, bioinformatics, statistics, genomics, imaging, among others could be very large thus making traditional statistical analysis difficult to exploit. Automated data-mining processes and machine learning approaches can be a solution to organize the massive amount of data, trying to unravel important interaction. The purpose of this paper is to describe the strategy to collect and analyze data properly for decision support and introduce the concept of an 'umbrella protocol' within the framework of 'rapid learning healthcare'.

  4. An integrated bioinformatics approach to improve two-color microarray quality-control: impact on biological conclusions.

    PubMed

    van Haaften, Rachel I M; Luceri, Cristina; van Erk, Arie; Evelo, Chris T A

    2009-06-01

    Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work pointed out the need of an extensive bioinformatics analyses for array quality assessment before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point to point standard analysis procedure, based on a combination of clustering and data visualization for the analysis of microarray data.

  5. How Can We Use Bioinformatics to Predict Which Agents Will Cause Birth Defects?

    EPA Science Inventory

    The availability of genomic sequences from a growing number of human and model organisms has provided an explosion of data, information, and knowledge regarding biological systems and disease processes. High-throughput technologies such as DNA and protein microarray biochips are ...

  6. Comparison of two bioinformatics tools used to characterize the microbial diversity and predictive functional attributes of microbial mats from Lake Obersee, Antarctica.

    PubMed

    Koo, Hyunmin; Hakim, Joseph A; Morrow, Casey D; Eipers, Peter G; Davila, Alfonso; Andersen, Dale T; Bej, Asim K

    2017-09-01

    In this study, using NextGen sequencing of the collective 16S rRNA genes obtained from two sets of samples collected from Lake Obersee, Antarctica, we compared and contrasted two bioinformatics tools, PICRUSt and Tax4Fun. We then developed an R script to assess the taxonomic and predictive functional profiles of the microbial communities within the samples. Taxa such as Pseudoxanthomonas, Planctomycetaceae, Cyanobacteria Subsection III, Nitrosomonadaceae, Leptothrix, and Rhodobacter were exclusively identified by Tax4Fun that uses SILVA database; whereas PICRUSt that uses Greengenes database uniquely identified Pirellulaceae, Gemmatimonadetes A1-B1, Pseudanabaena, Salinibacterium and Sinobacteraceae. Predictive functional profiling of the microbial communities using Tax4Fun and PICRUSt separately revealed common metabolic capabilities, while also showing specific functional IDs not shared between the two approaches. Combining these functional predictions using a customized R script revealed a more inclusive metabolic profile, such as hydrolases, oxidoreductases, transferases; enzymes involved in carbohydrate and amino acid metabolisms; and membrane transport proteins known for nutrient uptake from the surrounding environment. Our results present the first molecular-phylogenetic characterization and predictive functional profiles of the microbial mat communities in Lake Obersee, while demonstrating the efficacy of combining both the taxonomic assignment information and functional IDs using the R script created in this study for a more streamlined evaluation of predictive functional profiles of microbial communities. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Open source tools and toolkits for bioinformatics: significance, and where are we?

    PubMed

    Stajich, Jason E; Lapp, Hilmar

    2006-09-01

    This review summarizes important work in open-source bioinformatics software that has occurred over the past couple of years. The survey is intended to illustrate how programs and toolkits whose source code has been developed or released under an Open Source license have changed informatics-heavy areas of life science research. Rather than creating a comprehensive list of all tools developed over the last 2-3 years, we use a few selected projects encompassing toolkit libraries, analysis tools, data analysis environments and interoperability standards to show how freely available and modifiable open-source software can serve as the foundation for building important applications, analysis workflows and resources.

  8. Genomics for Everyone

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chain, Patrick

    Genomics — the genetic mapping and DNA sequencing of sets of genes or the complete genomes of organisms, along with related genome analysis and database work — is emerging as one of the transformative sciences of the 21st century. But current bioinformatics tools are not accessible to most biological researchers. Now, a new computational and web-based tool called EDGE Bioinformatics is working to fulfill the promise of democratizing genomics.

  9. PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes.

    PubMed

    Yu, Nancy Y; Wagner, James R; Laird, Matthew R; Melli, Gabor; Rey, Sébastien; Lo, Raymond; Dao, Phuong; Sahinalp, S Cenk; Ester, Martin; Foster, Leonard J; Brinkman, Fiona S L

    2010-07-01

    PSORTb has remained the most precise bacterial protein subcellular localization (SCL) predictor since it was first made available in 2003. However, the recall needs to be improved and no accurate SCL predictors yet make predictions for archaea, nor differentiate important localization subcategories, such as proteins targeted to a host cell or bacterial hyperstructures/organelles. Such improvements should preferably be encompassed in a freely available web-based predictor that can also be used as a standalone program. We developed PSORTb version 3.0 with improved recall, higher proteome-scale prediction coverage, and new refined localization subcategories. It is the first SCL predictor specifically geared for all prokaryotes, including archaea and bacteria with atypical membrane/cell wall topologies. It features an improved standalone program, with a new batch results delivery system complementing its web interface. We evaluated the most accurate SCL predictors using 5-fold cross validation plus we performed an independent proteomics analysis, showing that PSORTb 3.0 is the most accurate but can benefit from being complemented by Proteome Analyst predictions. http://www.psort.org/psortb (download open source software or use the web interface). psort-mail@sfu.ca Supplementary data are available at Bioinformatics online.

  10. An architecture for genomics analysis in a clinical setting using Galaxy and Docker

    PubMed Central

    Digan, W; Countouris, H; Barritault, M; Baudoin, D; Laurent-Puig, P; Blons, H; Burgun, A

    2017-01-01

    Abstract Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from the generation to the analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular bioinformatics analysis support open-source software. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Along the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures the data traceability by recording analytical actions and by associating inputs and outputs of the tools to EDAM ontology through ReGaTe. The source code is freely available on Github at https://github.com/CARPEM/GalaxyDocker. PMID:29048555

  11. An architecture for genomics analysis in a clinical setting using Galaxy and Docker.

    PubMed

    Digan, W; Countouris, H; Barritault, M; Baudoin, D; Laurent-Puig, P; Blons, H; Burgun, A; Rance, B

    2017-11-01

    Next-generation sequencing is used on a daily basis to perform molecular analysis to determine subtypes of disease (e.g., in cancer) and to assist in the selection of the optimal treatment. Clinical bioinformatics handles the manipulation of the data generated by the sequencer, from the generation to the analysis and interpretation. Reproducibility and traceability are crucial issues in a clinical setting. We have designed an approach based on Docker container technology and Galaxy, the popular bioinformatics analysis support open-source software. Our solution simplifies the deployment of a small-size analytical platform and simplifies the process for the clinician. From the technical point of view, the tools embedded in the platform are isolated and versioned through Docker images. Along the Galaxy platform, we also introduce the AnalysisManager, a solution that allows single-click analysis for biologists and leverages standardized bioinformatics application programming interfaces. We added a Shiny/R interactive environment to ease the visualization of the outputs. The platform relies on containers and ensures the data traceability by recording analytical actions and by associating inputs and outputs of the tools to EDAM ontology through ReGaTe. The source code is freely available on Github at https://github.com/CARPEM/GalaxyDocker. © The Author 2017. Published by Oxford University Press.

  12. [Prediction and evolution of B cell epitopes of hemagglutinin in human-infecting H6N1 avian influenza virus].

    PubMed

    Yang, Jianke; Yuan, Jian; Gao, Jiguang; Zhu, Xiaolei; Lin, Aiqin

    2015-01-01

    To predict B cell epitopes of hemagglutinin (HA) of human-infecting H6N1 avian influenza virus and analyze their evolutionary characteristics. The dataset was downloaded from GISAID and GenBank databases. And the linear and conformational B cell epitopes of HA were predicted separately by various bioinformatic software. Furthermore, the conservation, adaptation and other evolutionary characteristics were also analyzed by some bioinformatic means. Four linear epitopes (A, B, C and D) and two conformational epitopes (E and F) were obtained after consideration of multiple factors. And the C epitope and sites ( 41, 157, 186, 187) mutated easily, but the other epitopes were very conservative and the D epitope was the most conservative. Interestingly, the site 157 was identified under positive selection, suggesting that it may be a particularly important site to make the virus evade the attack from the host immune system. The HA of human-infecting H6N1 avian influenza virus has five conservative B cell epitopes (three linear and two conformational) and one site under positive selection. The findings would facilitate the vaccine development, virus control and pathogenesis understanding.

  13. An overview of topic modeling and its current applications in bioinformatics.

    PubMed

    Liu, Lin; Tang, Lin; Dong, Wen; Yao, Shaowen; Zhou, Wei

    2016-01-01

    With the rapid accumulation of biological datasets, machine learning methods designed to automate data analysis are urgently needed. In recent years, so-called topic models that originated from the field of natural language processing have been receiving much attention in bioinformatics because of their interpretability. Our aim was to review the application and development of topic models for bioinformatics. This paper starts with the description of a topic model, with a focus on the understanding of topic modeling. A general outline is provided on how to build an application in a topic model and how to develop a topic model. Meanwhile, the literature on application of topic models to biological data was searched and analyzed in depth. According to the types of models and the analogy between the concept of document-topic-word and a biological object (as well as the tasks of a topic model), we categorized the related studies and provided an outlook on the use of topic models for the development of bioinformatics applications. Topic modeling is a useful method (in contrast to the traditional means of data reduction in bioinformatics) and enhances researchers' ability to interpret biological information. Nevertheless, due to the lack of topic models optimized for specific biological data, the studies on topic modeling in biological data still have a long and challenging road ahead. We believe that topic models are a promising method for various applications in bioinformatics research.

  14. A generally applicable lightweight method for calculating a value structure for tools and services in bioinformatics infrastructure projects.

    PubMed

    Mayer, Gerhard; Quast, Christian; Felden, Janine; Lange, Matthias; Prinz, Manuel; Pühler, Alfred; Lawerenz, Chris; Scholz, Uwe; Glöckner, Frank Oliver; Müller, Wolfgang; Marcus, Katrin; Eisenacher, Martin

    2017-10-30

    Sustainable noncommercial bioinformatics infrastructures are a prerequisite to use and take advantage of the potential of big data analysis for research and economy. Consequently, funders, universities and institutes as well as users ask for a transparent value model for the tools and services offered. In this article, a generally applicable lightweight method is described by which bioinformatics infrastructure projects can estimate the value of tools and services offered without determining exactly the total costs of ownership. Five representative scenarios for value estimation from a rough estimation to a detailed breakdown of costs are presented. To account for the diversity in bioinformatics applications and services, the notion of service-specific 'service provision units' is introduced together with the factors influencing them and the main underlying assumptions for these 'value influencing factors'. Special attention is given on how to handle personnel costs and indirect costs such as electricity. Four examples are presented for the calculation of the value of tools and services provided by the German Network for Bioinformatics Infrastructure (de.NBI): one for tool usage, one for (Web-based) database analyses, one for consulting services and one for bioinformatics training events. Finally, from the discussed values, the costs of direct funding and the costs of payment of services by funded projects are calculated and compared. © The Author 2017. Published by Oxford University Press.

  15. Secretome Analysis of Vibrio cholerae Type VI Secretion System Reveals a New Effector-Immunity Pair

    PubMed Central

    Altindis, Emrah; Dong, Tao; Catalano, Christy

    2015-01-01

    ABSTRACT The type VI secretion system (T6SS) is a dynamic macromolecular organelle that many Gram-negative bacteria use to inhibit or kill other prokaryotic or eukaryotic cells. The toxic effectors of T6SS are delivered to the prey cells in a contact-dependent manner. In Vibrio cholerae, the etiologic agent of cholera, T6SS is active during intestinal infection. Here, we describe the use of comparative proteomics coupled with bioinformatics to identify a new T6SS effector-immunity pair. This analysis was able to identify all previously identified secreted substrates of T6SS except PAAR (proline, alanine, alanine, arginine) motif-containing proteins. Additionally, this approach led to the identification of a new secreted protein encoded by VCA0285 (TseH) that carries a predicted hydrolase domain. We confirmed that TseH is toxic when expressed in the periplasm of Escherichia coli and V. cholerae cells. The toxicity observed in V. cholerae was suppressed by coexpression of the protein encoded by VCA0286 (TsiH), indicating that this protein is the cognate immunity protein of TseH. Furthermore, exogenous addition of purified recombinant TseH to permeabilized E. coli cells caused cell lysis. Bioinformatics analysis of the TseH protein sequence suggest that it is a member of a new family of cell wall-degrading enzymes that include proteins belonging to the YD repeat and Rhs superfamilies and that orthologs of TseH are likely expressed by species belonging to phyla as diverse as Bacteroidetes and Proteobacteria. PMID:25759499

  16. ETE: a python Environment for Tree Exploration.

    PubMed

    Huerta-Cepas, Jaime; Dopazo, Joaquín; Gabaldón, Toni

    2010-01-13

    Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.

  17. In Silico PCR Tools for a Fast Primer, Probe, and Advanced Searching.

    PubMed

    Kalendar, Ruslan; Muterko, Alexandr; Shamekova, Malika; Zhambakin, Kabyl

    2017-01-01

    The polymerase chain reaction (PCR) is fundamental to molecular biology and is the most important practical molecular technique for the research laboratory. The principle of this technique has been further used and applied in plenty of other simple or complex nucleic acid amplification technologies (NAAT). In parallel to laboratory "wet bench" experiments for nucleic acid amplification technologies, in silico or virtual (bioinformatics) approaches have been developed, among which in silico PCR analysis. In silico NAAT analysis is a useful and efficient complementary method to ensure the specificity of primers or probes for an extensive range of PCR applications from homology gene discovery, molecular diagnosis, DNA fingerprinting, and repeat searching. Predicting sensitivity and specificity of primers and probes requires a search to determine whether they match a database with an optimal number of mismatches, similarity, and stability. In the development of in silico bioinformatics tools for nucleic acid amplification technologies, the prospects for the development of new NAAT or similar approaches should be taken into account, including forward-looking and comprehensive analysis that is not limited to only one PCR technique variant. The software FastPCR and the online Java web tool are integrated tools for in silico PCR of linear and circular DNA, multiple primer or probe searches in large or small databases and for advanced search. These tools are suitable for processing of batch files that are essential for automation when working with large amounts of data. The FastPCR software is available for download at http://primerdigital.com/fastpcr.html and the online Java version at http://primerdigital.com/tools/pcr.html .

  18. Significant Natural Product Biosynthetic Potential of Actinorhizal Symbionts of the Genus Frankia, as Revealed by Comparative Genomic and Proteomic Analyses▿

    PubMed Central

    Udwary, Daniel W.; Gontang, Erin A.; Jones, Adam C.; Jones, Carla S.; Schultz, Andrew W.; Winter, Jaclyn M.; Yang, Jane Y.; Beauchemin, Nicholas; Capson, Todd L.; Clark, Benjamin R.; Esquenazi, Eduardo; Eustáquio, Alessandra S.; Freel, Kelle; Gerwick, Lena; Gerwick, William H.; Gonzalez, David; Liu, Wei-Ting; Malloy, Karla L.; Maloney, Katherine N.; Nett, Markus; Nunnery, Joshawna K.; Penn, Kevin; Prieto-Davo, Alejandra; Simmons, Thomas L.; Weitz, Sara; Wilson, Micheal C.; Tisa, Louis S.; Dorrestein, Pieter C.; Moore, Bradley S.

    2011-01-01

    Bacteria of the genus Frankia are mycelium-forming actinomycetes that are found as nitrogen-fixing facultative symbionts of actinorhizal plants. Although soil-dwelling actinomycetes are well-known producers of bioactive compounds, the genus Frankia has largely gone uninvestigated for this potential. Bioinformatic analysis of the genome sequences of Frankia strains ACN14a, CcI3, and EAN1pec revealed an unexpected number of secondary metabolic biosynthesis gene clusters. Our analysis led to the identification of at least 65 biosynthetic gene clusters, the vast majority of which appear to be unique and for which products have not been observed or characterized. More than 25 secondary metabolite structures or structure fragments were predicted, and these are expected to include cyclic peptides, siderophores, pigments, signaling molecules, and specialized lipids. Outside the hopanoid gene locus, no cluster could be convincingly demonstrated to be responsible for the few secondary metabolites previously isolated from other Frankia strains. Few clusters were shared among the three species, demonstrating species-specific biosynthetic diversity. Proteomic analysis of Frankia sp. strains CcI3 and EAN1pec showed that significant and diverse secondary metabolic activity was expressed in laboratory cultures. In addition, several prominent signals in the mass range of peptide natural products were observed in Frankia sp. CcI3 by intact-cell matrix-assisted laser desorption-ionization mass spectrometry (MALDI-MS). This work supports the value of bioinformatic investigation in natural products biosynthesis using genomic information and presents a clear roadmap for natural products discovery in the Frankia genus. PMID:21498757

  19. ETE: a python Environment for Tree Exploration

    PubMed Central

    2010-01-01

    Background Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Results Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. Conclusions ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org. PMID:20070885

  20. BioVLAB-MMIA-NGS: microRNA-mRNA integrated analysis using high-throughput sequencing data.

    PubMed

    Chae, Heejoon; Rhee, Sungmin; Nephew, Kenneth P; Kim, Sun

    2015-01-15

    It is now well established that microRNAs (miRNAs) play a critical role in regulating gene expression in a sequence-specific manner, and genome-wide efforts are underway to predict known and novel miRNA targets. However, the integrated miRNA-mRNA analysis remains a major computational challenge, requiring powerful informatics systems and bioinformatics expertise. The objective of this study was to modify our widely recognized Web server for the integrated mRNA-miRNA analysis (MMIA) and its subsequent deployment on the Amazon cloud (BioVLAB-MMIA) to be compatible with high-throughput platforms, including next-generation sequencing (NGS) data (e.g. RNA-seq). We developed a new version called the BioVLAB-MMIA-NGS, deployed on both Amazon cloud and on a high-performance publicly available server called MAHA. By using NGS data and integrating various bioinformatics tools and databases, BioVLAB-MMIA-NGS offers several advantages. First, sequencing data is more accurate than array-based methods for determining miRNA expression levels. Second, potential novel miRNAs can be detected by using various computational methods for characterizing miRNAs. Third, because miRNA-mediated gene regulation is due to hybridization of an miRNA to its target mRNA, sequencing data can be used to identify many-to-many relationship between miRNAs and target genes with high accuracy. http://epigenomics.snu.ac.kr/biovlab_mmia_ngs/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Cytomics in predictive medicine

    NASA Astrophysics Data System (ADS)

    Tarnok, Attila; Valet, Guenther K.

    2004-07-01

    Predictive Medicine aims at the detection of changes in patient's disease state prior to the manifestation of deterioration or improvement of the current status. Patient-specific, disease-course predictions with >95% or >99% accuracy during therapy would be highly valuable for everyday medicine. If these predictors were available, disease aggravation or progression, frequently accompanied by irreversible tissue damage or therapeutic side effects, could then potentially be avoided by early preventive therapy. The molecular analysis of heterogeneous cellular systems (Cytomics) by cytometry in conjunction with pattern-oriented bioinformatic analysis of the multiparametric cytometric and other data provides a promising approach to individualized or personalized medical treatment or disease management. Predictive medicine is best implemented by cell oriented measurements e.g. by flow or image cytometry. Cell oriented gene or protein arrays as well as bead arrays for the capture of solute molecules form serum, plasma, urine or liquor are equally of high value. Clinical applications of predictive medicine by Cytomics will include multi organ failure in sepsis or non infectious posttraumatic shock in intensive care, or the pretherapeutic identification of high risk patients in cancer cytostatic. Early individualized therapy may provide better survival chances for individual patient at concomitant cost containment. Predictive medicine guided early reduction or stop of therapy may lower or abrogate potential therapeutic side effects. Further important aspects of predictive medicine concern the preoperative identification of patients with a tendency for postoperative complications or coronary artery disease patients with an increased tendency for restenosis. As a consequence, better patient care and new forms of inductive scientific hypothesis development based on the interpretation of predictive data patterns are at reach.

  2. Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier

    NASA Astrophysics Data System (ADS)

    Wang, Leilei; Cheng, Jinyong

    2018-03-01

    Protein secondary structure prediction is belong to bioinformatics,and it's important in research area. In this paper, we propose a new prediction way of protein using bayes classifier and autoEncoder network. Our experiments show some algorithms including the construction of the model, the classification of parameters and so on. The data set is a typical CB513 data set for protein. In terms of accuracy, the method is the cross validation based on the 3-fold. Then we can get the Q3 accuracy. Paper results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.

  3. CRISPR library designer (CLD): software for multispecies design of single guide RNA libraries.

    PubMed

    Heigwer, Florian; Zhan, Tianzuo; Breinig, Marco; Winter, Jan; Brügemann, Dirk; Leible, Svenja; Boutros, Michael

    2016-03-24

    Genetic screens using CRISPR/Cas9 are a powerful method for the functional analysis of genomes. Here we describe CRISPR library designer (CLD), an integrated bioinformatics application for the design of custom single guide RNA (sgRNA) libraries for all organisms with annotated genomes. CLD is suitable for the design of libraries using modified CRISPR enzymes and targeting non-coding regions. To demonstrate its utility, we perform a pooled screen for modulators of the TNF-related apoptosis inducing ligand (TRAIL) pathway using a custom library of 12,471 sgRNAs. CLD predicts a high fraction of functional sgRNAs and is publicly available at https://github.com/boutroslab/cld.

  4. Halophiles and their enzymes: negativity put to good use.

    PubMed

    DasSarma, Shiladitya; DasSarma, Priya

    2015-06-01

    Halophilic microorganisms possess stable enzymes that function in very high salinity, an extreme condition that leads to denaturation, aggregation, and precipitation of most other proteins. Genomic and structural analyses have established that the enzymes of halophilic Archaea and many halophilic Bacteria are negatively charged due to an excess of acidic over basic residues, and altered hydrophobicity, which enhance solubility and promote function in low water activity conditions. Here, we provide an update on recent bioinformatic analysis of predicted halophilic proteomes as well as experimental molecular studies on individual halophilic enzymes. Recent efforts on discovery and utilization of halophiles and their enzymes for biotechnology, including biofuel applications are also considered. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Halophiles and their enzymes: Negativity put to good use

    PubMed Central

    DasSarma, Shiladitya; DasSarma, Priya

    2015-01-01

    Halophilic microorganisms possess stable enzymes that function in very high salinity, an extreme condition that leads to denaturation, aggregation, and precipitation of most other proteins. Genomic and structural analyses have established that the enzymes of halophilic Archaea and many halophilic Bacteria are negatively charged due to an excess of acidic over basic residues, and altered hydrophobicity, which enhance solubility and promote function in low water activity conditions. Here, we provide an update on recent bioinformatic analysis of predicted halophilic proteomes as well as experimental molecular studies on individual halophilic enzymes. On-going efforts on discovery and utilization of halophiles and their enzymes for biotechnology, including biofuel applications are also considered. PMID:26066288

  6. BIOINFORMATICS IN THE K-8 CLASSROOM: DESIGNING INNOVATIVE ACTIVITIES FOR TEACHER IMPLEMENTATION

    PubMed Central

    Shuster, Michele; Claussen, Kira; Locke, Melly; Glazewski, Krista

    2016-01-01

    At the intersection of biology and computer science is the growing field of bioinformatics—the analysis of complex datasets of biological relevance. Despite the increasing importance of bioinformatics and associated practical applications, these are not standard topics in elementary and middle school classrooms. We report on a pilot project and its evolution to support implementation of bioinformatics-based activities in elementary and middle school classrooms. Specifically, we ultimately designed a multi-day summer teacher professional development workshop, in which teachers design innovative classroom activities. By focusing on teachers, our design leverages enhanced teacher knowledge and confidence to integrate innovative instructional materials into K-8 classrooms and contributes to capacity building in STEM instruction. PMID:27429860

  7. Imperfect duplicate insertions type of mutations in plasmepsin V modulates binding properties of PEXEL motifs of export proteins in Indian Plasmodium vivax.

    PubMed

    Rawat, Manmeet; Vijay, Sonam; Gupta, Yash; Tiwari, Pramod Kumar; Sharma, Arun

    2013-01-01

    Plasmepsin V (PM-V) have functionally conserved orthologues across the Plasmodium genus who's binding and antigenic processing at the PEXEL motifs for export about 200-300 essential proteins is important for the virulence and viability of the causative Plasmodium species. This study was undertaken to determine P. vivax plasmepsin V Ind (PvPM-V-Ind) PEXEL motif export pathway for pathogenicity-related proteins/antigens export thereby altering plasmodium exportome during erythrocytic stages. We identify and characterize Plasmodium vivax plasmepsin-V-Ind (mutant) gene by cloning, sequence analysis, in silico bioinformatic protocols and structural modeling predictions based on docking studies on binding capacity with PEXEL motifs processing in terms of binding and accessibility of export proteins. Cloning and sequence analysis for genetic diversity demonstrates PvPM-V-Ind (mutant) gene is highly conserved among all isolates from different geographical regions of India. Imperfect duplicate insertion types of mutations (SVSE from 246-249 AA and SLSE from 266-269 AA) were identified among all Indian isolates in comparison to P.vivax Sal-1 (PvPM-V-Sal 1) isolate. In silico bioinformatics interaction studies of PEXEL peptide and active enzyme reveal that PvPM-V-Ind (mutant) is only active in endoplasmic reticulum lumen and membrane embedding is essential for activation of plasmepsin V. Structural modeling predictions based on docking studies with PEXEL motif show significant variation in substrate protein binding of these imperfect mutations with data mined PEXEL sequences. The predicted variation in the docking score and interacting amino acids of PvPM-V-Ind (mutant) proteins with PEXEL and lopinavir suggests a modulation in the activity of PvPM-V in terms of binding and accessibility at these sites. Our functional modeled validation of PvPM-V-Ind (mutant) imperfect duplicate insertions with data mined PEXEL sequences leading to altered binding and substrate accessibility of the enzyme makes it a plausible target to investigate export mechanisms for in silico virtual screening and novel pharmacophore designing.

  8. Imperfect Duplicate Insertions Type of Mutations in Plasmepsin V Modulates Binding Properties of PEXEL Motifs of Export Proteins in Indian Plasmodium vivax

    PubMed Central

    Rawat, Manmeet; Vijay, Sonam; Gupta, Yash; Tiwari, Pramod Kumar; Sharma, Arun

    2013-01-01

    Introduction Plasmepsin V (PM-V) have functionally conserved orthologues across the Plasmodium genus who's binding and antigenic processing at the PEXEL motifs for export about 200–300 essential proteins is important for the virulence and viability of the causative Plasmodium species. This study was undertaken to determine P. vivax plasmepsin V Ind (PvPM-V-Ind) PEXEL motif export pathway for pathogenicity-related proteins/antigens export thereby altering plasmodium exportome during erythrocytic stages. Method We identify and characterize Plasmodium vivax plasmepsin-V-Ind (mutant) gene by cloning, sequence analysis, in silico bioinformatic protocols and structural modeling predictions based on docking studies on binding capacity with PEXEL motifs processing in terms of binding and accessibility of export proteins. Results Cloning and sequence analysis for genetic diversity demonstrates PvPM-V-Ind (mutant) gene is highly conserved among all isolates from different geographical regions of India. Imperfect duplicate insertion types of mutations (SVSE from 246–249 AA and SLSE from 266–269 AA) were identified among all Indian isolates in comparison to P.vivax Sal-1 (PvPM-V-Sal 1) isolate. In silico bioinformatics interaction studies of PEXEL peptide and active enzyme reveal that PvPM-V-Ind (mutant) is only active in endoplasmic reticulum lumen and membrane embedding is essential for activation of plasmepsin V. Structural modeling predictions based on docking studies with PEXEL motif show significant variation in substrate protein binding of these imperfect mutations with data mined PEXEL sequences. The predicted variation in the docking score and interacting amino acids of PvPM-V-Ind (mutant) proteins with PEXEL and lopinavir suggests a modulation in the activity of PvPM-V in terms of binding and accessibility at these sites. Conclusion/Significance Our functional modeled validation of PvPM-V-Ind (mutant) imperfect duplicate insertions with data mined PEXEL sequences leading to altered binding and substrate accessibility of the enzyme makes it a plausible target to investigate export mechanisms for in silico virtual screening and novel pharmacophore designing. PMID:23555891

  9. Bioinformatics analysis for structure and function of CPR of Plasmodium falciparum.

    PubMed

    Fan, Zhigang; Zhang, Lingmin; Yan, Guogang; Wu, Qiang; Gan, Xiufeng; Zhong, Saifeng; Lin, Guifen

    2011-02-01

    To analyse the structure and function of NADPH-cytochrome p450 reductase (CYPOR or CPR) from Plasmodium falciparum (Pf), and to predict its' drug target and vaccine target. The structure, function, drug target and vaccine target of CPR from Plasmodium falciparum were analyzed and predicted by bioinformatics methods. PfCPR, which was older CPR, had close relationship with the CPR from other Plasmodium species, but it was distant from its hosts, such as Homo sapiens and Anopheles. PfCPR was located in the cellular nucleus of Plasmodium falciparum. 335aa-352aa and 591aa - 608aa were inserted the interior side of the nuclear membrane, while 151aa-265aa was located in the nucleolus organizer regions. PfCPR had 40 function sites and 44 protein-protein binding sites in amino acid sequence. The teriary structure of 1aa-700aa was forcep-shaped with wings. 15 segments of PfCPR had no homology with Homo sapien CPR and most were exposed on the surface of the protein. These segments had 25 protein-protein binding sites. While 13 other segments all possessed function sites. The evolution or genesis of Plasmodium falciparum is earlier than those of Homo sapiens. PfCPR is a possible resistance site of antimalarial drug and may involve immune evasion, which is associated with parasite of sporozoite in hepatocytes. PfCPR is unsuitable as vaccine target, but it has at least 13 ideal drug targets. Copyright © 2011 Hainan Medical College. Published by Elsevier B.V. All rights reserved.

  10. Three novel variants (p.Glu178Lys, p.Val245Met, p.Ser250Phe) of the phenylalanine hydroxylase (PAH) gene impair protein expression and function in vitro.

    PubMed

    Zong, Yanan; Liu, Ning; Ma, Shanshan; Bai, Ying; Guan, Fangxia; Kong, Xiangdong

    2018-08-20

    Phenylketonuria (PKU) is the most common inherited metabolic disease, an autosomal recessive disorder affecting >10,000 newborns each year globally. It can be caused by over 1000 different naturally occurring mutations in the phenylalanine hydroxylase (PAH) gene. We analyzed three novel naturally occurring PAH gene variants: p.Glu178Lys (c.532G>A), p.Val245Met (c.733G>A) and p.Ser250Phe (c.749C>T). The mutant effect on the PAH enzyme structure and function was predicted by bioinformatics software. Vectors expressing the corresponding PAH variants were generated for expression in E. coli and in HEK293T cells. The RNA expression of the three PAH variants was measured by quantitative reverse transcription polymerase chain reaction (RT-qPCR). The mutant PAH protein levels were determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), western blot and enzyme-linked immunosorbent assay (ELISA). All three variants were predicted to be pathogenic by bioinformatics analysis. The transcription of the three PAH variants was similar to the wild type PAH gene in HEK293T cells. In contrast, the levels of mutant PAH proteins decreased significantly compared to the wild type control, in both E. coli and HEK293T cells. Our results indicate that the three novel PAH gene variants (p.Glu178Lys, p.Val245Met, p.Ser250Phe) impair PAH protein expression and function in prokaryotic and eukaryotic cells. Copyright © 2018. Published by Elsevier B.V.

  11. Decoding the complex genetic causes of heart diseases using systems biology.

    PubMed

    Djordjevic, Djordje; Deshpande, Vinita; Szczesnik, Tomasz; Yang, Andrian; Humphreys, David T; Giannoulatou, Eleni; Ho, Joshua W K

    2015-03-01

    The pace of disease gene discovery is still much slower than expected, even with the use of cost-effective DNA sequencing and genotyping technologies. It is increasingly clear that many inherited heart diseases have a more complex polygenic aetiology than previously thought. Understanding the role of gene-gene interactions, epigenetics, and non-coding regulatory regions is becoming increasingly critical in predicting the functional consequences of genetic mutations identified by genome-wide association studies and whole-genome or exome sequencing. A systems biology approach is now being widely employed to systematically discover genes that are involved in heart diseases in humans or relevant animal models through bioinformatics. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the complex genetic causes of congenital and complex heart diseases. This review summarises state-of-the-art genomic and bioinformatics techniques that are used in accelerating the pace of disease gene discovery in heart diseases. Accompanying this review, we provide an interactive web-resource for systems biology analysis of mammalian heart development and diseases, CardiacCode ( http://CardiacCode.victorchang.edu.au/ ). CardiacCode features a dataset of over 700 pieces of manually curated genetic or molecular perturbation data, which enables the inference of a cardiac-specific GRN of 280 regulatory relationships between 33 regulator genes and 129 target genes. We believe this growing resource will fill an urgent unmet need to fully realise the true potential of predictive and personalised genomic medicine in tackling human heart disease.

  12. Bioinformatics Analysis Reveals Genes Involved in the Pathogenesis of Ameloblastoma and Keratocystic Odontogenic Tumor.

    PubMed

    Santos, Eliane Macedo Sobrinho; Santos, Hércules Otacílio; Dos Santos Dias, Ivoneth; Santos, Sérgio Henrique; Batista de Paula, Alfredo Maurício; Feltenberger, John David; Sena Guimarães, André Luiz; Farias, Lucyana Conceição

    2016-01-01

    Pathogenesis of odontogenic tumors is not well known. It is important to identify genetic deregulations and molecular alterations. This study aimed to investigate, through bioinformatic analysis, the possible genes involved in the pathogenesis of ameloblastoma (AM) and keratocystic odontogenic tumor (KCOT). Genes involved in the pathogenesis of AM and KCOT were identified in GeneCards. Gene list was expanded, and the gene interactions network was mapped using the STRING software. "Weighted number of links" (WNL) was calculated to identify "leader genes" (highest WNL). Genes were ranked by K-means method and Kruskal-Wallis test was used (P<0.001). Total interactions score (TIS) was also calculated using all interaction data generated by the STRING database, in order to achieve global connectivity for each gene. The topological and ontological analyses were performed using Cytoscape software and BinGO plugin. Literature review data was used to corroborate the bioinformatics data. CDK1 was identified as leader gene for AM. In KCOT group, results show PCNA and TP53 . Both tumors exhibit a power law behavior. Our topological analysis suggested leader genes possibly important in the pathogenesis of AM and KCOT, by clustering coefficient calculated for both odontogenic tumors (0.028 for AM, zero for KCOT). The results obtained in the scatter diagram suggest an important relationship of these genes with the molecular processes involved in AM and KCOT. Ontological analysis for both AM and KCOT demonstrated different mechanisms. Bioinformatics analyzes were confirmed through literature review. These results may suggest the involvement of promising genes for a better understanding of the pathogenesis of AM and KCOT.

  13. Secondary electrospray ionization-mass spectrometry and a novel statistical bioinformatic approach identifies a cancer-related profile in exhaled breath of breast cancer patients: a pilot study.

    PubMed

    Martinez-Lozano Sinues, Pablo; Landoni, Elena; Miceli, Rosalba; Dibari, Vincenza F; Dugo, Matteo; Agresti, Roberto; Tagliabue, Elda; Cristoni, Simone; Orlandi, Rosaria

    2015-09-21

    Breath analysis represents a new frontier in medical diagnosis and a powerful tool for cancer biomarker discovery due to the recent development of analytical platforms for the detection and identification of human exhaled volatile compounds. Statistical and bioinformatic tools may represent an effective complement to the technical and instrumental enhancements needed to fully exploit clinical applications of breath analysis. Our exploratory study in a cohort of 14 breast cancer patients and 11 healthy volunteers used secondary electrospray ionization-mass spectrometry (SESI-MS) to detect a cancer-related volatile profile. SESI-MS full-scan spectra were acquired in a range of 40-350 mass-to-charge ratio (m/z), converted to matrix data and analyzed using a procedure integrating data pre-processing for quality control, and a two-step class prediction based on machine-learning techniques, including a robust feature selection, and a classifier development with internal validation. MS spectra from exhaled breath showed an individual-specific breath profile and high reciprocal homogeneity among samples, with strong agreement among technical replicates, suggesting a robust responsiveness of SESI-MS. Supervised analysis of breath data identified a support vector machine (SVM) model including 8 features corresponding to m/z 106, 126, 147, 78, 148, 52, 128, 315 and able to discriminate exhaled breath from breast cancer patients from that of healthy individuals, with sensitivity and specificity above 0.9.Our data highlight the significance of SESI-MS as an analytical technique for clinical studies of breath analysis and provide evidence that our noninvasive strategy detects volatile signatures that may support existing technologies to diagnose breast cancer.

  14. Identification of complex metabolic states in critically injured patients using bioinformatic cluster analysis.

    PubMed

    Cohen, Mitchell J; Grossman, Adam D; Morabito, Diane; Knudson, M Margaret; Butte, Atul J; Manley, Geoffrey T

    2010-01-01

    Advances in technology have made extensive monitoring of patient physiology the standard of care in intensive care units (ICUs). While many systems exist to compile these data, there has been no systematic multivariate analysis and categorization across patient physiological data. The sheer volume and complexity of these data make pattern recognition or identification of patient state difficult. Hierarchical cluster analysis allows visualization of high dimensional data and enables pattern recognition and identification of physiologic patient states. We hypothesized that processing of multivariate data using hierarchical clustering techniques would allow identification of otherwise hidden patient physiologic patterns that would be predictive of outcome. Multivariate physiologic and ventilator data were collected continuously using a multimodal bioinformatics system in the surgical ICU at San Francisco General Hospital. These data were incorporated with non-continuous data and stored on a server in the ICU. A hierarchical clustering algorithm grouped each minute of data into 1 of 10 clusters. Clusters were correlated with outcome measures including incidence of infection, multiple organ failure (MOF), and mortality. We identified 10 clusters, which we defined as distinct patient states. While patients transitioned between states, they spent significant amounts of time in each. Clusters were enriched for our outcome measures: 2 of the 10 states were enriched for infection, 6 of 10 were enriched for MOF, and 3 of 10 were enriched for death. Further analysis of correlations between pairs of variables within each cluster reveals significant differences in physiology between clusters. Here we show for the first time the feasibility of clustering physiological measurements to identify clinically relevant patient states after trauma. These results demonstrate that hierarchical clustering techniques can be useful for visualizing complex multivariate data and may provide new insights for the care of critically injured patients.

  15. hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms

    PubMed Central

    Xie, Qingjun; Tzfadia, Oren; Levy, Matan; Weithorn, Efrat; Peled-Zehavi, Hadas; Van Parys, Thomas; Van de Peer, Yves; Galili, Gad

    2016-01-01

    ABSTRACT Most of the proteins that are specifically turned over by selective autophagy are recognized by the presence of short Atg8 interacting motifs (AIMs) that facilitate their association with the autophagy apparatus. Such AIMs can be identified by bioinformatics methods based on their defined degenerate consensus F/W/Y-X-X-L/I/V sequences in which X represents any amino acid. Achieving reliability and/or fidelity of the prediction of such AIMs on a genome-wide scale represents a major challenge. Here, we present a bioinformatics approach, high fidelity AIM (hfAIM), which uses additional sequence requirements—the presence of acidic amino acids and the absence of positively charged amino acids in certain positions—to reliably identify AIMs in proteins. We demonstrate that the use of the hfAIM method allows for in silico high fidelity prediction of AIMs in AIM-containing proteins (ACPs) on a genome-wide scale in various organisms. Furthermore, by using hfAIM to identify putative AIMs in the Arabidopsis proteome, we illustrate a potential contribution of selective autophagy to various biological processes. More specifically, we identified 9 peroxisomal PEX proteins that contain hfAIM motifs, among which AtPEX1, AtPEX6 and AtPEX10 possess evolutionary-conserved AIMs. Bimolecular fluorescence complementation (BiFC) results verified that AtPEX6 and AtPEX10 indeed interact with Atg8 in planta. In addition, we show that mutations occurring within or nearby hfAIMs in PEX1, PEX6 and PEX10 caused defects in the growth and development of various organisms. Taken together, the above results suggest that the hfAIM tool can be used to effectively perform genome-wide in silico screens of proteins that are potentially regulated by selective autophagy. The hfAIM system is a web tool that can be accessed at link: http://bioinformatics.psb.ugent.be/hfAIM/. PMID:27071037

  16. FMAP: Functional Mapping and Analysis Pipeline for metagenomics and metatranscriptomics studies.

    PubMed

    Kim, Jiwoong; Kim, Min Soo; Koh, Andrew Y; Xie, Yang; Zhan, Xiaowei

    2016-10-10

    Given the lack of a complete and comprehensive library of microbial reference genomes, determining the functional profile of diverse microbial communities is challenging. The available functional analysis pipelines lack several key features: (i) an integrated alignment tool, (ii) operon-level analysis, and (iii) the ability to process large datasets. Here we introduce our open-sourced, stand-alone functional analysis pipeline for analyzing whole metagenomic and metatranscriptomic sequencing data, FMAP (Functional Mapping and Analysis Pipeline). FMAP performs alignment, gene family abundance calculations, and statistical analysis (three levels of analyses are provided: differentially-abundant genes, operons and pathways). The resulting output can be easily visualized with heatmaps and functional pathway diagrams. FMAP functional predictions are consistent with currently available functional analysis pipelines. FMAP is a comprehensive tool for providing functional analysis of metagenomic/metatranscriptomic sequencing data. With the added features of integrated alignment, operon-level analysis, and the ability to process large datasets, FMAP will be a valuable addition to the currently available functional analysis toolbox. We believe that this software will be of great value to the wider biology and bioinformatics communities.

  17. Web-based services for drug design and discovery.

    PubMed

    Frey, Jeremy G; Bird, Colin L

    2011-09-01

    Reviews of the development of drug discovery through the 20(th) century recognised the importance of chemistry and increasingly bioinformatics, but had relatively little to say about the importance of computing and networked computing in particular. However, the design and discovery of new drugs is arguably the most significant single application of bioinformatics and cheminformatics to have benefitted from the increases in the range and power of the computational techniques since the emergence of the World Wide Web, commonly now referred to as simply 'the Web'. Web services have enabled researchers to access shared resources and to deploy standardized calculations in their search for new drugs. This article first considers the fundamental principles of Web services and workflows, and then explores the facilities and resources that have evolved to meet the specific needs of chem- and bio-informatics. This strategy leads to a more detailed examination of the basic components that characterise molecules and the essential predictive techniques, followed by a discussion of the emerging networked services that transcend the basic provisions, and the growing trend towards embracing modern techniques, in particular the Semantic Web. In the opinion of the authors, the issues that require community action are: increasing the amount of chemical data available for open access; validating the data as provided; and developing more efficient links between the worlds of cheminformatics and bioinformatics. The goal is to create ever better drug design services.

  18. Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

    PubMed

    Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

    2018-03-20

    The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there was no significant differences between the compositional results generated by the different sequencers (p = 0.693, R 2  = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R 2  = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Again, the outputs from functional profiling analysis using SUPER-FOCUS were generally accordant between the platforms at different sequencing depths. Finally, and expectedly, metagenome assembly completeness was significantly lower on the MiSeq than either on the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth. Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.

  19. [Comparative analysis of clustered regularly interspaced short palindromic repeats (CRISPRs) loci in the genomes of halophilic archaea].

    PubMed

    Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian

    2009-11-01

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to genome-widely analyze the CRISPR in extreme halophilic archaea, of which the whole genome sequences are available at present time. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were greatly conserved, and two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as recognition site by having palindromic structures in flanking regions, and the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.

  20. Tools4miRs – one place to gather all the tools for miRNA analysis

    PubMed Central

    Lukasik, Anna; Wójcikowski, Maciej; Zielenkiewicz, Piotr

    2016-01-01

    Summary: MiRNAs are short, non-coding molecules that negatively regulate gene expression and thereby play several important roles in living organisms. Dozens of computational methods for miRNA-related research have been developed, which greatly differ in various aspects. The substantial availability of difficult-to-compare approaches makes it challenging for the user to select a proper tool and prompts the need for a solution that will collect and categorize all the methods. Here, we present tools4miRs, the first platform that gathers currently more than 160 methods for broadly defined miRNA analysis. The collected tools are classified into several general and more detailed categories in which the users can additionally filter the available methods according to their specific research needs, capabilities and preferences. Tools4miRs is also a web-based target prediction meta-server that incorporates user-designated target prediction methods into the analysis of user-provided data. Availability and Implementation: Tools4miRs is implemented in Python using Django and is freely available at tools4mirs.org. Contact: piotr@ibb.waw.pl Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153626

  1. Tools4miRs - one place to gather all the tools for miRNA analysis.

    PubMed

    Lukasik, Anna; Wójcikowski, Maciej; Zielenkiewicz, Piotr

    2016-09-01

    MiRNAs are short, non-coding molecules that negatively regulate gene expression and thereby play several important roles in living organisms. Dozens of computational methods for miRNA-related research have been developed, which greatly differ in various aspects. The substantial availability of difficult-to-compare approaches makes it challenging for the user to select a proper tool and prompts the need for a solution that will collect and categorize all the methods. Here, we present tools4miRs, the first platform that gathers currently more than 160 methods for broadly defined miRNA analysis. The collected tools are classified into several general and more detailed categories in which the users can additionally filter the available methods according to their specific research needs, capabilities and preferences. Tools4miRs is also a web-based target prediction meta-server that incorporates user-designated target prediction methods into the analysis of user-provided data. Tools4miRs is implemented in Python using Django and is freely available at tools4mirs.org. piotr@ibb.waw.pl Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  2. Genomics for Everyone

    ScienceCinema

    Chain, Patrick

    2018-05-31

    Genomics — the genetic mapping and DNA sequencing of sets of genes or the complete genomes of organisms, along with related genome analysis and database work — is emerging as one of the transformative sciences of the 21st century. But current bioinformatics tools are not accessible to most biological researchers. Now, a new computational and web-based tool called EDGE Bioinformatics is working to fulfill the promise of democratizing genomics.

  3. The Ensembl genome database project.

    PubMed

    Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M

    2002-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

  4. Complete genomic sequence of the Lactobacillus temperate phage LF1.

    PubMed

    Yoon, Bo Hyun; Chang, Hyo Ihl

    2011-10-01

    Bacteriophage LF1, a newly isolated temperate phage from a mitomycin-C-induced lysate of wild type Lactobacillus fermentum, was found to contain a double-strand DNA of 42,606 base pairs (bp) with a G+C content of 45%. Bioinformatic analysis of the phage genome revealed 57 putative open reading frames (ORFs). The predicted protein products of ORFs were determined and described. According to morphological analysis by transmission electron microscopy (TEM), LF1 has an isometric head and a non-contractile tail, indicating that it belongs to the family Siphoviridae. The temperate phage LF1 has a good genetic mosaic relationship with ΦPYB5 in the packaging module. To our knowledge, this is first report of genomic sequencing and characterization of temperate phage LF1 from wild-type L. fermentum isolated from Kimchi in Korea.

  5. Integrative workflows for metagenomic analysis

    PubMed Central

    Ladoukakis, Efthymios; Kolisis, Fragiskos N.; Chatziioannou, Aristotelis A.

    2014-01-01

    The rapid evolution of all sequencing technologies, described by the term Next Generation Sequencing (NGS), have revolutionized metagenomic analysis. They constitute a combination of high-throughput analytical protocols, coupled to delicate measuring techniques, in order to potentially discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e., Sanger). From a bioinformatic perspective, this boils down to many GB of data being generated from each single sequencing experiment, rendering the management or even the storage, critical bottlenecks with respect to the overall analytical endeavor. The enormous complexity is even more aggravated by the versatility of the processing steps available, represented by the numerous bioinformatic tools that are essential, for each analytical task, in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, nonetheless non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, requesting a high level of expertise for their proper application or the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale, requires grand computational resources, imposing as the sole realistic solution, the utilization of cloud computing infrastructures. In this review article we discuss different, integrative, bioinformatic solutions available, which address the aforementioned issues, by performing a critical assessment of the available automated pipelines for data management, quality control, and annotation of metagenomic data, embracing various, major sequencing technologies and applications. PMID:25478562

  6. Bio-Docklets: virtualization containers for single-step execution of NGS pipelines.

    PubMed

    Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis; Krampis, Konstantinos

    2017-08-01

    Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep, bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and from a user perspective, running the pipelines as simply as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enables developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. © The Authors 2017. Published by Oxford University Press.

  7. Bio-Docklets: virtualization containers for single-step execution of NGS pipelines

    PubMed Central

    Kim, Baekdoo; Ali, Thahmina; Lijeron, Carlos; Afgan, Enis

    2017-01-01

    Abstract Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep, bioinformatics pipelines for NGS data analysis. As examples, we have deployed 2 pipelines for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and from a user perspective, running the pipelines as simply as running a single bioinformatics tool. This is achieved using a “meta-script” that automatically starts the Bio-Docklets and controls the pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for nonbioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or a cloud service provider. Beyond end users, the Bio-Docklets also enables developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets. PMID:28854616

  8. Crimean-Congo Hemorrhagic Fever Virus Gn Bioinformatic Analysis and Construction of a Recombinant Bacmid in Order to Express Gn by Baculovirus Expression System

    PubMed Central

    Rahpeyma, Mehdi; Fotouhi, Fatemeh; Makvandi, Manouchehr; Ghadiri, Ata; Samarbaf-Zadeh, Alireza

    2015-01-01

    Background Crimean-Congo hemorrhagic fever virus (CCHFV) is a member of the nairovirus, a genus in the Bunyaviridae family, which causes a life threatening disease in human. Currently, there is no vaccine against CCHFV and detailed structural analysis of CCHFV proteins remains undefined. The CCHFV M RNA segment encodes two viral surface glycoproteins known as Gn and Gc. Viral glycoproteins can be considered as key targets for vaccine development. Objectives The current study aimed to investigate structural bioinformatics of CCHFV Gn protein and design a construct to make a recombinant bacmid to express by baculovirus system. Materials and Methods To express the Gn protein in insect cells that can be used as antigen in animal model vaccine studies. Bioinformatic analysis of CCHFV Gn protein was performed and designed a construct and cloned into pFastBacHTb vector and a recombinant Gn-bacmid was generated by Bac to Bac system. Results Primary, secondary, and 3D structure of CCHFV Gn were obtained and PCR reaction with M13 forward and reverse primers confirmed the generation of recombinant bacmid DNA harboring Gn coding region under polyhedron promoter. Conclusions Characterization of the detailed structure of CCHFV Gn by bioinformatics software provides the basis for development of new experiments and construction of a recombinant bacmid harboring CCHFV Gn, which is valuable for designing a recombinant vaccine against deadly pathogens like CCHFV. PMID:26862379

  9. Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community.

    PubMed

    Krampis, Konstantinos; Booth, Tim; Chapman, Brad; Tiwari, Bela; Bicak, Mesude; Field, Dawn; Nelson, Karen E

    2012-03-19

    A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.

  10. Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community

    PubMed Central

    2012-01-01

    Background A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Results Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Conclusions Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them. PMID:22429538

  11. Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

    PubMed Central

    Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas

    2014-01-01

    Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced protein and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of large amount of newly discovered proteins. The existing classification results are unsatisfactory due to a huge size of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727

  12. Bioinformatic perspectives on NRPS/PKS megasynthases: advances and challenges.

    PubMed

    Jenke-Kodama, Holger; Dittmann, Elke

    2009-07-01

    The increased understanding of both fundamental principles and mechanistic variations of NRPS/PKS megasynthases along with the unprecedented availability of microbial sequences has inspired a number of in silico studies of both enzyme families. The insights that can be extracted from these analyses go far beyond a rough classification of data and have turned bioinformatics into a frontier field of natural products research. As databases are flooded with NRPS/PKS gene sequence of microbial genomes and metagenomes, increasingly reliable structural prediction methods can help to uncover hidden treasures. Already, phylogenetic analyses have revealed that NRPS/PKS pathways should not simply be regarded as enzyme complexes, specifically evolved to product a selected natural product. Rather, they represent a collection of genetic opinions, allowing biosynthetic pathways to be shuffled in a process of perpetual chemical innovations and pathways diversification in nature can give impulses for specificities, protein interactions and genetic engineering of libraries of novel peptides and polyketides. The successful translation of the knowledge obtained from bioinformatic dissection of NRPS/PKS megasynthases into new techniques for drug discovery and design remain challenges for the future.

  13. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    NASA Astrophysics Data System (ADS)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Recently, bioinformatics has become a new field of science, indispensable in the analysis of millions of nucleic acids sequences, which are currently deposited in international databases (public or private); these databases contain information of genes, RNA, ORF, proteins, intergenic regions, including entire genomes from some species. The analysis of this information requires computer programs; which were renewed in the use of new mathematical methods, and the introduction of the use of artificial intelligence. In addition to the constant creation of supercomputing units trained to withstand the heavy workload of sequence analysis. However, it is still necessary the innovation on platforms that allow genomic analyses, faster and more effectively, with a technological understanding of all biological processes.

  14. Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

    PubMed

    Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

    2011-01-01

    The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggests that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.

  15. Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

    PubMed Central

    Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

    2011-01-01

    Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggests that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928

  16. Novel approaches to effects-based monitoring: 21st century tools for bio-effects prediction and surveillance

    EPA Science Inventory

    Effects-based monitoring (EBM) has been employed as a complement to chemical monitoring to help address knowledge gaps between chemical occurrence and biological effects. We have piloted several pathway-based approaches to EBM, that utilize modern bioinformatic and high throughpu...

  17. RNA-Rocket: an RNA-Seq analysis resource for infectious disease research

    PubMed Central

    Warren, Andrew S.; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I.; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B.; Wattam, Alice R.; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno

    2015-01-01

    Motivation: RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. Results: RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. Availability and implementation: RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. Contact: anwarren@vt.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:25573919

  18. RNA-Rocket: an RNA-Seq analysis resource for infectious disease research.

    PubMed

    Warren, Andrew S; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B; Wattam, Alice R; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno

    2015-05-01

    RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. anwarren@vt.edu Supplementary materials are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  19. Novel features and enhancements in BioBin, a tool for the biologically inspired binning and association analysis of rare variants

    PubMed Central

    Byrska-Bishop, Marta; Wallace, John; Frase, Alexander T; Ritchie, Marylyn D

    2018-01-01

    Abstract Motivation BioBin is an automated bioinformatics tool for the multi-level biological binning of sequence variants. Herein, we present a significant update to BioBin which expands the software to facilitate a comprehensive rare variant analysis and incorporates novel features and analysis enhancements. Results In BioBin 2.3, we extend our software tool by implementing statistical association testing, updating the binning algorithm, as well as incorporating novel analysis features providing for a robust, highly customizable, and unified rare variant analysis tool. Availability and implementation The BioBin software package is open source and freely available to users at http://www.ritchielab.com/software/biobin-download Contact mdritchie@geisinger.edu Supplementary information Supplementary data are available at Bioinformatics online. PMID:28968757

  20. Application of bioinformatics in chronobiology research.

    PubMed

    Lopes, Robson da Silva; Resende, Nathalia Maria; Honorio-França, Adenilda Cristina; França, Eduardo Luzía

    2013-01-01

    Bioinformatics and other well-established sciences, such as molecular biology, genetics, and biochemistry, provide a scientific approach for the analysis of data generated through "omics" projects that may be used in studies of chronobiology. The results of studies that apply these techniques demonstrate how they significantly aided the understanding of chronobiology. However, bioinformatics tools alone cannot eliminate the need for an understanding of the field of research or the data to be considered, nor can such tools replace analysts and researchers. It is often necessary to conduct an evaluation of the results of a data mining effort to determine the degree of reliability. To this end, familiarity with the field of investigation is necessary. It is evident that the knowledge that has been accumulated through chronobiology and the use of tools derived from bioinformatics has contributed to the recognition and understanding of the patterns and biological rhythms found in living organisms. The current work aims to develop new and important applications in the near future through chronobiology research.

  1. GOBLET: The Global Organisation for Bioinformatics Learning, Education and Training

    PubMed Central

    Atwood, Teresa K.; Bongcam-Rudloff, Erik; Brazas, Michelle E.; Corpas, Manuel; Gaudet, Pascale; Lewitter, Fran; Mulder, Nicola; Palagi, Patricia M.; Schneider, Maria Victoria; van Gelder, Celia W. G.

    2015-01-01

    In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy—paradoxically, many are actually closing “niche” bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all. PMID:25856076

  2. Perspective: Role of structure prediction in materials discovery and design

    NASA Astrophysics Data System (ADS)

    Needs, Richard J.; Pickard, Chris J.

    2016-05-01

    Materials informatics owes much to bioinformatics and the Materials Genome Initiative has been inspired by the Human Genome Project. But there is more to bioinformatics than genomes, and the same is true for materials informatics. Here we describe the rapidly expanding role of searching for structures of materials using first-principles electronic-structure methods. Structure searching has played an important part in unraveling structures of dense hydrogen and in identifying the record-high-temperature superconducting component in hydrogen sulfide at high pressures. We suggest that first-principles structure searching has already demonstrated its ability to determine structures of a wide range of materials and that it will play a central and increasing part in materials discovery and design.

  3. Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets.

    PubMed

    Rideout, Jai Ram; Chase, John H; Bolyen, Evan; Ackermann, Gail; González, Antonio; Knight, Rob; Caporaso, J Gregory

    2016-06-13

    Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis. We present Keemei, a Google Sheets Add-on, for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google's Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.

  4. MiR-181b targets Six2 and inhibits the proliferation of metanephric mesenchymal cells in vitro

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lyu, Zhongshi; Mao, Zhaomin; Wang, Honglian

    2013-11-01

    Highlights: •We do bio-informatics websites to analysis of Six2 3′UTR. •MiR181b is a putative miRNA which can targets Six2 3′UTR. •MiR-181b binding site in the 3′UTR of Six2 is functional. •MiR-181b suppresses MK3 cells cell proliferation by targeting Six2. -- Abstract: MicroRNAs (miRNAs) are small non-coding RNAs that down-regulate gene expression by binding to target mRNA for cleavage or translational repression, and play important regulatory roles in renal development. Despite increasing genes have been predicted to be miRNA targets by bioinformatic analysis during kidney development, few of them have been verified by experiment. The objective of our study is tomore » identify the miRNAs targeting Six2, a critical transcription factor that maintains the mesenchymal progenitor pool via self-renewal (proliferation) during renal development. We initially analyzed the 3′UTR of Six2 and found 37 binding sites targeted by 50 putative miRNAs in the 3′UTR of Six2. Among the 50 miRNAs, miR-181b is the miRNAs predicted by the three used websites. In our study, the results of luciferase reporter assay, realtime-PCR and Western blot demonstrated that miR-181b directly targeted on the 3′UTR of Six2 and down-regulate the expression of Six2 at mRNA and protein levels. Furthermore, EdU proliferation assay along with the Six2 rescue strategy showed that miR-181b suppresses the proliferation of metanephric mesenchymal by targeting Six2 in part. In our research, we concluded that by targeting the transcription factor gene Six2, miR-181b inhibits the proliferation of metanephric mesenchymal cells in vitro and might play an important role in the formation of nephrons.« less

  5. Mutational analysis of a predicted double β-propeller domain of the DspA/E effector of Erwinia amylovora.

    PubMed

    Siamer, Sabrina; Gaubert, Stéphane; Boureau, Tristan; Brisset, Marie-Noëlle; Barny, Marie-Anne

    2013-05-01

    The bacterium Erwinia amylovora causes fire blight, an invasive disease that threatens apple trees, pear trees and other plants of the Rosaceae family. Erwinia amylovora pathogenicity relies on a type III secretion system and on a single effector DspA/E. This effector belongs to the widespread AvrE family of effectors whose biological function is unknown. In this manuscript, we performed a bioinformatic analysis of DspA/E- and AvrE-related effectors. Motif search identified nuclear localization signals, peroxisome targeting signals, endoplasmic reticulum membrane retention signals and leucine zipper motifs, but none of these motifs were present in all the AvrE-related effectors analysed. Protein threading analysis, however, predicted a conserved double β-propeller domain in the N-terminal part of all the analysed effector sequences. We then performed a random pentapeptide mutagenesis of DspA/E, which led to the characterization of 13 new altered proteins with a five amino acids insertion. Eight harboured the insertion inside the predicted β-propeller domain and six of these eight insertions impaired DspA/E stability or function. Conversely, the two remaining insertions generated proteins that were functional and abundantly secreted in the supernatant suggesting that these two insertions stabilized the protein. © 2013 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.

  6. The PathoYeastract database: an information system for the analysis of gene and genomic transcription regulation in pathogenic yeasts.

    PubMed

    Monteiro, Pedro Tiago; Pais, Pedro; Costa, Catarina; Manna, Sauvagya; Sá-Correia, Isabel; Teixeira, Miguel Cacho

    2017-01-04

    We present the PATHOgenic YEAst Search for Transcriptional Regulators And Consensus Tracking (PathoYeastract - http://pathoyeastract.org) database, a tool for the analysis and prediction of transcription regulatory associations at the gene and genomic levels in the pathogenic yeasts Candida albicans and C. glabrata Upon data retrieval from hundreds of publications, followed by curation, the database currently includes 28 000 unique documented regulatory associations between transcription factors (TF) and target genes and 107 DNA binding sites, considering 134 TFs in both species. Following the structure used for the YEASTRACT database, PathoYeastract makes available bioinformatics tools that enable the user to exploit the existing information to predict the TFs involved in the regulation of a gene or genome-wide transcriptional response, while ranking those TFs in order of their relative importance. Each search can be filtered based on the selection of specific environmental conditions, experimental evidence or positive/negative regulatory effect. Promoter analysis tools and interactive visualization tools for the representation of TF regulatory networks are also provided. The PathoYeastract database further provides simple tools for the prediction of gene and genomic regulation based on orthologous regulatory associations described for other yeast species, a comparative genomics setup for the study of cross-species evolution of regulatory networks. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  7. Studying human disease genes in Caenorhabditis elegans: a molecular genetics laboratory project.

    PubMed

    Cox-Paulson, Elisabeth A; Grana, Theresa M; Harris, Michelle A; Batzli, Janet M

    2012-01-01

    Scientists routinely integrate information from various channels to explore topics under study. We designed a 4-wk undergraduate laboratory module that used a multifaceted approach to study a question in molecular genetics. Specifically, students investigated whether Caenorhabditis elegans can be a useful model system for studying genes associated with human disease. In a large-enrollment, sophomore-level laboratory course, groups of three to four students were assigned a gene associated with either breast cancer (brc-1), Wilson disease (cua-1), ovarian dysgenesis (fshr-1), or colon cancer (mlh-1). Students compared observable phenotypes of wild-type C. elegans and C. elegans with a homozygous deletion in the assigned gene. They confirmed the genetic deletion with nested polymerase chain reaction and performed a bioinformatics analysis to predict how the deletion would affect the encoded mRNA and protein. Students also performed RNA interference (RNAi) against their assigned gene and evaluated whether RNAi caused a phenotype similar to that of the genetic deletion. As a capstone activity, students prepared scientific posters in which they presented their data, evaluated whether C. elegans was a useful model system for studying their assigned genes, and proposed future directions. Assessment showed gains in understanding genotype versus phenotype, RNAi, common bioinformatics tools, and the utility of model organisms.

  8. Studying Human Disease Genes in Caenorhabditis elegans: A Molecular Genetics Laboratory Project

    PubMed Central

    Cox-Paulson, Elisabeth A.; Grana, Theresa M.; Harris, Michelle A.; Batzli, Janet M.

    2012-01-01

    Scientists routinely integrate information from various channels to explore topics under study. We designed a 4-wk undergraduate laboratory module that used a multifaceted approach to study a question in molecular genetics. Specifically, students investigated whether Caenorhabditis elegans can be a useful model system for studying genes associated with human disease. In a large-enrollment, sophomore-level laboratory course, groups of three to four students were assigned a gene associated with either breast cancer (brc-1), Wilson disease (cua-1), ovarian dysgenesis (fshr-1), or colon cancer (mlh-1). Students compared observable phenotypes of wild-type C. elegans and C. elegans with a homozygous deletion in the assigned gene. They confirmed the genetic deletion with nested polymerase chain reaction and performed a bioinformatics analysis to predict how the deletion would affect the encoded mRNA and protein. Students also performed RNA interference (RNAi) against their assigned gene and evaluated whether RNAi caused a phenotype similar to that of the genetic deletion. As a capstone activity, students prepared scientific posters in which they presented their data, evaluated whether C. elegans was a useful model system for studying their assigned genes, and proposed future directions. Assessment showed gains in understanding genotype versus phenotype, RNAi, common bioinformatics tools, and the utility of model organisms. PMID:22665589

  9. Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization.

    PubMed

    Jung, Sang-Kyu; McDonald, Karen

    2011-08-16

    Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net.

  10. Novel therapeutic effects of sesamin on diabetes-induced cardiac dysfunction.

    PubMed

    Thuy, Tran Duong; Phan, Nam Nhut; Wang, Chih-Yang; Yu, Han-Gang; Wang, Shu-Yin; Huang, Pung-Ling; Do, Yi-Yin; Lin, Yen-Chang

    2017-05-01

    Diabetes is a risk factor that increases the occurrence and severity of cardiovascular events. Cardiovascular complications are the leading cause of mortality of 75% of patients with diabetes >40 years old. Sesamin, the bioactive compound extracted from Sesamum indicum, is a natural compound that has diverse beneficial effects on hypoglycemia and reducing cholesterol. The aim of this study is to investigate sesamin effects to diabetes-inducing cardiac hypertrophy. In the present study bioinformatics analysis demonstrated cardiac hypertrophy signaling may be the most important pathway for upregulating genes in sesamin-treated groups. To verify the bioinformatics prediction, sesamin was used as the main bioactive compound to attenuate the impact of diabetes induced by streptozotocin (STZ) on cardiac function in a rat model. The results revealed that oral administration of sesamin for 4 weeks (100 and 200 mg/kg body weight) marginally improved blood glucose levels, body weight and significantly ameliorated the effects on heart rate and blood pressure in rats with type 1 diabetes relative to control rats. The QT interval of sesamin was also reduced relative to the control group. The findings indicated that sesamin has potential cardioprotective effects in the STZ-induced diabetes model. This suggested that this can be used as a novel treatment for patients with diabetes with cardiac dysfunction complication.

  11. Novel therapeutic effects of sesamin on diabetes-induced cardiac dysfunction

    PubMed Central

    Thuy, Tran Duong; Phan, Nam Nhut; Wang, Chih-Yang; Yu, Han-Gang; Wang, Shu-Yin; Huang, Pung-Ling; Do, Yi-Yin; Lin, Yen-Chang

    2017-01-01

    Diabetes is a risk factor that increases the occurrence and severity of cardiovascular events. Cardiovascular complications are the leading cause of mortality of 75% of patients with diabetes >40 years old. Sesamin, the bioactive compound extracted from Sesamum indicum, is a natural compound that has diverse beneficial effects on hypoglycemia and reducing cholesterol. The aim of this study is to investigate sesamin effects to diabetes-inducing cardiac hypertrophy. In the present study bioinformatics analysis demonstrated cardiac hypertrophy signaling may be the most important pathway for upregulating genes in sesamin-treated groups. To verify the bioinformatics prediction, sesamin was used as the main bioactive compound to attenuate the impact of diabetes induced by streptozotocin (STZ) on cardiac function in a rat model. The results revealed that oral administration of sesamin for 4 weeks (100 and 200 mg/kg body weight) marginally improved blood glucose levels, body weight and significantly ameliorated the effects on heart rate and blood pressure in rats with type 1 diabetes relative to control rats. The QT interval of sesamin was also reduced relative to the control group. The findings indicated that sesamin has potential cardioprotective effects in the STZ-induced diabetes model. This suggested that this can be used as a novel treatment for patients with diabetes with cardiac dysfunction complication. PMID:28358428

  12. Deep Artificial Neural Networks and Neuromorphic Chips for Big Data Analysis: Pharmaceutical and Bioinformatics Applications.

    PubMed

    Pastur-Romay, Lucas Antón; Cedrón, Francisco; Pazos, Alejandro; Porto-Pazos, Ana Belén

    2016-08-11

    Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by the advancement in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphical processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field. An overview of the main architectures of DNNs, and their usefulness in Pharmacology and Bioinformatics are presented in this work. The featured applications are: drug design, virtual screening (VS), Quantitative Structure-Activity Relationship (QSAR) research, protein structure prediction and genomics (and other omics) data mining. The future need of neuromorphic hardware for DNNs is also discussed, and the two most advanced chips are reviewed: IBM TrueNorth and SpiNNaker. In addition, this review points out the importance of considering not only neurons, as DNNs and neuromorphic chips should also include glial cells, given the proven importance of astrocytes, a type of glial cell which contributes to information processing in the brain. The Deep Artificial Neuron-Astrocyte Networks (DANAN) could overcome the difficulties in architecture design, learning process and scalability of the current ML methods.

  13. Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization

    PubMed Central

    2011-01-01

    Background Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. Results The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Conclusion Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net. PMID:21846353

  14. Deep Artificial Neural Networks and Neuromorphic Chips for Big Data Analysis: Pharmaceutical and Bioinformatics Applications

    PubMed Central

    Pastur-Romay, Lucas Antón; Cedrón, Francisco; Pazos, Alejandro; Porto-Pazos, Ana Belén

    2016-01-01

    Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by the advancement in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphical processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field. An overview of the main architectures of DNNs, and their usefulness in Pharmacology and Bioinformatics are presented in this work. The featured applications are: drug design, virtual screening (VS), Quantitative Structure–Activity Relationship (QSAR) research, protein structure prediction and genomics (and other omics) data mining. The future need of neuromorphic hardware for DNNs is also discussed, and the two most advanced chips are reviewed: IBM TrueNorth and SpiNNaker. In addition, this review points out the importance of considering not only neurons, as DNNs and neuromorphic chips should also include glial cells, given the proven importance of astrocytes, a type of glial cell which contributes to information processing in the brain. The Deep Artificial Neuron–Astrocyte Networks (DANAN) could overcome the difficulties in architecture design, learning process and scalability of the current ML methods. PMID:27529225

  15. Bioinformatics analysis of the oxidosqualene cyclase gene and the amino acid sequence in mangrove plants

    NASA Astrophysics Data System (ADS)

    Basyuni, M.; Wati, R.

    2017-01-01

    This study described the bioinformatics methods to analyze seven oxidosqualene cyclase (OSC) genes from mangrove plants on DDBJ/EMBL/GenBank as well as predicted the structure, composition, similarity, subcellular localization and phylogenetic. The physical and chemical properties of seven mangrove OSC showed variation among the genes. The percentage of the secondary structure of seven mangrove OSC genes followed the order of a helix > random coil > extended chain structure. The values of chloroplast or signal peptide were too low, indicated that no chloroplast transit peptide or signal peptide of secretion pathway in mangrove OSC genes. The target peptide value of mitochondria varied from 0.163 to 0.430, indicated it was possible to exist. These results suggested the importance of understanding the diversity and functional of properties of the different amino acids in mangrove OSC genes. To clarify the relationship among the mangrove OSC gene, a phylogenetic tree was constructed. The phylogenetic tree shows that there are three clusters, Kandelia KcMS join with Bruguiera BgLUS, Rhizophora RsM1 was close to Bruguiera BgbAS, and Rhizophora RcCAS join with Kandelia KcCAS. The present study, therefore, supported the previous results that plant OSC genes form distinct clusters in the tree.

  16. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data.

    PubMed

    Nordberg, Henrik; Bhatia, Karan; Wang, Kai; Wang, Zhong

    2013-12-01

    The recent revolution in sequencing technologies has led to an exponential growth of sequence data. As a result, most of the current bioinformatics tools become obsolete as they fail to scale with data. To tackle this 'data deluge', here we introduce the BioPig sequence analysis toolkit as one of the solutions that scale to data and computation. We built BioPig on the Apache's Hadoop MapReduce system and the Pig data flow language. Compared with traditional serial and MPI-based algorithms, BioPig has three major advantages: first, BioPig's programmability greatly reduces development time for parallel bioinformatics applications; second, testing BioPig with up to 500 Gb sequences demonstrates that it scales automatically with size of data; and finally, BioPig can be ported without modification on many Hadoop infrastructures, as tested with Magellan system at National Energy Research Scientific Computing Center and the Amazon Elastic Compute Cloud. In summary, BioPig represents a novel program framework with the potential to greatly accelerate data-intensive bioinformatics analysis.

  17. CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications.

    PubMed

    Lei, Guoqing; Dou, Yong; Wan, Wen; Xia, Fei; Li, Rongchun; Ma, Meng; Zou, Dan

    2012-01-01

    Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular methods of free energy minimization for RNA secondary structure prediction. Thus far, few studies have been reported on the acceleration of the Zuker algorithm on general-purpose processors or on extra accelerators such as Field Programmable Gate-Array (FPGA) and Graphics Processing Units (GPU). To the best of our knowledge, no implementation combines both CPU and extra accelerators, such as GPUs, to accelerate the Zuker algorithm applications. In this paper, a CPU-GPU hybrid computing system that accelerates Zuker algorithm applications for RNA secondary structure prediction is proposed. The computing tasks are allocated between CPU and GPU for parallel cooperate execution. Performance differences between the CPU and the GPU in the task-allocation scheme are considered to obtain workload balance. To improve the hybrid system performance, the Zuker algorithm is optimally implemented with special methods for CPU and GPU architecture. Speedup of 15.93× over optimized multi-core SIMD CPU implementation and performance advantage of 16% over optimized GPU implementation are shown in the experimental results. More than 14% of the sequences are executed on CPU in the hybrid system. The system combining CPU and GPU to accelerate the Zuker algorithm is proven to be promising and can be applied to other bioinformatics applications.

  18. Xylem sap proteomics.

    PubMed

    de Bernonville, Thomas Dugé; Albenne, Cécile; Arlat, Matthieu; Hoffmann, Laurent; Lauber, Emmanuelle; Jamet, Elisabeth

    2014-01-01

    Proteomic analysis of xylem sap has recently become a major field of interest to understand several biological questions related to plant development and responses to environmental clues. The xylem sap appears as a dynamic fluid undergoing changes in its proteome upon abiotic and biotic stresses. Unlike cell compartments which are amenable to purification in sufficient amount prior to proteomic analysis, the xylem sap has to be collected in particular conditions to avoid contamination by intracellular proteins and to obtain enough material. A model plant like Arabidopsis thaliana is not suitable for such an analysis because efficient harvesting of xylem sap is difficult. The analysis of the xylem sap proteome also requires specific procedures to concentrate proteins and to focus on proteins predicted to be secreted. Indeed, xylem sap proteins appear to be synthesized and secreted in the root stele or to originate from dying differentiated xylem cells. This chapter describes protocols to collect xylem sap from Brassica species and to prepare total and N-glycoprotein extracts for identification of proteins by mass spectrometry analyses and bioinformatics.

  19. XLinkDB 2.0: integrated, large-scale structural analysis of protein crosslinking data

    PubMed Central

    Schweppe, Devin K.; Zheng, Chunxiang; Chavez, Juan D.; Navare, Arti T.; Wu, Xia; Eng, Jimmy K.; Bruce, James E.

    2016-01-01

    Motivation: Large-scale chemical cross-linking with mass spectrometry (XL-MS) analyses are quickly becoming a powerful means for high-throughput determination of protein structural information and protein–protein interactions. Recent studies have garnered thousands of cross-linked interactions, yet the field lacks an effective tool to compile experimental data or access the network and structural knowledge for these large scale analyses. We present XLinkDB 2.0 which integrates tools for network analysis, Protein Databank queries, modeling of predicted protein structures and modeling of docked protein structures. The novel, integrated approach of XLinkDB 2.0 enables the holistic analysis of XL-MS protein interaction data without limitation to the cross-linker or analytical system used for the analysis. Availability and Implementation: XLinkDB 2.0 can be found here, including documentation and help: http://xlinkdb.gs.washington.edu/. Contact: jimbruce@uw.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153666

  20. Bioinformatics: indispensable, yet hidden in plain sight?

    PubMed

    Bartlett, Andrew; Penders, Bart; Lewis, Jamie

    2017-06-21

    Bioinformatics has multitudinous identities, organisational alignments and disciplinary links. This variety allows bioinformaticians and bioinformatic work to contribute to much (if not most) of life science research in profound ways. The multitude of bioinformatic work also translates into a multitude of credit-distribution arrangements, apparently dismissing that work. We report on the epistemic and social arrangements that characterise the relationship between bioinformatics and life science. We describe, in sociological terms, the character, power and future of bioinformatic work. The character of bioinformatic work is such that its cultural, institutional and technical structures allow for it to be black-boxed easily. The result is that bioinformatic expertise and contributions travel easily and quickly, yet remain largely uncredited. The power of bioinformatic work is shaped by its dependency on life science work, which combined with the black-boxed character of bioinformatic expertise further contributes to situating bioinformatics on the periphery of the life sciences. Finally, the imagined futures of bioinformatic work suggest that bioinformatics will become ever more indispensable without necessarily becoming more visible, forcing bioinformaticians into difficult professional and career choices. Bioinformatic expertise and labour is epistemically central but often institutionally peripheral. In part, this is a result of the ways in which the character, power distribution and potential futures of bioinformatics are constituted. However, alternative paths can be imagined.

  1. A novel pathway for the biosynthesis of heme in Archaea: genome-based bioinformatic predictions and experimental evidence.

    PubMed

    Storbeck, Sonja; Rolfes, Sarah; Raux-Deery, Evelyne; Warren, Martin J; Jahn, Dieter; Layer, Gunhild

    2010-12-13

    Heme is an essential prosthetic group for many proteins involved in fundamental biological processes in all three domains of life. In Eukaryota and Bacteria heme is formed via a conserved and well-studied biosynthetic pathway. Surprisingly, in Archaea heme biosynthesis proceeds via an alternative route which is poorly understood. In order to formulate a working hypothesis for this novel pathway, we searched 59 completely sequenced archaeal genomes for the presence of gene clusters consisting of established heme biosynthetic genes and colocalized conserved candidate genes. Within the majority of archaeal genomes it was possible to identify such heme biosynthesis gene clusters. From this analysis we have been able to identify several novel heme biosynthesis genes that are restricted to archaea. Intriguingly, several of the encoded proteins display similarity to enzymes involved in heme d(1) biosynthesis. To initiate an experimental verification of our proposals two Methanosarcina barkeri proteins predicted to catalyze the initial steps of archaeal heme biosynthesis were recombinantly produced, purified, and their predicted enzymatic functions verified.

  2. Prognostic value of DNA repair based stratification of hepatocellular carcinoma

    PubMed Central

    Lin, Zhuo; Xu, Shi-Hao; Wang, Hai-Qing; Cai, Yi-Jing; Ying, Li; Song, Mei; Wang, Yu-Qun; Du, Shan-Jie; Shi, Ke-Qing; Zhou, Meng-Tao

    2016-01-01

    Aberrant activation of DNA repair is frequently associated with tumor progression and response to therapy in hepatocellular carcinoma (HCC). Bioinformatics analyses of HCC data in the Cancer Genome Atlas (TCGA) were performed to define DNA repair based molecular classification that could predict the prognosis of patients with HCC. Furthermore, we tested its predictive performance in 120 independent cases. Four molecular subgroups were identified on the basis of coordinate DNA repair cluster (CDRC) comprising 15 genes in TCGA dataset. Increasing expression of CDRC genes were significantly associated with TP53 mutation. High CDRC was significantly correlated with advanced tumor grades, advanced pathological stage and increased vascular invasion rate. Multivariate Cox regression analysis indicated that the molecular subgrouping was an independent prognostic parameter for both overall survival (p = 0.004, hazard ratio (HR): 2.989) and tumor-free survival (p = 0.049, HR: 3.366) in TCGA dataset. Similar results were also obtained by analyzing the independent cohort. These data suggest that distinct dysregulation of DNA repair constituents based molecular classes in HCC would be useful for predicting prognosis and designing clinical trials for targeted therapy. PMID:27174663

  3. Identification, cloning and sequencing of Escherichia coli strain chi1378 (O78:K80) iss gene isolated from poultry colibacillosis in Iran.

    PubMed

    Derakhshandeh, A; Zahraei Salehi, T; Tadjbakhsh, H; Karimi, V

    2009-09-01

    To identify, clone and sequence the iss (increased serum survival) gene from E. coli strain chi1378 isolated from Iranian poultry and to predict its protein product, Iss. The iss gene from E. coli strain chi1378 was amplified and cloned into the pTZ57R/T vector and sequenced. From the DNA sequence, the Iss predictive protein was evaluated using bioinformatics. Iss from strain chi1378 had 100% identity with other E. coli serotypes and isolates from different origins and also 98% identity with E. coli O157:H7 Iss protein. Phylogenetic analysis showed no significant different phylogenic groups among E. coli strains. The strong association of predicted Iss protein among different E. coli strains suggests that it could be a good antigen to control and detect avian pathogenic E. coli (APEC). Because the exact pathogenesis and the role of virulence factors are unknown, the Iss protein could be used as a target for vaccination in the future, but further research is required.

  4. A Novel Pathway for the Biosynthesis of Heme in Archaea: Genome-Based Bioinformatic Predictions and Experimental Evidence

    PubMed Central

    Storbeck, Sonja; Rolfes, Sarah; Raux-Deery, Evelyne; Warren, Martin J.; Jahn, Dieter; Layer, Gunhild

    2010-01-01

    Heme is an essential prosthetic group for many proteins involved in fundamental biological processes in all three domains of life. In Eukaryota and Bacteria heme is formed via a conserved and well-studied biosynthetic pathway. Surprisingly, in Archaea heme biosynthesis proceeds via an alternative route which is poorly understood. In order to formulate a working hypothesis for this novel pathway, we searched 59 completely sequenced archaeal genomes for the presence of gene clusters consisting of established heme biosynthetic genes and colocalized conserved candidate genes. Within the majority of archaeal genomes it was possible to identify such heme biosynthesis gene clusters. From this analysis we have been able to identify several novel heme biosynthesis genes that are restricted to archaea. Intriguingly, several of the encoded proteins display similarity to enzymes involved in heme d 1 biosynthesis. To initiate an experimental verification of our proposals two Methanosarcina barkeri proteins predicted to catalyze the initial steps of archaeal heme biosynthesis were recombinantly produced, purified, and their predicted enzymatic functions verified. PMID:21197080

  5. ProMateus—an open research approach to protein-binding sites analysis

    PubMed Central

    Neuvirth, Hani; Heinemann, Uri; Birnbaum, David; Tishby, Naftali; Schreiber, Gideon

    2007-01-01

    The development of bioinformatic tools by individual labs results in the abundance of parallel programs for the same task. For example, identification of binding site regions between interacting proteins is done using: ProMate, WHISCY, PPI-Pred, PINUP and others. All servers first identify unique properties of binding sites and then incorporate them into a predictor. Obviously, the resulting prediction would improve if the most suitable parameters from each of those predictors would be incorporated into one server. However, because of the variation in methods and databases, this is currently not feasible. Here, the protein-binding site prediction server is extended into a general protein-binding sites research tool, ProMateus. This web tool, based on ProMate's infrastructure enables the easy exploration and incorporation of new features and databases by the user, providing an evaluation of the benefit of individual features and their combination within a set framework. This transforms the individual research into a community exercise, bringing out the best from all users for optimized predictions. The analysis is demonstrated on a database of protein protein and protein-DNA interactions. This approach is basically different from that used in generating meta-servers. The implications of the open-research approach are discussed. ProMateus is available at http://bip.weizmann.ac.il/promate. PMID:17488838

  6. NiaoDuQing granules relieve chronic kidney disease symptoms by decreasing renal fibrosis and anemia

    PubMed Central

    Wang, Xu; Yu, Suyun; Jia, Qi; Chen, Lichuan; Zhong, Jinqiu; Pan, Yanhong; Shen, Peiliang; Shen, Yin; Wang, Siliang; Wei, Zhonghong; Cao, Yuzhu; Lu, Yin

    2017-01-01

    NiaoDuQing (NDQ) granules, a traditional Chinese medicine, has been clinically used in China for over fourteen years to treat chronic kidney disease (CKD). To elucidate the mechanisms underlying the therapeutic benefits of NDQ, we designed an approach incorporating chemoinformatics, bioinformatics, network biology methods, and cellular and molecular biology experiments. A total of 182 active compounds were identified in NDQ granules, and 397 putative targets associated with different diseases were derived through ADME modelling and target prediction tools. Protein-protein interaction networks of CKD-related and putative NDQ targets were constructed, and 219 candidate targets were identified based on topological features. Pathway enrichment analysis showed that the candidate targets were mostly related to the TGF-β, the p38MAPK, and the erythropoietin (EPO) receptor signaling pathways, which are known contributors to renal fibrosis and/or renal anemia. A rat model of CKD was established to validate the drug-target mechanisms predicted by the systems pharmacology analysis. Experimental results confirmed that NDQ granules exerted therapeutic effects on CKD and its comorbidities, including renal anemia, mainly by modulating the TGF-β and EPO signaling pathways. Thus, the pharmacological actions of NDQ on CKD symptoms correlated well with in silico predictions. PMID:28915563

  7. Protein function prediction using neighbor relativity in protein-protein interaction network.

    PubMed

    Moosavi, Sobhan; Rahgozar, Masoud; Rahimi, Amir

    2013-04-01

    There is a large gap between the number of discovered proteins and the number of functionally annotated ones. Due to the high cost of determining protein function by wet-lab research, function prediction has become a major task for computational biology and bioinformatics. Some researches utilize the proteins interaction information to predict function for un-annotated proteins. In this paper, we propose a novel approach called "Neighbor Relativity Coefficient" (NRC) based on interaction network topology which estimates the functional similarity between two proteins. NRC is calculated for each pair of proteins based on their graph-based features including distance, common neighbors and the number of paths between them. In order to ascribe function to an un-annotated protein, NRC estimates a weight for each neighbor to transfer its annotation to the unknown protein. Finally, the unknown protein will be annotated by the top score transferred functions. We also investigate the effect of using different coefficients for various types of functions. The proposed method has been evaluated on Saccharomyces cerevisiae and Homo sapiens interaction networks. The performance analysis demonstrates that NRC yields better results in comparison with previous protein function prediction approaches that utilize interaction network. Copyright © 2012 Elsevier Ltd. All rights reserved.

  8. GAPIT: genome association and prediction integrated tool.

    PubMed

    Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu

    2012-09-15

    Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. http://www.maizegenetics.net/GAPIT. zhiwu.zhang@cornell.edu Supplementary data are available at Bioinformatics online.

  9. Bellman’s GAP—a language and compiler for dynamic programming in sequence analysis

    PubMed Central

    Sauthoff, Georg; Möhl, Mathias; Janssen, Stefan; Giegerich, Robert

    2013-01-01

    Motivation: Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error prone and tedious. Bellman’s GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. Results: In Bellman’s GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive to carefully hand-crafted implementations. This article introduces the Bellman’s GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman’s GAP as an implementation platform of ‘real-world’ bioinformatics tools. Availability: Bellman’s GAP is available under GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics. Contact: robert@techfak.uni-bielefeld.de Supplementary information: Supplementary data are available at Bioinformatics online PMID:23355290

  10. Broad issues to consider for library involvement in bioinformatics*

    PubMed Central

    Geer, Renata C.

    2006-01-01

    Background: The information landscape in biological and medical research has grown far beyond literature to include a wide variety of databases generated by research fields such as molecular biology and genomics. The traditional role of libraries to collect, organize, and provide access to information can expand naturally to encompass these new data domains. Methods: This paper discusses the current and potential role of libraries in bioinformatics using empirical evidence and experience from eleven years of work in user services at the National Center for Biotechnology Information. Findings: Medical and science libraries over the last decade have begun to establish educational and support programs to address the challenges users face in the effective and efficient use of a plethora of molecular biology databases and retrieval and analysis tools. As more libraries begin to establish a role in this area, the issues they face include assessment of user needs and skills, identification of existing services, development of plans for new services, recruitment and training of specialized staff, and establishment of collaborations with bioinformatics centers at their institutions. Conclusions: Increasing library involvement in bioinformatics can help address information needs of a broad range of students, researchers, and clinicians and ultimately help realize the power of bioinformatics resources in making new biological discoveries. PMID:16888662

  11. Open Reading Frame Phylogenetic Analysis on the Cloud

    PubMed Central

    2013-01-01

    Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus. PMID:23671843

  12. SoS Notebook: An Interactive Multi-Language Data Analysis Environment.

    PubMed

    Peng, Bo; Wang, Gao; Ma, Jun; Leong, Man Chong; Wakefield, Chris; Melott, James; Chiu, Yulun; Du, Di; Weinstein, John N

    2018-05-22

    Complex bioinformatic data analysis workflows involving multiple scripts in different languages can be difficult to consolidate, share, and reproduce. An environment that streamlines the entire processes of data collection, analysis, visualization and reporting of such multi-language analyses is currently lacking. We developed Script of Scripts (SoS) Notebook, a web-based notebook environment that allows the use of multiple scripting language in a single notebook, with data flowing freely within and across languages. SoS Notebook enables researchers to perform sophisticated bioinformatic analysis using the most suitable tools for different parts of the workflow, without the limitations of a particular language or complications of cross-language communications. SoS Notebook is hosted at http://vatlab.github.io/SoS/ and is distributed under a BSD license. bpeng@mdanderson.org.

  13. Extensive complementarity between gene function prediction methods.

    PubMed

    Vidulin, Vedrana; Šmuc, Tomislav; Supek, Fran

    2016-12-01

    The number of sequenced genomes rises steadily but we still lack the knowledge about the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesized that AFP approaches that draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systematic analysis of the benefits gained by obtaining and integrating such predictions. Our pipeline amalgamates 5 133 543 genes from 2071 genomes in a single massive analysis that evaluates five established genomic AFP methodologies. While 1227 Gene Ontology (GO) terms yielded reliable predictions, the majority of these functions were accessible to only one or two of the methods. Moreover, different methods tend to assign a GO term to non-overlapping sets of genes. Thus, inferences made by diverse genomic AFP methods display a striking complementary, both gene-wise and function-wise. Because of this, a viable integration strategy is to rely on a single most-confident prediction per gene/function, rather than enforcing agreement across multiple AFP methods. Using an information-theoretic approach, we estimate that current databases contain 29.2 bits/gene of known Escherichia coli gene functions. This can be increased by up to 5.5 bits/gene using individual AFP methods or by 11 additional bits/gene upon integration, thereby providing a highly-ranking predictor on the Critical Assessment of Function Annotation 2 community benchmark. Availability of more sequenced genomes boosts the predictive accuracy of AFP approaches and also the benefit from integrating them. The individual and integrated GO predictions for the complete set of genes are available from http://gorbi.irb.hr/ CONTACT: fran.supek@irb.hrSupplementary information: Supplementary materials are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  14. A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    PubMed

    Thakur, Shalabh; Guttman, David S

    2016-06-30

    Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are number of excellent tools developed for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly since the homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package accompanies script for automated installation of necessary external programs on Ubuntu Linux; however, the pipeline should be also compatible with other Linux and Unix systems after necessary external programs are installed. DeNoGAP is freely available at https://sourceforge.net/projects/denogap/ .

  15. Bioinformatics: Current Practice and Future Challenges for Life Science Education

    ERIC Educational Resources Information Center

    Hack, Catherine; Kendall, Gary

    2005-01-01

    It is widely predicted that the application of high-throughput technologies to the quantification and identification of biological molecules will cause a paradigm shift in the life sciences. However, if the biosciences are to evolve from a predominantly descriptive discipline to an information science, practitioners will require enhanced skills in…

  16. Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.

    PubMed

    Mørk, Søren; Holmes, Ian

    2012-03-01

    Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures are best performing in terms of statistical information criteria or prediction performances, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.

  17. Bioinformatic analyses to select phenotype affecting polymorphisms in HTR2C gene.

    PubMed

    Piva, Francesco; Giulietti, Matteo; Baldelli, Luisa; Nardi, Bernardo; Bellantuono, Cesario; Armeni, Tatiana; Saccucci, Franca; Principato, Giovanni

    2011-08-01

    Single nucleotide polymorphisms (SNPs) in serotonin related genes influence mental disorders, responses to pharmacological and psychotherapeutic treatments. In planning association studies, researchers that want to investigate new SNPs have to select some among a large number of candidates. Our aim is to guide researchers in the selection of the most likely phenotype affecting polymorphisms. Here, we studied serotonin receptor 2C (HTR2C) SNPs because, till now, only relatively few of about 2000 are investigated. We used the most updated and assessed bioinformatic tools to predict which variations can give rise to biological effects among 2450 HTR2C SNPs. We suggest 48 SNPs that are worth considering in future association studies in the field of psychiatry, psychology and pharmacogenomics. Moreover, our analyses point out the biological level probably affected, such as transcription, splicing, miRNA regulation and protein structure, thus allowing to suggest future molecular investigations. Although few association studies are available in literature, their results are in agreement with our predictions, showing that our selection methods can help to guide future association studies. Copyright © 2011 John Wiley & Sons, Ltd.

  18. Defining the Predicted Protein Secretome of the Fungal Wheat Leaf Pathogen Mycosphaerella graminicola

    PubMed Central

    Morais do Amaral, Alexandre; Antoniw, John; Rudd, Jason J.; Hammond-Kosack, Kim E.

    2012-01-01

    The Dothideomycete fungus Mycosphaerella graminicola is the causal agent of Septoria tritici blotch, a devastating disease of wheat leaves that causes dramatic decreases in yield. Infection involves an initial extended period of symptomless intercellular colonisation prior to the development of visible necrotic disease lesions. Previous functional genomics and gene expression profiling studies have implicated the production of secreted virulence effector proteins as key facilitators of the initial symptomless growth phase. In order to identify additional candidate virulence effectors, we re-analysed and catalogued the predicted protein secretome of M. graminicola isolate IPO323, which is currently regarded as the reference strain for this species. We combined several bioinformatic approaches in order to increase the probability of identifying truly secreted proteins with either a predicted enzymatic function or an as yet unknown function. An initial secretome of 970 proteins was predicted, whilst further stringent selection criteria predicted 492 proteins. Of these, 321 possess some functional annotation, the composition of which may reflect the strictly intercellular growth habit of this pathogen, leaving 171 with no functional annotation. This analysis identified a protein family encoding secreted peroxidases/chloroperoxidases (PF01328) which is expanded within all members of the family Mycosphaerellaceae. Further analyses were done on the non-annotated proteins for size and cysteine content (effector protein hallmarks), and then by studying the distribution of homologues in 17 other sequenced Dothideomycete fungi within an overall total of 91 predicted proteomes from fungal, oomycete and nematode species. This detailed M. graminicola secretome analysis provides the basis for further functional and comparative genomics studies. PMID:23236356

  19. Robust Bioinformatics Recognition with VLSI Biochip Microsystem

    NASA Technical Reports Server (NTRS)

    Lue, Jaw-Chyng L.; Fang, Wai-Chi

    2006-01-01

    A microsystem architecture for real-time, on-site, robust bioinformatic patterns recognition and analysis has been proposed. This system is compatible with on-chip DNA analysis means such as polymerase chain reaction (PCR)amplification. A corresponding novel artificial neural network (ANN) learning algorithm using new sigmoid-logarithmic transfer function based on error backpropagation (EBP) algorithm is invented. Our results show the trained new ANN can recognize low fluorescence patterns better than the conventional sigmoidal ANN does. A differential logarithmic imaging chip is designed for calculating logarithm of relative intensities of fluorescence signals. The single-rail logarithmic circuit and a prototype ANN chip are designed, fabricated and characterized.

  20. Practical applications of the bioinformatics toolbox for narrowing quantitative trait loci.

    PubMed

    Burgess-Herbert, Sarah L; Cox, Allison; Tsaih, Shirng-Wern; Paigen, Beverly

    2008-12-01

    Dissecting the genes involved in complex traits can be confounded by multiple factors, including extensive epistatic interactions among genes, the involvement of epigenetic regulators, and the variable expressivity of traits. Although quantitative trait locus (QTL) analysis has been a powerful tool for localizing the chromosomal regions underlying complex traits, systematically identifying the causal genes remains challenging. Here, through its application to plasma levels of high-density lipoprotein cholesterol (HDL) in mice, we demonstrate a strategy for narrowing QTL that utilizes comparative genomics and bioinformatics techniques. We show how QTL detected in multiple crosses are subjected to both combined cross analysis and haplotype block analysis; how QTL from one species are mapped to the concordant regions in another species; and how genomewide scans associating haplotype groups with their phenotypes can be used to prioritize the narrowed regions. Then we illustrate how these individual methods for narrowing QTL can be systematically integrated for mouse chromosomes 12 and 15, resulting in a significantly reduced number of candidate genes, often from hundreds to <10. Finally, we give an example of how additional bioinformatics resources can be combined with experiments to determine the most likely quantitative trait genes.

  1. Bioinformatics Pipelines for Targeted Resequencing and Whole-Exome Sequencing of Human and Mouse Genomes: A Virtual Appliance Approach for Instant Deployment

    PubMed Central

    Saeed, Isaam; Wong, Stephen Q.; Mar, Victoria; Goode, David L.; Caramia, Franco; Doig, Ken; Ryland, Georgina L.; Thompson, Ella R.; Hunter, Sally M.; Halgamuge, Saman K.; Ellul, Jason; Dobrovic, Alexander; Campbell, Ian G.; Papenfuss, Anthony T.; McArthur, Grant A.; Tothill, Richard W.

    2014-01-01

    Targeted resequencing by massively parallel sequencing has become an effective and affordable way to survey small to large portions of the genome for genetic variation. Despite the rapid development in open source software for analysis of such data, the practical implementation of these tools through construction of sequencing analysis pipelines still remains a challenging and laborious activity, and a major hurdle for many small research and clinical laboratories. We developed TREVA (Targeted REsequencing Virtual Appliance), making pre-built pipelines immediately available as a virtual appliance. Based on virtual machine technologies, TREVA is a solution for rapid and efficient deployment of complex bioinformatics pipelines to laboratories of all sizes, enabling reproducible results. The analyses that are supported in TREVA include: somatic and germline single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses such as pathway and significantly mutated genes analyses. TREVA is flexible and easy to use, and can be customised by Linux-based extensions if required. TREVA can also be deployed on the cloud (cloud computing), enabling instant access without investment overheads for additional hardware. TREVA is available at http://bioinformatics.petermac.org/treva/. PMID:24752294

  2. Ergatis: a web interface and scalable software system for bioinformatics workflows

    PubMed Central

    Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.

    2010-01-01

    Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634

  3. A computational chemistry perspective on the current status and future direction of hepatitis B antiviral drug discovery.

    PubMed

    Morgnanesi, Dante; Heinrichs, Eric J; Mele, Anthony R; Wilkinson, Sean; Zhou, Suzanne; Kulp, John L

    2015-11-01

    Computational chemical biology, applied to research on hepatitis B virus (HBV), has two major branches: bioinformatics (statistical models) and first-principle methods (molecular physics). While bioinformatics focuses on statistical tools and biological databases, molecular physics uses mathematics and chemical theory to study the interactions of biomolecules. Three computational techniques most commonly used in HBV research are homology modeling, molecular docking, and molecular dynamics. Homology modeling is a computational simulation to predict protein structure and has been used to construct conformers of the viral polymerase (reverse transcriptase domain and RNase H domain) and the HBV X protein. Molecular docking is used to predict the most likely orientation of a ligand when it is bound to a protein, as well as determining an energy score of the docked conformation. Molecular dynamics is a simulation that analyzes biomolecule motions and determines conformation and stability patterns. All of these modeling techniques have aided in the understanding of resistance mutations on HBV non-nucleos(t)ide reverse-transcriptase inhibitor binding. Finally, bioinformatics can be used to study the DNA and RNA protein sequences of viruses to both analyze drug resistance and to genotype the viral genomes. Overall, with these techniques, and others, computational chemical biology is becoming more and more necessary in hepatitis B research. This article forms part of a symposium in Antiviral Research on "An unfinished story: from the discovery of the Australia antigen to the development of new curative therapies for hepatitis B." Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Draft genome sequence of non-shiga toxin-producing Escherichia coli O157 NCCP15738.

    PubMed

    Kwon, Taesoo; Kim, Jung-Beom; Bak, Young-Seok; Yu, Young-Bin; Kwon, Ki Sung; Kim, Won; Cho, Seung-Hak

    2016-01-01

    The non-shiga toxin-producing Escherichia coli (non-STEC) O157 is a pathogenic strain that cause diarrhea but does not cause hemolytic-uremic syndrome, or hemorrhagic colitis. Here, we present the 5-Mb draft genome sequence of non-STEC O157 NCCP15738, which was isolated from the feces of a Korean patient with diarrhea, and describe its features and the structural basis for its genome evolution. A total of 565-Mbp paired-end reads were generated using the Illumina-HiSeq 2000 platform. The reads were assembled into 135 scaffolds throughout the de novo assembly. The assembled genome size of NCCP15738 was 5,005,278 bp with an N50 value of 142,450 bp and 50.65 % G+C content. Using Rapid Annotation using Subsystem Technology analysis, we predicted 4780 ORFs and 31 RNA genes. The evolutionary tree was inferred from multiple sequence alignment of 45 E. coli species. The most closely related neighbor of NCCP15738 indicated by whole-genome phylogeny was E. coli UMNK88, but that indicated by multilocus sequence analysis was E. coli DH1(ME8569). A comparison between the NCCP15738 genome and those of reference strains, E. coli K-12 substr. MG1655 and EHEC O157:H7 EDL933 by bioinformatics analyses revealed unique genes in NCCP15738 associated with lysis protein S, two-component signal transduction system, conjugation, the flagellum, nucleotide-binding proteins, and metal-ion binding proteins. Notably, NCCP15738 has a dual flagella system like that in Vibrio parahaemolyticus, Aeromonas spp., and Rhodospirillum centenum. The draft genome sequence and the results of bioinformatics analysis of NCCP15738 provide the basis for understanding the genomic evolution of this strain.

  5. Toward the Replacement of Animal Experiments through the Bioinformatics-driven Analysis of 'Omics' Data from Human Cell Cultures.

    PubMed

    Grafström, Roland C; Nymark, Penny; Hongisto, Vesa; Spjuth, Ola; Ceder, Rebecca; Willighagen, Egon; Hardy, Barry; Kaski, Samuel; Kohonen, Pekka

    2015-11-01

    This paper outlines the work for which Roland Grafström and Pekka Kohonen were awarded the 2014 Lush Science Prize. The research activities of the Grafström laboratory have, for many years, covered cancer biology studies, as well as the development and application of toxicity-predictive in vitro models to determine chemical safety. Through the integration of in silico analyses of diverse types of genomics data (transcriptomic and proteomic), their efforts have proved to fit well into the recently-developed Adverse Outcome Pathway paradigm. Genomics analysis within state-of-the-art cancer biology research and Toxicology in the 21st Century concepts share many technological tools. A key category within the Three Rs paradigm is the Replacement of animals in toxicity testing with alternative methods, such as bioinformatics-driven analyses of data obtained from human cell cultures exposed to diverse toxicants. This work was recently expanded within the pan-European SEURAT-1 project (Safety Evaluation Ultimately Replacing Animal Testing), to replace repeat-dose toxicity testing with data-rich analyses of sophisticated cell culture models. The aims and objectives of the SEURAT project have been to guide the application, analysis, interpretation and storage of 'omics' technology-derived data within the service-oriented sub-project, ToxBank. Particularly addressing the Lush Science Prize focus on the relevance of toxicity pathways, a 'data warehouse' that is under continuous expansion, coupled with the development of novel data storage and management methods for toxicology, serve to address data integration across multiple 'omics' technologies. The prize winners' guiding principles and concepts for modern knowledge management of toxicological data are summarised. The translation of basic discovery results ranged from chemical-testing and material-testing data, to information relevant to human health and environmental safety. 2015 FRAME.

  6. Differential protein-coding gene and long noncoding RNA expression in smoking-related lung squamous cell carcinoma.

    PubMed

    Li, Shicheng; Sun, Xiao; Miao, Shuncheng; Liu, Jia; Jiao, Wenjie

    2017-11-01

    Cigarette smoking is one of the greatest preventable risk factors for developing cancer, and most cases of lung squamous cell carcinoma (lung SCC) are associated with smoking. The pathogenesis mechanism of tumor progress is unclear. This study aimed to identify biomarkers in smoking-related lung cancer, including protein-coding gene, long noncoding RNA, and transcription factors. We selected and obtained messenger RNA microarray datasets and clinical data from the Gene Expression Omnibus database to identify gene expression altered by cigarette smoking. Integrated bioinformatic analysis was used to clarify biological functions of the identified genes, including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, the construction of a protein-protein interaction network, transcription factor, and statistical analyses. Subsequent quantitative real-time PCR was utilized to verify these bioinformatic analyses. Five hundred and ninety-eight differentially expressed genes and 21 long noncoding RNA were identified in smoking-related lung SCC. GO and KEGG pathway analysis showed that identified genes were enriched in the cancer-related functions and pathways. The protein-protein interaction network revealed seven hub genes identified in lung SCC. Several transcription factors and their binding sites were predicted. The results of real-time quantitative PCR revealed that AURKA and BIRC5 were significantly upregulated and LINC00094 was downregulated in the tumor tissues of smoking patients. Further statistical analysis indicated that dysregulation of AURKA, BIRC5, and LINC00094 indicated poor prognosis in lung SCC. Protein-coding genes AURKA, BIRC5, and LINC00094 could be biomarkers or therapeutic targets for smoking-related lung SCC. © 2017 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.

  7. Carbohydrate-active enzymes in Trichoderma harzianum: a bioinformatic analysis bioprospecting for key enzymes for the biofuels industry.

    PubMed

    Ferreira Filho, Jaire Alves; Horta, Maria Augusta Crivelente; Beloti, Lilian Luzia; Dos Santos, Clelton Aparecido; de Souza, Anete Pereira

    2017-10-12

    Trichoderma harzianum is used in biotechnology applications due to its ability to produce powerful enzymes for the conversion of lignocellulosic substrates into soluble sugars. Active enzymes involved in carbohydrate metabolism are defined as carbohydrate-active enzymes (CAZymes), and the most abundant family in the CAZy database is the glycoside hydrolases. The enzymes of this family play a fundamental role in the decomposition of plant biomass. In this study, the CAZymes of T. harzianum were identified and classified using bioinformatic approaches after which the expression profiles of all annotated CAZymes were assessed via RNA-Seq, and a phylogenetic analysis was performed. A total of 430 CAZymes (3.7% of the total proteins for this organism) were annotated in T. harzianum, including 259 glycoside hydrolases (GHs), 101 glycosyl transferases (GTs), 6 polysaccharide lyases (PLs), 22 carbohydrate esterases (CEs), 42 auxiliary activities (AAs) and 46 carbohydrate-binding modules (CBMs). Among the identified T. harzianum CAZymes, 47% were predicted to harbor a signal peptide sequence and were therefore classified as secreted proteins. The GH families were the CAZyme class with the greatest number of expressed genes, including GH18 (23 genes), GH3 (17 genes), GH16 (16 genes), GH2 (13 genes) and GH5 (12 genes). A phylogenetic analysis of the proteins in the AA9/GH61, CE5 and GH55 families showed high functional variation among the proteins. Identifying the main proteins used by T. harzianum for biomass degradation can ensure new advances in the biofuel production field. Herein, we annotated and characterized the expression levels of all of the CAZymes from T. harzianum, which may contribute to future studies focusing on the functional and structural characterization of the identified proteins.

  8. Integrated web visualizations for protein-protein interaction databases.

    PubMed

    Jeanquartier, Fleur; Jean-Quartier, Claire; Holzinger, Andreas

    2015-06-16

    Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has come up with a great amount of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse Bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. We selected M=10 out of N=53 resources supporting visualization, and we tested against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via web. The supplementary table can be accessed on http://tinyurl.com/PPI-DB-Comparison-2015. Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interaction. Study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. Identified challenges are data comprehensiveness, confidence, interactive feature and visualization maturing.

  9. Some of the most interesting CASP11 targets through the eyes of their authors

    PubMed Central

    Kryshtafovych, Andriy; Moult, John; Baslé, Arnaud; Burgin, Alex; Craig, Timothy K.; Edwards, Robert A.; Fass, Deborah; Hartmann, Marcus D.; Korycinski, Mateusz; Lewis, Richard J.; Lorimer, Donald; Lupas, Andrei N.; Newman, Janet; Peat, Thomas S.; Piepenbrink, Kurt H.; Prahlad, Janani; van Raaij, Mark J.; Rohwer, Forest; Segall, Anca M.; Seguritan, Victor; Sundberg, Eric J.; Singh, Abhimanyu K.; Wilson, Mark A.

    2015-01-01

    ABSTRACT The Critical Assessment of protein Structure Prediction (CASP) experiment would not have been possible without the prediction targets provided by the experimental structural biology community. In this article, selected crystallographers providing targets for the CASP11 experiment discuss the functional and biological significance of the target proteins, highlight their most interesting structural features, and assess whether these features were correctly reproduced in the predictions submitted to CASP11. Proteins 2016; 84(Suppl 1):34–50. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:26473983

  10. Model-driven discovery of underground metabolic functions in Escherichia coli.

    PubMed

    Guzmán, Gabriela I; Utrilla, José; Nurk, Sergey; Brunk, Elizabeth; Monk, Jonathan M; Ebrahim, Ali; Palsson, Bernhard O; Feist, Adam M

    2015-01-20

    Enzyme promiscuity toward substrates has been discussed in evolutionary terms as providing the flexibility to adapt to novel environments. In the present work, we describe an approach toward exploring such enzyme promiscuity in the space of a metabolic network. This approach leverages genome-scale models, which have been widely used for predicting growth phenotypes in various environments or following a genetic perturbation; however, these predictions occasionally fail. Failed predictions of gene essentiality offer an opportunity for targeting biological discovery, suggesting the presence of unknown underground pathways stemming from enzymatic cross-reactivity. We demonstrate a workflow that couples constraint-based modeling and bioinformatic tools with KO strain analysis and adaptive laboratory evolution for the purpose of predicting promiscuity at the genome scale. Three cases of genes that are incorrectly predicted as essential in Escherichia coli--aspC, argD, and gltA--are examined, and isozyme functions are uncovered for each to a different extent. Seven isozyme functions based on genetic and transcriptional evidence are suggested between the genes aspC and tyrB, argD and astC, gabT and puuE, and gltA and prpC. This study demonstrates how a targeted model-driven approach to discovery can systematically fill knowledge gaps, characterize underground metabolism, and elucidate regulatory mechanisms of adaptation in response to gene KO perturbations.

  11. Myocyte enhancer factor 2D provides a cross-talk between chronic inflammation and lung cancer.

    PubMed

    Zhu, Hai-Xing; Shi, Lin; Zhang, Yong; Zhu, Yi-Chun; Bai, Chun-Xue; Wang, Xiang-Dong; Zhou, Jie-Bai

    2017-03-24

    Lung cancer is the leading cause of cancer-related morbidity and mortality worldwide. Patients with chronic respiratory diseases, such as chronic obstructive pulmonary disease (COPD), are exposed to a higher risk of developing lung cancer. Chronic inflammation may play an important role in the lung carcinogenesis among those patients. The present study aimed at identifying candidate biomarker predicting lung cancer risk among patients with chronic respiratory diseases. We applied clinical bioinformatics tools to analyze different gene profile datasets with a special focus on screening the potential biomarker during chronic inflammation-lung cancer transition. Then we adopted an in vitro model based on LPS-challenged A549 cells to validate the biomarker through RNA-sequencing, quantitative real time polymerase chain reaction, and western blot analysis. Bioinformatics analyses of the 16 enrolled GSE datasets from Gene Expression Omnibus online database showed myocyte enhancer factor 2D (MEF2D) level significantly increased in COPD patients coexisting non-small-cell lung carcinoma (NSCLC). Inflammation challenge increased MEF2D expression in NSCLC cell line A549, associated with the severity of inflammation. Extracellular signal-regulated protein kinase inhibition could reverse the up-regulation of MEF2D in inflammation-activated A549. MEF2D played a critical role in NSCLC cell bio-behaviors, including proliferation, differentiation, and movement. Inflammatory conditions led to increased MEF2D expression, which might further contribute to the development of lung cancer through influencing cancer microenvironment and cell bio-behaviors. MEF2D might be a potential biomarker during chronic inflammation-lung cancer transition, predicting the risk of lung cancer among patients with chronic respiratory diseases.

  12. A Novel Framework for the Comparative Analysis of Biological Networks

    PubMed Central

    Pache, Roland A.; Aloy, Patrick

    2012-01-01

    Genome sequencing projects provide nearly complete lists of the individual components present in an organism, but reveal little about how they work together. Follow-up initiatives have deciphered thousands of dynamic and context-dependent interrelationships between gene products that need to be analyzed with novel bioinformatics approaches able to capture their complex emerging properties. Here, we present a novel framework for the alignment and comparative analysis of biological networks of arbitrary topology. Our strategy includes the prediction of likely conserved interactions, based on evolutionary distances, to counter the high number of missing interactions in the current interactome networks, and a fast assessment of the statistical significance of individual alignment solutions, which vastly increases its performance with respect to existing tools. Finally, we illustrate the biological significance of the results through the identification of novel complex components and potential cases of cross-talk between pathways and alternative signaling routes. PMID:22363585

  13. A mRNA and cognate microRNAs localize in the nucleolus.

    PubMed

    Reyes-Gutierrez, Pablo; Ritland Politz, Joan C; Pederson, Thoru

    2014-01-01

    We previously discovered that a set of 5 microRNAs are concentrated in the nucleolus of rat myoblasts. We now report that several mRNAs are also localized in the nucleoli of these cells as determined by microarray analysis of RNA from purified nucleoli. Among the most abundant of these nucleolus-localized mRNAs is that encoding insulin-like growth factor 2 (IGF2), a regulator of myoblast proliferation and differentiation. The presence of IGF2 mRNA in nucleoli was confirmed by fluorescence in situ hybridization, and RT-PCR experiments demonstrated that these nucleolar transcripts are spliced, thus arriving from the nucleoplasm. Bioinformatics analysis predicted canonically structured, highly thermodynamically stable interactions between IGF2 mRNA and all 5 of the nucleolus-localized microRNAs. These results raise the possibility that the nucleolus is a staging site for setting up particular mRNA-microRNA interactions prior to export to the cytoplasm.

  14. modPDZpep: a web resource for structure based analysis of human PDZ-mediated interaction networks.

    PubMed

    Sain, Neetu; Mohanty, Debasisa

    2016-09-21

    PDZ domains recognize short sequence stretches usually present in C-terminal of their interaction partners. Because of the involvement of PDZ domains in many important biological processes, several attempts have been made for developing bioinformatics tools for genome-wide identification of PDZ interaction networks. Currently available tools for prediction of interaction partners of PDZ domains utilize machine learning approach. Since, they have been trained using experimental substrate specificity data for specific PDZ families, their applicability is limited to PDZ families closely related to the training set. These tools also do not allow analysis of PDZ-peptide interaction interfaces. We have used a structure based approach to develop modPDZpep, a program to predict the interaction partners of human PDZ domains and analyze structural details of PDZ interaction interfaces. modPDZpep predicts interaction partners by using structural models of PDZ-peptide complexes and evaluating binding energy scores using residue based statistical pair potentials. Since, it does not require training using experimental data on peptide binding affinity, it can predict substrates for diverse PDZ families. Because of the use of simple scoring function for binding energy, it is also fast enough for genome scale structure based analysis of PDZ interaction networks. Benchmarking using artificial as well as real negative datasets indicates good predictive power with ROC-AUC values in the range of 0.7 to 0.9 for a large number of human PDZ domains. Another novel feature of modPDZpep is its ability to map novel PDZ mediated interactions in human protein-protein interaction networks, either by utilizing available experimental phage display data or by structure based predictions. In summary, we have developed modPDZpep, a web-server for structure based analysis of human PDZ domains. It is freely available at http://www.nii.ac.in/modPDZpep.html or http://202.54.226.235/modPDZpep.html . This article was reviewed by Michael Gromiha and Zoltán Gáspári.

  15. NetMHCpan, a Method for Quantitative Predictions of Peptide Binding to Any HLA-A and -B Locus Protein of Known Sequence

    PubMed Central

    Nielsen, Morten; Lundegaard, Claus; Blicher, Thomas; Lamberth, Kasper; Harndahl, Mikkel; Justesen, Sune; Røder, Gustav; Peters, Bjoern; Sette, Alessandro; Lund, Ole; Buus, Søren

    2007-01-01

    Background Binding of peptides to Major Histocompatibility Complex (MHC) molecules is the single most selective step in the recognition of pathogens by the cellular immune system. The human MHC class I system (HLA-I) is extremely polymorphic. The number of registered HLA-I molecules has now surpassed 1500. Characterizing the specificity of each separately would be a major undertaking. Principal Findings Here, we have drawn on a large database of known peptide-HLA-I interactions to develop a bioinformatics method, which takes both peptide and HLA sequence information into account, and generates quantitative predictions of the affinity of any peptide-HLA-I interaction. Prospective experimental validation of peptides predicted to bind to previously untested HLA-I molecules, cross-validation, and retrospective prediction of known HIV immune epitopes and endogenous presented peptides, all successfully validate this method. We further demonstrate that the method can be applied to perform a clustering analysis of MHC specificities and suggest using this clustering to select particularly informative novel MHC molecules for future biochemical and functional analysis. Conclusions Encompassing all HLA molecules, this high-throughput computational method lends itself to epitope searches that are not only genome- and pathogen-wide, but also HLA-wide. Thus, it offers a truly global analysis of immune responses supporting rational development of vaccines and immunotherapy. It also promises to provide new basic insights into HLA structure-function relationships. The method is available at http://www.cbs.dtu.dk/services/NetMHCpan. PMID:17726526

  16. mORCA: sailing bioinformatics world with mobile devices.

    PubMed

    Díaz-Del-Pino, Sergio; Falgueras, Juan; Perez-Wohlfeil, Esteban; Trelles, Oswaldo

    2018-03-01

    Nearly 10 years have passed since the first mobile apps appeared. Given the fact that bioinformatics is a web-based world and that mobile devices are endowed with web-browsers, it seemed natural that bioinformatics would transit from personal computers to mobile devices but nothing could be further from the truth. The transition demands new paradigms, designs and novel implementations. Throughout an in-depth analysis of requirements of existing bioinformatics applications we designed and deployed an easy-to-use web-based lightweight mobile client. Such client is able to browse, select, compose automatically interface parameters, invoke services and monitor the execution of Web Services using the service's metadata stored in catalogs or repositories. mORCA is available at http://bitlab-es.com/morca/app as a web-app. It is also available in the App store by Apple and Play Store by Google. The software will be available for at least 2 years. ortrelles@uma.es. Source code, final web-app, training material and documentation is available at http://bitlab-es.com/morca. © The Author(s) 2017. Published by Oxford University Press.

  17. A primer to frequent itemset mining for bioinformatics

    PubMed Central

    Naulaerts, Stefan; Meysman, Pieter; Bittremieux, Wout; Vu, Trung Nghia; Vanden Berghe, Wim; Goethals, Bart

    2015-01-01

    Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur. An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques are able to efficiently capture the characteristics of (complex) data and succinctly summarize it. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about the bioinformatics applications of these techniques remains scattered. In this primer, we introduce frequent itemset mining and their derived association rules for life scientists. We give an overview of various algorithms, and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences. PMID:24162173

  18. p3d--Python module for structural bioinformatics.

    PubMed

    Fufezan, Christian; Specht, Michael

    2009-08-21

    High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge based approaches. The development of such tools requires a robust interface to access the structural data in an easy way. For this the Python scripting language is the optimal choice since its philosophy is to write an understandable source code. p3d is an object oriented Python module that adds a simple yet powerful interface to the Python interpreter to process and analyse three dimensional protein structure files (PDB files). p3d's strength arises from the combination of a) very fast spatial access to the structural data due to the implementation of a binary space partitioning (BSP) tree, b) set theory and c) functions that allow to combine a and b and that use human readable language in the search queries rather than complex computer language. All these factors combined facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures. p3d is the perfect tool to quickly develop tools for structural bioinformatics using the Python scripting language.

  19. Best practices in bioinformatics training for life scientists.

    PubMed

    Via, Allegra; Blicher, Thomas; Bongcam-Rudloff, Erik; Brazas, Michelle D; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; Fernandes, Pedro L; van Gelder, Celia; Jacob, Joachim; Jimenez, Rafael C; Loveland, Jane; Moran, Federico; Mulder, Nicola; Nyrönen, Tommi; Rother, Kristian; Schneider, Maria Victoria; Attwood, Teresa K

    2013-09-01

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.

  20. Bellman's GAP--a language and compiler for dynamic programming in sequence analysis.

    PubMed

    Sauthoff, Georg; Möhl, Mathias; Janssen, Stefan; Giegerich, Robert

    2013-03-01

    Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error prone and tedious. Bellman's GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. In Bellman's GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive to carefully hand-crafted implementations. This article introduces the Bellman's GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman's GAP as an implementation platform of 'real-world' bioinformatics tools. Bellman's GAP is available under GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics.

  1. Best practices in bioinformatics training for life scientists

    PubMed Central

    Blicher, Thomas; Bongcam-Rudloff, Erik; Brazas, Michelle D.; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; Fernandes, Pedro L.; van Gelder, Celia; Jacob, Joachim; Jimenez, Rafael C.; Loveland, Jane; Moran, Federico; Mulder, Nicola; Nyrönen, Tommi; Rother, Kristian; Schneider, Maria Victoria; Attwood, Teresa K.

    2013-01-01

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists. PMID:23803301

  2. Crowdsourcing for bioinformatics

    PubMed Central

    Good, Benjamin M.; Su, Andrew I.

    2013-01-01

    Motivation: Bioinformatics is faced with a variety of problems that require human involvement. Tasks like genome annotation, image analysis, knowledge-base population and protein structure determination all benefit from human input. In some cases, people are needed in vast quantities, whereas in others, we need just a few with rare abilities. Crowdsourcing encompasses an emerging collection of approaches for harnessing such distributed human intelligence. Recently, the bioinformatics community has begun to apply crowdsourcing in a variety of contexts, yet few resources are available that describe how these human-powered systems work and how to use them effectively in scientific domains. Results: Here, we provide a framework for understanding and applying several different types of crowdsourcing. The framework considers two broad classes: systems for solving large-volume ‘microtasks’ and systems for solving high-difficulty ‘megatasks’. Within these classes, we discuss system types, including volunteer labor, games with a purpose, microtask markets and open innovation contests. We illustrate each system type with successful examples in bioinformatics and conclude with a guide for matching problems to crowdsourcing solutions that highlights the positives and negatives of different approaches. Contact: bgood@scripps.edu PMID:23782614

  3. Navigating the changing learning landscape: perspective from bioinformatics.ca

    PubMed Central

    Ouellette, B. F. Francis

    2013-01-01

    With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs. PMID:23515468

  4. Serum proteome profiling in canine idiopathic dilated cardiomyopathy using TMT-based quantitative proteomics approach.

    PubMed

    Bilić, Petra; Guillemin, Nicolas; Kovačević, Alan; Beer Ljubić, Blanka; Jović, Ines; Galan, Asier; Eckersall, Peter David; Burchmore, Richard; Mrljak, Vladimir

    2018-05-15

    Idiopathic dilated cardiomyopathy (iDCM) is a primary myocardial disorder with an unknown aetiology, characterized by reduced contractility and ventricular dilation of the left or both ventricles. Naturally occurring canine iDCM was used herein to identify serum proteomic signature of the disease compared to the healthy state, providing an insight into underlying mechanisms and revealing proteins with biomarker potential. To achieve this, we used high-throughput label-based quantitative LC-MS/MS proteomics approach and bioinformatics analysis of the in silico inferred interactome protein network created from the initial list of differential proteins. To complement the proteomic analysis, serum biochemical parameters and levels of know biomarkers of cardiac function were measured. Several proteins with biomarker potential were identified, such as inter-alpha-trypsin inhibitor heavy chain H4, microfibril-associated glycoprotein 4 and apolipoprotein A-IV, which were validated using an independent method (Western blotting) and showed high specificity and sensitivity according to the receiver operating characteristic curve analysis. Bioinformatics analysis revealed involvement of different pathways in iDCM, such as complement cascade activation, lipoprotein particles dynamics, elastic fibre formation, GPCR signalling and respiratory electron transport chain. Idiopathic dilated cardiomyopathy is a severe primary myocardial disease of unknown cause, affecting both humans and dogs. This study is a contribution to the canine heart disease research by means of proteomic and bioinformatic state of the art analyses, following similar approach in human iDCM research. Importantly, we used serum as non-invasive and easily accessible biological source of information and contributed to the scarce data on biofluid proteome research on this topic. Bioinformatics analysis revealed biological pathways modulated in canine iDCM with potential of further targeted research. Also, several proteins with biomarker potential have been identified and successfully validated. Copyright © 2018 Elsevier B.V. All rights reserved.

  5. BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS.

    PubMed

    Fosso, Bruno; Santamaria, Monica; Marzano, Marinella; Alonso-Alemany, Daniel; Valiente, Gabriel; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

    2015-07-01

    Substantial advances in microbiology, molecular evolution and biodiversity have been carried out in recent years thanks to Metagenomics, which allows to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the microbiome taxonomic structure, a target-based metagenomic approach, here also referred as Meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) in the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. The accessibility to proper computational systems for the large-scale bioinformatic analysis of HTS data represents, currently, one of the major challenges in advanced Meta-barcoding projects. BioMaS (Bioinformatic analysis of Metagenomic AmpliconS) is a new bioinformatic pipeline designed to support biomolecular researchers involved in taxonomic studies of environmental microbial communities by a completely automated workflow, comprehensive of all the fundamental steps, from raw sequence data upload and cleaning to final taxonomic identification, that are absolutely required in an appropriately designed Meta-barcoding HTS-based experiment. In its current version, BioMaS allows the analysis of both bacterial and fungal environments starting directly from the raw sequencing data from either Roche 454 or Illumina HTS platforms, following two alternative paths, respectively. BioMaS is implemented into a public web service available at https://recasgateway.ba.infn.it/ and is also available in Galaxy at http://galaxy.cloud.ba.infn.it:8080 (only for Illumina data). BioMaS is a friendly pipeline for Meta-barcoding HTS data analysis specifically designed for users without particular computing skills. A comparative benchmark, carried out by using a simulated dataset suitably designed to broadly represent the currently known bacterial and fungal world, showed that BioMaS outperforms QIIME and MOTHUR in terms of extent and accuracy of deep taxonomic sequence assignments.

  6. Prediction of Drug-Target Interaction Networks from the Integration of Protein Sequences and Drug Chemical Structures.

    PubMed

    Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Zhou, Yong; An, Ji-Yong

    2017-07-05

    Knowledge of drug-target interaction (DTI) plays an important role in discovering new drug candidates. Unfortunately, there are unavoidable shortcomings; including the time-consuming and expensive nature of the experimental method to predict DTI. Therefore, it motivates us to develop an effective computational method to predict DTI based on protein sequence. In the paper, we proposed a novel computational approach based on protein sequence, namely PDTPS (Predicting Drug Targets with Protein Sequence) to predict DTI. The PDTPS method combines Bi-gram probabilities (BIGP), Position Specific Scoring Matrix (PSSM), and Principal Component Analysis (PCA) with Relevance Vector Machine (RVM). In order to evaluate the prediction capacity of the PDTPS, the experiment was carried out on enzyme, ion channel, GPCR, and nuclear receptor datasets by using five-fold cross-validation tests. The proposed PDTPS method achieved average accuracy of 97.73%, 93.12%, 86.78%, and 87.78% on enzyme, ion channel, GPCR and nuclear receptor datasets, respectively. The experimental results showed that our method has good prediction performance. Furthermore, in order to further evaluate the prediction performance of the proposed PDTPS method, we compared it with the state-of-the-art support vector machine (SVM) classifier on enzyme and ion channel datasets, and other exiting methods on four datasets. The promising comparison results further demonstrate that the efficiency and robust of the proposed PDTPS method. This makes it a useful tool and suitable for predicting DTI, as well as other bioinformatics tasks.

  7. A critical assessment of Mus musculus gene function prediction using integrated genomic evidence

    PubMed Central

    Peña-Castillo, Lourdes; Tasan, Murat; Myers, Chad L; Lee, Hyunju; Joshi, Trupti; Zhang, Chao; Guan, Yuanfang; Leone, Michele; Pagnani, Andrea; Kim, Wan Kyu; Krumpelman, Chase; Tian, Weidong; Obozinski, Guillaume; Qi, Yanjun; Mostafavi, Sara; Lin, Guan Ning; Berriz, Gabriel F; Gibbons, Francis D; Lanckriet, Gert; Qiu, Jian; Grant, Charles; Barutcuoglu, Zafer; Hill, David P; Warde-Farley, David; Grouios, Chris; Ray, Debajyoti; Blake, Judith A; Deng, Minghua; Jordan, Michael I; Noble, William S; Morris, Quaid; Klein-Seetharaman, Judith; Bar-Joseph, Ziv; Chen, Ting; Sun, Fengzhu; Troyanskaya, Olga G; Marcotte, Edward M; Xu, Dong; Hughes, Timothy R; Roth, Frederick P

    2008-01-01

    Background: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated. Results: In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%. Conclusion: We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized. PMID:18613946

  8. Towards a career in bioinformatics

    PubMed Central

    2009-01-01

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and student symposium, the Singapore Symposium on Computational Biology (SYMBIO) were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection to tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in undergraduate bioinformatics curriculum provides information on how to go impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. PMID:19958508

  9. Towards a career in bioinformatics.

    PubMed

    Ranganathan, Shoba

    2009-12-03

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and student symposium, the Singapore Symposium on Computational Biology (SYMBIO) were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection to tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in undergraduate bioinformatics curriculum provides information on how to go impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010.

  10. Role of miR-1 expression in clear cell renal cell carcinoma (ccRCC): A bioinformatics study based on GEO, ArrayExpress microarrays and TCGA database.

    PubMed

    Yan, Hai-Biao; Huang, Jia-Cheng; Chen, You-Rong; Yao, Jian-Ni; Cen, Wei-Ning; Li, Jia-Yi; Jiang, Yi-Fan; Chen, Gang; Li, Sheng-Hua

    2018-02-01

    To investigate the clinical value and potential molecular mechanisms of miR-1 in clear cell renal cell carcinoma (ccRCC). We searched the Gene Expression Omnibus (GEO), ArrayExpress, several online publication databases and the Cancer Genome Atlas (TCGA). Continuous variable meta-analysis and diagnostic meta-analysis were conducted, both in Stata 14, to show the expression of miR-1 in ccRCC. Furthermore, we acquired the potential targets of miR-1 from datasets that transfected miR-1 into ccRCC cells, online prediction databases, differentially expressed genes from TCGA and literature. Subsequently bioinformatics analysis based on aforementioned selected target genes was conducted. The combined effect was -0.92 with the 95% confidence interval (CI) of -1.08 to -0.77 based on fixed effect model (I 2  = 81.3%, P < 0.001). No publication bias was found in our investigation. Sensitivity analysis showed that GSE47582 and 2 TCGA studies might cause heterogeneity. After eliminating them, the combined effect was -0.47 (95%CI: -0.78, -0.16) with I 2  = 18.3%. As for the diagnostic meta-analysis, the combined sensitivity and specificity were 0.90 (95%CI: 0.61, 0.98) and 0.63 (95%CI: 0.39, 0.82). The area under the curve (AUC) in the summarized receiver operating characteristic (SROC) curve was 0.83 (95%CI: 0.80, 0.86). No publication bias was found (P = 0.15). We finally got 67 genes which were defined the promising target genes of miR-1 in ccRCC. The most three significant KEGG pathways based on the aforementioned genes were Complement and coagulation cascades, ECM-receptor interaction and Focal adhesion. The downregulation of miR-1 might play an important role in ccRCC by targeting its target genes. Copyright © 2017 Elsevier GmbH. All rights reserved.

  11. A new genome-mining tool redefines the lasso peptide biosynthetic landscape

    PubMed Central

    Tietz, Jonathan I.; Schwalen, Christopher J.; Patel, Parth S.; Maxson, Tucker; Blair, Patricia M.; Tai, Hua-Chia; Zakai, Uzma I.; Mitchell, Douglas A.

    2016-01-01

    Ribosomally synthesized and post-translationally modified peptide (RiPP) natural products are attractive for genome-driven discovery and re-engineering, but limitations in bioinformatic methods and exponentially increasing genomic data make large-scale mining difficult. We report RODEO (Rapid ORF Description and Evaluation Online), which combines hidden Markov model-based analysis, heuristic scoring, and machine learning to identify biosynthetic gene clusters and predict RiPP precursor peptides. We initially focused on lasso peptides, which display intriguing physiochemical properties and bioactivities, but their hypervariability renders them challenging prospects for automated mining. Our approach yielded the most comprehensive mapping of lasso peptide space, revealing >1,300 compounds. We characterized the structures and bioactivities of six lasso peptides, prioritized based on predicted structural novelty, including an unprecedented handcuff-like topology and another with a citrulline modification exceptionally rare among bacteria. These combined insights significantly expand the knowledge of lasso peptides, and more broadly, provide a framework for future genome-mining efforts. PMID:28244986

  12. Using Bioinformatics Approach to Explore the Pharmacological Mechanisms of Multiple Ingredients in Shuang-Huang-Lian

    PubMed Central

    Zhang, Bai-xia; Li, Jian; Gu, Hao; Li, Qiang; Zhang, Qi; Zhang, Tian-jiao; Wang, Yun; Cai, Cheng-ke

    2015-01-01

    Due to the proved clinical efficacy, Shuang-Huang-Lian (SHL) has developed a variety of dosage forms. However, the in-depth research on targets and pharmacological mechanisms of SHL preparations was scarce. In the presented study, the bioinformatics approaches were adopted to integrate relevant data and biological information. As a result, a PPI network was built and the common topological parameters were characterized. The results suggested that the PPI network of SHL exhibited a scale-free property and modular architecture. The drug target network of SHL was structured with 21 functional modules. According to certain modules and pharmacological effects distribution, an antitumor effect and potential drug targets were predicted. A biological network which contained 26 subnetworks was constructed to elucidate the antipneumonia mechanism of SHL. We also extracted the subnetwork to explicitly display the pathway where one effective component acts on the pneumonia related targets. In conclusions, a bioinformatics approach was established for exploring the drug targets, pharmacological activity distribution, effective components of SHL, and its mechanism of antipneumonia. Above all, we identified the effective components and disclosed the mechanism of SHL from the view of system. PMID:26495421

  13. Pathway mapping and development of disease-specific biomarkers: protein-based network biomarkers

    PubMed Central

    Chen, Hao; Zhu, Zhitu; Zhu, Yichun; Wang, Jian; Mei, Yunqing; Cheng, Yunfeng

    2015-01-01

    It is known that a disease is rarely a consequence of an abnormality of a single gene, but reflects the interactions of various processes in a complex network. Annotated molecular networks offer new opportunities to understand diseases within a systems biology framework and provide an excellent substrate for network-based identification of biomarkers. The network biomarkers and dynamic network biomarkers (DNBs) represent new types of biomarkers with protein–protein or gene–gene interactions that can be monitored and evaluated at different stages and time-points during development of disease. Clinical bioinformatics as a new way to combine clinical measurements and signs with human tissue-generated bioinformatics is crucial to translate biomarkers into clinical application, validate the disease specificity, and understand the role of biomarkers in clinical settings. In this article, the recent advances and developments on network biomarkers and DNBs are comprehensively reviewed. How network biomarkers help a better understanding of molecular mechanism of diseases, the advantages and constraints of network biomarkers for clinical application, clinical bioinformatics as a bridge to the development of diseases-specific, stage-specific, severity-specific and therapy predictive biomarkers, and the potentials of network biomarkers are also discussed. PMID:25560835

  14. Current and future methods for evaluating the allergenic potential of proteins: international workshop report 23-25 October 2007.

    PubMed

    Thomas, Karluss; Herouet-Guicheney, Corinne; Ladics, Gregory; McClain, Scott; MacIntosh, Susan; Privalle, Laura; Woolhiser, Mike

    2008-09-01

    The International Life Science Institute's Health and Environmental Sciences Institute's Protein Allergenicity Technical Committee hosted an international workshop October 23-25, 2007, in Nice, France, to review and discuss existing and emerging methods and techniques for improving the current weight-of-evidence approach for evaluating the potential allergenicity of novel proteins. The workshop included over 40 international experts from government, industry, and academia. Their expertise represented a range of disciplines including immunology, chemistry, molecular biology, bioinformatics, and toxicology. Among participants, there was consensus that (1) current bioinformatic approaches are highly conservative; (2) advances in bioinformatics using structural comparisons of proteins may be helpful as the availability of structural data increases; (3) proteomics may prove useful for monitoring the natural variability in a plant's proteome and assessing the impact of biotechnology transformations on endogenous levels of allergens, but only when analytical techniques have been standardized and additional data are available on the natural variation of protein expression in non-transgenic bred plants; (4) basophil response assays are promising techniques, but need additional evaluation around specificity, sensitivity, and reproducibility; (5) additional research is required to develop and validate an animal model for the purpose of predicting protein allergenicity.

  15. Transcriptome Analysis of Human Immune Responses Following Live Vaccine Strain (LVS) Francisella Tularensis Vaccination

    DTIC Science & Technology

    2007-03-08

    with CD3D 50848 PAR1/UBE3A Prader–Willi syndrome chromosome region 1, GMCSFRalpha precursor, IL3Ralpha precursor (CD123) Brain development...intervention programs justifiable? Emerg. Infect. Dis. 3, 83–94. iebel, U., Kindler , B., Pepperkok, R., 2004. ‘Harvester’: a fast meta search engine of human...protein resources. Bioinformatics 20, 1962–1963. iebel, U., Kindler , B., Pepperkok, R., 2005. Bioinformatic “Harvester”: a search engine for genome

  16. Advantages and disadvantages in usage of bioinformatic programs in promoter region analysis

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena E.; Skarzyńska, Agnieszka; Posyniak, Kacper; ZiÄ bska, Karolina; PlÄ der, Wojciech; Przybecki, Zbigniew

    2015-09-01

    An important computational challenge is finding the regulatory elements across the promotor region. In this work we present the advantages and disadvantages from the application of different bioinformatics programs for localization of transcription factor binding sites in the upstream region of genes connected with sex determination in cucumber. We use PlantCARE, PlantPAN and SignalScan to find motifs in the promotor regions. The results have been compared and possible function of chosen motifs has been described.

  17. Scalable computing for evolutionary genomics.

    PubMed

    Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert

    2012-01-01

    Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale-up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.

  18. Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition.

    PubMed

    Tamura, Takeyuki; Akutsu, Tatsuya

    2007-11-30

    Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. This is a problem of predicting which part in a cell a given protein is transported to, where an amino acid sequence of the protein is given as an input. This problem is becoming more important since information on subcellular location is helpful for annotation of proteins and genes and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracies. In this paper, we propose a novel and general predicting method by combining techniques for sequence alignment and feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. Through fivefold cross validation tests, the obtained overall accuracies and average MCC were 0.9096 and 0.8655 respectively. We also applied our method to other datasets including that of WoLF PSORT. Although there is a predictor which uses the information of gene ontology and yields higher accuracy than ours, our accuracies are higher than existing predictors which use only sequence information. Since such information as gene ontology can be obtained only for known proteins, our predictor is considered to be useful for subcellular location prediction of newly-discovered proteins. Furthermore, the idea of combination of alignment and amino acid frequency is novel and general so that it may be applied to other problems in bioinformatics. Our method for plant is also implemented as a web-system and available on http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html.

  19. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance.

    PubMed

    Thomsen, Martin Christen Frølund; Ahrenfeldt, Johanne; Cisneros, Jose Luis Bellod; Jurtz, Vanessa; Larsen, Mette Voldby; Hasman, Henrik; Aarestrup, Frank Møller; Lund, Ole

    2016-01-01

    Recent advances in whole genome sequencing have made the technology available for routine use in microbiological laboratories. However, a major obstacle for using this technology is the availability of simple and automatic bioinformatics tools. Based on previously published and already available web-based tools we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline will automatically identify the bacterial species and, if applicable, assemble the genome, identify the multilocus sequence type, plasmids, virulence genes and antimicrobial resistance genes. A short printable report for each sample will be provided and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services. The reported results enable a rapid overview of the major results, and comparing that to the previously found results showed that the platform is reliable and able to correctly predict the species and find most of the expected genes automatically. In conclusion, a combined bioinformatics platform was developed and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at: https://cge.cbs.dtu.dk/services/CGEpipeline-1.1 and it is the intention that it will continue to be expanded with new features as these become available.

  20. IsomiR Bank: a research resource for tracking IsomiRs.

    PubMed

    Zhang, Yuanwei; Zang, Qiguang; Xu, Bo; Zheng, Wei; Ban, Rongjun; Zhang, Huan; Yang, Yifan; Hao, Qiaomei; Iqbal, Furhan; Li, Ao; Shi, Qinghua

    2016-07-01

    : Next-Generation Sequencing (NGS) technology has revealed that microRNAs (miRNAs) are capable of exhibiting frequent differences from their corresponding mature reference sequences, generating multiple variants: the isoforms of miRNAs (isomiRs). These isomiRs mainly originate via the imprecise and alternative cleavage during the pre-miRNA processing and post-transcriptional modifications that influence miRNA stability, their sub-cellular localization and target selection. Although several tools for the identification of isomiR have been reported, no bioinformatics resource dedicated to gather isomiRs from public NGS data and to provide functional analysis of these isomiRs is available to date. Thus, a free online database, IsomiR Bank has been created to integrate isomiRs detected by our previously published algorithm CPSS. In total, 2727 samples (Small RNA NGS data downloaded from ArrayExpress) from eight species (Arabidopsis thaliana, Drosophila melanogaster, Danio rerio, Homo sapiens, Mus musculus, Oryza sativa, Solanum lycopersicum and Zea mays) are analyzed. At present, 308 919 isomiRs from 4706 mature miRNAs are collected into IsomiR Bank. In addition, IsomiR Bank provides target prediction and enrichment analysis to evaluate the effects of isomiRs on target selection. IsomiR Bank is implemented in PHP/PERL + MySQL + R format and can be freely accessed at http://mcg.ustc.edu.cn/bsc/isomir/ : aoli@ustc.edu.cn or qshi@ustc.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Top