Science.gov

Sample records for addition bioinformatics analysis

  1. Bioinformatics in protein analysis.

    PubMed

    Persson, B

    2000-01-01

    The chapter gives an overview of bioinformatic techniques of importance in protein analysis. These include database searches, sequence comparisons and structural predictions. Links to useful World Wide Web (WWW) pages are given in relation to each topic. Databases with biological information are reviewed with emphasis on databases for nucleotide sequences (EMBL, GenBank, DDBJ), genomes, amino acid sequences (Swissprot, PIR, TrEMBL, GenePept), and three-dimensional structures (PDB). Integrated user interfaces for databases (SRS and Entrez) are described. An introduction to databases of sequence patterns and protein families is also given (Prosite, Pfam, Blocks). Furthermore, the chapter describes the widespread methods for sequence comparisons, FASTA and BLAST, and the corresponding WWW services. The techniques involving multiple sequence alignments are also reviewed: alignment creation with the Clustal programs, phylogenetic tree calculation with the Clustal or Phylip packages and tree display using Drawtree, njplot or phylo_win. Finally, the chapter also treats the issue of structural prediction. Different methods for secondary structure predictions are described (Chou-Fasman, Garnier-Osguthorpe-Robson, Predator, PHD). Techniques for predicting membrane proteins, antigenic sites and posttranslational modifications are also reviewed. PMID:10803381

  2. KDE Bioscience: platform for bioinformatics analysis workflows.

    PubMed

    Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue

    2006-08-01

    Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards combined with unfriendly interfaces makes it difficult for biologists to learn how to use these tools and to translate the data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a Java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or 3rd-party viewers are built-in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research. PMID:16260186

  3. High-throughput protein analysis integrating bioinformatics and experimental assays.

    PubMed

    del Val, Coral; Mehrle, Alexander; Falkenhahn, Mechthild; Seiler, Markus; Glatting, Karl-Heinz; Poustka, Annemarie; Suhai, Sandor; Wiemann, Stefan

    2004-01-01

    The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE, SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administrating relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins. PMID:14762202

  4. CAPweb: a bioinformatics CGH array Analysis Platform.

    PubMed

    Liva, Stéphane; Hupé, Philippe; Neuvial, Pierre; Brito, Isabel; Viara, Eric; La Rosa, Philippe; Barillot, Emmanuel

    2006-07-01

    Assessing variations in DNA copy number is crucial for understanding constitutional or somatic diseases, particularly cancers. The recently developed array-CGH (comparative genomic hybridization) technology allows this to be investigated at the genomic level. We report the availability of a web tool for analysing array-CGH data. CAPweb (CGH array Analysis Platform on the Web) is intended as a user-friendly tool enabling biologists to completely analyse CGH arrays from the raw data to the visualization and biological interpretation. The user typically performs the following bioinformatics steps of a CGH array project within CAPweb: the secure upload of the results of CGH array image analysis and of the array annotation (genomic position of the probes); first level analysis of each array, including automatic normalization of the data (for correcting experimental biases), breakpoint detection and status assignment (gain, loss or normal); validation or deletion of the analysis based on a summary report and quality criteria; visualization and biological analysis of the genomic profiles and results through a user-friendly interface. CAPweb is accessible at http://bioinfo.curie.fr/CAPweb. PMID:16845053
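
    The first-level analysis steps named above (normalization, then gain/loss/normal status assignment) can be illustrated with a toy sketch. It does not reproduce CAPweb's actual algorithms, which the abstract does not describe: median centring, the 0.3 threshold, and the function names `normalize` and `assign_status` are all assumptions chosen for illustration, and breakpoint detection is omitted.

```python
from statistics import median

def normalize(log_ratios):
    """Centre log2 ratios on their median (a simple stand-in for bias correction)."""
    m = median(log_ratios)
    return [r - m for r in log_ratios]

def assign_status(log_ratio, threshold=0.3):
    """Label one probe as gain, loss or normal using a hypothetical threshold."""
    if log_ratio > threshold:
        return "gain"
    if log_ratio < -threshold:
        return "loss"
    return "normal"

# Hypothetical raw log2 ratios for six probes along a chromosome
raw = [0.1, 0.15, 0.9, 0.05, -0.7, 0.1]
statuses = [assign_status(r) for r in normalize(raw)]
print(statuses)
```

    A real pipeline would typically segment the profile first and assign a status per segment rather than per probe; the per-probe version is kept here only because it is the shortest form of the idea.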

  5. Bioinformatics Analysis of Estrogen-Responsive Genes.

    PubMed

    Handel, Adam E

    2016-01-01

    Estrogen is a steroid hormone that plays critical roles in a myriad of intracellular pathways. The expression of many genes is regulated through the steroid hormone receptors ESR1 and ESR2. These bind to DNA and modulate the expression of target genes. Identification of estrogen target genes is greatly facilitated by the use of transcriptomic methods, such as RNA-seq and expression microarrays, and chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq). Combining transcriptomic and ChIP-seq data enables a distinction to be drawn between direct and indirect estrogen target genes. This chapter discusses some methods of identifying estrogen target genes that do not require any expertise in programming languages or complex bioinformatics. PMID:26585125

  6. Bioinformatic analysis of expression data to identify effector candidates.

    PubMed

    Reid, Adam J; Jones, John T

    2014-01-01

    Pathogens produce effectors that manipulate the host to the benefit of the pathogen. These effectors are often secreted proteins that are upregulated during the early phases of infection. These properties can be used to identify candidate effectors from genomes and transcriptomes of pathogens. Here we describe commonly used bioinformatic approaches that (1) allow identification of genes encoding predicted secreted proteins within a genome and (2) allow the identification of genes encoding predicted secreted proteins that are upregulated at important stages of the life cycle. Other approaches for bioinformatic identification of effector candidates, including OrthoMCL analysis to identify expanded gene families, are also described. PMID:24643549

  7. Design and bioinformatics analysis of genome-wide CLIP experiments

    PubMed Central

    Wang, Tao; Xiao, Guanghua; Chu, Yongjun; Zhang, Michael Q.; Corey, David R.; Xie, Yang

    2015-01-01

    The past decades have witnessed a surge of discoveries revealing RNA regulation as a central player in cellular processes. RNAs are regulated by RNA-binding proteins (RBPs) at all post-transcriptional stages, including splicing, transportation, stabilization and translation. Defects in the functions of these RBPs underlie a broad spectrum of human pathologies. Systematic identification of RBP functional targets is among the key biomedical research questions and provides a new direction for drug discovery. The advent of cross-linking immunoprecipitation coupled with high-throughput sequencing (genome-wide CLIP) technology has recently enabled the investigation of genome-wide RBP–RNA binding at single base-pair resolution. This technology has evolved through the development of three distinct versions: HITS-CLIP, PAR-CLIP and iCLIP. Meanwhile, numerous bioinformatics pipelines for handling the genome-wide CLIP data have also been developed. In this review, we discuss the genome-wide CLIP technology and focus on bioinformatics analysis. Specifically, we compare the strengths and weaknesses, as well as the scopes, of various bioinformatics tools. To assist readers in choosing optimal procedures for their analysis, we also review experimental design and procedures that affect bioinformatics analyses. PMID:25958398

  8. Bioinformatics and Microarray Data Analysis on the Cloud.

    PubMed

    Calabrese, Barbara; Cannataro, Mario

    2016-01-01

    High-throughput platforms such as microarray, mass spectrometry, and next-generation sequencing are producing an increasing volume of omics data that needs large data storage and computing power. Cloud computing offers massive scalable computing and storage, data sharing, and on-demand, anytime and anywhere access to resources and applications, and thus may represent the key technology for facing those issues. In fact, in recent years it has been adopted for the deployment of different bioinformatics solutions and services both in academia and in industry. Despite this, cloud computing presents several issues regarding the security and privacy of data, which are particularly important when analyzing patient data, such as in personalized medicine. This chapter reviews the main academic and industrial cloud-based bioinformatics solutions, with a special focus on microarray data analysis solutions, and underlines the main issues and problems related to the use of such platforms for the storage and analysis of patient data. PMID:25863787

  9. Biochip microsystem for bioinformatics recognition and analysis

    NASA Technical Reports Server (NTRS)

    Lue, Jaw-Chyng (Inventor); Fang, Wai-Chi (Inventor)

    2011-01-01

    A system with applications in pattern recognition, or classification, of DNA assay samples. Because DNA reference and sample material in wells of an assay may be caused to fluoresce depending upon dye added to the material, the resulting light may be imaged onto an embodiment comprising an array of photodetectors and an adaptive neural network, with applications to DNA analysis. Other embodiments are described and claimed.

  10. Serial analysis of gene expression (SAGE): unraveling the bioinformatics tools.

    PubMed

    Tuteja, Renu; Tuteja, Narendra

    2004-08-01

    Serial analysis of gene expression (SAGE) is a powerful technique that can be used for global analysis of gene expression. Its chief advantage over other methods is that it does not require prior knowledge of the genes of interest and provides qualitative and quantitative data of potentially every transcribed sequence in a particular cell or tissue type. This is a technique of expression profiling, which permits simultaneous, comparative and quantitative analysis of gene-specific, 9- to 13-basepair sequences. These short sequences, called SAGE tags, are linked together for efficient sequencing. The sequencing data are then analyzed to identify each gene expressed in the cell and the levels at which each gene is expressed. The main benefits of SAGE include the digital output and the identification of novel genes. In this review, we present an outline of the method, various bioinformatics methods for data analysis and general applications of this important technology. PMID:15273993
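
    The "digital output" of SAGE amounts to counting how often each tag occurs and attributing those counts to genes. A minimal sketch of that counting step follows, assuming the tags have already been extracted from the sequenced concatemers; the tag sequences, the `tag_to_gene` table and the function name `count_sage_tags` are hypothetical and not taken from the review.

```python
from collections import Counter

def count_sage_tags(tags, tag_to_gene):
    """Tally tag occurrences and roll them up to genes.

    Tags without an annotation are kept under an 'unmapped:<tag>' key,
    since such tags may point to novel transcripts.
    """
    gene_counts = Counter()
    for tag, n in Counter(tags).items():
        gene_counts[tag_to_gene.get(tag, f"unmapped:{tag}")] += n
    return gene_counts

# Hypothetical 10-bp tags and a hypothetical tag-to-gene annotation table
tags = ["ACGTACGTAC", "TTGCAATGCC", "ACGTACGTAC", "GGGCCCAAAT", "ACGTACGTAC"]
tag_to_gene = {"ACGTACGTAC": "geneA", "TTGCAATGCC": "geneB"}

counts = count_sage_tags(tags, tag_to_gene)
for gene, n in counts.most_common():
    print(gene, n)
```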

  11. Bioinformatics analysis of Brucella vaccines and vaccine targets using VIOLIN

    PubMed Central

    2010-01-01

    Background Brucella spp. are Gram-negative, facultative intracellular bacteria that cause brucellosis, one of the commonest zoonotic diseases found worldwide in humans and a variety of animal species. While several animal vaccines are available, there is no effective and safe vaccine for prevention of brucellosis in humans. VIOLIN (http://www.violinet.org) is a web-based vaccine database and analysis system that curates, stores, and analyzes published data of commercialized vaccines, and vaccines in clinical trials or in research. VIOLIN contains information for 454 vaccines or vaccine candidates for 73 pathogens. VIOLIN also contains many bioinformatics tools for vaccine data analysis, data integration, and vaccine target prediction. To demonstrate the applicability of VIOLIN for vaccine research, VIOLIN was used for bioinformatics analysis of existing Brucella vaccines and prediction of new Brucella vaccine targets. Results VIOLIN contains many literature mining programs (e.g., Vaxmesh) that provide in-depth analysis of Brucella vaccine literature. As a result of manual literature curation, VIOLIN contains information for 38 Brucella vaccines or vaccine candidates, 14 protective Brucella antigens, and 68 host response studies to Brucella vaccines from 97 peer-reviewed articles. These Brucella vaccines are classified in the Vaccine Ontology (VO) system and used for different ontological applications. The web-based VIOLIN vaccine target prediction program Vaxign was used to predict new Brucella vaccine targets. Vaxign identified 14 outer membrane proteins that are conserved in six virulent strains from B. abortus, B. melitensis, and B. suis that are pathogenic in humans. Of the 14 membrane proteins, two proteins (Omp2b and Omp31-1) are not present in B. ovis, a Brucella species that is not pathogenic in humans. Brucella vaccine data stored in VIOLIN were compared and analyzed using the VIOLIN query system. Conclusions Bioinformatics curation and ontological

  12. Bioinformatics approaches to single-cell analysis in developmental biology.

    PubMed

    Yalcin, Dicle; Hakguder, Zeynep M; Otu, Hasan H

    2016-03-01

    Individual cells within the same population show various degrees of heterogeneity, which may be better handled with single-cell analysis to address biological and clinical questions. Single-cell analysis is especially important in developmental biology as subtle spatial and temporal differences in cells have significant associations with cell fate decisions during differentiation and with the description of a particular state of a cell exhibiting an aberrant phenotype. Biotechnological advances, especially in the area of microfluidics, have led to a robust, massively parallel and multi-dimensional capturing, sorting, and lysis of single-cells and amplification of related macromolecules, which have enabled the use of imaging and omics techniques on single cells. There have been improvements in computational single-cell image analysis in developmental biology regarding feature extraction, segmentation, image enhancement and machine learning, handling limitations of optical resolution to gain new perspectives from the raw microscopy images. Omics approaches, such as transcriptomics, genomics and epigenomics, targeting gene and small RNA expression, single nucleotide and structural variations and methylation and histone modifications, rely heavily on high-throughput sequencing technologies. Although there are well-established bioinformatics methods for analysis of sequence data, there are limited bioinformatics approaches which address experimental design, sample size considerations, amplification bias, normalization, differential expression, coverage, clustering and classification issues, specifically applied at the single-cell level. In this review, we summarize biological and technological advancements, discuss challenges faced in the aforementioned data acquisition and analysis issues and present future prospects for application of single-cell analyses to developmental biology. PMID:26358759

  13. Detecting evolution of bioinformatics with a content and co-authorship analysis.

    PubMed

    Song, Min; Yang, Christopher C; Tang, Xuning

    2013-12-01

    Bioinformatics is an interdisciplinary research field that applies advanced computational techniques to biological data. Bibliometrics analysis has recently been adopted to understand the knowledge structure of a research field by citation pattern. In this paper, we explore the knowledge structure of Bioinformatics from the perspective of a core open access Bioinformatics journal, BMC Bioinformatics, with trend analysis, the content and co-authorship network similarity, and principal component analysis. Publications in four core journals including Bioinformatics - Oxford Journal and four conferences in Bioinformatics were harvested from DBLP. After converting publications into TF-IDF term vectors, we calculate the content similarity, and we also calculate the social network similarity based on the co-authorship network by utilizing the overlap measure between two co-authorship networks. Key terms are extracted and analyzed with PCA, and the co-authorship network is visualized. The experimental results show that Bioinformatics is fast-growing, dynamic and diversified. The content analysis shows that there is an increasing overlap among Bioinformatics journals in terms of topics and more research groups participate in researching Bioinformatics according to the co-authorship network similarity. PMID:23710427
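
    The two similarity computations mentioned above can be sketched in a few lines. The abstract does not give the exact formulas, so the choices here are assumptions: a plain TF-IDF weighting (term frequency times log inverse document frequency), cosine similarity for content, and the overlap coefficient |A ∩ B| / min(|A|, |B|) on co-authorship edge sets. All names and the toy data are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (dicts)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def overlap(a, b):
    """Overlap coefficient of two sets (here: sets of co-author pairs)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Toy corpora standing in for the term lists of three venues
docs = [
    ["gene", "expression", "cluster"],
    ["gene", "expression", "network"],
    ["protein", "structure", "fold"],
]
v0, v1, v2 = tfidf_vectors(docs)
print(round(cosine(v0, v1), 3), cosine(v0, v2))

# Toy co-authorship networks as sets of unordered author pairs
net_a = {frozenset(("A", "B")), frozenset(("B", "C"))}
net_b = {frozenset(("B", "A")), frozenset(("C", "D")), frozenset(("D", "E"))}
print(overlap(net_a, net_b))
```

    Storing each co-author pair as a `frozenset` makes the edge undirected, so ("A", "B") and ("B", "A") count as the same collaboration.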

  14. Bioinformatic Analysis of HIV-1 Entry and Pathogenesis

    PubMed Central

    Aiamkitsumrit, Benjamas; Dampier, Will; Antell, Gregory; Rivera, Nina; Martin-Garcia, Julio; Pirrone, Vanessa; Nonnemacher, Michael R.; Wigdahl, Brian

    2015-01-01

    The evolution of human immunodeficiency virus type 1 (HIV-1) with respect to co-receptor utilization has been shown to be relevant to HIV-1 pathogenesis and disease. The CCR5-utilizing (R5) virus has been shown to be important in the very early stages of transmission and highly prevalent during asymptomatic infection and chronic disease. In addition, the R5 virus has been proposed to be involved in neuroinvasion and central nervous system (CNS) disease. In contrast, the CXCR4-utilizing (X4) virus is more prevalent during the course of disease progression and concurrent with the loss of CD4+ T cells. The dual-tropic virus is able to utilize both co-receptors (CXCR4 and CCR5) and has been thought to represent an intermediate transitional virus that possesses properties of both X4 and R5 viruses that can be encountered at many stages of disease. The use of computational tools and bioinformatic approaches in the prediction of HIV-1 co-receptor usage has been growing in importance with respect to understanding HIV-1 pathogenesis and disease, developing diagnostic tools, and improving the efficacy of therapeutic strategies focused on blocking viral entry. Current strategies have enhanced the sensitivity, specificity, and reproducibility relative to the prediction of co-receptor use; however, these technologies need to be improved with respect to their efficient and accurate use across the HIV-1 subtypes. The most effective approach may center on the combined use of different algorithms involving sequences within and outside of the env-V3 loop. This review focuses on the HIV-1 entry process and on co-receptor utilization, including bioinformatic tools utilized in the prediction of co-receptor usage. It also provides novel preliminary analyses for enabling identification of linkages between amino acids in V3 with other components of the HIV-1 genome and demonstrates that these linkages are different between X4 and R5 viruses. PMID:24862329

  15. Bioinformatics analysis of the epitope regions for norovirus capsid protein

    PubMed Central

    2013-01-01

    Background Norovirus is the major cause of nonbacterial epidemic gastroenteritis, being highly prevalent in both developing and developed countries. Despite the available monoclonal antibodies (MAbs) for different sub-genogroups, a comprehensive epitope analysis based on various bioinformatics technologies is highly desired for future potential antibody development in clinical diagnosis and treatment. Methods A total of 18 full-length human norovirus capsid protein sequences were downloaded from GenBank. Protein modeling was performed with the program Modeller 9.9. The modeled 3D structures of the norovirus capsid protein were submitted to the protein antigen spatial epitope prediction webserver (SEPPA) to predict the possible spatial epitopes with the default threshold. The results were processed using the Biosoftware. Results Compared with GI, we found that the GII genogroup had four deletions and two special insertions in the VP1 region. The predicted conformational epitope regions were mainly concentrated in the N-terminal (1~96), Middle Part (298~305, 355~375) and C-terminal (560~570). We found two common epitope regions on sequences for the GI and GII genogroups, and also found an exclusive epitope region for the GII genogroup. Conclusions The predicted conformational epitope regions of norovirus VP1 were mainly concentrated in the N-terminal, Middle Part and C-terminal. We found two common epitope regions on sequences for the GI and GII genogroups, and also found an exclusive epitope region for the GII genogroup. The overlap with experimental epitopes indicates the important role of the latest computational technologies. With the fast development of computational immunology tools, the bioinformatics pipeline will become more and more critical to vaccine design. PMID:23514273

  17. Edge Bioinformatics

    SciTech Connect

    Lo, Chien-Chi

    2015-08-03

    Edge Bioinformatics is a developmental bioinformatics and data management platform which seeks to supply laboratories with bioinformatics pipelines for analyzing data associated with common samples case goals. Edge Bioinformatics enables sequencing as a solution in forward-deployed situations where human resources, space, bandwidth, and time are limited. The Edge bioinformatics pipeline was designed around the following use cases and is specific to Illumina sequencing reads: (1) assay performance adjudication (PCR): analysis of an existing PCR assay in a genomic context, and automated design of a new assay to resolve conflicting results; (2) clinical presentation with extreme symptoms: characterization of a known pathogen or co-infection with (a) novel emerging disease outbreak or (b) environmental surveillance

  18. Molecular cloning and bioinformatic analysis of SPATA4 gene.

    PubMed

    Liu, Shang-feng; Ai, Chao; Ge, Zhong-qi; Liu, Hai-luo; Liu, Bo-wen; He, Shan; Wang, Zhao

    2005-11-30

    Full-length cDNA sequences of four novel SPATA4 genes in chimpanzee, cow, chicken and ascidian were identified by bioinformatic analysis using a mouse or human SPATA4 cDNA fragment as an electronic probe. All of these genes have 6 exons, encode proteins of similar molecular weight, and are not located on a sex chromosome. The mouse SPATA4 sequence was identified as significantly changed in cryptorchidism; it shares no significant homology with any known protein in the Swiss-Prot database except for the homologous genes in various vertebrates. Our search results showed that all SPATA4 proteins have a putative conserved domain, DUF1042. Putative SPATA4 protein sequence identity ranges from 30% to 99%. High similarity was also found in the 1 kb promoter regions of the human, mouse and rat SPATA4 genes. The sequences upstream of the SPATA4 promoter are also highly similar. A search of SymAtlas (http://symatlas.gnf.org/SymAtlas/) showed that human SPATA4 is highly expressed in testis, especially in testis interstitial, Leydig cell, seminiferous tubule and germ cell. Mouse SPATA4 was observed exclusively in adult mouse testis and almost no signal was detected in other tissues. The pI values of the protein are negative, ranging from 9.44 to 10.15. The protein is usually located in the nucleus, and the predicted signal peptide probability for SPATA4 is always zero. Using SNP data from NCBI, we found 33 SNPs in the genomic DNA region of the human SPATA4 gene, 29 of which lie in introns. CpG island searches showed that the CpG island regions are highly similar to one another, although their lengths differ. This research is a fundamental work in the field of bioinformatic analysis and also puts forward a new approach for the bioinformatic analysis of other genes. PMID:16336790

  19. Bioinformatics Analysis of MAPKKK Family Genes in Medicago truncatula

    PubMed Central

    Li, Wei; Xu, Hanyun; Liu, Ying; Song, Lili; Guo, Changhong; Shu, Yongjun

    2016-01-01

    Mitogen-activated protein kinase kinase kinase (MAPKKK) is a component of the MAPK cascade pathway that plays an important role in plant growth, development, and response to abiotic stress, the functions of which have been well characterized in several plant species, such as Arabidopsis, rice, and maize. In this study, we performed genome-wide and systemic bioinformatics analysis of MAPKKK family genes in Medicago truncatula. In total, there were 73 MAPKKK family members identified by search of homologs, and they were classified into three subfamilies, MEKK, ZIK, and RAF. Based on the genomic duplication function, 72 MtMAPKKK genes were located throughout all chromosomes, but they cluster in different chromosomes. Using microarray data and high-throughput sequencing-data, we assessed their expression profiles in growth and development processes; these results provided evidence for exploring their important functions in developmental regulation, especially in the nodulation process. Furthermore, we investigated their expression in abiotic stresses by RNA-seq, which confirmed their critical roles in signal transduction and regulation processes under stress. In summary, our genome-wide, systemic characterization and expressional analysis of MtMAPKKK genes will provide insights that will be useful for characterizing the molecular functions of these genes in M. truncatula. PMID:27049397

  20. PATRIC, the bacterial bioinformatics database and analysis resource

    PubMed Central

    Wattam, Alice R.; Abraham, David; Dalay, Oral; Disz, Terry L.; Driscoll, Timothy; Gabbard, Joseph L.; Gillespie, Joseph J.; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olson, Robert; Overbeek, Ross; Pusch, Gordon D.; Shukla, Maulik; Schulman, Julie; Stevens, Rick L.; Sullivan, Daniel E.; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J.C.; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W.

    2014-01-01

    The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein–protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue. PMID:24225323

  1. Bioinformatics Analysis of MAPKKK Family Genes in Medicago truncatula.

    PubMed

    Li, Wei; Xu, Hanyun; Liu, Ying; Song, Lili; Guo, Changhong; Shu, Yongjun

    2016-01-01

    Mitogen-activated protein kinase kinase kinase (MAPKKK) is a component of the MAPK cascade pathway that plays an important role in plant growth, development, and response to abiotic stress, the functions of which have been well characterized in several plant species, such as Arabidopsis, rice, and maize. In this study, we performed genome-wide and systemic bioinformatics analysis of MAPKKK family genes in Medicago truncatula. In total, there were 73 MAPKKK family members identified by search of homologs, and they were classified into three subfamilies, MEKK, ZIK, and RAF. Based on the genomic duplication function, 72 MtMAPKKK genes were located throughout all chromosomes, but they cluster in different chromosomes. Using microarray data and high-throughput sequencing-data, we assessed their expression profiles in growth and development processes; these results provided evidence for exploring their important functions in developmental regulation, especially in the nodulation process. Furthermore, we investigated their expression in abiotic stresses by RNA-seq, which confirmed their critical roles in signal transduction and regulation processes under stress. In summary, our genome-wide, systemic characterization and expressional analysis of MtMAPKKK genes will provide insights that will be useful for characterizing the molecular functions of these genes in M. truncatula. PMID:27049397

  2. Whale song analyses using bioinformatics sequence analysis approaches

    NASA Astrophysics Data System (ADS)

    Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang

    2005-04-01

    Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools (DNA/protein sequence alignment and alignment-free methods) are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, master's thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure, as well as the standardized Euclidean distance and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree qualitatively with earlier classifications derived by human observation. These methods further quantify the similarities among themes. These methods could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission by quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.]
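
    The two kinds of metric named in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the scoring values (match, mismatch, gap), the word length k, and the encoding of each song unit as one character are all assumptions, and the "repeated procedure" variant of Smith-Waterman is not reproduced here.

```python
from collections import Counter
from math import acos, sqrt


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two unit sequences.

    Uses a linear gap penalty; the scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def word_counts(seq, k=2):
    """Count overlapping words (k-mers) of length k in a unit sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def angle_distance(a, b, k=2):
    """Angle (in radians) between the word-frequency vectors of two sequences."""
    ca, cb = word_counts(a, k), word_counts(b, k)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    if norm == 0:
        raise ValueError("both sequences must contain at least one word of length k")
    # Clamp to guard against rounding slightly above 1.
    return acos(min(1.0, dot / norm))


# Hypothetical themes, each letter standing for one song unit.
theme_1, theme_2 = "ABABCD", "ABABCE"
print(smith_waterman_score(theme_1, theme_2), angle_distance(theme_1, theme_2))
```

    A pairwise matrix of either quantity could then be passed to a hierarchical clustering routine to draw dendrograms like those the abstract describes.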

  3. Predictive Bioinformatic Assignment of Methyl-Bearing Stereocenters, Total Synthesis, and an Additional Molecular Target of Ajudazol B.

    PubMed

    Essig, Sebastian; Schmalzbauer, Björn; Bretzke, Sebastian; Scherer, Olga; Koeberle, Andreas; Werz, Oliver; Müller, Rolf; Menche, Dirk

    2016-02-19

    Full details on the evaluation and application of an easily feasible and generally useful method for configurational assignments of isolated methyl-bearing stereocenters are reported. The analytical tool relies on a bioinformatic gene cluster analysis and utilizes a predictive enoylreductase alignment, and its feasibility was demonstrated by the full stereochemical determination of the ajudazols, highly potent inhibitors of the mitochondrial respiratory chain. Furthermore, a full account of our strategies and tactics that culminated in the total synthesis of ajudazol B, the most potent and least abundant of this structurally unique class of myxobacterial natural products, is presented. Key features include an application of an asymmetric ortholithiation strategy for synthesis of the characteristic anti-configured hydroxyisochromanone core bearing three contiguous stereocenters, a modular oxazole formation, a flexible cross-metathesis approach for terminal allyl amide synthesis, and a late-stage Z,Z-selective Suzuki coupling. This total synthesis unambiguously proves the correct stereochemistry, which was further corroborated by comparison with reisolated natural material. Finally, 5-lipoxygenase was discovered as an additional molecular target of ajudazol B. Activities against this clinically validated key enzyme of the biosynthesis of proinflammatory leukotrienes were in the range of the approved drug zileuton, which further underlines the biological importance of this unique natural product. PMID:26796481

  4. ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis.

    PubMed

    Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas

    2016-01-01

    Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons are dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, or nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and are reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/. PMID:26882475

  5. ZBIT Bioinformatics Toolbox: A Web-Platform for Systems Biology and Expression Data Analysis

    PubMed Central

    Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas

    2016-01-01

    Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons are dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, or nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and are reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/. PMID:26882475

  6. Credibility Analysis of Putative Disease-Causing Genes Using Bioinformatics

    PubMed Central

    Abel, Olubunmi; Powell, John F.; Andersen, Peter M.; Al-Chalabi, Ammar

    2013-01-01

    Background: Genetic studies are challenging in many complex diseases, particularly those with limited diagnostic certainty, low prevalence or of old age. The result is that genes may be reported as disease-causing with varying levels of evidence, and in some cases, the data may be so limited as to be indistinguishable from chance findings. When there are large numbers of such genes, an objective method for ranking the evidence is useful. Using the neurodegenerative and complex disease amyotrophic lateral sclerosis (ALS) as a model, and the disease-specific database ALSoD, the objective is to develop a method using publicly available data to generate a credibility score for putative disease-causing genes. Methods: Genes with at least one publication suggesting involvement in adult onset familial ALS were collated following an exhaustive literature search. SQL was used to generate a score by extracting information from the publications and combined with a pathogenicity analysis using bioinformatics tools. The resulting score allowed us to rank genes in order of credibility. To validate the method, we compared the objective ranking with a rank generated by ALS genetics experts. Spearman's Rho was used to compare rankings generated by the different methods. Results: The automated method ranked ALS genes in the following order: SOD1, TARDBP, FUS, ANG, SPG11, NEFH, OPTN, ALS2, SETX, FIG4, VAPB, DCTN1, TAF15, VCP, DAO. This compared very well to the ranking of ALS genetics experts, with Spearman's Rho of 0.69 (P = 0.009). Conclusion: We have presented an automated method for scoring the level of evidence for a gene being disease-causing. In developing the method we have used the model disease ALS, but it could equally be applied to any disease in which there is genotypic uncertainty. PMID:23755159
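
    The rank comparison mentioned in this abstract can be illustrated with the textbook formula for Spearman's rho on two tie-free rankings, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)). The sketch below is only an illustration: the expert ordering is hypothetical (the abstract does not list it), only the first five genes of the automated ranking are used, and the function name is an assumption.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two orderings of the same items, assuming no ties."""
    if len(set(rank_a)) != len(rank_a) or set(rank_a) != set(rank_b):
        raise ValueError("both rankings must contain the same unique items")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two items to compare rankings")
    position_in_b = {item: i for i, item in enumerate(rank_b)}
    # Sum of squared rank differences over all items.
    d_squared = sum((i - position_in_b[item]) ** 2 for i, item in enumerate(rank_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# First five genes of the automated ranking reported in the abstract.
automated = ["SOD1", "TARDBP", "FUS", "ANG", "SPG11"]
# Hypothetical expert ordering, invented here purely for illustration.
expert_hypothetical = ["SOD1", "FUS", "TARDBP", "ANG", "SPG11"]
print(spearman_rho(automated, expert_hypothetical))
```

    With real data containing tied scores, average ranks and the Pearson correlation of the ranks would be needed instead of this shortcut formula.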

  7. Bioinformatic Identification and Analysis of Extensins in the Plant Kingdom.

    PubMed

    Liu, Xiao; Wolfe, Richard; Welch, Lonnie R; Domozych, David S; Popper, Zoë A; Showalter, Allan M

    2016-01-01

    Extensins (EXTs) are a family of plant cell wall hydroxyproline-rich glycoproteins (HRGPs) that are implicated in plant growth, development, and defense. Structurally, EXTs are characterized by the repeated occurrence of serine (Ser) followed by three to five proline (Pro) residues, which are hydroxylated as hydroxyproline (Hyp) and glycosylated. Some EXTs have Tyrosine (Tyr)-X-Tyr (where X can be any amino acid) motifs that are responsible for intramolecular or intermolecular cross-linking. EXTs can be divided into several classes: classical EXTs, short EXTs, leucine-rich repeat extensins (LRXs), proline-rich extensin-like receptor kinases (PERKs), formin-homolog EXTs (FH EXTs), chimeric EXTs, and long chimeric EXTs. To guide future research on the EXTs and understand the evolutionary history of EXTs in the plant kingdom, a bioinformatics study was conducted to identify and classify EXTs from 16 fully sequenced plant genomes, including Ostreococcus lucimarinus, Chlamydomonas reinhardtii, Volvox carteri, Klebsormidium flaccidum, Physcomitrella patens, Selaginella moellendorffii, Pinus taeda, Picea abies, Brachypodium distachyon, Zea mays, Oryza sativa, Glycine max, Medicago truncatula, Brassica rapa, Solanum lycopersicum, and Solanum tuberosum, to supplement data previously obtained from Arabidopsis thaliana and Populus trichocarpa. A total of 758 EXTs were newly identified, including 87 classical EXTs, 97 short EXTs, 61 LRXs, 75 PERKs, 54 FH EXTs, 38 long chimeric EXTs, and 346 other chimeric EXTs. Several notable findings were made: (1) classical EXTs were likely derived after the terrestrialization of plants; (2) LRXs, PERKs, and FHs were derived earlier than classical EXTs; (3) monocots have few classical EXTs; (4) Eudicots have the greatest number of classical EXTs and Tyr-X-Tyr cross-linking motifs are predominantly in classical EXTs; (5) green algae have no classical EXTs but have a number of long chimeric EXTs that are absent in

  8. Bioinformatic Identification and Analysis of Extensins in the Plant Kingdom

    PubMed Central

    Liu, Xiao; Wolfe, Richard; Welch, Lonnie R.; Domozych, David S.; Popper, Zoë A.; Showalter, Allan M.

    2016-01-01

    Extensins (EXTs) are a family of plant cell wall hydroxyproline-rich glycoproteins (HRGPs) that are implicated in plant growth, development, and defense. Structurally, EXTs are characterized by the repeated occurrence of serine (Ser) followed by three to five proline (Pro) residues, which are hydroxylated as hydroxyproline (Hyp) and glycosylated. Some EXTs have Tyrosine (Tyr)-X-Tyr (where X can be any amino acid) motifs that are responsible for intramolecular or intermolecular cross-linking. EXTs can be divided into several classes: classical EXTs, short EXTs, leucine-rich repeat extensins (LRXs), proline-rich extensin-like receptor kinases (PERKs), formin-homolog EXTs (FH EXTs), chimeric EXTs, and long chimeric EXTs. To guide future research on the EXTs and understand the evolutionary history of EXTs in the plant kingdom, a bioinformatics study was conducted to identify and classify EXTs from 16 fully sequenced plant genomes, including Ostreococcus lucimarinus, Chlamydomonas reinhardtii, Volvox carteri, Klebsormidium flaccidum, Physcomitrella patens, Selaginella moellendorffii, Pinus taeda, Picea abies, Brachypodium distachyon, Zea mays, Oryza sativa, Glycine max, Medicago truncatula, Brassica rapa, Solanum lycopersicum, and Solanum tuberosum, to supplement data previously obtained from Arabidopsis thaliana and Populus trichocarpa. A total of 758 EXTs were newly identified, including 87 classical EXTs, 97 short EXTs, 61 LRXs, 75 PERKs, 54 FH EXTs, 38 long chimeric EXTs, and 346 other chimeric EXTs. Several notable findings were made: (1) classical EXTs were likely derived after the terrestrialization of plants; (2) LRXs, PERKs, and FHs were derived earlier than classical EXTs; (3) monocots have few classical EXTs; (4) Eudicots have the greatest number of classical EXTs and Tyr-X-Tyr cross-linking motifs are predominantly in classical EXTs; (5) green algae have no classical EXTs but have a number of long chimeric EXTs that are absent in
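
    The two sequence features described in this record (Ser followed by three to five Pro, and Tyr-X-Tyr) lend themselves to a simple pattern scan over one-letter amino acid sequences. The sketch below is an assumption-laden illustration rather than the study's actual search criteria, which the abstract does not give; the function name and the example sequence are hypothetical, and hydroxylation or glycosylation cannot be seen at the sequence level.

```python
import re

# Ser followed by three to five Pro residues (one-letter codes).
SER_PRO = re.compile(r"SP{3,5}")
# Tyr-X-Tyr, wrapped in a lookahead so overlapping motifs (e.g. YAYAY) are all found.
TYR_X_TYR = re.compile(r"(?=(Y[A-Z]Y))")


def scan_ext_motifs(protein):
    """Return 0-based positions and text of candidate EXT motifs in a protein."""
    seq = protein.upper()
    return {
        "ser_pro_repeats": [(m.start(), m.group()) for m in SER_PRO.finditer(seq)],
        "tyr_x_tyr": [(m.start(), m.group(1)) for m in TYR_X_TYR.finditer(seq)],
    }


# Hypothetical sequence fragment, made up for illustration only.
print(scan_ext_motifs("MSPPPPKYVYSPPPAYY"))
```

    Counting such hits per protein would be one plausible first filter before any classification into the EXT classes listed above.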

  9. Identification and Analysis of Occludin Phosphosites: A Combined Mass Spectroscopy and Bioinformatics Approach

    SciTech Connect

    Sundstrom, J.; Tash, B; Murakami, T; Flanagan, J; Bewley, M; Stanley, B; Gonsar, K; Antonetti, D

    2009-01-01

    The molecular function of occludin, an integral membrane component of tight junctions, remains unclear. VEGF-induced phosphorylation sites were mapped on occludin by combining MS data analysis with bioinformatics. In vivo phosphorylation of Ser490 was validated and protein interaction studies combined with crystal structure analysis suggest that Ser490 phosphorylation attenuates the interaction between occludin and ZO-1. This study demonstrates that combining MS data and bioinformatics can successfully identify novel phosphorylation sites from limiting samples.

  10. Review of Current Methods, Applications, and Data Management for the Bioinformatics Analysis of Whole Exome Sequencing

    PubMed Central

    Bao, Riyue; Huang, Lei; Andrade, Jorge; Tan, Wei; Kibbe, Warren A; Jiang, Hongmei; Feng, Gang

    2014-01-01

    The advent of next-generation sequencing technologies has greatly promoted advances in the study of human diseases at the genomic, transcriptomic, and epigenetic levels. Exome sequencing, where the coding region of the genome is captured and sequenced at a deep level, has proven to be a cost-effective method to detect disease-causing variants and discover gene targets. In this review, we outline the general framework of whole exome sequence data analysis. We focus on established bioinformatics tools and applications that support five analytical steps: raw data quality assessment, pre-processing, alignment, post-processing, and variant analysis (detection, annotation, and prioritization). We evaluate the performance of open-source alignment programs and variant calling tools using simulated and benchmark datasets, and highlight the challenges posed by the lack of concordance among variant detection tools. Based on these results, we recommend adopting multiple tools and resources to reduce false positives and increase the sensitivity of variant calling. In addition, we briefly discuss the current status and solutions for big data management, analysis, and summarization in the field of bioinformatics. PMID:25288881

  11. Buying in to bioinformatics: an introduction to commercial sequence analysis software.

    PubMed

    Smith, David Roy

    2015-07-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. PMID:25183247

  12. Buying in to bioinformatics: an introduction to commercial sequence analysis software

    PubMed Central

    2015-01-01

    Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics. PMID:25183247

  13. Bioinformatic Analysis of Toll-Like Receptor Sequences and Structures.

    PubMed

    Monie, Tom P; Gay, Nicholas J; Gangloff, Monique

    2016-01-01

    Continual advancements in computing power and sophistication, coupled with rapid increases in protein sequence and structural information, have made bioinformatic tools an invaluable resource for the molecular and structural biologist. With the degree of sequence information continuing to expand at an almost exponential rate, it is essential that scientists today have a basic understanding of how to utilise, manipulate and analyse this information for the benefit of their own experiments. In the context of Toll-Interleukin I Receptor domain containing proteins, we describe here a series of the more common and user-friendly bioinformatic tools available as Internet-based resources. These will enable the identification and alignment of protein sequences; the identification of functional motifs; the characterisation of protein secondary structure; the identification of protein structural folds and distantly homologous proteins; and the validation of the structural geometry of modelled protein structures. PMID:26803620

  14. Will solid-state drives accelerate your bioinformatics? In-depth profiling, performance analysis and beyond.

    PubMed

    Lee, Sungmin; Min, Hyeyoung; Yoon, Sungroh

    2016-07-01

    A wide variety of large-scale data have been produced in bioinformatics. In response, the need for efficient handling of biomedical big data has been partly met by parallel computing. However, the time demand of many bioinformatics programs still remains high for large-scale practical uses because of factors that hinder acceleration by parallelization. Recently, new generations of storage devices have emerged, such as NAND flash-based solid-state drives (SSDs), and with the renewed interest in near-data processing, they are increasingly becoming acceleration methods that can accompany parallel processing. In certain cases, a simple drop-in replacement of hard disk drives by SSDs results in dramatic speedup. Despite the various advantages and continuous cost reduction of SSDs, there has been little review of SSD-based profiling and performance exploration of important but time-consuming bioinformatics programs. For an informative review, we perform in-depth profiling and analysis of 23 key bioinformatics programs using multiple types of devices. Based on the insight we obtain from this research, we further discuss issues related to designing and optimizing bioinformatics algorithms and pipelines to fully exploit SSDs. The programs we profile cover traditional and emerging areas of importance, such as alignment, assembly, mapping, expression analysis, variant calling and metagenomics. We explain how acceleration by parallelization can be combined with SSDs for improved performance and also how using SSDs can expedite important bioinformatics pipelines, such as variant calling by the Genome Analysis Toolkit and transcriptome analysis using RNA sequencing. We hope that this review can provide useful directions and tips to accompany future bioinformatics algorithm design procedures that properly consider new generations of powerful storage devices. PMID:26330577

  15. [Bioinformatics analysis of DNA demethylase genes in Lonicera japonica Thunb].

    PubMed

    Qi, Lin-jie; Yuan, Yuan; Wu, Chong; Huang, Lu-qi; Chen, Ping

    2015-03-01

    The DNA demethylase genes are widespread in plants. Four DNA demethylase genes (LJDME1, LJDME2, LJDME3 and LJDME4) were obtained from the transcriptome dataset of Lonicera japonica Thunb by using bioinformatics methods, and the physicochemical properties of the proteins they encode were predicted. The phylogenetic tree showed that the four DNA demethylase genes and Arabidopsis thaliana DME had a close relationship. The gene expression results showed that the four DNA demethylase genes differed between species. The expression levels of LJDME1 and LJDME2 were higher in Lonicera japonica var. chinensis than in L. japonica. LJDME1 and LJDME2 may regulate the active compounds of L. japonica. This study aims to lay a foundation for further understanding of the function of DNA demethylase genes in L. japonica. PMID:26118119

  16. In Silico Identification, Phylogenetic and Bioinformatic Analysis of Argonaute Genes in Plants

    PubMed Central

    Mirzaei, Khaled; Bahramnejad, Bahman; Shamsifard, Mohammad Hasan; Zamani, Wahid

    2014-01-01

    Argonaute family proteins are the key players in pathways of gene silencing and small regulatory RNAs in different organisms. Argonaute proteins can bind small noncoding RNAs and control protein synthesis, affect messenger RNA stability, and even participate in the production of new forms of small RNAs. The aim of this study was to characterize and perform bioinformatic analysis of Argonaute proteins in 32 plant species whose genomes have been sequenced. A total of 437 Argonaute genes were identified and were analyzed based on lengths, gene structure, and protein structure. Results showed that Argonaute proteins were highly conserved across the plant kingdom. Phylogenetic analysis divided plant Argonautes into three classes. Argonaute proteins have three conserved domains: PAZ, MID and PIWI. In addition to these three conserved domains, we identified a few more domains in the AGOs of some plant species. Expression profile analysis of Argonaute proteins showed that expression of these genes varies in most tissues, which means that these proteins are involved in regulation of most pathways of the plant system. The numbers of alternative transcripts of Argonaute genes were highly variable among the plants. A thorough analysis of a large number of putative Argonaute genes revealed several interesting aspects associated with this protein and brought novel information with promising usefulness for both basic and biotechnological applications. PMID:25309901

  17. In silico identification, phylogenetic and bioinformatic analysis of argonaute genes in plants.

    PubMed

    Mirzaei, Khaled; Bahramnejad, Bahman; Shamsifard, Mohammad Hasan; Zamani, Wahid

    2014-01-01

    Argonaute family proteins are the key players in pathways of gene silencing and small regulatory RNAs in different organisms. Argonaute proteins can bind small noncoding RNAs and control protein synthesis, affect messenger RNA stability, and even participate in the production of new forms of small RNAs. The aim of this study was to characterize and perform bioinformatic analysis of Argonaute proteins in 32 plant species whose genomes have been sequenced. A total of 437 Argonaute genes were identified and were analyzed based on lengths, gene structure, and protein structure. Results showed that Argonaute proteins were highly conserved across the plant kingdom. Phylogenetic analysis divided plant Argonautes into three classes. Argonaute proteins have three conserved domains: PAZ, MID and PIWI. In addition to these three conserved domains, we identified a few more domains in the AGOs of some plant species. Expression profile analysis of Argonaute proteins showed that expression of these genes varies in most tissues, which means that these proteins are involved in regulation of most pathways of the plant system. The numbers of alternative transcripts of Argonaute genes were highly variable among the plants. A thorough analysis of a large number of putative Argonaute genes revealed several interesting aspects associated with this protein and brought novel information with promising usefulness for both basic and biotechnological applications. PMID:25309901

  18. [BIOINFORMATIC SEARCH AND PHYLOGENETIC ANALYSIS OF THE CELLULOSE SYNTHASE GENES OF FLAX (LINUM USITATISSIMUM)].

    PubMed

    Pydiura, N A; Bayer, G Ya; Galinousky, D V; Yemets, A I; Pirko, Ya V; Podvitski, T A; Anisimova, N V; Khotyleva, L V; Kilchevsky, A V; Blume, Ya B

    2015-01-01

    A bioinformatic search for sequences encoding cellulose synthase genes in the flax genome was carried out, and the sequences were compared with their dicot orthologs. The analysis revealed 32 cellulose synthase gene candidates, 16 of which are highly likely to encode cellulose synthases, and the remaining 16 cellulose synthase-like proteins (Csl). Phylogenetic analysis of the products of the cellulose synthase genes distinguished 6 groups of cellulose synthase genes of different classes: CesA1/10, CesA3, CesA4, CesA5/6/2/9, CesA7 and CesA8. Paralogous sequences within classes CesA1/10 and CesA5/6/2/9, which are associated with primary cell wall formation, are characterized by a greater similarity within these classes than orthologous sequences, whereas the genes controlling the biosynthesis of secondary cell wall cellulose form distinct clades: CesA4, CesA7, and CesA8. The analysis of the 16 identified flax cellulose synthase gene candidates shows the presence of at least 12 different cellulose synthase gene variants in the flax genome, which are represented in all six clades of cellulose synthase genes. Thus, at this point genes of all ten known cellulose synthase classes are identified in the flax genome, but their correct classification requires additional research. PMID:26638491

  19. Identification of key genes in glioblastoma-associated stromal cells using bioinformatics analysis

    PubMed Central

    CHEN, CHENGYONG; SUN, CHONG; TANG, DONG; YANG, GUANGCHENG; ZHOU, XUANJUN; WANG, DONGHAI

    2016-01-01

    The aim of the present study was to identify key genes and pathways in glioblastoma-associated stromal cells (GASCs) using bioinformatics. The expression profile of microarray GSE24100 was obtained from the Gene Expression Omnibus database, which included the expression profile of 4 GASC samples and 3 control stromal cell samples. Differentially expressed genes (DEGs) were identified using limma software in R language, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analysis of DEGs were performed using the Database for Annotation, Visualization and Integrated Discovery software. In addition, a protein-protein interaction (PPI) network was constructed. Subsequently, a sub-network was constructed to obtain additional information on genes identified in the PPI network using CFinder software. In total, 502 DEGs were identified in GASCs, including 331 upregulated genes and 171 downregulated genes. Cyclin-dependent kinase 1 (CDK1), cyclin A2, mitotic checkpoint serine/threonine kinase (BUB1), cell division cycle 20 (CDC20), polo-like kinase 1 (PLK1), and transcription factor breast cancer 1, early onset (BRCA1) were identified from the PPI network, and sub-networks revealed these genes as hub genes that were involved in significant pathways, including mitotic, cell cycle and p53 signaling pathways. In conclusion, CDK1, BUB1, CDC20, PLK1 and BRCA1 may be key genes that are involved in significant pathways associated with glioblastoma. This information may lead to the identification of the mechanism of glioblastoma tumorigenesis. PMID:27313730
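
    One common way to nominate hub genes in a protein-protein interaction network is to rank nodes by degree (number of distinct interaction partners). The abstract does not state which criterion the authors applied, so the sketch below is an assumption; the function name and the placeholder node names are hypothetical, and no real interaction data are shown.

```python
from collections import defaultdict


def hub_genes(edges, top_n=3):
    """Rank nodes of an undirected network by degree, highest first.

    Duplicate edges and self-loops are ignored; ties are broken alphabetically.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # skip self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    ranked = sorted(neighbors, key=lambda gene: (-len(neighbors[gene]), gene))
    return [(gene, len(neighbors[gene])) for gene in ranked[:top_n]]


# Placeholder edge list; the names are not real genes or real interactions.
toy_edges = [("geneA", "geneB"), ("geneA", "geneC"), ("geneA", "geneD"),
             ("geneB", "geneC"), ("geneB", "geneA")]
print(hub_genes(toy_edges, top_n=2))
```

    In practice the edge list would come from a PPI database export filtered to the differentially expressed genes, and degree could be swapped for another centrality measure.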

  20. Bioinformatic analysis of functional proteins involved in obesity associated with diabetes.

    PubMed

    Rao, Allam Appa; Tayaru, N Manga; Thota, Hanuman; Changalasetty, Suresh Babu; Thota, Lalitha Saroja; Gedela, Srinubabu

    2008-03-01

    The twin epidemics of diabetes and obesity pose daunting challenges worldwide. The dramatic rise in obesity-associated diabetes resulted in an alarming increase in the incidence and prevalence of obesity, an important complication of diabetes. Differences among individuals in their susceptibility to both these conditions probably reflect their genetic constitutions. The dramatic improvements in genomic and bioinformatic resources are accelerating the pace of gene discovery. It is tempting to speculate about the key susceptibility genes/proteins that bridge diabetes mellitus and obesity. In this regard, we evaluated the role of several genes/proteins that are believed to be involved in the evolution of obesity-associated diabetes by employing multiple sequence alignment using the ClustalW tool and constructed a phylogram tree using functional protein sequences extracted from NCBI. The phylogram was constructed using the Neighbor-Joining algorithm, a bioinformatic tool. Our bioinformatic analysis reports the resistin gene as an ominous link with obesity-associated diabetes. This bioinformatic study will be useful for future studies towards therapeutic inventions of obesity-associated type 2 diabetes. PMID:23675069

  1. The MPI bioinformatics Toolkit as an integrative platform for advanced protein sequence and structure analysis.

    PubMed

    Alva, Vikram; Nam, Seung-Zin; Söding, Johannes; Lupas, Andrei N

    2016-07-01

    The MPI Bioinformatics Toolkit (http://toolkit.tuebingen.mpg.de) is an open, interactive web service for comprehensive and collaborative protein bioinformatic analysis. It offers a wide array of interconnected, state-of-the-art bioinformatics tools to experts and non-experts alike, developed both externally (e.g. BLAST+, HMMER3, MUSCLE) and internally (e.g. HHpred, HHblits, PCOILS). While a beta version of the Toolkit was released 10 years ago, the current production-level release has been available since 2008 and has serviced more than 1.6 million external user queries. The usage of the Toolkit has continued to increase linearly over the years, reaching more than 400 000 queries in 2015. In fact, through the breadth of its tools and their tight interconnection, the Toolkit has become an excellent platform for experimental scientists as well as a useful resource for teaching bioinformatic inquiry to students in the life sciences. In this article, we report on the evolution of the Toolkit over the last ten years, focusing on the expansion of the tool repertoire (e.g. CS-BLAST, HHblits) and on infrastructural work needed to remain operative in a changing web environment. PMID:27131380

  2. Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae.

    PubMed Central

    Spingola, M; Grate, L; Haussler, D; Ares, M

    1999-01-01

    Introns have typically been discovered in an ad hoc fashion: introns are found as a gene is characterized for other reasons. As complete eukaryotic genome sequences become available, better methods for predicting RNA processing signals in raw sequence will be necessary in order to discover genes and predict their expression. Here we present a catalog of 228 yeast introns, arrived at through a combination of bioinformatic and molecular analysis. Introns annotated in the Saccharomyces Genome Database (SGD) were evaluated, questionable introns were removed after failing a test for splicing in vivo, and known introns absent from the SGD annotation were added. A novel branchpoint sequence, AAUUAAC, was identified within an annotated intron that lacks a six-of-seven match to the highly conserved branchpoint consensus UACUAAC. Analysis of the database corroborates many conclusions about pre-mRNA substrate requirements for splicing derived from experimental studies, but indicates that splicing in yeast may not be as rigidly determined by splice-site conservation as had previously been thought. Using this database and a molecular technique that directly displays the lariat intron products of spliced transcripts (intron display), we suggest that the current set of 228 introns is still not complete, and that additional intron-containing genes remain to be discovered in yeast. The database can be accessed at http://www.cse.ucsc.edu/research/compbio/yeast_introns.html. PMID:10024174
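
    The "six-of-seven match" criterion mentioned in the abstract above can be illustrated with a short sketch. This is not the authors' method: the function names and the sliding-window scan are assumptions made for illustration, and only the consensus UACUAAC and the novel sequence AAUUAAC come from the abstract.

```python
CONSENSUS = "UACUAAC"  # conserved yeast branchpoint consensus quoted in the abstract


def matches(candidate: str, consensus: str = CONSENSUS) -> int:
    """Count the positions at which candidate agrees with the consensus."""
    if len(candidate) != len(consensus):
        raise ValueError("candidate and consensus must have equal length")
    return sum(a == b for a, b in zip(candidate, consensus))


def find_branchpoints(intron: str, min_matches: int = 6):
    """Return (offset, heptamer, matches) for every window meeting the threshold.

    Hypothetical helper: scans every 7-nt window of the intron and keeps
    those with at least min_matches identities to the consensus.
    """
    rna = intron.upper().replace("T", "U")  # accept DNA or RNA input
    width = len(CONSENSUS)
    hits = []
    for i in range(len(rna) - width + 1):
        window = rna[i:i + width]
        score = matches(window)
        if score >= min_matches:
            hits.append((i, window, score))
    return hits
```

    Under this scoring, AAUUAAC agrees with the consensus at only five of seven positions, which is consistent with the abstract's statement that the intron lacks a six-of-seven match.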

  3. Proteomics and bioinformatics analysis of mouse hypothalamic neurogenesis with or without EPHX2 gene deletion

    PubMed Central

    Zhong, Lijun; Zhou, Juntuo; Wang, Dawei; Zou, Xiajuan; Lou, Yaxin; Liu, Dan; Yang, Bin; Zhu, Yi; Li, Xiaoxia

    2015-01-01

    The aim of this study was to identify differentially expressed proteins in the presence and absence of the EPHX2 gene in mouse hypothalamus using proteomics profiling and bioinformatics analysis. This study was performed on 3 wild type (WT) and 3 EPHX2 gene global knockout (KO) mice (EPHX2 -/-). Using a nano-electrospray ionization (ESI)-LC-MS/MS detector, we identified 31 over-expressed proteins in WT mouse hypothalamus compared to the KO counterparts. Gene Ontology (GO) annotation in terms of the protein-protein interaction network indicated that cellular metabolic process, protein metabolic process, signal transduction and protein post-translational biological processes were involved in the EPHX2 -/- regulatory network. In addition, signaling pathway enrichment analysis highlighted that chronic neurodegenerative diseases and several signaling pathways, such as the TGF-beta, T cell receptor, ErbB, Neurotrophin and MAPK signaling pathways, were strongly coupled with EPHX2 gene knockout. Further studies into the molecular functions of the EPHX2 gene in the hypothalamus will help to provide new perspectives on neurogenesis. PMID:26722453

  4. Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace.

    PubMed

    Qu, Kun; Garamszegi, Sara; Wu, Felix; Thorvaldsdottir, Helga; Liefeld, Ted; Ocana, Marco; Borges-Rivera, Diego; Pochet, Nathalie; Robinson, James T; Demchak, Barry; Hull, Tim; Ben-Artzi, Gil; Blankenberg, Daniel; Barber, Galt P; Lee, Brian T; Kuhn, Robert M; Nekrutenko, Anton; Segal, Eran; Ideker, Trey; Reich, Michael; Regev, Aviv; Chang, Howard Y; Mesirov, Jill P

    2016-03-01

    Complex biomedical analyses require the use of multiple software tools in concert and remain challenging for much of the biomedical research community. We introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource that currently supports the streamlined interaction of 20 bioinformatics tools and data resources. To facilitate integrative analysis by non-programmers, it offers a growing set of 'recipes', short workflows to guide investigators through high-utility analysis tasks. PMID:26780094

  5. [Construction and application of bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer].

    PubMed

    Xiang, Fang; Ningqiu, Li; Xiaozhe, Fu; Kaibin, Li; Qiang, Lin; Lihui, Liu; Cunbin, Shi; Shuqin, Wu

    2015-07-01

    As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the requirement for high-performance computers rather than common personal computers for constructing a bioinformatics platform has significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogens based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules: genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analysis on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via Blast searches, GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on outer membrane protein A of Aeromonas hydrophila, and changes in system temperature, total energy, root mean square deviation and loop conformation during equilibration were observed. These results showed that the bioinformatic analysis platform for aquatic pathogens has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platforms for other subjects. PMID:26351170

  6. Bioinformatics analysis of the gene expression profile of hepatocellular carcinoma: preliminary results

    PubMed Central

    Li, Jia

    2016-01-01

    Aim of the study: To analyse the expression profile of hepatocellular carcinoma compared with normal liver by using bioinformatics methods. Material and methods: In this study, we analysed the microarray expression data of HCC and adjacent normal liver samples from the Gene Expression Omnibus (GEO) database to screen for differentially expressed genes. Then, functional analyses were performed using GenCLiP analysis, Gene Ontology categories, and aberrant pathway identification. In addition, we used the CMap database to identify small molecules that can induce HCC. Results: Overall, 2721 differentially expressed genes (DEGs) were identified. We found 180 metastasis-related genes and constructed co-occurrence networks. Several significant pathways, including the transforming growth factor β (TGF-β) signalling pathway, were identified as closely related to these DEGs. Some candidate small molecules (such as betahistine) were identified that might provide a basis for developing HCC treatments in the future. Conclusions: Although we functionally analysed the differences in the gene expression profiles of HCC and normal liver tissues, our study is essentially preliminary, and it may be premature to apply our results to clinical trials. Further research and experimental testing are required in future studies. PMID:27095935

  7. Deep Sequencing Analysis of Nucleolar Small RNAs: Bioinformatics.

    PubMed

    Bai, Baoyan; Laiho, Marikki

    2016-01-01

    Small RNAs (size 20-30 nt) of various types have been actively investigated in recent years, and their subcellular compartmentalization and relative concentrations are likely to be of importance to their cellular and physiological functions. Comprehensive data on this subset of the transcriptome can only be obtained by application of high-throughput sequencing, which yields data that are inherently complex and multidimensional, as sequence composition, length, and abundance all inform small RNA function. Subsequent data analysis, hypothesis testing, and presentation/visualization of the results are correspondingly challenging. We have constructed small RNA libraries derived from different cellular compartments, including the nucleolus, and asked whether small RNAs exist in the nucleolus and whether they are distinct from cytoplasmic and nuclear small RNAs, the miRNAs. Here, we present a workflow for analysis of small RNA sequencing data generated by the Ion Torrent PGM sequencer from samples derived from different cellular compartments. PMID:27576724

  8. BLASTGrabber: a bioinformatic tool for visualization, analysis and sequence selection of massive BLAST data

    PubMed Central

    2014-01-01

    Background: Advances in sequencing efficiency have vastly increased the sizes of biological sequence databases, including many thousands of genome-sequenced species. The BLAST algorithm remains the main search engine for retrieving sequence information, and must consequently handle data on an unprecedented scale. This has been possible due to high-performance computers and parallel processing. However, the raw BLAST output from contemporary searches involving thousands of queries becomes ill-suited for direct human processing. Few programs attempt to directly visualize and interpret BLAST output; those that do often provide a mere basic structuring of BLAST data. Results: Here we present a bioinformatics application named BLASTGrabber suitable for high-throughput sequencing analysis. BLASTGrabber, being implemented as a Java application, is OS-independent and includes a user-friendly graphical user interface. Text or XML-formatted BLAST output files can be directly imported, displayed and categorized based on BLAST statistics. Query names and FASTA headers can be analysed by text-mining. In addition to visualizing sequence alignments, BLAST data can be ordered as an interactive taxonomy tree. All modes of analysis support selection, export and storage of data. A Java interface-based plugin structure facilitates the addition of customized third party functionality. Conclusion: The BLASTGrabber application introduces new ways of visualizing and analysing massive BLAST output data by integrating taxonomy identification, text mining capabilities and generic multi-dimensional rendering of BLAST hits. The program aims at a non-expert audience in terms of computer skills; the combination of new functionalities makes the program flexible and useful for a broad range of operations. PMID:24885091
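
    To make the idea of categorizing hits "based on BLAST statistics" concrete, here is a minimal sketch that is unrelated to BLASTGrabber's own Java code. It assumes the input is BLAST+ tabular output with the 12 default columns (`-outfmt 6`); the function names and the default E-value cut-off are illustrative assumptions.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def parse_hits(lines):
    """Yield one dict per tab-separated hit line, skipping blanks and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < len(FIELDS):
            continue  # not a complete 12-column record
        hit = dict(zip(FIELDS, cols))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        yield hit


def best_hits(lines, max_evalue=1e-5):
    """Keep, for each query, the highest-bitscore hit passing the E-value cut-off."""
    best = {}
    for hit in parse_hits(lines):
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

    With thousands of queries, a reduction of this kind (one best hit per query) is one simple way to turn raw output into something a person can inspect; the cut-off would need tuning for the database and question at hand.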

  9. Gene expression profiling via bioinformatics analysis reveals biomarkers in laryngeal squamous cell carcinoma

    PubMed Central

    GUAN, GUO-FANG; ZHENG, YING; WEN, LIAN-JI; ZHANG, DE-JUN; YU, DUO-JIAO; LU, YAN-QING; ZHAO, YAN; ZHANG, HUI

    2015-01-01

    The present study aimed to identify key genes and relevant microRNAs (miRNAs) involved in laryngeal squamous cell carcinoma (LSCC). The gene expression profiles of LSCC tissue samples were analyzed with various bioinformatics tools. A gene expression data set (GSE51985), including ten LSCC tissue samples and ten adjacent non-neoplastic tissue samples, was downloaded from the Gene Expression Omnibus. Differential analysis was performed using the limma package in R. Functional enrichment analysis was applied to the differentially expressed genes (DEGs) using the Database for Annotation, Visualization and Integrated Discovery. Protein-protein interaction (PPI) networks were constructed for the protein products using information from the Search Tool for the Retrieval of Interacting Genes/Proteins. Module analysis was performed using ClusterONE (a Cytoscape plugin). miRNAs regulating the DEGs were predicted using WebGestalt. A total of 461 DEGs were identified in LSCC, 297 of which were upregulated and 164 of which were downregulated. Cell cycle, proteasome and DNA replication were significantly over-represented in the upregulated genes, while the ribosome was significantly over-represented in the downregulated genes. Two PPI networks were constructed for the up- and downregulated genes. One module from the upregulated gene network was associated with protein kinase. Numerous miRNAs associated with LSCC were predicted, including miRNA (miR)-25, miR-32, miR-92 and miR-29. In conclusion, numerous key genes and pathways involved in LSCC were revealed, which may aid the advancement of current knowledge regarding the pathogenesis of LSCC. In addition, relevant miRNAs were also identified, which may represent potential biomarkers for use in the diagnosis or treatment of the disease. PMID:25936657

  10. Integration and bioinformatics analysis of DNA-methylated genes associated with drug resistance in ovarian cancer

    PubMed Central

    YAN, BINGBING; YIN, FUQIANG; WANG, QI; ZHANG, WEI; LI, LI

    2016-01-01

    The main obstacle to the successful treatment of ovarian cancer is the development of drug resistance to combined chemotherapy. Among all the factors associated with drug resistance, DNA methylation apparently plays a critical role. In this study, we performed an integrative analysis of the 26 DNA-methylated genes associated with drug resistance in ovarian cancer, and the genes were further evaluated by comprehensive bioinformatics analysis including gene/protein interaction, biological process enrichment and annotation. The results from the protein interaction analyses revealed that at least 20 of these 26 methylated genes are present in the protein interaction network, indicating that they interact with each other, have a correlation in function, and may participate as a whole in the regulation of ovarian cancer drug resistance. There is a direct interaction between the phosphatase and tensin homolog (PTEN) gene and at least half of the other genes, indicating that PTEN may possess core regulatory functions among these genes. Biological process enrichment and annotation demonstrated that most of these methylated genes were significantly associated with apoptosis, which is possibly an essential way for these genes to be involved in the regulation of multidrug resistance in ovarian cancer. In addition, a comprehensive analysis of clinical factors revealed that the methylation level of genes that are associated with the regulation of drug resistance in ovarian cancer was significantly correlated with the prognosis of ovarian cancer. Overall, this study preliminarily explains the potential correlation between the genes with DNA methylation and drug resistance in ovarian cancer. This finding has significance for our understanding of the regulation of resistant ovarian cancer by methylated genes, the treatment of ovarian cancer, and improvement of the prognosis of ovarian cancer. PMID:27347118

  11. Quantitative Analysis of the Trends Exhibited by the Three Interdisciplinary Biological Sciences: Biophysics, Bioinformatics, and Systems Biology

    PubMed Central

    Kang, Jonghoon; Park, Seyeon; Venkat, Aarya; Gopinath, Adarsh

    2015-01-01

    New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology. PMID:26753026

  12. Quantitative Analysis of the Trends Exhibited by the Three Interdisciplinary Biological Sciences: Biophysics, Bioinformatics, and Systems Biology.

    PubMed

    Kang, Jonghoon; Park, Seyeon; Venkat, Aarya; Gopinath, Adarsh

    2015-12-01

    New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology. PMID:26753026

  13. [Cloning and bioinformatic analysis and expression analysis of beta-glucuronidase in Scutellaria baicalensis].

    PubMed

    Guo, Shuang-shuang; Cheng, Lin; Yang, Li-min; Han, Mei

    2015-11-01

    The β-glucuronidase gene (sbGUS) cDNA was cloned for the first time from Scutellaria baicalensis leaf by RT-PCR, with GenBank accession number KR364726. The full-length cDNA of sbGUS was 1,584 bp with an open reading frame (ORF) encoding an unstable protein of 527 amino acids. The bioinformatic analysis showed that the protein encoded by sbGUS had an isoelectric point (pI) of 5.55 and a calculated molecular weight of about 58.7248 kDa, with a transmembrane region and a signal peptide, and had conserved domains of the glycoside hydrolase superfamily and an unintegrated trans-glycosidase catalytic structure. In the secondary structure, the percentages of alpha helix, extended strand, β-extended and random coil were 25.62%, 28.84%, 13.28% and 32.26%, respectively. Homology analysis indicated 98.93% nucleotide sequence similarity and 98.29% amino acid sequence similarity with S. baicalensis (BAA97804.1), with differences at nine positions. The expression level of sbGUS was the highest in root based on a real-time PCR analysis, followed by flower and stem, and the lowest was in stem. The results provide a foundation for exploring the molecular function of sbGUS involved in baicalcin biosynthesis based on synthetic biology approach in S. baicalensis plants. PMID:27097409

  14. Bioinformatics analysis of differentially expressed proteins in prostate cancer based on proteomics data

    PubMed Central

    Chen, Chen; Zhang, Li-Guo; Liu, Jian; Han, Hui; Chen, Ning; Yao, An-Liang; Kang, Shao-San; Gao, Wei-Xing; Shen, Hong; Zhang, Long-Jun; Li, Ya-Peng; Cao, Feng-Hong; Li, Zhi-Guo

    2016-01-01

    We mined the literature for proteomics data to examine the occurrence and metastasis of prostate cancer (PCa) through a bioinformatics analysis. We divided the differentially expressed proteins (DEPs) into two groups: the group consisting of PCa and benign tissues (P&b) and the group presenting both high and low PCa metastatic tendencies (H&L). In the P&b group, we found 320 DEPs, 20 of which were reported more than three times, and DES was the most commonly reported. Among these DEPs, the expression levels of FGG, GSN, SERPINC1, TPM1, and TUBB4B have not yet been correlated with PCa. In the H&L group, we identified 353 DEPs, 13 of which were reported more than three times. Among these DEPs, MDH2 and MYH9 have not yet been correlated with PCa metastasis. We further confirmed that DES was differentially expressed between 30 cancer and 30 benign tissues. In addition, DEPs associated with protein transport, regulation of actin cytoskeleton, and the extracellular matrix (ECM)–receptor interaction pathway were prevalent in the H&L group and have not yet been studied in detail in this context. Proteins related to homeostasis, the wound-healing response, focal adhesions, and the complement and coagulation pathways were overrepresented in both groups. Our findings suggest that the repeatedly reported DEPs in the two groups may function as potential biomarkers for detecting PCa and predicting its aggressiveness. Furthermore, the implicated biological processes and signaling pathways may help elucidate the molecular mechanisms of PCa carcinogenesis and metastasis and provide new targets for clinical treatment. PMID:27051295

  15. Applying Instructional Design Theories to Bioinformatics Education in Microarray Analysis and Primer Design Workshops

    ERIC Educational Resources Information Center

    Shachak, Aviv; Ophir, Ron; Rubin, Eitan

    2005-01-01

    The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of…

  16. Genome-wide variant analysis of simplex autism families with an integrative clinical-bioinformatics pipeline

    PubMed Central

    Jiménez-Barrón, Laura T.; O'Rawe, Jason A.; Wu, Yiyang; Yoon, Margaret; Fang, Han; Iossifov, Ivan; Lyon, Gholson J.

    2015-01-01

    Autism spectrum disorders (ASDs) are a group of developmental disabilities that affect social interaction and communication and are characterized by repetitive behaviors. There is now a large body of evidence that suggests a complex role of genetics in ASDs, in which many different loci are involved. Although many current population-scale genomic studies have been demonstrably fruitful, these studies generally focus on analyzing a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis of genome-wide perturbations that may contribute to the development and severity of ASD-related phenotypes. To overcome these limitations, we have developed and utilized an integrative clinical and bioinformatics pipeline for generating a more complete and reliable set of genomic variants for downstream analyses. Our study focuses on the analysis of three simplex autism families consisting of one affected child, unaffected parents, and one unaffected sibling. All members were clinically evaluated and widely phenotyped. Genotyping arrays and whole-genome sequencing were performed on each member, and the resulting sequencing data were analyzed using a variety of available bioinformatics tools. We searched for rare variants of putative functional impact that were found to be segregating according to de novo, autosomal recessive, X-linked, mitochondrial, and compound heterozygote transmission models. The resulting candidate variants included three small heterozygous copy-number variations (CNVs), a rare heterozygous de novo nonsense mutation in MYBBP1A located within exon 1, and a novel de novo missense variant in LAMB3. Our work demonstrates how more comprehensive analyses that include rich clinical data and whole-genome sequencing data can generate reliable results for use in downstream investigations. PMID:27148569

  17. Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace

    PubMed Central

    Thorvaldsdottir, Helga; Liefeld, Ted; Ocana, Marco; Borges-Rivera, Diego; Pochet, Nathalie; Robinson, James T.; Demchak, Barry; Hull, Tim; Ben-Artzi, Gil; Blankenberg, Daniel; Barber, Galt P.; Lee, Brian T.; Kuhn, Robert M.; Nekrutenko, Anton; Segal, Eran; Ideker, Trey; Reich, Michael; Regev, Aviv; Chang, Howard Y.; Mesirov, Jill P.

    2015-01-01

    Integrative analysis of multiple data types to address complex biomedical questions requires the use of multiple software tools in concert and remains an enormous challenge for most of the biomedical research community. Here we introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource. Seeded as a collaboration of six of the most popular genomics analysis tools, GenomeSpace now supports the streamlined interaction of 20 bioinformatics tools and data resources. To help non-programming users leverage GenomeSpace in integrative analysis, it offers a growing set of ‘recipes’, short workflows involving a few tools and steps to guide investigators through high-utility analysis tasks. PMID:26780094

  18. proBAMsuite, a Bioinformatics Framework for Genome-Based Representation and Analysis of Proteomics Data

    PubMed Central

    Wang, Xiaojing; Slebos, Robbert J. C.; Chambers, Matthew C.; Tabb, David L.; Liebler, Daniel C.; Zhang, Bing

    2016-01-01

    To facilitate genome-based representation and analysis of proteomics data, we developed a new bioinformatics framework, proBAMsuite, in which a central component is the protein BAM (proBAM) file format for organizing peptide spectrum matches (PSMs) within the context of the genome. proBAMsuite also includes two R packages, proBAMr and proBAMtools, for generating and analyzing proBAM files, respectively. Applying proBAMsuite to three recently published proteomics datasets, we demonstrated its utility in facilitating efficient genome-based sharing, interpretation, and integration of proteomics data. First, the interpretation of proteomics data is significantly enhanced with the rich genomic annotation information. Second, PSMs can be easily reannotated using user-specified gene annotation schemes and assembled into both protein and gene identifications. Third, using the genome as a common reference, proBAMsuite facilitates seamless proteomics and proteogenomics data integration. Finally, proBAM files can be readily visualized in genome browsers and thus bring proteomics data analysis to a general audience beyond the proteomics community. Results from this study establish proBAMsuite as a useful bioinformatics framework for proteomics and proteogenomics research. PMID:26657539

  19. Multi-loci diagnosis of acute lymphoblastic leukaemia with high-throughput sequencing and bioinformatics analysis.

    PubMed

    Ferret, Yann; Caillault, Aurélie; Sebda, Shéhérazade; Duez, Marc; Grardel, Nathalie; Duployez, Nicolas; Villenet, Céline; Figeac, Martin; Preudhomme, Claude; Salson, Mikaël; Giraud, Mathieu

    2016-05-01

    High-throughput sequencing (HTS) is considered a technical revolution that has improved our knowledge of lymphoid and autoimmune diseases, changing our approach to leukaemia both at diagnosis and during follow-up. As part of an immunoglobulin/T cell receptor-based minimal residual disease (MRD) assessment of acute lymphoblastic leukaemia patients, we assessed the performance and feasibility of the replacement of the first steps of the approach based on DNA isolation and Sanger sequencing, using a HTS protocol combined with bioinformatics analysis and visualization using the Vidjil software. We prospectively analysed the diagnostic and relapse samples of 34 paediatric patients, thus identifying 125 leukaemic clones with recombinations on multiple loci (TRG, TRD, IGH and IGK), including Dd2/Dd3 and Intron/KDE rearrangements. Sequencing failures were halved (14% vs. 34%, P = 0.0007), enabling more patients to be monitored. Furthermore, more markers per patient could be monitored, reducing the probability of false negative MRD results. The whole analysis, from sample receipt to clinical validation, was shorter than our current diagnostic protocol, with equal resources. V(D)J recombination was successfully assigned by the software, even for unusual recombinations. This study emphasizes the progress that HTS with adapted bioinformatics tools can bring to the diagnosis of leukaemia patients. PMID:26898266

  20. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists

    PubMed Central

    Huang, Da Wei; Sherman, Brad T.; Lempicki, Richard A.

    2009-01-01

    Functional analysis of large gene lists, derived in most cases from emerging high-throughput genomic, proteomic and bioinformatics scanning approaches, is still a challenging and daunting task. The gene-annotation enrichment analysis is a promising high-throughput strategy that increases the likelihood for investigators to identify biological processes most pertinent to their study. Approximately 68 bioinformatics enrichment tools that are currently available in the community are collected in this survey. Tools are uniquely categorized into three major classes, according to their underlying enrichment algorithms. The comprehensive collections, unique tool classifications and associated questions/issues will provide a more comprehensive and up-to-date view regarding the advantages, pitfalls and recent trends in a simpler tool-class level rather than by a tool-by-tool approach. Thus, the survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests. PMID:19033363
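
    As a rough illustration of gene-annotation enrichment, the sketch below computes a one-sided hypergeometric (over-representation) p-value for a single annotation term. The abstract does not say which statistic any given tool uses, so treating the hypergeometric test as representative is an assumption, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p(population: int, annotated: int, sample: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    population: genes in the background set
    annotated:  background genes carrying the annotation term
    sample:     genes in the user's list
    overlap:    genes in the list that carry the term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie within the population")
    upper = min(annotated, sample)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap must lie between 0 and min(annotated, sample)")
    total = comb(population, sample)  # all ways to draw the gene list
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

    In practice a tool would repeat this for every term and then correct for multiple testing; that step is omitted here.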

  1. SweetNET: A Bioinformatics Workflow for Glycopeptide MS/MS Spectral Analysis.

    PubMed

    Nasir, Waqas; Toledo, Alejandro Gomez; Noborn, Fredrik; Nilsson, Jonas; Wang, Mingxun; Bandeira, Nuno; Larson, Göran

    2016-08-01

    Glycoproteomics has rapidly become an independent analytical platform bridging the fields of glycomics and proteomics to address site-specific protein glycosylation and its impact in biology. Current glycopeptide characterization relies on time-consuming manual interpretations and demands high levels of personal expertise. Efficient data interpretation constitutes one of the major challenges to be overcome before true high-throughput glycopeptide analysis can be achieved. The development of new glyco-related bioinformatics tools is thus of crucial importance to fulfill this goal. Here we present SweetNET: a data-oriented bioinformatics workflow for efficient analysis of hundreds of thousands of glycopeptide MS/MS-spectra. We have analyzed MS data sets from two separate glycopeptide enrichment protocols targeting sialylated glycopeptides and chondroitin sulfate linkage region glycopeptides, respectively. Molecular networking was performed to organize the glycopeptide MS/MS data based on spectral similarities. The combination of spectral clustering, oxonium ion intensity profiles, and precursor ion m/z shift distributions provided typical signatures for the initial assignment of different N-, O- and CS-glycopeptide classes and their respective glycoforms. These signatures were further used to guide database searches leading to the identification and validation of a large number of glycopeptide variants including novel deoxyhexose (fucose) modifications in the linkage region of chondroitin sulfate proteoglycans. PMID:27399812

  2. Nautilus: a bioinformatics package for the analysis of HIV type 1 targeted deep sequencing data.

    PubMed

    Kijak, Gustavo H; Pham, Phuc; Sanders-Buell, Eric; Harbolick, Elizabeth A; Eller, Leigh Anne; Robb, Merlin L; Michael, Nelson L; Kim, Jerome H; Tovanabutra, Sodsai

    2013-10-01

    The advent of next generation sequencing technologies is providing new insight into HIV-1 diversity and evolution, which has created the need for bioinformatics tools that could be applied to the characterization of viral quasispecies. Here we present Nautilus, a bioinformatics package for the analysis of HIV-1 targeted deep sequencing data. The DeepHaplo module determines the nucleotide base frequency and read depth at each position and computes the haplotype frequencies based on the linkage among polymorphisms in the same next generation sequence read. The Motifs module computes the frequency of the variants in the setting of their sequence context and mapping orientation, which allows for the validation of polymorphisms and haplotypes when strand bias is suspected. Both modules are accessed through a user-friendly GUI, which runs on Mac OS X (version 10.7.4 or later), and are based on Python, JAVA, and R scripts. Nautilus is available from www.hivresearch.org/research.php?ServiceID=5&SubServiceID=6 . PMID:23809062
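    The per-position computation described for the DeepHaplo module (read depth and nucleotide base frequency at each position) can be sketched as follows. This is not Nautilus code: it assumes reads are already aligned without gaps and are given as hypothetical `(start, sequence)` pairs with 0-based reference coordinates.

```python
from collections import Counter, defaultdict


def pileup(reads):
    """Count the bases observed at each reference position.

    reads: iterable of (start, sequence) pairs, where start is the 0-based
    reference position of the first base (gapless alignment assumed).
    """
    counts = defaultdict(Counter)
    for start, seq in reads:
        for offset, base in enumerate(seq):
            counts[start + offset][base] += 1
    return counts


def base_frequencies(counts):
    """Return {position: (depth, {base: frequency})} for covered positions."""
    result = {}
    for pos in sorted(counts):
        depth = sum(counts[pos].values())
        result[pos] = (depth, {b: n / depth for b, n in counts[pos].items()})
    return result


if __name__ == "__main__":
    # Three hypothetical reads overlapping a short region.
    reads = [(0, "ACGT"), (1, "CGTA"), (1, "CTTA")]
    for pos, (depth, freqs) in base_frequencies(pileup(reads)).items():
        print(pos, depth, freqs)
```

    Haplotype frequencies, which the abstract says are derived from linkage among polymorphisms within the same read, would need the per-read combinations of variant bases and are not covered by this sketch.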

  3. Bioinformatics identification of modules of transcription factor binding sites in Alzheimer's disease-related genes by in silico promoter analysis and microarrays.

    PubMed

    Augustin, Regina; Lichtenthaler, Stefan F; Greeff, Michael; Hansen, Jens; Wurst, Wolfgang; Trümbach, Dietrich

    2011-01-01

    The molecular mechanisms and genetic risk factors underlying Alzheimer's disease (AD) pathogenesis are only partly understood. To identify new factors, which may contribute to AD, different approaches are taken including proteomics, genetics, and functional genomics. Here, we used a bioinformatics approach and found that distinct AD-related genes share modules of transcription factor binding sites, suggesting a transcriptional coregulation. To detect additional coregulated genes, which may potentially contribute to AD, we established a new bioinformatics workflow with known multivariate methods like support vector machines, biclustering, and predicted transcription factor binding site modules by using in silico analysis and over 400 expression arrays from human and mouse. Two significant modules are composed of three transcription factor families: CTCF, SP1F, and EGRF/ZBPF, which are conserved between human and mouse APP promoter sequences. The specific combination of in silico promoter and multivariate analysis can identify regulation mechanisms of genes involved in multifactorial diseases. PMID:21559189

  4. Adaptation of a Bioinformatics Microarray Analysis Workflow for a Toxicogenomic Study in Rainbow Trout

    PubMed Central

    Depiereux, Sophie; De Meulder, Bertrand; Bareke, Eric; Berger, Fabrice; Le Gac, Florence; Depiereux, Eric; Kestemont, Patrick

    2015-01-01

    Sex steroids play a key role in triggering sex differentiation in fish, and exogenous hormone treatment can lead to partial or complete sex reversal. This phenomenon has attracted attention since the discovery that even low environmental doses of exogenous steroids can adversely affect gonad morphology (ovotestis development) and induce reproductive failure. Modern genomic-based technologies have enhanced opportunities to uncover mechanisms of action (MOA) and identify biomarkers related to the toxic action of a compound. However, high throughput data interpretation relies on statistical analysis, species genomic resources, and bioinformatics tools. The goals of this study are to improve the knowledge of feminisation in fish through the analysis of molecular responses in the gonads of rainbow trout fry after chronic exposure to several doses (0.01, 0.1, 1 and 10 μg/L) of ethynylestradiol (EE2), and to offer target genes as potential biomarkers of ovotestis development. We successfully adapted a bioinformatics microarray analysis workflow elaborated on human data to a toxicogenomic study using rainbow trout, a fish species lacking accurate functional annotation and genomic resources. The workflow allowed us to obtain lists of genes expected to be enriched in true positive differentially expressed genes (DEGs), which were subjected to over-representation analysis methods (ORA). Several pathways and ontologies, mostly related to cell division and metabolism, sexual reproduction and steroid production, were found significantly enriched in our analyses. Moreover, two sets of potential ovotestis biomarkers were selected using several criteria. The first group displayed specific potential biomarkers belonging to pathways/ontologies highlighted in the experiment. Among them, the early ovarian differentiation gene foxl2a was overexpressed. The second group, which was highly sensitive but not specific, included the DEGs presenting the highest fold change and lowest p

  5. Fatty acid metabolism pathway play an important role in carcinogenesis of human colorectal cancers by Microarray-Bioinformatics analysis.

    PubMed

    Yeh, Ching-Sheng; Wang, Jaw-Yuan; Cheng, Tian-Lu; Juan, Chin-Hung; Wu, Chan-Han; Lin, Shiu-Ru

    2006-02-28

    The present study systematically explored metabolic pathways and altered expression of genes speculatively participating in colorectal carcinogenesis by using Microarray-Bioinformatic analysis methods. The results revealed that 157 genes were up-regulated and 281 genes were down-regulated in colorectal cancer (CRC). Gene Ontology (GO) and relevant bioinformatics tools indicated that the functional category to which 438 genes (12%; 438/3800) of the most frequent alteration belonged was metabolism. The analysis of 10 colorectal cancer tissue specimens demonstrated that genes involved in fatty acid metabolic pathways had high rates of overexpression. In addition, we stimulated the CRL-1790 cell line with linoleic acid (a polyunsaturated fatty acid) for 12, 24, 48 and 72 h. Cell proliferation was elevated by 5, 25, 28 and 31% (P<0.05), respectively. Further analyses revealed that the genes increasingly expressed in the cell line included enoyl-Coenzyme A, hydratase/3-hydroxyacyl Coenzyme A dehydrogenase (EHHADH), enoyl Coenzyme A hydratase, short chain, 1, mitochondrial (ECHS1); glutaryl-Coenzyme A dehydrogenase (GCDH), acyl-Coenzyme A oxidase 2, branched chain (ACOX2); acyl-Coenzyme A dehydrogenase, C-2 to C-3 short chain precursor (ACADS); carnitine palmitoyltransferase 1B (CPT1B), acyl-CoA synthetase long-chain family member 5 (ACSL5), and cytochrome P450, family 4, subfamily A, polypeptide 11 (CYP4A11) genes. This indicated that the stimulating effect of linoleic acid on cell proliferation was due to interference with the metabolic pathway of fatty acid metabolism. In conclusion, genes with altered expression levels in CRC were mainly associated with fatty acid metabolic pathways speculated to have an important role linked to carcinogenesis. PMID:15885896

  6. Genomic expression profiling and bioinformatics analysis on diabetic nephrology with ginsenoside Rg3

    PubMed Central

    Wang, Juan; Cui, Chunli; Fu, Li; Xiao, Zili; Xie, Nanzi; Liu, Yang; Yu, Lu; Wang, Haifeng; Luo, Bangzhen

    2016-01-01

    Diabetic nephropathy (DN), a common diabetes-related complication, is the leading cause of progressive chronic kidney disease (CKD) and end-stage renal disease. Despite the rapid development in the treatment of DN, currently available therapies used in early DN cannot prevent progressive CKD. The exact pathogenic mechanisms and the molecular events underlying DN development remain unclear. Ginsenoside Rg3 is a herbal medicine with numerous pharmacological effects. To gain a greater understanding of the molecular mechanism and signaling pathway underlying the effect of ginsenoside Rg3 in DN therapy, an RNA sequencing approach was performed to screen differential gene expression in a rat model of DN treated with ginsenoside Rg3. A combined bioinformatics analysis was then conducted to obtain insights into the underlying molecular mechanisms of the disease development, in order to identify potential novel targets for the treatment of DN. Six Sprague-Dawley male rats were randomly divided into 3 groups: Normal control group, DN group and ginsenoside-Rg3 treatment group, with two rats in each group. RNA sequencing was adopted for transcriptome profiling of cells from the renal cortex of DN rat model. Differentially expressed genes were screened out. Cluster analysis, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis were used to analyze the differentially expressed genes. In total, 78 differentially expressed genes in the DN control group were identified when compared with the normal control group, of which 52 genes were upregulated and 26 genes were downregulated. Differential expression of 43 genes was observed in the ginsenoside-Rg3 treatment group when compared with the DN control group, consisting of 10 upregulated genes and 33 downregulated genes. Notably, 21 genes that were downregulated in the DN control group compared with the control were then shown to be upregulated in the ginsenoside-Rg3 treatment group compared with the DN

  7. Experimental Design and Bioinformatics Analysis for the Application of Metagenomics in Environmental Sciences and Biotechnology.

    PubMed

    Ju, Feng; Zhang, Tong

    2015-11-01

    Recent advances in DNA sequencing technologies have prompted the widespread application of metagenomics for the investigation of novel bioresources (e.g., industrial enzymes and bioactive molecules) and unknown biohazards (e.g., pathogens and antibiotic resistance genes) in natural and engineered microbial systems across multiple disciplines. This review discusses the rigorous experimental design and sample preparation in the context of applying metagenomics in environmental sciences and biotechnology. Moreover, this review summarizes the principles, methodologies, and state-of-the-art bioinformatics procedures, tools and database resources for metagenomics applications and discusses two popular strategies (analysis of unassembled reads versus assembled contigs/draft genomes) for quantitative or qualitative insights of microbial community structure and functions. Overall, this review aims to facilitate more extensive application of metagenomics in the investigation of uncultured microorganisms, novel enzymes, microbe-environment interactions, and biohazards in biotechnological applications where microbial communities are engineered for bioenergy production, wastewater treatment, and bioremediation. PMID:26451629

  8. Bioinformatic analysis of non-VP1 capsid protein of coxsackievirus A6.

    PubMed

    Liu, Hong-Bo; Yang, Guang-Fei; Liang, Si-Jia; Lin, Jun

    2016-08-01

    This study bioinformatically analyzed the non-VP1 capsid proteins (VP2-VP4) of Coxsackievirus A6 (CVA6), with an attempt to predict their basic physicochemical properties, structural/functional features and linear B cell epitopes. The online tools SubLoc, TargetP and the others from ExPASy Bioinformatics Resource Portal, and SWISS-MODEL (an online protein structure modeling server), were utilized to analyze the amino acid (AA) sequences of VP2-VP4 proteins of CVA6. Our results showed that the VP proteins of CVA6 were all of hydrophilic nature, contained phosphorylation and glycosylation sites and harbored no signal peptide sequences and acetylation sites. Except VP3, the other proteins did not have transmembrane helix structure and nuclear localization signal sequences. Random coils were the major conformation of the secondary structure of the capsid proteins. Analysis of the linear B cell epitopes by employing Bepipred showed that the average antigenic indices (AI) of individual VP proteins were all greater than 0 and the average AI of VP4 was substantially higher than that of VP2 and VP3. The VP proteins all contained a number of potential B cell epitopes and some epitopes were located at the internal side of the viral capsid or were buried. We successfully predicted the fundamental physicochemical properties, structural/functional features and the linear B cell epitopes and found that different VP proteins share some common features and each has its unique attributes. These findings will help us understand the pathogenicity of CVA6 and develop related vaccines and immunodiagnostic reagents. PMID:27465341

  9. Comparative metagenomic analysis of human gut microbiome composition using two different bioinformatic pipelines.

    PubMed

    D'Argenio, Valeria; Casaburi, Giorgio; Precone, Vincenza; Salvatore, Francesco

    2014-01-01

    Technological advances in next-generation sequencing-based approaches have greatly impacted the analysis of microbial community composition. In particular, 16S rRNA-based methods have been widely used to analyze the whole set of bacteria present in a target environment. As a consequence, several specific bioinformatic pipelines have been developed to manage these data. MetaGenome Rapid Annotation using Subsystem Technology (MG-RAST) and Quantitative Insights Into Microbial Ecology (QIIME) are two freely available tools for metagenomic analyses that have been used in a wide range of studies. Here, we report the comparative analysis of the same dataset with both QIIME and MG-RAST in order to evaluate their accuracy in taxonomic assignment and in diversity analysis. We found that taxonomic assignment was more accurate with QIIME which, at family level, assigned a significantly higher number of reads. Thus, QIIME generated a more accurate BIOM file, which in turn improved the diversity analysis output. Finally, although informatics skills are needed to install QIIME, it offers a wide range of metrics that are useful for downstream applications and, not less important, it is not dependent on server times. PMID:24719854
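    The abstract does not list which diversity metrics were compared. As an illustration only, the sketch below computes the Shannon index, a common alpha-diversity measure, from per-taxon read counts. The function name and the counts are hypothetical, and the natural logarithm is used by default although tools may report other bases.

```python
import math


def shannon_index(counts, base=math.e):
    """Shannon diversity H = -sum(p_i * log(p_i)) over taxa with reads.

    counts: iterable of non-negative read counts, one per taxon.
    base:   logarithm base (natural log by default).
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one taxon must have a non-zero count")
    return -sum((c / total) * math.log(c / total, base) for c in counts)


if __name__ == "__main__":
    # Hypothetical family-level read counts for one sample.
    sample_counts = [120, 80, 40, 10]
    print(f"Shannon index: {shannon_index(sample_counts):.3f}")
```

    Because the index depends only on the assigned counts, two pipelines that assign different numbers of reads to each taxon will generally yield different diversity values for the same sample.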

  10. Human, vector and parasite Hsp90 proteins: A comparative bioinformatics analysis

    PubMed Central

    Faya, Ngonidzashe; Penkler, David L.; Tastan Bishop, Özlem

    2015-01-01

    The treatment of protozoan parasitic diseases is challenging, and thus identification and analysis of new drug targets is important. Parasites survive within host organisms, and some need intermediate hosts to complete their life cycle. Changing host environment puts stress on parasites, and often adaptation is accompanied by the expression of large amounts of heat shock proteins (Hsps). Among Hsps, Hsp90 proteins play an important role in stress environments. Yet, there has been little computational research on Hsp90 proteins to analyze them comparatively as potential parasitic drug targets. Here, an attempt was made to gain detailed insights into the differences between host, vector and parasitic Hsp90 proteins by large-scale bioinformatics analysis. A total of 104 Hsp90 sequences were divided into three groups based on their cellular localizations; namely cytosolic, mitochondrial and endoplasmic reticulum (ER). Further, the parasitic proteins were divided according to the type of parasite (protozoa, helminth and ectoparasite). Primary sequence analysis, phylogenetic tree calculations, motif analysis and physicochemical properties of Hsp90 proteins suggested that despite the overall structural conservation of these proteins, parasitic Hsp90 proteins have unique features which differentiate them from human ones, thus encouraging the idea that protozoan Hsp90 proteins should be further analyzed as potential drug targets. PMID:26793431

  11. Getting personalized cancer genome analysis into the clinic: the challenges in bioinformatics

    PubMed Central

    2012-01-01

    Progress in genomics has raised expectations in many fields, and particularly in personalized cancer research. The new technologies available make it possible to combine information about potential disease markers, altered function and accessible drug targets, which, coupled with pathological and medical information, will help produce more appropriate clinical decisions. The accessibility of such experimental techniques makes it all the more necessary to improve and adapt computational strategies to the new challenges. This review focuses on the critical issues associated with the standard pipeline, which includes: DNA sequencing analysis; analysis of mutations in coding regions; the study of genome rearrangements; extrapolating information on mutations to the functional and signaling level; and predicting the effects of therapies using mouse tumor models. We describe the possibilities, limitations and future challenges of current bioinformatics strategies for each of these issues. Furthermore, we emphasize the need for the collaboration between the bioinformaticians who implement the software and use the data resources, the computational biologists who develop the analytical methods, and the clinicians, the systems' end users and those ultimately responsible for taking medical decisions. Finally, the different steps in cancer genome analysis are illustrated through examples of applications in cancer genome analysis. PMID:22839973

  12. LYN, a Key Gene From Bioinformatics Analysis, Contributes to Development and Progression of Esophageal Adenocarcinoma

    PubMed Central

    Liu, Dabiao

    2015-01-01

    Background Esophageal adenocarcinoma is a lethal malignancy whose incidence has grown rapidly in recent years. Previous reports suggested that Barrett’s esophagus (BE), which is represented by metaplasia-dysplasia-carcinoma transition, is regarded as the premalignant lesion of esophageal neoplasm. However, our knowledge about the development of esophageal adenocarcinoma is still very limited. Material/Methods In order to acquire better understanding about the pathological mechanisms in this field, we obtained gene profiling data on BE, esophageal adenocarcinoma patients, and normal controls from the Gene Expression Omnibus (GEO) database. Bioinformatics analyses, including Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, were conducted. Results Our results revealed that several pathways, such as the wound healing, complement, and coagulation pathways, were closely correlated with cancer development and progression. The mitogen-activated protein kinase (MAPK) pathway was discovered to be responsible for the predisposition stage of cancer; while response to stress, cytokine-cytokine receptor interaction, nod-like receptor signaling pathway, and ECM-receptor interaction were chief contributors of cancer progression. More importantly, we discovered in this study that LYN was a critical gene. It was found to be the key node of several significant biological networks, which suggests its close correlation with cancer initiation and progression. Conclusions These results provided more information on the mechanisms of esophageal adenocarcinoma, which enlightened our way to the clinical discovery of novel therapeutic markers for conquering esophageal cancer. Keywords: esophageal adenocarcinoma; LYN; GO analysis; KEGG pathway. PMID:26708841

  13. Structural and Phylogenetic Analysis of Laccases from Trichoderma: A Bioinformatic Approach

    PubMed Central

    Cázares-García, Saila Viridiana; Vázquez-Garcidueñas, Ma. Soledad; Vázquez-Marrufo, Gerardo

    2013-01-01

    The genus Trichoderma includes species of great biotechnological value, both for their mycoparasitic activities and for their ability to produce extracellular hydrolytic enzymes. Although activity of extracellular laccase has previously been reported in Trichoderma spp., the possible number of isoenzymes is still unknown, as are the structural and functional characteristics of both the genes and the putative proteins. In this study, the system of laccases sensu stricto in the Trichoderma species, the genomes of which are publicly available, were analyzed using bioinformatic tools. The intron/exon structure of the genes and the identification of specific motifs in the sequence of amino acids of the proteins generated in silico allow for clear differentiation between extracellular and intracellular enzymes. Phylogenetic analysis suggests that the common ancestor of the genus possessed a functional gene for each one of these enzymes, which is a characteristic preserved in T. atroviride and T. virens. This analysis also reveals that T. harzianum and T. reesei only retained the intracellular activity, whereas T. asperellum added an extracellular isoenzyme acquired through horizontal gene transfer during the mycoparasitic process. The evolutionary analysis shows that in general, extracellular laccases are subjected to purifying selection, and intracellular laccases show neutral evolution. The data provided by the present study will enable the generation of experimental approximations to better understand the physiological role of laccases in the genus Trichoderma and to increase their biotechnological potential. PMID:23383142

  14. Bioinformatics analysis of the serine and glycine pathway in cancer cells

    PubMed Central

    Morello, Maria; Minieri, Marilena; Melino, Gerry; Amelio, Ivano

    2014-01-01

    Serine and glycine are amino acids that provide the essential precursors for the synthesis of proteins, nucleic acids and lipids. Employing 3 subsequent enzymes, phosphoglycerate dehydrogenase (PHGDH), phosphoserine phosphatase (PSPH), phosphoserine aminotransferase 1 (PSAT1), 3-phosphoglycerate from glycolysis can be converted into serine, which in turn can be converted into glycine by serine hydroxymethyltransferase (SHMT). Besides providing precursors for macromolecules, serine/glycine biosynthesis is also required for the maintenance of cellular redox state. Therefore, this metabolic pathway has a pivotal role in proliferating cells, including cancer cells. In the last few years, an emerging literature has provided genetic and functional evidence that hyperactivation of the serine/glycine biosynthetic pathway drives tumorigenesis. Here, we extend these observations by performing a bioinformatics analysis using public cancer datasets. Our analysis highlighted the relevance of PHGDH and SHMT2 expression as prognostic factors for breast cancer, revealing a substantial ability of these enzymes to predict patient survival outcome. However, in patient datasets of lung cancer, our analysis revealed that some other enzymes of the pathway, rather than PHGDH, might be associated with prognosis. Although these observations require further investigation, they might suggest a selective requirement of some enzymes in specific cancer types, recommending more caution in the development of novel translational opportunities and biomarker identification in human cancers. PMID:25436979

  15. BATMAN-TCM: a Bioinformatics Analysis Tool for Molecular mechANism of Traditional Chinese Medicine

    PubMed Central

    Liu, Zhongyang; Guo, Feifei; Wang, Yong; Li, Chun; Zhang, Xinlei; Li, Honglei; Diao, Lihong; Gu, Jiangyong; Wang, Wei; Li, Dong; He, Fuchu

    2016-01-01

    Traditional Chinese Medicine (TCM), with a history of thousands of years of clinical practice, is gaining more and more attention and application worldwide. And TCM-based new drug development, especially for the treatment of complex diseases is promising. However, owing to the TCM’s diverse ingredients and their complex interaction with human body, it is still quite difficult to uncover its molecular mechanism, which greatly hinders the TCM modernization and internationalization. Here we developed the first online Bioinformatics Analysis Tool for Molecular mechANism of TCM (BATMAN-TCM). Its main functions include 1) TCM ingredients’ target prediction; 2) functional analyses of targets including biological pathway, Gene Ontology functional term and disease enrichment analyses; 3) the visualization of ingredient-target-pathway/disease association network and KEGG biological pathway with highlighted targets; 4) comparison analysis of multiple TCMs. Finally, we applied BATMAN-TCM to Qishen Yiqi dripping Pill (QSYQ) and combined with subsequent experimental validation to reveal the functions of renin-angiotensin system responsible for QSYQ’s cardioprotective effects for the first time. BATMAN-TCM will contribute to the understanding of the “multi-component, multi-target and multi-pathway” combinational therapeutic mechanism of TCM, and provide valuable clues for subsequent experimental validation, accelerating the elucidation of TCM’s molecular mechanism. BATMAN-TCM is available at http://bionet.ncpsb.org/batman-tcm. PMID:26879404

  16. LYN, a Key Gene From Bioinformatics Analysis, Contributes to Development and Progression of Esophageal Adenocarcinoma.

    PubMed

    Liu, Dabiao

    2015-01-01

    BACKGROUND Esophageal adenocarcinoma is a lethal malignancy whose incidence has grown rapidly in recent years. Previous reports suggested that Barrett's esophagus (BE), which is represented by metaplasia-dysplasia-carcinoma transition, is regarded as the premalignant lesion of esophageal neoplasm. However, our knowledge about the development of esophageal adenocarcinoma is still very limited. MATERIAL AND METHODS In order to acquire better understanding about the pathological mechanisms in this field, we obtained gene profiling data on BE, esophageal adenocarcinoma patients, and normal controls from the Gene Expression Omnibus (GEO) database. Bioinformatics analyses, including Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, were conducted. RESULTS Our results revealed that several pathways, such as the wound healing, complement, and coagulation pathways, were closely correlated with cancer development and progression. The mitogen-activated protein kinase (MAPK) pathway was discovered to be responsible for the predisposition stage of cancer; while response to stress, cytokine-cytokine receptor interaction, nod-like receptor signaling pathway, and ECM-receptor interaction were chief contributors of cancer progression. More importantly, we discovered in this study that LYN was a critical gene. It was found to be the key node of several significant biological networks, which suggests its close correlation with cancer initiation and progression. CONCLUSIONS These results provided more information on the mechanisms of esophageal adenocarcinoma, which enlightened our way to the clinical discovery of novel therapeutic markers for conquering esophageal cancer. PMID:26708841

  17. Bioinformatics Analysis of Transcriptome Dynamics During Growth in Angus Cattle Longissimus Muscle

    PubMed Central

    Moisá, Sonia J.; Shike, Daniel W.; Graugnard, Daniel E.; Rodriguez-Zas, Sandra L.; Everts, Robin E.; Lewin, Harris A.; Faulkner, Dan B.; Berger, Larry L.; Loor, Juan J.

    2013-01-01

    Transcriptome dynamics in the longissimus muscle (LM) of young Angus cattle were evaluated at 0, 60, 120, and 220 days from early-weaning. Bioinformatic analysis was performed using the dynamic impact approach (DIA) by means of Kyoto Encyclopedia of Genes and Genomes (KEGG) and Database for Annotation, Visualization and Integrated Discovery (DAVID) databases. Between 0 and 120 days (growing phase) most of the highly-impacted pathways (eg, ascorbate and aldarate metabolism, drug metabolism, cytochrome P450 and Retinol metabolism) were inhibited. The phase between 120 and 220 days (finishing phase) was characterized by the most striking differences with 3,784 differentially expressed genes (DEGs). Analysis of those DEGs revealed that the most impacted KEGG canonical pathway was glycosylphosphatidylinositol (GPI)-anchor biosynthesis, which was inhibited. Furthermore, inhibition of calpastatin and activation of tyrosine aminotransferase ubiquitination at 220 days promote proteasomal degradation, while the concurrent activation of ribosomal proteins promotes protein synthesis. Therefore, the balance of these processes likely results in a steady-state of protein turnover during the finishing phase. Results underscore the importance of transcriptome dynamics in LM during growth. PMID:23943656

  18. Bioinformatic identification and expression analysis of Nelumbo nucifera microRNA and their targets

    PubMed Central

    Pan, Lei; Wang, Xiaolei; Jin, Jing; Yu, Xiaolu; Hu, Jihong

    2015-01-01

    Premise of the study: Sacred lotus (Nelumbo nucifera) is a perennial aquatic herbaceous plant of ecological, ornamental, and economic importance. MicroRNAs (miRNAs) play an important role in plant development. However, reports of miRNAs and their role in sacred lotus have been limited. Methods: Using the homology search of known miRNAs with genome and transcriptome contig sequences, we employed a pipeline to identify miRNAs in N. nucifera. We also predicted the targets of these miRNAs. Results: We found 106 conserved miRNAs in N. nucifera, and 456 of their miRNA targets were annotated. Quantitative real-time PCR (qRT-PCR) analysis revealed the different expression levels of the 10 selected conserved miRNAs in tissues of young leaves, stems, and flowers of N. nucifera. Negative correlation of expression level between five miRNAs and their target genes was also revealed. Discussion: Combining bioinformatics and experiment analysis, we identified the miRNAs in N. nucifera. The results can be used as a workbench for further investigation of the roles of miRNAs in N. nucifera. PMID:26421251
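    The negative correlation reported between miRNA and target-gene expression can be illustrated with a Pearson correlation coefficient. The abstract does not state which statistic was used, so Pearson's r is an assumption here, and the expression values below are hypothetical placeholders for the three tissues (young leaves, stems, flowers), not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical relative expression in young leaves, stems and flowers.
    mirna_expression = [1.0, 2.5, 4.0]
    target_expression = [3.8, 2.1, 0.9]
    r = pearson_r(mirna_expression, target_expression)
    print(f"Pearson r = {r:.3f}")  # a value below zero indicates negative correlation
```

    With only three tissues, a coefficient like this is descriptive at best and carries little statistical weight.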

  19. BATMAN-TCM: a Bioinformatics Analysis Tool for Molecular mechANism of Traditional Chinese Medicine.

    PubMed

    Liu, Zhongyang; Guo, Feifei; Wang, Yong; Li, Chun; Zhang, Xinlei; Li, Honglei; Diao, Lihong; Gu, Jiangyong; Wang, Wei; Li, Dong; He, Fuchu

    2016-01-01

    Traditional Chinese Medicine (TCM), with a history of thousands of years of clinical practice, is gaining more and more attention and application worldwide. And TCM-based new drug development, especially for the treatment of complex diseases is promising. However, owing to the TCM's diverse ingredients and their complex interaction with human body, it is still quite difficult to uncover its molecular mechanism, which greatly hinders the TCM modernization and internationalization. Here we developed the first online Bioinformatics Analysis Tool for Molecular mechANism of TCM (BATMAN-TCM). Its main functions include 1) TCM ingredients' target prediction; 2) functional analyses of targets including biological pathway, Gene Ontology functional term and disease enrichment analyses; 3) the visualization of ingredient-target-pathway/disease association network and KEGG biological pathway with highlighted targets; 4) comparison analysis of multiple TCMs. Finally, we applied BATMAN-TCM to Qishen Yiqi dripping Pill (QSYQ) and combined with subsequent experimental validation to reveal the functions of renin-angiotensin system responsible for QSYQ's cardioprotective effects for the first time. BATMAN-TCM will contribute to the understanding of the "multi-component, multi-target and multi-pathway" combinational therapeutic mechanism of TCM, and provide valuable clues for subsequent experimental validation, accelerating the elucidation of TCM's molecular mechanism. BATMAN-TCM is available at http://bionet.ncpsb.org/batman-tcm. PMID:26879404

  20. BATMAN-TCM: a Bioinformatics Analysis Tool for Molecular mechANism of Traditional Chinese Medicine

    NASA Astrophysics Data System (ADS)

    Liu, Zhongyang; Guo, Feifei; Wang, Yong; Li, Chun; Zhang, Xinlei; Li, Honglei; Diao, Lihong; Gu, Jiangyong; Wang, Wei; Li, Dong; He, Fuchu

    2016-02-01

    Traditional Chinese Medicine (TCM), with a history of thousands of years of clinical practice, is gaining more and more attention and application worldwide. And TCM-based new drug development, especially for the treatment of complex diseases is promising. However, owing to the TCM’s diverse ingredients and their complex interaction with human body, it is still quite difficult to uncover its molecular mechanism, which greatly hinders the TCM modernization and internationalization. Here we developed the first online Bioinformatics Analysis Tool for Molecular mechANism of TCM (BATMAN-TCM). Its main functions include 1) TCM ingredients’ target prediction; 2) functional analyses of targets including biological pathway, Gene Ontology functional term and disease enrichment analyses; 3) the visualization of ingredient-target-pathway/disease association network and KEGG biological pathway with highlighted targets; 4) comparison analysis of multiple TCMs. Finally, we applied BATMAN-TCM to Qishen Yiqi dripping Pill (QSYQ) and combined with subsequent experimental validation to reveal the functions of renin-angiotensin system responsible for QSYQ’s cardioprotective effects for the first time. BATMAN-TCM will contribute to the understanding of the “multi-component, multi-target and multi-pathway” combinational therapeutic mechanism of TCM, and provide valuable clues for subsequent experimental validation, accelerating the elucidation of TCM’s molecular mechanism. BATMAN-TCM is available at http://bionet.ncpsb.org/batman-tcm.

  1. Deep Artificial Neural Networks and Neuromorphic Chips for Big Data Analysis: Pharmaceutical and Bioinformatics Applications.

    PubMed

    Pastur-Romay, Lucas Antón; Cedrón, Francisco; Pazos, Alejandro; Porto-Pazos, Ana Belén

    2016-01-01

    Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by the advancement in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphical processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field. An overview of the main architectures of DNNs, and their usefulness in Pharmacology and Bioinformatics are presented in this work. The featured applications are: drug design, virtual screening (VS), Quantitative Structure-Activity Relationship (QSAR) research, protein structure prediction and genomics (and other omics) data mining. The future need of neuromorphic hardware for DNNs is also discussed, and the two most advanced chips are reviewed: IBM TrueNorth and SpiNNaker. In addition, this review points out the importance of considering not only neurons, as DNNs and neuromorphic chips should also include glial cells, given the proven importance of astrocytes, a type of glial cell which contributes to information processing in the brain. The Deep Artificial Neuron-Astrocyte Networks (DANAN) could overcome the difficulties in architecture design, learning process and scalability of the current ML methods. PMID:27529225

  2. Deep Artificial Neural Networks and Neuromorphic Chips for Big Data Analysis: Pharmaceutical and Bioinformatics Applications

    PubMed Central

    Pastur-Romay, Lucas Antón; Cedrón, Francisco; Pazos, Alejandro; Porto-Pazos, Ana Belén

    2016-01-01

    Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by the advancement in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphical processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field. An overview of the main architectures of DNNs, and their usefulness in Pharmacology and Bioinformatics are presented in this work. The featured applications are: drug design, virtual screening (VS), Quantitative Structure–Activity Relationship (QSAR) research, protein structure prediction and genomics (and other omics) data mining. The future need of neuromorphic hardware for DNNs is also discussed, and the two most advanced chips are reviewed: IBM TrueNorth and SpiNNaker. In addition, this review points out the importance of considering not only neurons, as DNNs and neuromorphic chips should also include glial cells, given the proven importance of astrocytes, a type of glial cell which contributes to information processing in the brain. The Deep Artificial Neuron–Astrocyte Networks (DANAN) could overcome the difficulties in architecture design, learning process and scalability of the current ML methods. PMID:27529225

  3. Bioinformatics Analysis of the Effects of Tobacco Smoke on Gene Expression

    PubMed Central

    Cao, Chunhua; Chen, Jianhua; Lyu, Chengqi; Yu, Jia; Zhao, Wei; Wang, Yi; Zou, Derong

    2015-01-01

    This study was designed to explore the effects of tobacco smoke on gene expression through bioinformatics analyses. Gene expression profile GSE17913 was downloaded from the Gene Expression Omnibus database. The differentially expressed genes (DEGs) in buccal mucosa tissues between 39 active smokers and 40 never smokers were identified. Gene Ontology (GO) and pathway enrichment analyses of DEGs were performed, followed by protein-protein interaction (PPI) network, transcriptional regulatory network as well as miRNA-target regulatory network construction. In total, 88 up-regulated DEGs and 106 down-regulated DEGs were identified. Among these DEGs, cytochrome P450, family 1, subfamily A, polypeptide 1 (CYP1A1) and CYP1B1 were enriched in the Metabolism of xenobiotics by cytochrome P450 pathway. In the PPI network, tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta (YWHAZ), and CYP1A1 were hub genes. In the transcriptional regulatory network, transcription factors of MYC associated factor X (MAX) and upstream transcription factor 1 (USF1) regulated many overlapped DEGs. In addition, protein tyrosine phosphatase, receptor type, D (PTPRD) was regulated by multiple miRNAs in the miRNA-DEG regulatory network. CYP1A1, CYP1B1, YWHAZ and PTPRD, and TF of MAX and USF1 may have the potential to be used as biomarkers and therapeutic targets in tobacco smoke-related pathological changes. PMID:26629988

  4. Bioinformatics in the information age

    SciTech Connect

    Spengler, Sylvia J.

    2000-02-01

    There is a well-known story about the blind man examining the elephant: the part of the elephant examined determines his perception of the whole beast. Perhaps bioinformatics--the shotgun marriage between biology and mathematics, computer science, and engineering--is like an elephant that occupies a large chair in the scientific living room. Given the demand for and shortage of researchers with the computer skills to handle large volumes of biological data, where exactly does the bioinformatics elephant sit? There are probably many biologists who feel that a major product of this bioinformatics elephant is large piles of waste material. If you have tried to plow through Web sites and software packages in search of a specific tool for analyzing and collating large amounts of research data, you may well feel the same way. But there has been progress with major initiatives to develop more computing power, educate biologists about computers, increase funding, and set standards. For our purposes, bioinformatics is not simply a biologically inclined rehash of information theory (1) nor is it a hodgepodge of computer science techniques for building, updating, and accessing biological data. Rather bioinformatics incorporates both of these capabilities into a broad interdisciplinary science that involves both conceptual and practical tools for the understanding, generation, processing, and propagation of biological information. As such, bioinformatics is the sine qua non of 21st-century biology. Analyzing gene expression using cDNA microarrays immobilized on slides or other solid supports (gene chips) is set to revolutionize biology and medicine and, in so doing, generate vast quantities of data that have to be accurately interpreted (Fig. 1). As discussed at a meeting a few months ago (Microarray Algorithms and Statistical Analysis: Methods and Standards; Tahoe City, California; 9-12 November 1999), experiments with cDNA arrays must be subjected to quality control

  5. Cloning, expression and bioinformatics analysis of ATP sulfurylase from Acidithiobacillus ferrooxidans ATCC 23270 in Escherichia coli

    PubMed Central

    Jaramillo, Michael L; Abanto, Michel; Quispe, Ruth L; Calderón, Julio; del Valle, Luís J; Talledo, Miguel; Ramírez, Pablo

    2012-01-01

    Molecular studies of enzymes involved in sulfite oxidation in Acidithiobacillus ferrooxidans have not yet been developed, especially for the ATP sulfurylase (ATPS) of these acidophilic thiobacilli, which are important in biomining. This enzyme synthesizes ATP and sulfate from adenosine phosphosulfate (APS) and pyrophosphate (PPi), the final stage of the sulfite oxidation that these organisms carry out to obtain energy. The atpS gene (1674 bp) encoding the ATPS from Acidithiobacillus ferrooxidans ATCC 23270 was amplified using PCR, cloned in the pET101-TOPO plasmid, sequenced and expressed in Escherichia coli, yielding a 63.5 kDa recombinant ATPS protein according to SDS-PAGE analysis. The bioinformatics and phylogenetic analyses determined that the ATPS from A. ferrooxidans presents ATP sulfurylase (ATS) and APS kinase (ASK) domains similar to the ATPS of Aquifex aeolicus, probably of a more ancestral origin. Enzyme activity towards ATP formation was determined by quantification of ATP formed from E. coli cell extracts, using a bioluminescence assay based on light emission by the luciferase enzyme. Our results demonstrate that the recombinant ATP sulfurylase from A. ferrooxidans presents an enzymatic activity for the formation of ATP and sulfate, and is possibly a bifunctional enzyme due to its high homology to the ASK domain from A. aeolicus and true kinases. PMID:23055613

  6. Java bioinformatics analysis web services for multiple sequence alignment—JABAWS:MSA

    PubMed Central

    Troshin, Peter V.; Procter, James B.; Barton, Geoffrey J.

    2011-01-01

    Summary: JABAWS is a web services framework that simplifies the deployment of web services for bioinformatics. JABAWS:MSA provides services for five multiple sequence alignment (MSA) methods (Probcons, T-coffee, Muscle, Mafft and ClustalW), and is the system employed by the Jalview multiple sequence analysis workbench since version 2.6. A fully functional, easy to set up server is provided as a Virtual Appliance (VA), which can be run on most operating systems that support a virtualization environment such as VMware or Oracle VirtualBox. JABAWS is also distributed as a Web Application aRchive (WAR) and can be configured to run on a single computer and/or a cluster managed by Grid Engine, LSF or other queuing systems that support DRMAA. JABAWS:MSA provides clients full access to each application's parameters, allows administrators to specify named parameter preset combinations and execution limits for each application through simple configuration files. The JABAWS command-line client allows integration of JABAWS services into conventional scripts. Availability and Implementation: JABAWS is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws. Contact: g.j.barton@dundee.ac.uk PMID:21593132

  7. Cloning, expression and bioinformatics analysis of ATP sulfurylase from Acidithiobacillus ferrooxidans ATCC 23270 in Escherichia coli.

    PubMed

    Jaramillo, Michael L; Abanto, Michel; Quispe, Ruth L; Calderón, Julio; Del Valle, Luís J; Talledo, Miguel; Ramírez, Pablo

    2012-01-01

    Molecular studies of enzymes involved in sulfite oxidation in Acidithiobacillus ferrooxidans have not yet been developed, especially for the ATP sulfurylase (ATPS) of these acidophilic thiobacilli, which are important in biomining. This enzyme synthesizes ATP and sulfate from adenosine phosphosulfate (APS) and pyrophosphate (PPi), the final stage of the sulfite oxidation that these organisms carry out to obtain energy. The atpS gene (1674 bp) encoding the ATPS from Acidithiobacillus ferrooxidans ATCC 23270 was amplified using PCR, cloned in the pET101-TOPO plasmid, sequenced and expressed in Escherichia coli, yielding a 63.5 kDa recombinant ATPS protein according to SDS-PAGE analysis. The bioinformatics and phylogenetic analyses determined that the ATPS from A. ferrooxidans presents ATP sulfurylase (ATS) and APS kinase (ASK) domains similar to the ATPS of Aquifex aeolicus, probably of a more ancestral origin. Enzyme activity towards ATP formation was determined by quantification of ATP formed from E. coli cell extracts, using a bioluminescence assay based on light emission by the luciferase enzyme. Our results demonstrate that the recombinant ATP sulfurylase from A. ferrooxidans presents an enzymatic activity for the formation of ATP and sulfate, and is possibly a bifunctional enzyme due to its high homology to the ASK domain from A. aeolicus and true kinases. PMID:23055613

  8. Basics of Genome Sequence Analysis in Bioinformatics -- its Fundamental Ideas and Problems

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2009-02-01

    Genome sequences are among the most fundamental data in omics analyses, and basic bioinformatics tools have been developed to handle them. The first step of genome sequence analysis is to predict or assign "genes" on genome sequences. In eukaryotes, genes can be identified by using full-length cDNA sequences with local alignment tools such as search, blast and fasta. However, it is difficult to capture mRNAs (transcripts) in prokaryotes. Therefore, computational gene prediction is the first choice for starting genome sequence analysis. In this review, we first present methods for computational gene prediction. Once genes are predicted, the next step is to assign functions to the proteins or RNAs encoded by each gene. How the distance between gene sequences is defined is then very important for further analysis, so we describe the basic mathematical concepts for gene comparison. We also introduce our novel concept for biological sequence comparison from the viewpoint of information theory. In the post-genome era, many researchers are interested not only in gene functions but also in gene regulation, the information for which is also contained in genome sequences. Cis-regulatory elements, however, are too short for mathematical rules to be found. Therefore, computationally predicted cis-elements tend to include many false positives. To reduce the proportion of false positives, we need a reliable database of sets of cis-regulatory elements, called cis-regulatory modules, for each gene. We are therefore developing the Cis-Regulatory Elements Module Reference Database. In the third section, we introduce the procedure for constructing the Cis-Regulatory Elements Module Reference Database and its user interfaces.
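
    As a rough illustration of what a "distance between gene sequences" can look like, the sketch below computes the simple p-distance (the proportion of differing sites) between two pre-aligned sequences. This is a standard textbook measure chosen here for illustration; it is not necessarily the measure described in the review, and the function name p_distance is hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences.

    Illustrative assumption: the sequences are already aligned and use '-'
    for gaps. Columns with a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)
```

    For example, p_distance("ACGT", "ACGA") compares four sites and finds one mismatch. Corrected distances (such as Jukes-Cantor) build on this raw proportion.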

  9. The 26S proteasome in Schistosoma mansoni: bioinformatics analysis, developmental expression, and RNA interference (RNAi) studies.

    PubMed

    Nabhan, Joseph F; El-Shehabi, Fouad; Patocka, Nicholas; Ribeiro, Paula

    2007-11-01

    The 26S proteasome is a proteolytic complex responsible for the degradation of the vast majority of eukaryotic proteins. Regulated proteolysis by the proteasome is thought to influence cell cycle progression, transcriptional control, and other critical cellular processes. Here, we used a bioinformatics approach to identify the proteasomal constituents of the parasitic trematode Schistosoma mansoni. A detailed search of the S. mansoni genome database identified a total of 31 putative proteasomal subunits, including 17 subunits of the regulatory (19S) complex and 14 predicted catalytic (20S) subunits. A quantitative real-time RT-PCR analysis of subunit expression levels revealed that the S. mansoni proteasome components are differentially expressed among cercaria, schistosomula, and adult worms. In particular, the data suggest that the proteasome may be downregulated during the early stages of schistosomula development and is subsequently upregulated as the parasite matures to the adult stage. To test for biological relevance, we developed a transfection-based RNA interference method to knockdown the expression of the proteasome subunit, SmRPN11/POH1. Transfection of in vitro transformed S. mansoni schistosomula with specific short-interfering RNAs (siRNAs) diminished SmRPN11/POH1 expression nearly 80%, as determined by quantitative RT-PCR analysis, and also decreased parasite viability 78%, whereas no significant effect could be seen after treatment with the same amount of an irrelevant siRNA. These results indicate that the subunit SmRPN11/POH1 is an essential gene in schistosomes and further suggest an important role for the proteasome in parasite development and survival. PMID:17892869

  10. Comprehensive bioinformatics analysis of cell-free protein synthesis: identification of multiple protein properties that correlate with successful expression.

    PubMed

    Kurotani, Atsushi; Takagi, Tetsuo; Toyama, Mitsutoshi; Shirouzu, Mikako; Yokoyama, Shigeyuki; Fukami, Yasuo; Tokmakov, Alexander A

    2010-04-01

    High-throughput cell-free protein synthesis is being used increasingly in structural/functional genomics projects. However, the factors determining expression success are poorly understood. Here, we evaluated the expression of 3066 human proteins and their domains in a bacterial cell-free system and analyzed the correlation of protein expression with 39 physicochemical and structural properties of proteins. As a result of the bioinformatics analysis performed, we determined the 18 most influential features that affect protein amenability to cell-free expression. They include protein length; hydrophobicity; pI; content of charged, nonpolar, and aromatic residues; cysteine content; solvent accessibility; presence of coiled coil; content of intrinsically disordered and structured (alpha-helix and beta-sheet) sequence; number of disulfide bonds and functional domains; presence of transmembrane regions; PEST motifs; and signaling sequences. This study represents the first comprehensive bioinformatics analysis of heterologous protein synthesis in a cell-free system. The rules and correlations revealed here provide a plethora of important insights into rationalization of cell-free protein production and can be of practical use for protein engineering with the aim of increasing expression success.-Kurotani, A., Takagi, T., Toyama, M., Shirouzu, M., Yokoyama, S., Fukami, Y., Tokmakov, A. A. Comprehensive bioinformatics analysis of cell-free protein synthesis: identification of multiple protein properties that correlate with successful expression. PMID:19940260

  11. Survey of MapReduce frame operation in bioinformatics.

    PubMed

    Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke

    2014-07-01

    Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics. PMID:23396756
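
    The MapReduce pattern mentioned in this abstract can be sketched in a few lines. The example below is a single-process illustration in plain Python, not Hadoop code and not taken from the survey: it counts k-mers across sequencing reads using explicit map, shuffle and reduce steps. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    read = read.upper()
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    """Run map, shuffle and reduce in sequence over a list of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))
```

    In a real cluster framework the map calls run in parallel on different nodes and the shuffle moves data over the network; here the three steps simply run one after another to show the data flow.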

  12. [Phylogenetic and Bioinformatics Analysis of Replicase Gene Sequence of Cucumber Green Mottle Mosaic Virus].

    PubMed

    Liang, Chaoqiong; Meng, Yan; Luo, Laixin; Liu, Pengfei; Li, Jianqiang

    2015-11-01

    kD proteins of tested CGMMV isolates. The current results showed that there was no significant difference between the replicase gene sequences, which were stable and conserved within the species and clearly different between species. CGMMV-No. 1, CGMMV-No. 3, CGMMV-No. 4 and CGMMV-No. 5 had a close genetic relationship with the Shandong and Liaoning isolates (Accession No. KJ754195 and EF611826) and potentially originate from the same source. CGMMV-No. 2 was closer to the Korea isolate. Tested samples with high sequence similarity clustered into one class in the phylogenetic tree. The bioinformatics analysis results for the 129 kD and 57 kD proteins of the tested CGMMV isolates showed no regularity. There was no corresponding relationship between the molecular phylogeny and the bioinformatics analysis of the tested CGMMV isolates. PMID:26951006

  13. Functional analysis of the mRNA profile of neutrophil gelatinase-associated lipocalin overexpression in esophageal squamous cell carcinoma using multiple bioinformatic tools

    PubMed Central

    WU, BING-LI; LI, CHUN-QUAN; DU, ZE-PENG; ZHOU, FEI; XIE, JIAN-JUN; LUO, LIE-WEI; WU, JIAN-YI; ZHANG, PI-XIAN; XU, LI-YAN; LI, EN-MIN

    2014-01-01

    Neutrophil gelatinase-associated lipocalin (NGAL) is a member of the lipocalin superfamily; dysregulated expression of NGAL has been observed in several benign and malignant diseases. In the present study, differentially expressed genes, in comparison with those of control cells, in the mRNA expression profile of EC109 esophageal squamous cell carcinoma (ESCC) cells following NGAL overexpression were analyzed by multiple bioinformatic tools for a comprehensive understanding. A total of 29 gene ontology (GO) terms associated with immune function, chromatin structure and gene transcription were identified among the differentially expressed genes (DEGs) in NGAL overexpressing cells. In addition to the detected GO categories, the results from the functional annotation chart revealed that the differentially expressed genes were also associated with 101 functional annotation category terms. A total of 59 subpathways associated locally with the differentially expressed genes were identified by subpathway analysis, a markedly greater total than that detected by traditional pathway enrichment analysis only. Promoter analysis indicated that the potential transcription factors Snail, deltaEF1, Mycn, Arnt, MNB1A, PBF, E74A, Ubx, SPI1 and GATA2 were unique to the downregulated DEG promoters, while bZIP910, ZNF42 and SOX9 were unique for the upregulated DEG promoters. In conclusion, the understanding of the role of NGAL overexpression in ESCC has been improved through the present bioinformatic analysis. PMID:25109818

  14. Analysis of Metagenomics Next Generation Sequence Data for Fungal ITS Barcoding: Do You Need Advance Bioinformatics Experience?

    PubMed Central

    Ahmed, Abdalla

    2016-01-01

    During the last few decades, most microbiology laboratories have become familiar with analyzing Sanger sequence data for ITS barcoding. However, with the availability of next-generation sequencing platforms in many centers, it has become important for medical mycologists to know how to make sense of the massive sequence data generated by these new sequencing technologies. In many reference laboratories, the analysis of such data is not a big deal, since suitable IT infrastructure and well-trained bioinformatics scientists are always available. However, in small research laboratories and clinical microbiology laboratories, such resources are always lacking. In this report, a simple and user-friendly bioinformatics workflow is suggested for fast and reproducible ITS barcoding of fungi. PMID:27507959

  15. Bioinformatic Analysis of Pathogenic Missense Mutations of Activin Receptor Like Kinase 1 Ectodomain

    PubMed Central

    Scotti, Claudia; Olivieri, Carla; Boeri, Laura; Canzonieri, Cecilia; Ornati, Federica; Buscarini, Elisabetta; Pagella, Fabio; Danesino, Cesare

    2011-01-01

    Activin A receptor, type II-like kinase 1 (also called ALK1), is a serine-threonine kinase predominantly expressed on endothelial cells surface. Mutations in its ACVRL1 encoding gene (12q11-14) cause type 2 Hereditary Haemorrhagic Telangiectasia (HHT2), an autosomal dominant multisystem vascular dysplasia. The study of the structural effects of mutations is crucial to understand their pathogenic mechanism. However, while an X-ray structure of ALK1 intracellular domain has recently become available (PDB ID: 3MY0), structure determination of ALK1 ectodomain (ALK1EC) has been elusive so far. We here describe the building of a homology model for ALK1EC, followed by an extensive bioinformatic analysis, based on a set of 38 methods, of the effect of missense mutations at the sequence and structural level. ALK1EC potential interaction mode with its ligand BMP9 was then predicted combining modelling and docking data. The calculated model of the ALK1EC allowed mapping and a preliminary characterization of HHT2 associated mutations. Major structural changes and loss of stability of the protein were predicted for several mutations, while others were found to interfere mainly with binding to BMP9 or other interactors, like Endoglin (CD105), whose encoding ENG gene (9q34) mutations are known to cause type 1 HHT. This study gives a preliminary insight into the potential structure of ALK1EC and into the structural effects of HHT2 associated mutations, which can be useful to predict the potential effect of each single mutation, to devise new biological experiments and to interpret the biological significance of new mutations, private mutations, or non-synonymous polymorphisms. PMID:22028876

  16. Bioinformatics analysis and expression study of fumarate hydratase in lung cancer

    PubMed Central

    Ming, Zongjuan; Jiang, Meihua; Li, Wei; Fan, Na; Deng, Wenjing; Zhong, Yujie; Zhang, Yuping; Zhang, Qiuhong; Yang, Shuanying

    2014-01-01

    Background: As its etiology and pathogenesis are obscure, elucidating the molecular mechanism of lung cancer has become a serious and urgent task. Studies have shown that fumarate hydratase (FH) is a tumor suppressor related to tumorigenesis, development, and invasion. Our aim was to analyze the biological information of FH, and detect the messenger ribonucleic acid (mRNA) and protein expression of FH in lung cancer cells to explore its role in tumorigenesis and in the development of lung cancer. Method: We analyzed the biological characteristics of FH, then utilized reverse transcription-polymerase chain reaction (RT-PCR) to study FH mRNA expression in A549 and 16HBE (human bronchial epithelial) cell lines. The protein expression of FH was detected in 57 cases of human lung cancer tissues and 19 cases of normal lung tissues by immunohistochemistry. Results: 1. Bioinformatic analysis: FH mainly exists in the mitochondria; the common structural elements of FH are mainly α-helix, random coil, β-turn, and extended strand; there are five possible transmembrane domains in the entire polypeptide chain; FH is a hydrophilic and soluble protein. 2. RT-PCR result: FH mRNA expression was downregulated in A549 cells compared with 16HBE cells. 3. Immunohistochemistry: FH protein expression was significantly lower in lung cancer cells than in normal lung tissues (P < 0.05), but was not correlated with the patients' age, gender, tumor size, pathological type, or lymph node, distant, or tumor node metastasis stage. Conclusion: FH was under-expressed in lung cancer, suggesting that it may be an indicator of tumorigenesis and could be a potential target for therapies against lung cancer in the future. PMID:26767050

  17. Identification of MicroRNAs from Eugenia uniflora by High-Throughput Sequencing and Bioinformatics Analysis

    PubMed Central

    Guzman, Frank; Almerão, Mauricio P.; Körbes, Ana P.; Loss-Morais, Guilherme; Margis, Rogerio

    2012-01-01

    Background: microRNAs or miRNAs are small non-coding regulatory RNAs that play important functions in the regulation of gene expression at the post-transcriptional level by targeting mRNAs for degradation or inhibiting protein translation. Eugenia uniflora is a plant native to tropical America with pharmacological and ecological importance, and there have been no previous studies concerning its gene expression and regulation. To date, no miRNAs have been reported in Myrtaceae species. Results: Small RNA and RNA-seq libraries were constructed to identify miRNAs and pre-miRNAs in Eugenia uniflora. Solexa technology was used to perform high throughput sequencing of the library, and the data obtained were analyzed using bioinformatics tools. From 14,489,131 small RNA clean reads, we obtained 1,852,722 mature miRNA sequences representing 45 conserved families that have been identified in other plant species. Further analysis using contigs assembled from RNA-seq allowed the prediction of secondary structures of 25 known and 17 novel pre-miRNAs. The expression of twenty-seven identified miRNAs was also validated using RT-PCR assays. Potential targets were predicted for the most abundant mature miRNAs in the identified pre-miRNAs based on sequence homology. Conclusions: This study is the first large scale identification of miRNAs and their potential targets from a species of the Myrtaceae family without genomic sequence resources. Our study provides more information about the evolutionary conservation of the regulatory network of miRNAs in plants and highlights species-specific miRNAs. PMID:23166775

  18. A Critical Analysis of Assessment Quality in Genomics and Bioinformatics Education Research

    ERIC Educational Resources Information Center

    Campbell, Chad E.; Nehm, Ross H.

    2013-01-01

    The growing importance of genomics and bioinformatics methods and paradigms in biology has been accompanied by an explosion of new curricula and pedagogies. An important question to ask about these educational innovations is whether they are having a meaningful impact on students' knowledge, attitudes, or skills. Although assessments are…

  19. Bioinformatics analysis identifies several intrinsically disordered human E3 ubiquitin-protein ligases

    PubMed Central

    Nielsen, Sofie V.; Lindorff-Larsen, Kresten; Hartmann-Petersen, Rasmus

    2016-01-01

    The ubiquitin-proteasome system targets misfolded proteins for degradation. Since the accumulation of such proteins is potentially harmful for the cell, their prompt removal is important. E3 ubiquitin-protein ligases mediate substrate ubiquitination by bringing together the substrate with an E2 ubiquitin-conjugating enzyme, which transfers ubiquitin to the substrate. For misfolded proteins, substrate recognition is generally delegated to molecular chaperones that subsequently interact with specific E3 ligases. An important exception is San1, a yeast E3 ligase. San1 harbors extensive regions of intrinsic disorder, which provide both conformational flexibility and sites for direct recognition of misfolded targets of vastly different conformations. So far, no mammalian ortholog of San1 is known, nor is it clear whether other E3 ligases utilize disordered regions for substrate recognition. Here, we conduct a bioinformatics analysis to examine >600 human and S. cerevisiae E3 ligases to identify enzymes that are similar to San1 in terms of function and/or mechanism of substrate recognition. An initial sequence-based database search was found to detect candidates primarily based on the homology of their ordered regions, and did not capture the unique disorder patterns that encode the functional mechanism of San1. However, by searching specifically for key features of the San1 sequence, such as long regions of intrinsic disorder embedded with short stretches predicted to be suitable for substrate interaction, we identified several E3 ligases with these characteristics. Our initial analysis revealed that another remarkable trait of San1 is shared with several candidate E3 ligases: long stretches of complete lysine suppression, which in San1 limits auto-ubiquitination. We encode these characteristic features into a San1 similarity-score, and present a set of proteins that are plausible candidates as San1 counterparts in humans. 
In conclusion, our work indicates that San1 is
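
    One of the sequence features named in this abstract, long stretches with no lysine residues, is easy to compute. The sketch below finds the longest lysine-free run in a protein sequence given in one-letter code. It is an illustrative helper only; the authors' actual San1 similarity score is not specified in this record, and the function name is hypothetical.

```python
from typing import Tuple


def longest_lysine_free_stretch(protein: str) -> Tuple[int, int]:
    """Return (start, length) of the longest run without lysine ('K').

    The start index is 0-based. If several runs tie, the first one wins.
    An empty sequence, or one made only of lysines, gives (0, 0).
    """
    best_start, best_len = 0, 0
    run_start = 0
    # A trailing sentinel 'K' closes the final run so it is also measured.
    for i, residue in enumerate(protein.upper() + "K"):
        if residue == "K":
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = i + 1
    return best_start, best_len
```

    For instance, in the toy sequence "MKAAAAKGG" the lysine-free runs are "M", "AAAA" and "GG", so the function reports the run starting at index 2 with length 4.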

  20. Advantages and disadvantages in usage of bioinformatic programs in promoter region analysis

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena E.; Skarzyńska, Agnieszka; Posyniak, Kacper; Ziąbska, Karolina; Pląder, Wojciech; Przybecki, Zbigniew

    2015-09-01

    An important computational challenge is finding the regulatory elements across the promoter region. In this work we present the advantages and disadvantages of applying different bioinformatics programs to localize transcription factor binding sites in the upstream region of genes connected with sex determination in cucumber. We use PlantCARE, PlantPAN and SignalScan to find motifs in the promoter regions. The results have been compared and the possible function of chosen motifs has been described.

  1. Towards understanding the lifespan extension by reduced insulin signaling: bioinformatics analysis of DAF-16/FOXO direct targets in Caenorhabditis elegans

    PubMed Central

    Li, Yan-Hui; Zhang, Gai-Gai

    2016-01-01

    DAF-16, the C. elegans FOXO transcription factor, is an important determinant in aging and longevity. In this work, we manually curated FOXODB http://lyh.pkmu.cn/foxodb/, a database of FOXO direct targets. It now covers 208 genes. Bioinformatics analysis of 109 DAF-16 direct targets in C. elegans yielded four findings. (i) DAF-16 and transcription factor PQM-1 co-regulate some targets. (ii) Seventeen targets directly regulate lifespan. (iii) Four targets are involved in lifespan extension induced by dietary restriction. (iv) DAF-16 direct targets might play global roles in lifespan regulation. PMID:27027346

  2. A critical analysis of assessment quality in genomics and bioinformatics education research.

    PubMed

    Campbell, Chad E; Nehm, Ross H

    2013-01-01

    The growing importance of genomics and bioinformatics methods and paradigms in biology has been accompanied by an explosion of new curricula and pedagogies. An important question to ask about these educational innovations is whether they are having a meaningful impact on students' knowledge, attitudes, or skills. Although assessments are necessary tools for answering this question, their outputs are dependent on their quality. Our study 1) reviews the central importance of reliability and construct validity evidence in the development and evaluation of science assessments and 2) examines the extent to which published assessments in genomics and bioinformatics education (GBE) have been developed using such evidence. We identified 95 GBE articles (out of 226) that contained claims of knowledge increases, affective changes, or skill acquisition. We found that 1) the purpose of most of these studies was to assess summative learning gains associated with curricular change at the undergraduate level, and 2) a minority (<10%) of studies provided any reliability or validity evidence, and only one study out of the 95 sampled mentioned both validity and reliability. Our findings raise concerns about the quality of evidence derived from these instruments. We end with recommendations for improving assessment quality in GBE. PMID:24006400

  3. Dysregulation of TFDP1 and of the cell cycle pathway in high-grade glioblastoma multiforme: a bioinformatic analysis.

    PubMed

    Lu, X; Lv, X D; Ren, Y H; Yang, W D; Li, Z B; Zhang, L; Bai, X F

    2016-01-01

    Despite extensive research, the prognosis of high-grade glioblastoma multiforme (GBM) has improved only slightly because of the limited response to standard treatments. Recent advances (discoveries of molecular biomarkers) provide new opportunities for the treatment of GBM. The aim of the present study was to identify diagnostic biomarkers of high-grade GBM. First, we combined 3 microarray expression datasets to screen them for genes differentially expressed in patients with high-grade GBM relative to healthy subjects. Next, the target network was constructed via the empirical Bayesian coexpression approach, and centrality analysis and a molecular complex detection (MCODE) algorithm were performed to explore hub genes and functional modules. Finally, a validation test was conducted to verify the bioinformatic results. A total of 277 differentially expressed genes were identified according to the criteria P < 0.05 and |log2(fold change)| ≥ 1.5. These genes were most significantly enriched in the cell cycle pathway. Centrality analysis uncovered 9 hub genes; among them, TFDP1 showed the highest degree of connectivity (43) and is a known participant in the cell cycle pathway; this finding pointed to the important role of TFDP1 in the progression of high-grade GBM. Experimental validation mostly supported the bioinformatic results. According to our study results, the gene TFDP1 and the cell cycle pathway are strongly associated with high-grade GBM; this result may provide new insights into the pathogenesis of GBM. PMID:27323154
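
    The selection criteria quoted in this abstract (P < 0.05 and |log2(fold change)| ≥ 1.5) and the degree-based notion of a hub gene can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, function names, and gene labels are hypothetical, and simple degree counting stands in for their centrality analysis.

```python
from collections import Counter


def filter_degs(records, p_cutoff=0.05, lfc_cutoff=1.5):
    """Keep genes with P < p_cutoff and |log2 fold change| >= lfc_cutoff.

    `records` is an iterable of (gene, p_value, log2_fold_change) tuples.
    """
    return [
        gene
        for gene, p_value, log2_fc in records
        if p_value < p_cutoff and abs(log2_fc) >= lfc_cutoff
    ]


def degree_by_gene(edges):
    """Count how many network edges touch each gene (its degree)."""
    degree = Counter()
    for gene_a, gene_b in edges:
        degree[gene_a] += 1
        degree[gene_b] += 1
    return degree


# Hypothetical toy data; the gene labels are placeholders.
records = [
    ("GENE_A", 0.001, 2.3),
    ("GENE_B", 0.200, 3.0),   # fails the P-value cutoff
    ("GENE_C", 0.010, -1.7),  # strong down-regulation passes
    ("GENE_D", 0.030, 0.9),   # fails the fold-change cutoff
]
edges = [("GENE_A", "GENE_C"), ("GENE_A", "GENE_E"), ("GENE_C", "GENE_E"),
         ("GENE_A", "GENE_F")]

print(filter_degs(records))
print(degree_by_gene(edges).most_common(1))
```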

  4. GProX, a user-friendly platform for bioinformatics analysis and visualization of quantitative proteomics data.

    PubMed

    Rigbolt, Kristoffer T G; Vanselow, Jens T; Blagoev, Blagoy

    2011-08-01

    Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, the analysis of data has become the bottleneck of proteomics experiments. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection and visualization of quantitative proteomics data we developed the Graphical Proteomics Data Explorer (GProX)(1). The program requires no special bioinformatics training, as all functions of GProX are accessible within its graphical user-friendly interface which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options such as database querying, clustering based on abundance ratios, feature enrichment tests for e.g. GO terms and pathway analysis tools. A number of plotting options for visualization of quantitative proteomics data are available and most analysis functions in GProX create customizable high quality graphical displays in both vector and bitmap formats. The generic import requirements allow data originating from essentially all mass spectrometry platforms, quantitation strategies and software to be analyzed in the program. GProX represents a powerful approach to proteomics data analysis providing proteomics experimenters with a toolbox for bioinformatics analysis of quantitative proteomics data. The program is released as open-source and can be freely downloaded from the project webpage at http://gprox.sourceforge.net. PMID:21602510

  5. In the Spotlight: Bioinformatics

    PubMed Central

    Wang, May Dongmei

    2016-01-01

    During 2012, next generation sequencing (NGS) attracted great attention in the biomedical research community, especially for personalized medicine. Also, third generation sequencing has become available. Therefore, state-of-the-art sequencing technology and analysis are reviewed in this Bioinformatics spotlight on 2012. Next-generation sequencing (NGS) is high-throughput nucleic acid sequencing technology with wide dynamic range and single base resolution. The full promise of NGS depends on the optimization of NGS platforms, sequence alignment and assembly algorithms, data analytics, novel algorithms for integrating NGS data with existing genomic, proteomic, or metabolomic data, and quantitative assessment of NGS technology in comparison to more established technologies such as microarrays. NGS technology has been predicted to become a cornerstone of personalized medicine. It is argued that NGS is a promising field for motivated young researchers who are looking for opportunities in bioinformatics. PMID:23192635

  6. Bioinformatics and functional magnetic resonance imaging in clinical populations: practical aspects of data collection, analysis, interpretation, and management.

    PubMed

    Vincent, Diana J; Hurd, Mark W

    2005-10-15

    In this paper the authors review the issues associated with bioinformatics and functional magnetic resonance (fMR) imaging in the context of neurosurgery. They discuss the practical aspects of data collection, analysis, interpretation, and the management of large data sets, and they consider the challenges involved in the adoption of fMR imaging into clinical neurosurgical practice. Their goal is to provide neurosurgeons and other clinicians with a better understanding of some of the current issues associated with bioinformatics or neuroinformatics and fMR imaging. Thousands to tens of thousands of images are typically acquired during an fMR imaging session. It is essential to follow an activation task paradigm exactly to obtain an accurate representation of cortical activation. These images are then interactively postprocessed offline to produce an activation map, or in some cases a series of maps. The maps may then be viewed and interpreted in consultation with a neurosurgeon and/or other clinicians. After this consultation, long-term archiving of the processed fMR activation maps along with the standard structural MR images is a complex but necessary final step in this process. The fMR modality represents a valuable tool in the neurosurgical planning process that is still in the developmental stages for routine clinical use, but holds exceptional promise for patient care. PMID:16241106

  7. Bioinformatics Knowledge Map for Analysis of Beta-Catenin Function in Cancer

    PubMed Central

    Arighi, Cecilia N.; Wu, Cathy H.

    2015-01-01

    Given the wealth of bioinformatics resources and the growing complexity of biological information, it is valuable to integrate data from disparate sources to gain insight into the role of genes/proteins in health and disease. We have developed a bioinformatics framework that combines literature mining with information from biomedical ontologies and curated databases to create knowledge “maps” of genes/proteins of interest. We applied this approach to the study of beta-catenin, a cell adhesion molecule and transcriptional regulator implicated in cancer. The knowledge map includes post-translational modifications (PTMs), protein-protein interactions, disease-associated mutations, and transcription factors co-activated by beta-catenin and their targets and captures the major processes in which beta-catenin is known to participate. Using the map, we generated testable hypotheses about beta-catenin biology in normal and cancer cells. By focusing on proteins participating in multiple relation types, we identified proteins that may participate in feedback loops regulating beta-catenin transcriptional activity. By combining multiple network relations with PTM proteoform-specific functional information, we proposed a mechanism to explain the observation that the cyclin dependent kinase CDK5 positively regulates beta-catenin co-activator activity. Finally, by overlaying cancer-associated mutation data with sequence features, we observed mutation patterns in several beta-catenin PTM sites and PTM enzyme binding sites that varied by tissue type, suggesting multiple mechanisms by which beta-catenin mutations can contribute to cancer. The approach described, which captures rich information for molecular species from genes and proteins to PTM proteoforms, is extensible to other proteins and their involvement in disease. PMID:26509276

  8. Phylogenetic trees in bioinformatics

    SciTech Connect

    Burr, Tom L

    2008-01-01

    Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length, and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.
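
    The "huge number of possible trees" noted in this abstract can be made concrete with a short sketch. It uses the standard counting result for strictly bifurcating trees on n labelled OTUs, (2n-5)!! unrooted and (2n-3)!! rooted topologies; the function names are illustrative and are not taken from the paper.

```python
def count_unrooted_trees(n_otus: int) -> int:
    """Number of unrooted bifurcating topologies for n labelled OTUs: (2n-5)!!."""
    if n_otus < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 OTUs")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product when n == 3).
    for odd in range(3, 2 * n_otus - 4, 2):
        count *= odd
    return count


def count_rooted_trees(n_otus: int) -> int:
    """Number of rooted bifurcating topologies for n labelled OTUs: (2n-3)!!."""
    if n_otus < 2:
        raise ValueError("a rooted bifurcating tree needs at least 2 OTUs")
    # Rooting an unrooted tree on n+1 leaves at the extra leaf gives the bijection.
    return count_unrooted_trees(n_otus + 1)


for n in (4, 10, 20):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

    Even at 10 OTUs there are already more than two million unrooted topologies, which is why exhaustive search is rarely feasible.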

  9. Functional and bioinformatics analysis of an exopolysaccharide-related gene (epsN) from Lactobacillus kefiranofaciens ZW3.

    PubMed

    Wang, Jingrui; Tang, Wei; Zheng, Yongna; Xing, Zhuqing; Wang, Yanping

    2016-09-01

    A novel lactic acid bacteria strain, Lactobacillus kefiranofaciens ZW3, exhibited high production of exopolysaccharide (EPS). The epsN gene, located in the eps gene cluster of this strain, is associated with EPS biosynthesis. Bioinformatics analysis of this gene was performed. The conserved domain analysis showed that the EpsN protein contained MATE-Wzx-like domains. Then the epsN gene was amplified to construct the recombinant expression vector pMG36e-epsN. The results showed that the EPS yields of the recombinants were significantly improved. From the measured yields of EPS and intracellular polysaccharide, it was concluded that the epsN gene could play a Wzx flippase role in EPS biosynthesis. This is the first study to demonstrate the effect of EpsN on L. kefiranofaciens EPS biosynthesis and to further establish its functional property. PMID:27084765

  10. Crowdsourcing for bioinformatics

    PubMed Central

    Good, Benjamin M.; Su, Andrew I.

    2013-01-01

    Motivation: Bioinformatics is faced with a variety of problems that require human involvement. Tasks like genome annotation, image analysis, knowledge-base population and protein structure determination all benefit from human input. In some cases, people are needed in vast quantities, whereas in others, we need just a few with rare abilities. Crowdsourcing encompasses an emerging collection of approaches for harnessing such distributed human intelligence. Recently, the bioinformatics community has begun to apply crowdsourcing in a variety of contexts, yet few resources are available that describe how these human-powered systems work and how to use them effectively in scientific domains. Results: Here, we provide a framework for understanding and applying several different types of crowdsourcing. The framework considers two broad classes: systems for solving large-volume ‘microtasks’ and systems for solving high-difficulty ‘megatasks’. Within these classes, we discuss system types, including volunteer labor, games with a purpose, microtask markets and open innovation contests. We illustrate each system type with successful examples in bioinformatics and conclude with a guide for matching problems to crowdsourcing solutions that highlights the positives and negatives of different approaches. Contact: bgood@scripps.edu PMID:23782614

  11. Analysis of RNAseq datasets from a comparative infectious disease zebrafish model using GeneTiles bioinformatics.

    PubMed

    Veneman, Wouter J; de Sonneville, Jan; van der Kolk, Kees-Jan; Ordas, Anita; Al-Ars, Zaid; Meijer, Annemarie H; Spaink, Herman P

    2015-03-01

    We present an RNA deep sequencing (RNAseq) analysis of a comparison of the transcriptome responses to infection of zebrafish larvae with Staphylococcus epidermidis and Mycobacterium marinum bacteria. We show how our developed GeneTiles software can improve RNAseq analysis approaches by more confidently identifying a large set of markers upon infection with these bacteria. For analysis of RNAseq data currently, software programs such as Bowtie2 and Samtools are indispensable. However, these programs that are designed for a Linux environment require some dedicated programming skills and have no options for visualisation of the resulting mapped sequence reads. Especially with large data sets, this makes the analysis time consuming and difficult for non-expert users. We have applied the GeneTiles software to the analysis of previously published and newly obtained RNAseq datasets of our zebrafish infection model, and we have shown the applicability of this approach also to published RNAseq datasets of other organisms by comparing our data with a published mammalian infection study. In addition, we have implemented the DEXSeq module in the GeneTiles software to identify genes, such as glucagon A, that are differentially spliced under infection conditions. In the analysis of our RNAseq data, this has made it possible to increase the size of data sets that could be efficiently compared without using problem-dedicated programs, leading to a quick identification of marker sets. Therefore, this approach will also be highly useful for transcriptome analyses of other organisms for which well-characterised genomes are available. PMID:25503064

  12. Molecular characterization and bioinformatics analysis of Ncoa7B, a novel ovulation-associated and reproduction system-specific Ncoa7 isoform.

    PubMed

    Shkolnik, Ketty; Ben-Dor, Shifra; Galiani, Dalia; Hourvitz, Ariel; Dekel, Nava

    2008-03-01

    In the present work, we employed bioinformatics search tools to select ovulation-associated cDNA clones with a preference for those representing putative novel genes. Detailed characterization of one of these transcripts, 6C3, by real-time PCR and RACE analyses led to identification of a novel ovulation-associated gene, designated Ncoa7B. This gene was found to exhibit a significant homology to the Ncoa7 gene that encodes a conserved tissue-specific nuclear receptor coactivator. Unlike Ncoa7, Ncoa7B possesses a unique and highly conserved exon at the 5' end and encodes a protein with a unique N-terminal sequence. Extensive bioinformatics analysis has revealed that Ncoa7B has one identifiable domain, TLDc, which has recently been suggested to be involved in protection from oxidative DNA damage. An alignment of TLDc domain containing proteins was performed, and the closest relative identified was OXR1, which also has a corresponding, highly related short isoform, with just a TLDc domain. Moreover, Ncoa7B expression, as seen to date, seems to be restricted to mammals, while other TLDc family members have no such restriction. Multiple tissue analysis revealed that unlike Ncoa7, which was abundant in a variety of tissues with the highest expression in the brain, Ncoa7B mRNA expression is restricted to the reproductive system organs, particularly the uterus and the ovary. The ovarian expression of Ncoa7B was stimulated by human chorionic gonadotropin. Additionally, using real-time PCR, we demonstrated the involvement of multiple signaling pathways for Ncoa7B expression on preovulatory follicles. PMID:18299425

  13. Bioinformatics analysis of microRNA and putative target genes in bovine mammary tissue infected with Streptococcus uberis.

    PubMed

    Naeem, A; Zhong, K; Moisá, S J; Drackley, J K; Moyes, K M; Loor, J J

    2012-11-01

    MicroRNA (miRNA) are small single-stranded noncoding RNA with important roles in regulating innate immunity in nonruminants via transcriptional and posttranscriptional mechanisms. Mastitis causes significant losses in the dairy industry and a wealth of large-scale mRNA expression data from mammary tissue have provided fundamental insights into the tissue adaptations to pathogens. We studied the expression of 14 miRNA (miR-10a, -15b, -16a, -17, -21, -31, -145, -146a, -146b, -155, -181a, -205, -221, and -223) associated with regulation of innate immunity and mammary epithelial cell function in tissue challenged with Streptococcus uberis. Those data, along with microarray expression of 2,102 differentially expressed genes, were used for bioinformatics analysis to uncover putative target genes and the most affected biological pathways and functions. Three miRNA (181a, 16, and 31) were downregulated approximately 3- to 5-fold and miR-223 was upregulated approximately 2.5-fold in infected versus healthy tissue. Among differentially expressed genes due to infection, bioinformatics analysis revealed that the studied miRNA share in the regulation of a large number of metabolic (SCD, CD36, GPAM, and FASN), immune/oxidative stress (TNF, IL6, IL10, SOD2, LYZ, and TLR4), and cellular proliferation/differentiation (FOS and CASP4) target genes. This level of complex regulation was underscored by the coordinate effect revealed by bioinformatics on various cellular pathways within the Kyoto Encyclopedia of Genes and Genomes database. Most pathways associated with "cellular processes," "organismal systems," and "diseases" were activated by putative target genes of miR-31 and miR-16a, with an overlapping activation of "immune system" and "signal transduction." A pronounced effect and activation of miR-31 target genes was observed within "folding, sorting, and degradation," "cell growth and death," and "cell communication" pathways, whereas a marked inhibition of "lipid metabolism

  14. Emerging bioinformatics approaches for analysis of NGS-derived coding and non-coding RNAs in neurodegenerative diseases

    PubMed Central

    Guffanti, Alessandro; Simchovitz, Alon; Soreq, Hermona

    2014-01-01

    Neurodegenerative diseases in general and specifically late-onset Alzheimer’s disease (LOAD) involve a genetically complex and largely obscure ensemble of causative and risk factors accompanied by complex feedback responses. The advent of “high-throughput” transcriptome investigation technologies such as microarray and deep sequencing is increasingly being combined with sophisticated statistical and bioinformatics analysis methods complemented by knowledge-based approaches such as Bayesian Networks or network and graph analyses. Together, such “integrative” studies are beginning to identify co-regulated gene networks linked with biological pathways and potentially modulating disease predisposition, outcome, and progression. Specifically, bioinformatics analyses of integrated microarray and genotyping data in cases and controls reveal changes in gene expression of both protein-coding and small and long regulatory RNAs; highlight relevant quantitative transcriptional differences between LOAD and non-demented control brains and demonstrate reconfiguration of functionally meaningful molecular interaction structures in LOAD. These may be measured as changes in connectivity in “hub nodes” of relevant gene networks (Zhang et al., 2013). We illustrate here the open analytical questions in the transcriptome investigation of neurodegenerative disease studies, proposing “ad hoc” strategies for the evaluation of differential gene expression and hints for a simple analysis of the non-coding RNA (ncRNA) part of such datasets. We then survey the emerging role of long ncRNAs (lncRNAs) in the healthy and diseased brain transcriptome and describe the main current methods for computational modeling of gene networks. We propose accessible modular and pathway-oriented methods and guidelines for bioinformatics investigations of whole transcriptome next generation sequencing datasets. We finally present methods and databases for functional interpretations of lncRNAs and

  15. Putative lipoproteins identified by bioinformatic genome analysis of Leifsonia xyli ssp. xyli, the causative agent of sugarcane ratoon stunting disease.

    PubMed

    Sutcliffe, Iain C; Hutchings, Matthew I

    2007-01-01

    SUMMARY Leifsonia xyli ssp. xyli is the causative agent of ratoon stunting disease, a major cause of economic loss in sugarcane crops. Understanding of the biology of this pathogen has been hampered by its fastidious growth characteristics in vitro. However, the recent release of a genome sequence for this organism has allowed significant novel insights. Further to this, we have performed a bioinformatic analysis of the lipoproteins encoded in the L. xyli genome. These analyses suggest that lipoproteins represent c. 2.0% of the L. xyli predicted proteome. Functional analyses suggest that lipoproteins make an important contribution to the physiology of the pathogen and may influence its ability to cause disease in planta. PMID:20507484

  16. Bioinformatics analysis and expression of a novel protein ROP48 in Toxoplasma gondii.

    PubMed

    Zhou, Jian; Wang, Lin; Zhou, Aihua; Lu, Gang; Li, Qihang; Wang, Zhilin; Zhu, Meiyan; Zhou, Huaiyu; Cong, Hua; He, Shenyi

    2016-06-01

    Toxoplasma gondii is an obligate intracellular apicomplexan parasite that can infect warm-blooded animals and humans all over the world. In recent years, ROP family genes, which encode particular proteins of T. gondii, have made a great contribution to toxoplasmosis research. In this study, we used multiple bioinformatics approaches to predict the physical and chemical characteristics, transmembrane domain, epitope, and topological structure of the rhoptry protein 48 (ROP48). The results indicated that ROP48 protein was mainly located in the membrane and had several positive linear B-cell epitopes and Th-cell epitopes, which suggested that ROP48 is a potential DNA vaccine candidate against toxoplasmosis. Then the PCR product amplified from the ROP48 cDNA was inserted into a pEASY-T1 vector to build a recombinant cloning plasmid. After sequencing, ROP48 was subcloned into a eukaryotic expression plasmid pEGFP-C1 to obtain pEGFP-C1-ROP48 (pROP48). After identification by PCR and restriction enzyme digestion, the recombinant plasmid pROP48 was transfected into HEK 293-T cells and identified by RT-PCR. The results showed that the eukaryotic expression plasmid pROP48 was constructed and successfully transfected into HEK 293-T cells. Western blotting showed that the expressed proteins can be recognized by anti-STAg mouse sera. PMID:27078655

  17. Entropy-Based Analysis and Bioinformatics-Inspired Integration of Global Economic Information Transfer

    PubMed Central

    An, Sungbae; Kwon, Young-Kyun; Yoon, Sungroh

    2013-01-01

    The assessment of information transfer in the global economic network helps to understand the current environment and the outlook of an economy. Most approaches on global networks extract information transfer based mainly on a single variable. This paper establishes an entirely new bioinformatics-inspired approach to integrating information transfer derived from multiple variables and develops an international economic network accordingly. In the proposed methodology, we first construct the transfer entropies (TEs) between various intra- and inter-country pairs of economic time series variables, test their significances, and then use a weighted sum approach to aggregate information captured in each TE. Through a simulation study, the new method is shown to deliver better information integration compared to existing integration methods in that it can be applied even when intra-country variables are correlated. Empirical investigation with the real world data reveals that Western countries are more influential in the global economic network and that Japan has become less influential following the Asian currency crisis. PMID:23300959
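
    The transfer entropy (TE) that this abstract builds on can be sketched for two discrete time series. This is a minimal plug-in estimator with a history length of one step, chosen for illustration; the estimator, significance tests, and weighting scheme actually used by the authors are not specified here, and the function name and toy series are assumptions.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of TE from `source` to `target` (in bits, lag 1).

    TE = sum over (x_next, x_prev, y_prev) of
         p(x_next, x_prev, y_prev)
         * log2( p(x_next | x_prev, y_prev) / p(x_next | x_prev) )
    where x is the target series and y is the source series.
    """
    if len(source) != len(target) or len(target) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(target) - 1  # number of observed transitions
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))
    prev_pairs = Counter(zip(target[:-1], source[:-1]))
    self_pairs = Counter(zip(target[1:], target[:-1]))
    prev_single = Counter(target[:-1])

    te = 0.0
    for (x_next, x_prev, y_prev), count in triples.items():
        p_joint = count / n
        p_given_both = count / prev_pairs[(x_prev, y_prev)]
        p_given_self = self_pairs[(x_next, x_prev)] / prev_single[x_prev]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# Hypothetical toy series: the target copies the source with a one-step delay.
source = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
target = [0] + source[:-1]
print(transfer_entropy(source, target), transfer_entropy(target, source))
```

    On real economic series the values would first need discretizing, and short samples bias plug-in estimates upward, so a significance test of the kind the abstract mentions would still be needed.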

  18. Combined expressional analysis, bioinformatics and targeted proteomics identify new potential therapeutic targets in glioblastoma stem cells

    PubMed Central

    Stangeland, Biljana; Mughal, Awais A.; Grieg, Zanina; Sandberg, Cecilie Jonsgar; Joel, Mrinal; Nygård, Ståle; Meling, Torstein; Murrell, Wayne; Vik Mo, Einar O.; Langmoen, Iver A.

    2015-01-01

    Glioblastoma (GBM) is both the most common and the most lethal primary brain tumor. It is thought that GBM stem cells (GSCs) are critically important in resistance to therapy. Therefore, there is a strong rationale to target these cells in order to develop new molecular therapies. To identify molecular targets in GSCs, we compared gene expression in GSCs to that in neural stem cells (NSCs) from the adult human brain, using microarrays. Bioinformatic filtering identified 20 genes (PBK/TOPK, CENPA, KIF15, DEPDC1, CDC6, DLG7/DLGAP5/HURP, KIF18A, EZH2, HMMR/RHAMM/CD168, NOL4, MPP6, MDM1, RAPGEF4, RHBDD1, FNDC3B, FILIP1L, MCC, ATXN7L4/ATXN7L1, P2RY5/LPAR6 and FAM118A) that were consistently expressed in GSC cultures and consistently not expressed in NSC cultures. The expression of these genes was confirmed in clinical samples (TCGA and REMBRANDT). The first nine genes were highly co-expressed in all GBM subtypes and were part of the same protein-protein interaction network. Furthermore, their combined up-regulation correlated negatively with patient survival in the mesenchymal GBM subtype. Using targeted proteomics and the COGNOSCENTE database we linked these genes to GBM signalling pathways. Nine genes: PBK, CENPA, KIF15, DEPDC1, CDC6, DLG7, KIF18A, EZH2 and HMMR should be further explored as targets for treatment of GBM. PMID:26295306

  19. Protectome Analysis: A New Selective Bioinformatics Tool for Bacterial Vaccine Candidate Discovery

    PubMed Central

    Altindis, Emrah; Cozzi, Roberta; Di Palo, Benedetta; Necchi, Francesca; Mishra, Ravi P.; Fontana, Maria Rita; Soriani, Marco; Bagnoli, Fabio; Maione, Domenico; Grandi, Guido; Liberatori, Sabrina

    2015-01-01

    New generation vaccines are in demand to include only the key antigens sufficient to confer protective immunity among the plethora of pathogen molecules. In the last decade, large-scale genomics-based technologies have emerged. Among them, the Reverse Vaccinology approach was successfully applied to the development of an innovative vaccine against Neisseria meningitidis serogroup B, now available on the market with the commercial name BEXSERO® (Novartis Vaccines). The limiting step of such approaches is the number of antigens to be tested in in vivo models. Several laboratories have been trying to refine the original approach in order to identify the relevant antigens straight from the genome. Here we report a new bioinformatics tool that takes a first step in this direction. The tool has been developed by identifying structural/functional features recurring in known bacterial protective antigens, the so-called “Protectome space,” and using such “protective signatures” for protective antigen discovery. In particular, we applied this new approach to Staphylococcus aureus and Group B Streptococcus and we show that not only were already known protective antigens re-discovered, but also two new protective antigens were identified. PMID:25368410

  20. Bioinformatics analysis and construction of phylogenetic tree of aquaporins from Echinococcus granulosus.

    PubMed

    Wang, Fen; Ye, Bin

    2016-09-01

    Cyst echinococcosis, caused by the metacestode larvae of Echinococcus granulosus (Eg), is a chronic, worldwide, and severe zoonotic parasitosis. The treatment of cyst echinococcosis is still difficult since surgery cannot fit the needs of all patients, and drugs can lead to serious adverse events as well as resistance. Screening for target proteins that interact with new anti-hydatidosis drugs is urgently needed to meet the prevailing challenges. Here, we analyzed the sequences and structural properties of E. granulosus aquaporins (EgAQPs), and constructed a phylogenetic tree by bioinformatics methods. The MIP family signature and Protein kinase C phosphorylation sites were predicted in all nine EgAQPs. α-helix and random coil were the main secondary structures of EgAQPs. The numbers of transmembrane regions were three to six, which indicated that EgAQPs contained multiple hydrophobic regions. A neighbor-joining tree indicated that EgAQPs were divided into two branches: seven EgAQPs formed a clade with human AQP1, a "strict" aquaporin, and the other two EgAQPs formed a clade with human AQP9, an aquaglyceroporin. Unfortunately, homology modeling of EgAQPs was aborted. These results provide a foundation for understanding and further research on the biological function of E. granulosus. PMID:27164831

  1. Entropy-based analysis and bioinformatics-inspired integration of global economic information transfer.

    PubMed

    Kim, Jinkyu; Kim, Gunn; An, Sungbae; Kwon, Young-Kyun; Yoon, Sungroh

    2013-01-01

    The assessment of information transfer in the global economic network helps to understand the current environment and the outlook of an economy. Most approaches on global networks extract information transfer based mainly on a single variable. This paper establishes an entirely new bioinformatics-inspired approach to integrating information transfer derived from multiple variables and develops an international economic network accordingly. In the proposed methodology, we first construct the transfer entropies (TEs) between various intra- and inter-country pairs of economic time series variables, test their significances, and then use a weighted sum approach to aggregate information captured in each TE. Through a simulation study, the new method is shown to deliver better information integration compared to existing integration methods in that it can be applied even when intra-country variables are correlated. Empirical investigation with the real world data reveals that Western countries are more influential in the global economic network and that Japan has become less influential following the Asian currency crisis. PMID:23300959
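
    The transfer-entropy step described above can be illustrated with a small plug-in estimator for discrete series. This is a sketch under stated assumptions, not the authors' implementation: it uses a history length of one, assumes the series are already discretised into symbols, and the weighted-sum helper simply mirrors the aggregation idea with weights supplied by the caller. The abstract's significance testing is not shown.

```python
from collections import Counter
from math import log2


def transfer_entropy(x, y):
    """Plug-in estimate of transfer entropy from x to y in bits.

    TE = sum p(y_next, y_now, x_now) * log2(p(y_next | y_now, x_now) / p(y_next | y_now))
    Assumptions: equal-length discrete series, history length 1.
    """
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if len(y) < 2:
        raise ValueError("need at least two observations")
    n = len(y) - 1
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))  # (y_next, y_now, x_now)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))        # (y_now, x_now)
    pairs_yy = Counter(zip(y[1:], y[:-1]))         # (y_next, y_now)
    singles = Counter(y[:-1])                      # y_now
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y_now, x_now)]
        p_given_self = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


def weighted_te_sum(te_values, weights):
    """Aggregate several TE estimates with caller-supplied weights (hypothetical helper)."""
    if len(te_values) != len(weights):
        raise ValueError("one weight per TE value is required")
    return sum(w * te for w, te in zip(weights, te_values))


if __name__ == "__main__":
    # Invented toy series: y copies x with a one-step delay.
    x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
    y = [0] + x[:-1]
    print("TE x->y:", transfer_entropy(x, y))
    print("TE y->x:", transfer_entropy(y, x))
```

    Because y is a delayed copy of x in the toy data, the estimate from x to y is positive. With real economic series the values would first need to be binned.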

  2. Phylogenetic and bioinformatic analysis of gap junction-related proteins, innexins, pannexins and connexins.

    PubMed

    Fushiki, Daisuke; Hamada, Yasuo; Yoshimura, Ryoichi; Endo, Yasuhisa

    2010-04-01

    All multi-cellular animals, including hydra, insects and vertebrates, develop gap junctions, which communicate directly with neighboring cells. Gap junctions consist of protein families called connexins in vertebrates and innexins in invertebrates. Connexins and innexins have no homology in their amino acid sequence, but both are thought to have some similar characteristics, such as a tetra-membrane-spanning structure, formation of a channel by a hexamer, and transmission of small molecules (e.g. ions) to neighboring cells. Pannexins were recently identified as homologs of innexins in vertebrate genomes. Although pannexins are thought to share the function of intercellular communication with connexins and innexins, there is little information about the relationship among these three protein families of gap junctions. We phylogenetically and bioinformatically examined these protein families and other tetra-membrane-spanning proteins using a database and three analytical software packages. The clades formed by the pannexin families follow not the species classification but the paralogs of each pannexin member. Amino acid sequences of pannexins are closely related to those of innexins but less to those of connexins. These data suggest that innexins and pannexins have a common origin, but the relationship between innexins/pannexins and connexins is as slight as that of other tetra-membrane-spanning members. PMID:20460741

  3. Bioinformatics analysis of plant orthologous introns: identification of an intronic tRNA-like sequence.

    PubMed

    Akkuratov, Evgeny E; Walters, Lorraine; Saha-Mandal, Arnab; Khandekar, Sushant; Crawford, Erin; Zirbel, Craig L; Leisner, Scott; Prakash, Ashwin; Fedorova, Larisa; Fedorov, Alexei

    2014-09-10

    Orthologous introns have identical positions relative to the coding sequence in orthologous genes of different species. By analyzing the complete genomes of five plants we generated a database of 40,512 orthologous intron groups of dicotyledonous plants, 28,519 orthologous intron groups of angiosperms, and 15,726 of land plants (moss and angiosperms). Multiple sequence alignments of each orthologous intron group were obtained using the Mafft algorithm. The number of conserved regions in plant introns appeared to be hundreds of times fewer than that in mammals or vertebrates. Approximately three quarters of conserved intronic regions among angiosperms and dicots, in particular, correspond to alternatively-spliced exonic sequences. We registered only a handful of conserved intronic ncRNAs of flowering plants. However, the most evolutionarily conserved intronic region, which is ubiquitous for all plants examined in this study, including moss, possessed multiple structural features of tRNAs, which caused us to classify it as a putative tRNA-like ncRNA. Intronic sequences encoding tRNA-like structures are not unique to plants. Bioinformatics examination of the presence of tRNA inside introns revealed an unusually long-term association of four glycine tRNAs inside the Vac14 gene of fish, amniotes, and mammals. PMID:25014137

  4. Phylogenetic, Expression, and Bioinformatic Analysis of the ABC1 Gene Family in Populus trichocarpa

    PubMed Central

    Zhang, Haizhen; Chen, Yunlin; Xu, Xuemei; Mao, Xuliang; Li, Chenghao

    2013-01-01

    We studied 17 ABC1 genes in Populus trichocarpa, all of which contained an ABC1 domain consisting of about 120 amino acid residues. Most of the ABC1 gene products were located in the mitochondria or chloroplasts. All had a conserved VAVK-like motif and a DFG motif. Phylogenetic analysis grouped the genes into three subgroups. In addition, the chromosomal locations of the genes on the 19 Populus chromosomes were determined. Gene structure was studied through exon/intron organization and the MEME motif finder, while heatmap was used to study the expression diversity using EST libraries. According to the heatmap, PtrABC1P14 was highlighted because of the high expression in tension wood which related to secondary cell wall formation and cellulose synthesis, thus making a contribution to follow-up experiment in wood formation. Promoter cis-element analysis indicated that almost all of the ABC1 genes contained one or two cis-elements related to ABA signal transduction pathway and drought stress. Quantitative real-time PCR was carried out to evaluate the expression of all of the genes under abiotic stress conditions (ABA, CdCl2, high temperature, high salinity, and drought); the results showed that some of the genes were affected by these stresses and confirmed the results of promoter cis-element analysis. PMID:24163630
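
    A conserved-motif check such as the DFG and VAVK-like motifs mentioned above can be sketched with regular expressions. The abstract does not define "VAVK-like", so the pattern below (a branched aliphatic residue, A, a branched aliphatic residue, K) is an assumption for illustration, and the test sequence is invented rather than a real PtrABC1 protein.

```python
import re

# Assumed patterns: "DFG" is literal; the VAVK-like pattern is a hypothetical relaxation.
MOTIFS = {
    "VAVK-like": re.compile(r"[VIL]A[VIL]K"),
    "DFG": re.compile(r"DFG"),
}


def find_motifs(protein_seq):
    """Return 1-based start positions of each motif in a protein sequence."""
    seq = protein_seq.upper()
    return {
        name: [match.start() + 1 for match in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    toy = "MSTVAVKVQRPGDFGLAR"  # invented sequence
    print(find_motifs(toy))
```

    Dedicated motif finders such as MEME, which the abstract names, discover motifs statistically; a fixed pattern scan like this one only confirms motifs that are already known.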

  5. Molecular mechanisms associated with breast cancer based on integrated gene expression profiling by bioinformatics analysis.

    PubMed

    Wu, Di; Han, Bing; Guo, Liang; Fan, Zhimin

    2016-07-01

    In this study, we aimed to gain more insights into the underlying molecular mechanisms responsible for breast cancer (BC) progression. Three gene expression profiles of human BC were integrated and used to screen the differentially expressed genes (DEGs) between healthy breast samples and BC samples. A protein-protein interaction (PPI) network of DEGs was constructed by mapping DEGs into the Search Tool for the Retrieval of Interacting Genes (STRING) database; then the subnetworks of the PPI network were constructed with the MCODE plug-in, and DEGs in Subnetwork 1 were analysed based on the Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathway database (http://www.genome.jp/kegg/). In addition, a co-expression network of DEGs was established using Cytoscape. In total, 931 DEGs were selected, including 340 up-regulated genes and 591 down-regulated genes. KEGG pathway analysis for DEGs in Subnetwork 1 showed that the pathogenesis of BC was associated with cell cycle, oocyte meiosis, progesterone-mediated oocyte maturation and p53 signalling pathways. Meanwhile, the most significant-related DEGs were found by co-expression network analysis of DEGs. In conclusion, CCNG1 might be involved in the progression of BC via inhibiting cell proliferation, and ADAMTS1 might play a crucial role in BC development through the regulation of angiogenesis. PMID:26804550

  6. Expression and Bioinformatics Analysis of Pectate Lyase Gene from Bacillus subtilis521

    NASA Astrophysics Data System (ADS)

    Xiao, Jing; Lu, Fu-Ping; Li, Yu; Li, Jin-Ting

    In order to exploit new genetic resources, the pectate lyase (PEL) gene was amplified by PCR using genomic DNA from an alkaline Bacillus subtilis521. The PCR product was inserted into the pET22b(+) vector. The recombinant plasmids were cloned in E. coli DH5α and then expressed in E. coli BL21. When cultured in the optimized medium, the positive clones E. coli BL21(pET22b(+)pel) showed intracellular pectate lyase activity of 90.0 U/mL, indicating that we had obtained the correct PEL gene. The pel gene has an open reading frame of 1263 nucleotides and codes for a product of 420 amino acids with a calculated molecular mass of 45.5 kD. Based on computer-assisted analysis, a signal peptide and two conserved domains were revealed. The sequence analysis for PEL showed that it shares 26-82% homology with other strains in GenBank. In addition, the advanced structure of PEL was also predicted and analysed. This study will help in the experimental design of PEL fermentation, production purification, and enzyme evolution.
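
    The reported figures (1263 nucleotides, 420 amino acids, 45.5 kD) can be sanity-checked with simple arithmetic. The sketch below assumes the open reading frame length includes the stop codon, which is what makes 1263 nt correspond to 420 residues, and uses the rule-of-thumb average of about 110 Da per residue. That average is an approximation, not the method behind the calculated mass in the abstract.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average; an approximation


def protein_length_from_orf(orf_nt, includes_stop_codon=True):
    """Number of amino acids encoded by an ORF of `orf_nt` nucleotides."""
    if orf_nt <= 0 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop_codon else codons


def approximate_mass_kda(n_residues):
    """Rough molecular mass in kDa from residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    residues = protein_length_from_orf(1263)
    print(residues, "residues, about", approximate_mass_kda(residues), "kDa")
```

    The rough estimate lands within a few percent of the reported 45.5 kD, so the three numbers in the abstract are mutually consistent.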

  7. Molecular cloning and bioinformatic analysis of the Streptococcus agalactiae neuA gene isolated from tilapia.

    PubMed

    Wang, E L; Wang, K Y; Chen, D F; Geng, Y; Huang, L Y; Wang, J; He, Y

    2015-01-01

    Cytidine monophosphate (CMP) N-acetylneuraminic acid (NeuNAc) synthetase, which is encoded by the neuA gene, can catalyze the activation of sialic acid with CMP, and plays an important role in Streptococcus agalactiae infection pathogenesis. To study the structure and function of the S. agalactiae neuA gene, we isolated it from diseased tilapia, amplified it using polymerase chain reaction (PCR) with specific primers, and cloned it into a pMD19-T vector. The recombinant plasmid was confirmed by PCR and restriction enzyme digestion, and identified by sequencing. Molecular characterization analyses of the neuA nucleotide and amino acid sequences were performed using bioinformatic tools and an online server. The results showed that the neuA nucleotide sequence contained a complete coding region, which comprised 1242 bp, encoding 413 amino acids (aa). The aa sequence was highly conserved and contained a Glyco_tranf_GTA_type superfamily and an SGNH_hydrolase superfamily conserved domain, which are related to sialic acid activation catalysis. The NeuA protein possessed many important sites related to post-translational modification, including 28 potential phosphorylation sites and 2 potential N-glycosylation sites, had no signal peptides or transmembrane regions, and was predicted to reside in the cytoplasm. Moreover, the protein had some B-cell epitopes, which suggests its potential in development of a vaccine against S. agalactiae infection. The codon usage frequency of neuA differed greatly in Escherichia coli and Homo sapiens genes, and neuA may be more efficiently expressed in eukaryotes (yeast). S. agalactiae neuA from tilapia maintains high structural homology and sequence identity with CMP-NeuNAc synthetases from other bacteria. PMID:26125800

  8. In Vitro Mutational and Bioinformatics Analysis of the M71 Odorant Receptor and Its Superfamily

    PubMed Central

    Tomoiaga, Delia; D’Hulst, Charlotte; Krampis, Konstantinos; Feinstein, Paul

    2015-01-01

    We performed an extensive mutational analysis of the canonical mouse odorant receptor (OR) M71 to determine the properties of ORs that inhibit plasma membrane trafficking in heterologous expression systems. We used the M71::GFP fusion protein to directly assess plasma membrane localization and functionality of M71 in heterologous cells in vitro or in olfactory sensory neurons (OSNs) in vivo. OSN expression of M71::GFP shows only small differences in activity compared to untagged M71. However, M71::GFP could not traffic to the plasma membrane even in the presence of proposed accessory proteins RTP1S or mβ2AR. To ask if ORs contain an internal “kill sequence”, we mutated ~15 of the most highly conserved OR specific amino acids not found amongst the trafficking non-OR GPCR superfamily; none of these mutants rescued trafficking. Addition of various amino terminal signal sequences or different glycosylation motifs all failed to produce trafficking. The addition of the amino and carboxy terminal domains of mβ2AR or the mutation Y289A in the highly conserved GPCR motif NPxxY does not rescue plasma membrane trafficking. The failure of targeted mutagenesis to rescue plasma membrane localization in heterologous cells suggests that OR trafficking deficits may not be attributable to conserved collinear motifs, but rather the overall amino acid composition of the OR family. Thus, we performed an in silico analysis comparing the OR and other amine receptor superfamilies. We find that ORs contain fewer charged residues and more hydrophobic residues distributed throughout the protein and a conserved overall amino acid composition. From our analysis, we surmise that it may be difficult to traffic ORs at high levels to the cell surface in vitro, without making significant amino acid modifications. Finally, we observed specific increases in methionine and histidine residues as well as a marked decrease in tryptophan residues, suggesting that these changes
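
    The in silico comparison of amino acid composition described in this record can be sketched as a residue-class tally. The groupings below (D, E, K, R as charged; A, V, L, I, M, F, W as hydrophobic) are one common convention assumed for illustration, the function name is hypothetical, and the abstract does not state which classification the authors used.

```python
from collections import Counter

CHARGED = set("DEKR")         # assumed grouping
HYDROPHOBIC = set("AVLIMFW")  # assumed grouping


def composition_summary(protein_seq):
    """Fractions of charged, hydrophobic, and selected single residues."""
    seq = protein_seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    total = len(seq)
    return {
        "charged": sum(counts[aa] for aa in CHARGED) / total,
        "hydrophobic": sum(counts[aa] for aa in HYDROPHOBIC) / total,
        "M": counts["M"] / total,
        "H": counts["H"] / total,
        "W": counts["W"] / total,
    }


if __name__ == "__main__":
    print(composition_summary("DEKRAVLI"))  # invented toy sequence
```

    Comparing receptor families would mean averaging such summaries over every sequence in each family, which is beyond this sketch.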

  9. Critical genes in head and neck squamous cell carcinoma revealed by bioinformatic analysis of gene expression data.

    PubMed

    Wang, B; Wang, T; Cao, X L; Li, Y

    2015-01-01

    In this study, bioinformatic analysis of gene expression data of head and neck squamous cell carcinoma (HNSCC) was performed to identify critical genes. Gene expression data of HNSCC were downloaded from the Cancer Genome Atlas (TCGA) and differentially expressed genes were determined through significance analysis of microarrays. Protein-protein interaction networks were constructed and used to identify hub genes. Functional enrichment analysis was performed with DAVID. Relevant microRNAs, transcription factors, and small molecule drugs were predicted by the Fisher exact test. Survival analysis was performed with the Kaplan-Meier plot from a package for survival analysis in R. In the five groups of HNSCC patients, a total of 5946 DEGs were identified in group 1, 4575 DEGs in group 2, 5580 DEGs in group 3, 8017 DEGs in group 4, and 5469 DEGs in group 5. DEGs in the cell cycle and immune response were significantly over-represented. Five PPI networks were constructed from which hub genes were acquired, such as minichromosome maintenance complex component 7 (MCM7), MCM2, decorin (DCN), retinoblastoma 1 (RB1), and tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein gamma (YWHAG). No significant difference in survival was observed among the 5 groups; however, a significant difference existed between two combined groups (groups 1, 3, and 5 vs groups 2 and 4). Our study revealed critical genes in HNSCC, which could supplement the knowledge about the pathogenesis of HNSCC and provide clues for future therapy development. PMID:26782382
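
    Hub genes in a protein-protein interaction network are often taken to be the nodes with the most interaction partners. The sketch below ranks nodes by degree. The abstract does not say which centrality measure the authors used, so degree is an assumption, and the edge list uses placeholder names (G1 to G5) rather than real interactions.

```python
from collections import defaultdict


def hub_genes(edges, top_n=2):
    """Rank genes by number of distinct interaction partners (degree)."""
    neighbors = defaultdict(set)
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue  # ignore self-interactions
        neighbors[gene_a].add(gene_b)
        neighbors[gene_b].add(gene_a)
    # Sort by descending degree, then by name for a stable order.
    ranked = sorted(neighbors, key=lambda gene: (-len(neighbors[gene]), gene))
    return [(gene, len(neighbors[gene])) for gene in ranked[:top_n]]


if __name__ == "__main__":
    toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5")]
    print(hub_genes(toy_edges))
```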

  10. Diagnosis of an imprinted-gene syndrome by a novel bioinformatics analysis of whole-genome sequences from a family trio.

    PubMed

    Bodian, Dale L; Solomon, Benjamin D; Khromykh, Alina; Thach, Dzung C; Iyer, Ramaswamy K; Link, Kathleen; Baker, Robin L; Baveja, Rajiv; Vockley, Joseph G; Niederhuber, John E

    2014-11-01

    Whole-genome sequencing and whole-exome sequencing are becoming more widely applied in clinical medicine to help diagnose rare genetic diseases. Identification of the underlying causative mutations by genome-wide sequencing is greatly facilitated by concurrent analysis of multiple family members, most often the mother-father-proband trio, using bioinformatics pipelines that filter genetic variants by mode of inheritance. However, current pipelines are limited to Mendelian inheritance patterns and do not specifically address disorders caused by mutations in imprinted genes, such as forms of Angelman syndrome and Beckwith-Wiedemann syndrome. Using publicly available tools, we implemented a genetic inheritance search mode to identify imprinted-gene mutations. Application of this search mode to whole-genome sequences from a family trio led to a diagnosis for a proband for whom extensive clinical testing and Mendelian inheritance-based sequence analysis were nondiagnostic. The condition in this patient, IMAGe syndrome, is likely caused by the heterozygous mutation c.832A>G (p.Lys278Glu) in the imprinted gene CDKN1C. The genotypes and disease status of six members of the family are consistent with maternal expression of the gene, and allele-biased expression was confirmed by RNA-Seq for the heterozygotes. This analysis demonstrates that an imprinted-gene search mode is a valuable addition to genome sequence analysis pipelines for identifying disease-causative variants. PMID:25614875
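
    The imprinted-gene search mode described above can be illustrated with a small trio filter. This is a sketch under stated assumptions rather than the authors' pipeline: variants are represented as hypothetical dictionaries with VCF-style genotype strings, the imprinting table lists only CDKN1C as maternally expressed (as the abstract states), and a variant is kept when the proband is heterozygous and the allele can only have come from the parent whose copy is expressed.

```python
# Expressed parental allele per imprinted gene; CDKN1C is taken from the abstract.
IMPRINTED_GENES = {"CDKN1C": "maternal"}

CARRIER_GENOTYPES = {"0/1", "1/1"}


def imprinted_candidates(variants, imprinted_genes=IMPRINTED_GENES):
    """Keep heterozygous proband variants inherited via the expressed parental allele."""
    kept = []
    for variant in variants:
        expressed = imprinted_genes.get(variant["gene"])
        if expressed is None:
            continue  # gene is not in the imprinting table
        if variant["proband"] != "0/1":
            continue
        if expressed == "maternal":
            transmitting, silent = variant["mother"], variant["father"]
        else:
            transmitting, silent = variant["father"], variant["mother"]
        # Require unambiguous transmission from the expressed-allele parent.
        if transmitting in CARRIER_GENOTYPES and silent == "0/0":
            kept.append(variant)
    return kept


if __name__ == "__main__":
    toy_variants = [  # invented genotypes for illustration only
        {"id": "var1", "gene": "CDKN1C", "proband": "0/1", "mother": "0/1", "father": "0/0"},
        {"id": "var2", "gene": "CDKN1C", "proband": "0/1", "mother": "0/0", "father": "0/1"},
        {"id": "var3", "gene": "GENE_X", "proband": "0/1", "mother": "0/1", "father": "0/0"},
    ]
    print([v["id"] for v in imprinted_candidates(toy_variants)])
```

    A Mendelian dominant filter would typically discard a variant carried by an unaffected parent, which is why a parent-of-origin rule like this one can surface candidates that standard modes miss.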

  11. Diagnosis of an imprinted-gene syndrome by a novel bioinformatics analysis of whole-genome sequences from a family trio

    PubMed Central

    Bodian, Dale L; Solomon, Benjamin D; Khromykh, Alina; Thach, Dzung C; Iyer, Ramaswamy K; Link, Kathleen; Baker, Robin L; Baveja, Rajiv; Vockley, Joseph G; Niederhuber, John E

    2014-01-01

    Whole-genome sequencing and whole-exome sequencing are becoming more widely applied in clinical medicine to help diagnose rare genetic diseases. Identification of the underlying causative mutations by genome-wide sequencing is greatly facilitated by concurrent analysis of multiple family members, most often the mother–father–proband trio, using bioinformatics pipelines that filter genetic variants by mode of inheritance. However, current pipelines are limited to Mendelian inheritance patterns and do not specifically address disorders caused by mutations in imprinted genes, such as forms of Angelman syndrome and Beckwith–Wiedemann syndrome. Using publicly available tools, we implemented a genetic inheritance search mode to identify imprinted-gene mutations. Application of this search mode to whole-genome sequences from a family trio led to a diagnosis for a proband for whom extensive clinical testing and Mendelian inheritance-based sequence analysis were nondiagnostic. The condition in this patient, IMAGe syndrome, is likely caused by the heterozygous mutation c.832A>G (p.Lys278Glu) in the imprinted gene CDKN1C. The genotypes and disease status of six members of the family are consistent with maternal expression of the gene, and allele-biased expression was confirmed by RNA-Seq for the heterozygotes. This analysis demonstrates that an imprinted-gene search mode is a valuable addition to genome sequence analysis pipelines for identifying disease-causative variants. PMID:25614875

  12. Bioinformatics analysis of the early inflammatory response in a rat thermal injury model

    PubMed Central

    Yang, Eric; Maguire, Timothy; Yarmush, Martin L; Berthiaume, Francois; Androulakis, Ioannis P

    2007-01-01

    Background: Thermal injury is among the most severe forms of trauma and its effects are both local and systemic. Response to thermal injury includes cellular protection mechanisms, inflammation, hypermetabolism, prolonged catabolism, organ dysfunction and immuno-suppression. It has been hypothesized that gene expression patterns in the liver will change with severe burns, thus reflecting the role the liver plays in the response to burn injury. Characterizing the molecular fingerprint (i.e., expression profile) of the inflammatory response resulting from burns may help elucidate the activated mechanisms and suggest new therapeutic intervention. In this paper we propose a novel integrated framework for analyzing time-series transcriptional data, with emphasis on the burn-induced response within the context of the rat animal model. Our analysis robustly identifies critical expression motifs, indicative of the dynamic evolution of the inflammatory response, and we further propose a putative reconstruction of the associated transcription factor activities. Results: Implementation of our algorithm on data obtained from an animal (rat) burn injury study identified 281 genes corresponding to 4 unique profiles. Enrichment evaluation upon both gene ontologies and transcription factors verifies the inflammation-specific character of the selections and the rationalization of the burn-induced inflammatory response. Conducting the transcription network reconstruction and analysis, we have identified transcription factors, including AHR, Octamer Binding Proteins, Kruppel-like Factors, and cell cycle regulators, as being highly important to an organism's response to burn injury. These transcription factors are notable due to their roles in pathways that play a part in the gross physiological response to burn, such as changes in the immune response and inflammation. Conclusion: Our results indicate that our novel selection/classification algorithm has been successful in selecting out

  13. Flux Analysis of the Trypanosoma brucei Glycolysis Based on a Multiobjective-Criteria Bioinformatic Approach

    PubMed Central

    Ghozlane, Amine; Bringaud, Frédéric; Soueidan, Hayssam; Dutour, Isabelle; Jourdan, Fabien; Thébault, Patricia

    2012-01-01

    Trypanosoma brucei is a protozoan parasite of major interest in discovering new genes for drug targets. This parasite alternates its life cycle between the mammal host(s) (bloodstream form) and the insect vector (procyclic form), with two divergent glucose metabolisms amenable to in vitro culture. While the metabolic network of the bloodstream forms has been well characterized, the flux distribution between the different branches of the glucose metabolic network in the procyclic form has not been addressed so far. We present a computational analysis (called Metaboflux) that exploits the metabolic topology of the procyclic form, and allows the incorporation of multipurpose experimental data to increase the biological relevance of the model. The alternatives resulting from the structural complexity of networks are formulated as an optimization problem solved by a metaheuristic where experimental data are modeled in a multiobjective function. Our results show that the current metabolic model is in agreement with experimental data and confirms the observed high metabolic flexibility of glucose metabolism. In addition, Metaboflux offers a rational explanation for the high flexibility in the ratio between final products from glucose metabolism, that is, flux redistribution through the malic enzyme steps. PMID:23097667

  14. A Bioinformatic Strategy for the Detection, Classification and Analysis of Bacterial Autotransporters

    PubMed Central

    Celik, Nermin; Webb, Chaille T.; Leyton, Denisse L.; Holt, Kathryn E.; Heinz, Eva; Gorrell, Rebecca; Kwok, Terry; Naderer, Thomas; Strugnell, Richard A.; Speed, Terence P.; Teasdale, Rohan D.; Likić, Vladimir A.; Lithgow, Trevor

    2012-01-01

    Autotransporters are secreted proteins that are assembled into the outer membrane of bacterial cells. The passenger domains of autotransporters are crucial for bacterial pathogenesis, with some remaining attached to the bacterial surface while others are released by proteolysis. An enigma remains as to whether autotransporters should be considered a class of secretion system, or simply a class of substrate with peculiar requirements for their secretion. We sought to establish a sensitive search protocol that could identify and characterize diverse autotransporters from bacterial genome sequence data. The new sequence analysis pipeline identified more than 1500 autotransporter sequences from diverse bacteria, including numerous species of Chlamydiales and Fusobacteria as well as all classes of Proteobacteria. Interrogation of the proteins revealed that there are numerous classes of passenger domains beyond the known proteases, adhesins and esterases. In addition, the barrel-domain, a characteristic feature of autotransporters, was found to be composed of seven conserved sequence segments that can be arranged in multiple ways in the tertiary structure of the assembled autotransporter. One of these conserved motifs overlays the targeting information required for autotransporters to reach the outer membrane. Another conserved and diagnostic motif maps to the linker region between the passenger domain and barrel-domain, indicating it as an important feature in the assembly of autotransporters. PMID:22905239

  15. Identification of novel highly expressed genes in pancreatic ductal adenocarcinomas through a bioinformatics analysis of expressed sequence tags.

    PubMed

    Cao, Dengfeng; Hustinx, Steven R; Sui, Guoping; Bala, P; Sato, Norihiro; Martin, Sean; Maitra, Anirban; Murphy, Kathleen M; Cameron, John L; Yeo, Charles J; Kern, Scott E; Goggins, Michael; Pandey, Akhilesh; Hruban, Ralph H

    2004-11-01

    In most microarray experiments, a significant fraction of the differentially expressed mRNAs identified correspond to expressed sequence tags (ESTs) and are generally discarded from further analyses. We used careful bioinformatics analyses to characterize those ESTs that were found to be highly overexpressed in a series of pancreatic adenocarcinomas. cDNA was prepared from 60 non-neoplastic samples (normal pancreas [n = 20], normal colon [n = 10], or normal duodenal mucosa [n = 30]) and from 64 pancreatic cancers (resected cancers [n = 50] or cancer cell lines [n = 14]) and hybridized to the complete Affymetrix Human Genome U133 GeneChip® set (arrays U133A and B) for simultaneous analysis of 45,000 fragments corresponding to 33,000 known genes and 6,000 ESTs. The GeneExpress® software system Fold Change Analysis Tool was used and 60 ESTs were identified that were expressed at levels at least 3-fold greater in the pancreatic cancers as compared to normal tissues. Searches against the human genomic sequence and comparative genomic analysis of human and mouse genomes were carried out using basic local alignment search tools (BLAST), BLASTN, and BLASTX, for identifying protein coding genes corresponding to the ESTs. Subsequently, in order to pick the most relevant candidate genes for a more detailed analysis, we looked for domains/motifs in the open reading frames using SMART and Pfam programs. We were able to definitively map 43 of the 60 ESTs to known or novel genes, and 15 of the ESTs could be localized in close proximity to a gene in the human genome although we were unable to establish that the EST was indeed derived from those genes. The differential expression of a subset of genes was confirmed at the protein level by immunohistochemical labeling of tissue microarrays (inhibin beta A [INHBA] and CD29) and/or at the transcript level by RT-PCR (INHBA, AKAP12, ELK3, FOXQ1, EIF5A2, and EFNA5). We conclude that bioinformatics tools can be used to characterize
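
    The "at least 3-fold greater" selection reported in this record can be sketched as a ratio of mean expression between cancer and normal samples. The sketch assumes linear-scale intensities and a plain ratio of means. The commercial Fold Change Analysis Tool named in the abstract may compute fold change differently, and the probe names and values here are invented.

```python
from statistics import mean


def overexpressed_probes(cancer_expr, normal_expr, min_fold=3.0):
    """Return probes whose mean cancer/normal expression ratio is >= min_fold.

    Assumptions: both arguments map probe name -> list of linear-scale
    intensities, and every probe in `cancer_expr` is also in `normal_expr`.
    """
    selected = {}
    for probe, cancer_values in cancer_expr.items():
        normal_mean = mean(normal_expr[probe])
        if normal_mean <= 0:
            continue  # a ratio is undefined without a positive baseline
        fold = mean(cancer_values) / normal_mean
        if fold >= min_fold:
            selected[probe] = fold
    return selected


if __name__ == "__main__":
    cancer = {"EST_A": [300, 330, 270], "EST_B": [120, 100, 110]}
    normal = {"EST_A": [100, 90, 110], "EST_B": [100, 95, 105]}
    print(overexpressed_probes(cancer, normal))
```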

  16. Construction of a public CHO cell line transcript database using versatile bioinformatics analysis pipelines.

    PubMed

    Rupp, Oliver; Becker, Jennifer; Brinkrolf, Karina; Timmermann, Christina; Borth, Nicole; Pühler, Alfred; Noll, Thomas; Goesmann, Alexander

    2014-01-01

    Chinese hamster ovary (CHO) cell lines represent the most commonly used mammalian expression system for the production of therapeutic proteins. In this context, detailed knowledge of the CHO cell transcriptome might help to improve biotechnological processes conducted by specific cell lines. Nevertheless, very few assembled cDNA sequences of CHO cells were publicly released until recently, which puts a severe limitation on biotechnological research. Two extended annotation systems and web-based tools, one for browsing eukaryotic genomes (GenDBE) and one for viewing eukaryotic transcriptomes (SAMS), were established as the first step towards a publicly usable CHO cell genome/transcriptome analysis platform. This is complemented by the development of a new strategy to assemble the ca. 100 million reads, sequenced from a broad range of diverse transcripts, to a high quality CHO cell transcript set. The cDNA libraries were constructed from different CHO cell lines grown under various culture conditions and sequenced using Roche/454 and Illumina sequencing technologies in addition to sequencing reads from a previous study. Two pipelines to extend and improve the CHO cell line transcripts were established. First, de novo assemblies were carried out with the Trinity and Oases assemblers, using varying k-mer sizes. The resulting contigs were screened for potential CDS using ESTScan. Redundant contigs were filtered out using cd-hit-est. The remaining CDS contigs were re-assembled with CAP3. Second, a reference-based assembly with the TopHat/Cufflinks pipeline was performed, using the recently published draft genome sequence of CHO-K1 as reference. Additionally, the de novo contigs were mapped to the reference genome using GMAP and merged with the Cufflinks assembly using the cuffmerge software. With this approach 28,874 transcripts located on 16,492 gene loci could be assembled. Combining the results of both approaches, 65,561 transcripts were identified for CHO cell lines

  17. Genome-wide expression profiling and bioinformatics analysis of diurnally regulated genes in the mouse prefrontal cortex

    PubMed Central

    Yang, Shuzhang; Wang, Kai; Valladares, Otto; Hannenhalli, Sridhar; Bucan, Maja

    2007-01-01

    Background: The prefrontal cortex is important in regulating sleep and mood. Diurnally regulated genes in the prefrontal cortex may be controlled by the circadian system, by sleep:wake states, or by cellular metabolism or environmental responses. Bioinformatics analysis of these genes will provide insights into a wide range of pathways that are involved in the pathophysiology of sleep disorders and psychiatric disorders with sleep disturbances. Results: We examined gene expression in the mouse prefrontal cortex at four time points during a 24 hour (12 hour light:12 hour dark) cycle using microarrays, and identified 3,890 transcripts corresponding to 2,927 genes with diurnally regulated expression patterns. We show that 16% of the genes identified in our study are orthologs of identified clock, clock controlled or sleep/wakefulness induced genes in the mouse liver and suprachiasmatic nucleus, rat cortex and cerebellum, or Drosophila head. The diurnal expression patterns were confirmed for 16 out of 18 genes in an independent set of RNA samples. The diurnal genes fall into eight temporal categories with distinct functional attributes, as assessed by Gene Ontology classification and analysis of enriched transcription factor binding sites. Conclusion: Our analysis demonstrates that approximately 10% of transcripts have diurnally regulated expression patterns in the mouse prefrontal cortex. Functional annotation of these genes will be important for the selection of candidate genes for behavioral mutants in the mouse and for genetic studies of disorders associated with anomalies in the sleep:wake cycle and circadian rhythm. PMID:18028544

  18. Bioinformatics investigation of therapeutic mechanisms of Xuesaitong capsule treating ischemic cerebrovascular rat model with comparative transcriptome analysis

    PubMed Central

    Liao, Jiangquan; Wei, Benjun; Chen, Hengwen; Liu, Yongmei; Wang, Jie

    2016-01-01

    Background: Xuesaitong soft capsule (XST), which consists of Panax notoginseng saponin (PNS), has been used to treat ischemic cerebrovascular diseases in China. The therapeutic mechanism of XST has not yet been elucidated from the perspective of genomics and bioinformatics. Methods: A transcriptome analysis was performed to review series concerning the middle cerebral artery occlusion (MCAO) rat model and XST intervention after MCAO from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were compared between blank group and model group, and between model group and XST group. Functional enrichment and pathway analysis were performed. A protein-protein interaction (PPI) network was constructed. The overlapping genes from the two DEG sets were screened out and in-depth analysis was performed. Results: Two series including 22 samples were obtained. 870 DEGs were identified between blank group and model group, and 1189 DEGs were identified between model group and XST group. GO terms and KEGG pathways of MCAO and XST intervention were significantly enriched. PPI networks were constructed to demonstrate the gene-gene interactions. The overlapping genes from the two DEG sets were highlighted. ANTXR2, FHL3, PRCP, TYROBP, TAF9B, FGFR2, BCL11B, RB1CC1 and MBNL2 were the pivotal genes and possible action sites of XST therapeutic mechanisms. Conclusion: MCAO is a pathological process with multiple. PMID:27347353

  19. Screening of gene signatures for rheumatoid arthritis and osteoarthritis based on bioinformatics analysis.

    PubMed

    He, Peiheng; Zhang, Ziji; Liao, Weiming; Xu, Dongliang; Fu, Ming; Kang, Yan

    2016-08-01

    The current study aimed to identify gene signatures during rheumatoid arthritis (RA) and osteoarthritis (OA), and used these to elucidate the underlying modular mechanisms. Using the Gene Expression Omnibus database, the present study obtained the GSE7669 mRNA expression microarray data from RA and OA synovial fibroblasts (n=6 each). The differentially expressed genes (DEGs) in RA synovial samples compared with OA samples were identified using the Linear Models for Microarray Analysis package. The Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses were performed using the Database for Annotation Visualization and Integrated Discovery. A protein‑protein interaction network was constructed and the modules were further analyzed using the Molecular Complex Detection plugin of Cytoscape. A total of 181 DEGs were identified by comparing RA and OA synovial samples (96 up‑ and 85 downregulated genes). The significant DEGs in module 1, including collagen, type I, α 1 (COL1A1), COL3A1, COL4A1 and COL11A1, were predominantly enriched in the extracellular matrix (ECM)‑receptor interaction and focal adhesion pathways. Additionally, significant DEGs in module 2, including radical S‑adenosyl methionine domain containing 2 (RSAD2), 2'‑5'‑oligoadenylate synthetase 2 (OAS2), myxovirus (influenza virus) resistance 1 (MX1) and ISG15 ubiquitin‑like modifier (ISG15), were predominantly associated with immune function pathways. In conclusion, the present study indicated that RSAD2, OAS2, MX1 and ISG15 may be notable gene signatures in RA development via regulation of the immune response. COL3A1, COL4A1, COL1A1 and COL11A1 may be important gene signatures in OA development via involvement in the pathways of ECM-receptor interactions and focal adhesions. PMID:27356888
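    The DEG screening step described in the record above can be illustrated with a minimal sketch. It is not the Linear Models for Microarray Analysis (limma) method the authors used: it swaps in a plain Welch t statistic plus a log2 fold-change cutoff, and the gene names, expression values and thresholds are all hypothetical toy values chosen only to match the n=6 per group design.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Degenerate case: no within-group variation at all.
        diff = mean(a) - mean(b)
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return (mean(a) - mean(b)) / se


def screen_degs(expr, group_a, group_b, min_abs_log2fc=1.0, min_abs_t=4.0):
    """Return genes passing both cutoffs (thresholds are assumptions).

    expr maps gene -> list of log2 expression values, one per sample;
    group_a and group_b are lists of sample indices.
    """
    hits = {}
    for gene, values in expr.items():
        a = [values[i] for i in group_a]
        b = [values[i] for i in group_b]
        log2fc = mean(a) - mean(b)  # values are already on a log2 scale
        t = welch_t(a, b)
        if abs(log2fc) >= min_abs_log2fc and abs(t) >= min_abs_t:
            hits[gene] = ("up" if log2fc > 0 else "down", round(log2fc, 2))
    return hits


# Toy log2 intensities: first six samples "RA", last six "OA" (made up).
expr = {
    "GENE_UP":   [8.1, 8.3, 7.9, 8.2, 8.0, 8.4, 6.0, 6.2, 5.9, 6.1, 6.3, 5.8],
    "GENE_DOWN": [5.0, 5.2, 4.9, 5.1, 5.3, 4.8, 6.6, 6.8, 6.5, 6.7, 6.9, 6.4],
    "GENE_FLAT": [7.0, 7.4, 6.8, 7.2, 7.1, 6.9, 7.1, 7.0, 7.3, 6.9, 7.2, 6.8],
}
ra_idx = list(range(0, 6))
oa_idx = list(range(6, 12))
degs = screen_degs(expr, ra_idx, oa_idx)
print(degs)
```

    A real analysis would use moderated statistics and multiple-testing correction across all probes; the sketch only shows the shape of the "compare two groups, keep genes above a cutoff" step.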

  20. Screening of gene signatures for rheumatoid arthritis and osteoarthritis based on bioinformatics analysis

    PubMed Central

    He, Peiheng; Zhang, Ziji; Liao, Weiming; Xu, Dongliang; Fu, Ming; Kang, Yan

    2016-01-01

    The current study aimed to identify gene signatures during rheumatoid arthritis (RA) and osteoarthritis (OA), and used these to elucidate the underlying modular mechanisms. Using the Gene Expression Omnibus database, the present study obtained the GSE7669 mRNA expression microarray data from RA and OA synovial fibroblasts (n=6 each). The differentially expressed genes (DEGs) in RA synovial samples compared with OA samples were identified using the Linear Models for Microarray Analysis package. The Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses were performed using the Database for Annotation Visualization and Integrated Discovery. A protein-protein interaction network was constructed and the modules were further analyzed using the Molecular Complex Detection plugin of Cytoscape. A total of 181 DEGs were identified by comparing RA and OA synovial samples (96 up- and 85 downregulated genes). The significant DEGs in module 1, including collagen, type I, α 1 (COL1A1), COL3A1, COL4A1 and COL11A1, were predominantly enriched in the extracellular matrix (ECM)-receptor interaction and focal adhesion pathways. Additionally, significant DEGs in module 2, including radical S-adenosyl methionine domain containing 2 (RSAD2), 2′-5′-oligoadenylate synthetase 2 (OAS2), myxovirus (influenza virus) resistance 1 (MX1) and ISG15 ubiquitin-like modifier (ISG15), were predominantly associated with immune function pathways. In conclusion, the present study indicated that RSAD2, OAS2, MX1 and ISG15 may be notable gene signatures in RA development via regulation of the immune response. COL3A1, COL4A1, COL1A1 and COL11A1 may be important gene signatures in OA development via involvement in the pathways of ECM-receptor interactions and focal adhesions. PMID:27356888

  1. Coupling in silico and in vitro analysis of peptide-MHC binding: a bioinformatic approach enabling prediction of superbinding peptides and anchorless epitopes.

    PubMed

    Doytchinova, Irini A; Walshe, Valerie A; Jones, Nicola A; Gloster, Simone E; Borrow, Persephone; Flower, Darren R

    2004-06-15

    The ability to define and manipulate the interaction of peptides with MHC molecules has immense immunological utility, with applications in epitope identification, vaccine design, and immunomodulation. However, the methods currently available for prediction of peptide-MHC binding are far from ideal. We recently described the application of a bioinformatic prediction method based on quantitative structure-affinity relationship methods to peptide-MHC binding. In this study we demonstrate the predictivity and utility of this approach. We determined the binding affinities of a set of 90 nonamer peptides for the MHC class I allele HLA-A*0201 using an in-house, FACS-based, MHC stabilization assay, and from these data we derived an additive quantitative structure-affinity relationship model for peptide interaction with the HLA-A*0201 molecule. Using this model we then designed a series of high affinity HLA-A2-binding peptides. Experimental analysis revealed that all these peptides showed high binding affinities to the HLA-A*0201 molecule, significantly higher than the highest previously recorded. In addition, by the use of systematic substitution at principal anchor positions 2 and 9, we showed that high binding peptides are tolerant to a wide range of nonpreferred amino acids. Our results support a model in which the affinity of peptide binding to MHC is determined by the interactions of amino acids at multiple positions with the MHC molecule and may be enhanced by enthalpic cooperativity between these component interactions. PMID:15187128
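    The idea of an additive structure-affinity model in the record above can be sketched as follows, assuming the simplest possible form: a predicted affinity equal to an intercept plus one contribution per (position, amino acid) pair. The function name, the intercept and the contribution values are hypothetical and are not taken from the published model, which may include further terms.

```python
def predict_affinity(peptide, contributions, intercept):
    """Additive score for a nonamer: intercept + sum of per-position terms.

    contributions maps (position, amino_acid) -> affinity increment, with
    positions numbered 1..9; pairs missing from the table contribute 0.
    """
    if len(peptide) != 9:
        raise ValueError("this sketch only handles nonamer peptides")
    return intercept + sum(
        contributions.get((pos, aa), 0.0)
        for pos, aa in enumerate(peptide, start=1)
    )


# Hypothetical increments at the principal anchor positions 2 and 9.
toy_contributions = {
    (2, "L"): 0.5,   # assumed favourable anchor residue
    (2, "G"): -0.3,  # assumed nonpreferred residue
    (9, "V"): 0.4,   # assumed favourable anchor residue
}
toy_intercept = 6.0  # arbitrary baseline on a log-affinity scale

score_preferred = predict_affinity("ALAAAAAAV", toy_contributions, toy_intercept)
score_substituted = predict_affinity("AGAAAAAAV", toy_contributions, toy_intercept)
print(score_preferred, score_substituted)
```

    Because every position contributes independently in this form, a nonpreferred residue at one anchor lowers the score by a fixed amount that favourable residues elsewhere can offset, which is consistent with the abstract's observation that high binders tolerate anchor substitutions.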

  2. Identification and Characterization of miRNAs in Chondrus crispus by High-Throughput Sequencing and Bioinformatics Analysis.

    PubMed

    Gao, Fan; Nan, FangRu; Song, Wei; Feng, Jia; Lv, JunPing; Xie, ShuLian

    2016-01-01

    Chondrus crispus, an economically and medicinally important red alga, is a medicinally active substance and important for anti-tumor research. In this study, 117 C. crispus miRNAs (108 conserved and 9 novel) were identified from 2,416,181 small-RNA reads using high-throughput sequencing and bioinformatics methods. According to the BLAST search against the miRBase database, these miRNAs belonged to 110 miRNA families. Sequence alignment combined with homology searching revealed both the conservation and diversity of predicted potential miRNA families in different plant species. Four and 19 randomly selected miRNAs were validated by northern blotting and stem-loop quantitative real-time reverse transcription polymerase chain reaction detection, respectively. The validation rates (75% and 94.7%) demonstrated that most of the identified miRNAs could be credible. A total of 160 potential target genes were predicted and functionally annotated by Gene Ontology analysis and Kyoto Encyclopedia of Genes and Genomes analysis. We also analyzed the interrelationship of miRNAs, miRNA-target genes and target genes in C. crispus by constructing a Cytoscape network. The 117 miRNAs identified in our study should supply large quantities of information that will be important for red algae small RNA research. PMID:27193824

  3. Identification and Characterization of miRNAs in Chondrus crispus by High-Throughput Sequencing and Bioinformatics Analysis

    PubMed Central

    Gao, Fan; Nan, FangRu; Song, Wei; Feng, Jia; Lv, JunPing; Xie, ShuLian

    2016-01-01

    Chondrus crispus, an economically and medicinally important red alga, is a medicinally active substance and important for anti-tumor research. In this study, 117 C. crispus miRNAs (108 conserved and 9 novel) were identified from 2,416,181 small-RNA reads using high-throughput sequencing and bioinformatics methods. According to the BLAST search against the miRBase database, these miRNAs belonged to 110 miRNA families. Sequence alignment combined with homology searching revealed both the conservation and diversity of predicted potential miRNA families in different plant species. Four and 19 randomly selected miRNAs were validated by northern blotting and stem-loop quantitative real-time reverse transcription polymerase chain reaction detection, respectively. The validation rates (75% and 94.7%) demonstrated that most of the identified miRNAs could be credible. A total of 160 potential target genes were predicted and functionally annotated by Gene Ontology analysis and Kyoto Encyclopedia of Genes and Genomes analysis. We also analyzed the interrelationship of miRNAs, miRNA-target genes and target genes in C. crispus by constructing a Cytoscape network. The 117 miRNAs identified in our study should supply large quantities of information that will be important for red algae small RNA research. PMID:27193824

  4. Molecular identification and bioinformatics analysis of a potential anti-vector vaccine candidate, 15-kDa salivary gland protein (Salp15), from Ixodes affinis ticks.

    PubMed

    Sultana, Hameeda; Patel, Unnati; Toliver, Marcée; Maggi, Ricardo G; Neelakanta, Girish

    2016-02-01

    Salp15, a 15-kDa salivary gland protein, plays an important role in tick blood-feeding and transmission of Borrelia burgdorferi, the causative agent of Lyme borreliosis. Comparative studies reveal that Salp15 is a genetically conserved protein across various Ixodes species. In this study, we have identified a Salp15 homolog, designated as Iaff15, from Ixodes affinis ticks, which are the principal enzootic vectors of B. burgdorferi sensu stricto in the southeastern part of the United States. Comparison of the annotated amino acid sequences showed that Iaff15 shares 81% homology with the I. sinensis Salp15 homolog and 64% homology with I. scapularis Salp15. Phylogenetic analysis revealed that Iaff15 falls within the same clade as the I. sinensis, I. scapularis, and I. pacificus Salp15 homologs. Bioinformatics prediction of posttranslational modifications revealed that all the Salp15 family members contain glycosylation sites. In addition, Iaff15 carried a higher number of Casein Kinase II phosphorylation sites in comparison to the other Salp15 family members. Collectively, high sequence conservation distributed over the entire amino acid sequence not only suggests an important role for Iaff15 in I. affinis blood feeding and vector-pathogen interactions but may also lead to the development of an anti-vector vaccine against this group of ticks. PMID:26296588

  5. Bioinformatics tools for analysing viral genomic data.

    PubMed

    Orton, R J; Gu, Q; Hughes, J; Maabar, M; Modha, S; Vattipally, S B; Wilkie, G S; Davison, A J

    2016-04-01

    The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing. PMID:27217183

  6. Bioinformatics and genomic medicine.

    PubMed

    Kim, Ju Han

    2002-01-01

    Bioinformatics is a rapidly emerging field of biomedical research. A flood of large-scale genomic and postgenomic data means that many of the challenges in biomedical research are now challenges in computational science. Clinical informatics has long developed methodologies to improve biomedical research and clinical care by integrating experimental and clinical information systems. The informatics revolution in both bioinformatics and clinical informatics will eventually change the current practice of medicine, including diagnostics, therapeutics, and prognostics. Postgenome informatics, powered by high-throughput technologies and genomic-scale databases, is likely to transform our biomedical understanding forever, in much the same way that biochemistry did a generation ago. This paper describes how these technologies will impact biomedical research and clinical care, emphasizing recent advances in biochip-based functional genomics and proteomics. Basic data preprocessing with normalization and filtering, primary pattern analysis, and machine-learning algorithms are discussed. Use of integrative biochip informatics technologies, including multivariate data projection, gene-metabolic pathway mapping, automated biomolecular annotation, text mining of factual and literature databases, and the integrated management of biomolecular databases, are also discussed. PMID:12544491

  7. Bioinformatics-Aided Venomics

    PubMed Central

    Kaas, Quentin; Craik, David J.

    2015-01-01

    Venomics is a modern approach that combines transcriptomics and proteomics to explore the toxin content of venoms. This review will give an overview of computational approaches that have been created to classify and consolidate venomics data, as well as algorithms that have helped discovery and analysis of toxin nucleic acid and protein sequences, toxin three-dimensional structures and toxin functions. Bioinformatics is used to tackle specific challenges associated with the identification and annotations of toxins. Recognizing toxin transcript sequences among second generation sequencing data cannot rely only on basic sequence similarity because toxins are highly divergent. Mass spectrometry sequencing of mature toxins is challenging because toxins can display a large number of post-translational modifications. Identifying the mature toxin region in toxin precursor sequences requires the prediction of the cleavage sites of proprotein convertases, most of which are unknown or not well characterized. Tracing the evolutionary relationships between toxins should consider specific mechanisms of rapid evolution as well as interactions between predatory animals and prey. Rapidly determining the activity of toxins is the main bottleneck in venomics discovery, but some recent bioinformatics and molecular modeling approaches give hope that accurate predictions of toxin specificity could be made in the near future. PMID:26110505

  8. Identification of potential therapeutic target genes and mechanisms in head and neck squamous cell carcinoma by bioinformatics analysis

    PubMed Central

    KUANG, JING; ZHAO, MEI; LI, HUILIAN; DANG, WEI; LI, WEI

    2016-01-01

    The present study aimed to identify the potential target genes and underlying molecular mechanisms involved in head and neck squamous cell carcinoma (HNSCC) by bioinformatics analysis. Microarray data of a Gene Expression Omnibus series GSE6631 was downloaded from the Gene Expression Omnibus database, which was generated from paired samples of HNSCC and normal tissue from 22 patients, and was used to identify differentially expressed genes (DEGs). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes enrichment analyses were performed to investigate the functions of the identified DEGs. Furthermore, the protein-protein interaction (PPI) network of these DEGs was constructed using Cytoscape software. Between HNSCC and normal samples there was a difference in 419 DEGs, including 196 upregulated and 223 downregulated genes. The upregulated DEGs were mainly enriched in GO terms of cell adhesion, extracellular matrix (ECM) organization and collagen metabolic process, while the downregulated DEGs were mainly associated with epidermis development and epidermal cell differentiation. The DEGs were enriched in pathways such as ECM-receptor interaction, focal adhesion and drug metabolism. Fibronectin 1 (FN1), epidermal growth factor receptor (EGFR), collagen type I alpha 1 (COL1A1) and matrix metallopeptidase-9 (MMP-9) were hub nodes in the PPI network. These results suggested that cell adhesion and drug metabolism may be associated with HNSCC development, and genes such as FN1, EGFR, COL4A1 and MMP-9 may be potential therapeutic target genes in HNSCC. PMID:27123054
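    Enrichment analyses like the GO and KEGG steps in the record above are often framed as an over-representation test. The abstract does not say which test was used, so the following is only a sketch of one common choice, the one-sided hypergeometric test. The DEG count of 419 comes from the abstract; the background size, term size and overlap are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(overlap >= k) when n genes are drawn from N, of which K carry the term.

    N: background genes, K: genes annotated to the term,
    n: DEGs, k: DEGs annotated to the term.
    """
    upper = min(K, n)
    # Exact integer arithmetic for the tail count, one division at the end.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


N_BACKGROUND = 20000  # assumed number of genes on the array
TERM_SIZE = 200       # assumed number of genes annotated to one GO term
N_DEGS = 419          # DEG count reported in the abstract
OVERLAP = 15          # assumed number of DEGs annotated to that term

p_value = hypergeom_enrichment_p(N_BACKGROUND, TERM_SIZE, N_DEGS, OVERLAP)
print(f"enrichment p-value: {p_value:.3g}")
```

    With these toy numbers about four overlapping genes would be expected by chance (419 × 200 / 20000), so an overlap of 15 yields a small p-value. In practice the p-values for all tested terms would also need multiple-testing correction.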

  9. Bioinformatics analysis and characteristics of VP23 encoded by the newly identified UL18 gene of duck enteritis virus

    NASA Astrophysics Data System (ADS)

    Chen, Xiwen; Cheng, Anchun; Wang, Mingshu; Xiang, Jun

    2011-10-01

    In this study, the structures and functions of VP23, encoded by the newly identified DEV UL18 gene, were predicted using bioinformatics software and tools. The DEV UL18 gene was predicted to encode a polypeptide of 322 amino acids, termed VP23, with a putative molecular mass of 35.250 kDa and a predicted isoelectric point (pI) of 8.37, and with no signal peptide or transmembrane domain. Subcellular localization prediction placed DEV-VP23 in the endoplasmic reticulum (33.3%), mitochondria (22.2%), the extracellular space, including the cell wall (11.1%), vesicles of the secretory system (11.1%), the Golgi (11.1%), and the plasma membrane (11.1%). Amino acid sequence analysis showed that the potential antigenic epitopes are located at residues 45-47, 53-60, 102-105, 173-180, 185-189, 260-265, 267-271, and 292-299. These results provide some insight for further research on DEV-VP23, lay a foundation for further study of new clinical diagnostic methods for DEV, and may be useful for the development of a new DEV vaccine.

  10. A bioinformatics analysis of Lamin-A regulatory network: a perspective on epigenetic involvement in Hutchinson-Gilford progeria syndrome.

    PubMed

    Arancio, Walter

    2012-04-01

    Hutchinson-Gilford progeria syndrome (HGPS) is a rare human genetic disease that leads to premature aging. HGPS is caused by mutation in the Lamin-A (LMNA) gene that leads, in affected young individuals, to the accumulation of the progerin protein, usually present only in aging differentiated cells. Bioinformatics analyses of the network of interactions of the LMNA gene and transcripts are presented. The LMNA gene network has been analyzed using the BioGRID database (http://thebiogrid.org/) and related analysis tools such as Osprey (http://biodata.mshri.on.ca/osprey/servlet/Index) and GeneMANIA ( http://genemania.org/). The network of interaction of LMNA transcripts has been further analyzed following the competing endogenous (ceRNA) hypotheses (RNA cross-talk via microRNAs [miRNAs]) and using the miRWalk database and tools (www.ma.uni-heidelberg.de/apps/zmf/mirwalk/). These analyses suggest particular relevance of epigenetic modifiers (via acetylase complexes and specifically HTATIP histone acetylase) and adenosine triphosphate (ATP)-dependent chromatin remodelers (via pBAF, BAF, and SWI/SNF complexes). PMID:22533413

  11. Ready to use bioinformatics analysis as a tool to predict immobilisation strategies for protein direct electron transfer (DET).

    PubMed

    Cazelles, R; Lalaoui, N; Hartmann, T; Leimkühler, S; Wollenberger, U; Antonietti, M; Cosnier, S

    2016-11-15

    Direct electron transfer (DET) to proteins is of considerable interest for the development of biosensors and bioelectrocatalysts. While protein structure is mainly used as a method of attaching the protein to the electrode surface, we employed bioinformatics analysis to predict the suitable orientation of the enzymes to promote DET. Structure similarity and secondary structure prediction were combined to highlight localized amino acids able to direct one of the enzyme's electron relays toward the electrode surface by creating a suitable bioelectrocatalytic nanostructure. The electro-polymerization of pyrene pyrrole onto a fluorine-doped tin oxide (FTO) electrode allowed the targeted orientation of the formate dehydrogenase enzyme from Rhodobacter capsulatus (RcFDH) by means of hydrophobic interactions. Its electron relays were directed to the FTO surface, thus promoting DET. The reduction of nicotinamide adenine dinucleotide (NAD(+)), generating a maximum current density of 1 μA cm(-2) with 10 mM NAD(+), leads to a turnover number of 0.09 electron/s/mol RcFDH. This work represents a practical approach to evaluate electrode surface modification strategies in order to create valuable bioelectrocatalysts. PMID:27156017

  12. Bioinformatics and the Undergraduate Curriculum

    ERIC Educational Resources Information Center

    Maloney, Mark; Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael

    2010-01-01

    Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of…

  13. Visualising "Junk" DNA through Bioinformatics

    ERIC Educational Resources Information Center

    Elwess, Nancy L.; Latourelle, Sandra M.; Cauthorn, Olivia

    2005-01-01

    One of the hottest areas of science today is the field in which biology, information technology, and computer science are merged into a single discipline called bioinformatics. This field enables the discovery and analysis of biological data, including nucleotide and amino acid sequences that are easily accessed through the use of computers. As…

  14. A bioinformatics insight to rhizobial globins: gene identification and mapping, polypeptide sequence and phenetic analysis, and protein modeling.

    PubMed Central

    Gesto-Borroto, Reinier; Sánchez-Sánchez, Miriam; Arredondo-Peter, Raúl

    2015-01-01

    Globins (Glbs) are proteins widely distributed in organisms. Three evolutionary families have been identified in Glbs: the M, S and T Glb families. The M Glbs include flavohemoglobins (fHbs) and single-domain Glbs (SDgbs); the S Glbs include globin-coupled sensors (GCSs), protoglobins and sensor single domain globins, and the T Glbs include truncated Glbs (tHbs). Structurally, the M and S Glbs exhibit 3/3-folding whereas the T Glbs exhibit 2/2-folding. Glbs are widespread in bacteria, including several rhizobial genomes. However, only few rhizobial Glbs have been characterized. Hence, we characterized Glbs from 62 rhizobial genomes using bioinformatics methods such as data mining in databases, sequence alignment, phenogram construction and protein modeling. Also, we analyzed soluble extracts from Bradyrhizobium japonicum USDA38 and USDA58 by (reduced + carbon monoxide (CO) minus reduced) differential spectroscopy. Database searching showed that only fhb, sdgb, gcs and thb genes exist in the rhizobia analyzed in this work. Promoter analysis revealed that apparently several rhizobial glb genes are not regulated by a -10 promoter but might be regulated by -35 and Fnr (fumarate-nitrate reduction regulator)-like promoters. Mapping analysis revealed that rhizobial fhbs and thbs are flanked by a variety of genes whereas several rhizobial sdgbs and gcss are flanked by genes coding for proteins involved in the metabolism of nitrates and nitrites and chemotaxis, respectively. Phenetic analysis showed that rhizobial Glbs segregate into the M, S and T Glb families, while structural analysis showed that predicted rhizobial SDgbs and fHbs and GCSs globin domain and tHbs fold into the 3/3- and 2/2-folding, respectively. Spectra from B. japonicum USDA38 and USDA58 soluble extracts exhibited peaks and troughs characteristic of bacterial and vertebrate Glbs thus indicating that putative Glbs are synthesized in B. japonicum USDA38 and USDA58. PMID:26594329

  15. Identification and characterization of microRNAs in Eucheuma denticulatum by high-throughput sequencing and bioinformatics analysis.

    PubMed

    Gao, Fan; Nan, Fangru; Feng, Jia; Lv, Junping; Liu, Qi; Xie, Shulian

    2016-01-01

    Eucheuma denticulatum, an economically and industrially important red alga, is a valuable marine resource. Although microRNAs (miRNAs) play an essential role in gene post-transcriptional regulation, no research has been conducted to identify and characterize miRNAs in E. denticulatum. In this study, we identified 134 miRNAs (133 conserved miRNAs and one novel miRNA) from 2,997,135 small-RNA reads by high-throughput sequencing combined with bioinformatics analysis. BLAST searching against miRBase uncovered 126 potential miRNA families. A conservation and diversity analysis of predicted miRNA families in different plant species was performed by comparative alignment and homology searching. A total of 4 and 13 randomly selected miRNAs were respectively validated by northern blotting and stem-loop reverse transcription PCR, thereby demonstrating the reliability of the miRNA sequencing data. Altogether, 871 potential target genes were predicted using psRobot and TargetFinder. Target genes classification and enrichment were conducted based on Gene Ontology analysis. The functions of target gene products and associated metabolic pathways were predicted by Kyoto Encyclopedia of Genes and Genomes pathway analysis. A Cytoscape network was constructed to explore the interrelationships of miRNAs, miRNA-target genes and target genes. A large number of miRNAs with diverse target genes will play important roles for further understanding some essential biological processes in E. denticulatum. The uncovered information can serve as an important reference for the protection and utilization of this unique red alga in the future. PMID:26717154

  16. Bioinformatics and Moonlighting Proteins

    PubMed Central

    Hernández, Sergio; Franco, Luís; Calvo, Alejandra; Ferragut, Gabriela; Hermoso, Antoni; Amela, Isaac; Gómez, Antonio; Querol, Enrique; Cedano, Juan

    2015-01-01

    Multitasking or moonlighting is the capability of some proteins to execute two or more biochemical functions. Usually, moonlighting proteins are experimentally revealed by serendipity. For this reason, it would be helpful if bioinformatics could predict this multifunctionality, especially because of the large amounts of sequences from genome projects. In the present work, we analyze and describe several approaches that use sequences, structures, interactomics, and current bioinformatics algorithms and programs to try to overcome this problem. Among these approaches are (a) remote homology searches using Psi-Blast, (b) detection of functional motifs and domains, (c) analysis of data from protein–protein interaction databases (PPIs), (d) matching the query protein sequence to 3D databases (e.g., with algorithms such as PISITE), and (e) mutation correlation analysis between amino acids with algorithms such as MISTIC. Programs designed to identify functional motifs/domains detect mainly the canonical function but usually fail in the detection of the moonlighting one, Pfam and ProDom being the best methods. Remote homology search by Psi-Blast combined with data from interactomics databases (PPIs) has the best performance. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can only be used in very specific situations – it requires the existence of multialigned family protein sequences – but can suggest how the evolutionary process of second function acquisition took place. The multitasking protein database MultitaskProtDB (http://wallace.uab.es/multitask/), previously published by our group, has been used as a benchmark for all of the analyses. PMID:26157797

  17. Bioinformatics and Moonlighting Proteins.

    PubMed

    Hernández, Sergio; Franco, Luís; Calvo, Alejandra; Ferragut, Gabriela; Hermoso, Antoni; Amela, Isaac; Gómez, Antonio; Querol, Enrique; Cedano, Juan

    2015-01-01

    Multitasking or moonlighting is the capability of some proteins to execute two or more biochemical functions. Usually, moonlighting proteins are experimentally revealed by serendipity. For this reason, it would be helpful if bioinformatics could predict this multifunctionality, especially because of the large amounts of sequences from genome projects. In the present work, we analyze and describe several approaches that use sequences, structures, interactomics, and current bioinformatics algorithms and programs to try to overcome this problem. Among these approaches are (a) remote homology searches using Psi-Blast, (b) detection of functional motifs and domains, (c) analysis of data from protein-protein interaction databases (PPIs), (d) matching the query protein sequence to 3D databases (e.g., with algorithms such as PISITE), and (e) mutation correlation analysis between amino acids with algorithms such as MISTIC. Programs designed to identify functional motifs/domains detect mainly the canonical function but usually fail in the detection of the moonlighting one, Pfam and ProDom being the best methods. Remote homology search by Psi-Blast combined with data from interactomics databases (PPIs) has the best performance. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can only be used in very specific situations - it requires the existence of multialigned family protein sequences - but can suggest how the evolutionary process of second function acquisition took place. The multitasking protein database MultitaskProtDB (http://wallace.uab.es/multitask/), previously published by our group, has been used as a benchmark for all of the analyses. PMID:26157797

  18. Using bioinformatics tools for the sequence analysis of immunoglobulins and T cell receptors.

    PubMed

    Lefranc, Marie-Paule

    2006-03-01

    The huge potential repertoire of 10^12 immunoglobulins and 10^12 T cell receptors per individual results from complex mechanisms of combinatorial diversity between the variable (V), diversity (D), and junction (J) genes, nucleotide deletions and insertions (N-diversity) at the junctions and, for the immunoglobulins, somatic hypermutations. The accurate analysis of rearranged immunoglobulin and T cell receptor sequences, and the annotation of the junctions, therefore represent a huge challenge. The IMGT Scientific chart rules, based on the IMGT-ONTOLOGY concepts, were the prerequisites for the implementation of the IMGT/V-QUEST and IMGT/JunctionAnalysis tools. IMGT/V-QUEST analyzes germline V and rearranged V-J or V-D-J nucleotide sequences. IMGT/JunctionAnalysis is the first tool that automatically analyzes the complex junctions in detail. These interactive tools are easy to use and freely available on the Web (http://imgt.cines.fr), either separately or integrated. PMID:18432961

  19. Bioinformatics analysis of differentially expressed pathways related to the metastatic characteristics of osteosarcoma

    PubMed Central

    Sun, Wei; Ma, Xiaojun; Shen, Jiakang; Yin, Fei; Wang, Chongren; Cai, Zhengdong

    2016-01-01

    In this study, gene expression data of osteosarcoma (OSA) were analyzed to identify metastasis-related biological pathways. Four gene expression data sets (GSE21257, GSE9508, GSE49003 and GSE66673) were downloaded from Gene Expression Omnibus (GEO). An analysis of differentially expressed genes (DEGs) was performed using the Significance Analysis of Microarray (SAM) method. Gene expression levels were converted into pathway scores by the Functional Analysis of Individual Microarray Expression (FAIME) algorithm, and the differentially expressed pathways (DEPs) were then identified by a t-test. The distinguishing and prediction ability of the DEPs for metastatic and non-metastatic OSA was further confirmed using the principal component analysis (PCA) method and 3 gene expression data sets (GSE9508, GSE49003 and GSE66673) based on the support vector machines (SVM) model. A total of 616 downregulated and 681 upregulated genes were identified in the data set GSE21257. The DEGs could not be used to distinguish metastatic OSA from non-metastatic OSA, as shown by PCA. Thus, an analysis of DEPs was further performed, resulting in 14 DEPs, such as NRAS signaling, Toll-like receptor (TLR) signaling, matrix metalloproteinase (MMP) regulation of cytokines and tumor necrosis factor receptor-associated factor (TRAF)-mediated interferon regulatory factor 7 (IRF7) activation. Cluster analysis indicated that these pathways could be used to distinguish metastatic OSA from non-metastatic OSA. The prediction accuracy was 91, 66.7 and 87.5% for the data sets GSE9508, GSE49003 and GSE66673, respectively. The results of PCA further validated that the DEPs could be used to distinguish metastatic OSA from non-metastatic OSA. On the whole, several DEPs were identified in metastatic OSA compared with non-metastatic OSA. Further studies on these pathways and relevant genes may help to enhance our understanding of the molecular mechanisms underlying metastasis and may thus aid in

  20. Antimicrobial Protein Candidates from the Thermophilic Geobacillus sp. Strain ZGt-1: Production, Proteomics, and Bioinformatics Analysis

    PubMed Central

    Alkhalili, Rawana N.; Bernfur, Katja; Dishisha, Tarek; Mamo, Gashaw; Schelin, Jenny; Canbäck, Björn; Emanuelsson, Cecilia; Hatti-Kaul, Rajni

    2016-01-01

    A thermophilic bacterial strain, Geobacillus sp. ZGt-1, isolated from Zara hot spring in Jordan, was capable of inhibiting the growth of the thermophilic G. stearothermophilus and the mesophilic Bacillus subtilis and Salmonella typhimurium on a solid cultivation medium. Antibacterial activity was not observed when ZGt-1 was cultivated in a liquid medium; however, immobilization of the cells in agar beads that were subjected to sequential batch cultivation in the liquid medium at 60 °C showed increasing antibacterial activity up to 14 cycles. The antibacterial activity was lost on protease treatment of the culture supernatant. Concentration of the protein fraction by ammonium sulphate precipitation followed by denaturing polyacrylamide gel electrophoresis separation and analysis of the gel for antibacterial activity against G. stearothermophilus showed a distinct inhibition zone in 15–20 kDa range, suggesting that the active molecule(s) are resistant to denaturation by SDS. Mass spectrometric analysis of the protein bands around the active region resulted in identification of 22 proteins with molecular weight in the range of interest, three of which were new and are here proposed as potential antimicrobial protein candidates by in silico analysis of their amino acid sequences. Mass spectrometric analysis also indicated the presence of partial sequences of antimicrobial enzymes, amidase and dd-carboxypeptidase. PMID:27548162

  1. Antimicrobial Protein Candidates from the Thermophilic Geobacillus sp. Strain ZGt-1: Production, Proteomics, and Bioinformatics Analysis.

    PubMed

    Alkhalili, Rawana N; Bernfur, Katja; Dishisha, Tarek; Mamo, Gashaw; Schelin, Jenny; Canbäck, Björn; Emanuelsson, Cecilia; Hatti-Kaul, Rajni

    2016-01-01

    A thermophilic bacterial strain, Geobacillus sp. ZGt-1, isolated from Zara hot spring in Jordan, was capable of inhibiting the growth of the thermophilic G. stearothermophilus and the mesophilic Bacillus subtilis and Salmonella typhimurium on a solid cultivation medium. Antibacterial activity was not observed when ZGt-1 was cultivated in a liquid medium; however, immobilization of the cells in agar beads that were subjected to sequential batch cultivation in the liquid medium at 60 °C showed increasing antibacterial activity up to 14 cycles. The antibacterial activity was lost on protease treatment of the culture supernatant. Concentration of the protein fraction by ammonium sulphate precipitation followed by denaturing polyacrylamide gel electrophoresis separation and analysis of the gel for antibacterial activity against G. stearothermophilus showed a distinct inhibition zone in 15-20 kDa range, suggesting that the active molecule(s) are resistant to denaturation by SDS. Mass spectrometric analysis of the protein bands around the active region resulted in identification of 22 proteins with molecular weight in the range of interest, three of which were new and are here proposed as potential antimicrobial protein candidates by in silico analysis of their amino acid sequences. Mass spectrometric analysis also indicated the presence of partial sequences of antimicrobial enzymes, amidase, and dd-carboxypeptidase. PMID:27548162

  2. Microarray gene expression profiling and bioinformatics analysis of premature ovarian failure in a rat model.

    PubMed

    Li, Ji; Fan, Shengjun; Han, Dongwei; Xie, Jiaming; Kuang, Haixue; Ge, Pengling

    2014-12-01

    Premature ovarian failure (POF) remains one of the major gynecological problems worldwide, affecting 1% of women. Although tremendous progress has been made in recent years, the molecular pathogenesis of POF is still unclear and needs to be well defined. The aim of this study was to analyze the gene expression profiles in the POF rat model. To predict potential regulating factors, we first treated female Sprague Dawley (SD) rats with 4-vinylcyclohexene diepoxide (VCD). Total RNA from ovarian tissue was converted to cDNA and hybridized to an mRNA chip array. The differentially expressed genes (DEGs) were identified by two-sample t test and assessed using hierarchical clustering and Principal Component Analysis methods. Potential regulatory targets associated with these DEGs were constructed using BisoGenet in Cytoscape. Gene Ontology (GO) and functional enrichment analysis were performed using BiNGO and DAVID, respectively. As a result, 25 DEGs were found to be closely associated with POF initiation. Hierarchical clustering and Principal Component Analysis on the transcriptional profiles revealed an excellent separation of the vehicle and POF compartments. Pathway enrichment analysis based on the disease-gene interaction network analysis led to the identification of two core signaling pathways that were strongly affected during POF initiation and progression: immune response and cardiovascular disorders. In conclusion, we constructed a gene regulatory network associated with POF using microarray gene expression profiling, and identified some genes or transcription factors that may be used as potential molecular therapeutic targets for POF. PMID:25445499
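    The two-sample t test named in the abstract above can be illustrated with a minimal sketch. It computes the pooled-variance (Student) t statistic for one gene across two groups. The function name and the expression values in the usage note are hypothetical, and the abstract does not state whether equal variances were assumed, so the pooled form is an assumption.

```python
import math
import statistics


def two_sample_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for a pooled-variance t test.

    Assumption: equal variances in both groups (classic Student test).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df
```

    For example, calling two_sample_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]) on made-up expression values gives a negative t with 4 degrees of freedom; a p-value would still have to be looked up from the t distribution.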

  3. Bioinformatics analysis of differentially expressed pathways related to the metastatic characteristics of osteosarcoma.

    PubMed

    Sun, Wei; Ma, Xiaojun; Shen, Jiakang; Yin, Fei; Wang, Chongren; Cai, Zhengdong

    2016-08-01

    In this study, gene expression data of osteosarcoma (OSA) were analyzed to identify metastasis-related biological pathways. Four gene expression data sets (GSE21257, GSE9508, GSE49003 and GSE66673) were downloaded from Gene Expression Omnibus (GEO). An analysis of differentially expressed genes (DEGs) was performed using the Significance Analysis of Microarray (SAM) method. Gene expression levels were converted into pathway scores by the Functional Analysis of Individual Microarray Expression (FAIME) algorithm, and the differentially expressed pathways (DEPs) were then identified by a t-test. The distinguishing and prediction ability of the DEPs for metastatic and non-metastatic OSA was further confirmed using the principal component analysis (PCA) method and 3 gene expression data sets (GSE9508, GSE49003 and GSE66673) based on the support vector machines (SVM) model. A total of 616 downregulated and 681 upregulated genes were identified in the data set GSE21257. The DEGs could not be used to distinguish metastatic OSA from non-metastatic OSA, as shown by PCA. Thus, an analysis of DEPs was further performed, resulting in 14 DEPs, such as NRAS signaling, Toll-like receptor (TLR) signaling, matrix metalloproteinase (MMP) regulation of cytokines and tumor necrosis factor receptor-associated factor (TRAF)-mediated interferon regulatory factor 7 (IRF7) activation. Cluster analysis indicated that these pathways could be used to distinguish metastatic OSA from non-metastatic OSA. The prediction accuracy was 91, 66.7 and 87.5% for the data sets GSE9508, GSE49003 and GSE66673, respectively. The results of PCA further validated that the DEPs could be used to distinguish metastatic OSA from non-metastatic OSA. On the whole, several DEPs were identified in metastatic OSA compared with non-metastatic OSA. Further studies on these pathways and relevant genes may help to enhance our understanding of the molecular mechanisms underlying metastasis

  4. Bioinformatics analysis of codon usage patterns and influencing factors in Penaeus monodon nudivirus.

    PubMed

    Tyagi, Anuj; Singh, Niraj K; Gurtler, Volker; Karunasagar, Indrani

    2016-02-01

    Penaeus monodon nudivirus (PmNV) is one of the most important and most commonly reported shrimp viruses. In the present study, codon usage of PmNV was studied in detail. Based on effective number of codons (ENC) values, strong to low codon usage bias was observed in PmNV genes. Nucleotide composition-ENC correlation analysis and the GC3 versus ENC relationship indicated that compositional constraint has a major effect on codon usage of PmNV. At the whole-genome level, relative synonymous codon usage (RSCU) analysis showed almost complete antagonism between the codon usage pattern of PmNV and its host P. monodon. However, codon adaptive index (CAI) values indicated that forces of selective/translational constraints have been able to overcome this antagonism in some genes. PMID:26586333
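    Relative synonymous codon usage (RSCU), one of the measures used in the abstract above, is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below is a minimal illustration under stated assumptions: it uses the standard genetic code, and the function name and the toy sequence in the usage note are hypothetical rather than taken from the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(coding_sequence):
    """Return {codon: RSCU} for every amino acid observed in the sequence.

    RSCU = count(codon) * (number of synonymous codons) / count(amino acid).
    """
    seq = coding_sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, amino_acid in CODON_TABLE.items():
        families.setdefault(amino_acid, []).append(codon)

    values = {}
    for amino_acid, family in families.items():
        total = sum(counts[c] for c in family)
        if amino_acid == "*" or total == 0:
            continue  # skip stop codons and unobserved amino acids
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values
```

    On the toy sequence "GCTGCTGCTGCC" (four alanine codons), GCT gets an RSCU of 3.0, GCC 1.0, and the unused GCA and GCG 0.0. Values above 1 indicate codons used more often than expected under equal usage.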

  5. FASTAptamer: A Bioinformatic Toolkit for High-throughput Sequence Analysis of Combinatorial Selections

    PubMed Central

    Alam, Khalid K; Chang, Jonathan L; Burke, Donald H

    2015-01-01

    High-throughput sequence (HTS) analysis of combinatorial selection populations accelerates lead discovery and optimization and offers dynamic insight into selection processes. An underlying principle is that selection enriches high-fitness sequences as a fraction of the population, whereas low-fitness sequences are depleted. HTS analysis readily provides the requisite numerical information by tracking the evolutionary trajectory of individual sequences in response to selection pressures. Unlike genomic data, for which a number of software solutions exist, user-friendly tools are not readily available for the combinatorial selections field, leading many users to create custom software. FASTAptamer was designed to address the sequence-level analysis needs of the field. The open source FASTAptamer toolkit counts, normalizes and ranks read counts in a FASTQ file, compares populations for sequence distribution, generates clusters of sequence families, calculates fold-enrichment of sequences throughout the course of a selection and searches for degenerate sequence motifs. While originally designed for aptamer selections, FASTAptamer can be applied to any selection strategy that can utilize next-generation DNA sequencing, such as ribozyme or deoxyribozyme selections, in vivo mutagenesis and various surface display technologies (peptide, antibody fragment, mRNA, etc.). FASTAptamer software, sample data and a user's guide are available for download at http://burkelab.missouri.edu/fastaptamer.html. PMID:25734917
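    The counting, normalizing, ranking and fold-enrichment steps described in the abstract above can be sketched as follows. This is an independent illustration, not FASTAptamer's own code: the function names, the reads-per-million normalization and the tie-breaking rule are assumptions made for the example.

```python
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) from FASTQ text lines."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip()


def count_and_rank(reads):
    """Return [(rank, sequence, reads, reads_per_million)], most abundant first.

    Ties are broken alphabetically, and ranks are sequential (an assumption).
    """
    counts = Counter(reads)
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, seq, n, n * 1e6 / total)
        for rank, (seq, n) in enumerate(ordered, start=1)
    ]


def fold_enrichment(ranked_before, ranked_after):
    """Return {sequence: RPM_after / RPM_before} for sequences in both rounds."""
    rpm_before = {seq: rpm for _, seq, _, rpm in ranked_before}
    rpm_after = {seq: rpm for _, seq, _, rpm in ranked_after}
    return {
        seq: rpm_after[seq] / rpm_before[seq]
        for seq in rpm_before
        if seq in rpm_after
    }
```

    Normalizing to reads per million before taking the ratio keeps enrichment values comparable between rounds that were sequenced to different depths.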

  6. Toward the Replacement of Animal Experiments through the Bioinformatics-driven Analysis of 'Omics' Data from Human Cell Cultures.

    PubMed

    Grafström, Roland C; Nymark, Penny; Hongisto, Vesa; Spjuth, Ola; Ceder, Rebecca; Willighagen, Egon; Hardy, Barry; Kaski, Samuel; Kohonen, Pekka

    2015-11-01

    This paper outlines the work for which Roland Grafström and Pekka Kohonen were awarded the 2014 Lush Science Prize. The research activities of the Grafström laboratory have, for many years, covered cancer biology studies, as well as the development and application of toxicity-predictive in vitro models to determine chemical safety. Through the integration of in silico analyses of diverse types of genomics data (transcriptomic and proteomic), their efforts have proved to fit well into the recently-developed Adverse Outcome Pathway paradigm. Genomics analysis within state-of-the-art cancer biology research and Toxicology in the 21st Century concepts share many technological tools. A key category within the Three Rs paradigm is the Replacement of animals in toxicity testing with alternative methods, such as bioinformatics-driven analyses of data obtained from human cell cultures exposed to diverse toxicants. This work was recently expanded within the pan-European SEURAT-1 project (Safety Evaluation Ultimately Replacing Animal Testing), to replace repeat-dose toxicity testing with data-rich analyses of sophisticated cell culture models. The aims and objectives of the SEURAT project have been to guide the application, analysis, interpretation and storage of 'omics' technology-derived data within the service-oriented sub-project, ToxBank. Particularly addressing the Lush Science Prize focus on the relevance of toxicity pathways, a 'data warehouse' that is under continuous expansion, coupled with the development of novel data storage and management methods for toxicology, serve to address data integration across multiple 'omics' technologies. The prize winners' guiding principles and concepts for modern knowledge management of toxicological data are summarised. The translation of basic discovery results ranged from chemical-testing and material-testing data, to information relevant to human health and environmental safety. PMID:26551289

  7. Genetic overlap between type 2 diabetes and major depressive disorder identified by bioinformatics analysis

    PubMed Central

    Ji, Hong-Fang; Zhuang, Qi-Shuai; Shen, Liang

    2016-01-01

    Our study investigated the shared genetic etiology underlying type 2 diabetes (T2D) and major depressive disorder (MDD) by analyzing large-scale genome wide association studies statistics. A total of 496 shared SNPs associated with both T2D and MDD were identified at p-value ≤ 1.0E-07. Functional enrichment analysis showed that the enriched pathways pertained to immune responses (Fc gamma R-mediated phagocytosis, T cell and B cell receptors signaling), cell signaling (MAPK, Wnt signaling), lipid metabolism, and cancer associated pathways. The findings will have potential implications for future interventional studies of the two diseases. PMID:27007159
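    One plausible reading of the shared-SNP step above is a threshold filter on each trait's summary statistics followed by a set intersection. The sketch below assumes that reading; the function name, the input layout (SNP identifier mapped to p-value) and the identifiers in the usage note are hypothetical.

```python
def shared_snps(stats_a, stats_b, threshold=1.0e-7):
    """Return sorted SNP ids whose p-value is <= threshold in both studies.

    stats_a, stats_b: dicts mapping SNP id -> association p-value.
    """
    hits_a = {snp for snp, p_value in stats_a.items() if p_value <= threshold}
    hits_b = {snp for snp, p_value in stats_b.items() if p_value <= threshold}
    return sorted(hits_a & hits_b)
```

    With made-up inputs such as {"rsA": 1e-9, "rsB": 1e-3} and {"rsA": 5e-8, "rsB": 1e-12}, only "rsA" passes the threshold in both studies and is reported as shared.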

  8. Bioinformatics analysis of the structural and evolutionary characteristics for toll-like receptor 15

    PubMed Central

    Wang, Jinlan; Chang, Fen

    2016-01-01

    Toll-like receptors (TLRs) play an important role in the innate immune system. TLR15 is reported to have a unique role in defense against pathogens, but its structural and evolutionary characteristics are still poorly understood. In this study, we identified 57 complete TLR15 genes from avian and reptilian genomes. TLR15 clustered into an individual clade and was closely related to family 1 on the phylogenetic tree. Unlike the TLRs in family 1, which have broken asparagine ladders in the middle, the TLR15 ectodomain had an intact asparagine ladder that is critical for maintaining the overall shape of the ectodomain. The conservation analysis found that the TLR15 ectodomain had a highly evolutionarily conserved region on the convex surface of the LRR11 module, which is probably involved in the TLR15 activation process. Furthermore, the protein–protein docking analysis indicated that TLR15 TIR domains have the potential to form homodimers; the predicted interaction interface of the TIR dimer was formed mainly by residues from the BB-loops and αC-helices. Although TLR15 mainly underwent purifying selection, we detected 27 sites under positive selection for TLR15, 24 of which are located on its ectodomain. Our observations suggest structural features of TLR15 that may be relevant to its function but require further experimental validation. PMID:27257554

  9. Evaluation of Bioinformatic Programmes for the Analysis of Variants within Splice Site Consensus Regions

    PubMed Central

    Tang, Rongying; Prosser, Debra O.; Love, Donald R.

    2016-01-01

    The increasing diagnostic use of gene sequencing has led to an expanding dataset of novel variants that lie within consensus splice junctions. The challenge for diagnostic laboratories is the evaluation of these variants in order to determine if they affect splicing or are merely benign. A common evaluation strategy is to use in silico analysis, and it is here that a number of programmes are available online; however, currently, there are no consensus guidelines on the selection of programmes or protocols to interpret the prediction results. Using a collection of 222 pathogenic mutations and 50 benign polymorphisms, we evaluated the sensitivity and specificity of four in silico programmes in predicting the effect of each variant on splicing. The programmes comprised Human Splice Finder (HSF), Max Entropy Scan (MES), NNSplice, and ASSP. The MES and ASSP programmes gave the highest performance based on Receiver Operator Curve analysis, with an optimal cut-off of score reduction of 10%. The study also showed that the sensitivity of prediction is affected by the level of conservation of individual positions, with in silico predictions for variants at positions −4 and +7 within consensus splice sites being largely uninformative. PMID:27313609
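    The sensitivity and specificity evaluation described above can be illustrated with a short sketch. It assumes each variant has been reduced to a fractional drop in splice-site score, and that a drop at or above the cut-off (10% in the abstract) counts as a predicted splicing effect. The function name and the numbers in the usage note are hypothetical.

```python
def sensitivity_specificity(pathogenic_drops, benign_drops, cutoff=0.10):
    """Return (sensitivity, specificity) for a score-reduction cut-off.

    pathogenic_drops / benign_drops: fractional score reductions (0.10 = 10%)
    for known pathogenic variants and benign polymorphisms, respectively.
    """
    # True positives: pathogenic variants called as affecting splicing.
    true_positives = sum(1 for drop in pathogenic_drops if drop >= cutoff)
    # True negatives: benign variants called as not affecting splicing.
    true_negatives = sum(1 for drop in benign_drops if drop < cutoff)
    sensitivity = true_positives / len(pathogenic_drops)
    specificity = true_negatives / len(benign_drops)
    return sensitivity, specificity
```

    For made-up reductions of [0.25, 0.12, 0.05] in pathogenic variants and [0.02, 0.15] in benign ones, the 10% cut-off yields a sensitivity of 2/3 and a specificity of 1/2. Sweeping the cut-off and plotting the resulting pairs produces the receiver operating characteristic curve.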

  10. Bioinformatics analysis of the structural and evolutionary characteristics for toll-like receptor 15.

    PubMed

    Wang, Jinlan; Zhang, Zheng; Chang, Fen; Yin, Deling

    2016-01-01

    Toll-like receptors (TLRs) play an important role in the innate immune system. TLR15 is reported to have a unique role in defense against pathogens, but its structural and evolutionary characteristics are still poorly understood. In this study, we identified 57 complete TLR15 genes from avian and reptilian genomes. TLR15 clustered into an individual clade and was closely related to family 1 on the phylogenetic tree. Unlike the TLRs in family 1, which have broken asparagine ladders in the middle, the TLR15 ectodomain had an intact asparagine ladder that is critical for maintaining the overall shape of the ectodomain. The conservation analysis found that the TLR15 ectodomain had a highly evolutionarily conserved region on the convex surface of the LRR11 module, which is probably involved in the TLR15 activation process. Furthermore, the protein-protein docking analysis indicated that TLR15 TIR domains have the potential to form homodimers; the predicted interaction interface of the TIR dimer was formed mainly by residues from the BB-loops and αC-helices. Although TLR15 mainly underwent purifying selection, we detected 27 sites under positive selection for TLR15, 24 of which are located on its ectodomain. Our observations suggest structural features of TLR15 that may be relevant to its function but require further experimental validation. PMID:27257554

  11. A Polyglot Approach to Bioinformatics Data Integration: A Phylogenetic Analysis of HIV-1.

    PubMed

    Reisman, Steven; Hatzopoulos, Thomas; Läufer, Konstantin; Thiruvathukal, George K; Putonti, Catherine

    2016-01-01

    As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses were conducted for >6,000 HIV-1 sequences, revealing that spatial and temporal factors influence the evolution of the individual genes uniquely. Nevertheless, signatures of origin can be extrapolated despite increased globalization. The approach developed here can easily be customized for any species of interest. PMID:26819543

  12. [Gene cloning and bioinformatics analysis of new gene for chlorogenic acid biosynthesis of Lonicera hypoglauca].

    PubMed

    Yu, Shu-lin; Huang, Lu-qi; Yuan, Yuan; Qi, Lin-jie; Liu, Da-hui

    2015-03-01

    To obtain the key genes for chlorogenic acid biosynthesis in Lonicera hypoglauca, four new genes were obtained from our dataset of L. hypoglauca. We also predicted the structure and function of the LHPAL4, LHHCT1, LHHCT2 and LHHCT3 proteins. The phylogenetic tree showed that LHPAL4 was closely related to LHPAL1, LHHCT1 was closely related to LHHCT3, and LHHCT2 clustered into a single group. Using real-time PCR to detect gene expression levels in different organs of L. hypoglauca, we found that the transcript levels of LHPAL4, LHHCT1 and LHHCT3 were highest in defeat flowers, and the transcript level of LHHCT2 was highest in leaves. These results provide a basis for further analysis of the mechanism of active ingredients in different organs, as well as elements for in vitro biosynthesis of active ingredients. PMID:26087546

  13. A Polyglot Approach to Bioinformatics Data Integration: A Phylogenetic Analysis of HIV-1

    PubMed Central

    Reisman, Steven; Hatzopoulos, Thomas; Läufer, Konstantin; Thiruvathukal, George K.; Putonti, Catherine

    2016-01-01

    As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses were conducted for >6,000 HIV-1 sequences, revealing that spatial and temporal factors influence the evolution of the individual genes uniquely. Nevertheless, signatures of origin can be extrapolated despite increased globalization. The approach developed here can easily be customized for any species of interest. PMID:26819543

  14. Identification of key pathways and genes in colorectal cancer using bioinformatics analysis.

    PubMed

    Liang, Bin; Li, Chunning; Zhao, Jianying

    2016-10-01

    Colorectal cancer (CRC) is the most common malignant tumor of the digestive system. The aim of this study was to identify gene signatures in CRC and uncover their potential mechanisms. The gene expression profiles of GSE21815 were downloaded from the GEO database. The GSE21815 dataset contained 141 samples, including 132 CRC and 9 normal colon epithelia. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed, and a protein-protein interaction (PPI) network of the differentially expressed genes (DEGs) was constructed with Cytoscape software. In total, 3500 DEGs were identified in CRC, including 1370 up-regulated genes and 2130 down-regulated genes. GO analysis results showed that up-regulated DEGs were significantly enriched in biological processes (BP), including cell cycle, cell division, and cell proliferation; the down-regulated DEGs were significantly enriched in biological processes, including immune response, intracellular signaling cascade and defense response. KEGG pathway analysis showed the up-regulated DEGs were enriched in cell cycle and DNA replication, while the down-regulated DEGs were enriched in drug metabolism, metabolism of xenobiotics by cytochrome P450, and retinol metabolism pathways. The top 10 hub genes, GNG2, AGT, SAA1, ADCY5, LPAR1, NMU, IL8, CXCL12, GNAI1, and CCR2, were identified from the PPI network, and sub-networks revealed these genes were involved in significant pathways, including G protein-coupled receptors signaling pathway, gastrin-CREB signaling pathway via PKC and MAPK, and extracellular matrix organization. In conclusion, the present study indicated that the identified DEGs and hub genes promote our understanding of the molecular mechanisms underlying the development of CRC, and might be used as molecular targets and diagnostic biomarkers for the treatment of CRC. PMID:27581154
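    Hub genes in a PPI network are often ranked by node degree, the number of distinct interaction partners. The abstract above does not say which centrality measure was used, so degree is an assumption here, and the function name and the edge list in the usage note are hypothetical. A minimal sketch:

```python
from collections import Counter


def top_hubs(edges, k=10):
    """Return the k highest-degree nodes as [(node, degree)].

    edges: iterable of (node_a, node_b) pairs from an undirected network.
    Duplicate edges and self-loops are ignored; ties are broken by name.
    """
    unique_edges = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for node_a, node_b in unique_edges:
        degree[node_a] += 1
        degree[node_b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

    With a toy edge list such as [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")], node "A" has three partners and is ranked first. Other centrality measures, such as betweenness, may rank genes differently.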

  15. Bioinformatics clouds for big data manipulation

    PubMed Central

    2012-01-01

    As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers: This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475

  16. Identification of neglected cestode Taenia multiceps microRNAs by illumina sequencing and bioinformatic analysis

    PubMed Central

    2013-01-01

    Background: Worldwide, but especially in developing countries, coenurosis of sheep and other livestock is caused by Taenia multiceps larvae, and zoonotic infections occur in humans. Infections frequently lead to host death, resulting in huge socioeconomic losses. MicroRNAs (miRNAs) have important roles in the post-transcriptional regulation of a large number of animal genes by imperfectly binding target mRNAs. To date, there have been no reports of miRNAs in T. multiceps. Results: In this study, we obtained 12.8 million high quality raw reads from an adult T. multiceps small RNA library using Illumina sequencing technology. A total of 796 conserved miRNA families (containing 1,006 miRNAs) from 170,888 unique miRNAs were characterized using miRBase (Release 17.0). Here, we selected three conserved miRNA/miRNA* (antisense strand) duplexes at random and amplified their corresponding precursors using a PCR-based method. Furthermore, 20 candidate novel miRNA precursors were verified by genomic PCR. Among these, six corresponding T. multiceps miRNAs are considered specific for Taeniidae because no homologs were found in other species annotated in miRBase. In addition, 181,077 target sites within the T. multiceps transcriptome were predicted for the 20 candidate novel miRNAs. Conclusions: Our large-scale investigation of miRNAs in adult T. multiceps provides a substantial platform for improving our understanding of the molecular regulation of development in T. multiceps and other cestodes. PMID:23941076

  17. The Cinnamyl Alcohol Dehydrogenase Gene Family in Melon (Cucumis melo L.): Bioinformatic Analysis and Expression Patterns

    PubMed Central

    Jin, Yazhong; Zhang, Chong; Liu, Wei; Qi, Hongyan; Chen, Hao; Cao, Songxiao

    2014-01-01

    Cinnamyl alcohol dehydrogenase (CAD) is a key enzyme in lignin biosynthesis. However, little was known about CADs in melon. Five CAD-like genes were identified in the melon genome, namely CmCAD1 to CmCAD5. Signal peptide analysis and CAD protein prediction showed that no typical signal peptides were found in any of the CmCADs and that CmCAD proteins may be located in the cytoplasm. Multiple alignments implied that some motifs may be responsible for the high specificity of these CAD proteins, and may include key residues of the catalytic mechanism. The phylogenetic tree revealed seven groups of CAD, and melon CAD genes fell into four main groups. CmCAD1 and CmCAD2 belonged to the bona fide CAD group, whose members, as representatives from angiosperms, are involved in lignin synthesis. The other CmCADs were distributed in groups II, V and VII, respectively. Semi-quantitative PCR and real time qPCR revealed differential expression of CmCADs: CmCAD5 was expressed in different vegetative tissues except mature leaves, with the highest expression in flower, while CmCAD2 and CmCAD5 were strongly expressed in flesh during development. Promoter analysis revealed several motifs of CAD genes involved in gene expression modulated by various hormones. Treatment with abscisic acid (ABA) elevated the expression of CmCADs in flesh, whereas the transcript levels of CmCAD1 and CmCAD5 were induced by auxin (IAA); ethylene induced the expression of CmCADs, while 1-MCP repressed the effect, apart from CmCAD4. Taken together, these data suggested that CmCAD4 may be a pseudogene and that all other CmCADs may be involved in the lignin biosynthesis induced by both abiotic and biotic stresses and in tissue-specific developmental lignification through a CAD gene family network; CmCAD2 may be the main CAD enzyme for lignification of melon flesh, and CmCAD5 may also function in flower development. PMID:25019207

  18. The origins of bioinformatics.

    PubMed

    Hagen, J B

    2000-12-01

    Bioinformatics is often described as being in its infancy, but computers emerged as important tools in molecular biology during the early 1960s. A decade before DNA sequencing became feasible, computational biologists focused on the rapidly accumulating data from protein biochemistry. Without the benefits of super computers or computer networks, these scientists laid important conceptual and technical foundations for bioinformatics today. PMID:11252753

  19. Modular analysis of bioinformatics demonstrates a critical role for NF-κB in macrophage activation.

    PubMed

    Zhang, Yingmei; Wang, Yingmei; Lu, Ming; Qiao, Xin; Sun, Bei; Zhang, Weihui; Xue, Dongbo

    2014-08-01

    To identify the gene groups that regulate macrophage activation, a total of 925 differentially expressed genes of activated macrophages were found at the intersection of the three series (GSE5099-1, GSE5099-2, and GSE18686) from the Gene Expression Omnibus (GEO) database, and a sub-network was constructed based on the protein-protein interaction (PPI) network. Four communities (K = 3) were identified from the sub-network using the CFinder software. Community 1 was considered the gene group of interest based on the heat map. GO-BP and KEGG enrichment analysis with the DAVID software showed that the functions of the 14 genes in community 1 were mainly related to the NF-κB pathway. A network was constructed using the Cytoscape software. The diagram showed that STAT1, NFKBIA, NFKBIB, JUN, and RELA were the key genes in the regulation of macrophage activation. Among these genes, RELA (NF-κB P65) was an important member of the NF-κB family, while NFKBIA (IκBα) and NFKBIB (IκBβ) were the inhibitory factors of NF-κB. Small molecules capable of regulating these five genes were identified via the CMap software, and a network diagram was generated using the Cytoscape software to provide a reference for the development of new drugs that regulate macrophage activation. PMID:24577727

  20. A Comparative Structural Bioinformatics Analysis of the Insulin Receptor Family Ectodomain Based on Phylogenetic Information

    PubMed Central

    Rentería, Miguel E.; Gandhi, Neha S.; Vinuesa, Pablo; Helmerhorst, Erik; Mancera, Ricardo L.

    2008-01-01

    The insulin receptor (IR), the insulin-like growth factor 1 receptor (IGF1R) and the insulin receptor-related receptor (IRR) are covalently-linked homodimers made up of several structural domains. The molecular mechanism of ligand binding to the ectodomain of these receptors and the resulting activation of their tyrosine kinase domain is still not well understood. We have carried out an amino acid residue conservation analysis in order to reconstruct the phylogeny of the IR Family. We have confirmed the location of ligand binding site 1 of the IGF1R and IR. Importantly, we have also predicted the likely location of the insulin binding site 2 on the surface of the fibronectin type III domains of the IR. An evolutionarily conserved surface on the second leucine-rich domain that may interact with the ligand could not be detected. We suggest a possible mechanical trigger of the activation of the IR that involves a slight ‘twist’ rotation of the last two fibronectin type III domains in order to face the likely location of insulin. Finally, a strong selective pressure was found amongst the IRR orthologous sequences, suggesting that this orphan receptor has a yet unknown physiological role which may be conserved from amphibians to mammals. PMID:18989367

  1. The Alcohol Dehydrogenase Gene Family in Melon (Cucumis melo L.): Bioinformatic Analysis and Expression Patterns

    PubMed Central

    Jin, Yazhong; Zhang, Chong; Liu, Wei; Tang, Yufan; Qi, Hongyan; Chen, Hao; Cao, Songxiao

    2016-01-01

    Alcohol dehydrogenases (ADH), encoded by a multigene family in plants, play a critical role in plant growth, development, adaptation, fruit ripening and aroma production. Thirteen ADH genes were identified in the melon genome, including 12 ADHs and one formaldehyde dehydrogenase (FDH), designated CmADH1-12 and CmFDH1, of which CmADH1 and CmADH2 have been isolated in Cantaloupe. ADH genes shared a lower identity with each other at the protein level and had different intron-exon structures at the nucleotide level. No typical signal peptides were found in any of the CmADHs, and CmADH proteins might be located in the cytoplasm. The phylogenetic tree revealed that the 13 ADH genes were divided into three groups, namely the long-, medium-, and short-chain ADH subfamilies, and CmADH1,3-11, which belong to the medium-chain ADH subfamily, fell into six medium-chain ADH subgroups. CmADH12 may belong to the long-chain ADH subfamily, while CmFDH1 may be a Class III ADH and serve as an ancestral ADH in melon. Expression profiling revealed that CmADH1, CmADH2, CmADH10 and CmFDH1 were moderately or strongly expressed in different vegetative tissues and fruit at medium and late developmental stages, while CmADH8 and CmADH12 were highly expressed in fruit after 20 days. CmADH3 showed preferential expression in young tissues. CmADH4 only had slight expression in root. Promoter analysis revealed several motifs of CmADH genes involved in the gene expression modulated by various hormones, and the response patterns of CmADH genes to ABA, IAA and ethylene were different. These CmADHs were divided into ethylene-sensitive and -insensitive groups, and the functions of CmADHs were discussed. PMID:27242871

  2. Transcriptome Bioinformatical Analysis of Vertebrate Stages of Schistosoma japonicum Reveals Alternative Splicing Events

    PubMed Central

    Wang, Xinye; Xu, Xindong; Lu, Xingyu; Zhang, Yuanbin; Pan, Weiqing

    2015-01-01

    Alternative splicing is a molecular process that contributes greatly to the diversification of the proteome and to gene functions. Understanding the mechanisms of stage-specific alternative splicing can provide a better understanding of the development of eukaryotes and the functions of different genes. Schistosoma japonicum is an infectious blood-dwelling trematode with a complex lifecycle that causes the tropical disease schistosomiasis. In this study, we analyzed the transcriptome of Schistosoma japonicum to discover alternative splicing events in this parasite, by applying RNA-seq to cDNA libraries of adults and schistosomula. Results were validated by RT-PCR and sequencing. We found 11,623 alternative splicing events among 7,099 protein encoding genes, and the average proportion of alternative splicing events per gene was 42.14%. We showed that exon skipping is the most common type of alternative splicing event, as found in higher eukaryotes, whereas intron retention is the least common alternative splicing type. According to intron boundary analysis, the parasite possesses the same intron boundaries as other organisms, namely the classic “GT-AG” rule. In alternatively spliced introns or exons, this rule is less strict. We also attempted to detect alternative splicing events in genes encoding proteins with signal peptides and transmembrane helices, suggesting that alternative splicing could change subcellular locations of specific gene products. Our results indicate that alternative splicing is prevalent in this parasitic worm, and that the worm is close to its hosts. The revealed secretome involved in alternative splicing offers a new perspective for understanding the interaction between the parasite and its host. PMID:26407301
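
    The "GT-AG" rule mentioned above says that an intron, read on the coding DNA strand, begins with GT and ends with AG. A minimal sketch of such a boundary check follows, assuming intron sequences are already available as plain strings; the function names and the toy sequences are hypothetical and do not come from the study.

```python
def is_canonical_intron(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG (case-insensitive)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def canonical_fraction(introns) -> float:
    """Fraction of introns that follow the GT-AG rule (0.0 for empty input)."""
    introns = list(introns)
    if not introns:
        return 0.0
    return sum(is_canonical_intron(i) for i in introns) / len(introns)


# Toy intron sequences (hypothetical): two follow the rule, two do not.
toy_introns = ["GTAAGTCCTTTCAG", "GCAAGTTTTCAG", "gtatgttttacag", "ATATCCTTAC"]
print(canonical_fraction(toy_introns))
```

    Comparing this fraction between constitutive and alternatively spliced introns is one simple way to see whether the rule is "less strict" in the latter group.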

  3. Transcriptome Bioinformatical Analysis of Vertebrate Stages of Schistosoma japonicum Reveals Alternative Splicing Events.

    PubMed

    Wang, Xinye; Xu, Xindong; Lu, Xingyu; Zhang, Yuanbin; Pan, Weiqing

    2015-01-01

    Alternative splicing is a molecular process that contributes greatly to the diversification of the proteome and to gene functions. Understanding the mechanisms of stage-specific alternative splicing can provide a better understanding of the development of eukaryotes and the functions of different genes. Schistosoma japonicum is an infectious blood-dwelling trematode with a complex lifecycle that causes the tropical disease schistosomiasis. In this study, we analyzed the transcriptome of Schistosoma japonicum to discover alternative splicing events in this parasite, by applying RNA-seq to cDNA libraries of adults and schistosomula. Results were validated by RT-PCR and sequencing. We found 11,623 alternative splicing events among 7,099 protein encoding genes, and the average proportion of alternative splicing events per gene was 42.14%. We showed that exon skipping is the most common type of alternative splicing event, as found in higher eukaryotes, whereas intron retention is the least common alternative splicing type. According to intron boundary analysis, the parasite possesses the same intron boundaries as other organisms, namely the classic "GT-AG" rule. In alternatively spliced introns or exons, this rule is less strict. We also attempted to detect alternative splicing events in genes encoding proteins with signal peptides and transmembrane helices, suggesting that alternative splicing could change subcellular locations of specific gene products. Our results indicate that alternative splicing is prevalent in this parasitic worm, and that the worm is close to its hosts. The revealed secretome involved in alternative splicing offers a new perspective for understanding the interaction between the parasite and its host. PMID:26407301

  4. Identification of Immunoreactive Leishmania infantum Protein Antigens to Asymptomatic Dog Sera through Combined Immunoproteomics and Bioinformatics Analysis.

    PubMed

    Agallou, Maria; Athanasiou, Evita; Samiotaki, Martina; Panayotou, George; Karagouni, Evdokia

    2016-01-01

    Leishmania infantum is the etiologic agent of zoonotic visceral leishmaniasis (VL) in countries in the Mediterranean basin, where dogs are the domestic reservoirs and represent important elements in the transmission of the disease. Since the major focal areas of human VL exhibit a high prevalence of seropositive dogs, the control of canine VL could reduce the infection rate in humans. Efforts toward this have focused on the improvement of diagnostic tools, as well as on vaccine development. The identification of parasite antigens including suitable major histocompatibility complex (MHC) class I- and/or II-restricted epitopes is very important since disease protection is characterized by strong and long-lasting CD8+ T and CD4+ Th1 cell-dominated immunity. In the present study, total protein extract from late-log phase L. infantum promastigotes was analyzed by two-dimensional western blots and probed with sera from asymptomatic and symptomatic dogs. A total of 42 protein spots were found to differentially react with IgG from asymptomatic dogs, while 17 of these identified by Coomassie stain were extracted and analyzed. Of these, 21 proteins were identified by mass spectrometry; they were mainly involved in metabolism and stress responses. An in silico analysis predicted that the chaperonin HSP60, dihydrolipoamide dehydrogenase, enolase, cyclophilin 2, cyclophilin 40, and one hypothetical protein contain promiscuous MHCI and/or MHCII epitopes. Our results suggest that the combination of immunoproteomics and bioinformatics analyses is a promising method for the identification of novel candidate antigens for vaccine development or with potential use in the development of sensitive diagnostic tests. PMID:26906226

  5. Identification of Immunoreactive Leishmania infantum Protein Antigens to Asymptomatic Dog Sera through Combined Immunoproteomics and Bioinformatics Analysis

    PubMed Central

    Agallou, Maria; Athanasiou, Evita; Samiotaki, Martina; Panayotou, George; Karagouni, Evdokia

    2016-01-01

    Leishmania infantum is the etiologic agent of zoonotic visceral leishmaniasis (VL) in countries in the Mediterranean basin, where dogs are the domestic reservoirs and represent important elements in the transmission of the disease. Since the major focal areas of human VL exhibit a high prevalence of seropositive dogs, the control of canine VL could reduce the infection rate in humans. Efforts toward this have focused on the improvement of diagnostic tools, as well as on vaccine development. The identification of parasite antigens including suitable major histocompatibility complex (MHC) class I- and/or II-restricted epitopes is very important since disease protection is characterized by strong and long-lasting CD8+ T and CD4+ Th1 cell-dominated immunity. In the present study, total protein extract from late-log phase L. infantum promastigotes was analyzed by two-dimensional western blots and probed with sera from asymptomatic and symptomatic dogs. A total of 42 protein spots were found to differentially react with IgG from asymptomatic dogs, while 17 of these identified by Coomassie stain were extracted and analyzed. Of these, 21 proteins were identified by mass spectrometry; they were mainly involved in metabolism and stress responses. An in silico analysis predicted that the chaperonin HSP60, dihydrolipoamide dehydrogenase, enolase, cyclophilin 2, cyclophilin 40, and one hypothetical protein contain promiscuous MHCI and/or MHCII epitopes. Our results suggest that the combination of immunoproteomics and bioinformatics analyses is a promising method for the identification of novel candidate antigens for vaccine development or with potential use in the development of sensitive diagnostic tests. PMID:26906226

  6. Bioinformatic analysis of microRNA networks following the activation of the constitutive androstane receptor (CAR) in mouse liver.

    PubMed

    Hao, Ruixin; Su, Shengzhong; Wan, Yinan; Shen, Frank; Niu, Ben; Coslo, Denise M; Albert, Istvan; Han, Xing; Omiecinski, Curtis J

    2016-09-01

    The constitutive androstane receptor (CAR; NR1I3) is a member of the nuclear receptor superfamily that functions as a xenosensor, serving to regulate xenobiotic detoxification, lipid homeostasis and energy metabolism. CAR activation is also a key contributor to the development of chemical hepatocarcinogenesis in mice. The underlying pathways affected by CAR in these processes are complex and not fully elucidated. MicroRNAs (miRNAs) have emerged as critical modulators of gene expression and appear to impact many cellular pathways, including those involved in chemical detoxification and liver tumor development. In this study, we used deep sequencing approaches with an Illumina HiSeq platform to differentially profile microRNA expression patterns in livers from wild type C57BL/6J mice following CAR activation with the mouse CAR-specific ligand activator, 1,4-bis-[2-(3,5,-dichloropyridyloxy)] benzene (TCPOBOP). Bioinformatic analyses and pathway evaluations were performed leading to the identification of 51 miRNAs whose expression levels were significantly altered by TCPOBOP treatment, including mmu-miR-802-5p and miR-485-3p. Ingenuity Pathway Analysis of the differentially expressed microRNAs revealed altered effector pathways, including those involved in liver cell growth and proliferation. A functional network among CAR targeted genes and the affected microRNAs was constructed to illustrate how CAR modulation of microRNA expression may potentially mediate its biological role in mouse hepatocyte proliferation. This article is part of a Special Issue entitled: Xenobiotic nuclear receptors: New Tricks for An Old Dog, edited by Dr. Wen Xie. PMID:27080131

  7. Crimean-Congo Hemorrhagic Fever Virus Gn Bioinformatic Analysis and Construction of a Recombinant Bacmid in Order to Express Gn by Baculovirus Expression System

    PubMed Central

    Rahpeyma, Mehdi; Fotouhi, Fatemeh; Makvandi, Manouchehr; Ghadiri, Ata; Samarbaf-Zadeh, Alireza

    2015-01-01

    Background: Crimean-Congo hemorrhagic fever virus (CCHFV) is a member of Nairovirus, a genus in the Bunyaviridae family, and causes a life-threatening disease in humans. Currently, there is no vaccine against CCHFV, and the detailed structure of CCHFV proteins remains undefined. The CCHFV M RNA segment encodes two viral surface glycoproteins known as Gn and Gc. Viral glycoproteins can be considered as key targets for vaccine development. Objectives: The current study aimed to investigate the structural bioinformatics of the CCHFV Gn protein and to design a construct for making a recombinant bacmid to express Gn with the baculovirus system. Materials and Methods: To express the Gn protein in insect cells for use as an antigen in animal model vaccine studies, bioinformatic analysis of the CCHFV Gn protein was performed, and a construct was designed, cloned into the pFastBacHTb vector, and used to generate a recombinant Gn-bacmid with the Bac-to-Bac system. Results: The primary, secondary, and 3D structures of CCHFV Gn were obtained, and PCR with M13 forward and reverse primers confirmed the generation of recombinant bacmid DNA harboring the Gn coding region under the polyhedrin promoter. Conclusions: Characterization of the detailed structure of CCHFV Gn by bioinformatics software provides the basis for development of new experiments and construction of a recombinant bacmid harboring CCHFV Gn, which is valuable for designing a recombinant vaccine against deadly pathogens like CCHFV. PMID:26862379

  8. Additional EIPC Study Analysis. Final Report

    SciTech Connect

    Hadley, Stanton W; Gotham, Douglas J.; Luciani, Ralph L.

    2014-12-01

    Between 2010 and 2012 the Eastern Interconnection Planning Collaborative (EIPC) conducted a major long-term resource and transmission study of the Eastern Interconnection (EI). With guidance from a Stakeholder Steering Committee (SSC) that included representatives from the Eastern Interconnection States Planning Council (EISPC) among others, the project was conducted in two phases. Phase 1 involved a long-term capacity expansion analysis that involved creation of eight major futures plus 72 sensitivities. Three scenarios were selected for more extensive transmission-focused evaluation in Phase 2. Five power flow analyses, nine production cost model runs (including six sensitivities), and three capital cost estimations were developed during this second phase. The results from Phase 1 and 2 provided a wealth of data that could be examined further to address energy-related questions. A list of 14 topics was developed for further analysis. This paper brings together the earlier interim reports of the first 13 topics plus one additional topic into a single final report.

  9. String Mining in Bioinformatics

    NASA Astrophysics Data System (ADS)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced neither in the data mining nor in the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word “data-mining” is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].
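
    The two tasks named above, finding repeated segments within one sequence and common segments among several, can be illustrated with a naive fixed-length (k-mer) index. This is only a sketch under that simplifying assumption; it is not the method of the chapter, and practical tools typically use more efficient index structures such as suffix trees or suffix arrays. The function names and toy sequences are hypothetical.

```python
from collections import defaultdict


def repeated_kmers(seq: str, k: int) -> dict:
    """Map each k-mer that occurs more than once in seq to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def shared_kmers(seqs, k: int) -> set:
    """Return the k-mers that occur in every sequence of seqs."""
    kmer_sets = [{s[i:i + k] for i in range(len(s) - k + 1)} for s in seqs]
    return set.intersection(*kmer_sets) if kmer_sets else set()


# Intra-molecular similarity: repeats inside a single toy sequence.
print(repeated_kmers("ACGTACGT", 4))
# Inter-molecular similarity: segments common to two toy sequences.
print(sorted(shared_kmers(["ACGTAC", "TTACGT"], 3)))
```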

  10. String Mining in Bioinformatics

    NASA Astrophysics Data System (ADS)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced neither in the data mining nor in the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word "data-mining" is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].

  11. Bioinformatics Visualisation Tools: An Unbalanced Picture.

    PubMed

    Broască, Laura; Ancuşa, Versavia; Ciocârlie, Horia

    2016-01-01

    Visualization tools represent a key element in triggering human creativity while being supported with the analysis power of the machine. This paper analyzes free network visualization tools for bioinformatics, frames them in domain specific requirements and compares them. PMID:27577488

  12. Bioinformatics for Genome Analysis

    SciTech Connect

    Gary J. Olsen

    2005-06-30

    Nesbo, Boucher and Doolittle (2001) used phylogenetic trees of four taxa to assess whether euryarchaeal genes share a common history. They suggested that, of the 521 genes examined, each of the three possible tree topologies relating the four taxa was supported an essentially equal number of times. They suggest that this might be the result of numerous horizontal gene transfer events, essentially randomizing the relationships between gene histories (as inferred in the 521 gene trees) and organismal relationships (which would be a single underlying tree). Motivated by the fact that the order in which sequences are added to a multiple sequence alignment influences the alignment, and ultimately the inferred tree, we were interested in the extent to which the variations among inferred trees might be due to variations in the alignment order. This bears directly on our efforts to evaluate and improve upon methods of multiple sequence alignment. We set out to analyze the influence of alignment order on the tree inferred for 43 genes shared among these same 4 taxa. Because alignments produced by CLUSTALW are directed by a rooted guide tree (the dendrogram), there are 15 possible alignment orders of 4 taxa. For each gene we tested all 15 alignment orders and, as a 16th option, allowed CLUSTALW to generate its own guide tree. Having supplied all 15 possible rooted guide trees, we expected that at least one of them should be as good as CLUSTAL's own guide tree, but most of the time they differed (sometimes being better than CLUSTAL's default tree and sometimes being worse). The difference seems to be that the user-supplied tree is not given meaningful branch lengths, which affect the assumed probability of amino acid changes. We examined the practicality of modifying CLUSTALW to improve its treatment of user-supplied guide trees. This work became increasingly bogged down in finding and repairing minor bugs in the CLUSTALW code. This effort was put on hold as we feel that our other proposed approaches will ultimately be better.
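
    The counts quoted above (three topologies relating four taxa, and 15 rooted guide trees) follow from the standard formulas for binary trees on n labeled taxa: (2n-5)!! unrooted and (2n-3)!! rooted topologies. A minimal sketch that reproduces both numbers is given below; the function names are hypothetical.

```python
def double_factorial(m: int) -> int:
    """Product m * (m - 2) * (m - 4) * ... down to 1 or 2; equals 1 for m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def unrooted_topologies(n_taxa: int) -> int:
    """Number of unrooted binary tree topologies on n labeled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def rooted_topologies(n_taxa: int) -> int:
    """Number of rooted binary tree topologies on n labeled taxa: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


print(unrooted_topologies(4), rooted_topologies(4))
```

    For four taxa this gives 3 unrooted and 15 rooted topologies; the rooted count grows quickly (105 for five taxa), which is why exhaustively testing guide trees is only practical for very small sets.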

  13. Quantitative proteomics and bioinformatic analysis provide new insight into the dynamic response of porcine intestine to Salmonella Typhimurium

    PubMed Central

    Collado-Romero, Melania; Aguilar, Carmen; Arce, Cristina; Lucena, Concepción; Codrea, Marius C.; Morera, Luis; Bendixen, Emoke; Moreno, Ángela; Garrido, Juan J.

    2015-01-01

    The enteropathogen Salmonella Typhimurium (S. Typhimurium) is the most common non-typhoidal serotype isolated in pigs worldwide. Currently, one of the main sources of human infection is the consumption of pork meat. Therefore, prevention and control of salmonellosis in pigs is crucial for minimizing risks to public health. The aim of the present study was to use isobaric tags for relative and absolute quantification (iTRAQ) to explore differences in the response to Salmonella in two segments of the porcine gut (ileum and colon) along a time course of 1, 2, and 6 days post infection (dpi) with S. Typhimurium. A total of 298 proteins were identified in the infected ileum samples, of which 112 displayed significant expression differences due to Salmonella infection. In colon, 184 proteins were detected in the infected samples, of which 46 were differentially expressed with respect to the controls. The highest number of changes in protein expression was quantified in ileum at 2 dpi. Further biological interpretation of proteomics data using bioinformatics tools demonstrated that the expression changes in colon were found in proteins involved in cell death and survival, tissue morphology or molecular transport at the early stages and tissue regeneration at 6 dpi. In ileum, however, changes in protein expression were mainly related to immunological and infectious diseases, inflammatory response or connective tissue disorders at 1 and 2 dpi. iTRAQ has proved to be a robust proteomic approach, allowing us to identify ileum as the earliest response focus upon S. Typhimurium in the porcine gut. In addition, new functions involved in the response to bacteria such as eIF2 signaling, free radical scavengers or antimicrobial peptides (AMP) expression have been identified. Finally, the impairment of the enterohepatic circulation of bile acids and lipid metabolism by means of the downregulation of FABP6 protein and FXR/RXR and LXR/RXR signaling pathway in ileum has been

  14. CDH1/E-cadherin and solid tumors. An updated gene-disease association analysis using bioinformatics tools.

    PubMed

    Abascal, María Florencia; Besso, María José; Rosso, Marina; Mencucci, María Victoria; Aparicio, Evangelina; Szapiro, Gala; Furlong, Laura Inés; Vazquez-Levin, Mónica Hebe

    2016-02-01

    Cancer is a group of diseases that causes millions of deaths worldwide. Among cancers, Solid Tumors (ST) stand out due to their high incidence and mortality rates. Disruption of cell-cell adhesion is highly relevant during tumor progression. Epithelial-cadherin (protein: E-cadherin, gene: CDH1) is a key molecule in cell-cell adhesion; its abnormal expression and/or function contributes to tumor progression, and it is altered in ST. A systematic study was carried out to gather and summarize current knowledge on CDH1/E-cadherin and ST using bioinformatics resources. The DisGeNET database was exploited to survey CDH1-associated diseases. Reported mutations in specific ST were obtained by interrogating COSMIC and IntOGen tools. CDH1 Single Nucleotide Polymorphisms (SNP) were retrieved from the dbSNP database. DisGeNET analysis identified 609 genes annotated to ST, among which CDH1 was listed. Using CDH1 as query term, 26 disease concepts were found, 21 of which were neoplasms-related terms. Using DisGeNET ALL Databases, 172 disease concepts were identified. Of those, 80 ST disease-related terms were subjected to manual curation and 75/80 (93.75%) associations were validated. On selected ST, 489 CDH1 somatic mutations were listed in COSMIC and IntOGen databases. Breast neoplasms had the highest CDH1-mutation rate. CDH1 was positioned among the 20 genes with highest mutation frequency and was confirmed as driver gene in breast cancer. Over 14,000 SNP for CDH1 were found in the dbSNP database. This report used DisGeNET to gather/compile current knowledge on gene-disease association for CDH1/E-cadherin and ST; data curation expanded the number of terms that relate them. An updated list of CDH1 somatic mutations was obtained with COSMIC and IntOGen databases and of SNP from dbSNP. This information can be used to further understand the role of CDH1/E-cadherin in health and disease. PMID:26674224

  15. A Bioinformatics Facility for NASA

    NASA Technical Reports Server (NTRS)

    Schweighofer, Karl; Pohorille, Andrew

    2006-01-01

    Building on an existing prototype, we have fielded a facility with bioinformatics technologies that will help NASA meet its unique requirements for biological research. This facility consists of a cluster of computers capable of performing computationally intensive tasks, software tools, databases and knowledge management systems. Novel computational technologies for analyzing and integrating new biological data and already existing knowledge have been developed. With continued development and support, the facility will fulfill NASA's strategic bioinformatics needs in astrobiology and space exploration. As a demonstration of these capabilities, we will present a detailed analysis of how spaceflight factors impact gene expression in the liver and kidney for mice flown aboard shuttle flight STS-108. We have found that many genes involved in signal transduction, cell cycle, and development respond to changes in microgravity, but that most metabolic pathways appear unchanged.

  16. Genomic and Bioinformatics Analysis of HAdV-4, a Human Adenovirus Causing Acute Respiratory Disease: Implications for Gene Therapy and Vaccine Vector Development

    PubMed Central

    Purkayastha, Anjan; Ditty, Susan E.; Su, Jing; McGraw, John; Hadfield, Ted L.; Tibbetts, Clark; Seto, Donald

    2005-01-01

    Human adenovirus serotype 4 (HAdV-4) is a reemerging viral pathogenic agent implicated in epidemic outbreaks of acute respiratory disease (ARD). This report presents a genomic and bioinformatics analysis of the prototype 35,990-nucleotide genome (GenBank accession no. AY594253). Intriguingly, the genome analysis suggests a closer phylogenetic relationship with the chimpanzee adenoviruses (simian adenoviruses) rather than with other human adenoviruses, suggesting a recent origin of HAdV-4, and therefore species E, through a zoonotic event from chimpanzees to humans. Bioinformatics analysis also suggests a pre-zoonotic recombination event, as well, between species B-like and species C-like simian adenoviruses. These observations may have implications for the current interest in using chimpanzee adenoviruses in the development of vectors for human gene therapy and for DNA-based vaccines. Also, the reemergence, surveillance, and treatment of HAdV-4 as an ARD pathogen is an opportunity to demonstrate the use of genome determination as a tool for viral infectious disease characterization and epidemic outbreak surveillance: for example, rapid and accurate low-pass sequencing and analysis of the genome. In particular, this approach allows the rapid identification and development of unique probes for the differentiation of family, species, serotype, and strain (e.g., pathogen genome signatures) for monitoring epidemic outbreaks of ARD. PMID:15681456

  17. Genomic and bioinformatics analysis of HAdV-4, a human adenovirus causing acute respiratory disease: implications for gene therapy and vaccine vector development.

    PubMed

    Purkayastha, Anjan; Ditty, Susan E; Su, Jing; McGraw, John; Hadfield, Ted L; Tibbetts, Clark; Seto, Donald

    2005-02-01

    Human adenovirus serotype 4 (HAdV-4) is a reemerging viral pathogenic agent implicated in epidemic outbreaks of acute respiratory disease (ARD). This report presents a genomic and bioinformatics analysis of the prototype 35,990-nucleotide genome (GenBank accession no. AY594253). Intriguingly, the genome analysis suggests a closer phylogenetic relationship with the chimpanzee adenoviruses (simian adenoviruses) rather than with other human adenoviruses, suggesting a recent origin of HAdV-4, and therefore species E, through a zoonotic event from chimpanzees to humans. Bioinformatics analysis also suggests a pre-zoonotic recombination event, as well, between species B-like and species C-like simian adenoviruses. These observations may have implications for the current interest in using chimpanzee adenoviruses in the development of vectors for human gene therapy and for DNA-based vaccines. Also, the reemergence, surveillance, and treatment of HAdV-4 as an ARD pathogen is an opportunity to demonstrate the use of genome determination as a tool for viral infectious disease characterization and epidemic outbreak surveillance: for example, rapid and accurate low-pass sequencing and analysis of the genome. In particular, this approach allows the rapid identification and development of unique probes for the differentiation of family, species, serotype, and strain (e.g., pathogen genome signatures) for monitoring epidemic outbreaks of ARD. PMID:15681456

  18. Microbial bioinformatics 2020.

    PubMed

    Pallen, Mark J

    2016-09-01

    Microbial bioinformatics in 2020 will remain a vibrant, creative discipline, adding value to the ever-growing flood of new sequence data, while embracing novel technologies and fresh approaches. Databases and search strategies will struggle to cope and manual curation will not be sustainable during the scale-up to the million-microbial-genome era. Microbial taxonomy will have to adapt to a situation in which most microorganisms are discovered and characterised through the analysis of sequences. Genome sequencing will become a routine approach in clinical and research laboratories, with fresh demands for interpretable user-friendly outputs. The "internet of things" will penetrate healthcare systems, so that even a piece of hospital plumbing might have its own IP address that can be integrated with pathogen genome sequences. Microbiome mania will continue, but the tide will turn from molecular barcoding towards metagenomics. Crowd-sourced analyses will collide with cloud computing, but eternal vigilance will be the price of preventing the misinterpretation and overselling of microbial sequence data. Output from hand-held sequencers will be analysed on mobile devices. Open-source training materials will address the need for the development of a skilled labour force. As we boldly go into the third decade of the twenty-first century, microbial sequence space will remain the final frontier! PMID:27471065

  19. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    PubMed Central

    Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students’ attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484

  20. A survey of scholarly literature describing the field of bioinformatics education and bioinformatics educational research.

    PubMed

    Magana, Alejandra J; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students' attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484

  1. Acid Rain Analysis by Standard Addition Titration.

    ERIC Educational Resources Information Center

    Ophardt, Charles E.

    1985-01-01

    The standard addition titration is a precise and rapid method for the determination of the acidity in rain or snow samples. The method requires use of a standard buret, a pH meter, and Gran's plot to determine the equivalence point. Experimental procedures used and typical results obtained are presented. (JN)
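
A minimal sketch of how a Gran's plot locates the equivalence point, assuming a strong acid titrated with a strong base and ignoring activity coefficients and water autoionization. The function names, concentrations, and volumes are hypothetical illustrations and are not taken from the record above. Before equivalence, the Gran function G = (V0 + V) * 10^(-pH) falls linearly with titrant volume V and reaches zero at the equivalence volume, so a straight-line fit gives the endpoint as the x-intercept.

```python
import math


def gran_equivalence_volume(v0_ml, volumes_ml, ph_values):
    """Estimate the equivalence volume (mL) from pre-equivalence readings."""
    # Gran function: proportional to the moles of H+ still untitrated.
    xs = list(volumes_ml)
    ys = [(v0_ml + v) * 10 ** (-ph) for v, ph in zip(volumes_ml, ph_values)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares line through (V, G).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # G = 0 at the equivalence volume, i.e. the x-intercept of the line.
    return -intercept / slope


def simulate_ph(v0_ml, acid_molar, base_molar, v_ml):
    """Idealized pH before equivalence (hypothetical helper for the demo)."""
    h = (v0_ml * acid_molar - v_ml * base_molar) / (v0_ml + v_ml)
    return -math.log10(h)


# Hypothetical sample: 50 mL at 1e-4 M strong acid, titrated with 1e-3 M base.
V0, ACID, BASE = 50.0, 1e-4, 1e-3
volumes = [0.0, 1.0, 2.0, 3.0, 4.0]
ph_readings = [simulate_ph(V0, ACID, BASE, v) for v in volumes]
ve = gran_equivalence_volume(V0, volumes, ph_readings)
acid_estimate = ve * BASE / V0  # mol/L of strong acid in the sample
```

With real pH-meter readings the points scatter around the line, so only readings well before the endpoint would normally be used for the fit.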

  2. Bioinformatics and the undergraduate curriculum essay.

    PubMed

    Maloney, Mark; Parker, Jeffrey; Leblanc, Mark; Woodard, Craig T; Glackin, Mary; Hanrahan, Michael

    2010-01-01

    Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of bioinformatics as a new discipline has challenged many colleges and universities to keep current with their curricula, often in the face of static or dwindling resources. On the plus side, many bioinformatics modules and related databases and software programs are free and accessible online, and interdisciplinary partnerships between existing faculty members and their support staff have proved advantageous in such efforts. We present examples of strategies and methods that have been successfully used to incorporate bioinformatics content into undergraduate curricula. PMID:20810947

  3. A Bioinformatics Analysis Reveals a Group of MocR Bacterial Transcriptional Regulators Linked to a Family of Genes Coding for Membrane Proteins

    PubMed Central

    Milano, Teresa

    2016-01-01

    The MocR bacterial transcriptional regulators are characterized by an N-terminal domain, 60 residues long on average, possessing the winged-helix-turn-helix (wHTH) architecture responsible for DNA recognition and binding, linked to a large C-terminal domain (350 residues on average) that is homologous to fold type-I pyridoxal 5′-phosphate (PLP) dependent enzymes like aspartate aminotransferase (AAT). These regulators are involved in the expression of genes taking part in several metabolic pathways directly or indirectly connected to PLP chemistry, many of which are still uncharacterized. A bioinformatics analysis is here reported that studied the features of a distinct group of MocR regulators predicted to be functionally linked to a family of homologous genes coding for integral membrane proteins of unknown function. This group occurs mainly in the Actinobacteria and Gammaproteobacteria phyla. An analysis of the multiple sequence alignments of their wHTH and AAT domains suggested the presence of specificity-determining positions (SDPs). Mapping of SDPs onto a homology model of the AAT domain hinted at possible structural/functional roles in effector recognition. Likewise, SDPs in wHTH domain suggested the basis of specificity of Transcription Factor Binding Site recognition. The results reported represent a framework for rational design of experiments and for bioinformatics analysis of other MocR subgroups. PMID:27446613

  4. A Bioinformatics Analysis Reveals a Group of MocR Bacterial Transcriptional Regulators Linked to a Family of Genes Coding for Membrane Proteins.

    PubMed

    Milano, Teresa; Angelaccio, Sebastiana; Tramonti, Angela; Di Salvo, Martino Luigi; Contestabile, Roberto; Pascarella, Stefano

    2016-01-01

    The MocR bacterial transcriptional regulators are characterized by an N-terminal domain, 60 residues long on average, possessing the winged-helix-turn-helix (wHTH) architecture responsible for DNA recognition and binding, linked to a large C-terminal domain (350 residues on average) that is homologous to fold type-I pyridoxal 5'-phosphate (PLP) dependent enzymes like aspartate aminotransferase (AAT). These regulators are involved in the expression of genes taking part in several metabolic pathways directly or indirectly connected to PLP chemistry, many of which are still uncharacterized. A bioinformatics analysis is here reported that studied the features of a distinct group of MocR regulators predicted to be functionally linked to a family of homologous genes coding for integral membrane proteins of unknown function. This group occurs mainly in the Actinobacteria and Gammaproteobacteria phyla. An analysis of the multiple sequence alignments of their wHTH and AAT domains suggested the presence of specificity-determining positions (SDPs). Mapping of SDPs onto a homology model of the AAT domain hinted at possible structural/functional roles in effector recognition. Likewise, SDPs in wHTH domain suggested the basis of specificity of Transcription Factor Binding Site recognition. The results reported represent a framework for rational design of experiments and for bioinformatics analysis of other MocR subgroups. PMID:27446613

  5. Bioinformatic and phylogenetic analysis of the CLAVATA3/EMBRYO-SURROUNDING REGION (CLE) and the CLE-LIKE signal peptide genes in the Pinophyta

    PubMed Central

    2014-01-01

    Background: There is a rapidly growing awareness that plant peptide signalling molecules are numerous and varied and they are known to play fundamental roles in angiosperm plant growth and development. Two closely related peptide signalling molecule families are the CLAVATA3-EMBRYO-SURROUNDING REGION (CLE) and CLE-LIKE (CLEL) genes, which encode precursors of secreted peptide ligands that have roles in meristem maintenance and root gravitropism. Progress in peptide signalling molecule research in gymnosperms has lagged behind that of angiosperms. We therefore sought to identify CLE and CLEL genes in gymnosperms and conduct a comparative analysis of these gene families with angiosperms. Results: We undertook a meta-analysis of the GenBank/EMBL/DDBJ gymnosperm EST database and the Picea abies and P. glauca genomes and identified 93 putative CLE genes and 11 CLEL genes among eight Pinophyta species, in the genera Cryptomeria, Pinus and Picea. The predicted conifer CLE and CLEL protein sequences had close phylogenetic relationships with their homologues in Arabidopsis. Notably, perfect conservation of the active CLE dodecapeptide in presumed orthologues of the Arabidopsis CLE41/44-TRACHEARY ELEMENT DIFFERENTIATION (TDIF) protein, an inhibitor of tracheary element (xylem) differentiation, was seen in all eight conifer species. We cloned the Pinus radiata CLE41/44-TDIF orthologues. These genes were preferentially expressed in phloem in planta as expected, but unexpectedly, also in differentiating tracheary element (TE) cultures. Surprisingly, transcript abundances of these TE differentiation-inhibitors sharply increased during early TE differentiation, suggesting that some cells differentiate into phloem cells in addition to TEs in these cultures. Applied CLE13 and CLE41/44 peptides inhibited root elongation in Pinus radiata seedlings. We show evidence that two CLEL genes are alternatively spliced via 3′-terminal acceptor exons encoding separate CLEL peptides

  6. SeqBuster, a bioinformatic tool for the processing and analysis of small RNAs datasets, reveals ubiquitous miRNA modifications in human embryonic cells.

    PubMed

    Pantano, Lorena; Estivill, Xavier; Martí, Eulàlia

    2010-03-01

    High-throughput sequencing technologies enable direct approaches to catalog and analyze snapshots of the total small RNA content of living cells. Characterization of high-throughput sequencing data requires bioinformatic tools offering a wide perspective of the small RNA transcriptome. Here we present SeqBuster, a highly versatile and reliable web-based toolkit to process and analyze large-scale small RNA datasets. The high flexibility of this tool is illustrated by the multiple choices offered in the pre-analysis for mapping purposes and in the different analysis modules for data manipulation. To overcome the storage capacity limitations of the web-based tool, SeqBuster offers a stand-alone version that permits the annotation against any custom database. SeqBuster integrates multiple analyses modules in a unique platform and constitutes the first bioinformatic tool offering a deep characterization of miRNA variants (isomiRs). The application of SeqBuster to small-RNA datasets of human embryonic stem cells revealed that most miRNAs present different types of isomiRs, some of them being associated to stem cell differentiation. The exhaustive description of the isomiRs provided by SeqBuster could help to identify miRNA-variants that are relevant in physiological and pathological processes. SeqBuster is available at http://estivill_lab.crg.es/seqbuster. PMID:20008100

  7. SeqBuster, a bioinformatic tool for the processing and analysis of small RNAs datasets, reveals ubiquitous miRNA modifications in human embryonic cells

    PubMed Central

    Pantano, Lorena; Estivill, Xavier; Martí, Eulàlia

    2010-01-01

    High-throughput sequencing technologies enable direct approaches to catalog and analyze snapshots of the total small RNA content of living cells. Characterization of high-throughput sequencing data requires bioinformatic tools offering a wide perspective of the small RNA transcriptome. Here we present SeqBuster, a highly versatile and reliable web-based toolkit to process and analyze large-scale small RNA datasets. The high flexibility of this tool is illustrated by the multiple choices offered in the pre-analysis for mapping purposes and in the different analysis modules for data manipulation. To overcome the storage capacity limitations of the web-based tool, SeqBuster offers a stand-alone version that permits the annotation against any custom database. SeqBuster integrates multiple analyses modules in a unique platform and constitutes the first bioinformatic tool offering a deep characterization of miRNA variants (isomiRs). The application of SeqBuster to small-RNA datasets of human embryonic stem cells revealed that most miRNAs present different types of isomiRs, some of them being associated to stem cell differentiation. The exhaustive description of the isomiRs provided by SeqBuster could help to identify miRNA-variants that are relevant in physiological and pathological processes. SeqBuster is available at http://estivill_lab.crg.es/seqbuster. PMID:20008100

  8. An Online Bioinformatics Curriculum

    PubMed Central

    Searls, David B.

    2012-01-01

    Online learning initiatives over the past decade have become increasingly comprehensive in their selection of courses and sophisticated in their presentation, culminating in the recent announcement of a number of consortium and startup activities that promise to make a university education on the internet, free of charge, a real possibility. At this pivotal moment it is appropriate to explore the potential for obtaining comprehensive bioinformatics training with currently existing free video resources. This article presents such a bioinformatics curriculum in the form of a virtual course catalog, together with editorial commentary, and an assessment of strengths, weaknesses, and likely future directions for open online learning in this field. PMID:23028269

  9. Additives

    NASA Technical Reports Server (NTRS)

    Smalheer, C. V.

    1973-01-01

    The chemistry of lubricant additives is discussed to show what the additives are chemically and what functions they perform in the lubrication of various kinds of equipment. Current theories regarding the mode of action of lubricant additives are presented. The additive groups discussed include the following: (1) detergents and dispersants, (2) corrosion inhibitors, (3) antioxidants, (4) viscosity index improvers, (5) pour point depressants, and (6) antifouling agents.

  10. Altered hippocampal microRNA expression profiles in neonatal rats caused by sevoflurane anesthesia: MicroRNA profiling and bioinformatics target analysis

    PubMed Central

    Ye, Jishi; Zhang, Zongze; Wang, Yanlin; Chen, Chang; Xu, Xing; Yu, Hui; Peng, Mian

    2016-01-01

    Although accumulating evidence has suggested that microRNAs (miRNAs) have a serious impact on cognitive function and are associated with the etiology of several neuropsychiatric disorders, their expression in sevoflurane-induced neurotoxicity in the developing brain has not been characterized. In the present study, the miRNA expression pattern in neonatal hippocampus samples (24 h after sevoflurane exposure) was investigated, and 9 miRNAs associated with brain development and cognition were selected for bioinformatic analysis. A previous microfluidic chip assay had detected 29 upregulated and 24 downregulated miRNAs in the neonatal rat hippocampus, of which 7 selected deregulated miRNAs were identified by the quantitative polymerase chain reaction. A total of 85 targets of selected deregulated miRNAs were analyzed using bioinformatics, and the main enriched metabolic pathways, the mitogen-activated protein kinase and Wnt pathways, may have been involved in molecular mechanisms with regard to neuronal cell body, dendrite and synapse. The observations of the present study provide a novel understanding of the regulatory mechanism of miRNAs underlying sevoflurane-induced neurotoxicity, which may benefit the prevention and treatment of volatile anesthetic-related neurotoxicity. PMID:27588052

  11. Bioinformatics and School Biology

    ERIC Educational Resources Information Center

    Dalpech, Roger

    2006-01-01

    The rapidly changing field of bioinformatics is fuelling the need for suitably trained personnel with skills in relevant biological "sub-disciplines" such as proteomics, transcriptomics and metabolomics, etc. But because of the complexity--and sheer weight of data--associated with these new areas of biology, many school teachers feel…

  12. Towards a career in bioinformatics

    PubMed Central

    2009-01-01

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the area of life sciences, systems biology and clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and a student symposium, the Singapore Symposium on Computational Biology (SYMBIO), were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection of tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in the undergraduate bioinformatics curriculum provides information on how to impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. PMID:19958508

  13. Bioinformatics analysis of the target gene of fibroblast growth factor receptor 3 in bladder cancer and associated molecular mechanisms

    PubMed Central

    AI, XING; JIA, ZHUO-MIN; WANG, JUAN; DI, GUI-PING; ZHANG, XU; SUN, FENGLING; ZANG, TONG; LIAO, XIUMEI

    2015-01-01

    The aim of the present study was to elucidate the molecular mechanisms of fibroblast growth factor receptor 3 (FGFR3) activation via overexpression or mutation of the FGFR3 target gene in bladder cancer (BC). The transcription profile data GSE41035, which included 18 BC samples, containing 3 independent FGFR3 short hairpin (sh)RNA, and 6 control samples, containing enhanced green fluorescent protein (EGFP) shRNA, were obtained from the National Center of Biotechnology Information Gene Expression Omnibus database. The Limma package with multiple testing correction was used to identify differentially expressed genes (DEGs) between FGFR3 knockdown and control samples. Gene ontology (GO) and pathway enrichment analysis were conducted in order to investigate the DEGs at the functional level. In addition, differential co-expression analysis was employed to construct a gene co-expression network. A total of 196 DEGs were acquired, of which 101 were downregulated and 95 were upregulated. In addition, a gene signature was identified linking FGFR3 signaling with de novo sterol biosynthesis and metabolism using GO and pathway enrichment analysis. Furthermore, the present study demonstrated that the genes NME2, CCNB1 and H2AFZ were significantly associated with BC, as determined by the protein-protein interaction network of DEGs and co-expressed genes. In conclusion, the present study revealed the involvement of FGFR3 in the regulation of sterol biosynthesis and metabolism in the maintenance of BC; in addition, the present study provided a novel insight into the molecular mechanisms of FGFR3 in BC. These results may therefore contribute to the theoretical guidance into the detection and therapy of BC. PMID:26171066
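
The multiple testing correction mentioned above can be illustrated with a short sketch. The record does not say which correction was applied; the Benjamini-Hochberg false discovery rate procedure is assumed here purely for illustration, and the function name and p-values are hypothetical. Each sorted p-value is scaled by m/rank, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values (not data from the study above).
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
# Genes passing a 5% false discovery rate threshold.
significant = [i for i, p in enumerate(adj_p) if p < 0.05]
```

A gene is then called differentially expressed when its adjusted p-value falls below the chosen threshold, usually combined with a fold-change cutoff.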

  14. Novel C16orf57 mutations in patients with Poikiloderma with Neutropenia: bioinformatic analysis of the protein and predicted effects of all reported mutations

    PubMed Central

    2012-01-01

    Background: Poikiloderma with Neutropenia (PN) is a rare autosomal recessive genodermatosis caused by C16orf57 mutations. To date 17 mutations have been identified in 31 PN patients. Results: We characterize six PN patients expanding the clinical phenotype of the syndrome and the mutational repertoire of the gene. We detect the two novel C16orf57 mutations, c.232C>T and c.265+2T>G, as well as the already reported c.179delC, c.531delA and c.693+1G>T mutations. cDNA analysis evidences the presence of aberrant transcripts, and bioinformatic prediction of C16orf57 protein structure gauges the mutations' effects on the folded protein chain. Computational analysis of the C16orf57 protein shows two conserved H-X-S/T-X tetrapeptide motifs marking the active site of a two-fold pseudosymmetric structure recalling the 2H phosphoesterase superfamily. Based on this model C16orf57 is likely a 2H-active site enzyme functioning in RNA processing, as a presumptive RNA ligase. According to bioinformatic prediction, all known C16orf57 mutations, including the novel mutations herein described, impair the protein structure by either removing one or both tetrapeptide motifs or by destroying the symmetry of the native folding. Finally, we analyse the geographical distribution of the recurrent mutations that depicts clusters featuring a founder effect. Conclusions: In cohorts of patients clinically affected by genodermatoses with overlapping symptoms, the molecular screening of C16orf57 gene seems the proper way to address the correct diagnosis of PN, enabling the syndrome-specific oncosurveillance. The bioinformatic prediction of the C16orf57 protein structure denotes a very basic enzymatic function consistent with a housekeeping function. Detection of aberrant transcripts, also in cells from PN patients carrying early truncated mutations, suggests they might be translatable. Tissue-specific sensitivity to the lack of functionally correct protein accounts for the main cutaneous and

  15. Integrated Study of Globally Expressed microRNAs in IL-1β-stimulated Human Osteoarthritis Chondrocytes and Osteoarthritis Relevant Genes: A Microarray and Bioinformatics Analysis.

    PubMed

    Rasheed, Zafar; Al-Shobaili, Hani A; Rasheed, Naila; Al Salloom, Abdulaziz A M; Al-Shaya, Osama; Mahmood, Amer; Alajez, Nehad M; Alghamdi, Ahmed S S; Mehana, El-Sayed E

    2016-07-01

    This study was undertaken to identify and characterize the globally expressed microRNAs (miRNAs) involved in interleukin-1β (IL-1β)-induced joint damage and to predict whether miRNAs can regulate the catabolic effects in osteoarthritis (OA) chondrocytes. Out of 1347 miRNAs analyzed by microarrays in IL-1β-stimulated OA chondrocytes, 35 miRNAs were down-regulated, 1 miRNA was up-regulated, and the expression of 1311 miRNAs remained unchanged. Bioinformatics analysis showed that key inflammatory mediators and key molecular pathways are targeted by the differentially expressed miRNAs. The novel miRNAs identified could have important diagnostic and therapeutic potential in the development of novel therapeutic strategies for pain management in OA. PMID:27152662

  16. Bioinformatics for Exploration

    NASA Technical Reports Server (NTRS)

    Johnson, Kathy A.

    2006-01-01

    For the purpose of this paper, bioinformatics is defined as the application of computer technology to the management of biological information. It can be thought of as the science of developing computer databases and algorithms to facilitate and expedite biological research. This is a crosscutting capability that supports nearly all human health areas ranging from computational modeling, to pharmacodynamics research projects, to decision support systems within autonomous medical care. Bioinformatics serves to increase the efficiency and effectiveness of the life sciences research program. It provides data, information, and knowledge capture which further supports management of the bioastronautics research roadmap - identifying gaps that still remain and enabling the determination of which risks have been addressed.

  17. Identification of conserved microRNAs and their target genes in Nile tilapia (Oreochromis niloticus) by bioinformatic analysis.

    PubMed

    Li, X H; Wu, J S; Tang, L H; Hu, D

    2015-01-01

    MicroRNAs (miRNAs) are a class of non-coding RNAs that play important roles in posttranscriptional regulation of target genes. miRNAs are involved in multiple biological processes by degrading targeted mRNAs or repressing mRNA translation in various organisms. Their conserved nature in various organisms makes them a good source of new miRNA discovery using comparative genomic approaches. In the present study, conserved Nile tilapia (Oreochromis niloticus) miRNAs were identified using a bioinformatic strategy based on expressed sequence tag and genome survey sequence databases. A total of 21 new miRNAs were detected and were found to belong to 17 families. Using mature miRNA sequences as queries, potential targets for tilapia miRNAs were predicted using a local BLAST program and the miRanda software. Target proteins identified using miRanda and BLAST analyses included transcription factors and molecules important in metabolism, transportation, immunity, stress-related activity, growth, and development. These miRNAs and their targets in tilapia may increase the understanding of the role of miRNAs in regulating the growth and development of tilapia. PMID:25867427
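
A heavily simplified sketch of sequence-based target prediction, for orientation only. It is not the miRanda algorithm or a BLAST search; it checks only for perfect complementarity to the miRNA seed (nucleotides 2 to 8), a common first filter in target prediction. The function names and both sequences are illustrative and do not come from the study above.

```python
# Watson-Crick pairing for RNA (G:U wobble pairs are deliberately ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq):
    """Upper-case a sequence and write T as U."""
    return seq.upper().replace("T", "U")


def seed_site(mirna):
    """Return the target-site motif complementary to miRNA nucleotides 2-8."""
    seed = to_rna(mirna)[1:8]
    # Reverse complement, because the miRNA pairs antiparallel to the mRNA.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna, utr):
    """Return 0-based start positions of perfect seed sites in a 3' UTR."""
    site = seed_site(mirna)
    target = to_rna(utr)
    return [
        i
        for i in range(len(target) - len(site) + 1)
        if target[i:i + len(site)] == site
    ]


# Illustrative sequences only.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGGG"
positions = find_seed_matches(mirna, utr)
```

Dedicated tools additionally score pairing beyond the seed, allow wobble pairs and gaps, and estimate the free energy of the duplex, so a seed hit is only a candidate site.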

  18. The Cytotoxicity Mechanism of 6-Shogaol-Treated HeLa Human Cervical Cancer Cells Revealed by Label-Free Shotgun Proteomics and Bioinformatics Analysis

    PubMed Central

    Liu, Qun; Peng, Yong-Bo; Qi, Lian-Wen; Cheng, Xiao-Lan; Xu, Xiao-Jun; Liu, Le-Le; Liu, E-Hu; Li, Ping

    2012-01-01

    Cervical cancer is one of the most common cancers among women in the world. 6-Shogaol is a natural compound isolated from the rhizome of ginger (Zingiber officinale). In this paper, we demonstrated that 6-shogaol induced apoptosis and G2/M phase arrest in human cervical cancer HeLa cells. Endoplasmic reticulum stress and the mitochondrial pathway were involved in 6-shogaol-mediated apoptosis. Proteomic analysis based on a label-free strategy by liquid chromatography chip quadrupole time-of-flight mass spectrometry was subsequently proposed to identify, in a non-target-biased manner, the molecular changes in cellular proteins in response to 6-shogaol treatment. A total of 287 proteins were differentially expressed in response to 24 h treatment with 15 μM 6-shogaol in HeLa cells. Significantly changed proteins were subjected to functional pathway analysis using multiple analysis software packages. Ingenuity pathway analysis (IPA) suggested that 14-3-3 signaling is a predominant canonical pathway involved in networks which may be significantly associated with the process of apoptosis and G2/M cell cycle arrest induced by 6-shogaol. In conclusion, this work developed an unbiased protein analysis strategy by shotgun proteomics and bioinformatics analysis. The data observed provide a comprehensive analysis of the 6-shogaol-treated HeLa cell proteome and reveal protein alterations that are associated with its anticancer mechanism. PMID:23243437

  19. Genome-wide bioinformatics analysis of steroid metabolism-associated genes in Nocardioides simplex VKM Ac-2033D.

    PubMed

    Shtratnikova, Victoria Y; Schelkunov, Mikhail I; Fokina, Victoria V; Pekov, Yury A; Ivashina, Tanya; Donova, Marina V

    2016-08-01

    Actinobacteria comprise diverse groups of bacteria capable of full degradation, or modification of different steroid compounds. Steroid catabolism has been characterized best for the representatives of suborder Corynebacterineae, such as Mycobacteria, Rhodococcus and Gordonia, with high content of mycolic acids in the cell envelope, while it is poorly understood for other steroid-transforming actinobacteria, such as representatives of Nocardioides genus belonging to suborder Propionibacterineae. Nocardioides simplex VKM Ac-2033D is an important biotechnological strain which is known for its ability to introduce ∆(1)-double bond in various 1(2)-saturated 3-ketosteroids, and perform conversion of 3β-hydroxy-5-ene steroids to 3-oxo-4-ene steroids, hydrolysis of acetylated steroids, reduction of carbonyl groups at C-17 and C-20 of androstanes and pregnanes, respectively. The strain is also capable of utilizing cholesterol and phytosterol as carbon and energy sources. In this study, a comprehensive bioinformatics genome-wide screening was carried out to predict genes related to steroid metabolism in this organism, their clustering and possible regulation. The predicted operon structure and number of candidate gene copies (paralogs) have been estimated. Binding sites of steroid catabolism regulators KstR and KstR2 specified for N. simplex VKM Ac-2033D have been calculated de novo. Most of the candidate genes grouped within three main clusters, one of the predicted clusters having no analogs in other actinobacteria studied so far. The results offer a base for further functional studies, expand the understanding of steroid catabolism by actinobacteria, and will contribute to the modification of metabolic pathways in order to generate effective biocatalysts capable of producing valuable bioactive steroids. PMID:26832142

  20. Bioinformatics analysis of organizational and expressional characterizations of the IFNs, IRFs and CRFBs in grass carp Ctenopharyngodon idella.

    PubMed

    Liao, Zhiwei; Wan, Quanyuan; Su, Jianguo

    2016-08-01

    Interferons (IFNs) play crucial roles in the immune response of defense against viral infection and bacterial invasion. In the present study, we systematically identified and characterized the IFNs, their regulatory factors (Interferon Regulatory Factors, IRFs) and receptors (Cytokine Receptor Family B, CRFBs) in grass carp (Ctenopharyngodon idella). Grass carp IFNs can be classified into type I IFN (IFN-I) and type II IFN (IFN-II) like other teleosts. IFN-I consists of two groups with two (group I) or four (group II) cysteines in the mature peptide and can be further divided into three subgroups (IFN-a, -c and -d), containing four members: IFN1, IFN2, IFN3, IFN4 in grass carp. IFN-II contains two members, IFNγ2 with similarity to mammalian IFNγ and a cyprinid-specific IFNγ1 (IFNγ-rel) molecule. mRNA expression analyses of IFNs revealed that IFN1 and IFN-II were sustainably expressed in many tissues, while other IFN members were transiently expressed in specific tissues and at specific time points. In the immune response, IFN transcription is primarily regulated through multiple IRFs after grass carp reovirus (GCRV) challenge. The IRF family possesses thirteen members in grass carp, which can be further divided into four subfamilies (IRF-1, -3, -4 and -5 subfamily), each of which plays different roles in innate and adaptive immunity via various signaling pathways that interact with IFNs (mainly IFN-I). IFNs have to bind receptors (CRFBs) to perform their functions. CRFBs as IFN receptors comprise six members in grass carp. The structure and expression characterizations of IFNs, IRFs and CRFBs were analyzed using bioinformatics tools. These results might provide basic data for further functional research on the IFN system, and deepen the understanding of fish immune mechanisms against viral infection. PMID:27012995

  1. E2F, HSF2, and miR-26 in thyroid carcinoma: bioinformatic analysis of RNA-sequencing data.

    PubMed

    Lu, J C; Zhang, Y P

    2016-01-01

    In this study, we examined the molecular mechanism of thyroid carcinoma (THCA) using bioinformatics. RNA-sequencing data of THCA (N = 498) and normal thyroid tissue (N = 59) were downloaded from The Cancer Genome Atlas. Next, gene expression levels were calculated using the TCC package and differentially expressed genes (DEGs) were identified using the edgeR package. A co-expression network was constructed using the EBcoexpress package and visualized by Cytoscape, and functional and pathway enrichment of DEGs in the co-expression network was analyzed with DAVID and KOBAS 2.0. Moreover, modules in the co-expression network were identified and annotated using MCODE and BiNGO plugins. Small-molecule drugs were analyzed using the cMAP database, and miRNAs and transcription factors regulating DEGs were identified by WebGestalt. A total of 254 up-regulated and 59 down-regulated DEGs were identified between THCA samples and controls. DEGs enriched in biological process terms were related to cell adhesion, death, and growth and negatively correlated with various small-molecule drugs. The co-expression network of the DEGs consisted of hub genes (ITGA3, TIMP1, KRT19, and SERPINA1) and one module (JUN, FOSB, and EGR1). Furthermore, 5 miRNAs and 5 transcription factors were identified, including E2F, HSF2, and miR-26. miR-26 may participate in THCA by targeting CITED1 and PLA2R1; E2F may participate in THCA by regulating ITGA3, TIMP1, KRT19, EGR1, and JUN; HSF2 may be involved in THCA development by regulating SERPINA1 and FOSB; and small-molecule drugs may have anti-THCA effects. Our results provide novel directions for mechanistic studies and drug design of THCA. PMID:26985959

  2. LXtoo: an integrated live Linux distribution for the bioinformatics community

    PubMed Central

    2012-01-01

    Background Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. Conclusions LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356

  3. Bioinformatics of prokaryotic RNAs

    PubMed Central

    Backofen, Rolf; Amman, Fabian; Costa, Fabrizio; Findeiß, Sven; Richter, Andreas S; Stadler, Peter F

    2014-01-01

    The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also harbors dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes and the need to characterize also their non-coding components is heavily dependent on computational methods and workflows, many of which have been developed or at least adapted specifically for the use with bacterial and archaeal data. This review provides an overview on the state-of-the-art of RNA bioinformatics focusing on applications to prokaryotes. PMID:24755880

  4. Bioinformatics of prokaryotic RNAs.

    PubMed

    Backofen, Rolf; Amman, Fabian; Costa, Fabrizio; Findeiß, Sven; Richter, Andreas S; Stadler, Peter F

    2014-01-01

    The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also harbors dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes and the need to characterize also their non-coding components is heavily dependent on computational methods and workflows, many of which have been developed or at least adapted specifically for the use with bacterial and archaeal data. This review provides an overview on the state-of-the-art of RNA bioinformatics focusing on applications to prokaryotes. PMID:24755880

  5. Pattern recognition in bioinformatics.

    PubMed

    de Ridder, Dick; de Ridder, Jeroen; Reinders, Marcel J T

    2013-09-01

    Pattern recognition is concerned with the development of systems that learn to solve a given problem using a set of example instances, each represented by a number of features. These problems include clustering, the grouping of similar instances; classification, the task of assigning a discrete label to a given instance; and dimensionality reduction, combining or selecting features to arrive at a more useful representation. The use of statistical pattern recognition algorithms in bioinformatics is pervasive. Classification and clustering are often applied to high-throughput measurement data arising from microarray, mass spectrometry and next-generation sequencing experiments for selecting markers, predicting phenotype and grouping objects or genes. Less explicitly, classification is at the core of a wide range of tools such as predictors of genes, protein function, functional or genetic interactions, etc., and used extensively in systems biology. A course on pattern recognition (or machine learning) should therefore be at the core of any bioinformatics education program. In this review, we discuss the main elements of a pattern recognition course, based on material developed for courses taught at the BSc, MSc and PhD levels to an audience of bioinformaticians, computer scientists and life scientists. We pay attention to common problems and pitfalls encountered in applications and in interpretation of the results obtained. PMID:23559637

  6. An Integrated Bioinformatics Analysis Reveals Divergent Evolutionary Pattern of Oil Biosynthesis in High- and Low-Oil Plants

    PubMed Central

    Zhang, Li; Wang, Shi-Bo; Li, Qi-Gang; Song, Jian; Hao, Yu-Qi; Zhou, Ling; Zheng, Huan-Quan; Dunwell, Jim M.; Zhang, Yuan-Ming

    2016-01-01

    Seed oils provide a renewable source of food, biofuel and industrial raw materials that is important for humans. Although many genes and pathways for acyl-lipid metabolism have been identified, little is known about whether there is a specific mechanism for high-oil content in high-oil plants. Based on the distinct differences in seed oil content between four high-oil dicots (20~50%) and three low-oil grasses (<3%), comparative genome, transcriptome and differential expression analyses were used to investigate this mechanism. Among 4,051 dicot-specific soybean genes identified from 252,443 genes in the seven species, 54 genes were shown to directly participate in acyl-lipid metabolism, and 93 genes were found to be associated with acyl-lipid metabolism. Among the 93 dicot-specific genes, 42 and 27 genes, including CBM20-like SBDs and GPT2, participate in carbohydrate degradation and transport, respectively. 40 genes highly up-regulated during seed oil rapid accumulation period are mainly involved in initial fatty acid synthesis, triacylglyceride assembly and oil-body formation, for example, ACCase, PP, DGAT1, PDAT1, OLEs and STEROs, which were also found to be differentially expressed between high- and low-oil soybean accessions. Phylogenetic analysis revealed distinct differences of oleosin in patterns of gene duplication and loss between high-oil dicots and low-oil grasses. In addition, seed-specific GmGRF5, ABI5 and GmTZF4 were predicted to be candidate regulators in seed oil accumulation. This study facilitates future research on lipid biosynthesis and potential genetic improvement of seed oil content. PMID:27159078

  7. The 2015 Bioinformatics Open Source Conference (BOSC 2015).

    PubMed

    Harris, Nomi L; Cock, Peter J A; Lapp, Hilmar; Chapman, Brad; Davey, Rob; Fields, Christopher; Hokamp, Karsten; Munoz-Torres, Monica

    2016-02-01

    The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included "Data Science;" "Standards and Interoperability;" "Open Science and Reproducibility;" "Translational Bioinformatics;" "Visualization;" and "Bioinformatics Open Source Project Updates". In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled "Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community," that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule. PMID:26914653

  8. Virtual Bioinformatics Distance Learning Suite

    ERIC Educational Resources Information Center

    Tolvanen, Martti; Vihinen, Mauno

    2004-01-01

    Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material…

  9. Channelrhodopsins: a bioinformatics perspective.

    PubMed

    Del Val, Coral; Royuela-Flor, José; Milenkovic, Stefan; Bondar, Ana-Nicoleta

    2014-05-01

    Channelrhodopsins are microbial-type rhodopsins that function as light-gated cation channels. Understanding how the detailed architecture of the protein governs its dynamics and specificity for ions is important, because it has the potential to assist in designing site-directed channelrhodopsin mutants for specific neurobiology applications. Here we use bioinformatics methods to derive accurate alignments of channelrhodopsin sequences, assess the sequence conservation patterns and find conserved motifs in channelrhodopsins, and use homology modeling to construct three-dimensional structural models of channelrhodopsins. The analyses reveal that helices C and D of channelrhodopsins contain Cys, Ser, and Thr groups that can engage in both intra- and inter-helical hydrogen bonds. We propose that these polar groups participate in inter-helical hydrogen-bonding clusters important for the protein conformational dynamics and for the local water interactions. This article is part of a Special Issue entitled: Retinal Proteins - You can teach an old dog new tricks. PMID:24252597

  10. Analysis of Ultra-Deep Pyrosequencing and Cloning Based Sequencing of the Basic Core Promoter/Precore/Core Region of Hepatitis B Virus Using Newly Developed Bioinformatics Tools

    PubMed Central

    Yousif, Mukhlid; Bell, Trevor G.; Mudawi, Hatim; Glebe, Dieter; Kramvis, Anna

    2014-01-01

    Aims The aims of this study were to develop bioinformatics tools to explore ultra-deep pyrosequencing (UDPS) data, to test these tools, and to use them to determine the optimum error threshold, and to compare results from UDPS and cloning based sequencing (CBS). Methods Four serum samples, infected with either genotype D or E, from HBeAg-positive and HBeAg-negative patients were randomly selected. UDPS and CBS were used to sequence the basic core promoter/precore region of HBV. Two online bioinformatics tools, the “Deep Threshold Tool” and the “Rosetta Tool” (http://hvdr.bioinf.wits.ac.za/tools/), were built to test and analyze the generated data. Results A total of 10952 reads were generated by UDPS on the 454 GS Junior platform. In the four samples, substitutions, detected at 0.5% threshold or above, were identified at 39 unique positions, 25 of which were non-synonymous mutations. Sample #2 (HBeAg-negative, genotype D) had substitutions in 26 positions, followed by sample #1 (HBeAg-negative, genotype E) in 12 positions, sample #3 (HBeAg-positive, genotype D) in 7 positions and sample #4 (HBeAg-positive, genotype E) in only four positions. The ratio of nucleotide substitutions between isolates from HBeAg-negative and HBeAg-positive patients was 3.5∶1. Compared to genotype E isolates, genotype D isolates showed greater variation in the X, basic core promoter/precore and core regions. Only 18 of the 39 positions identified by UDPS were detected by CBS, which detected 14 of the 25 non-synonymous mutations detected by UDPS. Conclusion UDPS data should be approached with caution. Appropriate curation of read data is required prior to analysis, in order to clean the data and eliminate artefacts. CBS detected fewer than 50% of the substitutions detected by UDPS. Furthermore it is important that the appropriate consensus (reference) sequence is used in order to identify variants correctly. PMID:24740330
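
    The threshold step described in this abstract can be illustrated with a minimal sketch. It is not the authors' "Deep Threshold Tool"; the function name, the toy reads, and the assumption that reads are already aligned, gap-free, and start at reference position 0 are all hypothetical. It reports non-reference bases whose frequency at a position meets a chosen threshold (0.5% in the study).

```python
from collections import Counter


def substitutions_above_threshold(reads, reference, threshold=0.005):
    """Return {position: {base: frequency}} for non-reference bases
    whose frequency at that position is at or above the threshold.

    Assumption: reads are pre-aligned to the reference, gap-free,
    and all start at reference position 0.
    """
    hits = {}
    for pos, ref_base in enumerate(reference):
        # Tally the base each read shows at this position.
        counts = Counter(read[pos] for read in reads if pos < len(read))
        depth = sum(counts.values())
        if depth == 0:
            continue
        variants = {
            base: n / depth
            for base, n in counts.items()
            if base != ref_base and n / depth >= threshold
        }
        if variants:
            hits[pos] = variants
    return hits


# Toy data (hypothetical): 200 reads over a 4-base reference.
reference = "ACGT"
reads = ["ACGT"] * 196 + ["ATGT"] * 3 + ["ACGA"] * 1
calls = substitutions_above_threshold(reads, reference, threshold=0.01)
```

    With a 1% threshold, only the substitution seen in 3 of 200 reads is kept; the one seen in a single read falls below the cut-off, which is the kind of filtering an error threshold is meant to provide.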

  11. Bioinformatic analysis reveals an evolutional selection for DNA:RNA hybrid G-quadruplex structures as putative transcription regulatory elements in warm-blooded animals.

    PubMed

    Xiao, Shan; Zhang, Jia-Yu; Zheng, Ke-Wei; Hao, Yu-Hua; Tan, Zheng

    2013-12-01

    Recently, we reported the co-transcriptional formation of DNA:RNA hybrid G-quadruplex (HQ) structure by the non-template DNA strand and nascent RNA transcript, which in turn modulates transcription under both in vitro and in vivo conditions. Here we present bioinformatic analysis on putative HQ-forming sequences (PHQS) in the genomes of eukaryotic organisms. Starting from amphibian, PHQS motifs are concentrated in the immediate 1000-nt region downstream of transcription start sites, implying their potential role in transcription regulation. Moreover, their occurrence shows a strong bias toward the non-template versus the template strand. PHQS has become constitutional in genes in warm-blooded animals, and the magnitude of the strand bias correlates with the ability of PHQS to form HQ, suggesting a selection based on HQ formation. This strand bias is reversed in lower species, implying that the selection of PHQS/HQ depended on the living temperature of the organisms. In comparison with the putative intramolecular G-quadruplex-forming sequences (PQS), PHQS motifs are far more prevalent and abundant in the transcribed regions, making them the dominant candidates in the formation of G-quadruplexes in transcription. Collectively, these results suggest that the HQ structures are evolutionally selected to function in transcription and other transcription-mediated processes that involve guanine-rich non-template strand. PMID:23999096

  12. Suppression subtractive hybridization (SSH) combined with bioinformatics method: an integrated functional annotation approach for analysis of differentially expressed immune-genes in insects

    PubMed Central

    Badapanda, Chandan

    2013-01-01

    The suppression subtractive hybridization (SSH) approach, a PCR-based approach that amplifies differentially expressed cDNAs (complementary DNAs) while simultaneously suppressing amplification of common cDNAs, was employed to identify immune-inducible genes in insects. This technique has been used as a suitable tool for experimental identification of novel genes in eukaryotes as well as prokaryotes, whether or not their genomes have been sequenced. In this article, I propose a method for in silico functional characterization of immune-inducible genes from insects. Apart from immune-inducible genes from insects, this method can be applied to the analysis of genes from other species, from bacteria to plants and animals. This article provides a background on the SSH-based method, taking specific examples from innate immune-inducible genes in insects, and subsequently proposes a bioinformatics pipeline for functional characterization of newly sequenced genes. The proposed workflow can also be applied to any newly sequenced species generated from Next Generation Sequencing (NGS) platforms. PMID:23519487

  13. MISIS-2: A bioinformatics tool for in-depth analysis of small RNAs and representation of consensus master genome in viral quasispecies.

    PubMed

    Seguin, Jonathan; Otten, Patricia; Baerlocher, Loïc; Farinelli, Laurent; Pooggin, Mikhail M

    2016-07-01

    In most eukaryotes, small RNA (sRNA) molecules such as miRNAs, siRNAs and piRNAs regulate gene expression and repress transposons and viruses. AGO/PIWI family proteins sort functional sRNAs based on size, 5'-nucleotide and other sequence features. In plants and some animals, viral sRNAs are extremely diverse and cover the entire viral genome sequences, which allows for de novo reconstruction of a complete viral genome by deep sequencing and bioinformatics analysis of viral sRNAs. Previously, we developed a tool, MISIS, to view and analyze sRNA maps of viruses and cellular genome regions which spawn multiple sRNAs. Here we describe a new release of MISIS, MISIS-2, which enables users to determine and visualize a consensus sequence and count sRNAs of any chosen sizes and 5'-terminal nucleotide identities. Furthermore, we demonstrate the utility of MISIS-2 for identification of single nucleotide polymorphisms (SNPs) at each position of a reference sequence and reconstruction of a consensus master genome in evolving viral quasispecies. MISIS-2 is a Java standalone program. It is freely available along with the source code at the website http://www.fasteris.com/apps. PMID:26994965
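
    The two operations named in this abstract, counting sRNAs by size and 5'-terminal nucleotide and deriving a consensus sequence, can be sketched in a few lines. This is not MISIS-2 code (MISIS-2 is a Java program); the function names and the input format of forward-strand (start, sequence) pairs are assumptions made for illustration.

```python
from collections import Counter


def count_by_size_and_5prime(srnas, sizes=range(20, 26)):
    """Count reads per (length, 5'-terminal nucleotide) for the chosen sizes."""
    counts = Counter()
    for seq in srnas:
        if len(seq) in sizes:
            counts[(len(seq), seq[0])] += 1
    return counts


def consensus(reference, mapped):
    """Majority-vote consensus from reads mapped to the forward strand.

    mapped: iterable of (start, sequence) pairs (hypothetical input format).
    Positions without coverage keep the reference base.
    """
    columns = [Counter() for _ in reference]
    for start, seq in mapped:
        for offset, base in enumerate(seq):
            if 0 <= start + offset < len(reference):
                columns[start + offset][base] += 1
    return "".join(
        col.most_common(1)[0][0] if col else ref
        for col, ref in zip(columns, reference)
    )


# Toy inputs (hypothetical).
size_table = count_by_size_and_5prime(["A" * 21, "U" + "G" * 20, "A" * 24, "A" * 30])
master = consensus("AAAAAA", [(0, "ACA"), (0, "ACA"), (1, "GA")])
```

    A position where the majority base differs from the reference, as at index 1 in the toy example, is where a SNP would be reported against that reference.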

  14. PhyloToAST: Bioinformatics tools for species-level analysis and visualization of complex microbial datasets.

    PubMed

    Dabdoub, Shareef M; Fellows, Megan L; Paropkari, Akshay D; Mason, Matthew R; Huja, Sarandeep S; Tsigarida, Alexandra A; Kumar, Purnima S

    2016-01-01

    The 16S rRNA gene is widely used for taxonomic profiling of microbial ecosystems; and recent advances in sequencing chemistry have allowed extremely large numbers of sequences to be generated from minimal amounts of biological samples. Analysis speed and resolution of data to species-level taxa are two important factors in large-scale explorations of complex microbiomes using 16S sequencing. We present here new software, Phylogenetic Tools for Analysis of Species-level Taxa (PhyloToAST), that completely integrates with the QIIME pipeline to improve analysis speed, reduce primer bias (requiring two sequencing primers), enhance species-level analysis, and add new visualization tools. The code is free and open source, and can be accessed at http://phylotoast.org. PMID:27357721

  15. PhyloToAST: Bioinformatics tools for species-level analysis and visualization of complex microbial datasets

    PubMed Central

    Dabdoub, Shareef M.; Fellows, Megan L.; Paropkari, Akshay D.; Mason, Matthew R.; Huja, Sarandeep S.; Tsigarida, Alexandra A.; Kumar, Purnima S.

    2016-01-01

    The 16S rRNA gene is widely used for taxonomic profiling of microbial ecosystems; and recent advances in sequencing chemistry have allowed extremely large numbers of sequences to be generated from minimal amounts of biological samples. Analysis speed and resolution of data to species-level taxa are two important factors in large-scale explorations of complex microbiomes using 16S sequencing. We present here new software, Phylogenetic Tools for Analysis of Species-level Taxa (PhyloToAST), that completely integrates with the QIIME pipeline to improve analysis speed, reduce primer bias (requiring two sequencing primers), enhance species-level analysis, and add new visualization tools. The code is free and open source, and can be accessed at http://phylotoast.org. PMID:27357721

  16. Computed Tomography Inspection and Analysis for Additive Manufacturing Components

    NASA Technical Reports Server (NTRS)

    Beshears, Ronald D.

    2016-01-01

    Computed tomography (CT) inspection was performed on test articles additively manufactured from metallic materials. Metallic AM and machined wrought alloy test articles with programmed flaws were inspected using a 2MeV linear accelerator based CT system. Performance of CT inspection on identically configured wrought and AM components and programmed flaws was assessed using standard image analysis techniques to determine the impact of additive manufacturing on inspectability of objects with complex geometries.

  17. Agile parallel bioinformatics workflow management using Pwrake

    PubMed Central

    2011-01-01

    Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. 
Furthermore, readability

  18. Evaluation of Simultaneous Nutrient and COD Removal with Polyhydroxybutyrate (PHB) Accumulation Using Mixed Microbial Consortia under Anoxic Condition and Their Bioinformatics Analysis

    PubMed Central

    Jena, Jyotsnarani; Kumar, Ravindra; Dixit, Anshuman; Pandey, Sony; Das, Trupti

    2015-01-01

    Simultaneous nitrate-N, phosphate and COD removal was evaluated from synthetic waste water using mixed microbial consortia in an anoxic environment under various initial carbon loads (ICL) in a batch scale reactor system. Within 6 hours of incubation, enriched DNPAOs (Denitrifying Polyphosphate Accumulating Microorganisms) were able to remove maximum COD (87%) at 2 g/L of ICL whereas maximum nitrate-N (97%) and phosphate (87%) removal along with PHB accumulation (49 mg/L) was achieved at 8 g/L of ICL. Exhaustion of nitrate-N, beyond 6 hours of incubation, had a detrimental effect on COD and phosphate removal rate. Fresh supply of nitrate-N to the reaction medium, beyond 6 hours, helped revive the removal rates of both COD and phosphate. Therefore, it was apparent that in spite of a high carbon load, maximum COD and nutrient removal can be maintained with adequate nitrate-N availability. Denitrifying condition in the medium was evident from an increasing pH trend. PHB accumulation by the mixed culture was directly proportional to ICL; however, accumulation took longer at higher ICL. Unlike conventional EBPR, PHB depletion did not support phosphate accumulation in this case. The unique aspect of all the batch studies was that PHB accumulation was observed along with phosphate uptake and nitrate reduction under anoxic conditions. Bioinformatics analysis followed by pyrosequencing of the mixed culture DNA from the seed sludge revealed the dominance of denitrifying populations, such as Corynebacterium, Rhodocyclus and Paracoccus (Alphaproteobacteria and Betaproteobacteria). Rarefaction curve indicated complete bacterial population and corresponding number of OTUs through sequence analysis. Chao1 and Shannon index (H’) were used to study the diversity of sampling. “UCI95” and “LCI95” indicated 95% confidence level of upper and lower values of Chao1 for each distance. Values of Chao1 index supported the results of rarefaction curve. PMID:25689047
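
    The two diversity measures named at the end of this abstract can be computed directly from per-OTU read counts. The sketch below is not the authors' pipeline; it assumes the commonly used bias-corrected form of Chao1 and the natural-log form of the Shannon index, and the function names and counts are hypothetical.

```python
import math


def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate.

    S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are the
    numbers of OTUs seen exactly once and exactly twice.
    """
    observed = sum(1 for c in otu_counts if c > 0)
    singletons = sum(1 for c in otu_counts if c == 1)
    doubletons = sum(1 for c in otu_counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(otu_counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(otu_counts)
    return -sum((c / total) * math.log(c / total) for c in otu_counts if c > 0)


# Toy OTU table (hypothetical): 7 observed OTUs, 3 singletons, 2 doubletons.
richness = chao1([10, 5, 2, 2, 1, 1, 1])
evenness_case = shannon([5, 5])
```

    In the toy table the estimate exceeds the 7 observed OTUs because singletons suggest unseen taxa; when a rarefaction curve has plateaued, Chao1 and the observed count tend to agree, which is the consistency the abstract refers to.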

  19. [An overview of feature selection algorithm in bioinformatics].

    PubMed

    Li, Xin; Ma, Li; Wang, Jinjia; Zhao, Chun

    2011-04-01

    Feature selection (FS) techniques have become an important tool in the field of bioinformatics. Their core idea is to select a significant low-dimensional subset of features from a high-dimensional data space, and thus to reveal the underlying structure of the data. Bioinformatics data typically have high dimensionality and small sample sizes, so research on FS algorithms in bioinformatics has great prospects. In this article, we make the interested reader aware of the possibilities of feature selection, describe the basic properties of feature selection techniques, and discuss their uses in sequence analysis, microarray analysis, mass spectra analysis, etc. Finally, the current problems and prospects of feature selection algorithms in bioinformatics applications are also discussed. PMID:21604512
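
    As an illustration of the simplest family of FS techniques, a univariate filter, the sketch below ranks features by a signal-to-noise score between two classes. The abstract does not name a specific algorithm, so the scoring rule, function name, and toy data are assumptions chosen for brevity.

```python
from statistics import mean, pstdev


def rank_features(samples, labels, top_k):
    """Return indices of the top_k features by a signal-to-noise score.

    samples: list of equal-length feature vectors; labels: 0 or 1 per sample.
    Score = |mean_0 - mean_1| / (sd_0 + sd_1); each feature is scored alone.
    """
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row, y in zip(samples, labels) if y == 0]
        b = [row[j] for row, y in zip(samples, labels) if y == 1]
        spread = pstdev(a) + pstdev(b)
        # A small constant avoids division by zero for constant features.
        scores.append(abs(mean(a) - mean(b)) / (spread + 1e-9))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:top_k]


# Toy data (hypothetical): 4 samples, 3 features; only feature 0 separates the classes well.
samples = [[1.0, 5.0, 0.1], [1.2, 4.0, 0.2], [3.0, 5.5, 0.1], [3.2, 4.5, 0.2]]
labels = [0, 0, 1, 1]
selected = rank_features(samples, labels, top_k=2)
```

    Filters of this kind are cheap enough for data with thousands of features and few samples, but they ignore interactions between features, which wrapper and embedded methods try to capture.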

  20. Global computing for bioinformatics.

    PubMed

    Loewe, Laurence

    2002-12-01

    Global computing, the collaboration of idle PCs via the Internet in a SETI@home style, emerges as a new way of massive parallel multiprocessing with potentially enormous CPU power. Its relations to the broader, fast-moving field of Grid computing are discussed without attempting a review of the latter. This review (i) includes a short table of milestones in global computing history, (ii) lists opportunities global computing offers for bioinformatics, (iii) describes the structure of problems well suited for such an approach, (iv) analyses the anatomy of successful projects and (v) points to existing software frameworks. Finally, an evaluation of the various costs shows that global computing indeed has merit, if the problem to be solved is already coded appropriately and a suitable global computing framework can be found. Then, either significant amounts of computing power can be recruited from the general public, or--if employed in an enterprise-wide Intranet for security reasons--idle desktop PCs can substitute for an expensive dedicated cluster. PMID:12511066

  1. Sequence and Bioinformatic Analysis of Family 1 Glycoside Hydrolase (GH) 1 Gene from the Oomycete Pythium myriotylum Drechsler.

    PubMed

    Nair, R Aswati; Geethu, C; Sangwan, Amit; Pillai, P Padmesh

    2015-06-01

    The oomycetous phytopathogen Pythium myriotylum secretes cellulases for growth/nutrition of the necrotroph. Cellulases are multi-enzyme system classified into different glycoside hydrolase (GH) families. The present study deals with identification and characterization of GH gene sequence from P. myriotylum by a PCR strategy using consensus primers. Cloning of the full-length gene sequence using genome walker strategy resulted in identification of 1230-bp P. myriotylum GH gene sequence, designated as PmGH1. Analysis revealed that PmGH1 encodes a predicted cytoplasmic 421 amino acid protein with an apparent molecular weight of 46.77 kDa and a theoretical pI of 8.11. Tertiary structure of the deduced amino acid sequence showed typical (α/β)8 barrel folding of family 1 GHs. Sequence characterization of PmGH1 identified the conserved active site residues, viz., Glu 181 and Glu 399, that function as acid-base catalyst and catalytically active nucleophile, respectively. Binding sites for N-acetyl-D-glucosamine (NAG) were revealed in the PmGH1 3D structure with Glu181 and Glu399 positioned on either side to form a catalytic pair. Phylogenetic analysis indicated a closer affiliation of PmGH1 with sequences of GH1 family. Results presented are first attempts providing novel insights into the evolutionary and functional perspectives of the identified P. myriotylum GH. PMID:25877398

  2. Genome-wide identification and evolutionary analysis of algal LPAT genes involved in TAG biosynthesis using bioinformatic approaches.

    PubMed

    Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar

    2014-12-01

    Lysophosphatidyl acyltransferase (LPAT) is one of the major triacylglycerol synthesis enzymes, controlling the metabolic flow of lysophosphatidic acid to phosphatidic acid. Experimental studies in Arabidopsis have shown that LPAT activity is exhibited primarily by three distinct isoforms, namely the plastid-located LPAT1, the endoplasmic reticulum-located LPAT2, and the soluble isoform of LPAT (solLPAT). In this study, 24 putative genes representing all LPAT isoforms were identified from the analysis of 11 complete genomes including green algae, red algae, diatoms and higher plants. We observed LPAT1 and solLPAT genes to be ubiquitously present in nearly all genomes examined, whereas LPAT2 genes to have evolved more recently in the plant lineage. Phylogenetic analysis indicated that LPAT1, LPAT2 and solLPAT have convergently evolved through separate evolutionary paths and belong to three different gene families, which was further evidenced by their wide divergence at gene structure and sequence level. The genome distribution supports the hypothesis that each gene encoding a LPAT is not duplicated. Mapping of exon-intron structure of LPAT genes to the domain structure of proteins across different algal and plant species indicates that exon shuffling plays no role in the evolution of LPAT genes. Besides the previously defined motifs, several conserved consensus sequences were discovered which could be useful to distinguish different LPAT isoforms. Taken together, this study will enable the generation of experimental approximations to better understand the functional role of algal LPAT in lipid accumulation. PMID:25280541

  3. Bioinformatics analysis of thousands of TCGA tumors to determine the involvement of epigenetic regulators in human cancer

    PubMed Central

    2015-01-01

    Background Many cancer cells show distorted epigenetic landscapes. The Cancer Genome Atlas (TCGA) project profiles thousands of tumors, allowing the discovery of somatic alterations in the epigenetic machinery and the identification of potential cancer drivers among members of epigenetic protein families. Methods We integrated mutation, expression, and copy number data from 5943 tumors from 13 cancer types to train a classification model that predicts the likelihood of being an oncogene (OG), tumor suppressor (TSG) or neutral gene (NG). We applied this predictor to epigenetic regulator genes (ERGs), and used differential expression and correlation network analysis to identify dysregulated ERGs along with co-expressed cancer genes. Furthermore, we quantified global proteomic changes by mass spectrometry after EZH2 inhibition. Results Mutation-based classifiers uncovered the OG-like profile of DNMT3A and TSG-like profiles for several ERGs. Differential gene expression and correlation network analyses revealed that EZH2 is the most significantly over-expressed ERG in cancer and is co-regulated with a cell cycle network. Proteomic analysis showed that EZH2 inhibition induced down-regulation of cell cycle regulators in lymphoma cells. Conclusions Using classical driver genes to train an OG/TSG predictor, we determined the most predictive features at the gene level. Our predictor uncovered one OG and several TSGs among ERGs. Expression analyses elucidated multiple dysregulated ERGs including EZH2 as member of a co-expressed cell cycle network. PMID:26110843

  4. RADARS, a bioinformatics solution that automates proteome mass spectral analysis, optimises protein identification, and archives data in a relational database.

    PubMed

    Field, Helen I; Fenyö, David; Beavis, Ronald C

    2002-01-01

    RADARS, a rapid, automated, data archiving and retrieval software system for high-throughput proteomic mass spectral data processing and storage, is described. The majority of mass spectrometer data files are compatible with RADARS, for consistent processing. The system automatically takes unprocessed data files, identifies proteins via in silico database searching, then stores the processed data and search results in a relational database suitable for customized reporting. The system is robust, used in 24/7 operation, accessible to multiple users of an intranet through a web browser, may be monitored by Virtual Private Network, and is secure. RADARS is scalable for use on one or many computers, and is suited to multiple processor systems. It can incorporate any local database in FASTA format, and can search protein and DNA databases online. A key feature is a suite of visualisation tools (many available gratis), allowing facile manipulation of spectra, by hand annotation, reanalysis, and access to all procedures. We also describe the use of Sonar MS/MS, a novel, rapid search engine requiring 40 MB RAM per process for searches against a genomic or EST database translated in all six reading frames. RADARS reduces the cost of analysis by its efficient algorithms: Sonar MS/MS can identify proteins without accurate knowledge of the parent ion mass and without protein tags. Statistical scoring methods provide close-to-expert accuracy and bring robust data analysis to the non-expert user. PMID:11788990
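
    The abstract above mentions searching a genomic or EST database "translated in all six reading frames". The following is a minimal sketch of what six-frame translation means, assuming the standard genetic code and plain DNA strings; it is not the Sonar MS/MS or RADARS implementation, and the function names are hypothetical.

```python
# Illustrative six-frame translation (assumption: standard genetic code,
# uppercase/lowercase A/C/G/T input). Not the Sonar MS/MS implementation.

BASES = "TCAG"
# Amino acids for all 64 codons, enumerated with bases in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons become 'X',
    and a trailing partial codon is ignored."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translate(dna):
    """Return translations for frames +1..+3 and -1..-3 (hypothetical labels)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

    A search engine of this kind would then match observed peptide masses against each of the six translated frames, since the coding strand and frame of a genomic or EST sequence are not known in advance.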

  5. Molecular Cloning, Bioinformatics Analysis and Expression of Insulin-Like Growth Factor 2 from Tianzhu White Yak, Bos grunniens

    PubMed Central

    Zhang, Quanwei; Gong, Jishang; Wang, Xueying; Wu, Xiaohu; Li, Yalan; Ma, Youji; Zhang, Yong; Zhao, Xingxu

    2014-01-01

    The IGF family is essential for normal embryonic and postnatal development and plays important roles in the immune system, myogenesis, bone metabolism and other physiological functions, which makes the study of its structure and biological characteristics important. Tianzhu white yak (Bos grunniens) domesticated under alpine hypoxia environments, is well adapted to survive and grow against severe hypoxia and cold temperatures for extended periods. In this study, a full coding sequence of the IGF2 gene of Tianzhu white yak was amplified by reverse transcription PCR and rapid-amplification of cDNA ends (RACE) for the first time. The cDNA sequence revealed an open reading frame of 450 nucleotides, encoding a protein with 179 amino acids. Its expression in different tissues was also studied by Real time PCR. Phylogenetic tree analysis indicated that yak IGF2 was similar to Bos taurus, and 3D structure showed high similarity with the human IGF2. The putative full CDS of yak IGF2 was amplified by PCR in five tissues, and cDNA sequence analysis showed high homology to bovine IGF2. Moreover the super secondary structure prediction showed a similar 3D structure with human IGF2. Its conservation in sequence and structure has facilitated research on IGF2 and its physiological function in yak. PMID:24394317

  6. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software

    PubMed Central

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians. PMID:25996054

  7. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software.

    PubMed

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians. PMID:25996054

  8. Online Tools for Bioinformatics Analyses in Nutrition Sciences

    PubMed Central

    Malkaram, Sridhar A.; Hassan, Yousef I.; Zempleni, Janos

    2012-01-01

    Recent advances in “omics” research have resulted in the creation of large datasets that were generated by consortiums and centers, small datasets that were generated by individual investigators, and bioinformatics tools for mining these datasets. It is important for nutrition laboratories to take full advantage of the analysis tools to interrogate datasets for information relevant to genomics, epigenomics, transcriptomics, proteomics, and metabolomics. This review provides guidance regarding bioinformatics resources that are currently available in the public domain, with the intent to provide a starting point for investigators who want to take advantage of the opportunities provided by the bioinformatics field. PMID:22983844

  9. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    ERIC Educational Resources Information Center

    Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…

  10. Optimal Multicomponent Analysis Using the Generalized Standard Addition Method.

    ERIC Educational Resources Information Center

    Raymond, Margaret; And Others

    1983-01-01

    Describes an experiment on the simultaneous determination of chromium and magnesium by spectrophotometry modified to include the Generalized Standard Addition Method computer program, a multivariate calibration method that provides optimal multicomponent analysis in the presence of interference and matrix effects. Provides instructions for…

  11. Bioinformatics analysis of hepatitis C virus genotype 2a-induced human hepatocellular carcinoma in Huh7 cells

    PubMed Central

    Xu, Ping; Wu, Meiying; Chen, Hui; Xu, Junchi; Wu, Minjuan; Li, Ming; Qian, Feng; Xu, Junhua

    2016-01-01

    Hepatocellular carcinoma (HCC) is a liver cancer that could be induced by hepatitis C virus genotype 2a Japanese fulminant hepatitis-1 (JFH-1) strain. The aim of this study was to investigate the molecular mechanisms of HCC. The microarray data GSE20948 includes 14 JFH-1- and 14 mock (equal volume of medium [control])-infected Huh7 samples. The data were downloaded from the Gene Expression Omnibus. After data processing, soft cluster analyses were performed to identify co-regulated genes with similar temporal expression patterns. Functional and pathway enrichment analyses, as well as functional annotation analysis, were performed. Subsequently, combined networks of protein–protein interaction network, microRNA regulatory network, and transcriptional regulatory network were constructed. Hub nodes, modules, and five clusters of co-regulated genes were also identified. In total, 173 up and 207 down co-regulated genes were separately identified in JFH-1-infected Huh7 cells compared with those of control cells. Functional enrichment analysis indicated that up co-regulated genes were related to skeletal system morphogenesis and neuron differentiation and down co-regulated genes were related to steroid/cholesterol/sterol metabolisms. Hub genes (such as IRF1, GBP1, ICAM1, Foxa1, DHCR7, HMGCS2, and MSMO1) were identified. Transcription factors IRF1 and Foxa1 were the targets of miR-130a, miR-17-5p, and miR-20a. PPARGC1A was targeted by miR-29 family, and MSMO1 was the target of miR-23 family. Hub nodes (such as IRF1, GBP1, ICAM1, Foxa1, DHCR7, HMGCS2, and MSMO1) and microRNAs might be used as candidate biomarkers of JFH-1-infected HCC. PMID:26811688

  12. Bioinformatics analysis of time-series genes profiling to explore key genes affected by age in fracture healing.

    PubMed

    Wang, Wei; Shen, Hao; Xie, Jingjing; Zhou, Qiang; Chen, Yu; Lu, Hua

    2014-06-01

    The present study aimed to explore possible key genes and biological processes affected by age during fracture healing. GSE589, GSE592 and GSE1371 were downloaded from the Gene Expression Omnibus database. The time-series genes of rats at three age levels were first identified with the hclust function in R. Functional and pathway enrichment analyses of the selected time-series genes were then performed. Finally, the VennDiagram package of R was used to screen overlapping time-series genes. The expression changes of time-series genes in rats of the three age levels were classified into two types: one was highly expressed at day 0, decreased from day 3 to week 2, and increased from week 4 to week 6; the other was the opposite. Functional and pathway enrichment analysis showed that 12 time-series genes of adult and old rats were significantly involved in the ECM-receptor interaction pathway. The expression changes of 11 genes were consistent with the time axis; 10 genes were up-regulated at 3 days after fracture and increased slowly by 6 weeks, while Itga2b was down-regulated. The functions of 106 overlapping genes were all associated with growth and development of bone after fracture. The key genes in the ECM-receptor interaction pathway, including Spp1, Ibsp, Tnn and Col3a1, have been reported in the literature to be related to fracture. The difference during fracture healing in rats of the three age levels is mainly related to age. Spp1, Ibsp, Tnn and Col3a1 are potential age-related genes, and the ECM-receptor interaction pathway is a potential age-related process during fracture healing. PMID:24627361

  13. Bioinformatics analysis of hepatitis C virus genotype 2a-induced human hepatocellular carcinoma in Huh7 cells.

    PubMed

    Xu, Ping; Wu, Meiying; Chen, Hui; Xu, Junchi; Wu, Minjuan; Li, Ming; Qian, Feng; Xu, Junhua

    2016-01-01

    Hepatocellular carcinoma (HCC) is a liver cancer that could be induced by hepatitis C virus genotype 2a Japanese fulminant hepatitis-1 (JFH-1) strain. The aim of this study was to investigate the molecular mechanisms of HCC. The microarray data GSE20948 includes 14 JFH-1- and 14 mock (equal volume of medium [control])-infected Huh7 samples. The data were downloaded from the Gene Expression Omnibus. After data processing, soft cluster analyses were performed to identify co-regulated genes with similar temporal expression patterns. Functional and pathway enrichment analyses, as well as functional annotation analysis, were performed. Subsequently, combined networks of protein-protein interaction network, microRNA regulatory network, and transcriptional regulatory network were constructed. Hub nodes, modules, and five clusters of co-regulated genes were also identified. In total, 173 up and 207 down co-regulated genes were separately identified in JFH-1-infected Huh7 cells compared with those of control cells. Functional enrichment analysis indicated that up co-regulated genes were related to skeletal system morphogenesis and neuron differentiation and down co-regulated genes were related to steroid/cholesterol/sterol metabolisms. Hub genes (such as IRF1, GBP1, ICAM1, Foxa1, DHCR7, HMGCS2, and MSMO1) were identified. Transcription factors IRF1 and Foxa1 were the targets of miR-130a, miR-17-5p, and miR-20a. PPARGC1A was targeted by miR-29 family, and MSMO1 was the target of miR-23 family. Hub nodes (such as IRF1, GBP1, ICAM1, Foxa1, DHCR7, HMGCS2, and MSMO1) and microRNAs might be used as candidate biomarkers of JFH-1-infected HCC. PMID:26811688

  14. Screening feature genes of astrocytoma using a combined method of microarray gene expression profiling and bioinformatics analysis

    PubMed Central

    Cai, Yong; Zhong, Xingming; Wang, Yiqi; Yang, Jianguo

    2015-01-01

    The aim of our study was to find feature genes associated with astrocytoma and correlative gene functions which can distinguish cancer tissue from adjacent non-tumor astrocyte tissues. Gene expression profile GSE15824, which included 8 astrocytoma tissues and 3 adjacent non-tumor astrocyte samples, was downloaded from the Gene Expression Omnibus database. The raw data were first transformed into probe-level data, and the differentially expressed genes (DEGs) between tissues of patients with astrocytoma and normal specimens were identified using a t-test in the samr package of R. The Database for Annotation, Visualization and Integrated Discovery (DAVID) was applied to analyze the gene ontology (GO) enrichment on gene functions and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Finally, corresponding protein-protein interaction (PPI) networks of DEGs were constructed using Cytoscape based on the data collected from STRING online datasets. A total of 3072 genes, including 1799 up-regulated genes and 1273 down-regulated genes, were filtered as DEGs. The DEGs included AQP4, PMP2, SRARCL1 and SLC1A2CAMs, among others, and AQP4 was most significantly related to cell osmotic pressure. Three feature genes in KEGG pathway are highly enriched in cancer specimens while two genes are in the normal tissues. The discovery of feature genes significantly related to the regulation of cell osmotic pressure has potential clinical use for the diagnosis of astrocytoma in future. In addition, it is of great significance for studying the mechanism, distinguishing normal and cancer tissues, and exploring new treatments for astrocytoma. However, further experiments are needed to confirm our result. PMID:26770395

  15. Screening feature genes of astrocytoma using a combined method of microarray gene expression profiling and bioinformatics analysis.

    PubMed

    Cai, Yong; Zhong, Xingming; Wang, Yiqi; Yang, Jianguo

    2015-01-01

    The aim of our study was to find feature genes associated with astrocytoma and correlative gene functions which can distinguish cancer tissue from adjacent non-tumor astrocyte tissues. Gene expression profile GSE15824, which included 8 astrocytoma tissues and 3 adjacent non-tumor astrocyte samples, was downloaded from the Gene Expression Omnibus database. The raw data were first transformed into probe-level data, and the differentially expressed genes (DEGs) between tissues of patients with astrocytoma and normal specimens were identified using a t-test in the samr package of R. The Database for Annotation, Visualization and Integrated Discovery (DAVID) was applied to analyze the gene ontology (GO) enrichment on gene functions and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Finally, corresponding protein-protein interaction (PPI) networks of DEGs were constructed using Cytoscape based on the data collected from STRING online datasets. A total of 3072 genes, including 1799 up-regulated genes and 1273 down-regulated genes, were filtered as DEGs. The DEGs included AQP4, PMP2, SRARCL1 and SLC1A2CAMs, among others, and AQP4 was most significantly related to cell osmotic pressure. Three feature genes in KEGG pathway are highly enriched in cancer specimens while two genes are in the normal tissues. The discovery of feature genes significantly related to the regulation of cell osmotic pressure has potential clinical use for the diagnosis of astrocytoma in future. In addition, it is of great significance for studying the mechanism, distinguishing normal and cancer tissues, and exploring new treatments for astrocytoma. However, further experiments are needed to confirm our result. PMID:26770395

  16. VLSI Microsystem for Rapid Bioinformatic Pattern Recognition

    NASA Technical Reports Server (NTRS)

    Fang, Wai-Chi; Lue, Jaw-Chyng

    2009-01-01

    A system comprising very-large-scale integrated (VLSI) circuits is being developed as a means of bioinformatics-oriented analysis and recognition of patterns of fluorescence generated in a microarray in an advanced, highly miniaturized, portable genetic-expression-assay instrument. Such an instrument implements an on-chip combination of polymerase chain reactions and electrochemical transduction for amplification and detection of deoxyribonucleic acid (DNA).

  17. Bioinformatic and Genetic Association Analysis of MicroRNA Target Sites in One-Carbon Metabolism Genes

    PubMed Central

    Stone, Nicole; Pangilinan, Faith; Molloy, Anne M.; Shane, Barry; Scott, John M.; Ueland, Per Magne; Mills, James L.; Kirke, Peader N.; Sethupathy, Praveen; Brody, Lawrence C.

    2011-01-01

    One-carbon metabolism (OCM) is linked to DNA synthesis and methylation, amino acid metabolism and cell proliferation. OCM dysfunction has been associated with increased risk for various diseases, including cancer and neural tube defects. MicroRNAs (miRNAs) are ∼22 nt RNA regulators that have been implicated in a wide array of basic cellular processes, such as differentiation and metabolism. Accordingly, mis-regulation of miRNA expression and/or activity can underlie complex disease etiology. We examined the possibility of OCM regulation by miRNAs. Using computational miRNA target prediction methods and Monte-Carlo based statistical analyses, we identified two candidate miRNA “master regulators” (miR-22 and miR-125) and one candidate pair of “master co-regulators” (miR-344-5p/484 and miR-488) that may influence the expression of a significant number of genes involved in OCM. Interestingly, miR-22 and miR-125 are significantly up-regulated in cells grown under low-folate conditions. In a complementary analysis, we identified 15 single nucleotide polymorphisms (SNPs) that are located within predicted miRNA target sites in OCM genes. We genotyped these 15 SNPs in a population of healthy individuals (age 18–28, n = 2,506) that was previously phenotyped for various serum metabolites related to OCM. Prior to correction for multiple testing, we detected significant associations between TCblR rs9426 and methylmalonic acid (p = 0.045), total homocysteine levels (tHcy) (p = 0.033), serum B12 (p < 0.0001), holo transcobalamin (p < 0.0001) and total transcobalamin (p < 0.0001); and between MTHFR rs1537514 and red blood cell folate (p < 0.0001). However, upon further genetic analysis, we determined that in each case, a linked missense SNP is the more likely causative variant. Nonetheless, our Monte-Carlo based in silico simulations suggest that miRNAs could play an important role in the regulation of OCM. PMID:21765920
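
    The abstract above refers to "Monte-Carlo based statistical analyses" without giving details. As a rough sketch of one common form such an analysis can take, the code below estimates an empirical p-value for the overlap between a pathway gene set and a miRNA's predicted targets by repeatedly drawing random gene sets of the same size. The sampling scheme, the function name and all parameters are assumptions for illustration, not the authors' procedure.

```python
import random


def empirical_overlap_p_value(universe, pathway_genes, target_genes,
                              n_iter=10000, seed=0):
    """Monte Carlo estimate of how often a random gene set of the same size
    as the pathway contains at least as many predicted targets as observed.

    Assumption: genes are drawn uniformly without replacement from
    `universe`; real analyses often also match on features such as
    3' UTR length.
    """
    rng = random.Random(seed)            # fixed seed for reproducibility
    pathway = set(pathway_genes)
    targets = set(target_genes)
    observed = len(pathway & targets)    # observed number of targeted genes
    pool = sorted(set(universe))         # deterministic sampling order

    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(pool, len(pathway))
        overlap = sum(1 for gene in sample if gene in targets)
        if overlap >= observed:
            hits += 1

    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)
```

    With this estimator the smallest reportable p-value is 1 / (n_iter + 1), so the number of iterations bounds the resolution of the test.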

  18. Identification of microRNAs in the Toxigenic Dinoflagellate Alexandrium catenella by High-Throughput Illumina Sequencing and Bioinformatic Analysis

    PubMed Central

    Geng, Huili; Sui, Zhenghong; Zhang, Shu; Du, Qingwei; Ren, Yuanyuan; Liu, Yuan; Kong, Fanna; Zhong, Jie; Ma, Qingxia

    2015-01-01

    Micro-ribonucleic acids (miRNAs) are a large group of endogenous, tiny, non-coding RNAs consisting of 19–25 nucleotides that regulate gene expression at either the transcriptional or post-transcriptional level by mediating gene silencing in eukaryotes. They are considered to be important regulators that affect growth, development, and response to various stresses in plants. Alexandrium catenella is an important marine toxic phytoplankton species that can cause harmful algal blooms (HABs). To date, identification and function analysis of miRNAs in A. catenella remain largely unexamined. In this study, high-throughput sequencing was performed on A. catenella to identify and quantitatively profile the repertoire of small RNAs from two different growth phases. A total of 38,092,056 and 32,969,156 raw reads were obtained from the two small RNA libraries, respectively. In total, 88 mature miRNAs belonging to 32 miRNA families were identified. Significant differences were found in the member number, expression level of various families, and expression abundance of each member within a family. A total of 15 potentially novel miRNAs were identified. Comparative profiling showed that 12 known miRNAs exhibited differential expression between the lag phase and the logarithmic phase. Real-time quantitative RT-PCR (qPCR) was performed to confirm the expression of two differentially expressed miRNAs that were one up-regulated novel miRNA (aca-miR-3p-456915), and one down-regulated conserved miRNA (tae-miR159a). The expression trend of the qPCR assay was generally consistent with the deep sequencing result. Target predictions of the 12 differentially expressed miRNAs resulted in 1813 target genes. Gene ontology (GO) analysis and the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG) annotations revealed that some miRNAs were associated with growth and developmental processes of the alga. These results provide insights into the roles that miRNAs play in the growth of

  19. Identification of Genetic Defects in 33 Probands with Stargardt Disease by WES-Based Bioinformatics Gene Panel Analysis

    PubMed Central

    Xin, Wei; Xiao, Xueshan; Li, Shiqiang; Jia, Xiaoyun; Guo, Xiangming; Zhang, Qingjiong

    2015-01-01

    Stargardt disease (STGD) is the most common hereditary macular degeneration in juveniles, with loss of central vision occurring in the first or second decade of life. The aim of this study is to identify the genetic defects in 33 probands with Stargardt disease. Clinical data and genomic DNA were collected from 33 probands from unrelated families with STGD. Variants in coding genes were initially screened by whole exome sequencing. Candidate variants were selected from all known genes associated with hereditary retinal dystrophy and then confirmed by Sanger sequencing. Putative pathogenic variants were further validated in available family members and controls. Potential pathogenic mutations were identified in 19 of the 33 probands (57.6%). These mutations were all present in ABCA4, but not in the other four STGD-associated genes or in genes responsible for other retinal dystrophies. Of the 19 probands, ABCA4 mutations were homozygous in one proband and compound heterozygous in 18 probands, involving 28 variants (13 novel and 15 known). Analysis of normal controls and available family members in 12 of the 19 families further support the pathogenicity of these variants. Clinical manifestation of all probands met the diagnostic criteria of STGD. This study provides an overview of a genetic basis for STGD in Chinese patients. Mutations in ABCA4 are the most common cause of STGD in this cohort. Genetic defects in approximately 42.4% of STGD patients await identification in future studies. PMID:26161775

  20. Expressional and Bioinformatic Analysis of Bovine Filia/Ecat1/Khdc3l Gene: A Comparison with Ovine Species.

    PubMed

    Zahmatkesh, Azadeh; Ansari Mahyari, Saeid; Daliri Joupari, Morteza; Rahmani, Hamidreza; Shirazi, Abolfazl; Amiri Roudbar, Mahmood; Ansari Majd, Saeid

    2016-01-01

    Maternal effect genes have highly impressive effects on pre-implantation development. Filia/Ecat1/Khdc3l is a maternal effect gene found in mouse oocytes and embryos, loss of which causes a 50% decrease in fertility. In the present study, we investigated Filia mRNA expression in bovine oviduct, 30- to 40-day fetus, liver, heart, lung, and oocytes (as a positive control), by RT-PCR and detected it only in oocytes. A 443 bp fragment was amplified only in oocytes and was sequenced as a part of bovine predicted Filia mRNA. We analyzed bovine and ovine Filia N-terminal peptide sequence in PHYRE2, and a KH domain was predicted. Protein alignment using ClustalW indicated a highly identical N-terminal extension between the 2 species. Immunohistochemical analysis using anti-bovine Filia antibody showed the expression of Filia protein in the zone surrounding the nuclear membrane, and in the subcortex of ovine oocytes of primary and antral follicles. However, in the bovine, Filia has been found throughout the oocyte cytoplasm of antral follicles, and here it is further confirmed in the primary follicles. Our data suggest a difference in Filia expression pattern between cow and sheep, although the sequence is highly conserved. PMID:27070240

  1. Bioinformatics and Molecular Analysis of the Evolutionary Relationship between Bovine Rhinitis A Viruses and Foot-And-Mouth Disease Virus

    PubMed Central

    Rai, Devendra K.; Lawrence, Paul; Pauszek, Steve J.; Piccone, Maria E.; Knowles, Nick J.; Rieder, Elizabeth

    2015-01-01

    Bovine rhinitis viruses (BRVs) cause mild respiratory disease of cattle. In this study, a near full-length genome sequence of a virus named RS3X (formerly classified as bovine rhinovirus type 1), isolated from infected cattle from the UK in the 1960s, was obtained and analyzed. Compared to other closely related Aphthoviruses, major differences were detected in the leader protease (Lpro), P1, 2B, and 3A proteins. Phylogenetic analysis revealed that RS3X was a member of the species bovine rhinitis A virus (BRAV). Using different codon-based and branch-site selection models for Aphthoviruses, including BRAV RS3X and foot-and-mouth disease virus, we observed no clear evidence for genomic regions undergoing positive selection. However, within each of the BRV species, multiple sites under positive selection were detected. The results also suggest that the probability (determined by Recombination Detection Program) for recombination events between BRVs and other Aphthoviruses, including foot-and-mouth disease virus was not significant. In contrast, within BRVs, the probability of recombination increases. The data reported here provide genetic information to assist in the identification of diagnostic signatures and research tools for BRAV. PMID:27081310

  2. Identification of a novel carbohydrate esterase from Bjerkandera adusta: structural and function predictions through bioinformatics analysis and molecular modeling.

    PubMed

    Cuervo-Soto, Laura I; Valdés-García, Gilberto; Batista-García, Ramón; del Rayo Sánchez-Carbente, María; Balcázar-López, Edgar; Lira-Ruan, Verónica; Pastor, Nina; Folch-Mallol, Jorge Luis

    2015-03-01

    A new gene from Bjerkandera adusta strain UAMH 8258 encoding a carbohydrate esterase (designated as BaCesI) was isolated and expressed in Pichia pastoris. The gene had an open reading frame of 1410 bp encoding a polypeptide of 470 amino acid residues, the first 18 serving as a secretion signal peptide. Homology and phylogenetic analyses showed that BaCesI belongs to carbohydrate esterases family 4. Three-dimensional modeling of the protein and normal mode analysis revealed a breathing mode of the active site that could be relevant for esterase activity. Furthermore, the overall negative electrostatic potential of this enzyme suggests that it degrades neutral substrates and will not act on negative substrates such as peptidoglycan or p-nitrophenol derivatives. The enzyme shows a specific activity of 1.118 U mg⁻¹ protein on 2-naphthyl acetate. No activity was detected on p-nitrophenol derivatives as proposed from the electrostatic potential data. The deacetylation activity of the recombinant BaCesI was confirmed by measuring the release of acetic acid from several substrates, including oat xylan, shrimp shell chitin, N-acetylglucosamine, and natural substrates such as sugar cane bagasse and grass. This makes the protein very interesting for the biofuels production industry from lignocellulosic materials and for the production of chitosan from chitin. PMID:25586442

  3. Exploring the immunogenome with bioinformatics.

    PubMed

    de Bono, Bernard; Trowsdale, John

    2003-08-01

    A better description of the immune system can be afforded if the latest developments in bioinformatics are applied to integrate sequence with structure and function. Clear guidelines for the upgrade of the bioinformatic capability of the immunogenetics laboratory are discussed in the light of more powerful methods to detect homology, combined approaches to predict the three dimensional properties of a protein and a robust strategy to represent the biological role of a gene. PMID:14690048

  4. Identification of microRNA-mRNA interactions in atrial fibrillation using microarray expression profiles and bioinformatics analysis

    PubMed Central

    WANG, TAO; WANG, BIN

    2016-01-01

    The present study integrated microRNA (miRNA) and mRNA expression data obtained from atrial fibrillation (AF) tissues and healthy tissues, in order to identify miRNAs and target genes that may be important in the development of AF. The GSE28954 miRNA expression profile and GSE2240 mRNA gene expression profile were downloaded from the Gene Expression Omnibus. Differentially expressed miRNAs and genes (DEGs) in AF tissues, compared with control samples, were identified and hierarchically clustered. Subsequently, differentially expressed miRNAs and DEGs were searched for in the miRecords database and TarBase, and were used to construct a regulatory network using Cytoscape. Finally, functional analysis of the miRNA-targeted genes was conducted. After data processing, 71 differentially expressed miRNAs and 390 DEGs were identified between AF and normal tissues. A total of 3,506 miRNA-mRNA pairs were selected, of which 372 were simultaneously predicted by both miRecords and TarBase, and were therefore used to construct the miRNA-mRNA regulatory network. Furthermore, 10 miRNAs and 12 targeted mRNAs were detected, which formed 14 interactive pairs. The miRNA-targeted genes were significantly enriched in 14 Gene Ontology (GO) categories, of which the most significant was gene expression regulation (GO 10468), which was associated with 7 miRNAs and 8 target genes. These results suggest that the screened miRNAs and target genes may be target molecules in AF development, and may be beneficial for the early diagnosis and future treatment of AF. PMID:27082053
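
    The step of keeping only miRNA-mRNA pairs reported by both databases amounts to a set intersection. The following is a minimal sketch of that idea, not the authors' pipeline; the function name, the miRNA identifiers and the gene names are all made up for illustration.

```python
# Hypothetical illustration: keep only the (miRNA, target gene) pairs
# that two independent resources both report.
def consensus_pairs(pairs_a, pairs_b):
    return set(pairs_a) & set(pairs_b)


# Toy data; these identifiers are placeholders, not results from the study.
db_one = {("miR-A", "GENE1"), ("miR-A", "GENE2"), ("miR-B", "GENE3")}
db_two = {("miR-A", "GENE2"), ("miR-B", "GENE3"), ("miR-C", "GENE4")}

shared = consensus_pairs(db_one, db_two)
print(sorted(shared))
```

    Requiring agreement between sources trades recall for precision: fewer pairs survive, but each has support from both resources.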

  5. Bioinformatic analysis of neurotropic HIV envelope sequences identifies polymorphisms in the gp120 bridging sheet that increase macrophage-tropism through enhanced interactions with CCR5

    SciTech Connect

    Mefford, Megan E.; Kunstman, Kevin; Wolinsky, Steven M.; Gabuzda, Dana

    2015-07-15

    Macrophages express low levels of the CD4 receptor compared to T-cells. Macrophage-tropic HIV strains replicating in brain of untreated patients with HIV-associated dementia (HAD) express Envs that are adapted to overcome this restriction through mechanisms that are poorly understood. Here, bioinformatic analysis of env sequence datasets together with functional studies identified polymorphisms in the β3 strand of the HIV gp120 bridging sheet that increase M-tropism. D197, which results in loss of an N-glycan located near the HIV Env trimer apex, was detected in brain in some HAD patients, while position 200 was estimated to be under positive selection. D197 and T/V200 increased fusion and infection of cells expressing low CD4 by enhancing gp120 binding to CCR5. These results identify polymorphisms in the HIV gp120 bridging sheet that overcome the restriction to macrophage infection imposed by low CD4 through enhanced gp120–CCR5 interactions, thereby promoting infection of brain and other macrophage-rich tissues. - Highlights: • We analyze HIV Env sequences and identify amino acids in beta 3 of the gp120 bridging sheet that enhance macrophage tropism. • These amino acids at positions 197 and 200 are present in brain of some patients with HIV-associated dementia. • D197 results in loss of a glycan near the HIV Env trimer apex, which may increase exposure of V3. • These variants may promote infection of macrophages in the brain by enhancing gp120–CCR5 interactions.
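
    The loss of an N-glycan through an Asn-to-Asp change can be illustrated with the standard N-linked glycosylation sequon, N-X-S/T with X not proline. The sketch below scans a sequence for that motif; the six-residue fragment and its numbering are invented for illustration and are not taken from any real Env sequence.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported as well.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def sequon_positions(seq: str, first_residue_number: int = 1) -> list:
    """Return the residue numbers of the Asn in each sequon."""
    return [m.start() + first_residue_number for m in SEQUON.finditer(seq.upper())]


# Toy fragment, numbered so that its third residue is position 197 (assumption).
with_asn = "ACNGTQ"
with_asp = "ACDGTQ"  # same fragment with Asn replaced by Asp

print(sequon_positions(with_asn, 195))
print(sequon_positions(with_asp, 195))
```

    A sequon only marks a potential glycosylation site; whether a glycan is actually attached has to be established experimentally.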

  6. Bioinformatics Approach in Plant Genomic Research.

    PubMed

    Ong, Quang; Nguyen, Phuc; Thao, Nguyen Phuong; Le, Ly

    2016-08-01

    Advances in genomics technology have dramatically changed plant biology research. Plant biologists can now easily access enormous amounts of genomic data to study high-density genetic variation in plants at the molecular level. A thorough understanding and skilled use of bioinformatics tools to manage and analyze these data are therefore essential in current plant genome research. Many plant genome databases have been established and continue to expand. Meanwhile, bioinformatics-based analytical methods are well developed in many aspects of plant genomic research, including comparative genomic analysis, phylogenomics and evolutionary analysis, and genome-wide association studies. However, the constant need to upgrade computational infrastructure, such as high-capacity data storage and high-performance analysis software, is the real challenge for plant genome research. This review focuses on the challenges and opportunities that knowledge and skills in bioinformatics can bring to plant scientists in the present plant genomics era, as well as on the critical future need for effective tools to facilitate the translation of knowledge from new sequencing data into enhanced plant productivity. PMID:27499685

  7. Functional and Bioinformatics Analysis of Two Campylobacter jejuni Homologs of the Thiol-Disulfide Oxidoreductase, DsbA

    PubMed Central

    Grabowska, Anna D.; Wywiał, Ewa; Dunin-Horkawicz, Stanislaw; Łasica, Anna M.; Wösten, Marc M. S. M.; Nagy-Staroń, Anna; Godlewska, Renata; Bocian-Ostrzycka, Katarzyna; Pieńkowska, Katarzyna; Łaniewski, Paweł; Bujnicki, Janusz M.; van Putten, Jos P. M.; Jagusztyn-Krynicka, E. Katarzyna

    2014-01-01

    Background Bacterial Dsb enzymes are involved in the oxidative folding of many proteins, through the formation of disulfide bonds between their cysteine residues. The Dsb protein network has been well characterized in cells of the model microorganism Escherichia coli. To gain insight into the functioning of the Dsb system in epsilon-Proteobacteria, where it plays an important role in the colonization process, we studied two homologs of the main Escherichia coli Dsb oxidase (EcDsbA) that are present in the cells of the enteric pathogen Campylobacter jejuni, the most frequently reported bacterial cause of human enteritis in the world. Methods and Results Phylogenetic analysis suggests the horizontal transfer of the epsilon-Proteobacterial DsbAs from a common ancestor to gamma-Proteobacteria, which then gave rise to the DsbL lineage. Phenotype and enzymatic assays suggest that the two C. jejuni DsbAs play different roles in bacterial cells and have divergent substrate spectra. CjDsbA1 is essential for the motility and autoagglutination phenotypes, while CjDsbA2 has no impact on those processes. CjDsbA1 plays a critical role in the oxidative folding that ensures the activity of alkaline phosphatase CjPhoX, whereas CjDsbA2 is crucial for the activity of arylsulfotransferase CjAstA, encoded within the dsbA2-dsbB-astA operon. Conclusions Our results show that CjDsbA1 is the primary thiol-oxidoreductase affecting life processes associated with bacterial spread and host colonization, as well as ensuring the oxidative folding of particular protein substrates. In contrast, CjDsbA2 activity does not affect the same processes and so far its oxidative folding activity has been demonstrated for one substrate, arylsulfotransferase CjAstA. The results suggest the cooperation between CjDsbA2 and CjDsbB. In the case of the CjDsbA1, this cooperation is not exclusive and there is probably another protein to be identified in C. jejuni cells that acts to re-oxidize CjDsbA1. Altogether

  8. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by the so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class

  9. Bioinformatics Analysis of the Complete Genome Sequence of the Mango Tree Pathogen Pseudomonas syringae pv. syringae UMAF0158 Reveals Traits Relevant to Virulence and Epiphytic Lifestyle.

    PubMed

    Martínez-García, Pedro Manuel; Rodríguez-Palenzuela, Pablo; Arrebola, Eva; Carrión, Víctor J; Gutiérrez-Barranquero, José Antonio; Pérez-García, Alejandro; Ramos, Cayo; Cazorla, Francisco M; de Vicente, Antonio

    2015-01-01

    The genomes of more than 100 Pseudomonas syringae strains have been sequenced to date; however, only a few of them have been fully assembled, including P. syringae pv. syringae B728a. Different strains of pv. syringae cause different diseases and have different host specificities; UMAF0158 is a P. syringae pv. syringae strain related to B728a, but instead of being a bean pathogen it causes apical necrosis of mango trees, and the two strains belong to different phylotypes of pv. syringae and clades of P. syringae. In this study we report the complete sequence and annotation of the P. syringae pv. syringae UMAF0158 chromosome and plasmid pPSS158. A comparative analysis with the available sequenced genomes of 25 other P. syringae strains, both closed (the reference genomes DC3000, 1448A and B728a) and draft genomes, was performed. The 5.8 Mb UMAF0158 chromosome has 59.3% GC content and comprises 5017 predicted protein-coding genes. Bioinformatics analysis revealed the presence of genes potentially implicated in the virulence and epiphytic fitness of this strain. We identified several genetic features, which are absent in B728a, that may explain the ability of UMAF0158 to colonize and infect mango trees: the mangotoxin biosynthetic operon mbo, a gene cluster for cellulose production, two different type III and two type VI secretion systems, and a particular T3SS effector repertoire. A mutant strain defective in the rhizobial-like T3SS Rhc showed no differences compared to wild-type during its interaction with host and non-host plants and worms. Here we report the first complete sequence of the chromosome of a pv. syringae strain pathogenic to a woody plant host. Our data also shed light on the genetic factors that possibly determine the pathogenic and epiphytic lifestyle of UMAF0158. This work provides the basis for further analysis on specific mechanisms that enable this strain to infect woody plants and for the functional analysis of host specificity in the P
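
    GC content, reported above as 59.3% for the chromosome, is the share of G and C among the unambiguous bases of a sequence. A minimal sketch of the calculation follows; the function name and the toy sequences are illustrative, and ambiguous bases such as N are simply ignored here, which is an assumption rather than a universal convention.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the A, C, G and T bases of a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


print(gc_content("ATGC"))
print(round(gc_content("GGCCAT"), 1))
```

    For a genome-scale sequence the same function applies unchanged, although reading the sequence in chunks would keep memory use down.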

  10. Bioinformatics Analysis of the Complete Genome Sequence of the Mango Tree Pathogen Pseudomonas syringae pv. syringae UMAF0158 Reveals Traits Relevant to Virulence and Epiphytic Lifestyle

    PubMed Central

    Arrebola, Eva; Carrión, Víctor J.; Gutiérrez-Barranquero, José Antonio; Pérez-García, Alejandro; Ramos, Cayo; Cazorla, Francisco M.; de Vicente, Antonio

    2015-01-01

    The genomes of more than 100 Pseudomonas syringae strains have been sequenced to date; however, only a few of them have been fully assembled, including P. syringae pv. syringae B728a. Different strains of pv. syringae cause different diseases and have different host specificities; UMAF0158 is a P. syringae pv. syringae strain related to B728a, but instead of being a bean pathogen it causes apical necrosis of mango trees, and the two strains belong to different phylotypes of pv. syringae and clades of P. syringae. In this study we report the complete sequence and annotation of the P. syringae pv. syringae UMAF0158 chromosome and plasmid pPSS158. A comparative analysis with the available sequenced genomes of 25 other P. syringae strains, both closed (the reference genomes DC3000, 1448A and B728a) and draft genomes, was performed. The 5.8 Mb UMAF0158 chromosome has 59.3% GC content and comprises 5017 predicted protein-coding genes. Bioinformatics analysis revealed the presence of genes potentially implicated in the virulence and epiphytic fitness of this strain. We identified several genetic features, which are absent in B728a, that may explain the ability of UMAF0158 to colonize and infect mango trees: the mangotoxin biosynthetic operon mbo, a gene cluster for cellulose production, two different type III and two type VI secretion systems, and a particular T3SS effector repertoire. A mutant strain defective in the rhizobial-like T3SS Rhc showed no differences compared to wild-type during its interaction with host and non-host plants and worms. Here we report the first complete sequence of the chromosome of a pv. syringae strain pathogenic to a woody plant host. Our data also shed light on the genetic factors that possibly determine the pathogenic and epiphytic lifestyle of UMAF0158. This work provides the basis for further analysis on specific mechanisms that enable this strain to infect woody plants and for the functional analysis of host specificity in the P

  11. Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics

    PubMed Central

    Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.

    2012-01-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849
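
    To make the schema-free, HTTP-based design concrete, the sketch below builds (but does not send) the request that would store one gene-centric JSON document in a CouchDB database, using CouchDB's document API (PUT to /database/document-id). The database name, document fields and server address are assumptions for illustration; they are not the actual geneSmash schema.

```python
import json
from urllib.request import Request


def build_put_request(base_url: str, db: str, doc_id: str, doc: dict) -> Request:
    """Prepare an HTTP PUT that would create or update one CouchDB document."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        url=f"{base_url}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical gene-centric record; documents need no predefined schema,
# so fields can differ from one gene to the next.
gene_doc = {"symbol": "TP53", "aliases": ["p53"], "annotations": {"source_x": "placeholder"}}

# 5984 is CouchDB's default port; a local server is assumed here.
req = build_put_request("http://localhost:5984", "genes", "TP53", gene_doc)
print(req.get_method(), req.full_url)
```

    Passing the request to urllib.request.urlopen would perform the write against a running server; that call is left out so the sketch runs without one.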

  12. Relax with CouchDB--into the non-relational DBMS era of bioinformatics.

    PubMed

    Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R

    2012-07-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849

  13. Integration of Proteomics, Bioinformatics, and Systems Biology in Traumatic Brain Injury Biomarker Discovery

    PubMed Central

    Guingab-Cagmat, J.D.; Cagmat, E.B.; Hayes, R.L.; Anagli, J.

    2013-01-01

    Traumatic brain injury (TBI) is a major medical crisis without any FDA-approved pharmacological therapies that have been demonstrated to improve functional outcomes. It has been argued that discovery of disease-relevant biomarkers might help to guide successful clinical trials for TBI. Major advances in mass spectrometry (MS) have revolutionized the field of proteomic biomarker discovery and facilitated the identification of several candidate markers that are being further evaluated for their efficacy as TBI biomarkers. However, several hurdles have to be overcome even during the discovery phase, which is only the first step in the long process of biomarker development. The high-throughput nature of MS-based proteomic experiments generates a massive amount of mass spectral data, presenting great challenges in downstream interpretation. Currently, different bioinformatics platforms are available for functional analysis and data mining of MS-generated proteomic data. These tools provide a way to convert data sets to biologically interpretable results and functional outcomes. A strategy that has promise in advancing biomarker development involves the triad of proteomics, bioinformatics, and systems biology. In this review, a brief overview of how bioinformatics and systems biology tools analyze, transform, and interpret complex MS datasets into biologically relevant results is discussed. In addition, challenges and limitations of proteomics, bioinformatics, and systems biology in TBI biomarker discovery are presented. A brief survey of studies that utilized these three overlapping disciplines in TBI biomarker discovery is also presented. Finally, examples of TBI biomarkers and their applications are discussed. PMID:23750150

  14. Novel bioinformatic developments for exome sequencing.

    PubMed

    Lelieveld, Stefan H; Veltman, Joris A; Gilissen, Christian

    2016-06-01

    With the widespread adoption of next generation sequencing technologies by the genetics community and the rapid decrease in costs per base, exome sequencing has become a standard within the repertoire of genetic experiments for both research and diagnostics. Although bioinformatics now offers standard solutions for the analysis of exome sequencing data, many challenges still remain; especially the increasing scale at which exome data are now being generated has given rise to novel challenges in how to efficiently store, analyze and interpret exome data of this magnitude. In this review we discuss some of the recent developments in bioinformatics for exome sequencing and the directions that this is taking us to. With these developments, exome sequencing is paving the way for the next big challenge, the application of whole genome sequencing. PMID:27075447

  15. Analysis and Evaluation of Supersonic Underwing Heat Addition

    NASA Technical Reports Server (NTRS)

    Luidens, Roger W.; Flaherty, Richard J.

    1959-01-01

    The linearized theory for heat addition under a wing has been developed to optimize wing geometry, heat addition, and angle of attack. The optimum wing has all of the thickness on the underside of the airfoil, with maximum-thickness point well downstream, has a moderate thickness ratio, and operates at an optimum angle of attack. The heat addition is confined between the fore Mach waves from under the trailing surface of the wing. By linearized theory, a wing at optimum angle of attack may have a range efficiency about twice that of a wing at zero angle of attack. More rigorous calculations using the method of characteristics for particular flow models were made for heating under a flat-plate wing and for several wings with thickness, both with heat additions concentrated near the wing. The more rigorous calculations yield in practical cases efficiencies about half those estimated by linear theory. An analysis indicates that distributing the heat addition between the fore waves from the undertrailing portion of the wing is a way of improving the performance, and further calculations appear desirable. A comparison of the conventional ramjet-plus-wing with underwing heat addition when the heat addition is concentrated near the wing shows the ramjet to be superior on a range basis up to Mach number of about 8. The heat distribution under the wing and the assumed ramjet and airframe performance may have a marked effect on this conclusion. Underwing heat addition can be useful in providing high-altitude maneuver capability at high flight Mach numbers for an airplane powered by conventional ramjets during cruise.

  16. Bioinformatic pipelines in Python with Leaf

    PubMed Central

    2013-01-01

    Background An incremental, loosely planned development approach is often used in bioinformatic studies when dealing with custom data analysis in a rapidly changing environment. Unfortunately, the lack of a rigorous software structuring can undermine the maintainability, communicability and replicability of the process. To ameliorate this problem we propose the Leaf system, the aim of which is to seamlessly introduce the pipeline formality on top of a dynamical development process with minimum overhead for the programmer, thus providing a simple layer of software structuring. Results Leaf includes a formal language for the definition of pipelines with code that can be transparently inserted into the user’s Python code. Its syntax is designed to visually highlight dependencies in the pipeline structure it defines. While encouraging the developer to think in terms of bioinformatic pipelines, Leaf supports a number of automated features including data and session persistence, consistency checks between steps of the analysis, processing optimization and publication of the analytic protocol in the form of a hypertext. Conclusions Leaf offers a powerful balance between plan-driven and change-driven development environments in the design, management and communication of bioinformatic pipelines. Its unique features make it a valuable alternative to other related tools. PMID:23786315
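
    The idea of declaring dependencies between analysis steps and letting the system work out the execution order can be sketched with the Python standard library. The example below is a generic illustration of dependency-ordered execution and does not use Leaf's own pipeline language or API; the step names and functions are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(steps: dict, deps: dict):
    """Run each step after its dependencies, passing their results as arguments."""
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    for name in order:
        inputs = [results[d] for d in deps.get(name, ())]
        results[name] = steps[name](*inputs)
    return order, results


# Hypothetical three-step analysis: load -> filter -> summarize.
steps = {
    "load": lambda: [3, 1, 4, 1, 5],
    "filter": lambda values: [v for v in values if v > 1],
    "summarize": lambda values: sum(values),
}
# Each step maps to the steps it depends on.
deps = {"load": [], "filter": ["load"], "summarize": ["filter"]}

order, results = run_pipeline(steps, deps)
print(order, results["summarize"])
```

    Keeping the dependency map separate from the step functions is what makes features such as caching intermediate results or re-running only the affected steps possible, although neither is implemented in this sketch.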

  17. Multifunctionality and diversity of GDSL esterase/lipase gene family in rice (Oryza sativa L. japonica) genome: new insights from bioinformatics analysis

    PubMed Central

    2012-01-01

    Background GDSL esterases/lipases are a newly discovered subclass of lipolytic enzymes that are very important and attractive research subjects because of their multifunctional properties, such as broad substrate specificity and regiospecificity. Compared with the current knowledge regarding these enzymes in bacteria, our understanding of the plant GDSL enzymes is very limited, although the GDSL gene family includes numerous members in many fully sequenced plant genomes. Only two genes from the large rice GDSL esterase/lipase gene family were previously characterised, and the majority of the members remain unknown. In the present study, we describe the rice OsGELP (Oryza sativa GDSL esterase/lipase protein) gene family at the genomic and proteomic levels, and use this knowledge to provide insights into the multifunctionality of the rice OsGELP enzymes. Results In this study, an extensive bioinformatics analysis identified 114 genes in the rice OsGELP gene family. A complete overview of this family in rice is presented, including the chromosome locations, gene structures, phylogeny, and protein motifs. Among the OsGELPs and the plant GDSL esterase/lipase proteins of known functions, 41 motifs were found that represent the core secondary structure elements or appear specifically in different phylogenetic subclades. The specification and distribution of the identified putative conserved clade-common and clade-specific peptide motifs, and their location on the predicted protein three-dimensional structure, may signify their functional roles. Potentially important regions for substrate specificity are highlighted, in accordance with the protein three-dimensional model and the location of the phylogenetically specific conserved motifs. The differential expression of some representative genes was confirmed by quantitative real-time PCR. The phylogenetic analysis, together with protein motif architectures, and the expression profiling were analysed to predict the

  18. Bioinformatics on the Cloud Computing Platform Azure

    PubMed Central

    Shanahan, Hugh P.; Owen, Anne M.; Harrison, Andrew P.

    2014-01-01

    We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development. PMID:25050811

  19. Bioinformatics on the cloud computing platform Azure.

    PubMed

    Shanahan, Hugh P; Owen, Anne M; Harrison, Andrew P

    2014-01-01

    We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development. PMID:25050811

  20. ANALYSIS OF MPC ACCESS REQUIREMENTS FOR ADDITION OF FILLER MATERIALS

    SciTech Connect

    W. Wallin

    1996-09-03

    This analysis is prepared by the Mined Geologic Disposal System (MGDS) Waste Package Development Department (WPDD) in response to a request received via a QAP-3-12 Design Input Data Request (Ref. 5.1) from WAST Design (formerly MRSMPC Design). The request is to provide: Specific MPC access requirements for the addition of filler materials at the MGDS (i.e., location and size of access required). The objective of this analysis is to provide a response to the foregoing request. The purpose of this analysis is to provide a documented record of the basis for the response. The response is stated in Section 8 herein. The response is based upon requirements from an MGDS perspective.

  1. Intrageneric primer design: Bringing bioinformatics tools to the class.

    PubMed

    Lima, André O S; Garcês, Sérgio P S

    2006-09-01

    Bioinformatics is one of the fastest growing scientific areas over the last decade. It focuses on the use of informatics tools for the organization and analysis of biological data. An example of their importance is the availability nowadays of dozens of software programs for genomic and proteomic studies. Thus, there is a growing field (private and academic) with a need for bachelor of science students with bioinformatics skills. In consideration of this need, described here is a problem-based class in which students are asked to design a set of intrageneric primers for PCR. The exercise is divided into five classes of 1 h each, in which students use freeware bioinformatics tools and data bases available through the Internet. Besides designing the set of primers, the students will consequently learn the significance and use of the major bioinformatics procedures, such as searching a data base, conducting and analyzing sequence multialignment, comparing sequences with a data base, and selecting primers. PMID:21638710
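
    As a small illustration of the primer-selection step, the sketch below estimates a primer's melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, a rough rule of thumb intended for short oligonucleotides. The abstract does not say which tools or formulas the class uses, so the rule, the function name and the example primer are assumptions.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    if at + gc != len(p):
        raise ValueError("primer must contain only A, C, G and T")
    return 2 * at + 4 * gc


# Made-up 18-nt primer, not taken from the described exercise.
example_primer = "ATGCATGCATGCATGCAT"
print(wallace_tm(example_primer))
```

    Dedicated primer-design software typically uses more detailed thermodynamic models, so a value from this rule is best treated as a first-pass estimate.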

  2. Translational bioinformatics in psychoneuroimmunology: methods and applications.

    PubMed

    Yan, Qing

    2012-01-01

    Translational bioinformatics plays an indispensable role in transforming psychoneuroimmunology (PNI) into personalized medicine. It provides a powerful method to bridge the gaps between various knowledge domains in PNI and systems biology. Translational bioinformatics methods at various systems levels can facilitate pattern recognition, and expedite and validate the discovery of systemic biomarkers to allow their incorporation into clinical trials and outcome assessments. Analysis of the correlations between genotypes and phenotypes including the behavioral-based profiles will contribute to the transition from the disease-based medicine to human-centered medicine. Translational bioinformatics would also enable the establishment of predictive models for patient responses to diseases, vaccines, and drugs. In PNI research, the development of systems biology models such as those of the neurons would play a critical role. Methods based on data integration, data mining, and knowledge representation are essential elements in building health information systems such as electronic health records and computerized decision support systems. Data integration of genes, pathophysiology, and behaviors are needed for a broad range of PNI studies. Knowledge discovery approaches such as network-based systems biology methods are valuable in studying the cross-talks among pathways in various brain regions involved in disorders such as Alzheimer's disease. PMID:22933157

  3. Application of Bioinformatics in Chronobiology Research

    PubMed Central

    Lopes, Robson da Silva; Resende, Nathalia Maria; Honorio-França, Adenilda Cristina; França, Eduardo Luzía

    2013-01-01

    Bioinformatics and other well-established sciences, such as molecular biology, genetics, and biochemistry, provide a scientific approach for the analysis of data generated through “omics” projects that may be used in studies of chronobiology. The results of studies that apply these techniques demonstrate how they significantly aided the understanding of chronobiology. However, bioinformatics tools alone cannot eliminate the need for an understanding of the field of research or the data to be considered, nor can such tools replace analysts and researchers. It is often necessary to conduct an evaluation of the results of a data mining effort to determine the degree of reliability. To this end, familiarity with the field of investigation is necessary. It is evident that the knowledge that has been accumulated through chronobiology and the use of tools derived from bioinformatics has contributed to the recognition and understanding of the patterns and biological rhythms found in living organisms. The current work aims to develop new and important applications in the near future through chronobiology research. PMID:24187519

  4. Web services at the European Bioinformatics Institute-2009

    PubMed Central

    Mcwilliam, Hamish; Valentin, Franck; Goujon, Mickael; Li, Weizhong; Narayanasamy, Menaka; Martin, Jenny; Miyar, Teresa; Lopez, Rodrigo

    2009-01-01

    The European Bioinformatics Institute (EMBL-EBI) has been providing access to mainstream databases and tools in bioinformatics since 1997. In addition to the traditional web form based interfaces, APIs exist for core data resources such as EMBL-Bank, Ensembl, UniProt, InterPro, PDB and ArrayExpress. These APIs are based on Web Services (SOAP/REST) interfaces that allow users to systematically access databases and analytical tools. From the user's point of view, these Web Services provide the same functionality as the browser-based forms. However, the APIs free the user from web page constraints and are ideal for the analysis of large batches of data, performing text-mining tasks and the casual or systematic evaluation of mathematical models in regulatory networks. Furthermore, these services are widespread and easy to use: they require no prior knowledge of the technology and no more than basic experience in programming. Here we describe new and updated services and briefly outline planned developments to be made available during the course of 2009–2010. PMID:19435877
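
The batch-access pattern this abstract describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual EMBL-EBI interface: the base URL, the query parameter names (`db`, `id`, `format`) and the helper names are hypothetical placeholders, and the network call is passed in as a function so that the sketch runs offline.

```python
from typing import Callable, Dict, Iterable
from urllib.parse import urlencode

# Hypothetical endpoint used only for illustration; not a real EMBL-EBI URL.
BASE_URL = "https://example.org/rest/search"


def build_query_url(database: str, identifier: str, fmt: str = "fasta") -> str:
    """Build one REST-style query URL (parameter names are assumptions)."""
    query = urlencode({"db": database, "id": identifier, "format": fmt})
    return f"{BASE_URL}?{query}"


def fetch_batch(
    identifiers: Iterable[str],
    database: str,
    fetch: Callable[[str], str],
) -> Dict[str, str]:
    """Query every identifier in turn and collect the responses by identifier.

    `fetch` performs the HTTP request; injecting it keeps the loop testable
    and lets a caller add rate limiting or retries.
    """
    results: Dict[str, str] = {}
    for identifier in identifiers:
        results[identifier] = fetch(build_query_url(database, identifier))
    return results


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP client so the example needs no network."""
    return f"response for {url}"


if __name__ == "__main__":
    batch = fetch_batch(["P12345", "Q8N158"], "uniprot", fake_fetch)
    for identifier, body in batch.items():
        print(identifier, "->", body)
```

The point of the design is that a loop over identifiers replaces repeated manual form submissions; a real client would substitute an HTTP call for `fake_fetch` and follow the service's own documentation for endpoints and parameters.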

  5. Spectral Envelopes and Additive + Residual Analysis/Synthesis

    NASA Astrophysics Data System (ADS)

    Rodet, Xavier; Schwarz, Diemo

    The subject of this chapter is the estimation, representation, modification, and use of spectral envelopes in the context of sinusoidal-additive-plus-residual analysis/synthesis. A spectral envelope is an amplitude-vs-frequency function, which may be obtained from the envelope of a short-time spectrum (Rodet et al., 1987; Schwarz, 1998). [Precise definitions of such an envelope and short-time spectrum (STS) are given in Section 2.] The additive-plus-residual analysis/synthesis method is based on a representation of signals in terms of a sum of time-varying sinusoids and of a non-sinusoidal residual signal [e.g., see Serra (1989), Laroche et al. (1993), McAulay and Quatieri (1995), and Ding and Qian (1997)]. Many musical sound signals may be described as a combination of a nearly periodic waveform and colored noise. The nearly periodic part of the signal can be viewed as a sum of sinusoidal components, called partials, with time-varying frequency and amplitude. Such sinusoidal components are easily observed on a spectral analysis display (Fig. 5.1) as obtained, for instance, from a discrete Fourier transform.
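
The signal model described above, a sum of time-varying sinusoidal partials plus a non-sinusoidal residual, can be illustrated with a small synthesis sketch. It rests on simplifying assumptions that are not from the chapter: the function name and parameters are hypothetical, each partial is given as a pair of frequency and amplitude functions of time, and uniform white noise stands in for the colored-noise residual.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# A partial is a pair (frequency in Hz, amplitude), each a function of time in seconds.
Partial = Tuple[Callable[[float], float], Callable[[float], float]]


def synthesize(
    partials: Sequence[Partial],
    residual_gain: float,
    n_samples: int,
    sample_rate: float,
    seed: int = 0,
) -> List[float]:
    """Additive-plus-residual synthesis: sum of partials plus scaled noise."""
    rng = random.Random(seed)
    phases = [0.0] * len(partials)
    signal: List[float] = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = 0.0
        for k, (freq_fn, amp_fn) in enumerate(partials):
            # Accumulate phase from the instantaneous frequency so that
            # frequency changes do not introduce phase jumps.
            phases[k] += 2.0 * math.pi * freq_fn(t) / sample_rate
            sample += amp_fn(t) * math.sin(phases[k])
        # Residual: white noise here, a simplification of colored noise.
        sample += residual_gain * rng.uniform(-1.0, 1.0)
        signal.append(sample)
    return signal


if __name__ == "__main__":
    # One partial gliding upward from 220 Hz with a decaying amplitude.
    glide = (lambda t: 220.0 + 20.0 * t, lambda t: math.exp(-3.0 * t))
    out = synthesize([glide], residual_gain=0.05, n_samples=8000, sample_rate=8000.0)
    print(len(out), "samples synthesized")
```

A spectral envelope would enter such a model as the function that sets each partial's amplitude from its frequency, or that shapes the noise spectrum of the residual.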

  6. The 2015 Bioinformatics Open Source Conference (BOSC 2015)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J. A.; Lapp, Hilmar

    2016-01-01

    The Bioinformatics Open Source Conference (BOSC) is organized by the Open Bioinformatics Foundation (OBF), a nonprofit group dedicated to promoting the practice and philosophy of open source software development and open science within the biological research community. Since its inception in 2000, BOSC has provided bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community. BOSC offers a focused environment for developers and users to interact and share ideas about standards; software development practices; practical techniques for solving bioinformatics problems; and approaches that promote open science and sharing of data, results, and software. BOSC is run as a two-day special interest group (SIG) before the annual Intelligent Systems in Molecular Biology (ISMB) conference. BOSC 2015 took place in Dublin, Ireland, and was attended by over 125 people, about half of whom were first-time attendees. Session topics included “Data Science;” “Standards and Interoperability;” “Open Science and Reproducibility;” “Translational Bioinformatics;” “Visualization;” and “Bioinformatics Open Source Project Updates”. In addition to two keynote talks and dozens of shorter talks chosen from submitted abstracts, BOSC 2015 included a panel, titled “Open Source, Open Door: Increasing Diversity in the Bioinformatics Open Source Community,” that provided an opportunity for open discussion about ways to increase the diversity of participants in BOSC in particular, and in open source bioinformatics in general. The complete program of BOSC 2015 is available online at http://www.open-bio.org/wiki/BOSC_2015_Schedule. PMID:26914653

  7. Generations of interdisciplinarity in bioinformatics

    PubMed Central

    Bartlett, Andrew; Lewis, Jamie; Williams, Matthew L.

    2016-01-01

    Bioinformatics, a specialism propelled into relevance by the Human Genome Project and the subsequent -omic turn in the life sciences, is an interdisciplinary field of research. Qualitative work on the disciplinary identities of bioinformaticians has revealed the tensions involved in work in this “borderland.” As part of our ongoing work on the emergence of bioinformatics, between 2010 and 2011, we conducted a survey of United Kingdom-based academic bioinformaticians. Building on insights drawn from our fieldwork over the past decade, we present results from this survey relevant to a discussion of disciplinary generation and stabilization. Not only is there evidence of an attitudinal divide between the different disciplinary cultures that make up bioinformatics, but there are distinctions between the forerunners, founders and the followers; as inter/disciplines mature, they face challenges that are both inter-disciplinary and inter-generational in nature. PMID:27453689

  8. Robust Bioinformatics Recognition with VLSI Biochip Microsystem

    NASA Technical Reports Server (NTRS)

    Lue, Jaw-Chyng L.; Fang, Wai-Chi

    2006-01-01

    A microsystem architecture for real-time, on-site, robust bioinformatic pattern recognition and analysis has been proposed. This system is compatible with on-chip DNA analysis methods such as polymerase chain reaction (PCR) amplification. A corresponding novel artificial neural network (ANN) learning algorithm, using a new sigmoid-logarithmic transfer function and based on the error backpropagation (EBP) algorithm, has been developed. Our results show that the trained new ANN recognizes low-fluorescence patterns better than the conventional sigmoidal ANN. A differential logarithmic imaging chip is designed to calculate the logarithm of the relative intensities of fluorescence signals. The single-rail logarithmic circuit and a prototype ANN chip are designed, fabricated and characterized.

  9. Comparison of Online and Onsite Bioinformatics Instruction for a Fully Online Bioinformatics Master’s Program

    PubMed Central

    Obom, Kristina M.; Cummings, Patrick J.

    2007-01-01

    The completely online Master of Science in Bioinformatics program differs from the onsite program only in the mode of content delivery. Analysis of student satisfaction indicates no statistically significant difference between most online and onsite student responses; however, online and onsite students do differ significantly in their responses to a few of the course evaluation questions. Analysis of student exam performance using three assessments indicates that there was no significant difference in grades earned by students in online and onsite courses. These results suggest that our model for online bioinformatics education provides students with a rigorous course of study that is comparable to onsite course instruction and possibly provides a more rigorous course load and more opportunities for participation. PMID:23653816

  10. Reproducible Bioinformatics Research for Biologists

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This book chapter describes the current Big Data problem in Bioinformatics and the resulting issues with performing reproducible computational research. The core of the chapter provides guidelines and summaries of current tools/techniques that a noncomputational researcher would need to learn to pe...

  11. Clinical Bioinformatics: challenges and opportunities

    PubMed Central

    2012-01-01

    Background: Network Tools and Applications in Biology (NETTAB) Workshops are a series of meetings focused on the most promising and innovative ICT tools and on their usefulness in Bioinformatics. The NETTAB 2011 workshop, held in Pavia, Italy, in October 2011, was aimed at presenting some of the most relevant methods, tools and infrastructures that are nowadays available for Clinical Bioinformatics (CBI), the research field that deals with clinical applications of bioinformatics. Methods: In this editorial, the viewpoints and opinions of three world CBI leaders, who were invited to participate in a panel discussion at the NETTAB workshop on the next challenges and future opportunities of this field, are reported. These include the development of data warehouses and ICT infrastructures for data sharing, the definition of standards for sharing phenotypic data and the implementation of novel tools for efficient search computing solutions. Results: Some of the most important design features of a CBI-ICT infrastructure are presented, including data warehousing, modularity and flexibility, open-source development, semantic interoperability, integrated search and retrieval of -omics information. Conclusions: Clinical Bioinformatics goals are ambitious. Many factors, including the availability of high-throughput "-omics" technologies and equipment, the widespread availability of clinical data warehouses and the noteworthy increase in data storage and computational power of the most recent ICT systems, justify research and efforts in this domain, which promises to be a crucial leveraging factor for biomedical research. PMID:23095472

  12. Computational intelligence techniques in bioinformatics.

    PubMed

    Hassanien, Aboul Ella; Al-Shammari, Eiman Tamah; Ghali, Neveen I

    2013-12-01

    Computational intelligence (CI) is a well-established paradigm with current systems having many of the characteristics of biological computers and capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state-of-the-art in CI applications to bioinformatics and motivate research in new trend-setting directions. In this article, we present an overview of the CI techniques in bioinformatics. We will show how CI techniques including neural networks, restricted Boltzmann machines, deep belief networks, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines could be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function and structure prediction. We discuss some representative methods to provide inspiring examples to illustrate how CI can be utilized to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented and an extensive bibliography is included. PMID:23891719

  13. Sensitivity analysis of geometric errors in additive manufacturing medical models.

    PubMed

    Pinto, Jose Miguel; Arrieta, Cristobal; Andia, Marcelo E; Uribe, Sergio; Ramos-Grez, Jorge; Vargas, Alex; Irarrazaval, Pablo; Tejos, Cristian

    2015-03-01

    Additive manufacturing (AM) models are used in medical applications for surgical planning, prosthesis design and teaching. For these applications, the accuracy of the AM models is essential. Unfortunately, this accuracy is compromised due to errors introduced by each of the building steps: image acquisition, segmentation, triangulation, printing and infiltration. However, the contribution of each step to the final error remains unclear. We performed a sensitivity analysis comparing errors obtained from a reference with those obtained modifying parameters of each building step. Our analysis considered global indexes to evaluate the overall error, and local indexes to show how this error is distributed along the surface of the AM models. Our results show that the standard building process tends to overestimate the AM models, i.e. models are larger than the original structures. They also show that the triangulation resolution and the segmentation threshold are critical factors, and that the errors are concentrated at regions with high curvatures. Errors could be reduced choosing better triangulation and printing resolutions, but there is an important need for modifying some of the standard building processes, particularly the segmentation algorithms. PMID:25649961
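
The kind of analysis described here, comparing the error of a reference configuration with the error obtained when one building-step parameter is modified at a time, can be sketched as below. Everything concrete in the example is an assumption made for illustration: the parameter names, the toy error formula and its coefficients are hypothetical and are not taken from the study.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def one_at_a_time_sensitivity(
    error_fn: Callable[[Params], float],
    reference: Params,
    alternatives: Dict[str, List[float]],
) -> Tuple[float, Dict[str, float]]:
    """Vary one parameter at a time and report the largest change in error.

    Each list in `alternatives` is assumed to be non-empty.
    """
    baseline = error_fn(reference)
    effects: Dict[str, float] = {}
    for name, values in alternatives.items():
        deltas = []
        for value in values:
            trial = dict(reference)  # copy so the reference stays untouched
            trial[name] = value
            deltas.append(error_fn(trial) - baseline)
        effects[name] = max(abs(delta) for delta in deltas)
    return baseline, effects


def toy_error(p: Params) -> float:
    """Hypothetical global error in mm; the coefficients are invented."""
    return (
        0.5 * p["triangle_size_mm"]
        + 2.0 * abs(p["threshold"] - 0.5)
        + 0.1 * p["layer_mm"]
    )


REFERENCE: Params = {"triangle_size_mm": 1.0, "threshold": 0.5, "layer_mm": 0.2}
ALTERNATIVES: Dict[str, List[float]] = {
    "triangle_size_mm": [0.5, 2.0],
    "threshold": [0.4, 0.6],
    "layer_mm": [0.1, 0.3],
}

if __name__ == "__main__":
    baseline, effects = one_at_a_time_sensitivity(toy_error, REFERENCE, ALTERNATIVES)
    print(f"baseline error: {baseline:.3f} mm")
    for name, effect in sorted(effects.items(), key=lambda item: -item[1]):
        print(f"{name}: max change {effect:.3f} mm")
```

Ranking parameters by their largest effect is one simple way to identify critical factors; it ignores interactions between parameters, which a fuller analysis would need to examine.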

  14. Bioinformatics in proteomics: application, terminology, and pitfalls.

    PubMed

    Wiemer, Jan C; Prokudin, Alexander

    2004-01-01

    Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge amounts of data. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating overly complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages. PMID:15237926

  15. Bioinformatic Insights from Metagenomics through Visualization

    SciTech Connect

    Havre, Susan L.; Webb-Robertson, Bobbie-Jo M.; Shah, Anuj; Posse, Christian; Gopalan, Banu; Brockman, Fred J.

    2005-08-10

    Cutting-edge biological and bioinformatics research seeks a systems perspective through the analysis of multiple types of high-throughput and other experimental data for the same sample. Systems-level analysis requires the integration and fusion of such data, typically through advanced statistics and mathematics. Visualization is a complementary computational approach that supports integration and analysis of complex data or its derivatives. We present a bioinformatics visualization prototype, Juxter, which depicts categorical information derived from or assigned to these diverse data for the purpose of comparing patterns across categorizations. The visualization allows users to easily discern correlated and anomalous patterns in the data. These patterns, which might not be detected automatically by algorithms, may reveal valuable information leading to insight and discovery. We describe the visualization and interaction capabilities and demonstrate its utility in a new field, metagenomics, which combines molecular biology and genetics to identify and characterize genetic material from multi-species microbial samples.

  16. Receptor-binding sites: bioinformatic approaches.

    PubMed

    Flower, Darren R

    2006-01-01

    It is increasingly clear that both transient and long-lasting interactions between biomacromolecules and their molecular partners are the most fundamental of all biological mechanisms and lie at the conceptual heart of protein function. In particular, the protein-binding site is the most fascinating and important mechanistic arbiter of protein function. In this review, I examine the nature of protein-binding sites found in both ligand-binding receptors and substrate-binding enzymes. I highlight two important concepts underlying the identification and analysis of binding sites. The first is based on knowledge: when one knows the location of a binding site in one protein, one can "inherit" the site from one protein to another. The second approach involves the a priori prediction of a binding site from a sequence or a structure. The full and complete analysis of binding sites will necessarily involve the full range of informatic techniques ranging from sequence-based bioinformatic analysis through structural bioinformatics to computational chemistry and molecular physics. Integration of both diverse experimental and diverse theoretical approaches is thus a mandatory requirement in the evaluation of binding sites and the binding events that occur within them. PMID:16671408

  17. Provenance in bioinformatics workflows

    PubMed Central

    2013-01-01

    In this work, we used the PROV-DM model to manage data provenance in workflows of genome projects. This provenance model allows the storage of details of one workflow execution, e.g., raw and produced data and computational tools, their versions and parameters. Using this model, biologists can access details of one particular execution of a workflow, compare results produced by different executions, and plan new experiments more efficiently. In addition to this, a provenance simulator was created, which facilitates the inclusion of provenance data of one genome project workflow execution. Finally, we discuss one case study, which aims to identify genes involved in specific metabolic pathways of Bacillus cereus, as well as to compare this isolate with other phylogenetically related bacteria from the Bacillus group. B. cereus is an extremophilic bacterium, collected in warm water in the Midwestern Region of Brazil, whose DNA samples were sequenced with an NGS machine. PMID:24564294
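
The information this abstract says is stored for one workflow execution (raw and produced data, tools, versions and parameters) can be illustrated with a small data-structure sketch. It borrows only the general PROV-DM idea of entities linked to activities through "used" and "generated" relations; the class names, fields, tool names and parameter values are hypothetical and do not reproduce the authors' schema or the full W3C model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    """A data item, either raw input or produced output."""
    entity_id: str
    description: str


@dataclass
class Activity:
    """One tool run, with its version, parameters and data links."""
    activity_id: str
    tool: str
    version: str
    parameters: Dict[str, str]
    used: List[str] = field(default_factory=list)
    generated: List[str] = field(default_factory=list)


@dataclass
class WorkflowExecution:
    execution_id: str
    entities: Dict[str, Entity] = field(default_factory=dict)
    activities: List[Activity] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def record(self, activity: Activity) -> None:
        """Store an activity after checking that its data links resolve."""
        for entity_id in activity.used + activity.generated:
            if entity_id not in self.entities:
                raise ValueError(f"unknown entity: {entity_id}")
        self.activities.append(activity)

    def lineage(self, entity_id: str) -> List[str]:
        """Return ids of the activities an entity derives from, latest first."""
        steps: List[str] = []
        pending = [entity_id]
        seen = set()
        while pending:
            current = pending.pop()
            if current in seen:
                continue
            seen.add(current)
            for activity in self.activities:
                if current in activity.generated and activity.activity_id not in steps:
                    steps.append(activity.activity_id)
                    pending.extend(activity.used)
        return steps


def example_execution() -> WorkflowExecution:
    """Build a two-step toy run; tool names and parameters are invented."""
    run = WorkflowExecution("run-1")
    for entity_id, text in [("reads", "raw reads"), ("contigs", "assembly"), ("genes", "annotation")]:
        run.add_entity(Entity(entity_id, text))
    run.record(Activity("assemble", "assembler-x", "1.0", {"k": "31"}, ["reads"], ["contigs"]))
    run.record(Activity("annotate", "annotator-y", "2.3", {"mode": "fast"}, ["contigs"], ["genes"]))
    return run


if __name__ == "__main__":
    print(example_execution().lineage("genes"))
```

Comparing two executions then amounts to comparing their recorded tools, versions and parameters step by step, which is the kind of query the abstract says the provenance model supports.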

  18. An agent-based multilayer architecture for bioinformatics grids.

    PubMed

    Bartocci, Ezio; Cacciagrano, Diletta; Cannata, Nicola; Corradini, Flavio; Merelli, Emanuela; Milanesi, Luciano; Romano, Paolo

    2007-06-01

    Due to the huge volume and complexity of biological data available today, a fundamental component of biomedical research is now in silico analysis. This includes modelling and simulation of biological systems and processes, as well as automated bioinformatics analysis of high-throughput data. The quest for bioinformatics resources (including databases, tools, and knowledge) therefore becomes extremely important. Bioinformatics itself is evolving rapidly, and dedicated Grid cyberinfrastructures already offer easier access and sharing of resources. Furthermore, the concept of the Grid is progressively interleaving with those of Web Services, semantics, and software agents. Agent-based systems can play a key role in learning, planning, interaction, and coordination. Agents also constitute a natural paradigm to engineer simulations of complex systems like the molecular ones. We present here an agent-based, multilayer architecture for bioinformatics Grids. It is intended to support both the execution of complex in silico experiments and the simulation of biological systems. In the architecture a pivotal role is assigned to an "alive" semantic index of resources, which is also expected to facilitate users' awareness of the bioinformatics domain. PMID:17695749

  19. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART).

    PubMed

    Sparapani, Rodney A; Logan, Brent R; McCulloch, Robert E; Laud, Purushottam W

    2016-07-20

    Bayesian additive regression trees (BART) provide a framework for flexible nonparametric modeling of relationships of covariates to outcomes. Recently, BART models have been shown to provide excellent predictive performance for both continuous and binary outcomes, exceeding that of their competitors. Software is also readily available for such outcomes. In this article, we introduce modeling that extends the usefulness of BART in medical applications by addressing needs arising in survival analysis. Simulation studies of one-sample and two-sample scenarios, in comparison with long-standing traditional methods, establish face validity of the new approach. We then demonstrate the model's ability to accommodate data from complex regression models with a simulation study of a nonproportional hazards scenario with crossing survival functions and survival function estimation in a scenario where hazards are multiplicatively modified by a highly nonlinear function of the covariates. Using data from a recently published study of patients undergoing hematopoietic stem cell transplantation, we illustrate the use and some advantages of the proposed method in medical investigations. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26854022

  20. Precessing rotating flows with additional shear: Stability analysis

    NASA Astrophysics Data System (ADS)

    Salhi, A.; Cambon, C.

    2009-03-01

    We consider unbounded precessing rotating flows in which vertical or horizontal shear is induced by the interaction between the solid-body rotation (with angular velocity Ω0) and the additional “precessing” Coriolis force (with angular velocity -ɛΩ0), normal to it. A “weak” shear flow, with rate 2ɛ of the same order of the Poincaré “small” ratio ɛ, is needed for balancing the gyroscopic torque, so that the whole flow satisfies Euler’s equations in the precessing frame (the so-called admissibility conditions). The base flow case with vertical shear (its cross-gradient direction is aligned with the main angular velocity) corresponds to Mahalov’s [Phys. Fluids A 5, 891 (1993)] precessing infinite cylinder base flow (ignoring boundary conditions), while the base flow case with horizontal shear (its cross-gradient direction is normal to both main and precessing angular velocities) corresponds to the unbounded precessing rotating shear flow considered by Kerswell [Geophys. Astrophys. Fluid Dyn. 72, 107 (1993)]. We show that both these base flows satisfy the admissibility conditions and can support disturbances in terms of advected Fourier modes. Because the admissibility conditions cannot select one case with respect to the other, a more physical derivation is sought: Both flows are deduced from Poincaré’s [Bull. Astron. 27, 321 (1910)] basic state of a precessing spheroidal container, in the limit of small ɛ. A rapid distortion theory (RDT) type of stability analysis is then performed for the previously mentioned disturbances, for both base flows. The stability analysis of the Kerswell base flow, using Floquet’s theory, is recovered, and its counterpart for the Mahalov base flow is presented. Typical growth rates are found to be the same for both flows at very small ɛ, but significant differences are obtained regarding growth rates and widths of instability bands, if larger ɛ values, up to 0.2, are considered. Finally, both flow cases

  1. Embracing the Future: Bioinformatics for High School Women

    NASA Astrophysics Data System (ADS)

    Zales, Charlotte Rappe; Cronin, Susan J.

    Sixteen high school women participated in a 5-week residential summer program designed to encourage female and minority students to choose careers in scientific fields. Students gained expertise in bioinformatics through problem-based learning in a complex learning environment of content instruction, speakers, labs, and trips. Innovative hands-on activities filled the program. Students learned biological principles in context and sophisticated bioinformatics tools for processing data. Students additionally mastered a variety of information-searching techniques. Students completed creative individual and group projects, demonstrating the successful integration of biology, information technology, and bioinformatics. Discussions with female scientists allowed students to see themselves in similar roles. Summer residential aspects fostered an atmosphere in which students matured in interacting with others and in their views of diversity.

  2. Using bioinformatics for drug target identification from the genome.

    PubMed

    Jiang, Zhenran; Zhou, Yanhong

    2005-01-01

    Genomics and proteomics technologies have created a paradigm shift in the drug discovery process, with bioinformatics having a key role in the exploitation of genomic, transcriptomic, and proteomic data to gain insights into the molecular mechanisms that underlie disease and to identify potential drug targets. We discuss the current state of the art for some of the bioinformatic approaches to identifying drug targets, including identifying new members of successful target classes and their functions, predicting disease relevant genes, and constructing gene networks and protein interaction networks. In addition, we introduce drug target discovery using the strategy of systems biology, and discuss some of the data resources for the identification of drug targets. Although bioinformatics tools and resources can be used to identify putative drug targets, validating targets is still a process that requires an understanding of the role of the gene or protein in the disease process and is heavily dependent on laboratory-based work. PMID:16336003

  3. ExPASy: SIB bioinformatics resource portal

    PubMed Central

    Artimo, Panu; Jonnalagedda, Manohar; Arnold, Konstantin; Baratin, Delphine; Csardi, Gabor; de Castro, Edouard; Duvaud, Séverine; Flegel, Volker; Fortier, Arnaud; Gasteiger, Elisabeth; Grosdidier, Aurélien; Hernandez, Céline; Ioannidis, Vassilios; Kuznetsov, Dmitry; Liechti, Robin; Moretti, Sébastien; Mostaguir, Khaled; Redaschi, Nicole; Rossier, Grégoire; Xenarios, Ioannis; Stockinger, Heinz

    2012-01-01

    ExPASy (http://www.expasy.org) has a worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved, becoming an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can henceforth seamlessly access a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a ‘decentralized’ way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across ‘selected’ resources. Additionally, the availability and usage of resources are monitored. The portal is aimed at both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy. PMID:22661580

  4. Technosciences in Academia: Rethinking a Conceptual Framework for Bioinformatics Undergraduate Curricula

    NASA Astrophysics Data System (ADS)

    Symeonidis, Iphigenia Sofia

    This paper aims to elucidate guiding concepts for the design of powerful undergraduate bioinformatics degrees which will lead to a conceptual framework for the curriculum. "Powerful" here should be understood as having truly bioinformatics objectives rather than enrichment of existing computer science or life science degrees on which bioinformatics degrees are often based. As such, the conceptual framework will be one which aims to demonstrate intellectual honesty with regard to the field of bioinformatics. A synthesis/conceptual analysis approach was followed as elaborated by Hurd (1983). The approach takes into account the following: bioinformatics educational needs and goals as expressed by different authorities, case studies of five undergraduate bioinformatics degrees, educational implications of bioinformatics as a technoscience and approaches to curriculum design promoting interdisciplinarity and integration. Given these considerations, guiding concepts emerged and a conceptual framework was elaborated. The practice of bioinformatics was given a closer look, which led to defining tool-integration skills and tool-thinking capacity as crucial areas of the bioinformatics activities spectrum. It was argued, finally, that a process-based curriculum as a variation of a concept-based curriculum (where the concepts are processes) might be more conducive to the teaching of bioinformatics given a foundational first year of integrated science education as envisioned by Bialek and Botstein (2004). Furthermore, the curriculum design needs to define new avenues of communication and learning which bypass the traditional disciplinary barriers of academic settings as undertaken by Tador and Tidmor (2005) for graduate studies.

  5. Intrageneric Primer Design: Bringing Bioinformatics Tools to the Class

    ERIC Educational Resources Information Center

    Lima, Andre O. S.; Garces, Sergio P. S.

    2006-01-01

    Bioinformatics is one of the fastest growing scientific areas over the last decade. It focuses on the use of informatics tools for the organization and analysis of biological data. An example of their importance is the availability nowadays of dozens of software programs for genomic and proteomic studies. Thus, there is a growing field (private…

  6. Bioinformatics for Next Generation Sequencing Data

    PubMed Central

    Magi, Alberto; Benelli, Matteo; Gozzini, Alessia; Girolami, Francesca; Torricelli, Francesca; Brandi, Maria Luisa

    2010-01-01

    The emergence of next-generation sequencing (NGS) platforms imposes increasing demands on statistical methods and bioinformatic tools for the analysis and the management of the huge amounts of data generated by these technologies. Even at the early stages of their commercial availability, a large number of software tools already exist for analyzing NGS data. These tools fit into many general categories including alignment of sequence reads to a reference, base-calling and/or polymorphism detection, de novo assembly from paired or unpaired reads, structural variant detection and genome browsing. This manuscript aims to guide readers in the choice of the available computational tools for the various steps of the data analysis workflow. PMID:24710047

  7. Bioinformatics in Africa: The Rise of Ghana?

    PubMed

    Karikari, Thomas K

    2015-09-01

    Until recently, bioinformatics, an important discipline in the biological sciences, was largely limited to countries with advanced scientific resources. Nonetheless, several developing countries have lately been making progress in bioinformatics training and applications. In Africa, leading countries in the discipline include South Africa, Nigeria, and Kenya. However, one country that is less known when it comes to bioinformatics is Ghana. Here, I provide a first description of the development of bioinformatics activities in Ghana and how these activities contribute to the overall development of the discipline in Africa. Over the past decade, scientists in Ghana have been involved in publications incorporating bioinformatics analyses, aimed at addressing research questions in biomedical science and agriculture. Scarce research funding and inadequate training opportunities are some of the challenges that need to be addressed for Ghanaian scientists to continue developing their expertise in bioinformatics. PMID:26378921

  8. Bioinformatics in Africa: The Rise of Ghana?

    PubMed Central

    Karikari, Thomas K.

    2015-01-01

    Until recently, bioinformatics, an important discipline in the biological sciences, was largely limited to countries with advanced scientific resources. Nonetheless, several developing countries have lately been making progress in bioinformatics training and applications. In Africa, leading countries in the discipline include South Africa, Nigeria, and Kenya. However, one country that is less known when it comes to bioinformatics is Ghana. Here, I provide a first description of the development of bioinformatics activities in Ghana and how these activities contribute to the overall development of the discipline in Africa. Over the past decade, scientists in Ghana have been involved in publications incorporating bioinformatics analyses, aimed at addressing research questions in biomedical science and agriculture. Scarce research funding and inadequate training opportunities are some of the challenges that need to be addressed for Ghanaian scientists to continue developing their expertise in bioinformatics. PMID:26378921

  9. Bioinformatics by Example: From Sequence to Target

    NASA Astrophysics Data System (ADS)

    Kossida, Sophia; Tahri, Nadia; Daizadeh, Iraj

    2002-12-01

    With the completion of the human genome, and the imminent completion of other large-scale sequencing and structure-determination projects, computer-assisted bioscience is aimed to become the new paradigm for conducting basic and applied research. The presence of these additional bioinformatics tools stirs great anxiety for experimental researchers (as well as for pedagogues), since they are now faced with a wider and deeper knowledge of differing disciplines (biology, chemistry, physics, mathematics, and computer science). This review targets those individuals who are interested in using computational methods in their teaching or research. By analyzing a real-life, pharmaceutical, multicomponent, target-based example the reader will experience this fascinating new discipline.

  10. Technical phosphoproteomic and bioinformatic tools useful in cancer research.

    PubMed

    López, Elena; Wesselink, Jan-Jaap; López, Isabel; Mendieta, Jesús; Gómez-Puertas, Paulino; Muñoz, Sarbelio Rodríguez

    2011-01-01

    Reversible protein phosphorylation is one of the most important forms of cellular regulation. Thus, phosphoproteomic analysis of protein phosphorylation in cells is a powerful tool to evaluate cell functional status. The importance of protein kinase-regulated signal transduction pathways in human cancer has led to the development of drugs that inhibit protein kinases at the apex or intermediary levels of these pathways. Phosphoproteomic analysis of these signalling pathways will provide important insights for operation and connectivity of these pathways to facilitate identification of the best targets for cancer therapies. Enrichment of phosphorylated proteins or peptides from tissue or bodily fluid samples is required. The application of technologies such as phosphoenrichments, mass spectrometry (MS) coupled to bioinformatics tools is crucial for the identification and quantification of protein phosphorylation sites for advancing in such relevant clinical research. A combination of different phosphopeptide enrichments, quantitative techniques and bioinformatic tools is necessary to achieve good phospho-regulation data and good structural analysis of protein studies. The current and most useful proteomics and bioinformatics techniques will be explained with research examples. Our aim in this article is to be helpful for cancer research via detailing proteomics and bioinformatic tools. PMID:21967744

  11. Technical phosphoproteomic and bioinformatic tools useful in cancer research

    PubMed Central

    2011-01-01

    Reversible protein phosphorylation is one of the most important forms of cellular regulation. Thus, phosphoproteomic analysis of protein phosphorylation in cells is a powerful tool to evaluate cell functional status. The importance of protein kinase-regulated signal transduction pathways in human cancer has led to the development of drugs that inhibit protein kinases at the apex or intermediary levels of these pathways. Phosphoproteomic analysis of these signalling pathways will provide important insights for operation and connectivity of these pathways to facilitate identification of the best targets for cancer therapies. Enrichment of phosphorylated proteins or peptides from tissue or bodily fluid samples is required. The application of technologies such as phosphoenrichments, mass spectrometry (MS) coupled to bioinformatics tools is crucial for the identification and quantification of protein phosphorylation sites for advancing in such relevant clinical research. A combination of different phosphopeptide enrichments, quantitative techniques and bioinformatic tools is necessary to achieve good phospho-regulation data and good structural analysis of protein studies. The current and most useful proteomics and bioinformatics techniques will be explained with research examples. Our aim in this article is to be helpful for cancer research via detailing proteomics and bioinformatic tools. PMID:21967744

  12. Omics-bioinformatics in the context of clinical data.

    PubMed

    Mayer, Gert; Heinze, Georg; Mischak, Harald; Hellemons, Merel E; Heerspink, Hiddo J Lambers; Bakker, Stephan J L; de Zeeuw, Dick; Haiduk, Martin; Rossing, Peter; Oberbauer, Rainer

    2011-01-01

    The Omics revolution has provided the researcher with tools and methodologies for qualitative and quantitative assessment of a wide spectrum of molecular players spanning from the genome to the metabolome level. As a consequence, explorative analysis (in contrast to purely hypothesis driven research procedures) has become applicable. However, numerous issues have to be considered for deriving meaningful results from Omics, and bioinformatics has to respect these in data analysis and interpretation. Aspects include sample type and quality, concise definition of the (clinical) question, and selection of samples ideally coming from thoroughly defined sample and data repositories. Omics suffers from a principal shortcoming, namely unbalanced sample-to-feature matrix denoted as "curse of dimensionality", where a feature refers to a specific gene or protein among the many thousands assayed in parallel in an Omics experiment. This setting makes the identification of relevant features with respect to a phenotype under analysis error prone from a statistical perspective. From this sample size calculation for screening studies and for verification of results from Omics, bioinformatics is essential. Here we present key elements to be considered for embedding Omics bioinformatics in a quality controlled workflow for Omics screening, feature identification, and validation. Relevant items include sample and clinical data management, minimum sample quality requirements, sample size estimates, and statistical procedures for computing the significance of findings from Omics bioinformatics in validation studies. PMID:21370098

  13. Identification and Comparative Analysis of H2O2-Scavenging Enzymes (Ascorbate Peroxidase and Glutathione Peroxidase) in Selected Plants Employing Bioinformatics Approaches

    PubMed Central

    Ozyigit, Ibrahim I.; Filiz, Ertugrul; Vatansever, Recep; Kurtoglu, Kuaybe Y.; Koc, Ibrahim; Öztürk, Münir X.; Anjum, Naser A.

    2016-01-01

    Among major reactive oxygen species (ROS), hydrogen peroxide (H2O2) exhibits dual roles in plant metabolism. Low levels of H2O2 modulate many biological/physiological processes in plants, whereas high levels can damage cell structures, with severe consequences. Thus, the steady-state level of cellular H2O2 must be tightly regulated. Glutathione peroxidases (GPX) and ascorbate peroxidase (APX) are two major ROS-scavenging enzymes which catalyze the reduction of H2O2 in order to prevent potential H2O2-derived cellular damage. Employing bioinformatics approaches, this study presents a comparative evaluation of both GPX and APX in 18 different plant species, and provides valuable insights into the nature and complex regulation of these enzymes. Herein, (a) potential GPX and APX genes/proteins from 18 different plant species were identified, (b) their exon/intron organization was analyzed, (c) detailed information about their physicochemical properties was provided, (d) conserved motif signatures of GPX and APX were identified, (e) their phylogenetic trees and 3D models were constructed, (f) protein-protein interaction networks were generated, and finally (g) GPX and APX gene expression profiles were analyzed. Study outcomes shed light on GPX and APX as major H2O2-scavenging enzymes at the structural and functional levels, which could be used in future studies in this direction. PMID:27047498

  14. Identification and Comparative Analysis of H2O2-Scavenging Enzymes (Ascorbate Peroxidase and Glutathione Peroxidase) in Selected Plants Employing Bioinformatics Approaches.

    PubMed

    Ozyigit, Ibrahim I; Filiz, Ertugrul; Vatansever, Recep; Kurtoglu, Kuaybe Y; Koc, Ibrahim; Öztürk, Münir X; Anjum, Naser A

    2016-01-01

    Among major reactive oxygen species (ROS), hydrogen peroxide (H2O2) exhibits dual roles in plant metabolism. Low levels of H2O2 modulate many biological/physiological processes in plants, whereas high levels can damage cell structures, with severe consequences. Thus, the steady-state level of cellular H2O2 must be tightly regulated. Glutathione peroxidases (GPX) and ascorbate peroxidase (APX) are two major ROS-scavenging enzymes which catalyze the reduction of H2O2 in order to prevent potential H2O2-derived cellular damage. Employing bioinformatics approaches, this study presents a comparative evaluation of both GPX and APX in 18 different plant species, and provides valuable insights into the nature and complex regulation of these enzymes. Herein, (a) potential GPX and APX genes/proteins from 18 different plant species were identified, (b) their exon/intron organization was analyzed, (c) detailed information about their physicochemical properties was provided, (d) conserved motif signatures of GPX and APX were identified, (e) their phylogenetic trees and 3D models were constructed, (f) protein-protein interaction networks were generated, and finally (g) GPX and APX gene expression profiles were analyzed. Study outcomes shed light on GPX and APX as major H2O2-scavenging enzymes at the structural and functional levels, which could be used in future studies in this direction. PMID:27047498

  15. Comprehensive analysis of the N-glycan biosynthetic pathway using bioinformatics to generate UniCorn: A theoretical N-glycan structure database.

    PubMed

    Akune, Yukie; Lin, Chi-Hung; Abrahams, Jodie L; Zhang, Jingyu; Packer, Nicolle H; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P

    2016-08-01

    Glycan structures attached to proteins are comprised of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined, glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures comprised of 15 or less monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database. PMID:27318307

  16. Bioinformatics and the allergy assessment of agricultural biotechnology products: industry practices and recommendations.

    PubMed

    Ladics, Gregory S; Cressman, Robert F; Herouet-Guicheney, Corinne; Herman, Rod A; Privalle, Laura; Song, Ping; Ward, Jason M; McClain, Scott

    2011-06-01

    Bioinformatic tools are being increasingly utilized to evaluate the degree of similarity between a novel protein and known allergens within the context of a larger allergy safety assessment process. Importantly, bioinformatics is not a predictive analysis that can determine if a novel protein will "become" an allergen, but rather a tool to assess whether the protein is a known allergen or is potentially cross-reactive with an existing allergen. Bioinformatic tools are key components of the 2009 Codex Alimentarius Commission's weight-of-evidence approach, which encompasses a variety of experimental approaches for an overall assessment of the allergenic potential of a novel protein. Bioinformatic search comparisons between novel protein sequences, as well as potential novel fusion sequences derived from the genome and transgene, and known allergens are required by all regulatory agencies that assess the safety of genetically modified (GM) products. The objective of this paper is to identify opportunities for consensus in the methods of applying bioinformatics and to outline differences that impact a consistent and reliable allergy safety assessment. The bioinformatic comparison process has some critical features, which are outlined in this paper. One of them is a curated, publicly available and well-managed database with known allergenic sequences. In this paper, the best practices, scientific value, and food safety implications of bioinformatic analyses, as they are applied to GM food crops are discussed. Recommendations for conducting bioinformatic analysis on novel food proteins for potential cross-reactivity to known allergens are also put forth. PMID:21320564

  17. A decade of web server updates at the bioinformatics links directory: 2003–2012

    PubMed Central

    Brazas, Michelle D.; Yim, David; Yeung, Winston; Ouellette, B. F. Francis

    2012-01-01

    The 2012 Bioinformatics Links Directory update marks the 10th special Web Server issue from Nucleic Acids Research. Beginning with content from their 2003 publication, the Bioinformatics Links Directory in collaboration with Nucleic Acids Research has compiled and published a comprehensive list of freely accessible, online tools, databases and resource materials for the bioinformatics and life science research communities. The past decade has exhibited significant growth and change in the types of tools, databases and resources being put forth, reflecting both technology changes and the nature of research over that time. With the addition of 90 web server tools and 12 updates from the July 2012 Web Server issue of Nucleic Acids Research, the Bioinformatics Links Directory at http://bioinformatics.ca/links_directory/ now contains an impressive 134 resources, 455 databases and 1205 web server tools, mirroring the continued activity and efforts of our field. PMID:22700703

  18. Additional analysis of dendrochemical data of Fallon, Nevada.

    PubMed

    Sheppard, Paul R; Helsel, Dennis R; Speakman, Robert J; Ridenour, Gary; Witten, Mark L

    2012-04-01

    Previously reported dendrochemical data showed temporal variability in concentration of tungsten (W) and cobalt (Co) in tree rings of Fallon, Nevada, US. Criticism of this work questioned the use of the Mann-Whitney test for determining change in element concentrations. Here, we demonstrate that Mann-Whitney is appropriate for comparing background element concentrations to possibly elevated concentrations in environmental media. Given that Mann-Whitney tests for differences in shapes of distributions, inter-tree variability (e.g., "coefficient of median variation") was calculated for each measured element across trees within subsites and time periods. For W and Co, the metals of highest interest in Fallon, inter-tree variability was always higher within versus outside of Fallon. For calibration purposes, this entire analysis was repeated at a different town, Sweet Home, Oregon, which has a known tungsten-powder facility, and inter-tree variability of W in tree rings confirmed the establishment date of that facility. Mann-Whitney testing of simulated data also confirmed its appropriateness for analysis of data affected by point-source contamination. This research adds important new dimensions to dendrochemistry of point-source contamination by adding analysis of inter-tree variability to analysis of central tendency. Fallon remains distinctive by a temporal increase in W beginning by the mid 1990s and by elevated Co since at least the early 1990s, as well as by high inter-tree variability for W and Co relative to comparison towns. PMID:22227064
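
    As a rough illustration of the Mann-Whitney comparison named in this abstract, the following minimal sketch computes the U statistic by direct pair counting. The sample values are hypothetical and are not data from the study; the function name is an assumption for illustration, and no p-value is computed.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: number of pairs (a, b) with a > b.

    Tied pairs contribute 0.5 each, the usual convention.
    """
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical element concentrations (arbitrary units), NOT study data.
background = [1.0, 2.0, 3.0, 4.0]
elevated = [3.0, 5.0, 6.0, 7.0]

u_elevated = mann_whitney_u(elevated, background)
u_background = mann_whitney_u(background, elevated)

# The two U values always sum to len(a) * len(b).
assert u_elevated + u_background == len(elevated) * len(background)
print(u_elevated, u_background)
```

    A U value close to the product of the two sample sizes indicates that nearly every value in the first sample exceeds the values in the second. Turning U into a significance level requires the exact or normal-approximation null distribution, which this sketch leaves out.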

  19. Analysis of Saccharides by the Addition of Amino Acids

    NASA Astrophysics Data System (ADS)

    Ozdemir, Abdil; Lin, Jung-Lee; Gillig, Kent J.; Gulfen, Mustafa; Chen, Chung-Hsuan

    2016-06-01

    In this work, we present the detection sensitivity improvement of electrospray ionization (ESI) mass spectrometry of neutral saccharides in a positive ion mode by the addition of various amino acids. Saccharides of a broad molecular weight range were chosen as the model compounds in the present study. Saccharides provide strong noncovalent interactions with amino acids, and the complex formation enhances the signal intensity and simplifies the mass spectra of saccharides. Polysaccharides provide a polymer-like ESI spectrum with a basic subunit difference between multiply charged chains. The protonated spectra of saccharides are not well identified because of different charge state distributions produced by the same molecules. Depending on the solvent used and other ions or molecules present in the solution, noncovalent interactions with saccharides may occur. These interactions are affected by the addition of amino acids. Amino acids with polar side groups show a strong tendency to interact with saccharides. In particular, serine shows a high tendency to interact with saccharides and significantly improves the detection sensitivity of saccharide compounds.

  20. Porosity Measurements and Analysis for Metal Additive Manufacturing Process Control

    PubMed Central

    Slotwinski, John A; Garboczi, Edward J; Hebenstreit, Keith M

    2014-01-01

    Additive manufacturing techniques can produce complex, high-value metal parts, with potential applications as critical metal components such as those found in aerospace engines and as customized biomedical implants. Material porosity in these parts is undesirable for aerospace parts - since porosity could lead to premature failure - and desirable for some biomedical implants - since surface-breaking pores allow for better integration with biological tissue. Changes in a part’s porosity during an additive manufacturing build may also be an indication of an undesired change in the build process. Here, we present efforts to develop an ultrasonic sensor for monitoring changes in the porosity in metal parts during fabrication on a metal powder bed fusion system. The development of well-characterized reference samples, measurements of the porosity of these samples with multiple techniques, and correlation of ultrasonic measurements with the degree of porosity are presented. A proposed sensor design, measurement strategy, and future experimental plans on a metal powder bed fusion system are also presented. PMID:26601041

  1. Porosity Measurements and Analysis for Metal Additive Manufacturing Process Control.

    PubMed

    Slotwinski, John A; Garboczi, Edward J; Hebenstreit, Keith M

    2014-01-01

    Additive manufacturing techniques can produce complex, high-value metal parts, with potential applications as critical metal components such as those found in aerospace engines and as customized biomedical implants. Material porosity in these parts is undesirable for aerospace parts - since porosity could lead to premature failure - and desirable for some biomedical implants - since surface-breaking pores allow for better integration with biological tissue. Changes in a part's porosity during an additive manufacturing build may also be an indication of an undesired change in the build process. Here, we present efforts to develop an ultrasonic sensor for monitoring changes in the porosity in metal parts during fabrication on a metal powder bed fusion system. The development of well-characterized reference samples, measurements of the porosity of these samples with multiple techniques, and correlation of ultrasonic measurements with the degree of porosity are presented. A proposed sensor design, measurement strategy, and future experimental plans on a metal powder bed fusion system are also presented. PMID:26601041

  2. Additional EIPC Study Analysis: Interim Report on High Priority Topics

    SciTech Connect

    Hadley, Stanton W

    2013-11-01

    Between 2010 and 2012 the Eastern Interconnection Planning Collaborative (EIPC) conducted a major long-term resource and transmission study of the Eastern Interconnection (EI). With guidance from a Stakeholder Steering Committee (SSC) that included representatives from the Eastern Interconnection States Planning Council (EISPC) among others, the project was conducted in two phases. Phase 1 involved a long-term capacity expansion analysis that involved creation of eight major futures plus 72 sensitivities. Three scenarios were selected for more extensive transmission- focused evaluation in Phase 2. Five power flow analyses, nine production cost model runs (including six sensitivities), and three capital cost estimations were developed during this second phase. The results from Phase 1 and 2 provided a wealth of data that could be examined further to address energy-related questions. A list of 13 topics was developed for further analysis; this paper discusses the first five.

  3. Bioinformatics and molecular modeling in glycobiology

    PubMed Central

    Schloissnig, Siegfried

    2010-01-01

    The field of glycobiology is concerned with the study of the structure, properties, and biological functions of the family of biomolecules called carbohydrates. Bioinformatics for glycobiology is a particularly challenging field, because carbohydrates exhibit a high structural diversity and their chains are often branched. Significant improvements in experimental analytical methods over recent years have led to a tremendous increase in the amount of carbohydrate structure data generated. Consequently, the availability of databases and tools to store, retrieve and analyze these data in an efficient way is of fundamental importance to progress in glycobiology. In this review, the various graphical representations and sequence formats of carbohydrates are introduced, and an overview of newly developed databases, the latest developments in sequence alignment and data mining, and tools to support experimental glycan analysis are presented. Finally, the field of structural glycoinformatics and molecular modeling of carbohydrates, glycoproteins, and protein–carbohydrate interaction are reviewed. PMID:20364395

  4. Biology and bioinformatics of myeloma cell.

    PubMed

    Abroun, Saeid; Saki, Najmaldin; Fakher, Rahim; Asghari, Farahnaz

    2012-12-01

    Multiple myeloma (MM) is a plasma cell disorder that occurs in about 10% of all hematologic cancers. The majority of patients (99%) are over 50 years of age when diagnosed. In the bone marrow (BM), stromal and hematopoietic stem cells (HSCs) are responsible for the production of blood cells. Therefore any destruction and/or changes within the BM undesirably impact a wide range of hematopoiesis, causing diseases and influencing patient survival. In order to establish an effective therapeutic strategy, recognition of the biology and evaluation of bioinformatics models for myeloma cells are necessary to assist in determining suitable methods to cure or prevent disease complications in patients. This review presents the evaluation of molecular and cellular aspects of MM such as genetic translocation, genetic analysis, cell surface marker, transcription factors, and chemokine signaling pathways. It also briefly reviews some of the mechanisms involved in MM in order to develop a better understanding for use in future studies. PMID:23253865

  5. Disclosure of hydraulic fracturing fluid chemical additives: analysis of regulations.

    PubMed

    Maule, Alexis L; Makey, Colleen M; Benson, Eugene B; Burrows, Isaac J; Scammell, Madeleine K

    2013-01-01

    Hydraulic fracturing is used to extract natural gas from shale formations. The process involves injecting into the ground fracturing fluids that contain thousands of gallons of chemical additives. Companies are not mandated by federal regulations to disclose the identities or quantities of chemicals used during hydraulic fracturing operations on private or public lands. States have begun to regulate hydraulic fracturing fluids by mandating chemical disclosure. These laws have shortcomings including nondisclosure of proprietary or "trade secret" mixtures, insufficient penalties for reporting inaccurate or incomplete information, and timelines that allow for after-the-fact reporting. These limitations leave lawmakers, regulators, public safety officers, and the public uninformed and ill-prepared to anticipate and respond to possible environmental and human health hazards associated with hydraulic fracturing fluids. We explore hydraulic fracturing exemptions from federal regulations, as well as current and future efforts to mandate chemical disclosure at the federal and state level. PMID:23552653

  6. Bioinformatics Analysis Reveals Distinct Molecular Characteristics of Hepatitis B-Related Hepatocellular Carcinomas from Very Early to Advanced Barcelona Clinic Liver Cancer Stages

    PubMed Central

    Hu, Wei; Kou, Yan-Bo; You, Hong-Juan; Liu, Xiao-Mei; Zheng, Kui-Yang; Tang, Ren-Xian

    2016-01-01

    Hepatocellular carcinoma (HCC) is the fifth most common malignancy associated with high mortality. One of the risk factors for HCC is chronic hepatitis B virus (HBV) infection. The treatment strategy for the disease is dependent on the stage of HCC, and the Barcelona clinic liver cancer (BCLC) staging system is used in most HCC cases. However, the molecular characteristics of HBV-related HCC in different BCLC stages are still unknown. Using GSE14520 microarray data from HBV-related HCC cases with BCLC stages from 0 (very early stage) to C (advanced stage) in the gene expression omnibus (GEO) database, differentially expressed genes (DEGs), including common DEGs and unique DEGs in different BCLC stages, were identified. These DEGs were located on different chromosomes. The molecular functions and biology pathways of DEGs were identified by gene ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and the interactome networks of DEGs were constructed using the NetVenn online tool. The results revealed that both common DEGs and stage-specific DEGs were associated with various molecular functions and were involved in special biological pathways. In addition, several hub genes were found in the interactome networks of DEGs. The identified DEGs and hub genes promote our understanding of the molecular mechanisms underlying the development of HBV-related HCC through the different BCLC stages, and might be used as staging biomarkers or molecular targets for the treatment of HCC with HBV infection. PMID:27454179

  7. Distribution of cold adaptation proteins in microbial mats in Lake Joyce, Antarctica: Analysis of metagenomic data by using two bioinformatics tools.

    PubMed

    Koo, Hyunmin; Hakim, Joseph A; Fisher, Phillip R E; Grueneberg, Alexander; Andersen, Dale T; Bej, Asim K

    2016-01-01

    In this study, we report the distribution and abundance of cold-adaptation proteins in microbial mat communities in the perennially ice-covered Lake Joyce, located in the McMurdo Dry Valleys, Antarctica. We have used MG-RAST and R code bioinformatics tools on Illumina HiSeq2000 shotgun metagenomic data and compared the filtering efficacy of these two methods on cold-adaptation proteins. Overall, the abundance of cold-shock DEAD-box protein A (CSDA), antifreeze proteins (AFPs), fatty acid desaturase (FAD), trehalose synthase (TS), and cold-shock family of proteins (CSPs) were present in all mat samples at high, moderate, or low levels, whereas the ice nucleation protein (INP) was present only in the ice and bulbous mat samples at insignificant levels. Considering the near homogeneous temperature profile of Lake Joyce (0.08-0.29 °C), the distribution and abundance of these proteins across various mat samples predictively correlated with known functional attributes necessary for microbial communities to thrive in this ecosystem. The comparison of the MG-RAST and the R code methods showed dissimilar occurrences of the cold-adaptation protein sequences, though with insignificant ANOSIM (R = 0.357; p-value = 0.012), ADONIS (R² = 0.274; p-value = 0.03) and STAMP (p-values = 0.521-0.984) statistical analyses. Furthermore, filtering targeted sequences using the R code accounted for taxonomic groups by avoiding sequence redundancies, whereas the MG-RAST provided total counts resulting in a higher sequence output. The results from this study revealed for the first time the distribution of cold-adaptation proteins in six different types of microbial mats in Lake Joyce, while suggesting a simpler and more manageable user-defined method of R code, as compared to a web-based MG-RAST pipeline. PMID:26578243

  8. Evolving Strategies for the Incorporation of Bioinformatics Within the Undergraduate Cell Biology Curriculum

    PubMed Central

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489

  9. Effective epitope identification employing phylogenetic, mutational variability, sequence entropy, and correlated mutation analysis targeting NS5B protein of hepatitis C virus: from bioinformatics to therapeutics.

    PubMed

    Meshram, Rohan J; Gacche, Rajesh N

    2015-08-01

    Hepatitis C virus (HCV) is considered as a foremost cause affecting numerous human liver-related disorders. An effective immuno-prophylactic measure (like stable vaccine) is still unavailable for HCV. We perform an in silico analysis of nonstructural protein 5B (NS5B) based CD4 and CD8 epitopes that might be implicated in improvement of treatment strategies for efficient vaccine development programs against HCV. Here, we report on effective utilization of knowledge obtained from multiple sequence alignment and phylogenetic analysis for investigation and evaluation of candidate epitopes that have enormous potential to be used in formulating proficient vaccine, embracing multiple strains prevalent among major geographical locations. Mutational variability data discussed herein focus on discriminating the region under active evolutionary pressure from those having lower mutational potential in existing experimentally verified epitopes, thus, providing a concrete framework for designing an effective peptide-based vaccine against HCV. Additionally, we measured entropy distribution in NS5B residues and pinpoint the positions in epitopes that are more susceptible to mutations and, thus, account for virus strategy to evade the host immune system. Findings from this study are expected to add more details on the sequence and structural aspects of NS5B protein, ultimately facilitating our understanding about the pathophysiology of HCV and assisting advance studies on the function of NS5B antigen on the epitope level. We also report on the mutational crosstalk between functionally important coevolving residues, using correlated mutation analysis, and identify networks of coupled mutations that represent pathways of allosteric communication inside and among NS5B thumb, finger, and palm domains. PMID:25727409
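
    The per-residue entropy measurement mentioned in this abstract can be sketched, under assumptions, as Shannon entropy computed column by column over a multiple sequence alignment. The abstract does not specify the authors' exact procedure; the function names, the gap handling, and the toy alignment below are illustrative assumptions only.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps ('-')."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # sum of p * log2(1/p), written with total/n so each term is non-negative
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(aligned_seqs):
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*aligned_seqs)]


# Toy alignment made up for illustration; NOT NS5B sequence data.
toy_alignment = ["ACDA", "ACEA", "ACFA", "ACGA"]
print(alignment_entropy(toy_alignment))
```

    In this toy case only the third column varies, so it is the only position with non-zero entropy. In the same spirit, high-entropy positions inside an epitope would be the ones flagged as more susceptible to mutation.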

  10. Rapid Development of Bioinformatics Education in China

    ERIC Educational Resources Information Center

    Zhong, Yang; Zhang, Xiaoyan; Ma, Jian; Zhang, Liang

    2003-01-01

    As the Human Genome Project experiences remarkable success and a flood of biological data is produced, bioinformatics becomes a very "hot" cross-disciplinary field, yet experienced bioinformaticians are urgently needed worldwide. This paper summarises the rapid development of bioinformatics education in China, especially related undergraduate…

  11. Biology in 'silico': The Bioinformatics Revolution.

    ERIC Educational Resources Information Center

    Bloom, Mark

    2001-01-01

    Explains the Human Genome Project (HGP) and efforts to sequence the human genome. Describes the role of bioinformatics in the project and considers it the Swiss Army knife of genetics, with uses in forensic science, medicine, agriculture, and environmental sciences. Discusses the use of bioinformatics in the high school…

  12. Online Bioinformatics Tutorials | Office of Cancer Genomics

    Cancer.gov

    Bioinformatics is a scientific discipline that applies computer science and information technology to help understand biological processes. The NIH provides a list of free online bioinformatics tutorials, either generated by the NIH Library or other institutes, which includes introductory lectures and "how to" videos on using various tools.

  13. Fuzzy Logic in Medicine and Bioinformatics

    PubMed Central

    Torres, Angela; Nieto, Juan J.

    2006-01-01

    The purpose of this paper is to present a general view of the current applications of fuzzy logic in medicine and bioinformatics. We particularly review the medical literature using fuzzy logic. We then recall the geometrical interpretation of fuzzy sets as points in a fuzzy hypercube and present two concrete illustrations in medicine (drug addictions) and in bioinformatics (comparison of genomes). PMID:16883057
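
    The geometrical view named in this abstract, a fuzzy set as a point in a fuzzy hypercube, can be sketched as follows. A fuzzy set over n elements is stored as a vector of membership degrees in [0, 1]^n, and ordinary (crisp) sets are the vertices of that cube. The distance used here (mean absolute difference) and the example membership values are assumptions chosen for illustration, not the measure or data used by the authors.

```python
def fuzzy_point(memberships):
    """Validate membership degrees and return them as a point in [0, 1]^n."""
    point = tuple(float(m) for m in memberships)
    if any(m < 0.0 or m > 1.0 for m in point):
        raise ValueError("membership degrees must lie in [0, 1]")
    return point


def is_crisp(point):
    """Crisp sets correspond to the vertices of the hypercube."""
    return all(m in (0.0, 1.0) for m in point)


def fuzzy_distance(a, b):
    """Mean absolute difference between two points of the same hypercube.

    Assumption: this normalised distance is only one possible choice.
    """
    if len(a) != len(b):
        raise ValueError("points must belong to the same hypercube")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical membership vectors over a three-element universe.
set_a = fuzzy_point([0.2, 0.8, 0.5])
set_b = fuzzy_point([0.4, 0.6, 0.5])
vertex = fuzzy_point([0, 1, 1])

distance_ab = fuzzy_distance(set_a, set_b)
```

    Two items described by the same features (for example, two patients or two genomes) can then be compared by the distance between their points.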

  14. A Mathematical Optimization Problem in Bioinformatics

    ERIC Educational Resources Information Center

    Heyer, Laurie J.

    2008-01-01

    This article describes the sequence alignment problem in bioinformatics. Through examples, we formulate sequence alignment as an optimization problem and show how to compute the optimal alignment with dynamic programming. The examples and sample exercises have been used by the author in a specialized course in bioinformatics, but could be adapted…
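
    The dynamic-programming formulation described in this abstract can be sketched with a standard global alignment (Needleman-Wunsch) routine. The scoring values (match +1, mismatch -1, gap -1) and the example sequences are assumptions for illustration and are not taken from the article.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, aligned_a, aligned_b = needleman_wunsch("ACGT", "AGT")
```

    The table is filled in time proportional to the product of the two sequence lengths, which is what makes the optimisation tractable compared with enumerating every possible alignment.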

  15. Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics

    ERIC Educational Resources Information Center

    Zhang, Xiaorong

    2009-01-01

    This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR RLK) genetic…

  16. Searching for molecular markers in head and neck squamous cell carcinomas (HNSCC) by statistical and bioinformatic analysis of larynx-derived SAGE libraries

    PubMed Central

    Silveira, Nelson JF; Varuzza, Leonardo; Machado-Lima, Ariane; Lauretto, Marcelo S; Pinheiro, Daniel G; Rodrigues, Rodrigo V; Severino, Patrícia; Nobrega, Francisco G; Silva, Wilson A; de B Pereira, Carlos A; Tajara, Eloiza H

    2008-01-01

    Background: Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignancies in humans. The average 5-year survival rate is one of the lowest among aggressive cancers, showing no significant improvement in recent years. When detected early, HNSCC has a good prognosis, but most patients present metastatic disease at the time of diagnosis, which significantly reduces survival rate. Despite extensive research, no molecular markers are currently available for diagnostic or prognostic purposes. Methods: Aiming to identify differentially-expressed genes involved in laryngeal squamous cell carcinoma (LSCC) development and progression, we generated individual Serial Analysis of Gene Expression (SAGE) libraries from a metastatic and non-metastatic larynx carcinoma, as well as from a normal larynx mucosa sample. Approximately 54,000 unique tags were sequenced in three libraries. Results: Statistical data analysis identified a subset of 1,216 differentially expressed tags between tumor and normal libraries, and 894 differentially expressed tags between metastatic and non-metastatic carcinomas. Three genes displaying differential regulation, one down-regulated (KRT31) and two up-regulated (BST2, MFAP2), as well as one with a non-significant differential expression pattern (GNA15) in our SAGE data were selected for real-time polymerase chain reaction (PCR) in a set of HNSCC samples. Consistent with our statistical analysis, quantitative PCR confirmed the upregulation of BST2 and MFAP2 and the downregulation of KRT31 when samples of HNSCC were compared to tumor-free surgical margins. As expected, GNA15 presented a non-significant differential expression pattern when tumor samples were compared to normal tissues. Conclusion: To the best of our knowledge, this is the first study reporting SAGE data in head and neck squamous cell tumors. Statistical analysis was effective in identifying differentially expressed genes reportedly involved in cancer development. The

  17. Incorporating Genomics and Bioinformatics across the Life Sciences Curriculum

    SciTech Connect

    Ditty, Jayna L.; Kvaal, Christopher A.; Goodner, Brad; Freyermuth, Sharyn K.; Bailey, Cheryl; Britton, Robert A.; Gordon, Stuart G.; Heinhorst, Sabine; Reed, Kelynne; Xu, Zhaohui; Sanders-Lorenz, Erin R.; Axen, Seth; Kim, Edwin; Johns, Mitrick; Scott, Kathleen; Kerfeld, Cheryl A.

    2011-08-01

    courses or independent research projects requires infrastructure for organizing and assessing student work. Here, we present a new platform for faculty to keep current with the rapidly changing field of bioinformatics, the Integrated Microbial Genomes Annotation Collaboration Toolkit (IMG-ACT). It was developed by instructors from both research-intensive and predominately undergraduate institutions in collaboration with the Department of Energy-Joint Genome Institute (DOE-JGI) as a means to innovate and update undergraduate education and faculty development. The IMG-ACT program provides a cadre of tools, including access to a clearinghouse of genome sequences, bioinformatics databases, data storage, instructor course management, and student notebooks for organizing the results of their bioinformatic investigations. In the process, IMG-ACT makes it feasible to provide undergraduate research opportunities to a greater number and diversity of students, in contrast to the traditional mentor-to-student apprenticeship model for undergraduate research, which can be too expensive and time-consuming to provide for every undergraduate. The IMG-ACT serves as the hub for the network of faculty and students that use the system for microbial genome analysis. Open access of the IMG-ACT infrastructure to participating schools ensures that all types of higher education institutions can utilize it. With the infrastructure in place, faculty can focus their efforts on the pedagogy of bioinformatics, involvement of students in research, and use of this tool for their own research agenda. What the original faculty members of the IMG-ACT development team present here is an overview of how the IMG-ACT program has affected our development in terms of teaching and research with the hopes that it will inspire more faculty to get involved.

  18. Additional challenges for uncertainty analysis in river engineering

    NASA Astrophysics Data System (ADS)

    Berends, Koen; Warmink, Jord; Hulscher, Suzanne

    2016-04-01

    the proposed intervention. The implicit assumption underlying such analysis is that both models are commensurable. We hypothesize that they are commensurable only to a certain extent. In an idealised study we have demonstrated that prediction performance loss should be expected with increasingly large engineering works. When accounting for parametric uncertainty of floodplain roughness in model identification, we see uncertainty bounds for predicted effects of interventions increase with increasing intervention scale. Calibration of these types of models therefore seems to have a shelf-life, beyond which calibration no longer improves prediction. Therefore a qualification scheme for model use is required that can be linked to model validity. In this study, we characterize model use along three dimensions: extrapolation (using the model with different external drivers), extension (using the model for different output or indicators) and modification (using modified models). Such use of models is expected to have implications for the applicability of surrogate modelling for efficient uncertainty analysis as well, which is recommended for future research. Warmink, J. J.; Straatsma, M. W.; Huthoff, F.; Booij, M. J. & Hulscher, S. J. M. H. 2013. Uncertainty of design water levels due to combined bed form and vegetation roughness in the Dutch river Waal. Journal of Flood Risk Management 6, 302-318. DOI: 10.1111/jfr3.12014

  19. Bioinformatic Analysis of Patient-Derived ASPS Gene Expressions and ASPL-TFE3 Fusion Transcript Levels Identify Potential Therapeutic Targets

    PubMed Central

    Covell, David G.; Wallqvist, Anders; Kenney, Susan; Vistica, David T.

    2012-01-01

    Gene expression data, collected from ASPS tumors of seven different patients and from one immortalized ASPS cell line (ASPS-1), was analyzed jointly with patient ASPL-TFE3 (t(X;17)(p11;q25)) fusion transcript data to identify disease-specific pathways and their component genes. Data analysis of the pooled patient and ASPS-1 gene expression data, using conventional clustering methods, revealed a relatively small set of pathways and genes characterizing the biology of ASPS. These results could be largely recapitulated using only the gene expression data collected from patient tumor samples. The concordance between expression measures derived from ASPS-1 and both pooled and individual patient tumor data provided a rationale for extending the analysis to include patient ASPL-TFE3 fusion transcript data. A novel linear model was exploited to link gene expressions to fusion transcript data and used to identify a small set of ASPS-specific pathways and their gene expression. Cellular pathways that appear aberrantly regulated in response to the t(X;17)(p11;q25) translocation include the cell cycle and cell adhesion. The identification of pathways and gene subsets characteristic of ASPS support current therapeutic strategies that target the FLT1 and MET, while also proposing additional targeting of genes found in pathways involved in the cell cycle (CHK1), cell adhesion (ARHGD1A), cell division (CDC6), control of meiosis (RAD51L3) and mitosis (BIRC5), and chemokine-related protein tyrosine kinase activity (CCL4). PMID:23226201

  20. Bioinformatics education--perspectives and challenges out of Africa.

    PubMed

    Tastan Bishop, Özlem; Adebiyi, Ezekiel F; Alzohairy, Ahmed M; Everett, Dean; Ghedira, Kais; Ghouila, Amel; Kumuthini, Judit; Mulder, Nicola J; Panji, Sumir; Patterton, Hugh-G

    2015-03-01

    The discipline of bioinformatics has developed rapidly since the complete sequencing of the first genomes in the 1990s. The development of many high-throughput techniques during the last decades has ensured that bioinformatics has grown into a discipline that overlaps with, and is required for, the modern practice of virtually every field in the life sciences. This has placed a scientific premium on the availability of skilled bioinformaticians, a qualification that is extremely scarce on the African continent. The reasons for this are numerous, although the absence of a skilled bioinformatician at academic institutions to initiate a training process and build sustained capacity seems to be a common African shortcoming. This dearth of bioinformatics expertise has had a knock-on effect on the establishment of many modern high-throughput projects at African institutes, including the comprehensive and systematic analysis of genomes from African populations, which are among the most genetically diverse anywhere on the planet. Recent funding initiatives from the National Institutes of Health and the Wellcome Trust are aimed at ameliorating this shortcoming. In this paper, we discuss the problems that have limited the establishment of the bioinformatics field in Africa, as well as propose specific actions that will help with the education and training of bioinformaticians on the continent. This is an absolute requirement in anticipation of a boom in high-throughput approaches to human health issues unique to data from African populations. PMID:24990350

  1. Bioinformatics Education—Perspectives and Challenges out of Africa

    PubMed Central

    Adebiyi, Ezekiel F.; Alzohairy, Ahmed M.; Everett, Dean; Ghedira, Kais; Ghouila, Amel; Kumuthini, Judit; Mulder, Nicola J.; Panji, Sumir; Patterton, Hugh-G.

    2015-01-01

    The discipline of bioinformatics has developed rapidly since the complete sequencing of the first genomes in the 1990s. The development of many high-throughput techniques during the last decades has ensured that bioinformatics has grown into a discipline that overlaps with, and is required for, the modern practice of virtually every field in the life sciences. This has placed a scientific premium on the availability of skilled bioinformaticians, a qualification that is extremely scarce on the African continent. The reasons for this are numerous, although the absence of a skilled bioinformatician at academic institutions to initiate a training process and build sustained capacity seems to be a common African shortcoming. This dearth of bioinformatics expertise has had a knock-on effect on the establishment of many modern high-throughput projects at African institutes, including the comprehensive and systematic analysis of genomes from African populations, which are among the most genetically diverse anywhere on the planet. Recent funding initiatives from the National Institutes of Health and the Wellcome Trust are aimed at ameliorating this shortcoming. In this paper, we discuss the problems that have limited the establishment of the bioinformatics field in Africa, as well as propose specific actions that will help with the education and training of bioinformaticians on the continent. This is an absolute requirement in anticipation of a boom in high-throughput approaches to human health issues unique to data from African populations. PMID:24990350

  2. Bioinformatic-driven search for metabolic biomarkers in disease

    PubMed Central

    2011-01-01

    The search and validation of novel disease biomarkers requires the complementary power of professional study planning and execution, modern profiling technologies and related bioinformatics tools for data analysis and interpretation. Biomarkers have considerable impact on the care of patients and are urgently needed for advancing diagnostics, prognostics and treatment of disease. This survey article highlights emerging bioinformatics methods for biomarker discovery in clinical metabolomics, focusing on the problem of data preprocessing and consolidation, the data-driven search, verification, prioritization and biological interpretation of putative metabolic candidate biomarkers in disease. In particular, data mining tools suitable for the application to omic data gathered from most frequently-used type of experimental designs, such as case-control or longitudinal biomarker cohort studies, are reviewed and case examples of selected discovery steps are delineated in more detail. This review demonstrates that clinical bioinformatics has evolved into an essential element of biomarker discovery, translating new innovations and successes in profiling technologies and bioinformatics to clinical application. PMID:21884622

  3. Computational and Bioinformatics Frameworks for Next-Generation Whole Exome and Genome Sequencing

    PubMed Central

    Dolled-Filhart, Marisa P.; Lee, Michael; Ou-yang, Chih-wen; Haraksingh, Rajini Rani; Lin, Jimmy Cheng-Ho

    2013-01-01

    It has become increasingly apparent that one of the major hurdles in the genomic age will be the bioinformatics challenges of next-generation sequencing. We provide an overview of a general framework of bioinformatics analysis. For each of the three stages of (1) alignment, (2) variant calling, and (3) filtering and annotation, we describe the analysis required and survey the different software packages that are used. Furthermore, we discuss possible future developments as data sources grow and highlight opportunities for new bioinformatics tools to be developed. PMID:23365548
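
    The third stage named in this abstract, filtering and annotation, can be sketched in a few lines. This is an illustration only: the `Variant` record, its field names, and the quality and depth thresholds are hypothetical and are not taken from the article or from any particular software package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant call produced by a variant-calling stage."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float  # call quality score
    depth: int   # number of reads covering the site


def filter_variants(variants, min_qual=30.0, min_depth=10):
    """Keep calls that pass both the quality and the read-depth threshold."""
    return [v for v in variants if v.qual >= min_qual and v.depth >= min_depth]


def annotate(variant):
    """Attach a simple type label derived from the allele lengths."""
    if len(variant.ref) == 1 and len(variant.alt) == 1:
        kind = "SNV"
    elif len(variant.ref) != len(variant.alt):
        kind = "indel"
    else:
        kind = "MNV"
    return {"variant": variant, "type": kind}


# Hypothetical calls; positions and scores are made up for the example.
calls = [
    Variant("chr1", 1000, "A", "G", qual=50.0, depth=25),
    Variant("chr1", 2000, "AT", "A", qual=45.0, depth=12),
    Variant("chr2", 3000, "C", "T", qual=12.0, depth=40),  # low quality
    Variant("chr2", 4000, "G", "A", qual=60.0, depth=3),   # low depth
]
annotated = [annotate(v) for v in filter_variants(calls)]
```

    Real pipelines typically apply many more filters and draw annotations from external databases; the sketch only shows how the two steps chain together.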

  4. Structural Bioinformatics and Protein Docking Analysis of the Molecular Chaperone-Kinase Interactions: Towards Allosteric Inhibition of Protein Kinases by Targeting the Hsp90-Cdc37 Chaperone Machinery

    PubMed Central

    Lawless, Nathan; Blacklock, Kristin; Berrigan, Elizabeth; Verkhivker, Gennady

    2013-01-01

    A fundamental role of the Hsp90-Cdc37 chaperone system in mediating maturation of protein kinase clients and supporting kinase functional activity is essential for the integrity and viability of signaling pathways involved in cell cycle control and organism development. Despite significant advances in understanding structure and function of molecular chaperones, the molecular mechanisms and guiding principles of kinase recruitment to the chaperone system are lacking quantitative characterization. Structural and thermodynamic characterization of Hsp90-Cdc37 binding with protein kinase clients by modern experimental techniques is highly challenging, owing to a transient nature of chaperone-mediated interactions. In this work, we used experimentally-guided protein docking to probe the allosteric nature of the Hsp90-Cdc37 binding with the cyclin-dependent kinase 4 (Cdk4) kinase clients. The results of docking simulations suggest that the kinase recognition and recruitment to the chaperone system may be primarily determined by Cdc37 targeting of the N-terminal kinase lobe. The interactions of Hsp90 with the C-terminal kinase lobe may provide additional “molecular brakes” that can lock (or unlock) kinase from the system during client loading (release) stages. The results of this study support a central role of the Cdc37 chaperone in recognition and recruitment of the kinase clients. Structural analysis may have useful implications in developing strategies for allosteric inhibition of protein kinases by targeting the Hsp90-Cdc37 chaperone machinery. PMID:24287464

  5. SNPTrack™: an integrated bioinformatics system for genetic association studies.

    PubMed

    Xu, Joshua; Kelly, Reagan; Zhou, Guangxu; Turner, Steven A; Ding, Don; Harris, Stephen C; Hong, Huixiao; Fang, Hong; Tong, Weida

    2012-01-01

    A genetic association study is a complicated process that involves collecting phenotypic data, generating genotypic data, analyzing associations between genotypic and phenotypic data, and interpreting genetic biomarkers identified. SNPTrack is an integrated bioinformatics system developed by the US Food and Drug Administration (FDA) to support the review and analysis of pharmacogenetics data resulting from FDA research or submitted by sponsors. The system integrates data management, analysis, and interpretation in a single platform for genetic association studies. Specifically, it stores genotyping data and single-nucleotide polymorphism (SNP) annotations along with study design data in an Oracle database. It also integrates popular genetic analysis tools, such as PLINK and Haploview. SNPTrack provides genetic analysis capabilities and captures analysis results in its database as SNP lists that can be cross-linked for biological interpretation to gene/protein annotations, Gene Ontology, and pathway analysis data. With SNPTrack, users can do the entire stream of bioinformatics jobs for genetic association studies. SNPTrack is freely available to the public at http://www.fda.gov/ScienceResearch/BioinformaticsTools/SNPTrack/default.htm. PMID:23245293

  6. NGS for the Masses: Empowering Biologists to Improve Bioinformatics Productivity (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    Qaadri, Kashef

    2012-06-01

    Kashef Qaadri on "NGS for the Masses: Empowering biologists to improve bioinformatic productivity" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  7. NGS for the Masses: Empowering Biologists to Improve Bioinformatics Productivity (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    Qaadri, Kashef [Biomatters]

    2013-03-22

    Kashef Qaadri on "NGS for the Masses: Empowering biologists to improve bioinformatic productivity" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  8. Bioinformatics in the secondary science classroom: A study of state content standards and students' perceptions of, and performance in, bioinformatics lessons

    NASA Astrophysics Data System (ADS)

    Wefer, Stephen H.

    The proliferation of bioinformatics in modern Biology marks a new revolution in science, which promises to influence science education at all levels. This thesis examined state standards for content that articulated bioinformatics, and explored secondary students' affective and cognitive perceptions of, and performance in, a bioinformatics mini-unit. The results are presented as three studies. The first study analyzed secondary science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics at the introductory high school biology level. The bioinformatics content of each state's Biology standards was categorized into nine areas and the prevalence of each area documented. The nine areas were: The Human Genome Project, Forensics, Evolution, Classification, Nucleotide Variations, Medicine, Computer Use, Agriculture/Food Technology, and Science Technology and Society/Socioscientific Issues (STS/SSI). Findings indicated a generally low representation of bioinformatics related content, which varied substantially across the different areas. Recommendations are made for reworking existing standards to incorporate bioinformatics and to facilitate the goal of promoting science literacy in this emerging new field among secondary school students. The second study examined thirty-two students' affective responses to, and content mastery of, a two-week bioinformatics mini-unit. The findings indicate that the students generally were positive relative to their interest level, the usefulness of the lessons, the difficulty level of the lessons, likeliness to engage in additional bioinformatics, and were overall successful on the assessments. A discussion of the results and significance is followed by suggestions for future research and implementation for transferability. The third study presents a case study of individual differences among ten secondary school students, whose cognitive and affective percepts were

  9. Computational Biology and Bioinformatics in Nigeria

    PubMed Central

    Fatumo, Segun A.; Adoga, Moses P.; Ojo, Opeolu O.; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-01-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries. PMID:24763310

  10. Bioinformatic challenges in targeted proteomics.

    PubMed

    Reker, Daniel; Malmström, Lars

    2012-09-01

    Selected reaction monitoring mass spectrometry is an emerging targeted proteomics technology that allows for the investigation of complex protein samples with high sensitivity and efficiency. It requires extensive knowledge about the sample for the many parameters needed to carry out the experiment to be set appropriately. Most studies today rely on parameter estimation from prior studies, public databases, or from measuring synthetic peptides. This is efficient and sound, but in absence of prior data, de novo parameter estimation is necessary. Computational methods can be used to create an automated framework to address this problem. However, the number of available applications is still small. This review aims at giving an orientation on the various bioinformatical challenges. To this end, we state the problems in classical machine learning and data mining terms, give examples of implemented solutions and provide some room for alternatives. This will hopefully lead to an increased momentum for the development of algorithms and serve the needs of the community for computational methods. We note that the combination of such methods in an assisted workflow will ease both the usage of targeted proteomics in experimental studies as well as the further development of computational approaches. PMID:22866949

  11. Adenovirus type 5 E4 Orf3 protein targets promyelocytic leukaemia (PML) protein nuclear domains for disruption via a sequence in PML isoform II that is predicted as a protein interaction site by bioinformatic analysis.

    PubMed

    Leppard, Keith N; Emmott, Edward; Cortese, Marc S; Rich, Tina

    2009-01-01

    Human adenovirus type 5 infection causes the disruption of structures in the cell nucleus termed promyelocytic leukaemia (PML) protein nuclear domains or ND10, which contain the PML protein as a critical component. This disruption is achieved through the action of the viral E4 Orf3 protein, which forms track-like nuclear structures that associate with the PML protein. This association is mediated by a direct interaction of Orf3 with a specific PML isoform, PMLII. We show here that the Orf3 interaction properties of PMLII are conferred by a 40 aa residue segment of the unique C-terminal domain of the protein. This segment was sufficient to confer interaction on a heterologous protein. The analysis was informed by prior application of a bioinformatic tool for the prediction of potential protein interaction sites within unstructured protein sequences (predictors of naturally disordered region analysis; PONDR). This tool predicted three potential molecular recognition elements (MoRE) within the C-terminal domain of PMLII, one of which was found to form the core of the Orf3 interaction site, thus demonstrating the utility of this approach. The sequence of the mapped Orf3-binding site on PML protein was found to be relatively poorly conserved across other species; however, the overall organization of MoREs within unstructured sequence was retained, suggesting the potential for conservation of functional interactions. PMID:19088278

  12. Bioinformatics tools for small genomes, such as hepatitis B virus.

    PubMed

    Bell, Trevor G; Kramvis, Anna

    2015-02-01

    DNA sequence analysis is undertaken in many biological research laboratories. The workflow consists of several steps involving the bioinformatic processing of biological data. We have developed a suite of web-based online bioinformatic tools to assist with processing, analysis and curation of DNA sequence data. Most of these tools are genome-agnostic, with two tools specifically designed for hepatitis B virus sequence data. Tools in the suite are able to process sequence data from Sanger sequencing, ultra-deep amplicon resequencing (pyrosequencing) and chromatograph (trace files), as appropriate. The tools are available online at no cost and are aimed at researchers without specialist technical computer knowledge. The tools can be accessed at http://hvdr.bioinf.wits.ac.za/SmallGenomeTools, and the source code is available online at https://github.com/DrTrevorBell/SmallGenomeTools. PMID:25690798

  13. Bioinformatics in Italy: BITS2011, the Eighth Annual Meeting of the Italian Society of Bioinformatics

    PubMed Central

    2012-01-01

    The BITS2011 meeting, held in Pisa on June 20-22, 2011, brought together more than 120 Italian researchers working in the field of Bioinformatics, as well as students in Bioinformatics, Computational Biology, Biology, Computer Sciences, and Engineering, representing a landscape of Italian bioinformatics research. This preface provides a brief overview of the meeting and introduces the peer-reviewed manuscripts that were accepted for publication in this Supplement. PMID:22536954

  14. Structural bioinformatics of the human spliceosomal proteome

    PubMed Central

    Korneta, Iga; Magnus, Marcin; Bujnicki, Janusz M.

    2012-01-01

    In this work, we describe the results of a comprehensive structural bioinformatics analysis of the spliceosomal proteome. We used fold recognition analysis to complement prior data on the ordered domains of 252 human splicing proteins. Examples of newly identified domains include a PWI domain in the U5 snRNP protein 200K (hBrr2, residues 258–338), while examples of previously known domains with a newly determined fold include the DUF1115 domain of the U4/U6 di-snRNP protein 90K (hPrp3, residues 540–683). We also established a non-redundant set of experimental models of spliceosomal proteins, as well as constructed in silico models for regions without an experimental structure. The combined set of structural models is available for download. Altogether, over 90% of the ordered regions of the spliceosomal proteome can be represented structurally with a high degree of confidence. We analyzed the reduced spliceosomal proteome of the intron-poor organism Giardia lamblia, and as a result, we proposed a candidate set of ordered structural regions necessary for a functional spliceosome. The results of this work will aid experimental and structural analyses of the spliceosomal proteins and complexes, and can serve as a starting point for multiscale modeling of the structure of the entire spliceosome. PMID:22573172

  15. Managing, analysing, and integrating big data in medical bioinformatics: open problems and future perspectives.

    PubMed

    Merelli, Ivan; Pérez-Sánchez, Horacio; Gesing, Sandra; D'Agostino, Daniele

    2014-01-01

    The explosion of data in both biomedical research and healthcare systems demands urgent solutions. In particular, research in the omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare, in addition, increasingly calls for tighter integration with biomedical data in order to promote personalized medicine and to provide better treatments. Efficient analysis and interpretation of Big Data opens new avenues to explore molecular biology, new questions to ask about physiological and pathological states, and new ways to answer these open issues. Such analyses lead to better understanding of diseases and development of better and personalized diagnostics and therapeutics. However, such progress depends directly on the availability of new solutions to deal with this huge amount of information. New paradigms are needed to store and access data, for its annotation and integration and finally for inferring knowledge and making it available to researchers. Bioinformatics can be viewed as the "glue" for all these processes. A clear awareness of present high performance computing (HPC) solutions in bioinformatics, Big Data analysis paradigms for computational biology, and the issues that are still open in the biomedical and healthcare fields represents the starting point to win this challenge. PMID:25254202

  16. Managing, Analysing, and Integrating Big Data in Medical Bioinformatics: Open Problems and Future Perspectives

    PubMed Central

    Merelli, Ivan; Pérez-Sánchez, Horacio; Gesing, Sandra; D'Agostino, Daniele

    2014-01-01

    The explosion of data in both biomedical research and healthcare systems demands urgent solutions. In particular, research in the omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare, in addition, is continually asking for a tighter integration with biomedical data in order to promote personalized medicine and to provide better treatments. Efficient analysis and interpretation of Big Data opens new avenues to explore molecular biology, new questions to ask about physiological and pathological states, and new ways to answer these open issues. Such analyses lead to a better understanding of diseases and to the development of better and personalized diagnostics and therapeutics. However, such progress is directly related to the availability of new solutions to deal with this huge amount of information. New paradigms are needed to store and access data, to annotate and integrate it, and finally to infer knowledge and make it available to researchers. Bioinformatics can be viewed as the “glue” for all these processes. A clear awareness of present high performance computing (HPC) solutions in bioinformatics, Big Data analysis paradigms for computational biology, and the issues that are still open in the biomedical and healthcare fields represents the starting point to win this challenge. PMID:25254202

  17. Bioinformatics: Current practice and future challenges for life science education.

    PubMed

    Hack, Catherine; Kendall, Gary

    2005-03-01

    It is widely predicted that the application of high-throughput technologies to the quantification and identification of biological molecules will cause a paradigm shift in the life sciences. However, if the biosciences are to evolve from a predominantly descriptive discipline to an information science, practitioners will require enhanced skills in mathematics, computing, and statistical analysis. Universities have responded to the widely perceived skills gap primarily by developing masters programs in bioinformatics, resulting in a rapid expansion in the provision of postgraduate bioinformatics education. There is, however, a clear need to improve the quantitative and analytical skills of life science undergraduates. This article reviews the response of academia in the United Kingdom and proposes the learning outcomes that graduates should achieve to cope with the new biology. While the analysis discussed here uses the development of bioinformatics education in the United Kingdom as an illustrative example, it is hoped that the issues raised will resonate with all those involved in curriculum development in the life sciences. PMID:21638550

  18. Identifying human MHC supertypes using bioinformatic methods.

    PubMed

    Doytchinova, Irini A; Guan, Pingping; Flower, Darren R

    2004-04-01

    Classification of MHC molecules into supertypes in terms of peptide-binding specificities is an important issue, with direct implications for the development of epitope-based vaccines with wide population coverage. In view of extremely high MHC polymorphism (948 class I and 633 class II HLA alleles) the experimental solution of this task is presently impossible. In this study, we describe a bioinformatics strategy for classifying MHC molecules into supertypes using information drawn solely from three-dimensional protein structure. Two chemometric techniques (hierarchical clustering and principal component analysis) were used independently on a set of 783 HLA class I molecules to identify supertypes based on structural similarities and molecular interaction fields calculated for the peptide binding site. Eight supertypes were defined: A2, A3, A24, B7, B27, B44, C1, and C4. The two techniques gave 77% consensus, i.e., 605 HLA class I alleles were classified in the same supertype by both methods. The proposed strategy allowed "supertype fingerprints" to be identified. Thus, the A2 supertype fingerprint is Tyr(9)/Phe(9), Arg(97), and His(114) or Tyr(116); for A3, Tyr(9)/Phe(9)/Ser(9), Ile(97)/Met(97), and Glu(114) or Asp(116); for A24, Ser(9) and Met(97); for B7, Asn(63) and Leu(81); for B27, Glu(63) and Leu(81); for B44, Ala(81); for C1, Ser(77); and for C4, Asn(77). PMID:15034046
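
    The consensus figure quoted above (605 of 783 alleles, about 77%) is simply the share of alleles that both methods place in the same supertype. A minimal Python sketch of that bookkeeping follows; the function name and the toy allele-to-supertype assignments are hypothetical illustrations, not data from the study, and the sketch assumes both methods already report comparable supertype labels.

```python
def consensus(labels_a, labels_b):
    """Return (count, fraction) of items given the same label by both methods.

    labels_a and labels_b map an allele name to a supertype label.
    """
    if labels_a.keys() != labels_b.keys():
        raise ValueError("both methods must classify the same alleles")
    if not labels_a:
        raise ValueError("no alleles to compare")
    agree = sum(1 for allele in labels_a if labels_a[allele] == labels_b[allele])
    return agree, agree / len(labels_a)


# Toy assignments (invented for illustration only).
by_clustering = {"allele_1": "A2", "allele_2": "A3", "allele_3": "B7", "allele_4": "B27"}
by_pca = {"allele_1": "A2", "allele_2": "A3", "allele_3": "B27", "allele_4": "B27"}

count, fraction = consensus(by_clustering, by_pca)
print(count, fraction)

# The abstract's own numbers: 605 of 783 alleles agree.
print(f"{605 / 783:.1%}")
```

    The same function applies unchanged to any pair of classifications over a shared set of items.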

  19. Comparative Proteomic and Bioinformatic Analysis of the Effects of a High-Grain Diet on the Hepatic Metabolism in Lactating Dairy Goats

    PubMed Central

    Jiang, Xueyuan; Zeng, Tao; Zhang, Shukun; Zhang, Yuanshu

    2013-01-01

    To gain insight into the impact of high-grain diets on liver metabolism in ruminants, we employed a comparative proteomic approach to investigate the proteome-wide effects of diet in lactating dairy goats by conducting a proteomic analysis of the liver extracts of 10 lactating goats fed either a control diet or a high-grain diet. More than 500 protein spots were detected per condition by two-dimensional electrophoresis (2-DE). In total, 52 differentially expressed spots (≥2.0-fold changed) were excised and analyzed using MALDI TOF/TOF. Fifty-one protein spots were successfully identified. Of these, 29 proteins were upregulated, while 22 were downregulated in the high-grain fed vs. control animals. Differential expression of proteins including alpha enolase, elongation factor 2, calreticulin, cytochrome b5, apolipoprotein A-I, and catalase was verified by mRNA analysis and/or Western blotting. Database searches combined with Gene Ontology (GO) analysis and KEGG pathway analysis revealed that the high-grain diet resulted in altered expression of proteins related to amino acid metabolism. These results suggest new candidate proteins that may contribute to a better understanding of the signaling pathways and mechanisms that mediate liver adaptation to a high-grain diet. PMID:24260456
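
    The "≥2.0-fold changed" criterion above can be illustrated with a short Python sketch. It assumes each spot is summarized by a ratio of high-grain to control intensity, so that a ratio of at least 2.0 counts as upregulated and a ratio of at most 0.5 as downregulated; the function name, the ratio convention, and the spot values are hypothetical, not taken from the study.

```python
def classify_spots(ratios, threshold=2.0):
    """Split spots into up- and downregulated lists by fold change.

    ratios maps a spot name to (high-grain intensity / control intensity).
    """
    up = [name for name, ratio in ratios.items() if ratio >= threshold]
    down = [name for name, ratio in ratios.items() if ratio <= 1 / threshold]
    return up, down


# Invented intensity ratios for five spots.
toy_ratios = {"spot1": 2.5, "spot2": 0.4, "spot3": 1.3, "spot4": 2.0, "spot5": 0.5}

up, down = classify_spots(toy_ratios)
print("up:", up)
print("down:", down)
```

    In practice a fold-change cutoff is usually combined with a statistical test across replicates, which this sketch omits.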

  20. Quantum Bio-Informatics IV

    NASA Astrophysics Data System (ADS)

    Accardi, Luigi; Freudenberg, Wolfgang; Ohya, Masanori

    2011-01-01

    The QP-DYN algorithms / L. Accardi, M. Regoli and M. Ohya -- Study of transcriptional regulatory network based on Cis module database / S. Akasaka ... [et al.] -- On Lie group-Lie algebra correspondences of unitary groups in finite von Neumann algebras / H. Ando, I. Ojima and Y. Matsuzawa -- On a general form of time operators of a Hamiltonian with purely discrete spectrum / A. Arai -- Quantum uncertainty and decision-making in game theory / M. Asano ... [et al.] -- New types of quantum entropies and additive information capacities / V. P. Belavkin -- Non-Markovian dynamics of quantum systems / D. Chruscinski and A. Kossakowski -- Self-collapses of quantum systems and brain activities / K.-H. Fichtner ... [et al.] -- Statistical analysis of random number generators / L. Accardi and M. Gabler -- Entangled effects of two consecutive pairs in residues and its use in alignment / T. Ham, K. Sato and M. Ohya -- The passage from digital to analogue in white noise analysis and applications / T. Hida -- Remarks on the degree of entanglement / D. Chruscinski ... [et al.] -- A completely discrete particle model derived from a stochastic partial differential equation by point systems / K.-H. Fichtner, K. Inoue and M. Ohya -- On quantum algorithm for exptime problem / S. Iriyama and M. Ohya -- On sufficient algebraic conditions for identification of quantum states / A. Jamiolkowski -- Concurrence and its estimations by entanglement witnesses / J. Jurkowski -- Classical wave model of quantum-like processing in brain / A. Khrennikov -- Entanglement mapping vs. quantum conditional probability operator / D. Chruscinski ... [et al.] -- Constructing multipartite entanglement witnesses / M. Michalski -- On Kadison-Schwarz property of quantum quadratic operators on M[symbol](C) / F. Mukhamedov and A. Abduganiev -- On phase transitions in quantum Markov chains on Cayley Tree / L. Accardi, F. Mukhamedov and M. Saburov -- Space(-time) emergence as symmetry breaking effect / I. Ojima

  1. [Post-translational modification (PTM) bioinformatics in China: progresses and perspectives].

    PubMed

    Zexian, Liu; Yudong, Cai; Xuejiang, Guo; Ao, Li; Tingting, Li; Jianding, Qiu; Jian, Ren; Shaoping, Shi; Jiangning, Song; Minghui, Wang; Lu, Xie; Yu, Xue; Ziding, Zhang; Xingming, Zhao

    2015-07-01

    Post-translational modifications (PTMs) are essential for regulating conformational changes, activities and functions of proteins, and are involved in almost all cellular pathways and processes. Identification of protein PTMs is the basis for understanding cellular and molecular mechanisms. In contrast with labor-intensive and time-consuming experiments, PTM prediction using various bioinformatics approaches can provide accurate, convenient, and efficient strategies and generate valuable information for further experimental consideration. In this review, we summarize the recent progress made by Chinese bioinformaticians in the field of PTM bioinformatics, including the design and improvement of computational algorithms for predicting PTM substrates and sites, design and maintenance of online and offline tools, establishment of PTM-related databases and resources, and bioinformatics analysis of PTM proteomics data. By comparing similar studies in China and other countries, we demonstrate both advantages and limitations of current PTM bioinformatics as well as perspectives for future studies in China. PMID:26351162

  2. Identification of microRNA-regulated pathways using an integration of microRNA-mRNA microarray and bioinformatics analysis in CD34+ cells of myelodysplastic syndromes.

    PubMed

    Xu, Feng; Zhu, Yang; He, Qi; Wu, Ling-Yun; Zhang, Zheng; Shi, Wen-Hui; Liu, Li; Chang, Chun-Kang; Li, Xiao

    2016-01-01

    The effect of microRNA (miRNA) and targeted mRNA on signal transduction is not fully understood in myelodysplastic syndromes (MDS). Here, we tried to identify miRNA-regulated pathways through a combination of miRNA and mRNA microarrays in CD34+ cells from MDS patients. We identified 34 differentially expressed miRNAs and 1783 mRNAs in MDS. 25 dysregulated miRNAs and 394 targeted mRNAs were screened by a combination of Pearson's correlation analysis and software prediction. Pathway analysis showed that several pathways, such as Notch and PI3K/Akt, might be regulated by those miRNA-mRNA pairs. Through a combination of Pathway and miRNA-Gene or GO-Network analysis, miRNA-regulated pathways, such as the miR-195-5p/DLL1/Notch signaling pathway, were identified. Further qRT-PCR showed that miR-195-5p was up-regulated while DLL1 was down-regulated in patients with low-grade MDS compared with normal controls. Luciferase assay showed that DLL1 was a direct target of miR-195-5p. Overexpression of miR-195-5p led to increased cell apoptosis and reduced cell growth through inhibition of the Notch signaling pathway. In conclusion, altered expression of miRNAs and targeted mRNAs might have an important impact on cancer-related cellular pathways in MDS. Inhibition of the Notch signaling pathway by the miR-195-5p-DLL1 axis contributes to the excess apoptosis in low-grade MDS. PMID:27571714
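
    The screening step described above (Pearson's correlation combined with software target prediction) can be sketched in Python as follows. This is a hedged illustration, not the authors' pipeline: it assumes that a pair is kept when the miRNA and its predicted target are negatively correlated across samples, and the cutoff of -0.5, the function names, the placeholder gene GENE_X, and all expression values are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sd_x * sd_y)


def anticorrelated_pairs(mirna_expr, mrna_expr, predicted_targets, cutoff=-0.5):
    """Keep predicted miRNA-target pairs whose expression is anticorrelated."""
    pairs = []
    for mirna, targets in predicted_targets.items():
        for gene in targets:
            r = pearson(mirna_expr[mirna], mrna_expr[gene])
            if r <= cutoff:  # cutoff is an assumption of this sketch
                pairs.append((mirna, gene, r))
    return pairs


# Invented expression values across four samples.
mirna_expr = {"miR-195-5p": [1.0, 2.0, 3.0, 4.0]}
mrna_expr = {"DLL1": [8.0, 6.0, 4.0, 2.0], "GENE_X": [1.0, 2.1, 2.9, 4.2]}
predicted_targets = {"miR-195-5p": ["DLL1", "GENE_X"]}

pairs = anticorrelated_pairs(mirna_expr, mrna_expr, predicted_targets)
for mirna, gene, r in pairs:
    print(mirna, gene, round(r, 3))
```

    Only the pair whose expression moves in opposite directions survives the filter; the positively correlated placeholder gene is dropped.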

  3. Expression profile analysis of long noncoding RNA in HER-2-enriched subtype breast cancer by next-generation sequencing and bioinformatics

    PubMed Central

    Yang, Fan; Lyu, Shixu; Dong, Siyang; Liu, Yehuan; Zhang, Xiaohua; Wang, Ouchen

    2016-01-01

    Background: Human epidermal growth factor receptor 2 (HER-2)-enriched subtype breast cancer is associated with a more aggressive phenotype and shorter survival time. Long non-coding RNAs (LncRNAs) have essential roles in tumorigenesis and occupy a central place in cancer progression. Notably, few studies have focused on the dysregulation of LncRNAs in the HER-2-enriched subtype breast cancer. In this study, we analyzed the expression profile of LncRNAs and mRNAs in this particular subtype of breast cancer. Methods: Seven pairs of HER-2-enriched subtype breast cancer and normal tissue were sequenced. We screened out differentially expressed genes and measured the correlation of the expression levels of dysregulated LncRNAs and HER-2 by Pearson’s correlation coefficient analysis. Gene ontology analysis and pathway analysis were used to understand the biological roles of these differentially expressed genes. Pathway act network and coexpression network were constructed. Results: More than 1,300 LncRNAs and 2,800 mRNAs, which were significantly differentially expressed, were identified. Among these LncRNAs, AFAP1-AS1 was the most dysregulated LncRNA, while ORM2 was the most dysregulated mRNA. LOC100288637 had the highest positive correlation coefficient of 0.93 with HER-2, while RPL13P5 had the highest negative correlation coefficient of −0.87. The pathway act network showed that MAPK signaling pathway, PI3K-Akt signaling pathway, metabolic pathways, cell cycle, and regulation of actin cytoskeleton were highly related with HER-2-enriched subtype breast cancer. Coexpression network recognized LINC00636, LINC01405, ADARB2-AS1, ST8SIA6-AS1, LINC00511, and DPP10-AS1 as core genes. Conclusion: These results analyze the functions of LncRNAs and provide useful information for exploring candidate therapeutic targets and new molecular biomarkers for HER-2-enriched subtype breast cancer. PMID:26929647

  4. Identification of microRNA-regulated pathways using an integration of microRNA-mRNA microarray and bioinformatics analysis in CD34+ cells of myelodysplastic syndromes

    PubMed Central

    Xu, Feng; Zhu, Yang; He, Qi; Wu, Ling-Yun; Zhang, Zheng; Shi, Wen-Hui; Liu, Li; Chang, Chun-Kang; Li, Xiao

    2016-01-01

    The effect of microRNA (miRNA) and targeted mRNA on signal transduction is not fully understood in myelodysplastic syndromes (MDS). Here, we tried to identify miRNA-regulated pathways through a combination of miRNA and mRNA microarrays in CD34+ cells from MDS patients. We identified 34 differentially expressed miRNAs and 1783 mRNAs in MDS. 25 dysregulated miRNAs and 394 targeted mRNAs were screened by a combination of Pearson’s correlation analysis and software prediction. Pathway analysis showed that several pathways, such as Notch and PI3K/Akt, might be regulated by those miRNA-mRNA pairs. Through a combination of Pathway and miRNA-Gene or GO-Network analysis, miRNA-regulated pathways, such as the miR-195-5p/DLL1/Notch signaling pathway, were identified. Further qRT-PCR showed that miR-195-5p was up-regulated while DLL1 was down-regulated in patients with low-grade MDS compared with normal controls. Luciferase assay showed that DLL1 was a direct target of miR-195-5p. Overexpression of miR-195-5p led to increased cell apoptosis and reduced cell growth through inhibition of the Notch signaling pathway. In conclusion, altered expression of miRNAs and targeted mRNAs might have an important impact on cancer-related cellular pathways in MDS. Inhibition of the Notch signaling pathway by the miR-195-5p-DLL1 axis contributes to the excess apoptosis in low-grade MDS. PMID:27571714

  5. A Guide to Bioinformatics for Immunologists

    PubMed Central

    Whelan, Fiona J.; Yap, Nicholas V. L.; Surette, Michael G.; Golding, G. Brian; Bowdish, Dawn M. E.

    2013-01-01

    Bioinformatics includes a suite of methods that are cheap and approachable, many of which are easily accessible without any specialized bioinformatic training. Yet, despite this, bioinformatic tools are under-utilized by immunologists. Herein, we review a representative set of publicly available, easy-to-use bioinformatic tools using our own research on an under-annotated human gene, SCARA3, as an example. SCARA3 shares an evolutionary relationship with the class A scavenger receptors, but preliminary research showed that it was divergent enough that its function remained unclear. In our quest for more information about this gene – did it share gene sequence similarities to other scavenger receptors? Did it contain conserved protein domains? Where was it expressed in the human body? – we discovered the power and informative potential of publicly available bioinformatic tools designed with the novice in mind, which allowed us to hypothesize on the regulation, structure, and function of this protein. We argue that these tools are largely applicable to many facets of immunology research. PMID:24363654

  6. Carving a niche: establishing bioinformatics collaborations

    PubMed Central

    Lyon, Jennifer A.; Tennant, Michele R.; Messner, Kevin R.; Osterbur, David L.

    2006-01-01

    Objectives: The paper describes collaborations and partnerships developed between library bioinformatics programs and other bioinformatics-related units at four academic institutions. Methods: A call for information on bioinformatics partnerships was made via email to librarians who have participated in the National Center for Biotechnology Information's Advanced Workshop for Bioinformatics Information Specialists. Librarians from Harvard University, the University of Florida, the University of Minnesota, and Vanderbilt University responded and expressed willingness to contribute information on their institutions, programs, services, and collaborating partners. Similarities and differences in programs and collaborations were identified. Results: The four librarians have developed partnerships with other units on their campuses that can be categorized into the following areas: knowledge management, instruction, and electronic resource support. All primarily support freely accessible electronic resources, while other campus units deal with fee-based ones. These demarcations are apparent in resource provision as well as in subsequent support and instruction. Conclusions and Recommendations: Through environmental scanning and networking with colleagues, librarians who provide bioinformatics support can develop fruitful collaborations. Visibility is key to building collaborations, as is broad-based thinking in terms of potential partners. PMID:16888668

  7. Isolation, characterization, and bioinformatic analysis of calmodulin-binding protein cmbB reveals a novel tandem IP22 repeat common to many Dictyostelium and Mimivirus proteins.

    PubMed

    O'Day, Danton H; Suhre, Karsten; Myre, Michael A; Chatterjee-Chakraborty, Munmun; Chavez, Sara E

    2006-08-01

    A novel calmodulin-binding protein cmbB from Dictyostelium discoideum is encoded in a single gene. Northern analysis reveals two cmbB transcripts first detectable at 4 h during multicellular development. Western blotting detects an approximately 46.6 kDa protein. Sequence analysis and calmodulin-agarose binding studies identified a "classic" calcium-dependent calmodulin-binding domain (179IPKSLRSLFLGKGYNQPLEF198) but structural analyses suggest binding may not involve classic alpha-helical calmodulin-binding. The cmbB protein is comprised of tandem repeats of a newly identified IP22 motif ([I,L]Pxxhxxhxhxxxhxxxhxxxx; where h = any hydrophobic amino acid) that is highly conserved and a more precise representation of the FNIP repeat. At least eight Acanthamoeba polyphaga Mimivirus proteins and over 100 Dictyostelium proteins contain tandem arrays of the IP22 motif and its variants. cmbB also shares structural homology to YopM, from the plague bacterium Yersinia pestis. PMID:16777069
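
    The IP22 motif quoted above translates naturally into a regular expression. The Python sketch below is one hedged way to scan a protein sequence for it; the abstract does not list which residues count as hydrophobic, so the set AVILMFWYC is an assumption of this example, and the function names and the test sequence are synthetic.

```python
import re

MOTIF = "[I,L]Pxxhxxhxhxxxhxxxhxxxx"  # as written in the abstract
HYDROPHOBIC = "AVILMFWYC"              # assumed residue set, not from the source


def motif_to_regex(motif=MOTIF, hydrophobic=HYDROPHOBIC):
    """Translate the motif notation into a compiled regular expression."""
    body = motif.replace("[I,L]", "[IL]")  # first position: Ile or Leu
    body = body.replace("x", ".")          # x = any residue
    body = body.replace("h", f"[{hydrophobic}]")
    # The lookahead lets overlapping occurrences be reported as well.
    return re.compile(f"(?=({body}))")


def find_ip22(sequence):
    """Return (1-based start, matched 22-mer) for every motif occurrence."""
    pattern = motif_to_regex()
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Synthetic sequence: two tandem 22-residue units flanked by short tails.
unit_1 = "IPKSLRSLKVGKGINQPLEDST"
unit_2 = "LPKSLRSLKVGKGINQPLEDST"
toy_sequence = "MS" + unit_1 + unit_2 + "GG"

hits = find_ip22(toy_sequence)
for start, match in hits:
    print(start, match)
```

    Consecutive hits whose start positions differ by exactly 22 residues indicate a tandem array of the kind the abstract describes.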

  8. Bioinformatic analysis reveals a pattern of STAT3-associated gene expression specific to basal-like breast cancers in human tumors

    PubMed Central

    Tell, Robert W.; Horvath, Curt M.

    2014-01-01

    Signal transducer and activator of transcription 3 (STAT3), a latent transcription factor associated with inflammatory signaling and innate and adaptive immune responses, is known to be aberrantly activated in a wide variety of cancers. In vitro analysis of STAT3 in human cancer cell lines has elucidated a number of specific targets associated with poor prognosis in breast cancer. However, to date, no comparison of cancer subtype and gene expression associated with STAT3 signaling in human patients has been reported. In silico analysis of human breast cancer microarray and reverse-phase protein array data was performed to identify expression patterns associated with STAT3 in basal-like and luminal breast cancers. Results indicate clearly identifiable STAT3-regulated signatures common to basal-like breast cancers but not to luminal A or luminal B cancers. Furthermore, these differentially expressed genes are associated with immune signaling and inflammation, a known phenotype of basal-like cancers. These findings demonstrate a distinct role for STAT3 signaling in basal breast cancers, and underscore the importance of considering subtype-specific molecular pathways that contribute to tissue-specific cancers. PMID:25139989

  9. Bioinformatics and Microarray Analysis of miRNAs in Aged Female Mice Model Implied New Molecular Mechanisms for Impaired Fracture Healing

    PubMed Central

    He, Bing; Zhang, Zong-Kang; Liu, Jin; He, Yi-Xin; Tang, Tao; Li, Jie; Guo, Bao-Sheng; Lu, Ai-Ping; Zhang, Bao-Ting; Zhang, Ge

    2016-01-01

    Impaired fracture healing in aged females is still a challenge in clinics. MicroRNAs (miRNAs) play important roles in fracture healing. This study aims to identify the miRNAs that potentially contribute to the impaired fracture healing in aged females. Transverse femoral shaft fractures were created in adult and aged female mice. At 0, 2, and 4 weeks post-fracture, the fracture sites were scanned by micro-computed tomography to confirm that fracture healing was impaired in aged female mice, and the fracture calluses were collected for miRNA microarray analysis. A total of 53 significantly differentially expressed miRNAs and 5438 miRNA-target gene interactions involved in bone fracture healing were identified. A novel scoring system was designed to analyze the miRNA contribution to impaired fracture healing (RCIFH). Using this method, 11 novel miRNAs were identified to impair fracture healing at 2 or 4 weeks post-fracture. Thereafter, function analysis of target genes was performed for miRNAs with high RCIFH values. The results showed that high RCIFH miRNAs in aged female mice might impair fracture healing not only by down-regulating angiogenesis-, chondrogenesis-, and osteogenesis-related pathways, but also by up-regulating the osteoclastogenesis-related pathway, which implied the essential roles of these high RCIFH miRNAs in impaired fracture healing in aged females, and might promote the discovery of novel therapeutic strategies. PMID:27527150

  10. A Web-based assessment of bioinformatics end-user support services at US universities

    PubMed Central

    Messersmith, Donna J.; Benson, Dennis A.; Geer, Renata C.

    2006-01-01

    Objectives: This study was conducted to gauge the availability of bioinformatics end-user support services at US universities and to identify the providers of those services. The study primarily focused on the availability of short-term workshops that introduce users to molecular biology databases and analysis software. Methods: Websites of selected US universities were reviewed to determine if bioinformatics educational workshops were offered, and, if so, what organizational units in the universities provided them. Results: Of 239 reviewed universities, 72 (30%) offered bioinformatics educational workshops. These workshops were located at libraries (N = 15), bioinformatics centers (N = 38), or other facilities (N = 35). No such training was noted on the sites of 167 universities (70%). Of the 115 bioinformatics centers identified, two-thirds did not offer workshops. Conclusions: This analysis of university Websites indicates that a gap may exist in the availability of workshops and related training to assist researchers in the use of bioinformatics resources, representing a potential opportunity for libraries and other facilities to provide training and assistance for this growing user group. PMID:16888663

  11. GOBLET: The Global Organisation for Bioinformatics Learning, Education and Training

    PubMed Central

    Attwood, Teresa K.; Bongcam-Rudloff, Erik; Brazas, Michelle E.; Corpas, Manuel; Gaudet, Pascale; Lewitter, Fran; Mulder, Nicola; Palagi, Patricia M.; Schneider, Maria Victoria; van Gelder, Celia W. G.

    2015-01-01

    In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy—paradoxically, many are actually closing “niche” bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all. PMID:25856076

  12. Using Grid technology for computationally intensive applied bioinformatics analyses.

    PubMed

    Andrade, Jorge; Berglund, Lisa; Uhlén, Mathias; Odeberg, Jacob

    2006-01-01

    For several applications and algorithms used in applied bioinformatics, a bottleneck in terms of computational time may arise when they are scaled up to facilitate analyses of large datasets and databases. Re-codification, algorithm modification or sacrifices in sensitivity and accuracy may be necessary to accommodate the limited computational capacity of single workstations. Grid computing offers an alternative model for solving massive computational problems by parallel execution of existing algorithms and software implementations. We present the implementation of a Grid-aware model for solving computationally intensive bioinformatic analyses exemplified by a blastp sliding window algorithm for whole proteome sequence similarity analysis, and evaluate the performance in comparison with a local cluster and a single workstation. Our strategy involves temporary installations of the BLAST executable and databases on remote nodes at submission, accommodating dynamic Grid environments as it avoids the need for predefined runtime environments (preinstalled software and databases at specific Grid-nodes). Importantly, the implementation is generic: the BLAST executable can be replaced by other software tools to facilitate analyses suitable for parallelisation. This model should be of general interest in applied bioinformatics. Scripts and procedures are freely available from the authors. PMID:17518760
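
    The sliding window idea mentioned above lends itself to parallel execution because every window is an independent query. The Python sketch below shows one possible way to cut a protein into overlapping fragments and distribute them over jobs; it is not the authors' implementation, the window size, step, job count, function names, and toy sequence are all assumptions, and the actual BLAST invocation on each node is deliberately left out.

```python
def sliding_windows(sequence, size, step):
    """Yield (1-based start, fragment) for each full-length window."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # A trailing stretch shorter than `size` is not emitted in this sketch.
    for start in range(0, len(sequence) - size + 1, step):
        yield start + 1, sequence[start:start + size]


def to_fasta(protein_id, sequence, size, step):
    """Format every window as a FASTA record named id:start-end."""
    records = []
    for start, fragment in sliding_windows(sequence, size, step):
        end = start + len(fragment) - 1
        records.append(f">{protein_id}:{start}-{end}\n{fragment}")
    return records


def split_into_jobs(records, n_jobs):
    """Distribute records round-robin over n_jobs independent batches."""
    return [records[i::n_jobs] for i in range(n_jobs)]


# Toy 22-residue sequence; real proteomes would be read from a FASTA file.
records = to_fasta("toy_protein", "MKTAYIAKQRQISFVKSHFSRQ", size=10, step=4)
jobs = split_into_jobs(records, n_jobs=2)
for number, batch in enumerate(jobs, start=1):
    print(f"job {number}: {len(batch)} windows")
```

    Each batch could then be written to its own file and submitted together with the search executable and database, which is the part a Grid or cluster scheduler would handle.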

  13. Best practices in bioinformatics training for life scientists.

    PubMed

    Via, Allegra; Blicher, Thomas; Bongcam-Rudloff, Erik; Brazas, Michelle D; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; Fernandes, Pedro L; van Gelder, Celia; Jacob, Joachim; Jimenez, Rafael C; Loveland, Jane; Moran, Federico; Mulder, Nicola; Nyrönen, Tommi; Rother, Kristian; Schneider, Maria Victoria; Attwood, Teresa K

    2013-09-01

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists. PMID:23803301

  14. GOBLET: the Global Organisation for Bioinformatics Learning, Education and Training.

    PubMed

    Attwood, Teresa K; Bongcam-Rudloff, Erik; Brazas, Michelle E; Corpas, Manuel; Gaudet, Pascale; Lewitter, Fran; Mulder, Nicola; Palagi, Patricia M; Schneider, Maria Victoria; van Gelder, Celia W G

    2015-04-01

    In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy--paradoxically, many are actually closing "niche" bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all. PMID:25856076

  15. A primer to frequent itemset mining for bioinformatics.

    PubMed

    Naulaerts, Stefan; Meysman, Pieter; Bittremieux, Wout; Vu, Trung Nghia; Vanden Berghe, Wim; Goethals, Bart; Laukens, Kris

    2015-03-01

    Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur. An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques are able to efficiently capture the characteristics of (complex) data and succinctly summarize it. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about the bioinformatics applications of these techniques remains scattered. In this primer, we introduce frequent itemset mining and its derived association rules for life scientists. We give an overview of various algorithms, and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences. PMID:24162173
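
    The shopping-basket example above can be made concrete with a small Apriori-style miner. The Python sketch below is a minimal illustration of level-wise frequent itemset mining, not any specific algorithm from the primer; the function name, the support threshold of 0.5, and the toy baskets are assumptions chosen for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        return {}
    # Level 1 candidates: every single item that occurs anywhere.
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        support = {c: sum(1 for t in transactions if c <= t) / n for c in current}
        frequent = {c: s for c, s in support.items() if s >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        current = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(subset) in frequent for subset in combinations(union, k - 1)
            ):
                current.add(union)
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

itemsets = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

    In a bioinformatics setting the baskets could be, for instance, genes and the items their annotation terms; the mining logic stays the same.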

  16. Best practices in bioinformatics training for life scientists

    PubMed Central

    Blicher, Thomas; Bongcam-Rudloff, Erik; Brazas, Michelle D.; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; Fernandes, Pedro L.; van Gelder, Celia; Jacob, Joachim; Jimenez, Rafael C.; Loveland, Jane; Moran, Federico; Mulder, Nicola; Nyrönen, Tommi; Rother, Kristian; Schneider, Maria Victoria; Attwood, Teresa K.

    2013-01-01

    The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists. PMID:23803301

  17. A primer to frequent itemset mining for bioinformatics

    PubMed Central

    Naulaerts, Stefan; Meysman, Pieter; Bittremieux, Wout; Vu, Trung Nghia; Vanden Berghe, Wim; Goethals, Bart

    2015-01-01

    Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur. An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques are able to efficiently capture the characteristics of (complex) data and succinctly summarize them. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about the bioinformatics applications of these techniques remains scattered. In this primer, we introduce frequent itemset mining and its derived association rules for life scientists. We give an overview of various algorithms, and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences. PMID:24162173

  18. Wnt-signalling pathways and microRNAs network in carcinogenesis: experimental and bioinformatics approaches.

    PubMed

    Onyido, Emenike K; Sweeney, Eloise; Nateri, Abdolrahman Shams

    2016-01-01

    Over the past few years, microRNAs (miRNAs) have not only emerged as integral regulators of gene expression at the post-transcriptional level but have also been shown to respond to signalling molecules to affect cell function(s). miRNAs crosstalk with a variety of key cellular signalling networks, such as Wnt, transforming growth factor-β and Notch, to control stem cell activity in maintaining tissue homeostasis, whereas their dysregulation contributes to the initiation and progression of cancer. Herein, we overview the molecular mechanism(s) underlying the crosstalk between Wnt-signalling components (canonical and non-canonical) and miRNAs, as well as changes in the miRNA/Wnt-signalling components observed in the different forms of cancer. Furthermore, the fundamental understanding of miRNA-mediated regulation of the Wnt-signalling pathway and vice versa has been significantly improved by high-throughput genomics and bioinformatics technologies. Whilst these approaches have identified a number of specific miRNA(s) that function as oncogenes or tumour suppressors, additional analyses will be necessary to fully unravel the links among conserved cellular signalling pathways and miRNAs and their potential associated components in cancer, thereby creating therapeutic avenues against tumours. Hence, we also discuss the current challenges associated with the Wnt-signalling/miRNA complex and its analysis using biomedical experimental and bioinformatics approaches. PMID:27590724

  19. GITIRBio: A Semantic and Distributed Service Oriented-Architecture for Bioinformatics Pipeline.

    PubMed

    Castillo, Luis F; López-Gartner, Germán; Isaza, Gustavo A; Sánchez, Mariana; Arango, Jeferson; Agudelo-Valencia, Daniel; Castaño, Sergio

    2015-01-01

    The need to process large quantities of data generated from genomic sequencing has resulted in a difficult task for life scientists who are not familiar with the use of command-line operations or developments in high performance computing and parallelization. This knowledge gap, along with unfamiliarity with necessary processes, can hinder the execution of data processing tasks. Furthermore, many of the commonly used bioinformatics tools for the scientific community are presented as isolated, unrelated entities that do not provide an integrated, guided, and assisted interaction with the scheduling facilities of computational resources or distribution, processing and mapping with runtime analysis. This paper presents the first approximation of a Web Services platform-based architecture (GITIRBio) that acts as a distributed front-end system for autonomous and assisted processing of parallel bioinformatics pipelines that has been validated using multiple sequences. Additionally, this platform allows integration with semantic repositories of genes for search annotations. GITIRBio is available at: http://c-head.ucaldas.edu.co:8080/gitirbio. PMID:26527189

  20. PyDPI: freely available python package for chemoinformatics, bioinformatics, and chemogenomics studies.

    PubMed

    Cao, Dong-Sheng; Liang, Yi-Zeng; Yan, Jun; Tan, Gui-Shan; Xu, Qing-Song; Liu, Shao

    2013-11-25

    The rapidly increasing amount of publicly available data in biology and chemistry enables researchers to revisit interaction problems by systematic integration and analysis of heterogeneous data. Herein, we developed a comprehensive python package to emphasize the integration of chemoinformatics and bioinformatics into a molecular informatics platform for drug discovery. PyDPI (drug-protein interaction with Python) is a powerful python toolkit for computing commonly used structural and physicochemical features of proteins and peptides from amino acid sequences, molecular descriptors of drug molecules from their topology, and protein-protein interaction and protein-ligand interaction descriptors. It computes 6 protein feature groups composed of 14 features that include 52 descriptor types and 9890 descriptors, 9 drug feature groups composed of 13 descriptor types that include 615 descriptors. In addition, it provides seven types of molecular fingerprint systems for drug molecules, including topological fingerprints, electro-topological state (E-state) fingerprints, MACCS keys, FP4 keys, atom pair fingerprints, topological torsion fingerprints, and Morgan/circular fingerprints. By combining different types of descriptors from drugs and proteins in different ways, interaction descriptors representing protein-protein or drug-protein interactions could be conveniently generated. These computed descriptors can be widely used in various fields relevant to chemoinformatics, bioinformatics, and chemogenomics. PyDPI is freely available via https://sourceforge.net/projects/pydpicao/. PMID:24047419
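
    To make the idea of sequence-derived protein descriptors concrete, the sketch below computes two simple, widely used ones: amino acid composition (20 values) and dipeptide composition (400 values). It is written in plain Python and does not use or reproduce PyDPI's own API; the function names and the example peptide are assumptions chosen for illustration.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in the sequence (20 descriptors).

    Non-standard letters are counted in the length but get no descriptor,
    so the fractions sum to 1 only for sequences of standard residues.
    """
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return {aa: sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS}


def dipeptide_composition(sequence):
    """Fraction of each ordered residue pair among all overlapping
    adjacent pairs in the sequence (400 descriptors)."""
    sequence = sequence.upper()
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    if not pairs:
        raise ValueError("sequence must contain at least two residues")
    return {
        a + b: pairs.count(a + b) / len(pairs)
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


# Hypothetical four-residue peptide (illustration only).
peptide = "AAGK"
composition = amino_acid_composition(peptide)
dipeptides = dipeptide_composition(peptide)

print(composition["A"], composition["G"], composition["K"])
print(dipeptides["AA"], dipeptides["AG"], dipeptides["GK"])
```

    Fixed-length vectors of this kind can be concatenated with drug descriptors to represent a drug-protein pair, which is the general strategy the abstract describes for building interaction descriptors.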

  1. Determination of the mechanism of action of repetitive halothane exposure on rat brain tissues using a combined method of microarray gene expression profiling and bioinformatics analysis.

    PubMed

    Wang, Jiansheng; Yang, Xiaojun; Xiao, Huan; Kong, Jianqiang; Bing, Miao

    2015-12-01

    The present study aimed to investigate the gene expression profiles of rat brain tissues treated with halothane compared with untreated controls to improve current understanding of the mechanism of action of the inhaled anesthetic. The GSE357 gene expression profile was downloaded from the Gene Expression Omnibus database, and included six gene chips of samples repeatedly exposed to halothane and 12 gene chips of untreated controls. The differentially expressed genes (DEGs) between these two groups were identified using the Limma package in the R language. Subsequently, the Database for Annotation, Visualization and Integrated Discovery was used to annotate the function of these DEGs. In addition, the most significantly upregulated gene and downregulated gene were annotated, to reveal the functional interactions with other associated genes, in the FuncBase database. A total of 44 DEGs were obtained between the control and halothane exposure samples. Following Gene Ontology functional classification, these DEGs were found to be involved predominantly in the circulatory system, regulation of cell proliferation and response to endogenous stimulus and corticosteroid stimulus processes. KRT31 and HMGCS2, which were identified as the most significantly downregulated and upregulated DEGs, respectively, were associated with the lipid metabolic process and T cell activation, respectively. These results provided a basis for the development of improved inhalational anesthetics with minimal side effects and are essential for optimization of inhaled anesthetic techniques for advanced surgical procedures. PMID:26497548

  2. Evolutionary and bioinformatic analysis of the spike glycoprotein gene of H120 vaccine strain protectotype of infectious bronchitis virus from India.

    PubMed

    Kamble, Nitin Machindra; Pillai, Aravind S; Gaikwad, Satish S; Shukla, Sanjeev Kumar; Khulape, Sagar Aashok; Dey, Sohini; Mohan, C Madhan

    2016-01-01

    The infectious bronchitis virus is a causative agent of avian infectious bronchitis (AIB), which is an important disease that produces severe economic losses to the poultry industry worldwide. Recent AIB outbreaks in India have been associated with poor growth in broilers, drop in egg production, and thin egg shells in layers. The complete spike gene of the Indian AIB vaccine strain was amplified and sequenced using a conventional reverse transcription polymerase chain reaction and submitted to GenBank (accession no KF188436). Phylogenetic analysis revealed that the vaccine strain currently used belongs to the H120 genotype, an attenuated strain of the Massachusetts (Mass) serotype. Nucleotide and amino acid sequence comparisons have shown that the reported spike genes from Indian isolates have 71.8%-99% and 71.4%-96.9% genetic similarity with the sequenced H120 strain. The study identifies the live attenuated IBV vaccine strain, which is routinely used for vaccination, for the first time. Based on nucleotide and amino acid relatedness studies of the vaccine strain with reported IBV sequences from India, it is shown that the current vaccine strain is efficient in controlling the IBV infection. Continuous monitoring of IBV outbreaks by sequencing for genotyping and in vivo cross protection studies for serotyping is not only important for epidemiological investigation but also for evaluation of efficacy of the current vaccine. PMID:25311758

  3. Bioinformatic Analysis of Plasma Apolipoproteins A-I and A-II Revealed Unique Features of A-I/A-II HDL Particles in Human Plasma

    PubMed Central

    Kido, Toshimi; Kurata, Hideaki; Kondo, Kazuo; Itakura, Hiroshige; Okazaki, Mitsuyo; Urata, Takeyoshi; Yokoyama, Shinji

    2016-01-01

    Plasma concentration of apoA-I, apoA-II and apoA-II-unassociated apoA-I was analyzed in 314 Japanese subjects (177 males and 137 females), including one (male) homozygote and 37 (20 males and 17 females) heterozygotes of genetic CETP deficiency. ApoA-I unassociated with apoA-II markedly and linearly increased with HDL-cholesterol, while apoA-II increased only very slightly and the ratio of apoA-II-associated apoA-I to apoA-II stayed constant at 2 in molar ratio throughout the increase of HDL-cholesterol, among the wild type and heterozygous CETP deficiency. Thus, overall HDL concentration almost exclusively depends on HDL with apoA-I without apoA-II (LpAI) while concentration of HDL containing apoA-I and apoA-II (LpAI:AII) is constant having a fixed molar ratio of 2 : 1 regardless of total HDL and apoA-I concentration. Distribution of apoA-I between LpAI and LpAI:AII is consistent with a model of statistical partitioning regardless of sex and CETP genotype. The analysis also indicated that LpAI accommodates on average 4 apoA-I molecules and has a clearance rate indistinguishable from LpAI:AII. Independent evidence indicated LpAI:AII has a diameter 20% smaller than LpAI, consistent with a model having two apoA-I and one apoA-II. The functional contribution of these particles is to be investigated. PMID:27526664
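
    The fixed 2 : 1 molar ratio reported above implies simple arithmetic for splitting total apoA-I between the two particle classes: apoA-I in LpAI:AII is twice the apoA-II concentration, and the remainder is in LpAI. The sketch below only illustrates that arithmetic; the function name and the input concentrations are hypothetical and are not data from the study.

```python
def partition_apoa1(total_apoa1, apoa2, ratio=2.0):
    """Split total apoA-I (molar units) between LpAI and LpAI:AII.

    Assumes the fixed molar ratio of apoA-II-associated apoA-I to apoA-II
    stated in the abstract (2 : 1 by default). Both inputs must be in the
    same molar units, e.g. micromol/L.
    """
    apoa1_in_lpai_aii = ratio * apoa2
    apoa1_in_lpai = total_apoa1 - apoa1_in_lpai_aii
    if apoa1_in_lpai < 0:
        raise ValueError("apoA-II is too high for the assumed molar ratio")
    return apoa1_in_lpai, apoa1_in_lpai_aii


# Hypothetical concentrations in micromol/L (illustration only).
lpai, lpai_aii = partition_apoa1(total_apoa1=50.0, apoa2=15.0)
print(lpai, lpai_aii)
```

    With these made-up inputs, 30 of the 50 micromol/L of apoA-I is attributed to LpAI:AII and 20 to LpAI. Because apoA-II varies little, any rise in total apoA-I appears almost entirely in the LpAI term, which is the pattern the abstract reports.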

  4. Bioinformatic Analysis of Plasma Apolipoproteins A-I and A-II Revealed Unique Features of A-I/A-II HDL Particles in Human Plasma.

    PubMed

    Kido, Toshimi; Kurata, Hideaki; Kondo, Kazuo; Itakura, Hiroshige; Okazaki, Mitsuyo; Urata, Takeyoshi; Yokoyama, Shinji

    2016-01-01

    Plasma concentration of apoA-I, apoA-II and apoA-II-unassociated apoA-I was analyzed in 314 Japanese subjects (177 males and 137 females), including one (male) homozygote and 37 (20 males and 17 females) heterozygotes of genetic CETP deficiency. ApoA-I unassociated with apoA-II markedly and linearly increased with HDL-cholesterol, while apoA-II increased only very slightly and the ratio of apoA-II-associated apoA-I to apoA-II stayed constant at 2 in molar ratio throughout the increase of HDL-cholesterol, among the wild type and heterozygous CETP deficiency. Thus, overall HDL concentration almost exclusively depends on HDL with apoA-I without apoA-II (LpAI) while concentration of HDL containing apoA-I and apoA-II (LpAI:AII) is constant having a fixed molar ratio of 2 : 1 regardless of total HDL and apoA-I concentration. Distribution of apoA-I between LpAI and LpAI:AII is consistent with a model of statistical partitioning regardless of sex and CETP genotype. The analysis also indicated that LpAI accommodates on average 4 apoA-I molecules and has a clearance rate indistinguishable from LpAI:AII. Independent evidence indicated LpAI:AII has a diameter 20% smaller than LpAI, consistent with a model having two apoA-I and one apoA-II. The functional contribution of these particles is to be investigated. PMID:27526664

  5. Systems Biology, Bioinformatics, and Biomarkers in Neuropsychiatry

    PubMed Central

    Alawieh, Ali; Zaraket, Fadi A.; Li, Jian-Liang; Mondello, Stefania; Nokkari, Amaly; Razafsha, Mahdi; Fadlallah, Bilal; Boustany, Rose-Mary; Kobeissy, Firas H.

    2012-01-01

    Although neuropsychiatric (NP) disorders are among the top causes of disability worldwide with enormous financial costs, they can still be viewed as part of the most complex disorders that are of unknown etiology and incomprehensible pathophysiology. The complexity of NP disorders arises from their etiologic heterogeneity and the concurrent influence of environmental and genetic factors. In addition, the absence of rigid boundaries between the normal and diseased state, the remarkable overlap of symptoms among conditions, the high inter-individual and inter-population variations, and the absence of discriminative molecular and/or imaging biomarkers for these diseases make an accurate diagnosis difficult. Along with the complexity of NP disorders, the practice of psychiatry suffers from a “top-down” method that relies on symptom checklists. Although checklist diagnoses cost less in terms of time and money, they are less accurate than a comprehensive assessment. Thus, reliable and objective diagnostic tools such as biomarkers are needed that can detect and discriminate among NP disorders. The real promise in understanding the pathophysiology of NP disorders lies in bringing back psychiatry to its biological basis in a systemic approach which is needed given the NP disorders’ complexity to understand their normal functioning and response to perturbation. This approach is implemented in the systems biology discipline that enables the discovery of disease-specific NP biomarkers for diagnosis and therapeutics. Systems biology involves the use of sophisticated computer software “omics”-based discovery tools and advanced performance computational techniques in order to understand the behavior of biological systems and identify diagnostic and prognostic biomarkers specific for NP disorders together with new targets of therapeutics. In this review, we try to shed light on the need of systems biology, bioinformatics, and biomarkers in neuropsychiatry, and

  6. Macrobrachium rosenbergii mannose binding lectin: synthesis of MrMBL-N20 and MrMBL-C16 peptides and their antimicrobial characterization, bioinformatics and relative gene expression analysis.

    PubMed

    Arockiaraj, Jesu; Chaurasia, Mukesh Kumar; Kumaresan, Venkatesh; Palanisamy, Rajesh; Harikrishnan, Ramasamy; Pasupuleti, Mukesh; Kasi, Marimuthu

    2015-04-01

    Mannose-binding lectin (MBL), an antimicrobial protein, is an important component of the innate immune system which recognizes repetitive sugar groups on the surface of bacteria and viruses, leading to activation of the complement system. In this study, we reported a complete molecular characterization of the cDNA encoding MBL from the freshwater prawn Macrobrachium rosenbergii (Mr). Two short peptides (MrMBL-N20: (20)AWNTYDYMKREHSLVKPYQG(39) and MrMBL-C16: (307)GGLFYVKHKEQQRKRF(322)) were synthesized from the MrMBL polypeptide. The purity of the MrMBL-N20 (89%) and MrMBL-C16 (93%) peptides was confirmed by MS analysis (MALDI-ToF). The purified peptides were used for further antimicrobial characterization including minimum inhibitory concentration (MIC) assay, kinetics of bactericidal efficiency and analysis of hemolytic capacity. The peptides exhibited antimicrobial activity towards all the Gram-negative bacteria taken for analysis, whereas they showed activity towards only a few selected Gram-positive bacteria. The MrMBL-C16 peptide produced the highest inhibition towards both the Gram-negative and Gram-positive bacteria compared to MrMBL-N20. Neither peptide produced any inhibition against Bacillus spp. The kinetics of bactericidal efficiency showed that the peptides drastically reduced the number of surviving bacterial colonies after 24 h incubation. The results of hemolytic activity showed that both peptides produced strong activity at higher concentration. However, the MrMBL-C16 peptide produced the highest activity compared to the MrMBL-N20 peptide. Overall, the results indicated that the peptides can be used as bactericidal agents. The MrMBL protein sequence was characterized using various bioinformatics tools including phylogenetic analysis and structure prediction. We also reported the MrMBL gene expression pattern upon viral and bacterial infection in M. rosenbergii gills. It could be concluded that the prawn MBL may be one of the important molecules which

  7. Analysis methods for the determination of anthropogenic additions of P to agricultural soils

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Phosphorus additions and measurement in soil is of concern on lands where biosolids have been applied. Colorimetric analysis for plant-available P may be inadequate for the accurate assessment of soil P. Phosphate additions in a regulatory environment need to be accurately assessed as the reported...

  8. Which craft is best in bioinformatics?

    PubMed

    Attwood, T K; Miller, C J

    2001-07-01

    'Silicon-based' biology has gathered momentum as the world-wide sequencing projects have made possible the investigation and comparative analysis of complete genomes. Central to the quest to elucidate and characterise the genes and gene products encoded within genomes are pivotal concepts concerning the processes of evolution, the mechanisms of protein folding, and, crucially, the manifestation of protein function. Our use of computers to model such concepts is limited by, and must be placed in the context of, the current limits of our understanding of these biological processes. It is important to recognise that we do not have a common understanding of what constitutes a gene; we cannot invariably say that a particular sequence or fold has arisen via divergence or convergence; we do not fully understand the rules of protein folding, so we cannot predict protein structure; and we cannot invariably diagnose protein function, given knowledge only of its sequence or structure in isolation. Accepting what we cannot do with computers plays an essential role in forming an appreciation of what we can do. Without this understanding, it is easy to be misled, as spurious arguments are often used to promote over-enthusiastic notions of what particular programs can achieve. There are valuable lessons to be learned here from the field of artificial intelligence, principal among which is the realisation that capturing and representing complex knowledge is time consuming, expensive and hard. If bioinformatics is to tackle biological complexity meaningfully, the road ahead must therefore be paved with caution, rigour and pragmatism. PMID:11459349

  9. Implementing bioinformatic workflows within the bioextract server

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...

  10. Bioinformatics in Undergraduate Education: Practical Examples

    ERIC Educational Resources Information Center

    Boyle, John A.

    2004-01-01

    Bioinformatics has emerged as an important research tool in recent years. The ability to mine large databases for relevant information has become increasingly central to many different aspects of biochemistry and molecular biology. It is important that undergraduates be introduced to the available information and methodologies. We present a…

  11. SPECIES DATABASES AND THE BIOINFORMATICS REVOLUTION.

    EPA Science Inventory

    Biological databases are having a growth spurt. Much of this results from research in genetics and biodiversity, coupled with fast-paced developments in information technology. The revolution in bioinformatics, defined by Sugden and Pennisi (2000) as the "tools and techniques for...

  12. Privacy Preserving PCA on Distributed Bioinformatics Datasets

    ERIC Educational Resources Information Center

    Li, Xin

    2011-01-01

    In recent years, new bioinformatics technologies, such as gene expression microarray, genome-wide association study, proteomics, and metabolomics, have been widely used to simultaneously identify a huge number of human genomic/genetic biomarkers, generate a tremendously large amount of data, and dramatically increase the knowledge on human…

  13. Bioinformatics: A History of Evolution "In Silico"

    ERIC Educational Resources Information Center

    Ondrej, Vladan; Dvorak, Petr

    2012-01-01

    Bioinformatics, biological databases, and the worldwide use of computers have accelerated biological research in many fields, such as evolutionary biology. Here, we describe a primer of nucleotide sequence management and the construction of a phylogenetic tree with two examples; the two selected are from completely different groups of organisms:…

  14. "Extreme Programming" in a Bioinformatics Class

    ERIC Educational Resources Information Center

    Kelley, Scott; Alger, Christianna; Deutschman, Douglas

    2009-01-01

    The importance of Bioinformatics tools and methodology in modern biological research underscores the need for robust and effective courses at the college level. This paper describes such a course designed on the principles of cooperative learning based on a computer software industry production model called "Extreme Programming" (EP). The…

  15. 2010 Translational bioinformatics year in review

    PubMed Central

    Miller, Katharine S

    2011-01-01

    A review of 2010 research in translational bioinformatics provides much to marvel at. We have seen notable advances in personal genomics, pharmacogenetics, and sequencing. At the same time, the infrastructure for the field has burgeoned. While acknowledging that, according to researchers, the members of this field tend to be overly optimistic, the authors predict a bright future. PMID:21672905

  16. Bioinformatic analysis of peptide precursor proteins.

    PubMed

    Baggerman, G; Liu, F; Wets, G; Schoofs, L

    2005-04-01

    Neuropeptides are among the most important signal molecules in animals. Traditional identification of peptide hormones through peptide purification is a tedious and time-consuming process. With the advent of the genome sequencing projects, putative peptide precursors can be mined from the genome. However, because bioactive peptides are usually quite short in length and because the active core of a peptide is often limited to only a few amino acids, using the BLAST search engine to identify neuropeptide precursors in the genome is difficult and sometimes impossible. To overcome these shortcomings, we subject the entire set of all known Drosophila melanogaster peptide precursor sequences to motif-finding algorithms in search of a motif that is common for all prepropeptides and that could be used in the search for new peptide precursors. PMID:15891006

  17. Navigating the changing learning landscape: perspective from bioinformatics.ca

    PubMed Central

    Ouellette, B. F. Francis

    2013-01-01

    With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs. PMID:23515468

  18. Navigating the changing learning landscape: perspective from bioinformatics.ca.

    PubMed

    Brazas, Michelle D; Ouellette, B F Francis

    2013-09-01

    With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs. PMID:23515468

  19. ebTrack: an environmental bioinformatics system built upon ArrayTrack™

    PubMed Central

    Chen, Minjun; Martin, Jackson; Fang, Hong; Isukapalli, Sastry; Georgopoulos, Panos G; Welsh, William J; Tong, Weida

    2009-01-01

    ebTrack is being developed as an integrated bioinformatics system for environmental research and analysis by addressing the issues of integration, curation, management, first level analysis and interpretation of environmental and toxicological data from diverse sources. It is based on enhancements to the US FDA developed ArrayTrack™ system through additional analysis modules for gene expression data as well as through incorporation and linkages to modules for analysis of proteomic and metabonomic datasets that include tandem mass spectra. ebTrack uses a client-server architecture with the free and open source PostgreSQL as its database engine, and java tools for user interface, analysis, visualization, and web-based deployment. Several predictive tools that are critical for environmental health research are currently supported in ebTrack, including Significance Analysis of Microarray (SAM). Furthermore, new tools are under continuous integration, and interfaces to environmental health risk analysis tools are being developed in order to make ebTrack widely usable. These health risk analysis tools include the Modeling ENvironment for TOtal Risk studies (MENTOR) for source-to-dose exposure modeling and the DOse Response Information ANalysis system (DORIAN) for health outcome modeling. The design of ebTrack is presented in detail and steps involved in its application are summarized through an illustrative application. PMID:19278561

  20. Analysis of CNT additives in porous layered thin film lubrication with electric double layer

    NASA Astrophysics Data System (ADS)

    Rao, T. V. V. L. N.; Rani, A. M. A.; Sufian, S.; Mohamed, N. M.

    2015-07-01

    This paper presents an analysis of thin film lubrication of porous layered carbon nanotubes (CNTs) additive slider bearing with electric double layer. The CNTs additive lubricant flow in the thin fluid film and porous layers are governed by Stokes and Brinkman equations respectively, including electro-kinetic force. The apparent viscosity and nondimensional pressure expression are derived. The nondimensional load capacity increases under the influence of electro-viscosity, CNT additives volume fraction, permeability and thickness of porous layer. A CNTs additive lubricated porous thin film slider bearing with electric double layer provides higher load capacity.

  1. Biochemical, Transcriptional, and Bioinformatic Analysis of Lipid Droplets from Seeds of Date Palm (Phoenix dactylifera L.) and Their Use as Potent Sequestration Agents against the Toxic Pollutant, 2,3,7,8-Tetrachlorinated Dibenzo-p-Dioxin

    PubMed Central

    Hanano, Abdulsamie; Almousally, Ibrahem; Shaban, Mouhnad; Rahman, Farzana; Blee, Elizabeth; Murphy, Denis J.

    2016-01-01

    Contamination of aquatic environments with dioxins, the most toxic group of persistent organic pollutants (POPs), is a major ecological issue. Dioxins are highly lipophilic and bioaccumulate in fatty tissues of marine organisms used for seafood where they constitute a potential risk for human health. Lipid droplets (LDs) purified from date palm, Phoenix dactylifera, seeds were characterized and their capacity to extract dioxins from aquatic systems was assessed. The bioaffinity of date palm LDs toward 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), the most toxic congener of dioxins was determined. Fractioned LDs were spheroidal with mean diameters of 2.5 µm, enclosing an oil-rich core of 392.5 mg mL⁻¹. Isolated LDs did not aggregate and/or coalesce unless placed in acidic media and were strongly associated with three major groups of polypeptides of relative mass 32–37, 20–24, and 16–18 kDa. These masses correspond to the LD-associated proteins, oleosins, caleosins, and steroleosins, respectively. Efficient partitioning of TCDD into LDs occurred with a coefficient of log K_{LB/w,TCDD} = 7.528 ± 0.024; it was optimal at neutral pH and was dependent on the presence of the oil-rich core, but was independent of the presence of LD-associated proteins. Bioinformatic analysis of the date palm genome revealed nine oleosin-like, five caleosin-like, and five steroleosin-like sequences, with predicted structures having putative lipid-binding domains that match their LD stabilizing roles and use as bio-based encapsulation systems. Transcriptomic analysis of date palm seedlings exposed to TCDD showed strong up-regulation of several caleosin and steroleosin genes, consistent with increased LD formation. The results suggest that the plant LDs could be used in ecological remediation strategies to remove POPs from aquatic environments. Recent reports suggest that several fungal and algal species also use LDs to sequester both external and internally derived hydrophobic toxins

  2. Biochemical, Transcriptional, and Bioinformatic Analysis of Lipid Droplets from Seeds of Date Palm (Phoenix dactylifera L.) and Their Use as Potent Sequestration Agents against the Toxic Pollutant, 2,3,7,8-Tetrachlorinated Dibenzo-p-Dioxin.

    PubMed

    Hanano, Abdulsamie; Almousally, Ibrahem; Shaban, Mouhnad; Rahman, Farzana; Blee, Elizabeth; Murphy, Denis J

    2016-01-01

    Contamination of aquatic environments with dioxins, the most toxic group of persistent organic pollutants (POPs), is a major ecological issue. Dioxins are highly lipophilic and bioaccumulate in fatty tissues of marine organisms used for seafood where they constitute a potential risk for human health. Lipid droplets (LDs) purified from date palm, Phoenix dactylifera, seeds were characterized and their capacity to extract dioxins from aquatic systems was assessed. The bioaffinity of date palm LDs toward 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), the most toxic congener of dioxins, was determined. Fractioned LDs were spheroidal with mean diameters of 2.5 µm, enclosing an oil-rich core of 392.5 mg mL⁻¹. Isolated LDs did not aggregate and/or coalesce unless placed in acidic media and were strongly associated with three major groups of polypeptides of relative mass 32–37, 20–24, and 16–18 kDa. These masses correspond to the LD-associated proteins, oleosins, caleosins, and steroleosins, respectively. Efficient partitioning of TCDD into LDs occurred with a coefficient of log K_LB/w,TCDD = 7.528 ± 0.024; it was optimal at neutral pH and was dependent on the presence of the oil-rich core, but was independent of the presence of LD-associated proteins. Bioinformatic analysis of the date palm genome revealed nine oleosin-like, five caleosin-like, and five steroleosin-like sequences, with predicted structures having putative lipid-binding domains that match their LD stabilizing roles and use as bio-based encapsulation systems. Transcriptomic analysis of date palm seedlings exposed to TCDD showed strong up-regulation of several caleosin and steroleosin genes, consistent with increased LD formation. The results suggest that the plant LDs could be used in ecological remediation strategies to remove POPs from aquatic environments. Recent reports suggest that several fungal and algal species also use LDs to sequester both external and internally derived hydrophobic toxins, which

  3. Biophysics and bioinformatics of transcription regulation in bacteria and bacteriophages

    NASA Astrophysics Data System (ADS)

    Djordjevic, Marko

    2005-11-01

    Due to rapid accumulation of biological data, bioinformatics has become a very important branch of biological research. In this thesis, we develop novel bioinformatic approaches and aid design of biological experiments by using ideas and methods from statistical physics. Identification of transcription factor binding sites within the regulatory segments of genomic DNA is an important step towards understanding of the regulatory circuits that control expression of genes. We propose a novel biophysics-based algorithm for the supervised detection of transcription factor (TF) binding sites. The method classifies potential binding sites by explicitly estimating the sequence-specific binding energy and the chemical potential of a given TF. In contrast with the widely used information-theory-based weight matrix method, our approach correctly incorporates saturation in the transcription factor/DNA binding probability. This results in a significant reduction in the number of expected false positives, and in the explicit appearance (and determination) of a binding threshold. The new method was used to identify likely genomic binding sites for the Escherichia coli TFs, and to examine the relationship between TF binding specificity and degree of pleiotropy (number of regulatory targets). We next address how parameters of protein-DNA interactions can be obtained from data on protein binding to random oligos under controlled conditions (SELEX experiment data). We show that 'robust' generation of an appropriate data set is achieved by a suitable modification of the standard SELEX procedure, and propose a novel bioinformatic algorithm for analysis of such data. Finally, we use quantitative data analysis, bioinformatic methods and kinetic modeling to analyze gene expression strategies of bacterial viruses. We study bacteriophage Xp10 that infects rice pathogen Xanthomonas oryzae. Xp10 is an unusual bacteriophage, which has morphology and genome organization that most closely

  4. A Bioinformatics Reference Model: Towards a Framework for Developing and Organising Bioinformatic Resources

    NASA Astrophysics Data System (ADS)

    Hiew, Hong Liang; Bellgard, Matthew

    2007-11-01

    Life Science research faces the constant challenge of how to effectively handle an ever-growing body of bioinformatics software and online resources. The users and developers of bioinformatics resources have a diverse set of competing demands on how these resources need to be developed and organised. Unfortunately, there is no adequate community-wide framework to integrate such competing demands. The problems that arise from this include unstructured standards development, the emergence of tools that do not meet specific needs of researchers, and often a communications gap between those who use the tools and those who supply them. This paper presents an overview of the different functions and needs of bioinformatics stakeholders to determine what may be required in a community-wide framework. A Bioinformatics Reference Model is proposed as a basis for such a framework. The reference model outlines the functional relationship between research usage and technical aspects of bioinformatics resources. It separates important functions into multiple structured layers, clarifies how they relate to each other, and highlights the gaps that need to be addressed for progress towards a diverse, manageable, and sustainable body of resources. The relevance of this reference model to the bioscience research community, and its implications in progress for organising our bioinformatics resources, are discussed.

  5. Component-Based Approach for Educating Students in Bioinformatics

    ERIC Educational Resources Information Center

    Poe, D.; Venkatraman, N.; Hansen, C.; Singh, G.

    2009-01-01

    There is an increasing need for an effective method of teaching bioinformatics. Increased progress and availability of computer-based tools for educating students have led to the implementation of a computer-based system for teaching bioinformatics as described in this paper. Bioinformatics is a recent, hybrid field of study combining elements of…

  6. YPED: An Integrated Bioinformatics Suite and Database for Mass Spectrometry-based Proteomics Research

    PubMed Central

    Colangelo, Christopher M.; Shifman, Mark; Cheung, Kei-Hoi; Stone, Kathryn L.; Carriero, Nicholas J.; Gulcicek, Erol E.; Lam, TuKiet T.; Wu, Terence; Bjornson, Robert D.; Bruce, Can; Nairn, Angus C.; Rinehart, Jesse; Miller, Perry L.; Williams, Kenneth R.

    2015-01-01

    We report a significantly enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of high-throughput mass spectrometry-based proteomics research ranging from a single laboratory, group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography–tandem mass spectrometry (LC–MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED’s database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results. PMID:25712262

  7. Can bioinformatics help in the identification of moonlighting proteins?

    PubMed

    Hernández, Sergio; Calvo, Alejandra; Ferragut, Gabriela; Franco, Luís; Hermoso, Antoni; Amela, Isaac; Gómez, Antonio; Querol, Enrique; Cedano, Juan

    2014-12-01

    Protein multitasking or moonlighting is the capability of certain proteins to execute two or more unique biological functions. This ability to perform moonlighting functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Usually, moonlighting proteins are revealed experimentally by serendipity, and the proteins described probably represent just the tip of the iceberg. It would be helpful if bioinformatics could predict protein multifunctionality, especially because of the large amounts of sequences coming from genome projects. In the present article, we describe several approaches that use sequences, structures, interactomics and current bioinformatics algorithms and programs to try to overcome this problem. The sequence analysis has been performed: (i) by remote homology searches using PSI-BLAST, (ii) by the detection of functional motifs, and (iii) by the co-evolutionary relationship between amino acids. Programs designed to identify functional motifs/domains are basically oriented to detect the main function, but usually fail in the detection of secondary ones. Remote homology searches such as PSI-BLAST seem to be more versatile in this task, and they are a good complement for the information obtained from protein-protein interaction (PPI) databases. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can be used only in very restricted situations, but can suggest how the evolutionary process of the acquisition of the second function took place. PMID:25399591

  8. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

    SciTech Connect

    Taylor, Ronald C.

    2010-12-21

    Bioinformatics researchers are increasingly confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employs Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date.
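
    The MapReduce programming style named in the abstract above can be illustrated with a minimal single-process sketch. It is not Hadoop code and not taken from the record: the function names (map_phase, shuffle, reduce_phase, count_kmers), the k-mer counting task and the two sample reads are all hypothetical, chosen only because sequencing reads are the application area the abstract mentions.

```python
from collections import defaultdict
from itertools import chain


def map_phase(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a total count."""
    return key, sum(values)


def count_kmers(reads, k=3):
    """Run map, shuffle and reduce in sequence on an in-memory list of reads."""
    mapped = chain.from_iterable(map_phase(read, k) for read in reads)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Hypothetical toy input; real reads would come from a sequencing run.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads)
print(counts)
```

    In a real cluster framework the map and reduce functions run on different machines and the shuffle is handled by the framework; here all three steps run in one process purely to show the data flow.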

  9. A toolbox for developing bioinformatics software

    PubMed Central

    Potrzebowski, Wojciech; Puton, Tomasz; Rother, Magdalena; Wywial, Ewa; Bujnicki, Janusz M.

    2012-01-01

    Creating useful software is a major activity of many scientists, including bioinformaticians. Nevertheless, software development in an academic setting is often unsystematic, which can lead to problems associated with maintenance and long-term availability. Unfortunately, well-documented software development methodology is difficult to adopt, and technical measures that directly improve bioinformatic programming have not been described comprehensively. We have examined 22 software projects and have identified a set of practices for software development in an academic environment. We found them useful to plan a project, support the involvement of experts (e.g. experimentalists), and to promote higher quality and maintainability of the resulting programs. This article describes 12 techniques that facilitate a quick start into software engineering. We describe 3 of the 22 projects in detail and give many examples to illustrate the usage of particular techniques. We expect this toolbox to be useful for many bioinformatics programming projects and to the training of scientific programmers. PMID:21803787

  10. A toolbox for developing bioinformatics software.

    PubMed

    Rother, Kristian; Potrzebowski, Wojciech; Puton, Tomasz; Rother, Magdalena; Wywial, Ewa; Bujnicki, Janusz M

    2012-03-01

    Creating useful software is a major activity of many scientists, including bioinformaticians. Nevertheless, software development in an academic setting is often unsystematic, which can lead to problems associated with maintenance and long-term availability. Unfortunately, well-documented software development methodology is difficult to adopt, and technical measures that directly improve bioinformatic programming have not been described comprehensively. We have examined 22 software projects and have identified a set of practices for software development in an academic environment. We found them useful to plan a project, support the involvement of experts (e.g. experimentalists), and to promote higher quality and maintainability of the resulting programs. This article describes 12 techniques that facilitate a quick start into software engineering. We describe 3 of the 22 projects in detail and give many examples to illustrate the usage of particular techniques. We expect this toolbox to be useful for many bioinformatics programming projects and to the training of scientific programmers. PMID:21803787

  11. Translational bioinformatics applications in genome medicine

    PubMed Central

    2009-01-01

    Although investigators using methodologies in bioinformatics have always been useful in genomic experimentation in analytic, engineering, and infrastructure support roles, only recently have bioinformaticians been able to have a primary scientific role in asking and answering questions on human health and disease. Here, I argue that this shift in role towards asking questions in medicine is now the next step needed for the field of bioinformatics. I outline four reasons why bioinformaticians are newly enabled to drive the questions in primary medical discovery: public availability of data, intersection of data across experiments, commoditization of methods, and streamlined validation. I also list four recommendations for bioinformaticians wishing to get more involved in translational research. PMID:19566916

  12. Bioinformatics in New Generation Flavivirus Vaccines

    PubMed Central

    Koraka, Penelope; Martina, Byron E. E.; Osterhaus, Albert D. M. E.

    2010-01-01

    Flavivirus infections are the most prevalent arthropod-borne infections worldwide, often causing severe disease especially among children, the elderly, and the immunocompromised. In the absence of effective antiviral treatment, prevention through vaccination would greatly reduce morbidity and mortality associated with flavivirus infections. Despite the success of the empirically developed vaccines against yellow fever virus, Japanese encephalitis virus and tick-borne encephalitis virus, there is an increasing need for a more rational design and development of safe and effective vaccines. Several bioinformatic tools are available to support such rational vaccine design. In doing so, several parameters have to be taken into account, such as safety for the target population, overall immunogenicity of the candidate vaccine, and efficacy and longevity of the immune responses triggered. Examples of how bioinformatics is applied to assist in the rational design and improvements of vaccines, particularly flavivirus vaccines, are presented and discussed. PMID:20467477

  13. Discovery and Classification of Bioinformatics Web Services

    SciTech Connect

    Rocco, D; Critchlow, T

    2002-09-02

    The transition of the World Wide Web from a paradigm of static Web pages to one of dynamic Web services provides new and exciting opportunities for bioinformatics with respect to data dissemination, transformation, and integration. However, the rapid growth of bioinformatics services, coupled with non-standardized interfaces, diminishes the potential that these Web services offer. To face this challenge, we examine the notion of a Web service class that defines the functionality provided by a collection of interfaces. These descriptions are an integral part of a larger framework that can be used to discover, classify, and wrap Web services automatically. We discuss how this framework can be used in the context of the proliferation of sites offering BLAST sequence alignment services for specialized data sets.

  14. Validation analysis of probabilistic models of dietary exposure to food additives.

    PubMed

    Gilsenan, M B; Thompson, R L; Lambe, J; Gibney, M J

    2003-10-01

    The validity of a range of simple conceptual models designed specifically for the estimation of food additive intakes using probabilistic analysis was assessed. Modelled intake estimates that fell below traditional conservative point estimates of intake and above 'true' additive intakes (calculated from a reference database at brand level) were considered to be in a valid region. Models were developed for 10 food additives by combining food intake data, the probability of an additive being present in a food group and additive concentration data. Food intake and additive concentration data were entered as raw data or as a lognormal distribution, and the probability of an additive being present was entered based on the per cent brands or the per cent eating occasions within a food group that contained an additive. Since the three model components assumed two possible modes of input, the validity of eight (2³) model combinations was assessed. All model inputs were derived from the reference database. An iterative approach was employed in which the validity of individual model components was assessed first, followed by validation of full conceptual models. While the distribution of intake estimates from models fell below conservative intakes, which assume that the additive is present at maximum permitted levels (MPLs) in all foods in which it is permitted, intake estimates were not consistently above 'true' intakes. These analyses indicate the need for more complex models for the estimation of food additive intakes using probabilistic analysis. Such models should incorporate information on market share and/or brand loyalty. PMID:14555358
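
    The count of eight model combinations in the abstract above follows from three components with two input modes each. The short sketch below simply enumerates them; the dictionary keys and the function name model_combinations are hypothetical labels, and only the mode descriptions are taken from the abstract.

```python
from itertools import product

# Two possible input modes for each of the three model components,
# as listed in the abstract. The key names are illustrative labels only.
COMPONENT_MODES = {
    "food_intake": ("raw data", "lognormal distribution"),
    "additive_concentration": ("raw data", "lognormal distribution"),
    "presence_probability": ("per cent brands", "per cent eating occasions"),
}


def model_combinations(modes=COMPONENT_MODES):
    """Return every way of choosing one input mode per model component."""
    names = list(modes)
    return [
        dict(zip(names, choice))
        for choice in product(*(modes[name] for name in names))
    ]


combinations = model_combinations()
for number, combination in enumerate(combinations, start=1):
    print(number, combination)
```

    The list has 2 × 2 × 2 = 8 entries, one per model combination whose validity the study assessed.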

  15. Bioinformatics Approaches to Classifying Allergens and Predicting Cross-Reactivity

    PubMed Central

    Schein, Catherine H.; Ivanciuc, Ovidiu; Braun, Werner

    2007-01-01

    The major advances in understanding why patients respond to several seemingly different stimuli have been through the isolation, sequencing and structural analysis of proteins that induce an IgE response. The most significant finding is that allergenic proteins from very different sources can have nearly identical sequences and structures, and that this similarity can account for clinically observed cross-reactivity. The increasing amount of information on the sequence, structure and IgE epitopes of allergens is now available in several databases and powerful bioinformatics search tools allow user access to relevant information. Here, we provide an overview of these databases and describe state-of-the-art bioinformatics tools to identify the common proteins that may be at the root of multiple allergy syndromes. Progress has also been made in quantitatively defining characteristics that discriminate allergens from non-allergens. Search and software tools for this purpose have been developed and implemented in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/). SDAP contains information for over 800 allergens and extensive bibliographic references in a relational database with links to other publicly available databases. SDAP is freely available on the Web to clinicians and patients, and can be used to find structural and functional relations among known allergens and to identify potentially cross-reacting antigens. Here we illustrate how these bioinformatics tools can be used to group allergens, and to detect areas that may account for common patterns of IgE binding and cross-reactivity. Such results can be used to guide treatment regimens for allergy sufferers. PMID:17276876

  16. Adapting bioinformatics curricula for big data.

    PubMed

    Greene, Anna C; Giffin, Kristine A; Greene, Casey S; Moore, Jason H

    2016-01-01

    Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs. PMID:25829469

  17. Chapter 16: text mining for translational bioinformatics.

    PubMed

    Cohen, K Bretonnel; Hunter, Lawrence E

    2013-04-01

    Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing. PMID:23633944

  18. Adapting bioinformatics curricula for big data

    PubMed Central

    Greene, Anna C.; Giffin, Kristine A.; Greene, Casey S.

    2016-01-01

    Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs. PMID:25829469

  19. Chapter 16: Text Mining for Translational Bioinformatics

    PubMed Central

    Cohen, K. Bretonnel; Hunter, Lawrence E.

    2013-01-01

    Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research—translating basic science results into new interventions—and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing. PMID:23633944

  20. Bioinformatics-Driven New Immune Target Discovery in Disease.

    PubMed

    Yang, C; Chen, P; Zhang, W; Du, H

    2016-08-01

    Biomolecular network analysis has been widely applied in the discovery of cancer driver genes and in dissecting the molecular mechanisms of many diseases at the genetic level. However, the application of such an approach to the discovery of potential antigens in autoimmune diseases remains largely unexplored. Here, we describe a previously uncharacterized region, with disease-associated autoantigens, to build antigen networks with three bioinformatics tools, namely NetworkAnalyst, GeneMANIA and ToppGene. First, we identified histone H2AX as an antigen of systemic lupus erythematosus by comparing highly ranked genes from all the built network-derived gene lists, and then a new potential biomarker for Behcet's disease, heat shock protein HSP 90-alpha (HSP90AA1), was further screened out. Moreover, 130 confirmed patients were enrolled and a corresponding enzyme-linked immunosorbent assay, mass spectrum analysis and immunoprecipitation were performed in succession to further confirm the bioinformatics results with real-world clinical samples. Our findings demonstrate that the combination of multiple molecular network approaches is a promising tool to discover new immune targets in diseases. PMID:27226232

  1. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa

    PubMed Central

    Mulder, Nicola J.; Adebiyi, Ezekiel; Alami, Raouf; Benkahla, Alia; Brandful, James; Doumbia, Seydou; Everett, Dean; Fadlelmola, Faisal M.; Gaboun, Fatima; Gaseitsiwe, Simani; Ghazal, Hassan; Hazelhurst, Scott; Hide, Winston; Ibrahimi, Azeddine; Jaufeerally Fakim, Yasmina; Jongeneel, C. Victor; Joubert, Fourie; Kassim, Samar; Kayondo, Jonathan; Kumuthini, Judit; Lyantagaye, Sylvester; Makani, Julie; Mansour Alzohairy, Ahmed; Masiga, Daniel; Moussa, Ahmed; Nash, Oyekanmi; Ouwe Missi Oukem-Boyer, Odile; Owusu-Dabo, Ellis; Panji, Sumir; Patterton, Hugh; Radouani, Fouzia; Sadki, Khalid; Seghrouchni, Fouad; Tastan Bishop, Özlem; Tiffin, Nicki; Ulenga, Nzovu

    2016-01-01

    The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet. PMID:26627985

  2. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa.

    PubMed

    Mulder, Nicola J; Adebiyi, Ezekiel; Alami, Raouf; Benkahla, Alia; Brandful, James; Doumbia, Seydou; Everett, Dean; Fadlelmola, Faisal M; Gaboun, Fatima; Gaseitsiwe, Simani; Ghazal, Hassan; Hazelhurst, Scott; Hide, Winston; Ibrahimi, Azeddine; Jaufeerally Fakim, Yasmina; Jongeneel, C Victor; Joubert, Fourie; Kassim, Samar; Kayondo, Jonathan; Kumuthini, Judit; Lyantagaye, Sylvester; Makani, Julie; Mansour Alzohairy, Ahmed; Masiga, Daniel; Moussa, Ahmed; Nash, Oyekanmi; Ouwe Missi Oukem-Boyer, Odile; Owusu-Dabo, Ellis; Panji, Sumir; Patterton, Hugh; Radouani, Fouzia; Sadki, Khalid; Seghrouchni, Fouad; Tastan Bishop, Özlem; Tiffin, Nicki; Ulenga, Nzovu

    2016-02-01

    The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet. PMID:26627985

  3. Multivariate qualitative analysis of banned additives in food safety using surface enhanced Raman scattering spectroscopy

    NASA Astrophysics Data System (ADS)

    He, Shixuan; Xie, Wanyi; Zhang, Wei; Zhang, Liqun; Wang, Yunxia; Liu, Xiaoling; Liu, Yulong; Du, Chunlei

    2015-02-01

    A novel strategy that combines the iteratively cubic spline fitting baseline correction method with discriminant partial least squares qualitative analysis is employed to analyze the surface enhanced Raman scattering (SERS) spectroscopy of banned food additives, such as Sudan I dye and Rhodamine B in food and Malachite green residues in aquaculture fish. Multivariate qualitative analysis methods, combining spectral preprocessing by iteratively cubic spline fitting (ICSF) baseline correction with principal component analysis (PCA) and discriminant partial least squares (DPLS) classification, respectively, are applied to investigate the effectiveness of SERS spectroscopy for predicting the class assignments of unknown banned food additives. PCA cannot be used to predict the class assignments of unknown samples. However, the DPLS classification can discriminate the class assignment of unknown banned additives using differences in relative intensities. The results demonstrate that SERS spectroscopy combined with the ICSF baseline correction method and the exploratory analysis methodology of DPLS classification can potentially be used for distinguishing banned food additives in the field of food safety.

  4. Multivariate qualitative analysis of banned additives in food safety using surface enhanced Raman scattering spectroscopy.

    PubMed

    He, Shixuan; Xie, Wanyi; Zhang, Wei; Zhang, Liqun; Wang, Yunxia; Liu, Xiaoling; Liu, Yulong; Du, Chunlei

    2015-02-25

    A novel strategy that combines the iteratively cubic spline fitting baseline correction method with discriminant partial least squares qualitative analysis is employed to analyze the surface enhanced Raman scattering (SERS) spectroscopy of banned food additives, such as Sudan I dye and Rhodamine B in food and Malachite green residues in aquaculture fish. Multivariate qualitative analysis methods, combining spectral preprocessing by iteratively cubic spline fitting (ICSF) baseline correction with principal component analysis (PCA) and discriminant partial least squares (DPLS) classification, respectively, are applied to investigate the effectiveness of SERS spectroscopy for predicting the class assignments of unknown banned food additives. PCA cannot be used to predict the class assignments of unknown samples. However, the DPLS classification can discriminate the class assignment of unknown banned additives using differences in relative intensities. The results demonstrate that SERS spectroscopy combined with the ICSF baseline correction method and the exploratory analysis methodology of DPLS classification can potentially be used for distinguishing banned food additives in the field of food safety. PMID:25300041

  5. 7 CFR 91.38 - Additional fees for appeal of analysis.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 3 2011-01-01 2011-01-01 false Additional fees for appeal of analysis. 91.38 Section 91.38 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING... for laboratory service that appears in this paragraph. The new fiscal year for Science and...

  6. 7 CFR 91.38 - Additional fees for appeal of analysis.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 3 2010-01-01 2010-01-01 false Additional fees for appeal of analysis. 91.38 Section 91.38 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING... for laboratory service that appears in this paragraph. The new fiscal year for Science and...

  7. Stimulation of terrestrial ecosystem carbon storage by nitrogen addition: a meta-analysis.

    PubMed

    Yue, Kai; Peng, Yan; Peng, Changhui; Yang, Wanqin; Peng, Xin; Wu, Fuzhong

    2016-01-01

    Elevated nitrogen (N) deposition alters the terrestrial carbon (C) cycle, which is likely to feed back to further climate change. However, how the overall terrestrial ecosystem C pools and fluxes respond to N addition remains unclear. By synthesizing data from multiple terrestrial ecosystems, we quantified the response of C pools and fluxes to experimental N addition using a comprehensive meta-analysis method. Our results showed that N addition significantly stimulated soil total C storage by 5.82% ([2.47%, 9.27%], 95% CI, the same below) and increased the C contents of the above- and below-ground parts of plants by 25.65% [11.07%, 42.12%] and 15.93% [6.80%, 25.85%], respectively. Furthermore, N addition significantly increased aboveground net primary production by 52.38% [40.58%, 65.19%] and litterfall by 14.67% [9.24%, 20.38%] at a global scale. However, the C influx from the plant litter to the soil through litter decomposition and the efflux from the soil due to microbial respiration and soil respiration showed insignificant responses to N addition. Overall, our meta-analysis suggested that N addition will increase soil C storage and plant C in both above- and below-ground parts, indicating that terrestrial ecosystems might act to strengthen as a C sink under increasing N deposition. PMID:26813078

  8. Stimulation of terrestrial ecosystem carbon storage by nitrogen addition: a meta-analysis

    PubMed Central

    Yue, Kai; Peng, Yan; Peng, Changhui; Yang, Wanqin; Peng, Xin; Wu, Fuzhong

    2016-01-01

    Elevated nitrogen (N) deposition alters the terrestrial carbon (C) cycle, which is likely to feed back to further climate change. However, how the overall terrestrial ecosystem C pools and fluxes respond to N addition remains unclear. By synthesizing data from multiple terrestrial ecosystems, we quantified the response of C pools and fluxes to experimental N addition using a comprehensive meta-analysis method. Our results showed that N addition significantly stimulated soil total C storage by 5.82% ([2.47%, 9.27%], 95% CI, the same below) and increased the C contents of the above- and below-ground parts of plants by 25.65% [11.07%, 42.12%] and 15.93% [6.80%, 25.85%], respectively. Furthermore, N addition significantly increased aboveground net primary production by 52.38% [40.58%, 65.19%] and litterfall by 14.67% [9.24%, 20.38%] at a global scale. However, the C influx from the plant litter to the soil through litter decomposition and the efflux from the soil due to microbial respiration and soil respiration showed insignificant responses to N addition. Overall, our meta-analysis suggested that N addition will increase soil C storage and plant C in both above- and below-ground parts, indicating that terrestrial ecosystems might act to strengthen as a C sink under increasing N deposition. PMID:26813078

  9. Stimulation of terrestrial ecosystem carbon storage by nitrogen addition: a meta-analysis

    NASA Astrophysics Data System (ADS)

    Yue, Kai; Peng, Yan; Peng, Changhui; Yang, Wanqin; Peng, Xin; Wu, Fuzhong

    2016-01-01

    Elevated nitrogen (N) deposition alters the terrestrial carbon (C) cycle, which is likely to feed back to further climate change. However, how the overall terrestrial ecosystem C pools and fluxes respond to N addition remains unclear. By synthesizing data from multiple terrestrial ecosystems, we quantified the response of C pools and fluxes to experimental N addition using a comprehensive meta-analysis method. Our results showed that N addition significantly stimulated soil total C storage by 5.82% ([2.47%, 9.27%], 95% CI, the same below) and increased the C contents of the above- and below-ground parts of plants by 25.65% [11.07%, 42.12%] and 15.93% [6.80%, 25.85%], respectively. Furthermore, N addition significantly increased aboveground net primary production by 52.38% [40.58%, 65.19%] and litterfall by 14.67% [9.24%, 20.38%] at a global scale. However, the C influx from the plant litter to the soil through litter decomposition and the efflux from the soil due to microbial respiration and soil respiration showed insignificant responses to N addition. Overall, our meta-analysis suggested that N addition will increase soil C storage and plant C in both above- and below-ground parts, indicating that terrestrial ecosystems might act to strengthen as a C sink under increasing N deposition.
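
    The percentage changes and 95% intervals quoted in this abstract have the shape that results from pooling log response ratios and back-transforming them. The abstract does not say which effect-size metric or weighting scheme the authors used, so the following is only a minimal sketch under the assumption of a fixed-effect, inverse-variance pooling of log response ratios; the function names are hypothetical and the numbers in the usage note are invented for illustration.

```python
import math


def pool_fixed_effect(effects, variances):
    """Inverse-variance weighted mean of effect sizes, plus its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    return pooled, 1.0 / total


def percent_change(log_ratio):
    """Back-transform a log response ratio into a percentage change."""
    return (math.exp(log_ratio) - 1.0) * 100.0


def pooled_percent_with_ci(effects, variances, z=1.96):
    """Return (estimate, lower, upper) in percent for an approximate 95% CI."""
    pooled, var = pool_fixed_effect(effects, variances)
    half_width = z * math.sqrt(var)
    bounds = (pooled, pooled - half_width, pooled + half_width)
    return tuple(percent_change(x) for x in bounds)
```

    For example, `pooled_percent_with_ci([0.1, 0.3], [0.01, 0.01])` pools two made-up studies into a single percentage estimate with an asymmetric interval, the same form as the bracketed ranges reported above.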

  10. A Linked Series of Laboratory Exercises in Molecular Biology Utilizing Bioinformatics and GFP

    ERIC Educational Resources Information Center

    Medin, Carey L.; Nolin, Katie L.

    2011-01-01

    Molecular biologists commonly use bioinformatics to map and analyze DNA and protein sequences and to align different DNA and protein sequences for comparison. Additionally, biologists can create and view 3D models of protein structures to further understand intramolecular interactions. The primary goal of this 10-week laboratory was to introduce…

  11. MOWServ: a web client for integration of bioinformatic resources

    PubMed Central

    Ramírez, Sergio; Muñoz-Mérida, Antonio; Karlsson, Johan; García, Maximiliano; Pérez-Pulido, Antonio J.; Claros, M. Gonzalo; Trelles, Oswaldo

    2010-01-01

    The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user’s tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/. PMID:20525794

  12. MOWServ: a web client for integration of bioinformatic resources.

    PubMed

    Ramírez, Sergio; Muñoz-Mérida, Antonio; Karlsson, Johan; García, Maximiliano; Pérez-Pulido, Antonio J; Claros, M Gonzalo; Trelles, Oswaldo

    2010-07-01

    The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user's tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/. PMID:20525794

  13. On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis

    PubMed Central

    Li, Bing; Chun, Hyonho; Zhao, Hongyu

    2014-01-01

    We introduce a nonparametric method for estimating non-gaussian graphical models based on a new statistical relation called additive conditional independence, which is a three-way relation among random vectors that resembles the logical structure of conditional independence. Additive conditional independence allows us to use one-dimensional kernel regardless of the dimension of the graph, which not only avoids the curse of dimensionality but also simplifies computation. It also gives rise to a parallel structure to the gaussian graphical model that replaces the precision matrix by an additive precision operator. The estimators derived from additive conditional independence cover the recently introduced nonparanormal graphical model as a special case, but outperform it when the gaussian copula assumption is violated. We compare the new method with existing ones by simulations and in genetic pathway analysis. PMID:26401064

  14. Analysis of occupational accidents: prevention through the use of additional technical safety measures for machinery

    PubMed Central

    Dźwiarek, Marek; Latała, Agata

    2016-01-01

    This article presents an analysis of results of 1035 serious and 341 minor accidents recorded by Poland's National Labour Inspectorate (PIP) in 2005–2011, in view of their prevention by means of additional safety measures applied by machinery users. Since the analysis aimed at formulating principles for the application of technical safety measures, the analysed accidents should bear additional attributes: the type of machine operation, technical safety measures and the type of events causing injuries. The analysis proved that the executed tasks and injury-causing events were closely connected and there was a relation between casualty events and technical safety measures. In the case of tasks consisting of manual feeding and collecting materials, the injuries usually occur because of the rotating motion of tools or crushing due to a closing motion. Numerous accidents also happened in the course of supporting actions, like removing pollutants, correcting material position, cleaning, etc. PMID:26652689

  15. Reducing the matrix effects in chemical analysis: fusion of isotope dilution and standard addition methods

    NASA Astrophysics Data System (ADS)

    Pagliano, Enea; Meija, Juris

    2016-04-01

    The combination of isotope dilution and mass spectrometry has become a ubiquitous tool of chemical analysis. Often perceived as one of the most accurate methods of chemical analysis, it is not without shortcomings. Current isotope dilution equations are not capable of fully addressing one of the key problems encountered in chemical analysis: the possible effect of sample matrix on measured isotope ratios. The method of standard addition does compensate for the effect of sample matrix by making sure that all measured solutions have identical composition. While it is impossible to attain such a condition in traditional isotope dilution, we present equations which allow for matrix-matching between all measured solutions by fusion of isotope dilution and standard addition methods.
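
    As background for the standard addition half of this abstract, the classical method fits a straight line to signal versus added analyte and reads the unknown concentration from the ratio of intercept to slope. The sketch below shows only that textbook procedure, assuming a linear response and negligible dilution; it is not the fused isotope dilution equations the authors present, and the function name is hypothetical.

```python
def standard_addition_estimate(added, signal):
    """Estimate the analyte concentration in the unspiked sample.

    Fits signal = intercept + slope * added by ordinary least squares and
    returns intercept / slope, the magnitude of the x-intercept.
    """
    n = len(added)
    mean_x = sum(added) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in added)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(added, signal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept / slope
```

    With invented data where the response is exactly `2.0 * (1.5 + added)`, calling `standard_addition_estimate([0, 1, 2, 3], [3.0, 5.0, 7.0, 9.0])` recovers the assumed starting concentration of 1.5.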

  16. Analysis of occupational accidents: prevention through the use of additional technical safety measures for machinery.

    PubMed

    Dźwiarek, Marek; Latała, Agata

    2016-01-01

    This article presents an analysis of results of 1035 serious and 341 minor accidents recorded by Poland's National Labour Inspectorate (PIP) in 2005-2011, in view of their prevention by means of additional safety measures applied by machinery users. Since the analysis aimed at formulating principles for the application of technical safety measures, the analysed accidents should bear additional attributes: the type of machine operation, technical safety measures and the type of events causing injuries. The analysis proved that the executed tasks and injury-causing events were closely connected and there was a relation between casualty events and technical safety measures. In the case of tasks consisting of manual feeding and collecting materials, the injuries usually occur because of the rotating motion of tools or crushing due to a closing motion. Numerous accidents also happened in the course of supporting actions, like removing pollutants, correcting material position, cleaning, etc. PMID:26652689

  17. BioRuby: bioinformatics software for the Ruby programming language

    PubMed Central

    Goto, Naohisa; Prins, Pjotr; Nakao, Mitsuteru; Bonnal, Raoul; Aerts, Jan; Katayama, Toshiaki

    2010-01-01

    Summary: The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell, and in the web browser. Availability: BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows. And, with JRuby, BioRuby runs on the Java Virtual Machine. The source code is available from http://www.bioruby.org/. Contact: katayama@bioruby.org PMID:20739307

  18. The DBCLS BioHackathon: standardization and interoperability for bioinformatics web services and workflows. The DBCLS BioHackathon Consortium*

    PubMed Central

    2010-01-01

    Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed. Consequently, we improved interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies. PMID:20727200

  19. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics

    PubMed Central

    2010-01-01

    Background: Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. Description: An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employs Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Conclusions: Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms. PMID:21210976
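
    The MapReduce programming style named in this abstract can be illustrated without Hadoop itself. The sketch below is a single-process toy in plain Python, not Hadoop code: it counts k-mers in sequencing reads through explicit map, shuffle and reduce phases. The k-mer task and all function names are assumptions chosen for illustration; a real Hadoop job would distribute these phases across cluster nodes.

```python
from collections import defaultdict


def map_kmers(read, k):
    """Map phase: emit a (k-mer, 1) pair for every window in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce phase: collapse the values for one key into a total."""
    return key, sum(values)


def count_kmers(reads, k):
    """Run map, shuffle and reduce in sequence over a list of reads."""
    mapped = [pair for read in reads for pair in map_kmers(read, k)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())
```

    Because each read is mapped independently and each key is reduced independently, both phases can be split across machines, which is the property the abstract credits for the method's ease of parallelization.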

  20. IFPA Meeting 2013 Workshop Report II: use of 'omics' in understanding placental development, bioinformatics tools for gene expression analysis, planning and coordination of a placenta research network, placental imaging, evolutionary approaches to understanding pre-eclampsia.

    PubMed

    Ackerman, W E; Adamson, L; Carter, A M; Collins, S; Cox, B; Elliot, M G; Ermini, L; Gruslin, A; Hoodless, P A; Huang, J; Kniss, D A; McGowen, M R; Post, M; Rice, G; Robinson, W; Sadovsky, Y; Salafia, C; Salomon, C; Sled, J G; Todros, T; Wildman, D E; Zamudio, S; Lash, G E

    2014-02-01

    Workshops are an important part of the IFPA annual meeting as they allow for discussion of specialized topics. At the IFPA meeting 2013 twelve themed workshops were presented, five of which are summarized in this report. These workshops related to various aspects of placental biology but collectively covered areas of new technologies for placenta research: 1) use of 'omics' in understanding placental development and pathologies; 2) bioinformatics and use of omics technologies; 3) planning and coordination of a placenta research network; 4) clinical imaging and pathological outcomes; 5) placental evolution. PMID:24315655

  1. Emerging role of bioinformatics tools and software in evolution of clinical research

    PubMed Central

    Gill, Supreet Kaur; Christopher, Ajay Francis; Gupta, Vikas; Bansal, Parveen

    2016-01-01

    Clinical research works to promote the health and wellbeing of the population. There is a rapid increase in the number and severity of diseases like cancer, hepatitis, HIV etc., resulting in high morbidity and mortality. Clinical research involves drug discovery and development, whereas clinical trials are performed to establish the safety and efficacy of drugs. Drug discovery is a long process starting with target identification, validation and lead optimization. This is followed by preclinical trials, intensive clinical trials and eventually post-marketing vigilance for drug safety. Software and bioinformatics tools play a great role not only in drug discovery but also in drug development. It involves the use of informatics in the development of new knowledge pertaining to health and disease, data management during clinical trials and the use of clinical data for secondary research. In addition, new technologies like molecular docking, molecular dynamics simulation, proteomics and quantitative structure activity relationship in clinical research result in a faster and easier drug discovery process. During the preclinical trials, software is used for randomization to remove bias and to plan study design. In clinical trials, software like electronic data capture, remote data capture and electronic case report form (eCRF) is used to store the data. eClinical and Oracle Clinical are software used for clinical data management and for statistical analysis of the data. After the drug is marketed, its safety can be monitored by drug safety software like Oracle Argus or ARISg. Therefore, software is used from the very early stages of drug designing, through drug development and clinical trials, to pharmacovigilance. This review describes different aspects related to the application of computers and bioinformatics in drug designing, discovery and development, formulation designing and clinical research. PMID:27453827

  2. Emerging role of bioinformatics tools and software in evolution of clinical research.

    PubMed

    Gill, Supreet Kaur; Christopher, Ajay Francis; Gupta, Vikas; Bansal, Parveen

    2016-01-01

    Clinical research works to promote the health and wellbeing of the population. There is a rapid increase in the number and severity of diseases like cancer, hepatitis, HIV etc., resulting in high morbidity and mortality. Clinical research involves drug discovery and development, whereas clinical trials are performed to establish the safety and efficacy of drugs. Drug discovery is a long process starting with target identification, validation and lead optimization. This is followed by preclinical trials, intensive clinical trials and eventually post-marketing vigilance for drug safety. Software and bioinformatics tools play a great role not only in drug discovery but also in drug development. It involves the use of informatics in the development of new knowledge pertaining to health and disease, data management during clinical trials and the use of clinical data for secondary research. In addition, new technologies like molecular docking, molecular dynamics simulation, proteomics and quantitative structure activity relationship in clinical research result in a faster and easier drug discovery process. During the preclinical trials, software is used for randomization to remove bias and to plan study design. In clinical trials, software like electronic data capture, remote data capture and electronic case report form (eCRF) is used to store the data. eClinical and Oracle Clinical are software used for clinical data management and for statistical analysis of the data. After the drug is marketed, its safety can be monitored by drug safety software like Oracle Argus or ARISg. Therefore, software is used from the very early stages of drug designing, through drug development and clinical trials, to pharmacovigilance. This review describes different aspects related to the application of computers and bioinformatics in drug designing, discovery and development, formulation designing and clinical research. PMID:27453827

  3. Highlights of the 2nd Bioinformatics Student Symposium by ISCB RSG-UK

    PubMed Central

    White, Benjamen; Fatima, Vayani; Fatima, Nazeefa; Das, Sayoni; Rahman, Farzana; Hassan, Mehedi

    2016-01-01

    Following the success of the 1st Student Symposium by ISCB RSG-UK, a 2nd Student Symposium took place on 7th October 2015 at The Genome Analysis Centre, Norwich, UK. This short report summarizes the main highlights from the 2nd Bioinformatics Student Symposium. PMID:27239284

  4. An "in silico" Bioinformatics Laboratory Manual for Bioscience Departments: "Prediction of Glycosylation Sites in Phosphoethanolamine Transferases"

    ERIC Educational Resources Information Center

    Alyuruk, Hakan; Cavas, Levent

    2014-01-01

    Genomics and proteomics projects have produced a huge amount of raw biological data including DNA and protein sequences. Although these data have been stored in data banks, their evaluation is strictly dependent on bioinformatics tools. These tools have been developed by multidisciplinary experts for fast and robust analysis of biological data.…

  5. Translational Bioinformatics Approaches to Drug Development

    PubMed Central

    Readhead, Ben; Dudley, Joel

    2013-01-01

    Significance: A majority of therapeutic interventions occur late in the pathological process, when treatment outcome can be less predictable and effective, highlighting the need for new precise and preventive therapeutic development strategies that consider genomic and environmental context. Translational bioinformatics is well positioned to contribute to the many challenges inherent in bridging this gap between our current reactive methods of healthcare delivery and the intent of precision medicine, particularly in the area of drug development, which forms the focus of this review. Recent Advances: A variety of powerful informatics methods for organizing and leveraging the vast wealth of molecular measurements available for a broad range of disease contexts have recently emerged. These include methods for data driven disease classification, drug repositioning, identification of disease biomarkers, and the creation of disease network models, each with significant impacts on drug development approaches. Critical Issues: An important bottleneck in the application of bioinformatics methods in translational research is the lack of investigators who are versed in both biomedical domains and informatics. Efforts to nurture both sets of competencies within individuals and to increase interfield visibility will help to accelerate the adoption and increased application of bioinformatics in translational research. Future Directions: It is possible to construct predictive, multiscale network models of disease by integrating genotype, gene expression, clinical traits, and other multiscale measures using causal network inference methods. This can enable the identification of the “key drivers” of pathology, which may represent novel therapeutic targets or biomarker candidates that play a more direct role in the etiology of disease. PMID:24527359

  6. NMR structure improvement: A structural bioinformatics & visualization approach

    NASA Astrophysics Data System (ADS)

    Block, Jeremy N.

    The overall goal of this project is to enhance the physical accuracy of individual models in macromolecular NMR (Nuclear Magnetic Resonance) structures and the realism of variation within NMR ensembles of models, while improving agreement with the experimental data. A secondary overall goal is to combine synergistically the best aspects of NMR and crystallographic methodologies to better illuminate the underlying joint molecular reality. This is accomplished by using the powerful method of all-atom contact analysis (describing detailed sterics between atoms, including hydrogens); new graphical representations and interactive tools in 3D and virtual reality; and structural bioinformatics approaches to the expanded and enhanced data now available. The resulting better descriptions of macromolecular structure and its dynamic variation enhance the effectiveness of the many biomedical applications that depend on detailed molecular structure, such as mutational analysis, homology modeling, molecular simulations, protein design, and drug design.

  7. A Survey of Bioinformatics Database and Software Usage through Mining the Literature

    PubMed Central

    Duck, Geraint; Nenadic, Goran; Filannino, Michele; Brass, Andy; Robertson, David L.; Stevens, Robert

    2016-01-01

    Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever-expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage, with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being only mentioned once each. While these results highlight the dynamic and creative nature of bioinformatics research, they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371. PMID:27331905

  8. A Survey of Bioinformatics Database and Software Usage through Mining the Literature.

    PubMed

    Duck, Geraint; Nenadic, Goran; Filannino, Michele; Brass, Andy; Robertson, David L; Stevens, Robert

    2016-01-01

    Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being only mentioned once each. While these results highlight the dynamic and creative nature of bioinformatics research they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371. PMID:27331905
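
    The imbalance figures quoted above (share of usage held by the top 5% of names, fraction of names mentioned once) are simple summaries of a mention count table. The following is a minimal sketch of how such summaries could be computed; the function name, the `top_fraction` parameter, and the toy input are assumptions made for illustration and are not taken from the study or its dataset.

```python
from collections import Counter
import math


def usage_concentration(mentions, top_fraction=0.05):
    """Return (share of mentions held by the top names, fraction of names seen once).

    `mentions` is an iterable of resource names, one entry per mention.
    """
    counts = Counter(mentions)
    if not counts:
        raise ValueError("no mentions supplied")
    # Mention counts per name, most-mentioned first.
    ranked = sorted(counts.values(), reverse=True)
    # Number of names in the top group (at least one).
    n_top = max(1, math.ceil(top_fraction * len(ranked)))
    top_share = sum(ranked[:n_top]) / sum(ranked)
    singleton_fraction = sum(1 for c in ranked if c == 1) / len(ranked)
    return top_share, singleton_fraction


# Toy input invented for illustration; it is not data from the study.
toy_mentions = ["BLAST"] * 6 + ["R"] * 2 + ["ToolA", "ToolB"]
top_share, singleton_fraction = usage_concentration(toy_mentions, top_fraction=0.25)
```

    With the toy input, one name out of four forms the top group and holds 6 of the 10 mentions, and two of the four names are mentioned once.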

  9. Neutron-activation analysis by standard addition and solvent extraction: Determination of traces of antimony.

    PubMed

    Alian, A; Shabana, R; Sanad, W; Allam, B; Khalifa, K

    1968-02-01

    The application of neutron activation analysis by standard addition and solvent extraction to the determination of traces of antimony in aluminium and rocks is reported. Three simple extraction procedures, using isopropyl ether, hexone, and tributyl phosphate, are described for the selective separation of radioantimony from interfering radionuclides. Antimony concentration is measured by counting the activities of the (122)Sb and (124)Sb photopeaks at 0.564 and 0.603 MeV. PMID:18960289
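
    The standard addition technique named in this record can be illustrated with a short calculation. The sketch below assumes the measured signal is proportional to the total analyte present (unknown plus added), fits a straight line to signal versus added amount, and reads the unknown off as intercept divided by slope. The function name and all numbers are invented for illustration; the abstract does not describe the authors' calculation in this detail.

```python
def standard_addition(added, signal):
    """Estimate the unknown amount from a standard-addition series.

    Fits signal = intercept + slope * added by ordinary least squares and
    returns intercept / slope, the magnitude of the x-intercept.
    """
    n = len(added)
    if n != len(signal) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(added) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in added)
    if sxx == 0:
        raise ValueError("added amounts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(added, signal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept / slope


# Invented series: signal = 50 * (2.0 + added), so the estimate should be 2.0.
added_amount = [0.0, 1.0, 2.0, 3.0]
count_rate = [100.0, 150.0, 200.0, 250.0]
estimate = standard_addition(added_amount, count_rate)
```

    Because the calibration is built in the sample itself, matrix effects that scale the signal affect the slope and the intercept alike and cancel in the ratio.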

  10. Quantum Bio-Informatics II: From Quantum Information to Bio-Informatics

    NASA Astrophysics Data System (ADS)

    Accardi, L.; Freudenberg, Wolfgang; Ohya, Masanori

    2009-02-01

    / H. Kamimura -- Massive collection of full-length complementary DNA clones and microarray analyses: keys to rice transcriptome analysis / S. Kikuchi -- Changes of influenza A(H5) viruses by means of entropic chaos degree / K. Sato and M. Ohya -- Basics of genome sequence analysis in bioinformatics - its fundamental ideas and problems / T. Suzuki and S. Miyazaki -- A basic introduction to gene expression studies using microarray expression data analysis / D. Wanke and J. Kilian -- Integrating biological perspectives: a quantum leap for microarray expression analysis / D. Wanke ... [et al.].

  11. Microbial bioinformatics for food safety and production

    PubMed Central

    Alkema, Wynand; Boekhorst, Jos; Wels, Michiel

    2016-01-01

    In the production of fermented foods, microbes play an important role. Optimization of fermentation processes or starter culture production traditionally was a trial-and-error approach inspired by expert knowledge of the fermentation process. Current developments in high-throughput ‘omics’ technologies allow developing more rational approaches to improve fermentation processes both from the food functionality as well as from the food safety perspective. Here, the authors thematically review typical bioinformatics techniques and approaches to improve various aspects of the microbial production of fermented food products and food safety. PMID:26082168

  12. Critical Issues in Bioinformatics and Computing

    PubMed Central

    Kesh, Someswa; Raghupathi, Wullianallur

    2004-01-01

    This article provides an overview of the field of bioinformatics and its implications for the various participants. Next-generation issues facing developers (programmers), users (molecular biologists), and the general public (patients) who would benefit from the potential applications are identified. The goal is to create awareness and debate on the opportunities (such as career paths) and the challenges such as privacy that arise. A triad model of the participants' roles and responsibilities is presented along with the identification of the challenges and possible solutions. PMID:18066389

  13. Translational Bioinformatics: Past, Present, and Future

    PubMed Central

    Tenenbaum, Jessica D.

    2016-01-01

    Though a relatively young discipline, translational bioinformatics (TBI) has become a key component of biomedical research in the era of precision medicine. Development of high-throughput technologies and electronic health records has caused a paradigm shift in both healthcare and biomedical research. Novel tools and methods are required to convert increasingly voluminous datasets into information and actionable knowledge. This review provides a definition and contextualization of the term TBI, describes the discipline’s brief history and past accomplishments, as well as current foci, and concludes with predictions of future directions in the field. PMID:26876718

  14. Mobyle: a new full web bioinformatics framework

    PubMed Central

    Néron, Bertrand; Ménager, Hervé; Maufrais, Corinne; Joly, Nicolas; Maupetit, Julien; Letort, Sébastien; Carrere, Sébastien; Tuffery, Pierre; Letondal, Catherine

    2009-01-01

    Motivation: For the biologist, running bioinformatics analyses involves a time-consuming management of data and tools. Users need support to organize their work, retrieve parameters and reproduce their analyses. They also need to be able to combine their analytic tools using a safe data flow software mechanism. Finally, given that scientific tools can be difficult to install, it is particularly helpful for biologists to be able to use these tools through a web user interface. However, providing a web interface for a set of tools raises the problem that a single web portal cannot offer all the existing and possible services: it is the user, again, who has to cope with data copy among a number of different services. A framework enabling portal administrators to build a network of cooperating services would therefore clearly be beneficial. Results: We have designed a system, Mobyle, to provide a flexible and usable Web environment for defining and running bioinformatics analyses. It embeds simple yet powerful data management features that allow the user to reproduce analyses and to combine tools using a hierarchical typing system. Mobyle offers invocation of services distributed over remote Mobyle servers, thus enabling a federated network of curated bioinformatics portals without the user having to learn complex concepts or to install sophisticated software. While being focused on the end user, the Mobyle system also addresses the need, for the bioinformatician, to automate remote services execution: PlayMOBY is a companion tool that automates the publication of BioMOBY web services, using Mobyle program definitions. Availability: The Mobyle system is distributed under the terms of the GNU GPLv2 on the project web site (http://bioweb2.pasteur.fr/projects/mobyle/). It is already deployed on three servers: http://mobyle.pasteur.fr, http://mobyle.rpbs.univ-paris-diderot.fr and http://lipm-bioinfo.toulouse.inra.fr/Mobyle. The PlayMOBY companion is distributed under the

  15. InCoB2010 - 9th International Conference on Bioinformatics at Tokyo, Japan, September 26-28, 2010

    PubMed Central

    2010-01-01

    The International Conference on Bioinformatics (InCoB), the annual conference of the Asia-Pacific Bioinformatics Network (APBioNet), is hosted in one of the countries of the Asia-Pacific region. The 2010 conference was awarded to Japan and has attracted more than one hundred high-quality research paper submissions. Thorough peer reviewing resulted in 47 (43.5%) accepted papers out of 108 submissions. Submissions from Japan, R.O. Korea, P.R. China, Australia, Singapore and U.S.A. totaled 43.8% and contributed to 57.4% of accepted papers. Manuscripts originating from Taiwan and India added up to 42.8% of submissions and 28.3% of acceptances. The fifteen articles published in this BMC Bioinformatics supplement cover disease informatics, structural bioinformatics and drug design, biological databases and software tools, signaling pathways, gene regulatory and biochemical networks, evolution and sequence analysis. PMID:21106116

  16. Comparative Bioinformatics Analyses and Profiling of Lysosome-Related Organelle Proteomes

    PubMed Central

    Hu, Zhang-Zhi; Valencia, Julio C.; Huang, Hongzhan; Chi, An; Shabanowitz, Jeffrey; Hearing, Vincent J.; Appella, Ettore; Wu, Cathy

    2007-01-01

    Complete and accurate profiling of cellular organelle proteomes, while challenging, is important for the understanding of detailed cellular processes at the organelle level. Mass spectrometry technologies coupled with bioinformatics analysis provide an effective approach for protein identification and functional interpretation of organelle proteomes. In this study, we have compiled human organelle reference datasets from large-scale proteomic studies and protein databases for 7 lysosome-related organelles (LROs), as well as the endoplasmic reticulum and mitochondria, for comparative organelle proteome analysis. Heterogeneous sources of human organelle proteins and rodent homologs are mapped to human UniProtKB protein entries based on ID and/or peptide mappings, followed by functional annotation and categorization using the iProXpress proteomic expression analysis system. Cataloging organelle proteomes allows close examination of both shared and unique proteins among various LROs and reveals their functional relevance. The proteomic comparisons show that LROs are a closely related family of organelles. The shared proteins indicate the dynamic and hybrid nature of LROs, while the unique transmembrane proteins may represent additional candidate marker proteins for LROs. This comparative analysis, therefore, provides a basis for hypothesis formulation and experimental validation of organelle proteins and their functional roles. PMID:17375895

  17. Comparative bioinformatics analyses and profiling of lysosome-related organelle proteomes

    NASA Astrophysics Data System (ADS)

    Hu, Zhang-Zhi; Valencia, Julio C.; Huang, Hongzhan; Chi, An; Shabanowitz, Jeffrey; Hearing, Vincent J.; Appella, Ettore; Wu, Cathy

    2007-01-01

    Complete and accurate profiling of cellular organelle proteomes, while challenging, is important for the understanding of detailed cellular processes at the organelle level. Mass spectrometry technologies coupled with bioinformatics analysis provide an effective approach for protein identification and functional interpretation of organelle proteomes. In this study, we have compiled human organelle reference datasets from large-scale proteomic studies and protein databases for seven lysosome-related organelles (LROs), as well as the endoplasmic reticulum and mitochondria, for comparative organelle proteome analysis. Heterogeneous sources of human organelle proteins and rodent homologs are mapped to human UniProtKB protein entries based on ID and/or peptide mappings, followed by functional annotation and categorization using the iProXpress proteomic expression analysis system. Cataloging organelle proteomes allows close examination of both shared and unique proteins among various LROs and reveals their functional relevance. The proteomic comparisons show that LROs are a closely related family of organelles. The shared proteins indicate the dynamic and hybrid nature of LROs, while the unique transmembrane proteins may represent additional candidate marker proteins for LROs. This comparative analysis, therefore, provides a basis for hypothesis formulation and experimental validation of organelle proteins and their functional roles.

  18. Bioinformatics Annotation of Human Y Chromosome-Encoded Protein Pathways and Interactions.

    PubMed

    Rengaraj, Deivendran; Kwon, Woo-Sung; Pang, Myung-Geol

    2015-09-01

    We performed a comprehensive analysis of human Y chromosome-encoded proteins, their pathways, and their interactions using bioinformatics tools. From the NCBI annotation release 107 of the human genome, we retrieved a total of 66 proteins encoded on the Y chromosome. Most of the retrieved proteins were also matched with the proteins listed in the core databases of the Human Proteome Project including neXtProt, PeptideAtlas, and the Human Protein Atlas. When we examined the pathways of human Y-encoded proteins through the KEGG database and Pathway Studio software, many of the proteins fell into the categories related to cell signaling pathways. Using the STRING program, we found a total of 49 human Y-encoded proteins showing strong/medium interaction with each other. Using the Pathway Studio software, we found that a total of 16 proteins interact with other chromosome-encoded proteins. In particular, the SRY protein interacted with 17 proteins encoded on other chromosomes. Additionally, we aligned the sequences of human Y-encoded proteins with the sequences of chimpanzee and mouse Y-encoded proteins using the NCBI BLAST program. This analysis resulted in a significant number of orthologous proteins between human, chimpanzee, and mouse. Collectively, our findings provide the scientific community with additional information on the human Y chromosome-encoded proteins. PMID:26279084

  19. The potential of translational bioinformatics approaches for pharmacology research.

    PubMed

    Li, Lang

    2015-10-01

    The field of bioinformatics has allowed the interpretation of massive amounts of biological data, ushering in the era of 'omics' to biomedical research. Its potential impact on pharmacology research is enormous and it has shown some emerging successes. A full realization of this potential, however, requires standardized data annotation for large health record databases and molecular data resources. Improved standardization will further stimulate the development of system pharmacology models, using translational bioinformatics methods. This new translational bioinformatics paradigm is highly complementary to current pharmacological research fields, such as personalized medicine, pharmacoepidemiology and drug discovery. In this review, I illustrate the application of transformational bioinformatics to research in numerous pharmacology subdisciplines. PMID:25753093

  20. OpenHelix: bioinformatics education outside of a different box.

    PubMed

    Williams, Jennifer M; Mangan, Mary E; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C

    2010-11-01

    The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review. PMID:20798181

  1. Translational Bioinformatics: Linking the Molecular World to the Clinical World

    PubMed Central

    Altman, RB

    2014-01-01

    Translational bioinformatics represents the union of translational medicine and bioinformatics. Translational medicine moves basic biological discoveries from the research bench into the patient-care setting and uses clinical observations to inform basic biology. It focuses on patient care, including the creation of new diagnostics, prognostics, prevention strategies, and therapies based on biological discoveries. Bioinformatics involves algorithms to represent, store, and analyze basic biological data, including DNA sequence, RNA expression, and protein and small-molecule abundance within cells. Translational bioinformatics spans these two fields; it involves the development of algorithms to analyze basic molecular and cellular data with an explicit goal of affecting clinical care. PMID:22549287

  2. OpenHelix: bioinformatics education outside of a different box

    PubMed Central

    Mangan, Mary E.; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C.

    2010-01-01

    The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review. PMID:20798181

  3. Tools and collaborative environments for bioinformatics research

    PubMed Central

    Giugno, Rosalba; Pulvirenti, Alfredo

    2011-01-01

    Advanced research requires intensive interaction among a multitude of actors, often possessing different expertise and usually working at a distance from each other. The field of collaborative research aims to establish suitable models and technologies to properly support these interactions. In this article, we first present the reasons for an interest of Bioinformatics in this context by also suggesting some research domains that could benefit from collaborative research. We then review the principles and some of the most relevant applications of social networking, with a special attention to networks supporting scientific collaboration, by also highlighting some critical issues, such as identification of users and standardization of formats. We then introduce some systems for collaborative document creation, including wiki systems and tools for ontology development, and review some of the most interesting biological wikis. We also review the principles of Collaborative Development Environments for software and show some examples in Bioinformatics. Finally, we present the principles and some examples of Learning Management Systems. In conclusion, we try to devise some of the goals to be achieved in the short term for the exploitation of these technologies. PMID:21984743

  4. ANALYSIS OF DISTRIBUTION FEEDER LOSSES DUE TO ADDITION OF DISTRIBUTED PHOTOVOLTAIC GENERATORS

    SciTech Connect

    Tuffner, Francis K.; Singh, Ruchi

    2011-08-09

    Distributed generators (DG) are small-scale power sources owned by customers or utilities and scattered throughout the power system distribution network. Distributed generation can be both renewable and non-renewable. Distributed generation is added primarily to increase feeder capacity and to provide peak load reduction. However, this addition comes with several impacts on the distribution feeder. Several studies have shown that addition of DG leads to reduction of feeder loss. However, most of these studies have considered lumped load and distributed load models to analyze the effects on system losses, where the dynamic variation of load due to seasonal changes is ignored. It is very important for utilities to minimize the losses under all scenarios to decrease revenue losses, promote efficient asset utilization, and therefore increase feeder capacity. This paper will investigate an IEEE 13-node feeder populated with photovoltaic generators on detailed residential houses with water heaters, heating, ventilation, and air conditioning (HVAC) units, lights, and other plug and convenience loads. An analysis of losses for different power system components, such as transformers, underground and overhead lines, and triplex lines, will be performed. The analysis will utilize different seasons and different solar penetration levels (15%, 30%).

  5. Analysis of redox additive-based overcharge protection for rechargeable lithium batteries

    NASA Technical Reports Server (NTRS)

    Narayanan, S. R.; Surampudi, S.; Attia, A. I.; Bankston, C. P.

    1991-01-01

    The overcharge condition in secondary lithium batteries employing redox additives for overcharge protection has been theoretically analyzed in terms of a finite linear diffusion model. The analysis leads to expressions relating the steady-state overcharge current density and cell voltage to the concentration, diffusion coefficient, standard reduction potential of the redox couple, and interelectrode distance. The model permits the estimation of the maximum permissible overcharge rate for any chosen set of system conditions. Digital simulation of the overcharge experiment leads to a numerical representation of the potential transients and an estimate of the influence of diffusion coefficient and interelectrode distance on the transient attainment of the steady state during overcharge. The model has been experimentally verified using 1,1'-dimethylferrocene as a redox additive. The analysis of the experimental results in terms of the theory allows the calculation of the diffusion coefficient and the formal potential of the redox couple. The model and the theoretical results may be exploited in the design and optimization of overcharge protection by the redox additive approach.
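
    The abstract does not reproduce the paper's expressions. As a rough illustration of the kind of relationship it describes, the sketch below uses the textbook diffusion-limited form for steady-state linear diffusion across a gap, i = n F D C / L, which is an assumption here and may differ from the authors' exact result. The function name and the numerical values are likewise illustrative only.

```python
FARADAY = 96485.0  # Faraday constant in C/mol (rounded)


def limiting_current_density(n_electrons, diffusion_cm2_s, conc_mol_cm3, gap_cm):
    """Diffusion-limited steady-state current density in A/cm^2.

    Uses i = n * F * D * C / L, the textbook result for linear diffusion of a
    redox couple across a gap of width L (an assumed form, not the paper's).
    """
    if gap_cm <= 0:
        raise ValueError("interelectrode distance must be positive")
    return n_electrons * FARADAY * diffusion_cm2_s * conc_mol_cm3 / gap_cm


# Illustrative values only: D = 1e-6 cm^2/s, C = 0.1 mol/L = 1e-4 mol/cm^3,
# L = 0.01 cm, one electron transferred.
i_lim = limiting_current_density(1, 1e-6, 1e-4, 0.01)
```

    Under these assumed values the limit is a little under 1 mA/cm^2. In this form the limit scales linearly with concentration and diffusion coefficient and inversely with interelectrode distance, the same variables the abstract lists.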

  6. Analysis of error-prone survival data under additive hazards models: measurement error effects and adjustments.

    PubMed

    Yan, Ying; Yi, Grace Y

    2016-07-01

    Covariate measurement error occurs commonly in survival analysis. Under the proportional hazards model, measurement error effects have been well studied, and various inference methods have been developed to correct for error effects under such a model. In contrast, error-contaminated survival data under the additive hazards model have received relatively less attention. In this paper, we investigate this problem by exploring measurement error effects on parameter estimation and the change of the hazard function. New insights of measurement error effects are revealed, as opposed to well-documented results for the Cox proportional hazards model. We propose a class of bias correction estimators that embraces certain existing estimators as special cases. In addition, we exploit the regression calibration method to reduce measurement error effects. Theoretical results for the developed methods are established, and numerical assessments are conducted to illustrate the finite sample performance of our methods. PMID:26328545
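
    For readers unfamiliar with the two models contrasted in this abstract, their standard textbook forms are sketched below, together with the classical additive error model and the regression calibration substitution. The notation (true covariate Z, observed surrogate W, error U) is assumed here for illustration; the abstract does not state the authors' notation or error model.

```latex
% Cox proportional hazards model: covariates act multiplicatively
\lambda(t \mid Z) = \lambda_0(t)\,\exp\!\left(\beta^{\top} Z\right)

% Additive hazards model: covariates act additively on the baseline hazard
\lambda(t \mid Z) = \lambda_0(t) + \beta^{\top} Z

% Classical additive measurement error (assumed): W is observed in place of Z
W = Z + U, \qquad \mathrm{E}[U] = 0, \quad U \text{ independent of } Z

% Regression calibration: replace the unobserved Z with its conditional mean
Z \;\longrightarrow\; \mathrm{E}[Z \mid W]
```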

  7. In-line image analysis on the effects of additives in batch cooling crystallization

    NASA Astrophysics Data System (ADS)

    Qu, Haiyan; Louhi-Kultanen, Marjatta; Kallas, Juha

    2006-03-01

    The effects of two potassium salt additives, ethylene diamine tetra acetic acid dipotassium salt (EDTA) and potassium pyrophosphate (KPY), on the batch cooling crystallization of potassium dihydrogen phosphate (KDP) were investigated. The crystal growth rates of certain crystal faces were determined from in-line images taken with an MTS particle image analysis (PIA) video microscope. An in-line image processing method was developed to characterize the size and shape of the crystals. The nucleation kinetics was studied by measurement of the metastable zone width and induction time. A significant promotion effect on both nucleation and growth of KDP was observed when EDTA was used as an additive. KPY, however, exhibited strong inhibiting impacts. The mechanism underlying the EDTA promotion effect on crystal growth was further studied with the two-dimensional nucleation model. It is shown that the presence of EDTA increased the density of adsorbed molecules of the crystallizing solute on the surface of the crystal.

  8. Genomic and bioinformatics analysis of HAdV-7, a human adenovirus of species B1 that causes acute respiratory disease: implications for vector development in human gene therapy.

    PubMed

    Purkayastha, Anjan; Su, Jing; Carlisle, Steve; Tibbetts, Clark; Seto, Donald

    2005-02-01

    Human adenovirus serotype 7 (HAdV-7) is a reemerging pathogen identified in acute respiratory disease (ARD), particularly in epidemics affecting basic military trainee populations of otherwise healthy young adults. The genome has been sequenced and annotated (GenBank accession no. ). Comparative genomics and bioinformatics analyses of the HAdV-7 genome sequence provide insight into its natural history and phylogenetic relationships. A putative origin of HAdV-7 from a chimpanzee host is observed. This has implications within the current biotechnological interest of using chimpanzee adenoviruses as vectors for human gene therapy and DNA vaccine delivery. Rapid genome sequencing and analyses of this species B1 member provide an example of exploiting accurate low-pass DNA sequencing technology in pathogen characterization and epidemic outbreak surveillance through the identification, validation, and application of unique pathogen genome signatures. PMID:15661145

  9. Why Polyphenols have Promiscuous Actions? An Investigation by Chemical Bioinformatics.

    PubMed

    Tang, Guang-Yan

    2016-05-01

    Despite their diverse pharmacological effects, polyphenols are poor for use as drugs, which has traditionally been ascribed to their low bioavailability. However, Baell and co-workers recently proposed that the redox potential of polyphenols also plays an important role in this, because redox reactions bring promiscuous actions on various protein targets and thus produce non-specific pharmacological effects. To investigate whether the redox reactivity behaves as a critical factor in polyphenol promiscuity, we performed a chemical bioinformatics analysis on the structure-activity relationships of twenty polyphenols. It was found that the gene expression profiles of human cell lines induced by polyphenols were not correlated with the presence or absence of redox moieties in the polyphenols, but significantly correlated with their molecular structures. Therefore, it is concluded that the promiscuous actions of polyphenols are likely to result from their inherent structural features rather than their redox potential. PMID:27319142

  10. Bioinformatics and the Politics of Innovation in the Life Sciences

    PubMed Central

    Zhou, Yinhua; Datta, Saheli; Salter, Charlotte

    2016-01-01

    The governments of China, India, and the United Kingdom are unanimous in their belief that bioinformatics should supply the link between basic life sciences research and its translation into health benefits for the population and the economy. Yet at the same time, as ambitious states vying for position in the future global bioeconomy they differ considerably in the strategies adopted in pursuit of this goal. At the heart of these differences lies the interaction between epistemic change within the scientific community itself and the apparatus of the state. Drawing on desk-based research and thirty-two interviews with scientists and policy makers in the three countries, this article analyzes the politics that shape this interaction. From this analysis emerges an understanding of the variable capacities of different kinds of states and political systems to work with science in harnessing the potential of new epistemic territories in global life sciences innovation. PMID:27546935

  11. Identification of novel genes and pathways in carotid atheroma using integrated bioinformatic methods

    PubMed Central

    Nai, Wenqing; Threapleton, Diane; Lu, Jingbo; Zhang, Kewei; Wu, Hongyuan; Fu, You; Wang, Yuanyuan; Ou, Zejin; Shan, Lanlan; Ding, Yan; Yu, Yanlin; Dai, Meng

    2016-01-01

    Atherosclerosis is the primary cause of cardiovascular events and its molecular mechanism urgently needs to be clarified. In our study, atheromatous plaques (ATH) and macroscopically intact tissue (MIT) sampled from 32 patients were compared and an integrated series of bioinformatic microarray analyses was used to identify altered genes and pathways. Our work showed 816 genes were differentially expressed between ATH and MIT, including 443 that were up-regulated and 373 that were down-regulated in ATH tissues. GO functional-enrichment analysis for differentially expressed genes (DEGs) indicated that genes related to the “immune response” and “muscle contraction” were altered in ATHs. KEGG pathway-enrichment analysis showed that up-regulated DEGs were significantly enriched in the “FcεRI-mediated signaling pathway”, while down-regulated genes were significantly enriched in the “transforming growth factor-β signaling pathway”. Protein-protein interaction network and module analysis demonstrated that VAV1, SYK, LYN and PTPN6 may play critical roles in the network. Additionally, similar observations were seen in a validation study where SYK, LYN and PTPN6 were markedly elevated in ATH. All in all, identification of these genes and pathways not only provides new insights into the pathogenesis of atherosclerosis, but may also aid in the development of prognostic and therapeutic biomarkers for advanced atheroma. PMID:26742467
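
    Functional-enrichment analyses of the kind mentioned above are commonly based on an over-representation test. The abstract does not say which test or tool the authors used, so the following is only a sketch of one widely used choice, the one-sided hypergeometric test, written with the Python standard library. The function name and the toy counts are assumptions for illustration.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: genes in the background set
    n_annotated:  background genes carrying the annotation (e.g. a GO term)
    n_selected:   genes in the selected list (e.g. differentially expressed)
    n_overlap:    selected genes that carry the annotation
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_background, n_selected)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts invented for illustration: 10 background genes, 4 annotated,
# 3 selected, 2 of the selected genes annotated.
p_value = enrichment_p_value(10, 4, 3, 2)
```

    In practice such p-values are computed for many terms at once and are then corrected for multiple testing.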

  12. Identification of novel genes and pathways in carotid atheroma using integrated bioinformatic methods.

    PubMed

    Nai, Wenqing; Threapleton, Diane; Lu, Jingbo; Zhang, Kewei; Wu, Hongyuan; Fu, You; Wang, Yuanyuan; Ou, Zejin; Shan, Lanlan; Ding, Yan; Yu, Yanlin; Dai, Meng

    2016-01-01

    Atherosclerosis is the primary cause of cardiovascular events and its molecular mechanism urgently needs to be clarified. In our study, atheromatous plaques (ATH) and macroscopically intact tissue (MIT) sampled from 32 patients were compared and an integrated series of bioinformatic microarray analyses was used to identify altered genes and pathways. Our work showed 816 genes were differentially expressed between ATH and MIT, including 443 that were up-regulated and 373 that were down-regulated in ATH tissues. GO functional-enrichment analysis for differentially expressed genes (DEGs) indicated that genes related to the "immune response" and "muscle contraction" were altered in ATHs. KEGG pathway-enrichment analysis showed that up-regulated DEGs were significantly enriched in the "FcεRI-mediated signaling pathway", while down-regulated genes were significantly enriched in the "transforming growth factor-β signaling pathway". Protein-protein interaction network and module analysis demonstrated that VAV1, SYK, LYN and PTPN6 may play critical roles in the network. Additionally, similar observations were seen in a validation study where SYK, LYN and PTPN6 were markedly elevated in ATH. All in all, identification of these genes and pathways not only provides new insights into the pathogenesis of atherosclerosis, but may also aid in the development of prognostic and therapeutic biomarkers for advanced atheroma. PMID:26742467

  13. Neutron activation analysis by standard addition and solvent extraction: Determination of impurities in aluminium.

    PubMed

    Alian, A; Haggag, A

    1967-09-01

    A separation scheme based on selective extraction in conjunction with the standard addition technique has been developed for the determination of impurities in aluminium by neutron activation. Preliminary investigations have been carried out on the extractability of Sc, Co, Hf, Fe, Sn, Cd, Zn, Ag, Cr, Ce, Cs and Rb by TDA and TBP from acidic media. The best conditions are predicted for the separation of these elements into fractions suitable for analysis by gamma-ray spectrometry. Recovery values of approximately 90% were obtained for all the elements. PMID:18960206
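
    The abstract above names the standard addition technique without describing it. As a rough illustration only: known amounts of the analyte are added to the sample, the measured signal is fitted as a straight line against the amount added, and the unknown amount is read off as the intercept divided by the slope. The sketch below assumes a linear detector response; the function name, the units and all numbers are hypothetical and are not taken from the paper.

```python
# Standard addition sketch: hypothetical numbers, not data from the paper.
def standard_addition(added, signal):
    """Estimate the unknown analyte amount from spiked measurements.

    Fits signal = slope * added + intercept by ordinary least squares and
    returns intercept / slope, the magnitude of the x-intercept.
    """
    n = len(added)
    if n != len(signal) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_x = sum(added) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in added)
    if sxx == 0:
        raise ValueError("added amounts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(added, signal))
    slope = sxy / sxx
    if slope == 0:
        raise ValueError("signal does not respond to the added analyte")
    intercept = mean_y - slope * mean_x
    return intercept / slope


# Hypothetical spikes (micrograms) and count rates (counts per second).
added_ug = [0.0, 1.0, 2.0, 3.0]
counts = [50.0, 75.0, 100.0, 125.0]
estimate = standard_addition(added_ug, counts)
print(round(estimate, 6))
```

    With these made-up values the fitted line is signal = 25 * added + 50, so the estimate is 2.0 micrograms. Real measurements would also need an uncertainty estimate, which this sketch omits.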

  14. A near-infrared spectroscopic study of young field ultracool dwarfs: additional analysis

    NASA Astrophysics Data System (ADS)

    Allers, K. N.; Liu, M. C.

    We present additional analysis of the classification system presented in Allers & Liu (2013). We refer the reader to Allers & Liu (2013) for a detailed discussion of our near-IR spectral type and gravity classification system. Here, we address questions and comments from participants of the Brown Dwarfs Come of Age meeting. In particular, we examine the effects of binarity and metallicity on our classification system. We also present our classification of Pleiades brown dwarfs using published spectra. Lastly, we determine SpTs and calculate gravity-sensitive indices for the BT-Settl atmospheric models and compare them to observations.

  15. Addition of three-dimensional isoparametric elements to NASA structural analysis program (NASTRAN)

    NASA Technical Reports Server (NTRS)

    Field, E. I.; Johnson, S. E.

    1973-01-01

    Implementation is made of the three-dimensional family of linear, quadratic and cubic isoparametric solid elements into the NASA Structural Analysis program, NASTRAN. This work included program development, installation, testing, and documentation. The addition of these elements to NASTRAN provides a significant increase in modeling capability particularly for structures requiring specification of temperatures, material properties, displacements, and stresses which vary throughout each individual element. Complete program documentation is presented in the form of new sections and updates for direct insertion to the three NASTRAN manuals. The results of demonstration test problems are summarized. Excellent results are obtained with the isoparametric elements for static, normal mode, and buckling analyses.

  16. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package

    PubMed Central

    2012-01-01

    Background Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org. PMID:23281941

  17. Hydroxysteroid dehydrogenases (HSDs) in bacteria: a bioinformatic perspective.

    PubMed

    Kisiela, Michael; Skarka, Adam; Ebert, Bettina; Maser, Edmund

    2012-03-01

    Steroidal compounds including cholesterol, bile acids and steroid hormones play a central role in various physiological processes such as cell signaling, growth, reproduction, and energy homeostasis. Hydroxysteroid dehydrogenases (HSDs), which belong to the superfamily of short-chain dehydrogenases/reductases (SDR) or aldo-keto reductases (AKR), are important enzymes involved in the steroid hormone metabolism. HSDs function as an enzymatic switch that controls the access of receptor-active steroids to nuclear hormone receptors and thereby mediate a fine-tuning of the steroid response. The aim of this study was the identification of classified functional HSDs and the bioinformatic annotation of these proteins in all completely sequenced bacterial genomes followed by a phylogenetic analysis. For the bioinformatic annotation we constructed specific hidden Markov models in an iterative approach to provide a reliable identification for the specific catalytic groups of HSDs. Here, we show a detailed phylogenetic analysis of 3α-, 7α-, 12α-HSDs and two further functionally related enzymes (3-ketosteroid-Δ(1)-dehydrogenase, 3-ketosteroid-Δ(4)(5α)-dehydrogenase) from the superfamily of SDRs. For some bacteria that have been previously reported to possess a specific HSD activity, we could annotate the corresponding HSD protein. The dominating phyla that were identified to express HSDs were those of Actinobacteria, Proteobacteria, and Firmicutes. Moreover, some evolutionarily more ancient microorganisms (e.g., Cyanobacteria and Euryarchaeota) were found as well. A large number of HSD-expressing bacteria constitute the normal human gastro-intestinal flora. Another group of bacteria were originally isolated from natural habitats like seawater, soil, marine and permafrost sediments. These bacteria include polycyclic aromatic hydrocarbon-degrading species such as Pseudomonas, Burkholderia and Rhodococcus. In conclusion, HSDs are found in a wide variety of microorganisms including

  18. Re-analysis of survival data of cancer patients utilizing additive homeopathy.

    PubMed

    Gleiss, Andreas; Frass, Michael; Gaertner, Katharina

    2016-08-01

    In this short communication we present a re-analysis of homeopathic patient data in comparison to control patient data from the same Outpatient's Unit "Homeopathy in malignant diseases" of the Medical University of Vienna. In this analysis we took account of a probable immortal time bias. For patients suffering from advanced stages of cancer and surviving the first 6 or 12 months after diagnosis, respectively, the results show that utilizing homeopathy gives a statistically significant (p<0.001) advantage over control patients regarding survival time. In conclusion, bearing in mind all limitations, the results of this retrospective study suggest that patients with advanced stages of cancer might benefit from additional homeopathic treatment until a survival time of up to 12 months after diagnosis. PMID:27515878

  19. BIRI: a new approach for automatically discovering and indexing available public bioinformatics resources from the literature

    PubMed Central

    de la Calle, Guillermo; García-Remesal, Miguel; Chiesa, Stefano; de la Iglesia, Diana; Maojo, Victor

    2009-01-01

    Background The rapid evolution of Internet technologies and the collaborative approaches that dominate the field have stimulated the development of numerous bioinformatics resources. To address this new framework, several initiatives have tried to organize these services and resources. In this paper, we present the BioInformatics Resource Inventory (BIRI), a new approach for automatically discovering and indexing available public bioinformatics resources using information extracted from the scientific literature. The index generated can be automatically updated by adding additional manuscripts describing new resources. We have developed web services and applications to test and validate our approach. It has not been designed to replace current indexes but to extend their capabilities with richer functionalities. Results We developed a web service to provide a set of high-level query primitives to access the index. The web service can be used by third-party web services or web-based applications. To test the web service, we created a pilot web application to access a preliminary knowledge base of resources. We tested our tool using an initial set of 400 abstracts. Almost 90% of the resources described in the abstracts were correctly classified. More than 500 descriptions of functionalities were extracted. Conclusion These experiments suggest the feasibility of our approach for automatically discovering and indexing current and future bioinformatics resources. Given the domain-independent characteristics of this tool, it is currently being applied by the authors in other areas, such as medical nanoinformatics. BIRI is available at . PMID:19811635

  20. Effectiveness and Usability of Bioinformatics Tools to Analyze Pathways Associated with miRNA Expression

    PubMed Central

    Mullany, Lila E; Wolff, Roger K; Slattery, Martha L

    2015-01-01

    MiRNAs are small, nonprotein-coding RNA molecules involved in gene regulation. While bioinformatics tools help guide miRNA research, it is less clear how they perform when studying biological pathways. We used 13 criteria to evaluate the effectiveness and usability of existing bioinformatics tools. We evaluated the performance of six bioinformatics tools with a cluster of 12 differentially expressed miRNAs in colorectal tumors and three additional sets of 12 miRNAs that are not part of a known cluster. MiRPath performed the best of all the tools in linking miRNAs, with 92% of all miRNAs linked, and also scored highest on our established criteria, followed by Ingenuity (58% linked). Other tools, including Empirical Gene Ontology, miRó, miRMaid, and PhenomiR, were limited by their lack of available tutorials, lack of flexibility and interpretability, and/or difficulty using the tool. In summary, we observed a lack of standardization across bioinformatic tools and a general lack of specificity in terms of pathways identified between groups of miRNAs. Hopefully, this evaluation will help guide the development of new tools. PMID:26560461

  1. Assessment of a Bioinformatics across Life Science Curricula Initiative

    ERIC Educational Resources Information Center

    Howard, David R.; Miskowski, Jennifer A.; Grunwald, Sandra K.; Abler, Michael L.

    2007-01-01

    At the University of Wisconsin-La Crosse, we have undertaken a program to integrate the study of bioinformatics across the undergraduate life science curricula. Our efforts have included incorporating bioinformatics exercises into courses in the biology, microbiology, and chemistry departments, as well as coordinating the efforts of faculty within…

  2. Evaluating an Inquiry-Based Bioinformatics Course Using Q Methodology

    ERIC Educational Resources Information Center

    Ramlo, Susan E.; McConnell, David; Duan, Zhong-Hui; Moore, Francisco B.

    2008-01-01

    Faculty at a Midwestern metropolitan public university recently developed a course on bioinformatics that emphasized collaboration and inquiry. Bioinformatics, essentially the application of computational tools to biological data, is inherently interdisciplinary. Thus part of the challenge of creating this course was serving the needs and…

  3. Generative Topic Modeling in Image Data Mining and Bioinformatics Studies

    ERIC Educational Resources Information Center

    Chen, Xin

    2012-01-01

    Probabilistic topic models have been developed for applications in various domains such as text mining, information retrieval and computer vision and bioinformatics domain. In this thesis, we focus on developing novel probabilistic topic models for image mining and bioinformatics studies. Specifically, a probabilistic topic-connection (PTC) model…

  4. Genomics and Bioinformatics Resources for Crop Improvement

    PubMed Central

    Mochida, Keiichi; Shinozaki, Kazuo

    2010-01-01

    Recent remarkable innovations in platforms for omics-based research and application development provide crucial resources to promote research in model and applied plant species. A combinatorial approach using multiple omics platforms and integration of their outcomes is now an effective strategy for clarifying molecular systems integral to improving plant productivity. Furthermore, promotion of comparative genomics among model and applied plants allows us to grasp the biological properties of each species and to accelerate gene discovery and functional analyses of genes. Bioinformatics platforms and their associated databases are also essential for the effective design of approaches making the best use of genomic resources, including resource integration. We review recent advances in research platforms and resources in plant omics together with related databases and advances in technology. PMID:20208064

  5. The European Bioinformatics Institute's data resources 2014.

    PubMed

    Brooksbank, Catherine; Bergman, Mary Todd; Apweiler, Rolf; Birney, Ewan; Thornton, Janet

    2014-01-01

    Molecular Biology has been at the heart of the 'big data' revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff's 'Atlas of Protein Sequence and Structure' through the Human Genome Project in the late 1990s and early 2000s to today's population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provides free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI's database collection to complement the reviews of individual databases provided elsewhere in this issue. PMID:24271396

  6. Postgenomics: Proteomics and Bioinformatics in Cancer Research

    PubMed Central

    2003-01-01

    Now that the human genome is completed, the characterization of the proteins encoded by the sequence remains a challenging task. The study of the complete protein complement of the genome, the “proteome,” referred to as proteomics, will be essential if new therapeutic drugs and new disease biomarkers for early diagnosis are to be developed. Research efforts are already underway to develop the technology necessary to compare the specific protein profiles of diseased versus nondiseased states. These technologies provide a wealth of information and rapidly generate large quantities of data. Processing the large amounts of data will lead to useful predictive mathematical descriptions of biological systems which will permit rapid identification of novel therapeutic targets and identification of metabolic disorders. Here, we present an overview of the current status and future research approaches in defining the cancer cell's proteome in combination with different bioinformatics and computational biology tools toward a better understanding of health and disease. PMID:14615629

  7. Bioinformatics Resources for MicroRNA Discovery

    PubMed Central

    Moore, Alyssa C.; Winkjer, Jonathan S.; Tseng, Tsai-Tien

    2015-01-01

    Biomarker identification is often associated with the diagnosis and evaluation of various diseases. Recently, the role of microRNA (miRNA) has been implicated in the development of diseases, particularly cancer. With the advent of next-generation sequencing, the amount of data on miRNA has increased tremendously in the last decade, requiring new bioinformatics approaches for processing and storing new information. New strategies have been developed in mining these sequencing datasets to allow better understanding toward the actions of miRNAs. As a result, many databases have also been established to disseminate these findings. This review focuses on several curated databases of miRNAs and their targets from both predicted and validated sources. PMID:26819547

  8. Survey: Translational Bioinformatics embraces Big Data

    PubMed Central

    Shah, Nigam H.

    2015-01-01

    Summary We review the latest trends and major developments in translational bioinformatics in the year 2011–2012. Our emphasis is on highlighting the key events in the field and pointing at promising research areas for the future. The key take-home points are: Translational informatics is ready to revolutionize human health and healthcare using large-scale measurements on individuals. Data-centric approaches that compute on massive amounts of data (often called “Big Data”) to discover patterns and to make clinically relevant predictions will gain adoption. Research that bridges the latest multimodal measurement technologies with large amounts of electronic healthcare data is increasing; and is where new breakthroughs will occur. PMID:22890354

  9. 4273π: Bioinformatics education on low cost ARM hardware

    PubMed Central

    2013-01-01

    Background Teaching bioinformatics at universities is complicated by typical computer classroom settings. As well as running software locally and online, students should gain experience of systems administration. For a future career in biology or bioinformatics, the installation of software is a useful skill. We propose that this may be taught by running the course on GNU/Linux running on inexpensive Raspberry Pi computer hardware, for which students may be granted full administrator access. Results We release 4273π, an operating system image for Raspberry Pi based on Raspbian Linux. This includes minor customisations for classroom use and includes our Open Access bioinformatics course, 4273π Bioinformatics for Biologists. This is based on the final-year undergraduate module BL4273, run on Raspberry Pi computers at the University of St Andrews, Semester 1, academic year 2012–2013. Conclusions 4273π is a means to teach bioinformatics, including systems administration tasks, to undergraduates at low cost. PMID:23937194

  10. A Fully Non-Metallic Gas Turbine Engine Enabled by Additive Manufacturing Part I: System Analysis, Component Identification, Additive Manufacturing, and Testing of Polymer Composites

    NASA Technical Reports Server (NTRS)

    Grady, Joseph E.; Haller, William J.; Poinsatte, Philip E.; Halbig, Michael C.; Schnulo, Sydney L.; Singh, Mrityunjay; Weir, Don; Wali, Natalie; Vinup, Michael; Jones, Michael G.; Patterson, Clark; Santelle, Tom; Mehl, Jeremy

    2015-01-01

    The research and development activities reported in this publication were carried out under the NASA Aeronautics Research Institute (NARI) funded project entitled "A Fully Nonmetallic Gas Turbine Engine Enabled by Additive Manufacturing." The objective of the project was to evaluate emerging materials and manufacturing technologies that will enable fully nonmetallic gas turbine engines. The results of the activities are described in a three-part report. The first part of the report contains the data and analysis of engine system trade studies, which were carried out to estimate the reduction in engine emissions and fuel burn enabled by advanced materials and manufacturing processes. A number of key engine components were identified in which advanced materials and additive manufacturing processes would provide the most significant benefits to engine operation. The technical scope of activities included an assessment of the feasibility of using additive manufacturing technologies to fabricate gas turbine engine components from polymer and ceramic matrix composites, which was accomplished by fabricating prototype engine components and testing them in simulated engine operating conditions. The manufacturing process parameters were developed and optimized for polymer and ceramic composites (described in detail in the second and third parts of the report). A number of prototype components (inlet guide vane (IGV), acoustic liners, engine access door) were additively manufactured using high temperature polymer materials. Ceramic matrix composite components included turbine nozzle components. In addition, IGVs and acoustic liners were tested in simulated engine conditions in test rigs. The test results are reported and discussed in detail.

  11. Bioinformatics Approach for Prediction of Functional Coding/Noncoding Simple Polymorphisms (SNPs/Indels) in Human BRAF Gene.

    PubMed

    Hassan, Mohamed M; Omer, Shaza E; Khalf-Allah, Rahma M; Mustafa, Razaz Y; Ali, Isra S; Mohamed, Sofia B

    2016-01-01

    This study examined Homo sapiens single variations (SNPs/indels) in the coding and non-coding regions of the BRAF gene. Variant data were obtained from the SNP database as of its last update in November 2015. Many bioinformatics tools were used to identify functional SNPs and indels affecting protein function, structure and expression. For coding polymorphisms, the results showed 111 SNPs predicted to be highly damaging and six others predicted to be less damaging. For UTRs, five SNPs and one indel altered microRNA binding sites (3' UTR), while no SNP or indel functionally altered transcription factor binding sites (5' UTR). For the 5'/3' splice sites, the analysis showed that one SNP within the 5' splice site and one indel in the 3' splice site could potentially alter splicing. In conclusion, these functional SNPs and indels could lead to gene alteration, which may directly or indirectly contribute to the occurrence of many diseases. PMID:27478437

  12. Comparative modeling of proteins: a method for engaging students' interest in bioinformatics tools.

    PubMed

    Badotti, Fernanda; Barbosa, Alan Sales; Reis, André Luiz Martins; do Valle, Italo Faria; Ambrósio, Lara; Bitar, Mainá

    2014-01-01

    The huge increase in data being produced in the genomic era has produced a need to incorporate computers into the research process. Sequence generation, its subsequent storage, interpretation, and analysis are now entirely computer-dependent tasks. Universities from all over the world have been challenged to seek a way of encouraging students to acquire computational and bioinformatics skills from the undergraduate level onward in order to understand biological processes. The aim of this article is to report the experience of awakening students' interest in bioinformatics tools during a course focused on comparative modeling of proteins. The authors start by giving a full description of the course's environmental context and the students' backgrounds. Then they detail each class and present a general overview of the protein modeling protocol. The positive and negative aspects of the course are also reported, and some of the results generated in class and in projects outside the classroom are discussed. In the last section of the article, general perspectives about the course from the students' point of view are given. This work can serve as a guide for professors who teach subjects for which bioinformatics tools are useful and for universities that plan to incorporate bioinformatics into the curriculum. PMID:24167006

  13. Warming and drying of the eastern Mediterranean: Additional evidence from trend analysis

    NASA Astrophysics Data System (ADS)

    Shohami, David; Dayan, Uri; Morin, Efrat

    2011-11-01

    The climate of the eastern Mediterranean (EM), at the transition zone between the Mediterranean climate and the semi-arid/arid climate, has been studied for a 39-year period to determine whether climate changes have taken place. A thorough trend analysis using the nonparametric Mann-Kendall test with Sen's slope estimator has been applied to ground station measurements, atmospheric reanalysis data, synoptic classification data and global data sets for the years 1964-2003. In addition, changes in atmospheric regional patterns between the first and last twenty years were determined by visual comparisons of their composite mean. The main findings of the analysis are: 1) changes of atmospheric conditions during summer and the transitional seasons (mainly autumn) support a warmer climate over the EM and this change is already statistically evident in surface temperatures having exhibited positive trends of 0.2-1°C/decade; 2) changes of atmospheric conditions during winter and the transitional