Hoon, Shawn; Ratnapu, Kiran Kumar; Chia, Jer-Ming; Kumarasamy, Balamurugan; Juguang, Xiao; Clamp, Michele; Stabenau, Arne; Potter, Simon; Clarke, Laura; Stupka, Elia
We identify several challenges facing bioinformatics analysis today. Firstly, to fulfill the promise of comparative studies, bioinformatics analysis will need to accommodate different sources of data residing in a federation of databases that, in turn, come in different formats and modes of accessibility. Secondly, the tsunami of data to be handled will require robust systems that enable bioinformatics analysis to be carried out in a parallel fashion. Thirdly, the ever-evolving state of bioinformatics presents new algorithms and paradigms in conducting analysis. This means that any bioinformatics framework must be flexible and generic enough to accommodate such changes. In addition, we identify the need for introducing an explicit protocol-based approach to bioinformatics analysis that will lend rigorousness to the analysis. This makes it easier for experimentation and replication of results by external parties. Biopipe is designed in an effort to meet these goals. It aims to allow researchers to focus on protocol design. At the same time, it is designed to work over a compute farm and thus provides high-throughput performance. A common exchange format that encapsulates the entire protocol in terms of the analysis modules, parameters, and data versions has been developed to provide a powerful way in which to distribute and reproduce results. This will enable researchers to discuss and interpret the data better as the once implicit assumptions are now explicitly defined within the Biopipe framework.
Lu, Qiang; Hao, Pei; Curcin, Vasa; He, Weizhong; Li, Yuan-Yuan; Luo, Qing-Ming; Guo, Yi-Ke; Li, Yi-Xue
Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently without much consideration so far of the need for standardization. The lack of such common standards, combined with unfriendly interfaces, makes it difficult for biologists to learn how to use these tools and to translate the data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a Java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-made or third-party viewers are built in to provide visualization of annotations or alignments. KDE Bioscience can also be deployed in client-server mode where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages that can be executed from a web browser. The power of KDE Bioscience comes from the integrated algorithms and data sources. With its generic workflow mechanism other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics or systems biology research.
Most biochemical reactions in a cell are regulated by highly specialized proteins, which are the prime mediators of the cellular phenotype. Therefore, the identification, quantitation and characterization of all proteins in a cell are of utmost importance to understand the molecular processes that mediate cellular physiology. With the advent of robust and reliable mass spectrometers that are able to analyze complex protein mixtures within a reasonable timeframe, the systematic analysis of all proteins in a cell becomes feasible. Besides the ongoing improvements of analytical hardware, standardized methods to analyze and study all proteins have to be developed that allow the generation of testable new hypotheses based on the enormous pre-existing amount of biological information. Here we discuss current strategies on how to gather, filter and analyze proteomic data sets using available software packages. PMID:25033288
Moriconi, Francesco; Beard, Michael R; Yuen, Lilly Kw
HBV and HCV are the only hepatotropic viruses capable of establishing chronic infections. More than 500 million people worldwide are estimated to have chronic infections with HBV and/or HCV, and they have an increased risk of developing liver complications, such as cirrhosis or hepatocellular carcinoma. During the past decade, several antiviral agents including immune-modulatory drugs and nucleoside/nucleotide analogues have been approved for the treatment of HBV and HCV infections. In recent years, the focus has been on the development of new and better therapeutic agents for management of chronic HCV infections. Bioinformatics has only been applied recently to the field of viral hepatitis research. In addition to the wide range of general tools freely available for identification of open reading frames, gene prediction, homology searching, sequence alignment, and motif and epitope recognition, several public database systems designed specifically for HBV and HCV research have now been developed. The focus of these databases ranges from serving as viral sequence repositories to providing bioinformatics tools for viral genome analysis and for HBV or HCV drug resistance prediction. This review provides an overview of these public databases, which have integrated bioinformatics tools for HBV and HCV research. Properly managed and developed, these databases have the potential to have a broad effect on hepatitis research and treatment strategies. However, the effect will depend on the comprehensive collection of not only molecular sequence data, but also anonymous patient clinical and treatment data.
Maudsley, Stuart; Chadwick, Wayne; Wang, Liyun; Zhou, Yu; Martin, Bronwen; Park, Sung-Soo
The growth and development in the last decade of accurate and reliable mass data collection techniques has greatly enhanced our comprehension of cell signaling networks and pathways. At the same time however, these technological advances have also increased the difficulty of satisfactorily analyzing and interpreting these ever-expanding datasets. At the present time, multiple diverse scientific communities including molecular biological, genetic, proteomic, bioinformatic, and cell biological, are converging upon a common endpoint, that is, the measurement, interpretation, and potential prediction of signal transduction cascade activity from mass datasets. Our ever increasing appreciation of the complexity of cellular or receptor signaling output and the structural coordination of intracellular signaling cascades has to some extent necessitated the generation of a new branch of informatics that more closely associates functional signaling effects to biological actions and even whole-animal phenotypes. The ability to untangle and hopefully generate theoretical models of signal transduction information flow from transmembrane receptor systems to physiological and pharmacological actions may be one of the greatest advances in cell signaling science. In this overview, we shall attempt to assist the navigation into this new field of cell signaling and highlight several methodologies and technologies to appreciate this exciting new age of signal transduction. PMID:21870222
Kawakami, Akinori; Fisher, David E.
Bioinformatic analysis of genome-wide gene expression allows us to characterize cells, including melanomas. Gene expression profiles have been generated in various stages of melanomas and analyzed by researchers in unique ways. Lauss et al. compared their melanoma subtypes with those of The Cancer Genome Atlas Network and found consistency between the two studies. PMID:27884291
Calabrese, Barbara; Cannataro, Mario
High-throughput platforms such as microarray, mass spectrometry, and next-generation sequencing are producing an increasing volume of omics data that needs large data storage and computing power. Cloud computing offers massively scalable computing and storage, data sharing, and on-demand, anytime and anywhere access to resources and applications, and thus it may represent the key technology for facing those issues. In fact, in recent years it has been adopted for the deployment of different bioinformatics solutions and services both in academia and in industry. Despite this, cloud computing presents several issues regarding the security and privacy of data, which are particularly important when analyzing patient data, such as in personalized medicine. This chapter reviews the main academic and industrial cloud-based bioinformatics solutions, with a special focus on microarray data analysis solutions, and underlines the main issues and problems related to the use of such platforms for the storage and analysis of patient data.
Lue, Jaw-Chyng (Inventor); Fang, Wai-Chi (Inventor)
A system with applications in pattern recognition, or classification, of DNA assay samples. Because DNA reference and sample material in wells of an assay may be caused to fluoresce depending upon dye added to the material, the resulting light may be imaged onto an embodiment comprising an array of photodetectors and an adaptive neural network, with applications to DNA analysis. Other embodiments are described and claimed.
Repin, Rul Aisyah Mat; Mutalib, Sahilah Abdul; Shahimi, Safiyyah; Khalid, Rozida Mohd.; Ayob, Mohd. Khan; Bakar, Mohd. Faizal Abu; Isa, Mohd Noor Mat
In this study, we performed bioinformatics analysis of the genome sequence of Lysinibacillus sphaericus (L. sphaericus) to identify the gene encoding gelatinase. L. sphaericus was isolated from soil and produces a gelatinase that is species-specific to porcine and bovine gelatin. This bacterium therefore offers the possibility of producing enzymes that are specific to each of these two meat species. The main focus of this research is to identify the gelatinase-encoding gene of L. sphaericus using bioinformatics analysis of the partially sequenced genome. Three candidate genes were identified: gelatinase candidate gene 1 (P1), NODE_71_length_93919_cov_158.931839_21, which is 1563 base pairs (bp) in size and encodes a sequence of 520 amino acids; gelatinase candidate gene 2 (P2), NODE_23_length_52851_cov_190.061386_17, which is 1776 bp in size and encodes a sequence of 591 amino acids; and gelatinase candidate gene 3 (P3), NODE_106_length_32943_cov_169.147919_8, which is 1701 bp in size and encodes a sequence of 566 amino acids. Three pairs of oligonucleotide primers, named F1, R1, F2, R2, F3, and R3, were designed to target short sequences of cDNA by PCR. The amplicons were reliably 1563 bp in size for candidate gene P1 and 1701 bp in size for candidate gene P3. The bioinformatics analysis of L. sphaericus thus identified candidate genes encoding gelatinase.
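The reported gene sizes and protein lengths above can be cross-checked with simple codon arithmetic. The sketch below is illustrative only and assumes that each reported size in bp covers the full open reading frame including one stop codon; the function name `protein_length` is hypothetical and not taken from the study.

```python
def protein_length(orf_bp: int, has_stop_codon: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of the given length.

    Assumes the ORF length is a whole number of codons and, by default,
    that the final codon is a stop codon that encodes no amino acid.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if has_stop_codon else codons


# Candidate gene sizes (bp) as reported in the abstract.
candidates = {"P1": 1563, "P2": 1776, "P3": 1701}
lengths = {name: protein_length(bp) for name, bp in candidates.items()}
print(lengths)  # {'P1': 520, 'P2': 591, 'P3': 566}
```

Under that assumption, the computed lengths match the 520, 591, and 566 amino acids reported for P1, P2, and P3.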
Ma, Guoda; Wang, Haiyang; Li, You; Cui, Lili; Cui, Yudong; Li, Qingzhang; Li, Keshen; Zhao, Bin
The cardiac ankyrin repeat protein (CARP) is a multifunctional protein that is expressed specifically in mammalian cardiac muscle and plays important roles in stress responses, transcriptional regulation, myofibrillar assembly, and the development of cardiac and skeletal muscle. In this study, the sheep homolog of the CARP gene was cloned and characterized. The coding region of the gene consists of 960 bp and encodes 319 amino acids with molecular weight 36.2 KD. Bioinformatics analysis demonstrated that the 3' untranslated region (3'-UTR) of the gene contains many AU-rich elements that are associated with mRNA stability and a potential regulatory site for miRNA binding. The protein was predicted to contain 14 potential phosphorylation sites and an O-GlcNAc glycosylation site and to be expressed in both the nucleus and cytoplasm. The evolutionary analysis revealed that the sheep CARP exhibited a high level of homology with the mammalian counterparts; however, the protein exhibited an increased evolutionary distance from the chicken, frog, and fish homologs. RT-PCR revealed that in addition to its high mRNA expression level in cardiac muscle, trace amounts of the sheep CARP mRNA were expressed in the skeletal muscle, stomach, and small intestine. However, western blot analysis demonstrated that the CARP protein was expressed only in cardiac muscle. The coding sequence was cloned into the pET30a-TEV-LIC vector, and the soluble CARP-MBP (maltose-binding protein) fusion protein was expressed in a prokaryotic host and purified by affinity chromatography. Our data provide the basis for future studies of the structure and function of sheep CARP.
The gap junction beta 2 (GJB2) gene is the most commonly mutated connexin gene in patients with autosomal recessive and dominant hearing loss. According to the Ensembl (release 74) database, 1347 sequence variations are reported in the GJB2 gene, and about 13.5% of them are categorized as missense SNPs or nonsynonymous variants. Because of the high incidence of GJB2 mutations in hearing loss patients, revealing the molecular effect of GJB2 mutations on protein structure may also provide a clear point of view regarding the molecular etiology of deafness. Hence, the aim of this study is to analyze the structural and functional consequences of all known GJB2 missense variations for the Cx26 protein by applying multiple bioinformatics methods. Two hundred and eleven nonsynonymous variants were collected from Ensembl release 74, the Leiden Open Variation Database (LOVD) and the Human Gene Mutation Database (HGMD). A number of bioinformatic tools were utilized for predicting the effect of GJB2 missense mutations at the sequence, structural, and functional levels. Some of the mutations were found to be located in highly conserved regions and to affect structural and functional properties. Moreover, GJB2 mutations were also found to affect the Cx26 protein at the molecular level via loss or gain of disorder, catalytic sites, and post-translational modifications, including methylation, glycosylation, and ubiquitination. The findings presented here demonstrate the application of bioinformatic algorithms to predict the effects of mutations causing hearing impairment. I expect that this type of analysis will serve as a starting point for future experimental evaluation of GJB2 gene mutations and that it will also be helpful in evaluating other deafness-related gene mutations.
Biegert, Andreas; Mayer, Christian; Remmert, Michael; Söding, Johannes; Lupas, Andrei N.
The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at http://toolkit.tuebingen.mpg.de. PMID:16845021
Yalcin, Dicle; Hakguder, Zeynep M; Otu, Hasan H
Individual cells within the same population show various degrees of heterogeneity, which may be better handled with single-cell analysis to address biological and clinical questions. Single-cell analysis is especially important in developmental biology as subtle spatial and temporal differences in cells have significant associations with cell fate decisions during differentiation and with the description of a particular state of a cell exhibiting an aberrant phenotype. Biotechnological advances, especially in the area of microfluidics, have led to a robust, massively parallel and multi-dimensional capturing, sorting, and lysis of single-cells and amplification of related macromolecules, which have enabled the use of imaging and omics techniques on single cells. There have been improvements in computational single-cell image analysis in developmental biology regarding feature extraction, segmentation, image enhancement and machine learning, handling limitations of optical resolution to gain new perspectives from the raw microscopy images. Omics approaches, such as transcriptomics, genomics and epigenomics, targeting gene and small RNA expression, single nucleotide and structural variations and methylation and histone modifications, rely heavily on high-throughput sequencing technologies. Although there are well-established bioinformatics methods for analysis of sequence data, there are limited bioinformatics approaches which address experimental design, sample size considerations, amplification bias, normalization, differential expression, coverage, clustering and classification issues, specifically applied at the single-cell level. In this review, we summarize biological and technological advancements, discuss challenges faced in the aforementioned data acquisition and analysis issues and present future prospects for application of single-cell analyses to developmental biology.
Aiamkitsumrit, Benjamas; Dampier, Will; Antell, Gregory; Rivera, Nina; Martin-Garcia, Julio; Pirrone, Vanessa; Nonnemacher, Michael R.; Wigdahl, Brian
The evolution of human immunodeficiency virus type 1 (HIV-1) with respect to co-receptor utilization has been shown to be relevant to HIV-1 pathogenesis and disease. The CCR5-utilizing (R5) virus has been shown to be important in the very early stages of transmission and highly prevalent during asymptomatic infection and chronic disease. In addition, the R5 virus has been proposed to be involved in neuroinvasion and central nervous system (CNS) disease. In contrast, the CXCR4-utilizing (X4) virus is more prevalent during the course of disease progression and concurrent with the loss of CD4+ T cells. The dual-tropic virus is able to utilize both co-receptors (CXCR4 and CCR5) and has been thought to represent an intermediate transitional virus that possesses properties of both X4 and R5 viruses that can be encountered at many stages of disease. The use of computational tools and bioinformatic approaches in the prediction of HIV-1 co-receptor usage has been growing in importance with respect to understanding HIV-1 pathogenesis and disease, developing diagnostic tools, and improving the efficacy of therapeutic strategies focused on blocking viral entry. Current strategies have enhanced the sensitivity, specificity, and reproducibility relative to the prediction of co-receptor use; however, these technologies need to be improved with respect to their efficient and accurate use across the HIV-1 subtypes. The most effective approach may center on the combined use of different algorithms involving sequences within and outside of the env-V3 loop. This review focuses on the HIV-1 entry process and on co-receptor utilization, including bioinformatic tools utilized in the prediction of co-receptor usage. It also provides novel preliminary analyses for enabling identification of linkages between amino acids in V3 with other components of the HIV-1 genome and demonstrates that these linkages are different between X4 and R5 viruses. PMID:24862329
Edge Bioinformatics is a developmental bioinformatics and data management platform which seeks to supply laboratories with bioinformatics pipelines for analyzing data associated with common samples case goals. Edge Bioinformatics enables sequencing as a solution and forward-deployed situations where human resources, space, bandwidth, and time are limited. The Edge bioinformatics pipeline was designed based on the following use cases and is specific to Illumina sequencing reads. 1. Assay performance adjudication (PCR): Analysis of an existing PCR assay in a genomic context, and automated design of a new assay to resolve conflicting results; 2. Clinical presentation with extreme symptoms: Characterization of a known pathogen or co-infection with a. Novel emerging disease outbreak or b. Environmental surveillance
Wattam, Alice R.; Abraham, David; Dalay, Oral; Disz, Terry L.; Driscoll, Timothy; Gabbard, Joseph L.; Gillespie, Joseph J.; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olson, Robert; Overbeek, Ross; Pusch, Gordon D.; Shukla, Maulik; Schulman, Julie; Stevens, Rick L.; Sullivan, Daniel E.; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J.C.; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W.
The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein–protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue. PMID:24225323
Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang
Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools (DNA/protein sequence alignment and alignment-free methods) are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, masters thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure, and the standardized Euclidean distance and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree qualitatively with earlier classifications derived by human observation. These methods further quantify the similarities among themes. These methods could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission by quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.]
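To make the two families of metrics mentioned above more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each theme is encoded as a string of unit labels, uses arbitrary scoring parameters (match 2, mismatch -1, linear gap -1) and a word length of 2, and omits both the "repeated procedure" variant and the standardized Euclidean distance. All function names are hypothetical.

```python
import math
from collections import Counter


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: a cell score never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def word_counts(seq, k=2):
    """Count overlapping words (k-mers) of length k in seq."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def angle_distance(a, b, k=2):
    """Angle (radians) between the word-frequency vectors of a and b."""
    ca, cb = word_counts(a, k), word_counts(b, k)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    if norm == 0:
        raise ValueError("both sequences must contain at least one word")
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


# Hypothetical themes, each letter standing for one discrete unit.
theme_a = "ABABCC"
theme_b = "ABABCD"
print(smith_waterman_score(theme_a, theme_b))
print(angle_distance(theme_a, theme_b))
```

Pairwise scores or distances of this kind, computed for every pair of themes, form the matrix that a cluster analysis would then turn into a dendrogram.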
Römer, Michael; Eichner, Johannes; Dräger, Andreas; Wrzodek, Clemens; Wrzodek, Finja; Zell, Andreas
Bioinformatics analysis has become an integral part of research in biology. However, installation and use of scientific software can be difficult and often requires technical expert knowledge. Reasons are dependencies on certain operating systems or required third-party libraries, missing graphical user interfaces and documentation, or nonstandard input and output formats. In order to make bioinformatics software easily accessible to researchers, we here present a web-based platform. The Center for Bioinformatics Tuebingen (ZBIT) Bioinformatics Toolbox provides web-based access to a collection of bioinformatics tools developed for systems biology, protein sequence annotation, and expression data analysis. Currently, the collection encompasses software for conversion and processing of the community standards SBML and BioPAX, transcription factor analysis, and analysis of microarray data from transcriptomics and proteomics studies. All tools are hosted on a customized Galaxy instance and run on a dedicated computation cluster. Users only need a web browser and an active internet connection in order to benefit from this service. The web platform is designed to facilitate the usage of the bioinformatics tools for researchers without an advanced technical background. Users can combine tools for complex analyses or use predefined, customizable workflows. All results are stored persistently and are reproducible. For each tool, we provide documentation, tutorials, and example data to maximize usability. The ZBIT Bioinformatics Toolbox is freely available at https://webservices.cs.uni-tuebingen.de/. PMID:26882475
Liu, Xiao; Wolfe, Richard; Welch, Lonnie R.; Domozych, David S.; Popper, Zoë A.; Showalter, Allan M.
Extensins (EXTs) are a family of plant cell wall hydroxyproline-rich glycoproteins (HRGPs) that are implicated in plant growth, development, and defense. Structurally, EXTs are characterized by the repeated occurrence of serine (Ser) followed by three to five proline (Pro) residues, which are hydroxylated as hydroxyproline (Hyp) and glycosylated. Some EXTs have tyrosine (Tyr)-X-Tyr (where X can be any amino acid) motifs that are responsible for intramolecular or intermolecular cross-linking. EXTs can be divided into several classes: classical EXTs, short EXTs, leucine-rich repeat extensins (LRXs), proline-rich extensin-like receptor kinases (PERKs), formin-homolog EXTs (FH EXTs), chimeric EXTs, and long chimeric EXTs. To guide future research on the EXTs and to understand the evolutionary history of EXTs in the plant kingdom, a bioinformatics study was conducted to identify and classify EXTs from 16 fully sequenced plant genomes, including Ostreococcus lucimarinus, Chlamydomonas reinhardtii, Volvox carteri, Klebsormidium flaccidum, Physcomitrella patens, Selaginella moellendorffii, Pinus taeda, Picea abies, Brachypodium distachyon, Zea mays, Oryza sativa, Glycine max, Medicago truncatula, Brassica rapa, Solanum lycopersicum, and Solanum tuberosum, to supplement data previously obtained from Arabidopsis thaliana and Populus trichocarpa. A total of 758 EXTs were newly identified, including 87 classical EXTs, 97 short EXTs, 61 LRXs, 75 PERKs, 54 FH EXTs, 38 long chimeric EXTs, and 346 other chimeric EXTs. Several notable findings were made: (1) classical EXTs were likely derived after the terrestrialization of plants; (2) LRXs, PERKs, and FHs were derived earlier than classical EXTs; (3) monocots have few classical EXTs; (4) eudicots have the greatest number of classical EXTs, and Tyr-X-Tyr cross-linking motifs are predominantly in classical EXTs; (5) green algae have no classical EXTs but have a number of long chimeric EXTs that are absent in
Fu, Wenjiang J.; Stromberg, Arnold J.; Viele, Kert; Carroll, Raymond J.; Wu, Guoyao
Over the past two decades, there have been revolutionary developments in life science technologies characterized by high throughput, high efficiency, and rapid computation. Nutritionists now have the advanced methodologies for the analysis of DNA, RNA, protein, low-molecular-weight metabolites, as well as access to bioinformatics databases. Statistics, which can be defined as the process of making scientific inferences from data that contain variability, has historically played an integral role in advancing nutritional sciences. Currently, in the era of systems biology, statistics has become an increasingly important tool to quantitatively analyze information about biological macromolecules. This article describes general terms used in statistical analysis of large, complex experimental data. These terms include experimental design, power analysis, sample size calculation, and experimental errors (type I and II errors) for nutritional studies at population, tissue, cellular, and molecular levels. In addition, we highlighted various sources of experimental variations in studies involving microarray gene expression, real-time polymerase chain reaction, proteomics, and other bioinformatics technologies. Moreover, we provided guidelines for nutritionists and other biomedical scientists to plan and conduct studies and to analyze the complex data. Appropriate statistical analyses are expected to make an important contribution to solving major nutrition-associated problems in humans and animals (including obesity, diabetes, cardiovascular disease, cancer, ageing, and intrauterine fetal retardation). PMID:20233650
Smith, David Roy
Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. Whether you are just beginning your foray into molecular sequence analysis or are an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry, detached world of bioinformatics.
Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.
Recently, bioinformatics has become a new field of science, indispensable in the analysis of the millions of nucleic acid sequences currently deposited in international databases (public or private); these databases contain information on genes, RNA, ORFs, proteins and intergenic regions, including entire genomes from some species. The analysis of this information requires computer programs, which have been renewed through the use of new mathematical methods and the introduction of artificial intelligence, in addition to the constant creation of supercomputing units designed to withstand the heavy workload of sequence analysis. However, innovation is still needed in platforms that allow genomic analyses to be carried out faster and more effectively, with a technological understanding of all biological processes.
Thiel, William H
Development of RNA and DNA aptamers for diagnostic and therapeutic applications is a rapidly growing field. Aptamers are identified through iterative rounds of selection in a process termed SELEX (Systematic Evolution of Ligands by EXponential enrichment). High-throughput sequencing (HTS) revolutionized the modern SELEX process by identifying millions of aptamer sequences across multiple rounds of aptamer selection. However, these vast aptamer HTS datasets necessitated bioinformatics techniques. Herein, we describe a semiautomated approach to analyze aptamer HTS datasets using the Galaxy Project, a web-based open source collection of bioinformatics tools that were originally developed to analyze genome, exome, and transcriptome HTS data. Using a series of Workflows created in the Galaxy webserver, we demonstrate efficient processing of aptamer HTS data and compilation of a database of unique aptamer sequences. Additional Workflows were created to characterize the abundance and persistence of aptamer sequences within a selection and to filter sequences based on these parameters. A key advantage of this approach is that the online nature of the Galaxy webserver and its graphical interface allow for the analysis of HTS data without the need to compile code or install multiple programs. PMID:28131286
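The abundance and persistence filtering described in this abstract can be illustrated with a small Python sketch. It is not the Galaxy Workflows themselves: it assumes each selection round is already available as a list of read sequences, takes abundance to be the total read count and persistence to be the number of rounds in which a sequence appears, and uses hypothetical function names, round labels and thresholds.

```python
from collections import Counter


def summarize_rounds(rounds):
    """Build a table of unique sequences across selection rounds.

    rounds: dict mapping a round label to a list of read sequences.
    Returns a dict mapping each unique sequence to its total read count
    ("abundance") and the number of rounds it was seen in ("persistence").
    """
    summary = {}
    for reads in rounds.values():
        for seq, count in Counter(reads).items():
            entry = summary.setdefault(seq, {"abundance": 0, "persistence": 0})
            entry["abundance"] += count
            entry["persistence"] += 1
    return summary


def filter_sequences(summary, min_abundance=2, min_persistence=2):
    """Keep sequences that meet both (hypothetical) thresholds."""
    return sorted(
        seq
        for seq, stats in summary.items()
        if stats["abundance"] >= min_abundance
        and stats["persistence"] >= min_persistence
    )


# Toy reads, invented purely for illustration.
rounds = {
    "round1": ["ACGU", "ACGU", "GGCA", "UUAG"],
    "round2": ["ACGU", "GGCA", "GGCA"],
    "round3": ["ACGU", "ACGU", "CCCC"],
}
summary = summarize_rounds(rounds)
kept = filter_sequences(summary)
```

With the toy reads above, only the sequences seen in at least two rounds and at least twice overall are retained in `kept`.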
Carvalho, Benilton S.; Rustici, Gabriella
High-throughput technologies are widely used in the field of functional genomics and used in an increasing number of applications. For many ‘wet lab’ scientists, the analysis of the large amount of data generated by such technologies is a major bottleneck that can only be overcome through very specialized training in advanced data analysis methodologies and the use of dedicated bioinformatics software tools. In this article, we wish to discuss the challenges related to delivering training in the analysis of high-throughput sequencing data and how we addressed these challenges in the hands-on training courses that we have developed at the European Bioinformatics Institute. PMID:23543353
Qi, Lin-jie; Yuan, Yuan; Wu, Chong; Huang, Lu-qi; Chen, Ping
DNA demethylase genes are widespread in plants. Four DNA demethylase genes (LJDME1, LJDME2, LJDME3 and LJDME4) were obtained from a transcriptome dataset of Lonicera japonica Thunb. by using bioinformatics methods, and the physicochemical properties of the proteins they encode were predicted. The phylogenetic tree showed that the four DNA demethylase genes were closely related to Arabidopsis thaliana DME. The gene expression results showed that the expression of the four DNA demethylase genes differed between species. The expression levels of LJDME1 and LJDME2 were higher in Lonicera japonica var. chinensis than in L. japonica. LJDME1 and LJDME2 may regulate the active compounds of L. japonica. This study aims to lay a foundation for further understanding of the function of DNA demethylase genes in L. japonica.
Tsai, Pei-Lun; Chen, Sung-Fang
The purpose of this review is to provide updated information regarding bioinformatic software for use in the characterization of glycosylated structures since 2013. A comprehensive review by Woodin et al. (Analyst 138: 2793–2803, 2013; ref. 1) described two main approaches, introduced for researchers starting in this area: analysis of released glycans and identification of glycopeptides in enzymatic digests. Complementary to that report, this review focuses on mass spectrometry related bioinformatics tools for the characterization of N-linked and O-linked glycopeptides. Specifically, it also provides information regarding automated tools that can be used for glycan profiling using mass spectrometry. PMID:28337402
Malin, Bradley; Carley, Kathleen
Objective The goal of this research is to learn how the editorial staffs of bioinformatics and medical informatics journals provide support for cross-community exposure. Models such as co-citation and co-author analysis measure the relationships between researchers; but they do not capture how environments that support knowledge transfer across communities are organized. Methods In this paper, we propose a social network analysis model to study how editorial boards integrate researchers from disparate communities. We evaluate our model by building relational networks based on the editorial boards of approximately 40 journals that serve as research outlets in medical informatics and bioinformatics. We track the evolution of editorial relationships through a longitudinal investigation over the years 2000 through 2005. Results Our findings suggest that there are research journals that support the collocation of editorial board members from the bioinformatics and medical informatics communities. Network centrality metrics indicate that editorial board members are located in the intersection of the communities and that the number of individuals in the intersection is growing with time. Conclusions Social network analysis methods provide insight into the relationships between the medical informatics and bioinformatics communities. The number of editorial board members facilitating the publication intersection of the communities has grown, but the intersection remains dependent on a small group of individuals and fragile. PMID:17329730
Shao, Jia; Yu, Miao; Jiang, Liang; Wu, Fengliang; Liu, Xiaoguang
The purpose of this study was to detect the differentially expressed genes between ossified herniated discs and herniated discs without ossification. In addition, we sought to identify a few candidate genes and pathways by using bioinformatics analysis. We analyzed 6 samples each of ossified herniated discs (experimental group) and herniated discs without ossification (control group). Purified mRNA and cDNA extracted from the samples were subjected to sequencing. The NOISeq method was used to statistically identify the differentially expressed genes (DEGs) between the 2 groups. An in-depth analysis using bioinformatics tools based on the DEGs was performed using Gene Ontology (GO) enrichment, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, and protein-protein interaction network analysis. The top 6 DEGs were verified using reverse transcription-quantitative polymerase chain reaction (RT-qPCR). A total of 132 DEGs was detected. A total of 129 genes in the ossified group were upregulated and 3 genes were found to be downregulated as compared to the control group. The top 3 cellular components in GO ontologies analysis were extracellular matrix components. GO functions were mainly related to the glycoprotein in the cell membrane and extracellular matrix. The GO process was related to completing response to stimulus, immune reflex and defense. The top 5 KEGG enrichment pathways were associated with infection and inflammation. Three of the top 20 DEGs [sclerostin (SOST), WNT inhibitory factor 1 (WIF1) and secreted frizzled related protein 4 (SFRP4)] were related to the inhibition of the Wnt pathway. The ossified discs exhibited a higher expression of the top 6 DEGs [SOST, joining chain of multimeric IgA and IgM (IGJ; also known as JCHAIN), defensin alpha 4 (DEFA4), SFRP4, proteinase 3 (PRTN3) and cathepsin G (CTSG)], with the associated P-values of 0.045, 0.000, 0.008, 0.010, 0.015 and 0.002, respectively, as calculated by the independent sample t
Alva, Vikram; Nam, Seung-Zin; Söding, Johannes; Lupas, Andrei N.
The MPI Bioinformatics Toolkit (http://toolkit.tuebingen.mpg.de) is an open, interactive web service for comprehensive and collaborative protein bioinformatic analysis. It offers a wide array of interconnected, state-of-the-art bioinformatics tools to experts and non-experts alike, developed both externally (e.g. BLAST+, HMMER3, MUSCLE) and internally (e.g. HHpred, HHblits, PCOILS). While a beta version of the Toolkit was released 10 years ago, the current production-level release has been available since 2008 and has serviced more than 1.6 million external user queries. The usage of the Toolkit has continued to increase linearly over the years, reaching more than 400 000 queries in 2015. In fact, through the breadth of its tools and their tight interconnection, the Toolkit has become an excellent platform for experimental scientists as well as a useful resource for teaching bioinformatic inquiry to students in the life sciences. In this article, we report on the evolution of the Toolkit over the last ten years, focusing on the expansion of the tool repertoire (e.g. CS-BLAST, HHblits) and on infrastructural work needed to remain operative in a changing web environment. PMID:27131380
Zhong, Lijun; Zhou, Juntuo; Wang, Dawei; Zou, Xiajuan; Lou, Yaxin; Liu, Dan; Yang, Bin; Zhu, Yi; Li, Xiaoxia
The aim of this study was to identify differentially expressed proteins in the presence and absence of the EPHX2 gene in mouse hypothalamus using proteomics profiling and bioinformatics analysis. This study was performed on 3 wild type (WT) and 3 EPHX2 gene global knockout (KO) mice (EPHX2 -/-). Using the nano-electrospray ionization (ESI)-LC-MS/MS detector, we identified 31 over-expressed proteins in WT mouse hypothalamus compared to the KO counterparts. Gene Ontology (GO) annotation in terms of the protein-protein interaction network indicated that cellular metabolic process, protein metabolic process, signaling transduction and protein post-translation biological processes were involved in the EPHX2 -/- regulatory network. In addition, signaling pathway enrichment analysis highlighted that chronic neurodegenerative diseases and some other signaling pathways, such as the TGF-beta signaling pathway, T cell receptor signaling pathway, ErbB signaling pathway, Neurotrophin signaling pathway and MAPK signaling pathway, were strongly coupled with EPHX2 gene knockout. Further studies into the molecular functions of the EPHX2 gene in the hypothalamus will help to provide a new perspective on neurogenesis. PMID:26722453
Xiang, Fang; Ningqiu, Li; Xiaozhe, Fu; Kaibin, Li; Qiang, Lin; Lihui, Liu; Cunbin, Shi; Shuqin, Wu
As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the requirement of high-performance computers rather than common personal computers for constructing a bioinformatics platform has significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogens based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules: genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analysis on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via Blast searches, GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on outer membrane protein A of Aeromonas hydrophila, and the changes in system temperature, total energy, root mean square deviation and conformation of the loops during equilibration were also observed. These results showed that the bioinformatic analysis platform for aquatic pathogens has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platforms for other subjects.
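The root mean square deviation monitored during equilibration can be written down in a few lines. The Python sketch below is only an illustration of the quantity, not the platform's code: it assumes two conformations given as equally long lists of (x, y, z) coordinates that have already been superimposed, and the function name and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two conformations.

    Both arguments are lists of (x, y, z) tuples for the same atoms in the
    same order. No structural superposition is performed here; the inputs
    are assumed to be aligned already.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("conformations must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates, invented for illustration: one of two atoms moves by 2 units.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
value = rmsd(reference, frame)
```

In practice the value would be computed for every frame of a trajectory against a reference structure, which is what gives the RMSD curve followed during equilibration.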
Hu, Qi; Wu, Xueling; Jiang, Ying; Liu, Yuandong; Liang, Yili; Liu, Xueduan; Yin, Huaqun; Baba, Ngom
Copper resistance of acidophilic bacteria is very significant in the bioleaching of copper ore, since high concentrations of copper are harmful to the growth of organisms. The copper resistance gene afe_1073 was putatively considered to be involved in copper homeostasis in Acidithiobacillus ferrooxidans ATCC23270. In the present study, differential expression of afe_1073 in A. ferrooxidans strains DY26 and DC was assessed with quantitative reverse transcription polymerase chain reaction. The results showed that the expression of afe_1073 in the two strains increased with increasing copper concentration. The expression in DY26 was lower than that in DC at the same copper concentration, although A. ferrooxidans strain DY26 possessed higher copper resistance than strain DC. In addition, bioinformatics analysis showed that AFE_1073 was a typical transmembrane protein P1b1-ATPase, which could reduce the harm of Cu(+) by pumping it out of the cell. There were two mutation sites in AFE_1073 between DY26 and DC, and one may change the hydrophobicity of AFE_1073, which could enhance the ability of DY26 to pump out Cu(+). Therefore, DY26 needed less expression of afe_1073 than DC to resist copper toxicity at the same copper stress. Our study will be beneficial to understanding the copper resistance mechanism of A. ferrooxidans.
Aim of the study To analyse the expression profile of hepatocellular carcinoma compared with normal liver by using bioinformatics methods. Material and methods In this study, we analysed the microarray expression data of HCC and adjacent normal liver samples from the Gene Expression Omnibus (GEO) database to screen for differentially expressed genes. Then, functional analyses were performed using GenCLiP analysis, Gene Ontology categories, and aberrant pathway identification. In addition, we used the CMap database to identify small molecules that can induce HCC. Results Overall, 2721 differentially expressed genes (DEGs) were identified. We found 180 metastasis-related genes and constructed co-occurrence networks. Several significant pathways, including the transforming growth factor β (TGF-β) signalling pathway, were identified as closely related to these DEGs. Some candidate small molecules (such as betahistine) were identified that might provide a basis for developing HCC treatments in the future. Conclusions Although we functionally analysed the differences in the gene expression profiles of HCC and normal liver tissues, our study is essentially preliminary, and it may be premature to apply our results to clinical trials. Further research and experimental testing are required in future studies. PMID:27095935
Faksri, Kiatichai; Tan, Jun Hao; Chaiprasert, Angkana; Teo, Yik-Ying; Ong, Rick Twee-Hee
Tuberculosis (TB) is an infectious disease of global public health importance caused by Mycobacterium tuberculosis complex (MTC) in which M. tuberculosis (Mtb) is the major causative agent. Recent advancements in genomic technologies such as next generation sequencing have enabled high throughput cost-effective generation of whole genome sequence information from Mtb clinical isolates, providing new insights into the evolution, genomic diversity and transmission of the Mtb bacteria, including molecular mechanisms of antibiotic resistance. The large volume of sequencing data generated however necessitated effective and efficient management, storage, analysis and visualization of the data and results through development of novel and customized bioinformatics software tools and databases. In this review, we aim to provide a comprehensive survey of the current freely available bioinformatics software tools and publicly accessible databases for genomic analysis of Mtb for identifying disease transmission in molecular epidemiology and in rapid determination of the antibiotic profiles of clinical isolates for prompt and optimal patient treatment.
Background Advances in sequencing efficiency have vastly increased the sizes of biological sequence databases, including many thousands of genome-sequenced species. The BLAST algorithm remains the main search engine for retrieving sequence information, and must consequently handle data on an unprecedented scale. This has been possible due to high-performance computers and parallel processing. However, the raw BLAST output from contemporary searches involving thousands of queries becomes ill-suited for direct human processing. Few programs attempt to directly visualize and interpret BLAST output; those that do often provide a mere basic structuring of BLAST data. Results Here we present a bioinformatics application named BLASTGrabber suitable for high-throughput sequencing analysis. BLASTGrabber, being implemented as a Java application, is OS-independent and includes a user friendly graphical user interface. Text or XML-formatted BLAST output files can be directly imported, displayed and categorized based on BLAST statistics. Query names and FASTA headers can be analysed by text-mining. In addition to visualizing sequence alignments, BLAST data can be ordered as an interactive taxonomy tree. All modes of analysis support selection, export and storage of data. A Java interface-based plugin structure facilitates the addition of customized third party functionality. Conclusion The BLASTGrabber application introduces new ways of visualizing and analysing massive BLAST output data by integrating taxonomy identification, text mining capabilities and generic multi-dimensional rendering of BLAST hits. The program aims at a non-expert audience in terms of computer skills; the combination of new functionalities makes the program flexible and useful for a broad range of operations. PMID:24885091
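Categorizing hits by BLAST statistics, as described in this abstract, can be sketched in Python. This is not BLASTGrabber's code, and it makes an assumption the abstract does not: the input is tab-separated BLAST+ output with the 12 default tabular columns, rather than the text or XML formats the program imports. The function names and the E-value cutoff are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (assumed input format).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_hits(handle):
    """Parse tab-separated BLAST hits into a list of dicts."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def best_hit_per_query(hits, max_evalue=1e-5):
    """Keep, for each query, the highest-bitscore hit passing the cutoff."""
    best = {}
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Toy hits, invented for illustration only.
example = io.StringIO(
    "q1\tsA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t370\n"
    "q1\tsB\t80.0\t150\t30\t0\t1\t150\t9\t158\t1e-20\t120\n"
    "q2\tsC\t60.0\t50\t20\t0\t1\t50\t1\t50\t0.5\t30\n"
)
hits = read_hits(example)
best = best_hit_per_query(hits)
```

Queries whose hits all fail the cutoff simply drop out of `best`, which is one simple way of reducing thousands of raw hits to a table a person can read.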
Jothi, G. Edward Gnana; Majilla, G. Sahaya Jose; Subhashini, D.; Deivasigamani, B.
In spite of the medical advances in recent years, the world is in need of different sources to counter certain health issues. Ribosome-inactivating proteins (RIPs) were found to be one among them. To provide easy access to information about RIPs, there is a need to analyse RIPs towards constructing a database on RIPs. Also, multiple sequence alignment was done to screen for homologues of significant RIPs from rare sources against RIPs from easily available sources in terms of similarity. Protein sequences were retrieved from SWISS-PROT and further analysed using pairwise and multiple sequence alignment. Analysis shows that 151 RIPs have been characterized to date. Amongst them, there are 87 type I, 37 type II, 1 type III and 25 unknown RIPs. Information on sequence length, indicating whether the full or a partial sequence is available for the various RIPs, was also compiled. The multiple sequence alignment of 37 type I RIPs using the online server Multalin indicates the presence of 20 conserved residues. Pairwise alignment and multiple sequence alignment of certain selected RIPs in two groups, namely Group I and Group II, were carried out and the consensus level was found to be 98%, 98% and 90% respectively.
Yan, Bingbing; Yin, Fuqiang; Wang, Qi; Zhang, Wei; Li, Li
The main obstacle to the successful treatment of ovarian cancer is the development of drug resistance to combined chemotherapy. Among all the factors associated with drug resistance, DNA methylation apparently plays a critical role. In this study, we performed an integrative analysis of the 26 DNA-methylated genes associated with drug resistance in ovarian cancer, and the genes were further evaluated by comprehensive bioinformatics analysis including gene/protein interaction, biological process enrichment and annotation. The results from the protein interaction analyses revealed that at least 20 of these 26 methylated genes are present in the protein interaction network, indicating that they interact with each other, have a correlation in function, and may participate as a whole in the regulation of ovarian cancer drug resistance. There is a direct interaction between the phosphatase and tensin homolog (PTEN) gene and at least half of the other genes, indicating that PTEN may possess core regulatory functions among these genes. Biological process enrichment and annotation demonstrated that most of these methylated genes were significantly associated with apoptosis, which is possibly an essential way for these genes to be involved in the regulation of multidrug resistance in ovarian cancer. In addition, a comprehensive analysis of clinical factors revealed that the methylation level of genes that are associated with the regulation of drug resistance in ovarian cancer was significantly correlated with the prognosis of ovarian cancer. Overall, this study preliminarily explains the potential correlation between the genes with DNA methylation and drug resistance in ovarian cancer. This finding has significance for our understanding of the regulation of resistant ovarian cancer by methylated genes, the treatment of ovarian cancer, and improvement of the prognosis of ovarian cancer. PMID:27347118
Kang, Jonghoon; Park, Seyeon; Venkat, Aarya; Gopinath, Adarsh
New interdisciplinary biological sciences like bioinformatics, biophysics, and systems biology have become increasingly relevant in modern science. Many papers have suggested the importance of adding these subjects, particularly bioinformatics, to an undergraduate curriculum; however, most of their assertions have relied on qualitative arguments. In this paper, we will show our metadata analysis of a scientific literature database (PubMed) that quantitatively describes the importance of the subjects of bioinformatics, systems biology, and biophysics as compared with a well-established interdisciplinary subject, biochemistry. Specifically, we found that the development of each subject assessed by its publication volume was well described by a set of simple nonlinear equations, allowing us to characterize them quantitatively. Bioinformatics, which had the highest ratio of publications produced, was predicted to grow between 77% and 93% by 2025 according to the model. Due to the large number of publications produced in bioinformatics, which nearly matches the number published in biochemistry, it can be inferred that bioinformatics is almost equal in significance to biochemistry. Based on our analysis, we suggest that bioinformatics be added to the standard biology undergraduate curriculum. Adding this course to an undergraduate curriculum will better prepare students for future research in biology.
Kou, Yubin; Zhang, Suya; Chen, Xiaoping; Hu, Sanyuan
This study aimed to explore the underlying molecular mechanisms of colorectal cancer (CRC) using bioinformatics analysis. Using GSE4107 datasets downloaded from the Gene Expression Omnibus, the differentially expressed genes (DEGs) were screened by comparing the RNA expression from the colonic mucosa between 12 CRC patients and ten healthy controls using a paired t-test. The Gene Ontology (GO) functional and pathway enrichment analyses of DEGs were performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) software followed by the construction of a protein–protein interaction (PPI) network. In addition, hub gene identification and GO functional and pathway enrichment analyses of the modules were performed. A total of 612 up- and 639 downregulated genes were identified. The upregulated DEGs were mainly involved in the regulation of cell growth, migration, and the MAPK signaling pathway. The downregulated DEGs were significantly associated with oxidative phosphorylation, Alzheimer’s disease, and Parkinson’s disease. Moreover, FOS, FN1, PPP1CC, and CYP2B6 were selected as hub genes in the PPI networks. Two modules (up-A and up-B) in the upregulated PPI network and three modules (d-A, d-B, and d-C) in the downregulated PPI network were identified with the threshold of Molecular Complex Detection (MCODE) score ≥4 and nodes ≥6. The genes in module up-A were significantly enriched in neuroactive ligand–receptor interactions and the calcium signaling pathway. The genes in module d-A were enriched in four pathways, including oxidative phosphorylation and Parkinson’s disease. DEGs, such as FOS, FN1, PPP1CC, and CYP2B6, may be used as potential targets for CRC diagnosis and treatment. PMID:25914544
Tian, Fengde; An, Ning; Yang, Tiejun; Wang, Changcheng; Wang, Bo; Zhou, Zihao
Background Rheumatoid arthritis (RA) is a chronic auto-inflammatory disorder of joints. The present study aimed to identify the key genes in RA to better understand its underlying mechanisms. Methods The integrated analysis of expression profiling was conducted to identify differentially expressed genes (DEGs) in RA. Moreover, functional annotation, protein–protein interaction (PPI) network and transcription factor (TF) regulatory network construction were applied for exploring the potential biological roles of DEGs in RA. In addition, the expression level of identified candidate DEGs was preliminarily detected in peripheral blood cells of RA patients in the GSE17755 dataset. Quantitative real-time polymerase chain reaction (qRT-PCR) was conducted to validate the expression levels of identified DEGs in RA. Results A total of 378 DEGs, including 202 up- and 176 down-regulated genes, were identified in synovial tissues of RA patients compared with healthy controls. DEGs were significantly enriched in axon guidance, RNA transport and MAPK signaling pathway. RBFOX2, LCK and SERBP1 were the hub proteins in the PPI network. In the TF-target gene network, RBFOX2, POU6F1, WIPF1 and PFKFB3 had high connectivity with TFs. The expression status of 11 candidate DEGs was detected in GSE17755; the expression levels of MAT2A and NSA2 were significantly down-regulated and CD47 tended to be up-regulated in peripheral blood cells of patients with RA compared with healthy individuals. qRT-PCR results for MAT2A, NSA2 and CD47 were consistent with our bioinformatics analyses. Discussion Our study might provide valuable information for exploring the pathogenesis mechanism of RA and identifying the potential biomarkers for RA diagnosis. PMID:28316886
Santos, Eliane Macedo Sobrinho; Santos, Hércules Otacílio; dos Santos Dias, Ivoneth; Santos, Sérgio Henrique; Batista de Paula, Alfredo Maurício; Feltenberger, John David; Sena Guimarães, André Luiz; Farias, Lucyana Conceição
Pathogenesis of odontogenic tumors is not well known. It is important to identify genetic deregulations and molecular alterations. This study aimed to investigate, through bioinformatic analysis, the possible genes involved in the pathogenesis of ameloblastoma (AM) and keratocystic odontogenic tumor (KCOT). Genes involved in the pathogenesis of AM and KCOT were identified in GeneCards. The gene list was expanded, and the gene interaction network was mapped using the STRING software. “Weighted number of links” (WNL) was calculated to identify “leader genes” (highest WNL). Genes were ranked by the K-means method and the Kruskal-Wallis test was used (P<0.001). Total interactions score (TIS) was also calculated using all interaction data generated by the STRING database, in order to achieve global connectivity for each gene. The topological and ontological analyses were performed using Cytoscape software and the BiNGO plugin. Literature review data was used to corroborate the bioinformatics data. CDK1 was identified as the leader gene for AM. In the KCOT group, PCNA and TP53 were identified. Both tumors exhibit a power law behavior. Our topological analysis suggested leader genes possibly important in the pathogenesis of AM and KCOT, by clustering coefficient calculated for both odontogenic tumors (0.028 for AM, zero for KCOT). The results obtained in the scatter diagram suggest an important relationship of these genes with the molecular processes involved in AM and KCOT. Ontological analysis for both AM and KCOT demonstrated different mechanisms. Bioinformatics analyses were confirmed through literature review. These results may suggest the involvement of promising genes for a better understanding of the pathogenesis of AM and KCOT. PMID:28357197
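The "weighted number of links" idea can be sketched in Python. The abstract does not give the exact formula, so this sketch assumes that a gene's WNL is the sum of the interaction scores of all its links, and it simply takes the gene with the highest WNL as leader instead of reproducing the K-means ranking. The gene labels and integer scores are placeholders, not data from the study.

```python
from collections import defaultdict


def weighted_number_of_links(edges):
    """Sum the interaction scores of every link touching each gene.

    edges: iterable of (gene_a, gene_b, score) tuples, for example taken
    from an interaction network export. Integer scores are assumed here.
    """
    wnl = defaultdict(int)
    for gene_a, gene_b, score in edges:
        wnl[gene_a] += score
        wnl[gene_b] += score
    return dict(wnl)


def leader_genes(wnl):
    """Return the gene or genes with the highest WNL (a simplification)."""
    top = max(wnl.values())
    return sorted(gene for gene, value in wnl.items() if value == top)


# Placeholder network, invented for illustration only.
edges = [
    ("geneA", "geneB", 900),
    ("geneA", "geneC", 800),
    ("geneB", "geneC", 400),
    ("geneC", "geneD", 300),
]
wnl = weighted_number_of_links(edges)
leaders = leader_genes(wnl)
```

A clustering step over the WNL values, as the abstract describes, would replace the simple maximum used here when several genes have similarly high scores.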
Long, Hao; Liang, Chaofeng; Zhang, Xi'an; Fang, Luxiong; Wang, Gang; Qi, Songtao
Understanding the mechanisms of glioblastoma at the molecular and structural level is not only interesting for basic science but also valuable for biotechnological application, such as the clinical treatment. In the present study, bioinformatics analysis was performed to reveal and identify the key genes of glioblastoma multiforme (GBM). The results obtained in the present study signified the importance of some genes, such as COL3A1, FN1, and MMP9, for glioblastoma. Based on the selected genes, a prediction model was built, which achieved 94.4% prediction accuracy. These findings might provide more insights into the genetic basis of glioblastoma. PMID:28191466
Shachak, Aviv; Ophir, Ron; Rubin, Eitan
The need to support bioinformatics training has been widely recognized by scientists, industry, and government institutions. However, the discussion of instructional methods for teaching bioinformatics is only beginning. Here we report on a systematic attempt to design two bioinformatics workshops for graduate biology students on the basis of…
Jiménez-Barrón, Laura T.; O'Rawe, Jason A.; Wu, Yiyang; Yoon, Margaret; Fang, Han; Iossifov, Ivan; Lyon, Gholson J.
Autism spectrum disorders (ASDs) are a group of developmental disabilities that affect social interaction and communication and are characterized by repetitive behaviors. There is now a large body of evidence that suggests a complex role of genetics in ASDs, in which many different loci are involved. Although many current population-scale genomic studies have been demonstrably fruitful, these studies generally focus on analyzing a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis of genome-wide perturbations that may contribute to the development and severity of ASD-related phenotypes. To overcome these limitations, we have developed and utilized an integrative clinical and bioinformatics pipeline for generating a more complete and reliable set of genomic variants for downstream analyses. Our study focuses on the analysis of three simplex autism families consisting of one affected child, unaffected parents, and one unaffected sibling. All members were clinically evaluated and widely phenotyped. Genotyping arrays and whole-genome sequencing were performed on each member, and the resulting sequencing data were analyzed using a variety of available bioinformatics tools. We searched for rare variants of putative functional impact that were found to be segregating according to de novo, autosomal recessive, X-linked, mitochondrial, and compound heterozygote transmission models. The resulting candidate variants included three small heterozygous copy-number variations (CNVs), a rare heterozygous de novo nonsense mutation in MYBBP1A located within exon 1, and a novel de novo missense variant in LAMB3. Our work demonstrates how more comprehensive analyses that include rich clinical data and whole-genome sequencing data can generate reliable results for use in downstream investigations. PMID:27148569
Thorvaldsdottir, Helga; Liefeld, Ted; Ocana, Marco; Borges-Rivera, Diego; Pochet, Nathalie; Robinson, James T.; Demchak, Barry; Hull, Tim; Ben-Artzi, Gil; Blankenberg, Daniel; Barber, Galt P.; Lee, Brian T.; Kuhn, Robert M.; Nekrutenko, Anton; Segal, Eran; Ideker, Trey; Reich, Michael; Regev, Aviv; Chang, Howard Y.; Mesirov, Jill P.
Integrative analysis of multiple data types to address complex biomedical questions requires the use of multiple software tools in concert and remains an enormous challenge for most of the biomedical research community. Here we introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource. Seeded as a collaboration of six of the most popular genomics analysis tools, GenomeSpace now supports the streamlined interaction of 20 bioinformatics tools and data resources. To facilitate the ability of non-programming users to leverage GenomeSpace in integrative analysis, it offers a growing set of ‘recipes’, short workflows involving a few tools and steps to guide investigators through high utility analysis tasks. PMID:26780094
Ferret, Yann; Caillault, Aurélie; Sebda, Shéhérazade; Duez, Marc; Grardel, Nathalie; Duployez, Nicolas; Villenet, Céline; Figeac, Martin; Preudhomme, Claude; Salson, Mikaël; Giraud, Mathieu
High-throughput sequencing (HTS) is considered a technical revolution that has improved our knowledge of lymphoid and autoimmune diseases, changing our approach to leukaemia both at diagnosis and during follow-up. As part of an immunoglobulin/T cell receptor-based minimal residual disease (MRD) assessment of acute lymphoblastic leukaemia patients, we assessed the performance and feasibility of the replacement of the first steps of the approach based on DNA isolation and Sanger sequencing, using a HTS protocol combined with bioinformatics analysis and visualization using the Vidjil software. We prospectively analysed the diagnostic and relapse samples of 34 paediatric patients, thus identifying 125 leukaemic clones with recombinations on multiple loci (TRG, TRD, IGH and IGK), including Dd2/Dd3 and Intron/KDE rearrangements. Sequencing failures were halved (14% vs. 34%, P = 0.0007), enabling more patients to be monitored. Furthermore, more markers per patient could be monitored, reducing the probability of false negative MRD results. The whole analysis, from sample receipt to clinical validation, was shorter than our current diagnostic protocol, with equal resources. V(D)J recombination was successfully assigned by the software, even for unusual recombinations. This study emphasizes the progress that HTS with adapted bioinformatics tools can bring to the diagnosis of leukaemia patients.
Lü, Bing-Jian; Cui, Jing; Xu, Jing; Zhang, Hao; Luo, Min-Jie; Zhu, Yi-Min; Lai, Mao-De
We established a colonic adenoma-normal mucosa suppressive subtraction hybridization (SSH) library in 1999. In this study, we wanted to explore the expression profile of all candidate genes in this library. We developed an EST pipeline which contained two in-house software packages, nucleic acid analytical software and GetUni. The nucleic acid analytical software, an integrator of the universal bioinformatics tools including phred, phd2fasta, cross_match, repeatmasker and blast2.0, can blast sequences of differential clones with the downloaded non-redundant nucleotide (NR) database. GetUni can cluster these NR sequences into Unigene via matching with the downloaded Homo sapiens UniGene database. Sixty-two candidate genes in the A-N library were obtained via the high throughput automatic gene expression bioinformatics pipeline. Gene Ontology online analysis revealed that ribosome genes and immunity-regulating genes were the two most common categories in the KEGG or Biocarta Pathway. We also detected the expression of 2 genes with highest hits, Reg4 and FAM46A, by semi-quantitative RT-PCR. Both genes were up-regulated in 10 or 9 out of 10 adenomas in comparison with the paired normal mucosa, respectively. The candidate genes in the A-N library would be of great significance in disclosing the molecular mechanism underlying colonic adenoma initiation and progression.
Depiereux, Sophie; De Meulder, Bertrand; Bareke, Eric; Berger, Fabrice; Le Gac, Florence; Depiereux, Eric; Kestemont, Patrick
Sex steroids play a key role in triggering sex differentiation in fish, the use of exogenous hormone treatment leading to partial or complete sex reversal. This phenomenon has attracted attention since the discovery that even low environmental doses of exogenous steroids can adversely affect gonad morphology (ovotestis development) and induce reproductive failure. Modern genomic-based technologies have enhanced opportunities to find out mechanisms of actions (MOA) and identify biomarkers related to the toxic action of a compound. However, high throughput data interpretation relies on statistical analysis, species genomic resources, and bioinformatics tools. The goals of this study are to improve the knowledge of feminisation in fish, by the analysis of molecular responses in the gonads of rainbow trout fry after chronic exposure to several doses (0.01, 0.1, 1 and 10 μg/L) of ethynylestradiol (EE2) and to offer target genes as potential biomarkers of ovotestis development. We successfully adapted a bioinformatics microarray analysis workflow elaborated on human data to a toxicogenomic study using rainbow trout, a fish species lacking accurate functional annotation and genomic resources. The workflow allowed to obtain lists of genes supposed to be enriched in true positive differentially expressed genes (DEGs), which were subjected to over-representation analysis methods (ORA). Several pathways and ontologies, mostly related to cell division and metabolism, sexual reproduction and steroid production, were found significantly enriched in our analyses. Moreover, two sets of potential ovotestis biomarkers were selected using several criteria. The first group displayed specific potential biomarkers belonging to pathways/ontologies highlighted in the experiment. Among them, the early ovarian differentiation gene foxl2a was overexpressed. The second group, which was highly sensitive but not specific, included the DEGs presenting the highest fold change and lowest p
Ju, Feng; Zhang, Tong
Recent advances in DNA sequencing technologies have prompted the widespread application of metagenomics for the investigation of novel bioresources (e.g., industrial enzymes and bioactive molecules) and unknown biohazards (e.g., pathogens and antibiotic resistance genes) in natural and engineered microbial systems across multiple disciplines. This review discusses the rigorous experimental design and sample preparation in the context of applying metagenomics in environmental sciences and biotechnology. Moreover, this review summarizes the principles, methodologies, and state-of-the-art bioinformatics procedures, tools and database resources for metagenomics applications and discusses two popular strategies (analysis of unassembled reads versus assembled contigs/draft genomes) for quantitative or qualitative insights of microbial community structure and functions. Overall, this review aims to facilitate more extensive application of metagenomics in the investigation of uncultured microorganisms, novel enzymes, microbe-environment interactions, and biohazards in biotechnological applications where microbial communities are engineered for bioenergy production, wastewater treatment, and bioremediation.
Liu, Hong-Bo; Yang, Guang-Fei; Liang, Si-Jia; Lin, Jun
This study bioinformatically analyzed the non-VP1 capsid proteins (VP2-VP4) of Coxsackievirus A6 (CVA6), with an attempt to predict their basic physicochemical properties, structural/functional features and linear B cell epitopes. The online tools SubLoc, TargetP and the others from ExPASy Bioinformatics Resource Portal, and SWISS-MODEL (an online protein structure modeling server), were utilized to analyze the amino acid (AA) sequences of VP2-VP4 proteins of CVA6. Our results showed that the VP proteins of CVA6 were all of hydrophilic nature, contained phosphorylation and glycosylation sites and harbored no signal peptide sequences and acetylation sites. Except for VP3, the other proteins did not have transmembrane helix structure and nuclear localization signal sequences. Random coils were the major conformation of the secondary structure of the capsid proteins. Analysis of the linear B cell epitopes by employing Bepipred showed that the average antigenic indices (AI) of individual VP proteins were all greater than 0 and the average AI of VP4 was substantially higher than that of VP2 and VP3. The VP proteins all contained a number of potential B cell epitopes and some epitopes were located at the internal side of the viral capsid or were buried. We successfully predicted the fundamental physicochemical properties, structural/functional features and the linear B cell epitopes and found that different VP proteins share some common features and each has its unique attributes. These findings will help us understand the pathogenicity of CVA6 and develop related vaccines and immunodiagnostic reagents.
Min, Seonwoo; Lee, Byunghan; Yoon, Sungroh
In the era of big data, transformation of biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields. Accordingly, application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Here, we review deep learning in bioinformatics, presenting examples of current research. To provide a useful and comprehensive perspective, we categorize research both by the bioinformatics domain (i.e. omics, biomedical imaging, biomedical signal processing) and deep learning architecture (i.e. deep neural networks, convolutional neural networks, recurrent neural networks, emergent architectures) and present brief descriptions of each study. Additionally, we discuss theoretical and practical issues of deep learning in bioinformatics and suggest future research directions. We believe that this review will provide valuable insights and serve as a starting point for researchers to apply deep learning approaches in their bioinformatics studies.
Kong, Fan-Yun; Zhu, Ting; Li, Nan; Cai, Yun-Fei; Zhou, Kai; Wei, Xiao; Kou, Yan-Bo; You, Hong-Juan; Zheng, Kui-Yang; Tang, Ren-Xian
LIM and SH3 domain protein (LASP-1) is responsible for the development of several types of human cancers via the interaction with other proteins; however, the precise biological functions of proteins interacting with LASP-1 are not fully clarified. Although the role of LASP-1 in hepatocarcinogenesis has been reported, the implication of LASP-1 interactors in HBV-related hepatocellular carcinoma (HCC) is not clearly evaluated. We obtained information regarding LASP-1 interactors from public databases and published studies. Via bioinformatics analysis, we found that LASP-1 interactors were related to distinct molecular functions and associated with various biological processes. Through an integrated network analysis of the interaction and pathways of LASP-1 interactors, cross-talk between different proteins and associated pathways was found. In addition, LASP-1 and several of its interactors are significantly altered in HBV-related HCC through microarray analysis and could form a complex co-expression network. In the disease, LASP-1 and its interactors were further predicted to be regulated by a complex interaction network composed of different transcription factors. Besides, numerous LASP-1 interactors were associated with various clinical factors and related to the survival and recurrence of HBV-related HCC. Taken together, these results could help enrich our understanding of LASP-1 interactors and their relationships with HBV-related HCC. PMID:28266596
Osunkoya, Adeboye O; Yin-Goen, Qiqin; Phan, John H; Moffitt, Richard A; Stokes, Todd H; Wang, May D; Young, Andrew N
Summary The differential diagnosis of clear cell, papillary and chromophobe renal cell carcinoma is clinically important, because these tumor subtypes are associated with different pathobiology and clinical behavior. For cases in which histopathology is equivocal, immunohistochemistry and quantitative RT-PCR can assist in the differential diagnosis by measuring expression of subtype-specific biomarkers. Several renal tumor biomarkers have been discovered in expression microarray studies. However, due to heterogeneity of gene and protein expression, additional biomarkers are needed for reliable diagnostic classification. We developed novel bioinformatics systems to identify candidate renal tumor biomarkers from the microarray profiles of 45 clear cell, 16 papillary and 10 chromophobe renal cell carcinoma; the microarray data was derived from two independent published studies. The ArrayWiki biocomputing system merged the microarray datasets into a single file, so gene expression could be analyzed from a larger number of tumors. The caCORRECT system removed non-random sources of error from the microarray data, and the omniBioMarker system analyzed data with several gene-ranking algorithms, in order to identify algorithms effective at recognizing previously described renal tumor biomarkers. We predicted these algorithms would also be effective at identifying unknown biomarkers that could be verified by independent methods. We selected six novel candidate biomarkers from the omniBioMarker analysis, and verified their differential expression in formalin-fixed paraffin-embedded tissues by quantitative RT-PCR and immunohistochemistry. The candidate biomarkers were carbonic anhydrase IX, ceruloplasmin, schwannomin-interacting protein 1, E74-like factor 3, cytochrome c oxidase subunit 5a and acetyl-CoA acetyltransferase 1. Quantitative RT-PCR was performed on 17 clear cell, 13 papillary and 7 chromophobe renal cell carcinoma. Carbonic anhydrase IX and ceruloplasmin were
Cui, Yubao; Zhou, Ying; Ma, Guifang; Yang, Li; Wang, Yungang; Shi, Weihong
Crude extracts of house dust mites are used clinically for diagnosis and immunotherapy of allergic diseases, including bronchial asthma, perennial rhinitis, and atopic dermatitis. However, crude extracts are complexes with non-allergenic antigens and lack effective concentrations of important allergens, resulting in several side effects. Dermatophagoides farinae (Hughes; Acari: Pyroglyphidae) is one of the predominant sources of dust mite allergens, which has more than 30 groups of allergen. The cDNA coding for the group 5 allergen of D. farinae from China was cloned, sequenced and expressed. According to alignment using the VECTOR NTI 9.0 software, there were eight mismatched nucleotides in five cDNA clones resulting in seven incompatible amino acid residues, suggesting that the Der f 5 allergen might have sequence polymorphism. Bioinformatics analysis revealed that the matured Der f 5 allergen has a molecular mass of 13604.03 Da, a theoretical pI of 5.43 and is probably hydrophobic and cytoplasmic. Similarities in amino acid sequences between Der f 5 and allergens of other domestic mite species, viz. Der p 5, Blo t 5, Sui m 5, and Lep d 5, were 79, 48, 53, and 37%, respectively. Phylogenetic analysis indicated that Der f 5 and Der p 5 clustered together. Blo t 5 and Ale o 5 also clustered together, although Blomia tropicalis and Aleuroglyphus ovatus belong to different mite families, viz. Echimyopodidae and Acaridae, respectively.
Zhang, J; Liu, X; Li, X-J
The phages of Acinetobacter baumannii have drawn increasing attention because of the multi-drug resistance of A. baumannii. The aim of this study was to sequence Acinetobacter baumannii phage AB3 and conduct bioinformatic analysis to lay a foundation for genome remodeling and phage therapy. We isolated and sequenced A. baumannii phage AB3 and attempted to annotate and analyze its genome. The results showed that the genome is a double-stranded DNA with a total length of 31,185 base pairs (bp) and 97 open reading frames greater than 100 bp. The genome includes 28 predicted genes, of which 24 are homologous to phage AB1. The entire coding sequence is located on the negative strand, representing 90.8% of the total length. The G+C mol% was 39.18%, without areas of high G+C content over 200 bp in length. No GC island, tRNA gene, or repeated sequence was identified. Gene lengths were 120-3099 bp, with an average of 1011 bp. Six genes were found to be greater than 2000 bp in length. Genomic alignment and phylogenetic analysis of the RNA polymerase gene showed that similar to phage AB1, phage AB3 is a phiKMV-like virus in the T7 phage family.
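The G+C statistics reported in the abstract above (overall G+C mol% and the search for G+C-rich stretches of 200 bp) can be sketched as follows. This is only an illustration and not the authors' tooling: the function names, the 60% threshold and the step size are assumptions, and ambiguous bases such as N are simply ignored.

```python
def gc_percent(sequence):
    """Return the G+C percentage over unambiguous bases (A, C, G, T).

    Returns 0.0 when the sequence contains no unambiguous bases.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc_count = sum(1 for base in counted if base in "GC")
    return 100.0 * gc_count / len(counted)


def high_gc_windows(sequence, window=200, step=50, threshold=60.0):
    """Return start positions of windows whose G+C% meets the threshold.

    The threshold and step are assumed values for illustration; the
    abstract does not state which cut-off was used.
    """
    hits = []
    for start in range(0, len(sequence) - window + 1, step):
        if gc_percent(sequence[start:start + window]) >= threshold:
            hits.append(start)
    return hits


# Hypothetical toy sequence: 200 bp of AT repeats followed by 200 bp of GC repeats.
toy_sequence = "AT" * 100 + "GC" * 100
toy_hits = high_gc_windows(toy_sequence, window=200, step=100, threshold=60.0)
```

An empty result from `high_gc_windows` on a real genome would correspond to the abstract's statement that no high-G+C area of 200 bp was found, under whatever threshold is chosen.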
Moisá, Sonia J; Shike, Daniel W; Graugnard, Daniel E; Rodriguez-Zas, Sandra L; Everts, Robin E; Lewin, Harris A; Faulkner, Dan B; Berger, Larry L; Loor, Juan J
Transcriptome dynamics in the longissimus muscle (LM) of young Angus cattle were evaluated at 0, 60, 120, and 220 days from early-weaning. Bioinformatic analysis was performed using the dynamic impact approach (DIA) by means of Kyoto Encyclopedia of Genes and Genomes (KEGG) and Database for Annotation, Visualization and Integrated Discovery (DAVID) databases. Between 0 and 120 days (growing phase) most of the highly-impacted pathways (e.g., ascorbate and aldarate metabolism, drug metabolism, cytochrome P450 and Retinol metabolism) were inhibited. The phase between 120 and 220 days (finishing phase) was characterized by the most striking differences with 3,784 differentially expressed genes (DEGs). Analysis of those DEGs revealed that the most impacted KEGG canonical pathway was glycosylphosphatidylinositol (GPI)-anchor biosynthesis, which was inhibited. Furthermore, inhibition of calpastatin and activation of tyrosine aminotransferase ubiquitination at 220 days promotes proteasomal degradation, while the concurrent activation of ribosomal proteins promotes protein synthesis. Therefore, the balance of these processes likely results in a steady-state of protein turnover during the finishing phase. Results underscore the importance of transcriptome dynamics in LM during growth.
Liu, Zhongyang; Guo, Feifei; Wang, Yong; Li, Chun; Zhang, Xinlei; Li, Honglei; Diao, Lihong; Gu, Jiangyong; Wang, Wei; Li, Dong; He, Fuchu
Traditional Chinese Medicine (TCM), with a history of thousands of years of clinical practice, is gaining more and more attention and application worldwide. And TCM-based new drug development, especially for the treatment of complex diseases is promising. However, owing to the TCM’s diverse ingredients and their complex interaction with human body, it is still quite difficult to uncover its molecular mechanism, which greatly hinders the TCM modernization and internationalization. Here we developed the first online Bioinformatics Analysis Tool for Molecular mechANism of TCM (BATMAN-TCM). Its main functions include 1) TCM ingredients’ target prediction; 2) functional analyses of targets including biological pathway, Gene Ontology functional term and disease enrichment analyses; 3) the visualization of ingredient-target-pathway/disease association network and KEGG biological pathway with highlighted targets; 4) comparison analysis of multiple TCMs. Finally, we applied BATMAN-TCM to Qishen Yiqi dripping Pill (QSYQ) and combined with subsequent experimental validation to reveal the functions of renin-angiotensin system responsible for QSYQ’s cardioprotective effects for the first time. BATMAN-TCM will contribute to the understanding of the “multi-component, multi-target and multi-pathway” combinational therapeutic mechanism of TCM, and provide valuable clues for subsequent experimental validation, accelerating the elucidation of TCM’s molecular mechanism. BATMAN-TCM is available at http://bionet.ncpsb.org/batman-tcm. PMID:26879404
Abel, Ana; Sánchez, Sandra; Arenas, Jesús; Criado, María T; Ferreirós, Carlos M
Two-dimensional electrophoresis (isoelectric focusing/SDS-PAGE) and Western-blotting techniques were used to analyze and compare common and/or specific outer-membrane proteins and antigens from Neisseria meningitidis and Neisseria lactamica. Bioinformatic image analyses of proteome and immunoproteome maps indicated the presence of numerous proteins and several antigens shared by N. meningitidis and N. lactamica, although the inter-strain variation in the maps was of similar magnitude to the inter-species variation, and digital comparison of the maps did not reveal proteins found to be identical by MALDI-TOF fingerprinting analysis. PorA and RmpM, two relevant outer-membrane antigens, manifested as various spots at several different positions. While some of these were common to all the strains analyzed, others were exclusive to N. meningitidis and their electrophoretic mobilities were different than expected. One such spot, with a molecular mass of 19 kDa, may be the C-terminal fragment of RmpM (RmpM-Cter). The results demonstrate that computer-driven analysis based exclusively on spot positions in the proteome or immunoproteome maps is not a reliable approach to predict the identity of proteins or antigens; rather, other identification techniques are necessary to obtain accurate comparisons.
Wattam, Alice R.; Davis, James J.; Assaf, Rida; Boisvert, Sébastien; Brettin, Thomas; Bun, Christopher; Conrad, Neal; Dietrich, Emily M.; Disz, Terry; Gabbard, Joseph L.; Gerdes, Svetlana; Henry, Christopher S.; Kenyon, Ronald W.; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olsen, Gary J.; Murphy-Olson, Daniel E.; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Warren, Andrew; Xia, Fangfang; Yoo, Hyunseung; Stevens, Rick L.
The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by ‘virtual integration’ to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics. PMID:27899627
Spengler, Sylvia J.
There is a well-known story about the blind man examining the elephant: the part of the elephant examined determines his perception of the whole beast. Perhaps bioinformatics--the shotgun marriage between biology and mathematics, computer science, and engineering--is like an elephant that occupies a large chair in the scientific living room. Given the demand for and shortage of researchers with the computer skills to handle large volumes of biological data, where exactly does the bioinformatics elephant sit? There are probably many biologists who feel that a major product of this bioinformatics elephant is large piles of waste material. If you have tried to plow through Web sites and software packages in search of a specific tool for analyzing and collating large amounts of research data, you may well feel the same way. But there has been progress with major initiatives to develop more computing power, educate biologists about computers, increase funding, and set standards. For our purposes, bioinformatics is not simply a biologically inclined rehash of information theory (1) nor is it a hodgepodge of computer science techniques for building, updating, and accessing biological data. Rather bioinformatics incorporates both of these capabilities into a broad interdisciplinary science that involves both conceptual and practical tools for the understanding, generation, processing, and propagation of biological information. As such, bioinformatics is the sine qua non of 21st-century biology. Analyzing gene expression using cDNA microarrays immobilized on slides or other solid supports (gene chips) is set to revolutionize biology and medicine and, in so doing, generate vast quantities of data that have to be accurately interpreted (Fig. 1). As discussed at a meeting a few months ago (Microarray Algorithms and Statistical Analysis: Methods and Standards; Tahoe City, California; 9-12 November 1999), experiments with cDNA arrays must be subjected to quality control
Pastur-Romay, Lucas Antón; Cedrón, Francisco; Pazos, Alejandro; Porto-Pazos, Ana Belén
Over the past decade, Deep Artificial Neural Networks (DNNs) have become the state-of-the-art algorithms in Machine Learning (ML), speech recognition, computer vision, natural language processing and many other tasks. This was made possible by the advancement in Big Data, Deep Learning (DL) and drastically increased chip processing abilities, especially general-purpose graphical processing units (GPGPUs). All this has created a growing interest in making the most of the potential offered by DNNs in almost every field. An overview of the main architectures of DNNs, and their usefulness in Pharmacology and Bioinformatics are presented in this work. The featured applications are: drug design, virtual screening (VS), Quantitative Structure–Activity Relationship (QSAR) research, protein structure prediction and genomics (and other omics) data mining. The future need of neuromorphic hardware for DNNs is also discussed, and the two most advanced chips are reviewed: IBM TrueNorth and SpiNNaker. In addition, this review points out the importance of considering not only neurons, as DNNs and neuromorphic chips should also include glial cells, given the proven importance of astrocytes, a type of glial cell which contributes to information processing in the brain. The Deep Artificial Neuron–Astrocyte Networks (DANAN) could overcome the difficulties in architecture design, learning process and scalability of the current ML methods. PMID:27529225
Tang, Vivian W
Background Zonula occludens, also known as the tight junction, is a specialized cell-cell interaction characterized by membrane "kisses" between epithelial cells. A cytoplasmic plaque of ~100 nm corresponding to a meshwork of densely packed proteins underlies the tight junction membrane domain. Due to its enormous size and difficulties in obtaining a biochemically pure fraction, the molecular composition of the tight junction remains largely unknown. Results A novel biochemical purification protocol has been developed to isolate tight junction protein complexes from cultured human epithelial cells. After identification of proteins by mass spectroscopy and fingerprint analysis, candidate proteins are scored and assessed individually. A simple algorithm has been devised to incorporate transmembrane domains and protein modification sites for scoring membrane proteins. Using this new scoring system, a total of 912 proteins have been identified. These 912 hits are analyzed using a bioinformatics approach to bin the hits in 4 categories: configuration, molecular function, cellular function, and specialized process. Prominent clusters of proteins related to the cytoskeleton, cell adhesion, and vesicular traffic have been identified. Weaker clusters of proteins associated with cell growth, cell migration, translation, and transcription are also found. However, the strongest clusters belong to synaptic proteins and signaling molecules. Localization studies of key components of synaptic transmission have confirmed the presence of both presynaptic and postsynaptic proteins at the tight junction domain. To correlate proteomics data with structure, the tight junction has been examined using electron microscopy. This has revealed many novel structures including end-on cytoskeletal attachments, vesicles fusing/budding at the tight junction membrane domain, secreted substances encased between the tight junction kisses, endocytosis of tight junction double membranes, satellite
Kulichenko, Darya; Bogdanov, Yuri F.
Background Shugoshins (SGOs) are proteins that protect cohesins located at the centromeres of sister chromatids from premature cleavage during mitosis and meiosis in plants, fungi, and animals. Their function is to prevent premature sister-chromatid disjunction and segregation. The study focused on the structural differences among SGOs acting during mitosis and meiosis that cause differences in chromosome behavior in these two types of cell division in different organisms. Methods A bioinformatic analysis of protein domains, conserved amino acid motifs, and physicochemical properties of 32 proteins from 25 species of plants, fungi, and animals was performed. Results We identified a C-terminal amino acid motif that is highly evolutionarily conserved among the SGOs protecting centromere cohesion of sister chromatids in meiotic anaphase I, but not among mitotic SGOs. This meiotic motif is arginine-rich in vertebrates. SGOs differ among eukaryotic kingdoms in the sets and locations of amino acid motifs and in the number of α-helical regions in the protein molecule. Discussion These structural differences between meiotic and mitotic SGOs could be responsible for the prolonged resistance of SGOs to degradation during meiotic metaphase I and anaphase I. We suggest that the “arginine comb” in C-terminal meiotic motifs is capable of interacting through hydrogen bonds with guanine bases in the minor groove of the DNA helix, thus protecting SGOs from hydrolysis. Our findings support independent evolution of meiosis in different lineages of multicellular organisms. PMID:27917322
Li, Zhen-Hua; Tang, Zhen-Xing; Fang, Xiu-Juan; Zhang, Zhi-Liang; Shi, Lu-E
In this paper, the physical and chemical characteristics, biological structure and function of a non-specific nuclease from Yersinia enterocolitica subsp. palearctica (Y. NSN), identified by our group, were studied using multiple bioinformatics approaches. The results showed that Y. NSN had 283 amino acids, a molecular weight of 30,692.5 Da and a degree of hydrophilicity. Y. NSN had a signal peptide, no transmembrane domains and disulphide bonds. The signal peptide cleavage site in Y. NSN was between positions 23 and 24. Secondary structure prediction showed that Y. NSN was a predominantly coil-structured protein. The proportions of α-helix, β-sheet and random coil were 18.73%, 16.96% and 64.31%, respectively. The active sites were positions 124, 125, 127, 157, 165 and 169. The Mg2+ binding site was position 157. The substrate binding sites were positions 124, 125 and 169. Multiple sequence alignment and phylogenetic tree analysis indicated that Y. NSN shared high similarity with the nuclease from Y. enterocolitica subsp. enterocolitica 8081. The enzyme activity results showed that Y. NSN was a nuclease with good thermostability.
Wang, Zhou-yong; Jiang, Chao; Chen, Min; Chen, Ping; Yuan, Yuan; Lin, Shu-fang; Wu, Zhi-gang
A FatB unigene was obtained from the transcriptome dataset of Lonicera japonica Thunb. Full-length FatB cDNA was cloned from buds of Lonicera japonica Thunb., Lonicera japonica Thunb. var. chinensis (Wats.) Bak., Lonicera hypoglauca Miq. and Lonicera dasystyla Rehd. using RT-PCR, and named LJFatB, LJCFatB, LHFatB and LDFatB, respectively. Bioinformatic analysis showed that LJFatB, LJCFatB, LHFatB and LDFatB were closely related to Arabidopsis thaliana AtFatB. The nucleotide sequences and protein secondary structures of LJFatB, LJCFatB, LHFatB and LDFatB differed, but their proteins had conserved FatB substrate binding sites and catalytic activity sites. The transcript levels of LJFatB, LJCFatB, LHFatB and LDFatB in buds were not significantly different. Therefore, LJFatB, LJCFatB, LHFatB and LDFatB could have the same biological function as AtFatB.
Hu, Guiping; Liu, Jiaxing; Zhang, Yongming; Zheng, Pai; Wang, Lele; Zhao, Lin; Xu, Huadong; Chen, Zhangjian; Wang, Tiancheng; Jia, Guang
Hexavalent chromium [Cr(VI)] compounds are widely used in industry and agriculture and are also ubiquitous environmental contaminants that are recognized as carcinogens, mutagens and teratogens in humans and animals. To determine the toxic effects of Cr(VI), gene expression profiling can be useful for discovering the underlying mechanisms of toxicity and for identifying potential specific genetic markers of Cr(VI) exposure and effects. In the current study, gene expression profiling and bioinformatics analysis were performed in 16HBE cells treated with a chromium(VI) compound. The MTT assay was used to determine the optimal Cr(VI) treatment concentration and time. mRNA expression profiling was performed using Arraystar Microarray V3.0 at 10.00 μM Cr(VI). RT-qPCR was applied to verify selected significantly altered genes of interest in different treatment groups. Comprehensive analysis including biological processes, GO ontology network, pathway network analysis and gene-gene network analysis was conducted to identify the related biological processes, signal pathways and critical genes. It was found that Cr(VI) could reduce cell viability and alter the gene expression profile of human bronchial epithelial cells. A total of 2273 significantly differentially expressed genes formed a complex network, and the expression of some of them changed in a Cr(VI) concentration-dependent manner. In conclusion, the toxic effects of Cr(VI) may involve oxidative stress, inflammation, energy metabolism, protein synthesis, endocytosis, ion binding, DNA binding and metabolism, cell morphogenesis, cell cycle regulation, autophagy, apoptosis, cell death, and carcinogenesis through specific pathways. Meanwhile, some significantly differentially expressed genes can be used as potential biomarkers of Cr(VI) exposure.
When bioinformatics education is considered, several issues are addressed. At the undergraduate level, the main issue revolves around conveying information from two main and different fields: biology and computer science. At the graduate level, the main issue is bridging the gap between biology students and computer science students. However, there is an educational component that is rarely addressed within the context of bioinformatics education: the ethics component. Here, a different perspective is provided on bioinformatics education, and the current status of ethics is analyzed within the existing bioinformatics programs. Analysis of the existing undergraduate and graduate programs, in both Europe and the United States, reveals the minimal attention given to ethics within bioinformatics education. Given that bioinformaticians speedily and effectively shape the biomedical sciences and hence their implications for society, here redesigning of the bioinformatics curricula is suggested in order to integrate the necessary ethics education. Unique ethical problems awaiting bioinformaticians and bioinformatics ethics as a separate field of study are discussed. In addition, a template for an "Ethics in Bioinformatics" course is provided.
Suzuki, Tomonori; Miyazaki, Satoru
Genome sequences are among the most fundamental data in omics analyses. So far, basic bioinformatics tools have been developed to handle genome sequences. The first step of genome sequence analysis is to predict or assign "genes" on genome sequences. In the case of eukaryotes, we can identify genes by using full-length cDNA sequences with local alignment tools such as search, blast and fasta. However, it is difficult to capture mRNAs (transcripts) in prokaryotes. Therefore, computational gene prediction is the first choice for starting genome sequence analysis. In this review, we first cover methods for computational gene prediction. Once genes are predicted, the next step is to assign functions to the proteins or RNAs encoded by each gene. How we define the distance between gene sequences is then very important for further analysis, so we describe the basic mathematical concepts for gene comparison. We also introduce our novel concept for biological sequence comparison from the viewpoint of information theory. In the post-genome era, many researchers are interested not only in gene functions but also in gene regulation, the information for which is also encoded in genome sequences. Cis-regulatory elements, however, are too short for mathematical rules to be found. Therefore, computationally predicted cis-elements tend to include many false positives. To reduce the ratio of false positives, we need a reliable database of sets of cis-regulatory elements, called cis-regulatory modules, for a gene. We are therefore trying to develop the Cis-Regulatory Elements Module Reference Database. In the third section, we introduce the procedure for constructing the Cis-Regulatory Elements Module Reference Database and its user interfaces.
Das, Dibash K; Ali, Thahmina; Krampis, Konstantinos; Ogunwobi, Olorunseun O
Prostate cancer is the second most commonly diagnosed male cancer in the world. The molecular mechanisms underlying its development and progression are still unclear. Here we show an analysis of a prostate cancer RNA-sequencing dataset that was originally generated by Ren et al. from the prostate tumor and adjacent normal tissues of 14 patients. The data presented here were analyzed using our RNA-sequencing bioinformatics analysis pipeline implemented on the bioinformatics web platform, Galaxy. The relative expression levels of fibronectin (FN1) and the androgen receptor (AR) were calculated as fragments per kilobase of transcript per million mapped reads and are reported in FPKM units. A subanalysis is also shown for data from three patients, which includes the relative expression of FN1 and AR and their fold change. For interpretation and discussion, please refer to the article, "miR-1207-3p regulates the androgen receptor in prostate cancer via FNDC1/fibronectin" by Das et al.
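The FPKM unit and fold change mentioned in this abstract can be illustrated with a minimal Python sketch. It uses the standard FPKM definition; the function names, the pseudocount, and the numbers in the usage note are illustrative assumptions and are not taken from the authors' Galaxy pipeline.

```python
import math


def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by (length / 1e3) and (library size / 1e6) gives the 1e9 factor.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fold_change(tumor: float, normal: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of two expression values; the pseudocount (an assumption
    here) avoids division by zero for unexpressed genes."""
    return math.log2((tumor + pseudocount) / (normal + pseudocount))
```

For instance, under these assumptions a hypothetical 2,000 bp transcript with 500 fragments in a library of 25 million mapped fragments gives `fpkm(500, 2000, 25_000_000) == 10.0`.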
Hernández, Sergio; Gómez, Antonio; Cedano, Juan; Querol, Enrique
The advent of genomics should have facilitated the identification of microbial virulence factors, a key objective for vaccine design. When a bacterial pathogen infects the host, it expresses a set of genes, a number of which are virulence factors. Among the genes identified by techniques such as microarrays, in vivo expression technology, signature-tagged mutagenesis and differential fluorescence induction, there are many related to cellular stress, basal metabolism, etc., which cannot be directly involved in virulence, or at least cannot be considered useful candidates for deletion when designing a live attenuated vaccine. Among the genes disclosed by these methodologies there are a number of hypothetical or unknown proteins. As they can hide some true virulence factors, we have reannotated all of these hypothetical proteins from several respiratory pathogens by a careful and in-depth analysis of each one. Although some of the re-annotations match functions that can be related to microbial virulence, the identification of virulence factors remains difficult.
Sadok Menna-Barreto, Rubem Figueiredo; Belloze, Kele Teixeira; Perales, Jonas; Silva-Jr, Floriano Paes
Chagas disease is endemic in Latin America and is caused by the protozoan hemoflagellate parasite Trypanosoma cruzi. Nowadays, it has also been disseminated to non-endemic countries due to the ease of global mobility. The nitroheterocycle benznidazole is currently used to treat this neglected tropical disease, although this drug causes severe side effects and has limited efficacy during the chronic phase of the disease. Proteomics and bioinformatics have recently become powerful tools in the identification of new drug targets. In the last decade, proteomic profiles of different T. cruzi forms under distinct experimental conditions were assessed. These reports have pointed to many potential drug targets, with ergosterol biosynthesis-related proteins and redox system enzymes being the most promising candidates. Nevertheless, the majority of the compounds active against T. cruzi still have unclear mechanisms of action, and most proteomic efforts have studied epimastigotes (the non-clinically relevant insect form of the parasite). Additional analyses with the clinically relevant parasite forms should be performed to identify proteins that actually bind drugs active against T. cruzi. Nonetheless, due to the known technical hurdles in generating such experimental data, bioinformatic approaches that integrate currently available data to generate additional knowledge will also be useful. Here, we review T. cruzi proteomics and describe the main chemoproteomic methods and their application to the identification of trypanosomatid drug targets. Finally, we discuss the potential benefits of more extensively integrating all proteomic data with other molecular databases via bioinformatic analyses to develop novel, viable strategies for alternative treatments of Chagas disease.
Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke
Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics.
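The MapReduce pattern mentioned in this abstract can be sketched in plain Python for one common sequencing task, k-mer counting. This is a single-process illustration of the map, shuffle and reduce phases only; it does not use Hadoop, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_kmers(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (k-mer, 1) for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the values collected for one key."""
    return key, sum(values)


def count_kmers(reads: Iterable[str], k: int) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of reads."""
    mapped = (pair for read in reads for pair in map_kmers(read, k))
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())
```

In a real cluster framework the map and reduce functions would run on different nodes and the shuffle would be handled by the framework; here they are chained locally to show the data flow.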
Rashid, Mamunur; Robles-Espinoza, Carla Daniela; Rust, Alistair G.; Adams, David J.
Summary: We have developed Cake, a bioinformatics software pipeline that integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants with higher sensitivity and accuracy than any one algorithm alone. Cake can be run on a high-performance computer cluster or used as a stand-alone application. Availability: Cake is open-source and is available from http://cakesomatic.sourceforge.net/ Contact: firstname.lastname@example.org Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:23803469
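One simple way to combine the output of several variant callers is a voting rule that keeps a variant only when enough callers report it. The sketch below assumes such a rule purely for illustration; the abstract does not state how Cake merges its four callers, and the caller names, threshold and data layout here are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# A variant is represented as (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(call_sets: Dict[str, Iterable[Variant]], min_callers: int = 2) -> List[Variant]:
    """Return variants reported by at least `min_callers` distinct callers."""
    votes: Counter = Counter()
    for calls in call_sets.values():
        # set() ensures each caller contributes at most one vote per variant.
        votes.update(set(calls))
    return sorted(variant for variant, n in votes.items() if n >= min_callers)
```

Raising `min_callers` trades sensitivity for specificity: a threshold of 1 is the union of all call sets, and a threshold equal to the number of callers is their intersection.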
Yao, Qiuming; Xu, Dong
Protein phosphorylation is one of the most pervasive protein post-translational modification events in plant cells. It is involved in many plant biological processes, such as plant growth, organ development, and plant immunology, by regulating or switching signaling and metabolic pathways. High-throughput experimental methods like mass spectrometry can easily characterize hundreds to thousands of phosphorylation events in a single experiment. With the increasing volume of the data sets, Plant Protein Phosphorylation DataBase (P3DB, http://p3db.org ) provides a comprehensive, systematic, and interactive online platform to deposit, query, analyze, and visualize these phosphorylation events in many plant species. It stores the protein phosphorylation sites in the context of identified mass spectra, phosphopeptides, and phosphoproteins contributed from various plant proteome studies. In addition, P3DB associates these plant phosphorylation sites to protein physicochemical information in the protein charts and tertiary structures, while various protein annotations from hierarchical kinase phosphatase families, protein domains, and gene ontology are also added into the database. P3DB not only provides rich information, but also interconnects and provides visualization of the data in networks, in systems biology context. Currently, P3DB includes the KiC (Kinase Client) assay network, the protein-protein interaction network, the kinase-substrate network, the phosphatase-substrate network, and the protein domain co-occurrence network. All of these are available to query for and visualize existing phosphorylation events. Although P3DB only hosts experimentally identified phosphorylation data, it provides a plant phosphorylation prediction model for any unknown queries on the fly. P3DB is an entry point to the plant phosphorylation community to deposit and visualize any customized data sets within this systems biology framework. Nowadays, P3DB has become one of the major
During the last few decades, most microbiology laboratories have become familiar with analyzing Sanger sequence data for ITS barcoding. However, with the availability of next-generation sequencing platforms in many centers, it has become important for medical mycologists to know how to make sense of the massive sequence data generated by these new sequencing technologies. In many reference laboratories, the analysis of such data is not a major obstacle, since suitable IT infrastructure and well-trained bioinformatics scientists are readily available. However, in small research laboratories and clinical microbiology laboratories such resources are typically lacking. In this report, a simple and user-friendly bioinformatics workflow is suggested for fast and reproducible ITS barcoding of fungi.
Zhang, Hu; Yu, Zhuo; He, Jianchao; Hua, Baotong; Zhang, Guiming
In the present study, gene expression profiles of patients with dilated cardiomyopathy (DCM) were re-analyzed with bioinformatics tools to investigate the molecular mechanisms underlying DCM. Gene expression dataset GSE3585 was downloaded from Gene Expression Omnibus, which included seven heart biopsy samples obtained from patients with DCM and five healthy controls. Differential analysis was performed using the limma package in R to screen for differentially expressed genes (DEGs). Functional enrichment analysis was subsequently conducted for DEGs using the Database for Annotation, Visualization and Integrated Discovery. A protein-protein interaction (PPI) network was constructed using information from the Search Tool for the Retrieval of Interacting Genes/Proteins software. A total of 89 DEGs were identified in the patients with DCM, including 67 upregulated and 22 downregulated genes. Functional enrichment analysis demonstrated that the downregulated genes predominantly encoded chromosomal proteins and transport-related proteins, which were significantly associated with the biological processes of ‘nucleosome assembly’, ‘chromatin assembly’, ‘protein-DNA complex assembly’, ‘nucleosome organization’ and ‘DNA packaging’ (H1 histone family member 0, histone cluster 1 H1c, histone cluster 1 H2bd and H2A histone family member Z). The upregulated genes detected in the present study encoded secreted proteins or phosphotransferase, which were associated with biological processes including ‘cell adhesion’ [connective tissue growth factor (CTGF)], ‘skeletal system development’ [CTGF and insulin-like growth factor binding protein 3 (IGFBP3)], ‘muscle organ development’ (SMAD7) and ‘regulation of cell migration’ [SMAD7, IGFBP3 and insulin receptor (INSR)]. Notably, signal transducer and activator of transcription 3, SMAD7, INSR, CTGF, exportin 1, IGFBP3 and phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha were hub nodes with the
Wang, Pengfei; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Guo, Xiangjiao; Yang, Haiyan; Xi, Yuanlin
This study aimed to explore the features of clustered regularly interspaced short palindromic repeats (CRISPR) structures in Shigella by using bioinformatics. We used bioinformatics methods, including BLAST, alignment and RNA structure prediction, to analyze the CRISPR structures of Shigella genomes. The results showed that CRISPRs existed in the four groups of Shigella, and the flanking sequences upstream of the CRISPRs could be classified into the same groups as those downstream. We also found some relatively conserved palindromic motifs in the leader sequences. Repeat sequences fell into the same groups as their corresponding flanking sequences, and could be classified into two different types by their RNA secondary structures, which contain a "stem" and a "ring". Some spacers were found to be homologous to partial sequences of plasmids or phages. The study indicated that there were correlations between repeat sequences and flanking sequences, and that the repeats might act as a kind of recognition mechanism to mediate the interaction between foreign genetic elements and Cas proteins.
Pawełkowicz, Magdalena E.; Skarzyńska, Agnieszka; Posyniak, Kacper; Ziąbska, Karolina; Pląder, Wojciech; Przybecki, Zbigniew
An important computational challenge is finding the regulatory elements across the promoter region. In this work we present the advantages and disadvantages of applying different bioinformatics programs to localize transcription factor binding sites in the upstream regions of genes connected with sex determination in cucumber. We use PlantCARE, PlantPAN and SignalScan to find motifs in the promoter regions. The results have been compared and the possible functions of chosen motifs have been described.
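The basic operation behind motif localization in a promoter can be sketched as a consensus-pattern scan on both DNA strands. The Python sketch below is a generic illustration using IUPAC nucleotide codes; it is not how PlantCARE, PlantPAN or SignalScan are implemented, and the function names and example motif are hypothetical.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(promoter: str, motif: str) -> List[Tuple[int, str]]:
    """Return (start, strand) for every motif hit, with start given as a
    0-based coordinate on the forward strand."""
    seq = promoter.upper()
    # A lookahead group allows overlapping hits to be reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[base] for base in motif.upper()) + "))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    # Convert reverse-strand positions back to forward-strand coordinates.
    hits += [(len(seq) - m.start() - len(motif), "-") for m in pattern.finditer(rc)]
    return sorted(hits)
```

Short consensus patterns match frequently by chance, which is one reason results from different tools need to be compared before a function is assigned to a motif.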
Li, Yan-Hui; Zhang, Gai-Gai
DAF-16, the C. elegans FOXO transcription factor, is an important determinant in aging and longevity. In this work, we manually curated FOXODB http://lyh.pkmu.cn/foxodb/, a database of FOXO direct targets. It now covers 208 genes. Bioinformatics analysis on 109 DAF-16 direct targets in C. elegans found interesting results. (i) DAF-16 and transcription factor PQM-1 co-regulate some targets. (ii) Seventeen targets directly regulate lifespan. (iii) Four targets are involved in lifespan extension induced by dietary restriction. And (iv) DAF-16 direct targets might play global roles in lifespan regulation. PMID:27027346
Felsani, Armando; Gudmundsson, Bjarki; Nanni, Simona; Brini, Elena; Moles, Anna; Thormar, Hans Guttormur; Estibeiro, Peter; Gaetano, Carlo; Capogrossi, Maurizio; Farsetti, Antonella; Jonsson, Jon Johannes; Guffanti, Alessandro
Different ChIP-Seq protocols may have a significant impact on the final outcome in terms of quality, number and distribution of called peaks. Sample DNA undergoes a long procedure before the final sequencing step, and damaged DNA can result in excessive mismatches in the alignment with reference genome. In this letter, we present the effect of well-defined modifications (timing of formaldehyde crosslink reversal, brand of the sonicator) of standard ChIP-Seq protocol on parallel samples derived from the same cell line correlating the initial DNA quality control metrics to the final bioinformatics analysis results.
Campbell, Chad E.; Nehm, Ross H.
The growing importance of genomics and bioinformatics methods and paradigms in biology has been accompanied by an explosion of new curricula and pedagogies. An important question to ask about these educational innovations is whether they are having a meaningful impact on students’ knowledge, attitudes, or skills. Although assessments are necessary tools for answering this question, their outputs are dependent on their quality. Our study 1) reviews the central importance of reliability and construct validity evidence in the development and evaluation of science assessments and 2) examines the extent to which published assessments in genomics and bioinformatics education (GBE) have been developed using such evidence. We identified 95 GBE articles (out of 226) that contained claims of knowledge increases, affective changes, or skill acquisition. We found that 1) the purpose of most of these studies was to assess summative learning gains associated with curricular change at the undergraduate level, and 2) a minority (<10%) of studies provided any reliability or validity evidence, and only one study out of the 95 sampled mentioned both validity and reliability. Our findings raise concerns about the quality of evidence derived from these instruments. We end with recommendations for improving assessment quality in GBE. PMID:24006400
Bertucci Barbosa, Luiz Carlos; Garrido, Saulo Santesso; Garcia, Anderson; Delfino, Davi Barbosa; Gonçalves, Rodrigo Duarte; Marchetto, Reinaldo
Burr, Tom L
Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length, and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.
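The "huge number of possible trees" for a modest number of OTUs can be made concrete with the standard counting formulas for strictly bifurcating, labelled trees: (2n-5)!! unrooted topologies for n >= 3 OTUs and (2n-3)!! rooted topologies for n >= 2. The Python sketch below evaluates these formulas; the function names are illustrative and not taken from the paper.

```python
def odd_double_factorial(m: int) -> int:
    """Product m * (m - 2) * ... * 1 for odd m; returns 1 when m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def unrooted_tree_count(n_otus: int) -> int:
    """Number of unrooted bifurcating topologies: (2n - 5)!! for n >= 3."""
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs")
    return odd_double_factorial(2 * n_otus - 5)


def rooted_tree_count(n_otus: int) -> int:
    """Number of rooted bifurcating topologies: (2n - 3)!! for n >= 2."""
    if n_otus < 2:
        raise ValueError("need at least 2 OTUs")
    return odd_double_factorial(2 * n_otus - 3)
```

With only 10 OTUs there are already 2,027,025 unrooted and 34,459,425 rooted topologies, which is why exhaustive search is replaced by heuristics for all but the smallest data sets.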
Veneman, Wouter J; de Sonneville, Jan; van der Kolk, Kees-Jan; Ordas, Anita; Al-Ars, Zaid; Meijer, Annemarie H; Spaink, Herman P
We present an RNA deep sequencing (RNAseq) analysis comparing the transcriptome responses of zebrafish larvae to infection with Staphylococcus epidermidis and Mycobacterium marinum bacteria. We show how our GeneTiles software can improve RNAseq analysis approaches by more confidently identifying a large set of markers upon infection with these bacteria. For the analysis of RNAseq data, software programs such as Bowtie2 and Samtools are currently indispensable. However, these programs, which are designed for a Linux environment, require some dedicated programming skills and have no options for visualisation of the resulting mapped sequence reads. Especially with large data sets, this makes the analysis time consuming and difficult for non-expert users. We have applied the GeneTiles software to the analysis of previously published and newly obtained RNAseq datasets of our zebrafish infection model, and we have shown the applicability of this approach also to published RNAseq datasets of other organisms by comparing our data with a published mammalian infection study. In addition, we have implemented the DEXSeq module in the GeneTiles software to identify genes, such as glucagon A, that are differentially spliced under infection conditions. In the analysis of our RNAseq data, this has made it possible to increase the size of data sets that can be efficiently compared without using problem-dedicated programs, leading to a quick identification of marker sets. Therefore, this approach will also be highly useful for transcriptome analyses of other organisms for which well-characterised genomes are available.
Wang, Jingrui; Tang, Wei; Zheng, Yongna; Xing, Zhuqing; Wang, Yanping
A novel lactic acid bacteria strain Lactobacillus kefiranofaciens ZW3 exhibited the characteristics of high production of exopolysaccharide (EPS). The epsN gene, located in the eps gene cluster of this strain, is associated with EPS biosynthesis. Bioinformatics analysis of this gene was performed. The conserved domain analysis showed that the EpsN protein contained MATE-Wzx-like domains. Then the epsN gene was amplified to construct the recombinant expression vector pMG36e-epsN. The results showed that the EPS yields of the recombinants were significantly improved. By determining the yields of EPS and intracellular polysaccharide, it was considered that epsN gene could play its Wzx flippase role in the EPS biosynthesis. This is the first time to prove the effect of EpsN on L. kefiranofaciens EPS biosynthesis and further prove its functional property.
Capriotti, Emidio; Nehrt, Nathan L; Kann, Maricel G; Bromberg, Yana
An international consortium released the first draft sequence of the human genome 10 years ago. Although the analysis of this data has suggested the genetic underpinnings of many diseases, we have not yet been able to fully quantify the relationship between genotype and phenotype. Thus, a major current effort of the scientific community focuses on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. Many resources aim to identify and annotate the specific genes responsible for the observed phenotypes. Some of these use intra-species genetic variability as a means for better understanding this relationship. In addition, several online resources are now dedicated to collecting single nucleotide variants and other types of variants, and annotating their functional effects and associations with phenotypic traits. This information has enabled researchers to develop bioinformatics tools to analyze the rapidly increasing amount of newly extracted variation data and to predict the effect of uncharacterized variants. In this work, we review the most important developments in the field--the databases and bioinformatics tools that will be of utmost importance in our concerted effort to interpret the human variome.
Good, Benjamin M.; Su, Andrew I.
Motivation: Bioinformatics is faced with a variety of problems that require human involvement. Tasks like genome annotation, image analysis, knowledge-base population and protein structure determination all benefit from human input. In some cases, people are needed in vast quantities, whereas in others, we need just a few with rare abilities. Crowdsourcing encompasses an emerging collection of approaches for harnessing such distributed human intelligence. Recently, the bioinformatics community has begun to apply crowdsourcing in a variety of contexts, yet few resources are available that describe how these human-powered systems work and how to use them effectively in scientific domains. Results: Here, we provide a framework for understanding and applying several different types of crowdsourcing. The framework considers two broad classes: systems for solving large-volume ‘microtasks’ and systems for solving high-difficulty ‘megatasks’. Within these classes, we discuss system types, including volunteer labor, games with a purpose, microtask markets and open innovation contests. We illustrate each system type with successful examples in bioinformatics and conclude with a guide for matching problems to crowdsourcing solutions that highlights the positives and negatives of different approaches. Contact: email@example.com PMID:23782614
Kim, Jinkyu; Kim, Gunn; An, Sungbae; Kwon, Young-Kyun; Yoon, Sungroh
The assessment of information transfer in the global economic network helps to understand the current environment and the outlook of an economy. Most approaches to global networks extract information transfer based mainly on a single variable. This paper establishes an entirely new bioinformatics-inspired approach to integrating information transfer derived from multiple variables and develops an international economic network accordingly. In the proposed methodology, we first construct the transfer entropies (TEs) between various intra- and inter-country pairs of economic time series variables, test their significance, and then use a weighted sum approach to aggregate the information captured in each TE. Through a simulation study, the new method is shown to deliver better information integration compared to existing integration methods in that it can be applied even when intra-country variables are correlated. Empirical investigation with real-world data reveals that Western countries are more influential in the global economic network and that Japan has become less influential following the Asian currency crisis. PMID:23300959
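As a rough illustration of the transfer entropy (TE) step, the sketch below computes a plug-in estimate of TE from one discrete series to another with a history length of one, TE(Y→X) = Σ p(x_{t+1}, x_t, y_t) log2 [p(x_{t+1} | x_t, y_t) / p(x_{t+1} | x_t)]. It assumes the series are already discretized; the function name, the history length, and the toy data are assumptions for illustration and do not reproduce the paper's estimator, significance test, or weighted-sum aggregation.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1."""
    # Each triple is (x_next, x_now, y_now) for one time step.
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_next_now = Counter((xn, xc) for xn, xc, _ in triples)
    c_now_src = Counter((xc, yc) for _, xc, yc in triples)
    c_now = Counter(xc for _, xc, _ in triples)

    te = 0.0
    for (xn, xc, yc), cnt in c_full.items():
        p_joint = cnt / n                           # p(x_next, x_now, y_now)
        p_given_both = cnt / c_now_src[(xc, yc)]    # p(x_next | x_now, y_now)
        p_given_self = c_next_now[(xn, xc)] / c_now[xc]  # p(x_next | x_now)
        te += p_joint * log2(p_given_both / p_given_self)
    return te


if __name__ == "__main__":
    driver = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
    follower = [0] + driver[:-1]  # follower copies driver with a lag of one
    print(transfer_entropy(driver, follower))
    print(transfer_entropy([0] * len(follower), follower))
```

In the toy data the follower series is a lagged copy of the driver, so the estimate is clearly positive, whereas a constant source carries no information and yields zero.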
Akkuratov, Evgeny E; Walters, Lorraine; Saha-Mandal, Arnab; Khandekar, Sushant; Crawford, Erin; Zirbel, Craig L; Leisner, Scott; Prakash, Ashwin; Fedorova, Larisa; Fedorov, Alexei
Orthologous introns have identical positions relative to the coding sequence in orthologous genes of different species. By analyzing the complete genomes of five plants we generated a database of 40,512 orthologous intron groups of dicotyledonous plants, 28,519 orthologous intron groups of angiosperms, and 15,726 of land plants (moss and angiosperms). Multiple sequence alignments of each orthologous intron group were obtained using the Mafft algorithm. The number of conserved regions in plant introns appeared to be hundreds of times fewer than that in mammals or vertebrates. Approximately three quarters of conserved intronic regions among angiosperms and dicots, in particular, correspond to alternatively-spliced exonic sequences. We registered only a handful of conserved intronic ncRNAs of flowering plants. However, the most evolutionarily conserved intronic region, which is ubiquitous for all plants examined in this study, including moss, possessed multiple structural features of tRNAs, which caused us to classify it as a putative tRNA-like ncRNA. Intronic sequences encoding tRNA-like structures are not unique to plants. Bioinformatics examination of the presence of tRNA inside introns revealed an unusually long-term association of four glycine tRNAs inside the Vac14 gene of fish, amniotes, and mammals.
Basyuni, M.; Wati, R.
This study used bioinformatics methods to analyze seven oxidosqualene cyclase (OSC) genes from mangrove plants deposited in DDBJ/EMBL/GenBank and to predict their structure, composition, similarity, subcellular localization, and phylogeny. The physical and chemical properties of the seven mangrove OSCs varied among the genes. The proportions of secondary structure elements in the seven mangrove OSCs followed the order α-helix > random coil > extended chain. The predicted scores for chloroplast transit peptides and signal peptides were low, indicating that the mangrove OSCs carry neither a chloroplast transit peptide nor a secretory-pathway signal peptide. The mitochondrial targeting peptide scores ranged from 0.163 to 0.430, indicating that a mitochondrial targeting peptide may be present. These results suggest the importance of understanding the diversity and functional properties of the different amino acids in mangrove OSC genes. To clarify the relationships among the mangrove OSC genes, a phylogenetic tree was constructed. The tree shows three clusters: Kandelia KcMS groups with Bruguiera BgLUS, Rhizophora RsM1 is close to Bruguiera BgbAS, and Rhizophora RcCAS groups with Kandelia KcCAS. The present study therefore supports previous results showing that plant OSC genes form distinct clusters in the tree.
Wang, Fen; Ye, Bin
Cystic echinococcosis, caused by the metacestode larvae of Echinococcus granulosus (Eg), is a chronic, worldwide, and severe zoonotic parasitosis. Its treatment is still difficult, since surgery cannot meet the needs of all patients and drugs can lead to serious adverse events as well as resistance. Screening for target proteins that interact with new anti-hydatidosis drugs is urgently needed to meet the prevailing challenges. Here, we analyzed the sequences and structural properties of E. granulosus aquaporins (EgAQPs) and constructed a phylogenetic tree using bioinformatics methods. The MIP family signature and protein kinase C phosphorylation sites were predicted in all nine EgAQPs. α-helix and random coil were the main secondary structures of the EgAQPs. The number of transmembrane regions ranged from three to six, which indicated that the EgAQPs contain multiple hydrophobic regions. A neighbor-joining tree indicated that the EgAQPs are divided into two branches: seven EgAQPs formed a clade with human AQP1, a "strict" aquaporin, and the other two EgAQPs formed a clade with human AQP9, an aquaglyceroporin. Unfortunately, homology modeling of the EgAQPs was aborted. These results provide a foundation for understanding and researching the biological function of E. granulosus.
Stangeland, Biljana; Mughal, Awais A; Grieg, Zanina; Sandberg, Cecilie Jonsgar; Joel, Mrinal; Nygård, Ståle; Meling, Torstein; Murrell, Wayne; Vik Mo, Einar O; Langmoen, Iver A
Glioblastoma (GBM) is both the most common and the most lethal primary brain tumor. It is thought that GBM stem cells (GSCs) are critically important in resistance to therapy. Therefore, there is a strong rationale to target these cells in order to develop new molecular therapies. To identify molecular targets in GSCs, we compared gene expression in GSCs to that in neural stem cells (NSCs) from the adult human brain, using microarrays. Bioinformatic filtering identified 20 genes (PBK/TOPK, CENPA, KIF15, DEPDC1, CDC6, DLG7/DLGAP5/HURP, KIF18A, EZH2, HMMR/RHAMM/CD168, NOL4, MPP6, MDM1, RAPGEF4, RHBDD1, FNDC3B, FILIP1L, MCC, ATXN7L4/ATXN7L1, P2RY5/LPAR6 and FAM118A) that were consistently expressed in GSC cultures and consistently not expressed in NSC cultures. The expression of these genes was confirmed in clinical samples (TCGA and REMBRANDT). The first nine genes were highly co-expressed in all GBM subtypes and were part of the same protein-protein interaction network. Furthermore, their combined up-regulation correlated negatively with patient survival in the mesenchymal GBM subtype. Using targeted proteomics and the COGNOSCENTE database we linked these genes to GBM signalling pathways. Nine genes (PBK, CENPA, KIF15, DEPDC1, CDC6, DLG7, KIF18A, EZH2 and HMMR) should be further explored as targets for treatment of GBM.
Kunz, Meik; Xiao, Ke; Liang, Chunguang; Viereck, Janika; Pachel, Christina; Frantz, Stefan; Thum, Thomas; Dandekar, Thomas
MicroRNAs (miRNAs) are small ~22 nucleotide non-coding RNAs and are highly conserved among species. Moreover, miRNAs regulate gene expression of a large number of genes associated with important biological functions and signaling pathways. Recently, several miRNAs have been found to be associated with cardiovascular diseases. Thus, investigating the complex regulatory effect of miRNAs may lead to a better understanding of their functional role in the heart. To achieve this, bioinformatics approaches have to be coupled with validation and screening experiments to understand the complex interactions of miRNAs with the genome. This will boost the subsequent development of diagnostic markers and our understanding of the physiological and therapeutic role of miRNAs in cardiac remodeling. In this review, we focus on and explain different bioinformatics strategies and algorithms for the identification and analysis of miRNAs and their regulatory elements to better understand cardiac miRNA biology. Starting with the biogenesis of miRNAs, we present approaches such as LocARNA and miRBase for combining sequence and structure analysis including phylogenetic comparisons as well as detailed analysis of RNA folding patterns, functional target prediction, signaling pathway as well as functional analysis. We also show how far bioinformatics helps to tackle the unprecedented level of complexity and systemic effects by miRNA, underlining the strong therapeutic potential of miRNA and miRNA target structures in cardiovascular disease. In addition, we discuss drawbacks and limitations of bioinformatics algorithms and the necessity of experimental approaches for miRNA target identification. This article is part of a Special Issue entitled 'Non-coding RNAs'.
Xiao, Jing; Lu, Fu-Ping; Li, Yu; Li, Jin-Ting
In order to exploit new genetic resources, the pectate lyase (PEL) gene was amplified by PCR from the genomic DNA of an alkaline Bacillus subtilis 521. The PCR product was inserted into the pET22b(+) vector. The recombinant plasmids were cloned in E. coli DH5α and then expressed in E. coli BL21. When cultured in the optimized medium, the positive clones E. coli BL21 (pET22b(+)pel) showed intracellular pectate lyase activity of 90.0 U/mL, indicating that the correct PEL gene had been obtained. The pel gene has an open reading frame of 1263 nucleotides and codes for a product of 420 amino acids with a calculated molecular mass of 45.5 kD. Computer-assisted analysis revealed a signal peptide and two conserved domains. Sequence analysis showed that PEL shares 26-82% homology with sequences from other strains in GenBank. In addition, the higher-order structure of PEL was predicted and analysed. This study will help with the experimental design of PEL fermentation, production and purification, and enzyme evolution.
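The reported open reading frame length and protein length can be cross-checked with simple codon arithmetic. The sketch below assumes that the stated nucleotide count includes the stop codon, which is an assumption and not something the abstract states; the helper name is illustrative.

```python
def orf_to_protein_length(orf_nt: int, includes_stop: bool = True) -> int:
    """Return the number of encoded amino acids for an ORF of orf_nt nucleotides.

    Assumes a complete reading frame (length divisible by 3). If the length
    includes the stop codon, that codon does not contribute an amino acid.
    """
    if orf_nt <= 0 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    # 1263 nt -> 421 codons -> 420 amino acids plus one stop codon
    print(orf_to_protein_length(1263))
```

Under that assumption, 1263 nucleotides correspond to 421 codons, of which 420 encode amino acids, which agrees with the figures in the abstract.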
Kreinovich, Vladik; Longpre, Luc; Starks, Scott A.; Xiang, Gang; Beck, Jan; Kandathi, Raj; Nayak, Asis; Ferson, Scott; Hajagos, Janos
In many areas of science and engineering, it is desirable to estimate statistical characteristics (mean, variance, covariance, etc.) under interval uncertainty. For example, we may want to use the measured values x(t) of a pollution level in a lake at different moments of time to estimate the average pollution level; however, we do not know the exact values x(t)--e.g., if one of the measurement results is 0, this simply means that the actual (unknown) value of x(t) can be anywhere between 0 and the detection limit (DL). We must, therefore, modify the existing statistical algorithms to process such interval data. Such a modification is also necessary to process data from statistical databases, where, in order to maintain privacy, we only keep interval ranges instead of the actual numeric data (e.g., a salary range instead of the actual salary). Most resulting computational problems are NP-hard--which means, crudely speaking, that in general, no computationally efficient algorithm can solve all particular cases of the corresponding problem. In this paper, we overview practical situations in which computationally efficient algorithms exist: e.g., situations when measurements are very accurate, or when all the measurements are done with one (or few) instruments. As a case study, we consider a practical problem from bioinformatics: to discover the genetic difference between the cancer cells and the healthy cells, we must process the measurements results and find the concentrations c and h of a given gene in cancer and in healthy cells. This is a particular case of a general situation in which, to estimate states or parameters which are not directly accessible by measurements, we must solve a system of equations in which coefficients are only known with interval uncertainty. We show that in general, this problem is NP-hard, and we describe new efficient algorithms for solving this problem in practically important situations.
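One of the simplest cases of statistics under interval uncertainty is the mean: because the mean is non-decreasing in every input, its exact range is obtained from the lower endpoints and the upper endpoints separately. The sketch below illustrates this with the detection-limit example from the abstract; the numeric values and the function name are made up for illustration, and harder characteristics such as the variance are not covered.

```python
def mean_range(intervals):
    """Return (lowest, highest) possible mean given per-measurement intervals.

    Each interval is a (low, high) pair bounding the unknown true value.
    """
    if not intervals:
        raise ValueError("at least one interval is required")
    for low, high in intervals:
        if low > high:
            raise ValueError("each interval must satisfy low <= high")
    n = len(intervals)
    lowest = sum(low for low, _ in intervals) / n
    highest = sum(high for _, high in intervals) / n
    return lowest, highest


if __name__ == "__main__":
    detection_limit = 0.5  # hypothetical detection limit (DL)
    # A reading of 0 only tells us the value lies in [0, DL];
    # the other readings are treated as exact for simplicity.
    readings = [(0.0, detection_limit), (2.0, 2.0), (4.0, 4.0)]
    print(mean_range(readings))
```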
Wang, E L; Wang, K Y; Chen, D F; Geng, Y; Huang, L Y; Wang, J; He, Y
Cytidine monophosphate (CMP) N-acetylneuraminic acid (NeuNAc) synthetase, which is encoded by the neuA gene, can catalyze the activation of sialic acid with CMP, and plays an important role in Streptococcus agalactiae infection pathogenesis. To study the structure and function of the S. agalactiae neuA gene, we isolated it from diseased tilapia, amplified it using polymerase chain reaction (PCR) with specific primers, and cloned it into a pMD19-T vector. The recombinant plasmid was confirmed by PCR and restriction enzyme digestion, and identified by sequencing. Molecular characterization analyses of the neuA nucleotide amino acid sequence were performed using bioinformatic tools and an online server. The results showed that the neuA nucleotide sequence contained a complete coding region, which comprised 1242 bp, encoding 413 amino acids (aa). The aa sequence was highly conserved and contained a Glyco_tranf_GTA_type superfamily and an SGNH_hydrolase superfamily conserved domain, which are related to sialic acid activation catalysis. The NeuA protein possessed many important sites related to post-translational modification, including 28 potential phosphorylation sites and 2 potential N-glycosylation sites, had no signal peptides or transmembrane regions, and was predicted to reside in the cytoplasm. Moreover, the protein had some B-cell epitopes, which suggests its potential in development of a vaccine against S. agalactiae infection. The codon usage frequency of neuA differed greatly in Escherichia coli and Homo sapiens genes, and neuA may be more efficiently expressed in eukaryotes (yeast). S. agalactiae neuA from tilapia maintains high structural homology and sequence identity with CMP-NeuNAc synthetases from other bacteria.
Tomoiaga, Delia; D’Hulst, Charlotte; Krampis, Konstantinos; Feinstein, Paul
We performed an extensive mutational analysis of the canonical mouse odorant receptor (OR) M71 to determine the properties of ORs that inhibit plasma membrane trafficking in heterologous expression systems. We employed the use of the M71::GFP fusion protein to directly assess plasma membrane localization and functionality of M71 in heterologous cells in vitro or in olfactory sensory neurons (OSNs) in vivo. OSN expression of M71::GFP show only small differences in activity compared to untagged M71. However, M71::GFP could not traffic to the plasma membrane even in the presence of proposed accessory proteins RTP1S or mβ2AR. To ask if ORs contain an internal “kill sequence”, we mutated ~15 of the most highly conserved OR specific amino acids not found amongst the trafficking non-OR GPCR superfamily; none of these mutants rescued trafficking. Addition of various amino terminal signal sequences or different glycosylation motifs all failed to produce trafficking. The addition of the amino and carboxy terminal domains of mβ2AR or the mutation Y289A in the highly conserved GPCR motif NPxxY does not rescue plasma membrane trafficking. The failure of targeted mutagenesis on rescuing plasma membrane localization in heterologous cells suggests that OR trafficking deficits may not be attributable to conserved collinear motifs, but rather the overall amino acid composition of the OR family. Thus, we performed an in silico analysis comparing the OR and other amine receptor superfamilies. We find that ORs contain fewer charged residues and more hydrophobic residues distributed throughout the protein and a conserved overall amino acid composition. From our analysis, we surmise that it may be difficult to traffic ORs at high levels to the cell surface in vitro, without making significant amino acid modifications. Finally, we observed specific increases in methionine and histidine residues as well as a marked decrease in tryptophan residues, suggesting that these changes
van Kampen, Antoine H C; Moerland, Perry D
Systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. In this chapter we discuss how bioinformatics critically contributes to systems medicine. First, we explain the role of bioinformatics in the management and analysis of data. In particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. Second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiological molecular mechanisms, and facilitate personalized medicine. Third, we focus on network analysis and discuss how gene networks can be constructed from omics data and how these networks can be decomposed into smaller modules. We discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, and lead to predictive models. Throughout, we provide several examples demonstrating how bioinformatics contributes to systems medicine and discuss future challenges in bioinformatics that need to be addressed to enable the advancement of systems medicine.
Celik, Nermin; Webb, Chaille T.; Leyton, Denisse L.; Holt, Kathryn E.; Heinz, Eva; Gorrell, Rebecca; Kwok, Terry; Naderer, Thomas; Strugnell, Richard A.; Speed, Terence P.; Teasdale, Rohan D.; Likić, Vladimir A.; Lithgow, Trevor
Autotransporters are secreted proteins that are assembled into the outer membrane of bacterial cells. The passenger domains of autotransporters are crucial for bacterial pathogenesis, with some remaining attached to the bacterial surface while others are released by proteolysis. An enigma remains as to whether autotransporters should be considered a class of secretion system, or simply a class of substrate with peculiar requirements for their secretion. We sought to establish a sensitive search protocol that could identify and characterize diverse autotransporters from bacterial genome sequence data. The new sequence analysis pipeline identified more than 1500 autotransporter sequences from diverse bacteria, including numerous species of Chlamydiales and Fusobacteria as well as all classes of Proteobacteria. Interrogation of the proteins revealed that there are numerous classes of passenger domains beyond the known proteases, adhesins and esterases. In addition, the barrel domain, a characteristic feature of autotransporters, was found to be composed of seven conserved sequence segments that can be arranged in multiple ways in the tertiary structure of the assembled autotransporter. One of these conserved motifs overlays the targeting information required for autotransporters to reach the outer membrane. Another conserved and diagnostic motif maps to the linker region between the passenger domain and the barrel domain, indicating it as an important feature in the assembly of autotransporters. PMID:22905239
Marino, Francesca; Vindigni, Alessandro; Onesti, Silvia
RecQ helicases play essential roles in the maintenance of genome stability and contain a highly conserved helicase region generally followed by a characteristic RecQ-C-terminal (RQC) domain, plus a number of variable associated domains. Notable exceptions are the RecQ4 helicases, where none of these additional regions have been described. Particularly striking was the fact that no RQC domain had been reported, considering that the RQC domain had been shown to play an essential role in the catalytic mechanism of most RecQ family members. Here we present the results of detailed bioinformatic analyses of RecQ4 proteins that identify, for the first time, the presence of a putative RQC domain, including some of the key residues involved in DNA binding and unwinding. We also describe the presence of a novel "Zn knuckle" domain, as well as an additional Sld2-homology region, providing new insights into the architecture, function and evolution of these enzymes.
Jha, Prabhash Kumar; Vijay, Aatira; Sahu, Anita; Ashraf, Mohammad Zahid
Thrombosis is a leading cause of morbidity and mortality in patients with myeloproliferative disorders (MPDs), particularly polycythemia vera (PV) and essential thrombocythemia (ET). Despite the attempts to establish a link between them, the shared biological mechanisms are yet to be characterized. An integrated gene expression meta-analysis of five independent publicly available microarray data of the three diseases was conducted to identify shared gene expression signatures and overlapping biological processes. Using INMEX bioinformatic tool, based on combined Effect Size (ES) approaches, we identified a total of 1,157 differentially expressed genes (DEGs) (697 overexpressed and 460 underexpressed genes) shared between the three diseases. EnrichR tool’s rich library was used for comprehensive functional enrichment and pathway analysis which revealed “mRNA Splicing” and “SUMO E3 ligases SUMOylate target proteins” among the most enriched terms. Network based meta-analysis identified MYC and FN1 to be the most highly ranked hub genes. Our results reveal that the alterations in biomarkers of the coagulation cascade like F2R, PROS1, SELPLG and ITGB2 were common between the three diseases. Interestingly, the study has generated a novel database of candidate genetic markers, pathways and transcription factors shared between thrombosis and MPDs, which might aid in the development of prognostic therapeutic biomarkers. PMID:27892526
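For readers unfamiliar with combining effect sizes across studies, the sketch below shows a generic inverse-variance fixed-effect combination for a single gene. This is a textbook technique offered as an assumption-laden illustration; it is not claimed to be the model or the code used by the INMEX tool, and the numbers are invented.

```python
from math import sqrt


def combine_effect_sizes(effects, variances):
    """Inverse-variance fixed-effect combination of per-study effect sizes.

    Returns (combined_effect, standard_error).
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("effects and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]  # more precise studies weigh more
    total_weight = sum(weights)
    combined = sum(w * d for w, d in zip(weights, effects)) / total_weight
    return combined, sqrt(1.0 / total_weight)


if __name__ == "__main__":
    # Hypothetical effect sizes for one gene in three studies.
    print(combine_effect_sizes([0.5, 0.7, 0.6], [0.04, 0.04, 0.02]))
```

A random-effects model would additionally estimate between-study variance, which matters when the underlying datasets are heterogeneous.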
Lü, Dingding; Hou, Chengxiang; Qin, Guangxing; Gao, Kun; Chen, Tian; Guo, Xijie
A full-length cDNA of lebocin 5 (BmLeb5) was cloned for the first time from the silkworm, Bombyx mori, by rapid amplification of cDNA ends. The BmLeb5 gene is 808 bp in length and the open reading frame encodes a 179-amino acid hydroxyproline-rich peptide. Bioinformatic analysis showed that BmLeb5 has an O-glycosylation site and four RXXR motifs, like other lebocins. Sequence similarity and phylogenetic analysis indicated that lebocins form a multiple gene family in the silkworm, as cecropins do. Quantitative real-time PCR analysis revealed that BmLeb5 was most highly expressed in the fat body. In silkworm larvae infected by Beauveria bassiana, the expression level of BmLeb5 was upregulated in the fat body and hemolymph, which are the most important immune tissues in the silkworm. The recombinant BmLeb5 protein was, for the first time, successfully expressed in a prokaryotic expression system and purified. There have been no previous reports that the expression of lebocins can be induced by an entomopathogenic fungus. Our study suggests that BmLeb5 might play an important role in the immune response of the silkworm against B. bassiana infection. The results also provide helpful information for further study of the function of the lebocin family in the antifungal immune response of the silkworm. PMID:28194425
Liao, Jiangquan; Wei, Benjun; Chen, Hengwen; Liu, Yongmei; Wang, Jie
Background: Xuesaitong soft capsule (XST), which consists of Panax notoginseng saponin (PNS), has been used to treat ischemic cerebrovascular diseases in China. The therapeutic mechanism of XST has not yet been elucidated from the perspective of genomics and bioinformatics. Methods: A transcriptome analysis was performed to review series concerning the middle cerebral artery occlusion (MCAO) rat model and XST intervention after MCAO from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were compared between the blank group and the model group, and between the model group and the XST group. Functional enrichment and pathway analysis were performed. A protein-protein interaction (PPI) network was constructed. The overlapping genes from the two DEG sets were screened out and analyzed in depth. Results: Two series including 22 samples were obtained. 870 DEGs were identified between the blank group and the model group, and 1189 DEGs were identified between the model group and the XST group. GO terms and KEGG pathways of MCAO and XST intervention were significantly enriched. PPI networks were constructed to demonstrate the gene-gene interactions. The overlapping genes from the two DEG sets were highlighted. ANTXR2, FHL3, PRCP, TYROBP, TAF9B, FGFR2, BCL11B, RB1CC1 and MBNL2 were the pivotal genes and possible action sites of the XST therapeutic mechanisms. Conclusion: MCAO is a pathological process with multiple. PMID:27347353
Ranganathan, Shoba; Hsu, Wen-Lian; Yang, Ueng-Cheng; Tan, Tin Wee
The 2008 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation set up in 1998, was organized as the 7th International Conference on Bioinformatics (InCoB), jointly with the Bioinformatics and Systems Biology in Taiwan (BIT 2008) Conference, Oct. 20–23, 2008 at Taipei, Taiwan. Besides bringing together scientists from the field of bioinformatics in this region, InCoB is actively involving researchers from the area of systems biology, to facilitate greater synergy between these two groups. Marking the 10th Anniversary of APBioNet, this InCoB 2008 meeting followed on from a series of successful annual events in Bangkok (Thailand), Penang (Malaysia), Auckland (New Zealand), Busan (South Korea), New Delhi (India) and Hong Kong. Additionally, tutorials and the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) immediately prior to the 20th Federation of Asian and Oceanian Biochemists and Molecular Biologists (FAOBMB) Taipei Conference provided ample opportunity for inducting mainstream biochemists and molecular biologists from the region into a greater level of awareness of the importance of bioinformatics in their craft. In this editorial, we provide a brief overview of the peer-reviewed manuscripts accepted for publication herein, grouped into thematic areas. As the regional research expertise in bioinformatics matures, the papers fall into thematic areas, illustrating the specific contributions made by APBioNet to global bioinformatics efforts. PMID:19091008
Background Ovarian cancer is a cancerous growth arising from the ovary. Objective This study aimed to explore the molecular mechanism of the development and progression of ovarian cancer. Methods We first identified the differentially expressed genes (DEGs) between ovarian cancer samples and healthy controls by analyzing the GSE14407 Affymetrix microarray data, and then investigated the functional enrichment of the DEGs. Furthermore, we constructed the protein-protein interaction (PPI) network of the DEGs using the STRING online tools to find genes that might play important roles in the progression of ovarian cancer. In addition, we performed enrichment analysis on the PPI network. Results Our study screened 659 DEGs, including 77 up- and 582 down-regulated genes. These DEGs were enriched in pathways such as Cell cycle, p53 signaling pathway, Pathways in cancer and Drug metabolism. CCNE1, CCNB2 and CYP3A5 were the significant genes identified from these pathways. A PPI network was constructed, and network Module A was found to be closely associated with ovarian cancer. Hub nodes such as VEGFA, CALM1, BIRC5 and POLD1 were found in the PPI network. Module A was related to biological processes such as mitotic cell cycle, cell cycle and nuclear division, and to pathways including Cell cycle, Oocyte meiosis and p53 signaling pathway. Conclusions The results indicated that ovarian cancer was closely associated with dysregulation of the p53 signaling pathway, drug metabolism, tyrosine metabolism and the cell cycle. We also predicted that genes such as CCNE1, CCNB2, CYP3A5 and VEGFA might be target genes for diagnosing ovarian cancer. PMID:24341673
Gao, Fan; Nan, FangRu; Song, Wei; Feng, Jia; Lv, JunPing; Xie, ShuLian
Chondrus crispus is an economically and medicinally important red alga of interest for anti-tumor research. In this study, 117 C. crispus miRNAs (108 conserved and 9 novel) were identified from 2,416,181 small-RNA reads using high-throughput sequencing and bioinformatics methods. According to a BLAST search against the miRBase database, these miRNAs belonged to 110 miRNA families. Sequence alignment combined with homology searching revealed both the conservation and diversity of the predicted miRNA families in different plant species. Four and 19 randomly selected miRNAs were validated by northern blotting and stem-loop quantitative real-time reverse transcription polymerase chain reaction detection, respectively. The validation rates (75% and 94.7%) indicated that most of the identified miRNAs are credible. A total of 160 potential target genes were predicted and functionally annotated by Gene Ontology analysis and Kyoto Encyclopedia of Genes and Genomes analysis. We also analyzed the interrelationship of miRNAs, miRNA-target genes and target genes in C. crispus by constructing a Cytoscape network. The 117 miRNAs identified in our study should supply a large amount of information that will be important for red algae small RNA research. PMID:27193824
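The two validation rates quoted above can be checked with simple arithmetic. The abstract gives only the percentages and the number of miRNAs tested, so the validated counts used below (3 of 4 and 18 of 19) are inferred, not reported; the function name is illustrative.

```python
def validation_rate(validated: int, tested: int) -> float:
    """Percentage of tested miRNAs that were experimentally confirmed."""
    return round(100 * validated / tested, 1)

# Assumed counts: inferred from the reported percentages, not stated in the abstract.
northern_blot_rate = validation_rate(3, 4)    # northern blotting, 4 miRNAs tested
stem_loop_rate = validation_rate(18, 19)      # stem-loop qRT-PCR, 19 miRNAs tested
```

Under these assumed counts the rates come out at 75.0% and 94.7%, matching the figures in the abstract.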
Rodríguez-Concepción, Manuel; Querol, Jordi; Lois, Luisa María; Imperial, Santiago; Boronat, Albert
Carotenoids are plastidic isoprenoid pigments of great biological and biotechnological interest. The precursors for carotenoid production are synthesized through the recently elucidated methylerythritol phosphate (MEP) pathway. Here we have identified a tomato ( Lycopersicon esculentum Mill.) cDNA sequence encoding a full-length protein with homology to the MEP pathway enzyme hydroxymethylbutenyl 4-diphosphate synthase (HDS, also called GCPE). Comparison with other plant and bacterial HDS sequences showed that the plant enzymes contain a plastid-targeting N-terminal sequence and two highly conserved plant-specific domains in the mature protein with no homology to any other sequence in the databases. The ubiquitous distribution of HDS-encoding expressed sequence tags (ESTs) in the tomato collections suggests that the corresponding gene is likely expressed throughout the plant. The role of HDS in controlling the supply of precursors for carotenoid biosynthesis was estimated from the bioinformatic and molecular analysis of transcript abundance in different stages of fruit development. No significant changes in HDS gene expression were deduced from the statistical analysis of EST distribution during fruit ripening, when an active MEP pathway is required to support a massive accumulation of carotenoids. RNA blot experiments confirmed that similar transcript levels were present in both the wild-type and carotenoid-depleted yellow ripe ( r) mutant fruit independent of the stage of development and the carotenoid composition of the fruit. Together, our results are consistent with a non-limiting role for HDS in carotenoid biosynthesis during tomato fruit ripening.
Zhang, Zhang; Cheung, Kei-Hoi; Townsend, Jeffrey P
Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline featuring web services for computer-to-computer data exchange as users add value. This pipeline aims to simplify data integration and creation, to realize automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.
Orton, R J; Gu, Q; Hughes, J; Maabar, M; Modha, S; Vattipally, S B; Wilkie, G S; Davison, A J
The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing.
Kaas, Quentin; Craik, David J.
Venomics is a modern approach that combines transcriptomics and proteomics to explore the toxin content of venoms. This review will give an overview of computational approaches that have been created to classify and consolidate venomics data, as well as algorithms that have helped discovery and analysis of toxin nucleic acid and protein sequences, toxin three-dimensional structures and toxin functions. Bioinformatics is used to tackle specific challenges associated with the identification and annotations of toxins. Recognizing toxin transcript sequences among second generation sequencing data cannot rely only on basic sequence similarity because toxins are highly divergent. Mass spectrometry sequencing of mature toxins is challenging because toxins can display a large number of post-translational modifications. Identifying the mature toxin region in toxin precursor sequences requires the prediction of the cleavage sites of proprotein convertases, most of which are unknown or not well characterized. Tracing the evolutionary relationships between toxins should consider specific mechanisms of rapid evolution as well as interactions between predatory animals and prey. Rapidly determining the activity of toxins is the main bottleneck in venomics discovery, but some recent bioinformatics and molecular modeling approaches give hope that accurate predictions of toxin specificity could be made in the near future. PMID:26110505
d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna
We describe the GALT-Prot database and its related web-based application, which have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT), involved in the genetic disease galactosemia type I. Besides a list of missense mutations at the gene and protein sequence levels, GALT-Prot reports the results of analyses of mutant GALT structures. In addition to structural information about the wild-type enzyme, the database includes structures of over 100 single point mutants simulated by means of a computational procedure, and each mutant was analyzed with several bioinformatics programs in order to investigate the effect of the mutation. The web-based interface allows querying of the database, and several links are provided to ensure close integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.
Maloney, Mark; Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael
Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of…
Elwess, Nancy L.; Latourelle, Sandra M.; Cauthorn, Olivia
One of the hottest areas of science today is the field in which biology, information technology, and computer science are merged into a single discipline called bioinformatics. This field enables the discovery and analysis of biological data, including nucleotide and amino acid sequences that are easily accessed through the use of computers. As…
Hutchinson-Gilford progeria syndrome (HGPS) is a rare human genetic disease that leads to premature aging. HGPS is caused by mutation in the Lamin-A (LMNA) gene that leads, in affected young individuals, to the accumulation of the progerin protein, usually present only in aging differentiated cells. Bioinformatics analyses of the network of interactions of the LMNA gene and transcripts are presented. The LMNA gene network has been analyzed using the BioGRID database (http://thebiogrid.org/) and related analysis tools such as Osprey (http://biodata.mshri.on.ca/osprey/servlet/Index) and GeneMANIA ( http://genemania.org/). The network of interaction of LMNA transcripts has been further analyzed following the competing endogenous (ceRNA) hypotheses (RNA cross-talk via microRNAs [miRNAs]) and using the miRWalk database and tools (www.ma.uni-heidelberg.de/apps/zmf/mirwalk/). These analyses suggest particular relevance of epigenetic modifiers (via acetylase complexes and specifically HTATIP histone acetylase) and adenosine triphosphate (ATP)-dependent chromatin remodelers (via pBAF, BAF, and SWI/SNF complexes).
Qi, Yuhua; Fan, Huan; Cui, Lunbiao; Shi, Zhiyang
Hand, foot, and mouth disease (HFMD), mainly caused by coxsackievirus A16 (CVA16) and enterovirus 71 (EV71) infections, remains a serious public health issue, with thousands of newly diagnosed cases each year since 2008 in China. The mechanisms underlying viral infection, however, remain elusive. In the present study, we systematically investigated the host cellular microRNA (miRNA) expression patterns in response to CVA16 and EV71 infections. Through microarray examination, 27 miRNAs (15 upregulated and 12 downregulated) were found to be coassociated with the replication process of the two viruses, while the expression levels of 15 and 5 miRNAs were significantly changed in CVA16- and EV71-infected cells, respectively. A great number of target genes of the 27 common differentially expressed miRNAs were predicted by combined use of two computational target prediction algorithms, TargetScan and MiRanda. Comprehensive bioinformatic analysis of target genes in GO categories and KEGG pathways indicated the involvement of diverse biological functions and signaling pathways during viral infection. These results provide an overview of the roles of miRNAs in virus-host interaction, which will contribute to further understanding of HFMD pathological mechanisms. PMID:27843944
He, Hailong; Mao, Lingzhou; Xu, Peng; Xi, Yanhai; Xu, Ning; Xue, Mingtao; Yu, Jiangming; Ye, Xiaojian
Ossification of the posterior longitudinal ligament (OPLL) is a disease characterized by physical impairment and neurological disorders. The objective of this study was to explore the differentially expressed genes (DEGs) in OPLL patient ligament cells and to identify target sites for the clinical prevention and treatment of OPLL. Gene expression data set GSE5464 was downloaded from the Gene Expression Omnibus; DEGs were then screened with the limma package in R, and the functions and pathways altered in OPLL cells compared to normal cells were identified with DAVID (The Database for Annotation, Visualization and Integrated Discovery); finally, an interaction network of DEGs was constructed with STRING. A total of 1536 DEGs were screened, comprising 31 down-regulated and 1505 up-regulated genes. The response to wounding function and the Toll-like receptor signaling pathway may be involved in the development of OPLL. Genes such as PDGFB and PRDX2 may be involved in OPLL through the response to wounding function. Genes enriched in the Toll-like receptor signaling pathway, such as TLR1, TLR5, and TLR7, may be involved in spinal cord injury in OPLL. PIK3R1 was the hub gene, with the highest degree in the network of DEGs; INSR was one of the genes most closely related to it. OPLL-related genes screened by microarray gene expression profiling and bioinformatics analysis may be helpful for elucidating the mechanism of OPLL.
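Identifying a hub gene "with the highest degree" amounts to counting, for each node in the interaction network, how many distinct partners it has. The following is a minimal sketch of that step only; the edge list uses placeholder gene names (`geneA` to `geneD`) and is purely illustrative, not data from the study, and `node_degrees` is a hypothetical helper rather than part of STRING or any other tool named above.

```python
from collections import defaultdict

def node_degrees(edges):
    """Return {node: number of distinct interaction partners} for an undirected network."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}

# Illustrative edge list with placeholder names (not real interactions).
edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
    ("geneC", "geneA"),  # duplicate of geneA-geneC in reverse order; counted once
]

degrees = node_degrees(edges)
hub = max(degrees, key=degrees.get)  # node with the highest degree
```

Storing neighbours in sets means that an interaction listed twice, or in both directions, does not inflate a gene's degree.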
Chen, Xiwen; Cheng, Anchun; Wang, Mingshu; Xiang, Jun
In this study, the structure and function of VP23, encoded by the newly identified DEV UL18 gene, were predicted using bioinformatics software and tools. DEV UL18 was predicted to encode a polypeptide of 322 amino acids, termed VP23, with a putative molecular mass of 35.250 kDa and a predicted isoelectric point (pI) of 8.37, and with no signal peptide or transmembrane domain. Prediction of subcellular localization placed DEV-VP23 in the endoplasmic reticulum (33.3%), mitochondria (22.2%), extracellular space including the cell wall (11.1%), vesicles of the secretory system (11.1%), Golgi (11.1%), and plasma membrane (11.1%). Amino acid sequence analysis showed that the potential antigenic epitopes are located at amino acids 45-47, 53-60, 102-105, 173-180, 185-189, 260-265, 267-271, and 292-299. These results provide some insights for further research on DEV-VP23, lay a foundation for further study of new clinical diagnostic methods for DEV, and can be used in the development of new DEV vaccines.
Hernández, Sergio; Franco, Luís; Calvo, Alejandra; Ferragut, Gabriela; Hermoso, Antoni; Amela, Isaac; Gómez, Antonio; Querol, Enrique; Cedano, Juan
Multitasking or moonlighting is the capability of some proteins to execute two or more biochemical functions. Usually, moonlighting proteins are revealed experimentally by serendipity. For this reason, it would be helpful if bioinformatics could predict this multifunctionality, especially given the large number of sequences from genome projects. In the present work, we analyze and describe several approaches that use sequences, structures, interactomics, and current bioinformatics algorithms and programs to try to overcome this problem. Among these approaches are (a) remote homology searches using Psi-Blast, (b) detection of functional motifs and domains, (c) analysis of data from protein–protein interaction databases (PPIs), (d) matching the query protein sequence to 3D databases (i.e., with algorithms such as PISITE), and (e) mutation correlation analysis between amino acids with algorithms such as MISTIC. Programs designed to identify functional motifs/domains detect mainly the canonical function but usually fail to detect the moonlighting one, Pfam and ProDom being the best methods. Remote homology search by Psi-Blast combined with data from interactomics databases (PPIs) has the best performance. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can only be used in very specific situations, since it requires multiply aligned sequences from a protein family, but it can suggest how the evolutionary process of second function acquisition took place. The multitasking protein database MultitaskProtDB (http://wallace.uab.es/multitask/), previously published by our group, has been used as a benchmark for all of the analyses. PMID:26157797
Omura, Seiichi; Kawai, Eiichiro; Sato, Fumitaka; Martinez, Nicholas E.; Chaitanya, Ganta V.; Rollyson, Phoebe A.; Cvek, Urska; Trutschl, Marjan; Alexander, J. Steven; Tsunoda, Ikuo
Background Myocarditis is an inflammatory disease of the cardiac muscle and is mainly caused by viral infections. Viral myocarditis has been proposed to be divided into 3 phases: the acute viral phase, the subacute immune phase, and the chronic cardiac remodeling phase. Although individualized therapy should be applied depending on the phase, no clinical or experimental studies have found biomarkers that distinguish between the 3 phases. Theiler’s murine encephalomyelitis virus belongs to the genus Cardiovirus and can cause myocarditis in susceptible mouse strains. Methods and Results Using this novel model for viral myocarditis induced with Theiler’s murine encephalomyelitis virus, we conducted multivariate analysis including echocardiography, serum troponin and viral RNA titration, and microarray to identify the biomarker candidates that can discriminate the 3 phases. Using C3H mice infected with Theiler’s murine encephalomyelitis virus on 4, 7, and 60 days post infection, we conducted bioinformatics analyses, including principal component analysis and k-means clustering of microarray data, because our traditional cardiac and serum assays, including 2-way comparison of microarray data, did not lead to the identification of a single biomarker. Principal component analysis separated heart samples clearly between the groups of 4, 7, and 60 days post infection. Representative genes contributing to the separation were as follows: 4 and 7 days post infection, innate immunity–related genes, such as Irf7 and Cxcl9; 7 and 60 days post infection, acquired immunity–related genes, such as Cd3g and H2-Aa; and cardiac remodeling–related genes, such as Mmp12 and Gpnmb. Conclusions Sets of molecules, not single molecules, identified by unsupervised principal component analysis, were found to be useful as phase-specific biomarkers. PMID:25031303
Gao, Fan; Nan, Fangru; Feng, Jia; Lv, Junping; Liu, Qi; Xie, Shulian
Eucheuma denticulatum, an economically and industrially important red alga, is a valuable marine resource. Although microRNAs (miRNAs) play an essential role in gene post-transcriptional regulation, no research has been conducted to identify and characterize miRNAs in E. denticulatum. In this study, we identified 134 miRNAs (133 conserved miRNAs and one novel miRNA) from 2,997,135 small-RNA reads by high-throughput sequencing combined with bioinformatics analysis. BLAST searching against miRBase uncovered 126 potential miRNA families. A conservation and diversity analysis of predicted miRNA families in different plant species was performed by comparative alignment and homology searching. A total of 4 and 13 randomly selected miRNAs were respectively validated by northern blotting and stem-loop reverse transcription PCR, thereby demonstrating the reliability of the miRNA sequencing data. Altogether, 871 potential target genes were predicted using psRobot and TargetFinder. Target genes classification and enrichment were conducted based on Gene Ontology analysis. The functions of target gene products and associated metabolic pathways were predicted by Kyoto Encyclopedia of Genes and Genomes pathway analysis. A Cytoscape network was constructed to explore the interrelationships of miRNAs, miRNA-target genes and target genes. A large number of miRNAs with diverse target genes will play important roles for further understanding some essential biological processes in E. denticulatum. The uncovered information can serve as an important reference for the protection and utilization of this unique red alga in the future. PMID:26717154
Roehr, Christina; Fischer, Axel; Isau, Melanie; Boerno, Stefan T.; Wunderlich, Andrea; Barmeyer, Christian; Seemann, Petra; Koenig, Jana; Lappe, Michael; Kuss, Andreas W.; Garshasbi, Masoud; Bertram, Lars; Trappe, Kathrin; Werber, Martin; Herrmann, Bernhard G.; Zatloukal, Kurt; Lehrach, Hans; Schweiger, Michal R.
Background Colorectal cancer (CRC) is, with approximately 1 million cases, the third most common cancer worldwide. Extensive research is ongoing to decipher the underlying genetic patterns in the hope of improving early cancer diagnosis and treatment. In this direction, recent progress in next generation sequencing technologies has revolutionized the field of cancer genomics. However, one caveat of these studies remains the large number of genetic variations identified and their interpretation. Methodology/Principal Findings Here we present the first work on whole exome NGS of primary colon cancers. We performed 454 whole exome pyrosequencing of tumor as well as adjacent unaffected normal colonic tissue from microsatellite stable (MSS) and microsatellite instable (MSI) colon cancer patients and identified more than 50,000 small nucleotide variations for each tissue. In line with predictions based on MSS and MSI pathomechanisms, we identified eight times more somatic non-synonymous variations in MSI cancers than in MSS cancers, and we were able to reproduce the result in four additional CRCs. Our bioinformatics filtering approach narrowed down the number of most significant mutations to 359 for MSI and 45 for MSS CRCs with predicted altered protein functions. In both MSI and MSS CRCs, we found somatic mutations in the intracellular kinase domain of bone morphogenetic protein receptor 1A, BMPR1A, a gene in which germline mutations have so far been associated with juvenile polyposis syndrome, and we show that the mutations impair protein function. Conclusions/Significance We conclude that with deep sequencing of tumor exomes one may be able to predict the microsatellite status of CRC and, in addition, identify potentially clinically relevant mutations. PMID:21203531
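The abstract does not spell out its filtering approach, but the core idea of calling somatic variants from a tumor and a matched normal sample can be sketched as follows: keep variants seen in the tumor, drop those also present in the normal tissue, and retain only non-synonymous changes. Everything in this sketch is an assumption for illustration, including the `Variant` record, the `effect` labels, and the toy coordinates; it is not the authors' pipeline.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    effect: str  # assumed annotation: "synonymous" or "non-synonymous"

def somatic_nonsynonymous(tumor, normal):
    """Variants present in tumor but absent from matched normal, restricted to non-synonymous."""
    normal_keys = {(v.chrom, v.pos, v.ref, v.alt) for v in normal}
    return [
        v for v in tumor
        if (v.chrom, v.pos, v.ref, v.alt) not in normal_keys
        and v.effect == "non-synonymous"
    ]

# Toy variants with made-up coordinates (not data from the study).
tumor = [
    Variant("chr1", 100, "A", "G", "non-synonymous"),  # also in normal: germline
    Variant("chr2", 200, "C", "T", "non-synonymous"),  # tumor only: kept
    Variant("chr3", 300, "G", "A", "synonymous"),      # tumor only but synonymous: dropped
]
normal = [Variant("chr1", 100, "A", "G", "non-synonymous")]

somatic = somatic_nonsynonymous(tumor, normal)
```

Matching on chromosome, position, reference and alternate allele, and not on the annotation, means a germline variant is excluded even if the two samples were annotated differently.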
Agapito, Giuseppe; Botta, Cirino; Guzzi, Pietro Hiram; Arbitrio, Mariamena; Di Martino, Maria Teresa; Tassone, Pierfrancesco; Tagliaferri, Pierosandro; Cannataro, Mario
Background: The identification of biomarkers for the estimation of cancer patients’ survival is a crucial problem in modern oncology. Recently, the Affymetrix DMET (Drug Metabolizing Enzymes and Transporters) microarray platform has offered the possibility to determine the ADME (absorption, distribution, metabolism, and excretion) gene variants of a patient and to correlate them with drug-dependent adverse events. Therefore, the analysis of survival distribution of patients starting from their profile obtained using DMET data may reveal important information to clinicians about possible correlations among drug response, survival rate, and gene variants. Methods: In order to provide support to this analysis we developed OSAnalyzer, a software tool able to compute the overall survival (OS) and progression-free survival (PFS) of cancer patients and evaluate their association with ADME gene variants. Results: The tool is able to perform an automatic analysis of DMET data enriched with survival events. Moreover, results are ranked according to statistical significance obtained by comparing the area under the curves that is computed by using the log-rank test, allowing a quick and easy analysis and visualization of high-throughput data. Conclusions: Finally, we present a case study to highlight the usefulness of OSAnalyzer when analyzing a large cohort of patients. PMID:27669316
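The abstract does not state how OSAnalyzer estimates the survival curves that it compares. As a hedged illustration, the sketch below uses the Kaplan-Meier estimator, the conventional choice for overall survival and progression-free survival with censored follow-up; the function name and the toy follow-up times are assumptions, not part of the tool.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time where an event was observed.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # patients leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Toy cohort: five patients, two of them censored (illustrative values only).
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
```

In practice one such curve would be computed per genotype group of an ADME variant, and the groups would then be compared with the log-rank test mentioned in the abstract.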
Alkhalili, Rawana N.; Bernfur, Katja; Dishisha, Tarek; Mamo, Gashaw; Schelin, Jenny; Canbäck, Björn; Emanuelsson, Cecilia; Hatti-Kaul, Rajni
A thermophilic bacterial strain, Geobacillus sp. ZGt-1, isolated from Zara hot spring in Jordan, was capable of inhibiting the growth of the thermophilic G. stearothermophilus and the mesophilic Bacillus subtilis and Salmonella typhimurium on a solid cultivation medium. Antibacterial activity was not observed when ZGt-1 was cultivated in a liquid medium; however, immobilization of the cells in agar beads that were subjected to sequential batch cultivation in the liquid medium at 60 °C showed increasing antibacterial activity up to 14 cycles. The antibacterial activity was lost on protease treatment of the culture supernatant. Concentration of the protein fraction by ammonium sulphate precipitation followed by denaturing polyacrylamide gel electrophoresis separation and analysis of the gel for antibacterial activity against G. stearothermophilus showed a distinct inhibition zone in 15–20 kDa range, suggesting that the active molecule(s) are resistant to denaturation by SDS. Mass spectrometric analysis of the protein bands around the active region resulted in identification of 22 proteins with molecular weight in the range of interest, three of which were new and are here proposed as potential antimicrobial protein candidates by in silico analysis of their amino acid sequences. Mass spectrometric analysis also indicated the presence of partial sequences of antimicrobial enzymes, amidase and dd-carboxypeptidase. PMID:27548162
Steffen-Munsberg, Fabian; Vickers, Clare; Kohls, Hannes; Land, Henrik; Mallin, Hendrik; Nobili, Alberto; Skalden, Lilly; van den Bergh, Tom; Joosten, Henk-Jan; Berglund, Per; Höhne, Matthias; Bornscheuer, Uwe T
In this review we analyse structure/sequence-function relationships for the superfamily of PLP-dependent enzymes with special emphasis on class III transaminases. Amine transaminases are highly important for applications in biocatalysis in the synthesis of chiral amines. In addition, other enzyme activities such as racemases or decarboxylases are also discussed. The substrate scope and the ability to accept chemically different types of substrates are shown to be reflected in conserved patterns of amino acids around the active site. These findings are condensed in a sequence-function matrix, which facilitates annotation and identification of biocatalytically relevant enzymes and protein engineering thereof.
Hadley, Stanton W; Gotham, Douglas J.; Luciani, Ralph L.
Between 2010 and 2012 the Eastern Interconnection Planning Collaborative (EIPC) conducted a major long-term resource and transmission study of the Eastern Interconnection (EI). With guidance from a Stakeholder Steering Committee (SSC) that included representatives from the Eastern Interconnection States Planning Council (EISPC) among others, the project was conducted in two phases. Phase 1 involved a long-term capacity expansion analysis that involved creation of eight major futures plus 72 sensitivities. Three scenarios were selected for more extensive transmission-focused evaluation in Phase 2. Five power flow analyses, nine production cost model runs (including six sensitivities), and three capital cost estimations were developed during this second phase. The results from Phase 1 and 2 provided a wealth of data that could be examined further to address energy-related questions. A list of 14 topics was developed for further analysis. This paper brings together the earlier interim reports of the first 13 topics plus one additional topic into a single final report.
Alam, Khalid K; Chang, Jonathan L; Burke, Donald H
High-throughput sequence (HTS) analysis of combinatorial selection populations accelerates lead discovery and optimization and offers dynamic insight into selection processes. An underlying principle is that selection enriches high-fitness sequences as a fraction of the population, whereas low-fitness sequences are depleted. HTS analysis readily provides the requisite numerical information by tracking the evolutionary trajectory of individual sequences in response to selection pressures. Unlike genomic data, for which a number of software solutions exist, user-friendly tools are not readily available for the combinatorial selections field, leading many users to create custom software. FASTAptamer was designed to address the sequence-level analysis needs of the field. The open source FASTAptamer toolkit counts, normalizes and ranks read counts in a FASTQ file, compares populations for sequence distribution, generates clusters of sequence families, calculates fold-enrichment of sequences throughout the course of a selection and searches for degenerate sequence motifs. While originally designed for aptamer selections, FASTAptamer can be applied to any selection strategy that can utilize next-generation DNA sequencing, such as ribozyme or deoxyribozyme selections, in vivo mutagenesis and various surface display technologies (peptide, antibody fragment, mRNA, etc.). FASTAptamer software, sample data and a user's guide are available for download at http://burkelab.missouri.edu/fastaptamer.html. PMID:25734917
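The counting, normalizing, ranking and fold-enrichment steps described above can be illustrated in a few lines. This is a minimal sketch and not FASTAptamer's own code: the function names are hypothetical, normalization to reads per million (RPM) is assumed, and the reads are toy sequences, not parsed FASTQ records.

```python
from collections import Counter

def count_reads(sequences):
    """Count identical sequences, rank them by abundance, and normalize to reads per million."""
    counts = Counter(sequences)
    total = sum(counts.values())
    # Sort by descending read count; ties are broken alphabetically for a stable ranking.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        {"rank": i + 1, "sequence": seq, "reads": n, "rpm": n / total * 1_000_000}
        for i, (seq, n) in enumerate(ranked)
    ]

def fold_enrichment(round_x, round_y):
    """RPM in round y divided by RPM in round x, for sequences found in both rounds."""
    rpm_x = {row["sequence"]: row["rpm"] for row in round_x}
    return {
        row["sequence"]: row["rpm"] / rpm_x[row["sequence"]]
        for row in round_y
        if row["sequence"] in rpm_x
    }

# Toy populations from an early and a late selection round (illustrative only).
early = ["ACGT"] * 2 + ["GGCC"] * 6 + ["TTAA"] * 2
late = ["ACGT"] * 6 + ["GGCC"] * 3 + ["TTAA"] * 1

ranked_early = count_reads(early)
enrichment = fold_enrichment(ranked_early, count_reads(late))
```

A value above 1 marks a sequence whose share of the population grew between rounds, which is the signal of high fitness that the abstract describes; normalizing before dividing keeps the ratio meaningful when the two rounds were sequenced to different depths.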
Grafström, Roland C; Nymark, Penny; Hongisto, Vesa; Spjuth, Ola; Ceder, Rebecca; Willighagen, Egon; Hardy, Barry; Kaski, Samuel; Kohonen, Pekka
This paper outlines the work for which Roland Grafström and Pekka Kohonen were awarded the 2014 Lush Science Prize. The research activities of the Grafström laboratory have, for many years, covered cancer biology studies, as well as the development and application of toxicity-predictive in vitro models to determine chemical safety. Through the integration of in silico analyses of diverse types of genomics data (transcriptomic and proteomic), their efforts have proved to fit well into the recently-developed Adverse Outcome Pathway paradigm. Genomics analysis within state-of-the-art cancer biology research and Toxicology in the 21st Century concepts share many technological tools. A key category within the Three Rs paradigm is the Replacement of animals in toxicity testing with alternative methods, such as bioinformatics-driven analyses of data obtained from human cell cultures exposed to diverse toxicants. This work was recently expanded within the pan-European SEURAT-1 project (Safety Evaluation Ultimately Replacing Animal Testing), to replace repeat-dose toxicity testing with data-rich analyses of sophisticated cell culture models. The aims and objectives of the SEURAT project have been to guide the application, analysis, interpretation and storage of 'omics' technology-derived data within the service-oriented sub-project, ToxBank. Particularly addressing the Lush Science Prize focus on the relevance of toxicity pathways, a 'data warehouse' that is under continuous expansion, coupled with the development of novel data storage and management methods for toxicology, serve to address data integration across multiple 'omics' technologies. The prize winners' guiding principles and concepts for modern knowledge management of toxicological data are summarised. The translation of basic discovery results ranged from chemical-testing and material-testing data, to information relevant to human health and environmental safety.
As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers: This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475
Cheng, X; Xiong, Y; Li, D H; Cheng, J; Cao, Y P; Yan, C C; Jin, Q; Sun, N; Cai, Y P; Lin, Y
With high nutritional value in its fruits, Dangshan Su pear has been widely cultivated in China. The stone cell content in fruits is a key factor affecting fruit quality in pear, and the formation of stone cells has been associated with lignin biosynthesis. O-Methyltransferase (OMT) is a key enzyme involved in lignin metabolism within the phenylpropanoid pathway. Here, we screened 26 OMT genes from the Pyrus bretschneideri cv. Dangshan Su genome using the DNATOOLs software. To characterize the OMT gene family in pear, gene structure, chromosomal localization, and conserved motifs of PbOMTs were analyzed. PbOMTs were divided into two categories, type I (designated PbCCOMTs) and type II (designated PbCOMTs), indicating the differentiation of function during evolution. Based on the analysis of multiple sequence alignment, cis-element prediction, and phylogenetic relationships, two candidate genes, PbCCOMT1 and PbCCOMT3, were selected for the analysis of temporal and spatial gene expression in pear. The promoter regions of both PbCCOMT1 and PbCCOMT3 contain regulatory motifs for lignin synthesis. Moreover, the two genes show high similarity and close phylogenetic relationships with CCOMTs in other species. Expression analysis showed that transcript levels of two PbCCOMTs were positively associated with the contents of both stone cells and lignin during the development of pear fruit. These results suggest that PbCCOMT1 and PbCCOMT3 are closely associated with lignin biosynthesis. These findings will help clarify the function of PbOMTs in lignin metabolism and to elucidate the mechanisms underlying stone cell formation in pear.
Biswas, Silpak; Raoult, Didier; Rolain, Jean-Marc
Intracellular bacteria survive within eukaryotic host cells and are difficult to kill with certain antibiotics. As a result, antibiotic resistance in intracellular bacteria is becoming commonplace in healthcare institutions. Owing to the lack of methods available for transforming these bacteria, we evaluated the mechanisms of resistance using molecular methods and in silico genome analysis. The objective of this review was to understand the molecular mechanisms of antibiotic resistance through in silico comparisons of the genomes of obligate and facultative intracellular bacteria. The available data on in vitro mutants reported for intracellular bacteria were also reviewed. These genomic data were analysed to find natural mutations in known target genes involved in antibiotic resistance and to look for the presence or absence of different resistance determinants. Our analysis revealed the presence of tetracycline resistance protein (Tet) in Bartonella quintana, Francisella tularensis and Brucella ovis; moreover, most of the Francisella strains possessed the blaA gene, AmpG protein and metallo-beta-lactamase family protein. The presence or absence of folP (dihydropteroate synthase) and folA (dihydrofolate reductase) genes in the genome could explain natural resistance to co-trimoxazole. Finally, multiple genes encoding different efflux pumps were studied. This in silico approach was an effective method for understanding the mechanisms of antibiotic resistance in intracellular bacteria. The whole genome sequence analysis will help to predict several important phenotypic characteristics, in particular resistance to different antibiotics. In the future, stable mutants should be obtained through transformation methods in order to demonstrate experimentally the determinants of resistance in intracellular bacteria.
Ji, Hong-Fang; Zhuang, Qi-Shuai; Shen, Liang
Our study investigated the shared genetic etiology underlying type 2 diabetes (T2D) and major depressive disorder (MDD) by analyzing large-scale genome wide association studies statistics. A total of 496 shared SNPs associated with both T2D and MDD were identified at p-value ≤ 1.0E-07. Functional enrichment analysis showed that the enriched pathways pertained to immune responses (Fc gamma R-mediated phagocytosis, T cell and B cell receptors signaling), cell signaling (MAPK, Wnt signaling), lipid metabolism, and cancer associated pathways. The findings will have potential implications for future interventional studies of the two diseases.
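One simple reading of the shared-SNP step is an intersection of the variants that pass the significance threshold in each study. The sketch below assumes that reading; the abstract does not say whether the threshold was applied per trait or to a combined statistic. The function name `shared_snps` and the SNP identifiers are hypothetical.

```python
def shared_snps(gwas_a, gwas_b, threshold=1.0e-7):
    """Return SNP IDs whose p-value is <= threshold in BOTH studies.

    gwas_a and gwas_b map SNP ID -> p-value from summary statistics.
    Applying the threshold to each trait separately is an assumption.
    """
    hits_a = {snp for snp, p in gwas_a.items() if p <= threshold}
    hits_b = {snp for snp, p in gwas_b.items() if p <= threshold}
    return sorted(hits_a & hits_b)


# Invented identifiers and p-values, for illustration only.
t2d = {"snp_1": 3e-9, "snp_2": 5e-8, "snp_3": 0.02}
mdd = {"snp_1": 8e-8, "snp_2": 4e-4, "snp_4": 1e-10}
common = shared_snps(t2d, mdd)
```

With these toy values only `snp_1` passes in both studies.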
Ding, Anming; Li, Ling; Qu, Xu; Sun, Tingting; Chen, Yaqiong; Zong, Peng; Li, Zunqiang; Gong, Daping; Sun, Yuhe
Pentatricopeptide repeat (PPR) genes constitute one of the largest gene families in plants and play a broad and essential role in plant growth and development. In this study, the protein sequences annotated by the tomato (S. lycopersicum L.) genome project were screened with the Pfam PPR sequences. A total of 471 putative PPR-encoding genes were identified. Based on the motifs defined in A. thaliana L., protein structure and conserved sequences for each tomato motif were analyzed. We also analyzed the phylogenetic relationships, subcellular localization, expression and GO annotation of the identified gene sequences. Our results demonstrate that the tomato PPR gene family contains two subfamilies, P and PLS, each accounting for half of the family. The PLS subfamily can be divided into four subclasses, i.e., PLS, E, E+ and DYW. Each subclass of sequences forms a clade in the phylogenetic tree. The PPR motifs were found to be highly conserved among plants. The tomato PPR genes were distributed over 12 chromosomes and most of them lack introns. The majority of PPR proteins harbor mitochondrial or chloroplast localization sequences, whereas GO analysis showed that most PPR proteins participate in RNA-related biological processes.
Wang, Jinlan; Chang, Fen
Toll-like receptors (TLRs) play an important role in the innate immune system. TLR15 is reported to have a unique role in defense against pathogens, but its structural and evolutionary characteristics are still poorly understood. In this study, we identified 57 complete TLR15 genes from avian and reptilian genomes. TLR15 clustered into an individual clade and was closely related to family 1 on the phylogenetic tree. Unlike the TLRs in family 1, which have broken asparagine ladders in the middle, the TLR15 ectodomain had an intact asparagine ladder that is critical to maintaining the overall shape of the ectodomain. The conservation analysis found that the TLR15 ectodomain had a highly evolutionarily conserved region on the convex surface of the LRR11 module, which is probably involved in the TLR15 activation process. Furthermore, the protein–protein docking analysis indicated that TLR15 TIR domains have the potential to form homodimers; the predicted interaction interface of the TIR dimer was formed mainly by residues from the BB-loops and αC-helices. Although TLR15 mainly underwent purifying selection, we detected 27 sites under positive selection for TLR15, 24 of which are located on its ectodomain. Our observations suggest structural features of TLR15 that may be relevant to its function but require further experimental validation. PMID:27257554
Xu, Jiahong; Liu, Yang; Xie, Yuan
Exercise-induced physiological cardiac hypertrophy is generally considered to be a type of adaptive change after exercise training and is beneficial for cardiovascular diseases. This study aims at investigating exercise-regulated microRNAs (miRNAs) and their potential biological pathways. Here, we collected 23 miRNAs from 8 published studies. MirPath v.3 from the DIANA tools website was used to execute the analysis, and TargetScan was used to predict the target genes. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were performed to identify potential pathways and functional annotations associated with exercise-induced physiological cardiac hypertrophy. Various miRNA targets and molecular pathways, such as Fatty acid elongation, Arrhythmogenic right ventricular cardiomyopathy (ARVC), and ECM-receptor interaction, were identified. This study could prompt the understanding of the regulatory mechanisms underlying exercise-induced physiological cardiac hypertrophy. PMID:28286759
Xu, Jiahong; Liu, Yang; Xie, Yuan; Zhao, Cuimei; Wang, Hongbao
Exercise-induced physiological cardiac hypertrophy is generally considered to be a type of adaptive change after exercise training and is beneficial for cardiovascular diseases. This study aims at investigating exercise-regulated microRNAs (miRNAs) and their potential biological pathways. Here, we collected 23 miRNAs from 8 published studies. MirPath v.3 from the DIANA tools website was used to execute the analysis, and TargetScan was used to predict the target genes. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were performed to identify potential pathways and functional annotations associated with exercise-induced physiological cardiac hypertrophy. Various miRNA targets and molecular pathways, such as Fatty acid elongation, Arrhythmogenic right ventricular cardiomyopathy (ARVC), and ECM-receptor interaction, were identified. This study could prompt the understanding of the regulatory mechanisms underlying exercise-induced physiological cardiac hypertrophy.
Yu, Shu-lin; Huang, Lu-qi; Yuan, Yuan; Qi, Lin-jie; Liu, Da-hui
To identify the key genes for chlorogenic acid biosynthesis in Lonicera hypoglauca, four new genes were obtained from our dataset of L. hypoglauca. We also predicted the structure and function of the LHPAL4, LHHCT1, LHHCT2 and LHHCT3 proteins. The phylogenetic tree showed that LHPAL4 was closely related to LHPAL1, LHHCT1 was closely related to LHHCT3, and LHHCT2 clustered into a single group. Using real-time PCR to measure gene expression levels in different organs of L. hypoglauca, we found that the transcript levels of LHPAL4, LHHCT1 and LHHCT3 were highest in defeat flowers, and the transcript level of LHHCT2 was highest in leaves. These results provide a basis for further analysis of the mechanism of active ingredients in different organs, as well as elements for in vitro biosynthesis of active ingredients.
Reisman, Steven; Hatzopoulos, Thomas; Läufer, Konstantin; Thiruvathukal, George K.; Putonti, Catherine
As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses were conducted for >6,000 HIV-1 sequences revealing spatial and temporal factors influence the evolution of the individual genes uniquely. Nevertheless, signatures of origin can be extrapolated even despite increased globalization. The approach developed here can easily be customized for any species of interest. PMID:26819543
Liang, Bin; Li, Chunning; Zhao, Jianying
Colorectal cancer (CRC) is the most common malignant tumor of digestive system. The aim of this study was to identify gene signatures during CRC and uncover their potential mechanisms. The gene expression profiles of GSE21815 were downloaded from GEO database. The GSE21815 dataset contained 141 samples, including 132 CRC and 9 normal colon epitheliums. The gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) enrichment analyses were performed, and protein-protein interaction (PPI) network of the differentially expressed genes (DEGs) was constructed by Cytoscape software. In total, 3500 DEGs were identified in CRC, including 1370 up-regulated genes and 2130 down-regulated genes. GO analysis results showed that up-regulated DEGs were significantly enriched in biological processes (BP), including cell cycle, cell division, and cell proliferation; the down-regulated DEGs were significantly enriched in biological processes, including immune response, intracellular signaling cascade and defense response. KEGG pathway analysis showed the up-regulated DEGs were enriched in cell cycle and DNA replication, while the down-regulated DEGs were enriched in drug metabolism, metabolism of xenobiotics by cytochrome P450, and retinol metabolism pathways. The top 10 hub genes, GNG2, AGT, SAA1, ADCY5, LPAR1, NMU, IL8, CXCL12, GNAI1, and CCR2 were identified from the PPI network, and sub-networks revealed these genes were involved in significant pathways, including G protein-coupled receptors signaling pathway, gastrin-CREB signaling pathway via PKC and MAPK, and extracellular matrix organization. In conclusion, the present study indicated that the identified DEGs and hub genes promote our understanding of the molecular mechanisms underlying the development of CRC, and might be used as molecular targets and diagnostic biomarkers for the treatment of CRC.
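Hub genes in a PPI network are often ranked by node degree, that is, the number of distinct interaction partners. The abstract does not state which centrality measure was used, so degree is an assumption here, and the function name `top_hubs` and the node labels are placeholders rather than real interactions.

```python
from collections import defaultdict


def top_hubs(edges, k=10):
    """Rank nodes of an undirected network by degree.

    edges is an iterable of (node_a, node_b) pairs. Self-loops are
    ignored and duplicate or reversed pairs are counted once.
    Returns the k highest-degree nodes as (node, degree) tuples.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Highest degree first; ties broken by name for determinism.
    ranked = sorted(neighbours, key=lambda g: (-len(neighbours[g]), g))
    return [(g, len(neighbours[g])) for g in ranked[:k]]


# Placeholder network, not taken from the study.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
hubs = top_hubs(edges, k=2)
```

In this toy network node A has three partners and is ranked first.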
Shen, Yinzhou; Wang, Xuelei; Jin, Yongchao; Lu, Jiasun; Qiu, Guangming; Wen, Xiaofei
The goal of this study was to identify cancer-associated differentially expressed genes (DEGs), analyze their biological functions and investigate the mechanism(s) of cancer occurrence and development, which may provide a theoretical foundation for bladder cancer (BCa) therapy. We downloaded the mRNA expression profiling dataset GSE13507 from the Gene Expression Omnibus database; the dataset includes 165 BCa and 68 control samples. T‑tests were used to identify DEGs. To further study the biological functions of the identified DEGs, we performed a Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. Next, we built a network of potentially interacting pathways to study the synergistic relationships among DEGs. A total of 12,105 genes were identified as DEGs, of which 5,239 were upregulated and 6,866 were downregulated in BCa. The DEGs encoding activator protein 1 (AP-1), nuclear factor of activated T-cells (NFAT) proteins, nuclear factor κ-light-chain-enhancer of activated B cells (NF-κB) and interleukin (IL)-10 were revealed to participate in the significantly enriched immune pathways that were downregulated in BCa. KEGG enrichment analysis revealed 7 significantly upregulated and 47 significantly downregulated pathways enriched among the DEGs. We found a crosstalk interaction among a total of 44 pathways in the network of BCa-affected pathways. In conclusion, our results show that BCa involves dysfunctions in multiple systems. Our study is expected to pave ways for immune and inflammatory research and provide molecular insights for cancer therapy.
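Pathway enrichment of a DEG list is commonly scored with a one-sided hypergeometric (over-representation) test. The abstract does not name the statistic behind its KEGG analysis, so the sketch below is an assumed, generic version; `enrichment_p` and its parameter names are hypothetical.

```python
from math import comb


def enrichment_p(n_universe, n_pathway, n_degs, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_universe: genes measured in total
    n_pathway:  genes annotated to the pathway
    n_degs:     differentially expressed genes
    n_overlap:  DEGs that fall in the pathway
    """
    upper = min(n_pathway, n_degs)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_degs)


# Small worked case: 10 genes, 4 in the pathway, 3 DEGs, 2 overlapping.
p = enrichment_p(10, 4, 3, 2)
```

For the worked case the tail is (6·6 + 4·1)/120 = 1/3. In practice the p-values across all tested pathways would also need a multiple-testing correction, which is not shown.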
Yan, Ming; Song, Maomin; Bai, Rixing; Cheng, Shi; Yan, Wenmao
The aim of the present study was to identify potential therapeutic targets for colorectal cancer (CRC). The gene expression profile GSE32323, containing 34 samples, including 17 specimens of CRC tissues and 17 of paired normal tissues from CRC patients, was downloaded from the Gene Expression Omnibus database. Following data preprocessing using the Affy and preprocessCore packages, the differentially-expressed genes (DEGs) between the two types of samples were identified with the Linear Models for Microarray Analysis package. Next, functional and pathway enrichment analysis of the DEGs was performed using the Database for Annotation Visualization and Integrated Discovery. The protein-protein interaction (PPI) network was established using the Search Tool for the Retrieval of Interacting Genes database. Utilizing WebGestalt, the potential microRNAs (miRNAs/miRs) of the DEGs were screened and the integrated miRNA-target network was built. A cohort of 1,347 DEGs was identified, the majority of which were mainly enriched in cell cycle-related biological processes and pathways. Cyclin-dependent kinase 1 (CDK1), cyclin B1 (CCNB1), MAD2 mitotic arrest deficient-like 1 (MAD2L1) and BUB1 mitotic checkpoint serine/threonine kinase B (BUB1B) were prominent in the PPI network, while the over-represented genes in the integrated miRNA-target network were SRY (sex determining region Y)-box 4 (SOX4; targeted by hsa-mir-129), v-myc avian myelocytomatosis viral oncogene homolog (MYC; targeted by hsa-let-7c and hsa-mir-145) and cyclin D1 (CCND1; targeted by hsa-let-7b). CDK1, CCNB1 and CCND1 were also associated with the p53 signaling pathway. Overall, several genes associated with the cell cycle and p53 pathway were identified as biomarkers for CRC. CDK1, CCNB1, MAD2L1, BUB1B, SOX4, collagen type I α2 chain and MYC may play significant roles in CRC progression by affecting the cell cycle-related pathways, while CDK1, CCNB1 and CCND1 may serve as crucial regulators in the p53
Shilpi, Arunima; Bi, Yingtao; Jung, Segun; Patra, Samir K.; Davuluri, Ramana V.
INTRODUCTION Breast cancer being a multifaceted disease constitutes a wide spectrum of histological and molecular variability in tumors. However, the task for the identification of these variances is complicated by the interplay between inherited genetic and epigenetic aberrations. Therefore, this study provides an extrapolate outlook to the sinister partnership between DNA methylation and single-nucleotide polymorphisms (SNPs) in relevance to the identification of prognostic markers in breast cancer. The effect of these SNPs on methylation is defined as methylation quantitative trait loci (meQTL). MATERIALS AND METHODS We developed a novel method to identify prognostic gene signatures for breast cancer by integrating genomic and epigenomic data. This is based on the hypothesis that multiple sources of evidence pointing to the same gene or pathway are likely to lead to reduced false positives. We also apply random resampling to reduce overfitting noise by dividing samples into training and testing data sets. Specifically, the common samples between Illumina 450 DNA methylation, Affymetrix SNP array, and clinical data sets obtained from the Cancer Genome Atlas (TCGA) for breast invasive carcinoma (BRCA) were randomly divided into training and test models. An intensive statistical analysis based on log-rank test and Cox proportional hazard model has established a significant association between differential methylation and the stratification of breast cancer patients into high- and low-risk groups, respectively. RESULTS The comprehensive assessment based on the conjoint effect of CpG–SNP pair has guided in delaminating the breast cancer patients into the high- and low-risk groups. In particular, the most significant association was found with respect to cg05370838–rs2230576, cg00956490–rs940453, and cg11340537–rs2640785 CpG–SNP pairs. These CpG–SNP pairs were strongly associated with differential expression of ADAM8, CREB5, and EXPH5 genes, respectively
Jin, Yazhong; Zhang, Chong; Liu, Wei; Qi, Hongyan; Chen, Hao; Cao, Songxiao
Cinnamyl alcohol dehydrogenase (CAD) is a key enzyme in lignin biosynthesis. However, little was known about CADs in melon. Five CAD-like genes were identified in the genome of melons, namely CmCAD1 to CmCAD5. The signal peptides analysis and CAD proteins prediction showed no typical signal peptides were found in all CmCADs and CmCAD proteins may locate in the cytoplasm. Multiple alignments implied that some motifs may be responsible for the high specificity of these CAD proteins, and may be one of the key residues in the catalytic mechanism. The phylogenetic tree revealed seven groups of CAD and melon CAD genes fell into four main groups. CmCAD1 and CmCAD2 belonged to the bona fide CAD group, in which these CAD genes, as representative from angiosperms, were involved in lignin synthesis. Other CmCADs were distributed in group II, V and VII, respectively. Semi-quantitative PCR and real time qPCR revealed differential expression of CmCADs, and CmCAD5 was expressed in different vegetative tissues except mature leaves, with the highest expression in flower, while CmCAD2 and CmCAD5 were strongly expressed in flesh during development. Promoter analysis revealed several motifs of CAD genes involved in the gene expression modulated by various hormones. Treatment of abscisic acid (ABA) elevated the expression of CmCADs in flesh, whereas the transcript levels of CmCAD1 and CmCAD5 were induced by auxin (IAA); Ethylene induced the expression of CmCADs, while 1-MCP repressed the effect, apart from CmCAD4. Taken together, these data suggested that CmCAD4 may be a pseudogene and that all other CmCADs may be involved in the lignin biosynthesis induced by both abiotic and biotic stresses and in tissue-specific developmental lignification through a CAD genes family network, and CmCAD2 may be the main CAD enzymes for lignification of melon flesh and CmCAD5 may also function in flower development. PMID:25019207
Shi, Ke-Qing; Lin, Zhuo; Chen, Xiang-Jian; Song, Mei; Wang, Yu-Qun; Cai, Yi-Jing; Yang, Nai-Bing; Zheng, Ming-Hua; Dong, Jin-Zhong; Zhang, Lei; Chen, Yong-Ping
microRNA (miRNA) expression profiles vary greatly among current studies due to different technological platforms and small sample sizes. A systematic and integrative analysis of published datasets that compared the miRNA expression profiles between hepatocellular carcinoma (HCC) tissue and paired adjacent noncancerous liver tissue was performed to determine candidate HCC-associated miRNAs. Moreover, we further validated the confirmed miRNAs in a clinical setting using qRT-PCR and The Cancer Genome Atlas (TCGA) dataset. A miRNA integrated-signature of 5 upregulated and 8 downregulated miRNAs was identified from 26 published datasets in HCC using the robust rank aggregation method. qRT-PCR demonstrated that the expression of miR-93-5p, miR-224-5p, miR-221-3p and miR-21-5p was increased, whereas the expression of miR-214-3p, miR-199a-3p, miR-195-5p, miR-150-5p and miR-145-5p was decreased in the HCC tissues, which was also validated on the TCGA dataset. A miRNA-based score using a LASSO regression model provided a high accuracy for identifying HCC tissue (AUC = 0.982): HCC risk score = 0.180E_miR-221 + 0.0262E_miR-21 - 0.007E_miR-223 - 0.185E_miR-130a, where E_miR-n = log2(expression of microRNA n). Furthermore, expression of 5 miRNAs (miR-222, miR-221, miR-21, miR-214 and miR-130a) correlated with pathological tumor grade. Cox regression analysis showed that miR-21 was related with 3-year survival (hazard ratio [HR]: 1.509, 95%CI: 1.079–2.112, P = 0.016) and 5-year survival (HR: 1.416, 95%CI: 1.057–1.897, P = 0.020). However, none of the deregulated miRNAs was related with microscopic vascular invasion. This study provides a basis for further clinical application of miRNAs in HCC. PMID:26231037
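The risk score above is a weighted sum of log2 expression values, which can be written out directly. The coefficients below are copied from the abstract; the function name `hcc_risk_score`, the input format, and the example expression values are assumptions, and the abstract gives neither an intercept nor a decision cutoff, so none is applied.

```python
from math import log2

# Coefficients as stated in the abstract.
COEFFS = {
    "miR-221": 0.180,
    "miR-21": 0.0262,
    "miR-223": -0.007,
    "miR-130a": -0.185,
}


def hcc_risk_score(expression):
    """Weighted sum of log2 expression for the four score miRNAs.

    expression maps miRNA name -> positive expression value; the
    measurement unit is not specified in the abstract.
    """
    return sum(w * log2(expression[m]) for m, w in COEFFS.items())


# Invented expression values chosen so the logs are whole numbers.
sample = {"miR-221": 4.0, "miR-21": 2.0, "miR-223": 8.0, "miR-130a": 2.0}
score = hcc_risk_score(sample)
```

For this sample the score is 0.180·2 + 0.0262·1 − 0.007·3 − 0.185·1 = 0.1802.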
Khobragade, Chandrahasya N.
A total of five highly related strains of an unidentified marine bacterium were analyzed through their short genome sequences (AM260709–AM260713). Genome-to-Genome Distance (GGDC) showed high similarity to Pseudoalteromonas haloplanktis (X67024). The generated unique Quick Response (QR) codes indicated no identity to other microbial species or gene sequences. Chaos Game Representation (CGR) showed the number of bases concentrated in the area. Guanine residues were highest in number followed by cytosine. Frequency of Chaos Game Representation (FCGR) indicated that CC and GG blocks have higher frequency in the sequence from the evaluated marine bacterium strains. Maximum GC content for the marine bacterium strains ranged 53-54%. The use of QR codes, CGR, FCGR, and GC dataset helped in identifying and interpreting short genome sequences from specific isolates. A phylogenetic tree was constructed with the bootstrap test (1000 replicates) using MEGA6 software. Principal Component Analysis (PCA) was carried out using EMBL-EBI MUSCLE program. Thus, generated genomic data are of great assistance for hierarchical classification in Bacterial Systematics which combined with phenotypic features represents a basic procedure for a polyphasic approach on unambiguous bacterial isolate taxonomic classification. PMID:27882328
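GC content, Chaos Game Representation coordinates and the k-mer counts behind an FCGR are all simple to compute from a sequence string. The sketch below is a generic illustration and not the software used in the study; the corner assignment (A, C, G, T at the four corners of the unit square) is one common convention, and the function names are hypothetical.

```python
from collections import Counter

# Assumed corner convention for the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def gc_content(seq):
    """Percentage of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(b) for b in "GC") / len(seq)


def cgr_points(seq):
    """Chaos Game Representation: start at the centre and move halfway
    towards the corner of each successive base. Non-ACGT symbols are
    skipped."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def kmer_counts(seq, k=2):
    """Overlapping k-mer counts; an FCGR bins these counts into a grid."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


gc = gc_content("GGCA")
pts = cgr_points("AC")
dinucs = kmer_counts("CCGG")
```

With `k=2` the counts correspond to the dinucleotide blocks (such as CC and GG) whose frequencies the abstract reports.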
Zhou, Xiaobo; Wong, Stephen T. C.
The premise of today’s drug development is that the mechanism of a disease is highly dependent upon underlying signaling and cellular pathways. Such pathways are often composed of complexes of physically interacting genes, proteins, or biochemical activities coordinated by metabolic intermediates, ions, and other small solutes and are investigated with molecular biology approaches in genomics, proteomics, and metabonomics. Nevertheless, the recent declines in the pharmaceutical industry’s revenues indicate such approaches alone may not be adequate in creating successful new drugs. Our observation is that combining methods of genomics, proteomics, and metabonomics with techniques of bioimaging will systematically provide powerful means to decode or better understand molecular interactions and pathways that lead to disease and potentially generate new insights and indications for drug targets. The former methods provide the profiles of genes, proteins, and metabolites, whereas the latter techniques generate objective, quantitative phenotypes correlating to the molecular profiles and interactions. In this paper, we describe pathway reconstruction and target validation based on the proposed systems biologic approach and show selected application examples for pathway analysis and drug screening. PMID:20011613
Sakabe, Noboru Jo; Vibranovski, Maria Dulcetti; de Souza, Sandro José
Alternative splicing increases protein diversity through the generation of different mRNA molecules from the same gene. Although alternative splicing seems to be a widespread phenomenon in the human transcriptome, it is possible that different subgroups of genes present different patterns, related to their biological roles. Analysis of a subgroup may enhance common features of its members that would otherwise disappear amidst a heterogeneous population. Extracellular matrix (ECM) proteins are a good set for such analyses since they are structurally and functionally related. This family of proteins is involved in a large variety of functions, probably achieved by the combinatorial use of protein domains through exon shuffling events. To determine if ECM genes have a different pattern of alternative splicing, we compared clusters of expressed sequences of ECM to all other genes regarding features related to the most frequent type of alternative splicing, alternative exon usage (AEU), such as: the number of alternative exon-intron structures per cluster, the number of AEU events per exon-intron structure, the number of exons per event, among others. Although we did not find many differences between the two sets, we observed a higher frequency of AEU events involving entire protein domains in the ECM set, a feature that could be associated with their multi-domain nature. As other subgroups or even the ECM set in different tissues could present distinct patterns of AEU, it may be premature to conclude that alternative splicing is homogeneous among groups of related genes.
Rentería, Miguel E.; Gandhi, Neha S.; Vinuesa, Pablo; Helmerhorst, Erik; Mancera, Ricardo L.
The insulin receptor (IR), the insulin-like growth factor 1 receptor (IGF1R) and the insulin receptor-related receptor (IRR) are covalently-linked homodimers made up of several structural domains. The molecular mechanism of ligand binding to the ectodomain of these receptors and the resulting activation of their tyrosine kinase domain is still not well understood. We have carried out an amino acid residue conservation analysis in order to reconstruct the phylogeny of the IR Family. We have confirmed the location of ligand binding site 1 of the IGF1R and IR. Importantly, we have also predicted the likely location of the insulin binding site 2 on the surface of the fibronectin type III domains of the IR. An evolutionary conserved surface on the second leucine-rich domain that may interact with the ligand could not be detected. We suggest a possible mechanical trigger of the activation of the IR that involves a slight ‘twist’ rotation of the last two fibronectin type III domains in order to face the likely location of insulin. Finally, a strong selective pressure was found amongst the IRR orthologous sequences, suggesting that this orphan receptor has a yet unknown physiological role which may be conserved from amphibians to mammals. PMID:18989367
Zhou, Yangyun; Zhou, Xun; Li, Qing; Chen, Junfeng; Xiao, Ying; Zhang, Lei; Chen, Wansheng
Production of the major effective metabolites, tanshinones and lithospermic acid B (LAB), was dramatically enhanced by exogenous jasmonate (JA) treatment in Salvia miltiorrhiza. However, the molecular mechanism of this metabolic activation in S. miltiorrhiza has not yet been elucidated. Here, we focused on jasmonate ZIM-domain (JAZ) proteins, which act as repressors of JA signaling. Open reading frames of two novel genes, SmJAZ1 and SmJAZ2, were amplified from S. miltiorrhiza according to the annotation of the S. miltiorrhiza transcriptome. In a phylogenetic analysis with other plant JAZs, SmJAZ1 and SmJAZ2 clustered into different groups. Organ expression patterns were studied by real-time quantitative PCR (RT-qPCR), which showed higher transcript levels of both genes in stems than in roots and leaves. The two SmJAZs responded to methyl jasmonate at an early stage, and their transcript levels increased significantly at 4 h. Our experimental results indicate that SmJAZ1 and SmJAZ2 are JA responsive and show a similar expression trend in the JA response. This research will facilitate further characterization of the effect of JAs on effective metabolites and help to ultimately achieve high yields of the target compounds (tanshinones and LAB).
Hu, Ya-ting; Gao, Wei; Liu, Yu-jia; Cheng, Qi-qing; Su, Ping; Liu, Yu-zhong; Chen, Min
Based on the transcriptome database of Salvia miltiorrhiza, specific primers were designed to clone a full-length cDNA of ent-kaurene oxidase synthase (SmKOL) using the RACE strategy. ORF Finder was used to find the open reading frame of the SmKOL cDNA, and ClustalW was used to perform the multiple amino acid sequence alignment. A phylogenetic tree was constructed using MEGA 5.1. The transcription level of SmKOL in hairy roots induced by the elicitor methyl jasmonate (MeJA) was quantified by real-time quantitative PCR. The full-length SmKOL cDNA was 1 884 bp long and encoded 519 amino acids. The molecular weight of the SmKOL protein was about 58.88 kDa, with an isoelectric point (pI) of 7.62. Real-time quantitative PCR analyses indicated that the level of SmKOL mRNA expression in hairy roots was increased by the elicitor MeJA and reached a maximum at 36 h. The full-length cDNA of SmKOL was cloned from S. miltiorrhiza hairy roots, which provides a target gene for further studies of its function, gibberellin biosynthesis, and the regulation of secondary metabolites.
Huang, Cui-Qin; Gasser, Robin B.; Cantacessi, Cinzia; Nisbet, Alasdair J.; Zhong, Weiwei; Sternberg, Paul W.; Loukas, Alex; Mulvenna, Jason; Lin, Rui-Qing; Chen, Ning; Zhu, Xing-Quan
Differential transcription in Ascaris suum was investigated using a genomic-bioinformatic approach. A cDNA archive enriched for molecules in the infective third-stage larva (L3) of A. suum was constructed by suppressive-subtractive hybridization (SSH), and a subset of cDNAs from 3075 clones subjected to microarray analysis using cDNA probes derived from RNA from different developmental stages of A. suum. The cDNAs (n = 498) shown by microarray analysis to be enriched in the L3 were sequenced and subjected to bioinformatic analyses using a semi-automated pipeline (ESTExplorer). Using gene ontology (GO), 235 of these molecules were assigned to ‘biological process’ (n = 68), ‘cellular component’ (n = 50), or ‘molecular function’ (n = 117). Of the 91 clusters assembled, 56 molecules (61.5%) had homologues/orthologues in the free-living nematodes Caenorhabditis elegans and C. briggsae and/or other organisms, whereas 35 (38.5%) had no significant similarity to any sequences available in current gene databases. Transcripts encoding protein kinases, protein phosphatases (and their precursors), and enolases were abundantly represented in the L3 of A. suum, as were molecules involved in cellular processes, such as ubiquitination and proteasome function, gene transcription, protein–protein interactions, and function. In silico analyses inferred the C. elegans orthologues/homologues (n = 50) to be involved in apoptosis and insulin signaling (2%), ATP synthesis (2%), carbon metabolism (6%), fatty acid biosynthesis (2%), gap junction (2%), glucose metabolism (6%), or porphyrin metabolism (2%), although 34 (68%) of them could not be mapped to a specific metabolic pathway. Small numbers of these 50 molecules were predicted to be secreted (10%), anchored (2%), and/or transmembrane (12%) proteins. Functionally, 17 (34%) of them were predicted to be associated with (non-wild-type) RNAi phenotypes in C. elegans, the majority being embryonic lethality
Jin, Yazhong; Zhang, Chong; Liu, Wei; Tang, Yufan; Qi, Hongyan; Chen, Hao; Cao, Songxiao
Alcohol dehydrogenases (ADH), encoded by a multigene family in plants, play a critical role in plant growth, development, adaptation, fruit ripening and aroma production. Thirteen ADH genes were identified in the melon genome, including 12 ADHs and one formaldehyde dehydrogenase (FDH), designated CmADH1-12 and CmFDH1, of which CmADH1 and CmADH2 have previously been isolated in Cantaloupe. The ADH genes shared low identity with each other at the protein level and had different intron-exon structures at the nucleotide level. No typical signal peptides were found in any of the CmADHs, and the CmADH proteins may be localized in the cytoplasm. The phylogenetic tree revealed that the 13 ADH genes were divided into three groups, namely the long-, medium-, and short-chain ADH subfamilies, and that CmADH1,3-11, which belong to the medium-chain ADH subfamily, fell into six medium-chain ADH subgroups. CmADH12 may belong to the long-chain ADH subfamily, while CmFDH1 may be a Class III ADH and serve as an ancestral ADH in melon. Expression profiling revealed that CmADH1, CmADH2, CmADH10 and CmFDH1 were moderately or strongly expressed in different vegetative tissues and in fruit at medium and late developmental stages, while CmADH8 and CmADH12 were highly expressed in fruit after 20 days. CmADH3 showed preferential expression in young tissues. CmADH4 was only slightly expressed in root. Promoter analysis revealed several motifs of CmADH genes involved in gene expression modulated by various hormones, and the response patterns of CmADH genes to ABA, IAA and ethylene were different. These CmADHs were divided into ethylene-sensitive and -insensitive groups, and the functions of CmADHs were discussed. PMID:27242871
Hao, Ruixin; Su, Shengzhong; Wan, Yinan; Shen, Frank; Niu, Ben; Coslo, Denise M; Albert, Istvan; Han, Xing; Omiecinski, Curtis J
The constitutive androstane receptor (CAR; NR1I3) is a member of the nuclear receptor superfamily that functions as a xenosensor, serving to regulate xenobiotic detoxification, lipid homeostasis and energy metabolism. CAR activation is also a key contributor to the development of chemical hepatocarcinogenesis in mice. The underlying pathways affected by CAR in these processes are complex and not fully elucidated. MicroRNAs (miRNAs) have emerged as critical modulators of gene expression and appear to impact many cellular pathways, including those involved in chemical detoxification and liver tumor development. In this study, we used deep sequencing approaches with an Illumina HiSeq platform to differentially profile microRNA expression patterns in livers from wild type C57BL/6J mice following CAR activation with the mouse CAR-specific ligand activator, 1,4-bis-[2-(3,5,-dichloropyridyloxy)] benzene (TCPOBOP). Bioinformatic analyses and pathway evaluations were performed leading to the identification of 51 miRNAs whose expression levels were significantly altered by TCPOBOP treatment, including mmu-miR-802-5p and miR-485-3p. Ingenuity Pathway Analysis of the differentially expressed microRNAs revealed altered effector pathways, including those involved in liver cell growth and proliferation. A functional network among CAR targeted genes and the affected microRNAs was constructed to illustrate how CAR modulation of microRNA expression may potentially mediate its biological role in mouse hepatocyte proliferation. This article is part of a Special Issue entitled: Xenobiotic nuclear receptors: New Tricks for An Old Dog, edited by Dr. Wen Xie.
Rahpeyma, Mehdi; Fotouhi, Fatemeh; Makvandi, Manouchehr; Ghadiri, Ata; Samarbaf-Zadeh, Alireza
Background Crimean-Congo hemorrhagic fever virus (CCHFV) is a member of the genus Nairovirus in the family Bunyaviridae and causes a life-threatening disease in humans. Currently, there is no vaccine against CCHFV, and the detailed structure of CCHFV proteins remains undefined. The CCHFV M RNA segment encodes two viral surface glycoproteins known as Gn and Gc. Viral glycoproteins can be considered key targets for vaccine development. Objectives The current study aimed to investigate the structural bioinformatics of the CCHFV Gn protein and to design a construct for generating a recombinant bacmid for expression in the baculovirus system. Materials and Methods To express the Gn protein in insect cells for use as an antigen in animal model vaccine studies, bioinformatic analysis of the CCHFV Gn protein was performed, a construct was designed and cloned into the pFastBacHTb vector, and a recombinant Gn-bacmid was generated with the Bac-to-Bac system. Results The primary, secondary, and 3D structures of CCHFV Gn were obtained, and PCR with M13 forward and reverse primers confirmed the generation of recombinant bacmid DNA harboring the Gn coding region under the polyhedrin promoter. Conclusions Characterization of the detailed structure of CCHFV Gn by bioinformatics software provides the basis for the development of new experiments and the construction of a recombinant bacmid harboring CCHFV Gn, which is valuable for designing a recombinant vaccine against deadly pathogens like CCHFV. PMID:26862379
Abouelhoda, Mohamed; Ghanem, Moustafa
Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying biological sequences (DNA, RNA, and proteins) at the level of their linear structure. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or more sequences. From a data mining point of view, sequence analysis is nothing but string or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced neither in the data mining nor in the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the term "data mining" is almost missing from the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis has introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages, fighting spam mail, detecting plagiarism, and spotting duplications in software systems.
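The two tasks named in this abstract, detecting repeated segments within one sequence and spotting common segments between sequences, can be illustrated with a deliberately naive fixed-length substring (k-mer) index. This is only a sketch for intuition, not a method taken from the chapter; the function names and the choice of a fixed length k are assumptions made for the example.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map every length-k substring of seq to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def repeated_segments(seq, k):
    """Intra-molecular similarity: k-mers occurring more than once in seq."""
    return {kmer: pos for kmer, pos in kmer_positions(seq, k).items() if len(pos) > 1}


def common_segments(a, b, k):
    """Inter-molecular similarity: k-mers present in both a and b."""
    return set(kmer_positions(a, k)) & set(kmer_positions(b, k))


if __name__ == "__main__":
    print(repeated_segments("ACGTACGA", 3))
    print(common_segments("ACGTT", "TTACG", 3))
```

A fixed k keeps the example short; practical tools generally rely on index structures such as suffix trees or suffix arrays so that repeats of any length can be reported efficiently.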
Bioinformatic evaluation of L-arginine catabolic pathways in 24 cyanobacteria and transcriptional analysis of genes encoding enzymes of L-arginine catabolism in the cyanobacterium Synechocystis sp. PCC 6803
Schriek, Sarah; Rückert, Christian; Staiger, Dorothee; Pistorius, Elfriede K; Michel, Klaus-Peter
Background So far, very limited knowledge exists on L-arginine catabolism in cyanobacteria, although six major L-arginine-degrading pathways have been described for prokaryotes. Thus, we have performed a bioinformatic analysis of possible L-arginine-degrading pathways in cyanobacteria. Further, we chose Synechocystis sp. PCC 6803 for a more detailed bioinformatic analysis and for validation of the bioinformatic predictions on L-arginine catabolism with a transcript analysis. Results We have evaluated 24 cyanobacterial genomes of freshwater or marine strains for the presence of putative L-arginine-degrading enzymes. We identified an L-arginine decarboxylase pathway in all 24 strains. In addition, cyanobacteria have one or two further pathways representing either an arginase pathway, an L-arginine deiminase pathway, or an L-arginine oxidase/dehydrogenase pathway. An L-arginine amidinotransferase pathway as a major L-arginine-degrading pathway is not likely but cannot be entirely excluded. A rather unusual finding was that the cyanobacterial L-arginine deiminases are substantially larger than the enzymes in non-photosynthetic bacteria and that they are membrane-bound. A more detailed bioinformatic analysis of Synechocystis sp. PCC 6803 revealed that three different L-arginine-degrading pathways may in principle be functional in this cyanobacterium. These are (i) an L-arginine decarboxylase pathway, (ii) an L-arginine deiminase pathway, and (iii) an L-arginine oxidase/dehydrogenase pathway. A transcript analysis of cells grown with either nitrate or L-arginine as the sole N-source and with an illumination of 50 μmol photons m⁻² s⁻¹ showed that the transcripts for the first enzyme(s) of all three pathways were present, but that the transcript levels for the L-arginine deiminase and the L-arginine oxidase/dehydrogenase were substantially higher than those of the three isoenzymes of L-arginine decarboxylase. Conclusion The evaluation of 24 cyanobacterial genomes revealed that
Gary J. Olsen
Nesbo, Boucher and Doolittle (2001) used phylogenetic trees of four taxa to assess whether euryarchaeal genes share a common history. They reported that, of the 521 genes examined, each of the three possible tree topologies relating the four taxa was supported an essentially equal number of times. They suggest that this might be the result of numerous horizontal gene transfer events, essentially randomizing the relationships between gene histories (as inferred in the 521 gene trees) and organismal relationships (which would be a single underlying tree). Because the order in which sequences are added to a multiple sequence alignment influences the alignment, and ultimately the inferred tree, we were interested in the extent to which the variations among inferred trees might be due to variations in the alignment order. This bears directly on our efforts to evaluate and improve upon methods of multiple sequence alignment. We set out to analyze the influence of alignment order on the tree inferred for 43 genes shared among these same 4 taxa. Because alignments produced by CLUSTALW are directed by a rooted guide tree (the dendrogram), there are 15 possible alignment orders of 4 taxa. For each gene we tested all 15 alignment orders and, as a 16th option, allowed CLUSTALW to generate its own guide tree. Having supplied all 15 possible rooted guide trees, we expected that at least one of them would be as good as CLUSTAL's own guide tree, but most of the time they differed (sometimes being better than CLUSTAL's default tree and sometimes being worse). The difference seems to be that the user-supplied tree is not given meaningful branch lengths, which affect the assumed probability of amino acid changes. We examined the practicality of modifying CLUSTALW to improve its treatment of user-supplied guide trees. This work became increasingly bogged down in finding and repairing minor bugs in the CLUSTALW code.
This effort was put on hold as we feel that our other proposed approaches will ultimately be better.
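The figure of 15 possible rooted guide trees for 4 taxa can be checked by direct enumeration. The sketch below is illustrative only and is not part of CLUSTALW or of the work described; the function name and the nested-frozenset tree representation are assumptions made for the example.

```python
from itertools import combinations


def rooted_trees(taxa):
    """Return every rooted binary tree on the given taxa.

    A leaf is a taxon label; an internal node is a frozenset holding its
    two child subtrees, so the left/right order of children is ignored.
    """
    taxa = tuple(taxa)
    if len(taxa) == 1:
        return [taxa[0]]
    first, rest = taxa[0], taxa[1:]
    trees = []
    # Split the taxa into two non-empty clades at the root. Pinning `first`
    # to the left clade ensures each split is generated exactly once.
    for k in range(len(rest)):
        for companions in combinations(rest, k):
            left = (first,) + companions
            right = tuple(t for t in rest if t not in companions)
            for left_tree in rooted_trees(left):
                for right_tree in rooted_trees(right):
                    trees.append(frozenset([left_tree, right_tree]))
    return trees


if __name__ == "__main__":
    print(len(rooted_trees("ABCD")))  # number of rooted guide trees for 4 taxa
```

The count follows the general formula (2n-3)!! for rooted binary trees on n labelled taxa, which gives 5 x 3 x 1 = 15 for n = 4; the corresponding unrooted count, (2n-5)!!, gives the three topologies mentioned at the start of the abstract.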
Zhang, Dong-Mei; Feng, Li-Xing; Li, Lu; Liu, Miao; Jiang, Bao-Hong; Yang, Min; Li, Guo-Qiang; Wu, Wan-Ying; Guo, De-An; Liu, Xuan
The sea dragon Solenognathus hardwickii has long been used as a traditional Chinese medicine for the treatment of various diseases, such as male impotency. To gain a comprehensive insight into the protein components of the sea dragon, shotgun proteomic analysis of its protein expression profile was conducted in the present study. Proteins were extracted from dried sea dragon using a trichloroacetic acid/acetone precipitation method and then separated by SDS-PAGE. The protein bands were cut from the gel and digested with trypsin to generate a peptide mixture. The peptide fragments were then analyzed using nano liquid chromatography tandem mass spectrometry (nano-LC-ESI MS/MS). A total of 810 proteins and 1 577 peptides were identified in the dried sea dragon. The identified proteins exhibited molecular weight values ranging from 1 900 to 3 516 900 Da and pI values from 3.8 to 12.18. Bioinformatic analysis was conducted using the DAVID Bioinformatics Resources 6.7 Gene Ontology (GO) analysis tool to explore possible functions of the identified proteins. The functions ascribed to the proteins mainly included intracellular non-membrane-bound organelle, non-membrane-bounded organelle, cytoskeleton, structural molecule activity, and calcium ion binding, among others. Furthermore, possible signal networks of the identified proteins were predicted using the STRING (Search Tool for the Retrieval of Interacting Genes) database. Ribosomal protein synthesis was found to play an important role in the signal network. The results of this study are, to the best of our knowledge, the first to provide a reference proteome profile for the sea dragon, and will aid in understanding the expression and functions of the identified proteins.
Petty, Tom J; Cordey, Samuel; Padioleau, Ismael; Docquier, Mylène; Turin, Lara; Preynat-Seauve, Olivier; Zdobnov, Evgeny M; Kaiser, Laurent
High-throughput sequencing (HTS) provides the means to analyze clinical specimens in unprecedented molecular detail. While this technology has been successfully applied to virus discovery and other related areas of research, HTS methodology has yet to be exploited for use in a clinical setting for routine diagnostics. Here, a bioinformatics pipeline (ezVIR) was designed to process HTS data from any of the standard platforms and to evaluate the entire spectrum of known human viruses at once, providing results that are easy to interpret and customizable. The pipeline works by identifying the most likely viruses present in the specimen given the sequencing data. Additionally, ezVIR can generate optional reports for strain typing, can create genome coverage histograms, and can perform cross-contamination analysis for specimens prepared in series. In this pilot study, the pipeline was challenged using HTS data from 20 clinical specimens representative of those most often collected and analyzed in daily practice. The specimens (5 cerebrospinal fluid, 7 bronchoalveolar lavage fluid, 5 plasma, 2 serum, and 1 nasopharyngeal aspirate) were originally found to be positive for a diverse range of DNA or RNA viruses by routine molecular diagnostics. The ezVIR pipeline correctly identified 14 of 14 specimens containing viruses with genomes of <40,000 bp, and 4 of 6 specimens positive for large-genome viruses. Although further validation is needed to evaluate sensitivity and to define detection cutoffs, results obtained in this pilot study indicate that the overall detection success rate, coupled with the ease of interpreting the analysis reports, makes it worth considering using HTS for clinical diagnostics.
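The overall detection success rate referred to in this abstract is not stated as a number, but it follows from the two counts given (14 of 14 and 4 of 6). A minimal sketch of that arithmetic is shown below; the dictionary layout and function names are illustrative assumptions and are not part of ezVIR.

```python
# Counts taken from the abstract: (correctly identified, total) per group.
results = {
    "genome < 40,000 bp": (14, 14),
    "large genome": (4, 6),
}


def detection_rate(identified, total):
    """Fraction of specimens in which the expected virus was identified."""
    return identified / total


def overall_rate(groups):
    """Pool all groups before dividing, so larger groups weigh more."""
    identified = sum(hit for hit, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return detection_rate(identified, total)


if __name__ == "__main__":
    for name, (hit, n) in results.items():
        print(f"{name}: {hit}/{n} = {detection_rate(hit, n):.1%}")
    print(f"overall: {overall_rate(results):.1%}")
```

Pooling the counts gives 18 of 20 specimens, whereas averaging the two group rates would give a different figure because the groups differ in size.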
Ho, Eric S.; Kuchie, Joan; Duffy, Siobain
Begomovirus (genus Begomovirus, family Geminiviridae) infection is devastating to a wide variety of agricultural crops including tomato, squash, and cassava. Thus, understanding the replication and adaptation of begomoviruses has important translational value in alleviating substantial economic loss, particularly in developing countries. The bipartite genome of begomoviruses prevalent in the New World and their counterparts in the Old World share a high degree of genome homology except for a partially overlapping reading frame encoding the pre-coat protein (PCP, or AV2). PCP contributes to the essential functions of intercellular movement and suppression of host RNA silencing, but it is only present in the Old World viruses. In this study, we analyzed a set of non-redundant bipartite begomovirus genomes originating from the Old World (N = 28) and the New World (N = 65). Our bioinformatic analysis suggests ∼120 nucleotides were deleted from PCP’s proximal promoter region that may have contributed to its loss in the New World viruses. Consequently, genomes of the New World viruses are smaller than the Old World counterparts, possibly compensating for the loss of the intercellular movement functions of PCP. Additionally, we detected substantial purifying selection on a portion of the New World DNA-B movement protein (MP, or BC1). Further analysis of the New World MP gene revealed the emergence of a putative tyrosine phosphorylation site, which likely explains the increased purifying selection in that region. These findings provide important information about the strategies adopted by bipartite begomoviruses in adapting to new environment and suggest future in planta experiments. PMID:25383632
Charoentong, Pornpimol; Angelova, Mihaela; Efremova, Mirjana; Gallasch, Ralf; Hackl, Hubert; Galon, Jerome; Trajanoski, Zlatko
Recent mechanistic insights obtained from preclinical studies and the approval of the first immunotherapies have motivated an increasing number of academic investigators and pharmaceutical/biotech companies to further elucidate the role of immunity in tumor pathogenesis and to reconsider the role of immunotherapy. Additionally, technological advances (e.g., next-generation sequencing) are providing unprecedented opportunities to draw a comprehensive picture of the tumor genomics landscape and ultimately enable individualized treatment. However, the increasing complexity of the generated data and the plethora of bioinformatics methods and tools pose considerable challenges to both tumor immunologists and clinical oncologists. In this review, we describe current concepts and future challenges for the management and analysis of data for cancer immunology and immunotherapy. We first highlight publicly available databases with a specific focus on cancer immunology, including databases for somatic mutations and epitope databases. We then give an overview of the bioinformatics methods for the analysis of next-generation sequencing data (whole-genome and exome sequencing), epitope prediction tools, and methods for integrative data analysis and network modeling. Mathematical models are powerful tools that can predict and explain important patterns in the genetic and clinical progression of cancer. Therefore, a survey of mathematical models for tumor evolution and tumor-immune cell interaction is included. Finally, we discuss future challenges for individualized immunotherapy and suggest how combined computational/experimental approaches can lead to new insights into the molecular mechanisms of cancer, improve diagnosis and prognosis of the disease, and pinpoint novel therapeutic targets.
Collado-Romero, Melania; Aguilar, Carmen; Arce, Cristina; Lucena, Concepción; Codrea, Marius C.; Morera, Luis; Bendixen, Emoke; Moreno, Ángela; Garrido, Juan J.
The enteropathogen Salmonella Typhimurium (S. Typhimurium) is the non-typhoidal serotype most commonly isolated in pigs worldwide. Currently, one of the main sources of human infection is the consumption of pork meat. Therefore, prevention and control of salmonellosis in pigs is crucial for minimizing risks to public health. The aim of the present study was to use isobaric tags for relative and absolute quantification (iTRAQ) to explore differences in the response to Salmonella in two segments of the porcine gut (ileum and colon) along a time course of 1, 2, and 6 days post infection (dpi) with S. Typhimurium. A total of 298 proteins were identified in the infected ileum samples, of which 112 displayed significant expression differences due to Salmonella infection. In colon, 184 proteins were detected in the infected samples, of which 46 were differentially expressed with respect to the controls. The highest number of changes in protein expression was quantified in ileum at 2 dpi. Further biological interpretation of the proteomics data using bioinformatics tools demonstrated that the expression changes in colon were found in proteins involved in cell death and survival, tissue morphology or molecular transport at the early stages and in tissue regeneration at 6 dpi. In ileum, however, changes in protein expression were mainly related to immunological and infectious diseases, inflammatory response or connective tissue disorders at 1 and 2 dpi. iTRAQ has proved to be a robust proteomic approach, allowing us to identify ileum as the earliest focus of the response to S. Typhimurium in the porcine gut. In addition, new functions involved in the response to bacteria, such as eIF2 signaling, free radical scavengers or antimicrobial peptide (AMP) expression, have been identified. Finally, the impairment of the enterohepatic circulation of bile acids and lipid metabolism by means of the downregulation of FABP6 protein and the FXR/RXR and LXR/RXR signaling pathways in ileum has been
Kunz, Meik; Wolf, Beat; Schulze, Harald; Atlan, David; Walles, Thorsten; Walles, Heike; Dandekar, Thomas
Lung cancer is currently the leading cause of cancer related mortality due to late diagnosis and limited treatment intervention. Non-coding RNAs are not translated into proteins and have emerged as fundamental regulators of gene expression. Recent studies reported that microRNAs and long non-coding RNAs are involved in lung cancer development and progression. Moreover, they appear as new promising non-invasive biomarkers for early lung cancer diagnosis. Here, we highlight their potential as biomarker in lung cancer and present how bioinformatics can contribute to the development of non-invasive diagnostic tools. For this, we discuss several bioinformatics algorithms and software tools for a comprehensive understanding and functional characterization of microRNAs and long non-coding RNAs. PMID:28035947
Abascal, María Florencia; Besso, María José; Rosso, Marina; Mencucci, María Victoria; Aparicio, Evangelina; Szapiro, Gala; Furlong, Laura Inés; Vazquez-Levin, Mónica Hebe
Cancer is a group of diseases that causes millions of deaths worldwide. Among cancers, Solid Tumors (ST) stand out due to their high incidence and mortality rates. Disruption of cell-cell adhesion is highly relevant during tumor progression. Epithelial-cadherin (protein: E-cadherin, gene: CDH1) is a key molecule in cell-cell adhesion; its abnormal expression and/or function contributes to tumor progression and is altered in ST. A systematic study was carried out to gather and summarize current knowledge on CDH1/E-cadherin and ST using bioinformatics resources. The DisGeNET database was exploited to survey CDH1-associated diseases. Reported mutations in specific ST were obtained by interrogating the COSMIC and IntOGen tools. CDH1 Single Nucleotide Polymorphisms (SNP) were retrieved from the dbSNP database. DisGeNET analysis identified 609 genes annotated to ST, among which CDH1 was listed. Using CDH1 as the query term, 26 disease concepts were found, 21 of which were neoplasm-related terms. Using DisGeNET ALL Databases, 172 disease concepts were identified. Of those, 80 ST disease-related terms were subjected to manual curation and 75/80 (93.75%) associations were validated. On selected ST, 489 CDH1 somatic mutations were listed in the COSMIC and IntOGen databases. Breast neoplasms had the highest CDH1 mutation rate. CDH1 was positioned among the 20 genes with the highest mutation frequency and was confirmed as a driver gene in breast cancer. Over 14,000 SNP for CDH1 were found in the dbSNP database. This report used DisGeNET to gather and compile current knowledge on gene-disease associations for CDH1/E-cadherin and ST; data curation expanded the number of terms that relate them. An updated list of CDH1 somatic mutations was obtained from the COSMIC and IntOGen databases, and of SNP from dbSNP. This information can be used to further understand the role of CDH1/E-cadherin in health and disease.
He, Wen-Xuan; Robert, Shanks; You, Ye-Ming
In the present paper, the organic additives in ethylene-propylene rubber were separated by medium-pressure chromatography under both normal-phase and reversed-phase conditions and identified by infrared spectrometry. In addition, a solid-phase extraction column was used to retain the main component of the organic additives, fuel oil, so that it would not interfere with minor compounds; the other organic additives were then separated and analysed by GC/MS. The remaining active compound, benzoyl peroxide, was identified by GC/MS through direct analysis of the acetone extract. Using these techniques, softening agents (fuel oil, plant oil and phthalate), a curing agent (benzoyl peroxide), vulcanizing accelerators (2-mercaptobenzothiazole, ethyl thiuram and butyl thiuram), and antiagers (2,6-di-tert-butyl-4-methylphenol and styrenated phenol) in ethylene-propylene rubber were identified. Although the technique was established for the ethylene-propylene rubber system, it can be applied to other rubber systems.
Smalheer, C. V.
The chemistry of lubricant additives is discussed to show what the additives are chemically and what functions they perform in the lubrication of various kinds of equipment. Current theories regarding the mode of action of lubricant additives are presented. The additive groups discussed include the following: (1) detergents and dispersants, (2) corrosion inhibitors, (3) antioxidants, (4) viscosity index improvers, (5) pour point depressants, and (6) antifouling agents.
Currently there are definitions from many agencies and research societies defining “bioinformatics” as deriving knowledge from computational analysis of large volumes of biological and biomedical data. Should this be the bioinformatics research focus? We will discuss this issue in this review article. We would like to promote the idea of supporting human-infrastructure (HI) with no-boundary thinking (NT) in bioinformatics (HINT). PMID:24192339
Schweighofer, Karl; Pohorille, Andrew
Building on an existing prototype, we have fielded a facility with bioinformatics technologies that will help NASA meet its unique requirements for biological research. This facility consists of a cluster of computers capable of performing computationally intensive tasks, software tools, databases and knowledge management systems. Novel computational technologies for analyzing and integrating new biological data and already existing knowledge have been developed. With continued development and support, the facility will fulfill NASA's strategic bioinformatics needs in astrobiology and space exploration. As a demonstration of these capabilities, we will present a detailed analysis of how spaceflight factors impact gene expression in the liver and kidney for mice flown aboard shuttle flight STS-108. We have found that many genes involved in signal transduction, cell cycle, and development respond to changes in microgravity, but that most metabolic pathways appear unchanged.
Lease, Kevin A; Walker, John C
Plant peptides play a number of important roles in defence, development and many other aspects of plant physiology. Identifying additional peptide sequences provides the starting point to investigate their function using molecular, genetic or biochemical techniques. Due to their small size, identifying peptide sequences may not succeed using the default bioinformatic approaches that work well for average-sized proteins. There are two general scenarios related to bioinformatic identification of peptides to be discussed in this paper. In the first scenario, one already has the sequence of a plant peptide and is trying to find more plant peptides with some sequence similarity to the starting peptide. To do this, the Basic Local Alignment Search Tool (BLAST) is employed, with the parameters adjusted to be more favourable for identifying potential peptide matches. A second scenario involves trying to identify plant peptides without using sequence similarity searches to known plant peptides. In this approach, features such as protein size and the presence of a cleavable amino-terminal signal peptide are used to screen annotated proteins. A variation of this method can be used to screen for unannotated peptides from genomic sequences. Bioinformatic resources related to Arabidopsis thaliana will be used to illustrate these approaches.
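The size-based screen described in the second scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the 150-residue cutoff, the function names, and the toy records are all hypothetical, and the signal-peptide test mentioned in the abstract is left to a dedicated predictor and is not implemented here.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def small_protein_candidates(records, max_length=150):
    """Return sorted identifiers of proteins with at most max_length residues.

    max_length=150 is an illustrative assumption, not a value from the abstract.
    A trailing '*' (stop symbol) is ignored when measuring length.
    """
    return sorted(
        name for name, seq in records.items()
        if len(seq.rstrip("*")) <= max_length
    )


# Toy input: two hypothetical annotated proteins.
example = """>pepA hypothetical small protein
MKTLLLAVFL
>protB hypothetical large protein
""" + "M" + "A" * 300

print(small_protein_candidates(parse_fasta(example)))
```

Candidates passing the size filter would then be checked for a cleavable amino-terminal signal peptide with a separate tool before any further analysis.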
Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari
Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students’ attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484
Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael
Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of bioinformatics as a new discipline has challenged many colleges and universities to keep current with their curricula, often in the face of static or dwindling resources. On the plus side, many bioinformatics modules and related databases and software programs are free and accessible online, and interdisciplinary partnerships between existing faculty members and their support staff have proved advantageous in such efforts. We present examples of strategies and methods that have been successfully used to incorporate bioinformatics content into undergraduate curricula. PMID:20810947
Czerny, Claus-Peter; König, Sven; Diesterbeck, Ulrike S.
We have developed a new bioinformatics framework for the analysis of rearranged bovine heavy chain immunoglobulin (Ig) variable regions by combining and refining widely used alignment algorithms. This bioinformatics framework allowed us to investigate alignments of heavy chain framework regions (FRHs) and the separate alignments of FRHs and heavy chain complementarity determining regions (CDRHs) to determine their germline origin in the four cattle breeds Aubrac, German Black Pied, German Simmental, and Holstein Friesian. It is now also possible to specifically analyze Ig heavy chains possessing exceptionally long CDR3Hs. In order to gain more insight into breed-specific differences in Ig combinatorial diversity, somatic hypermutations and putative gene conversions of IgG, we compared the dominantly transcribed variable (IGHV), diversity (IGHD), and joining (IGHJ) segments and their recombination in the four cattle breeds. The analysis revealed the use of 15 different IGHV segments, 21 IGHD segments, and two IGHJ segments with significantly different transcription levels within the breeds. Furthermore, there are preferred rearrangements within the three groups of CDR3H lengths. In the sequences of group 2 (CDR3H lengths (L) of 11–47 amino acid residues (aa)) a higher number of recombinations was observed than in sequences of group 1 (L≤10 aa) and group 3 (L≥48 aa). The combinatorial diversity of germline IGHV, IGHD, and IGHJ segments revealed 162 rearrangements that were significantly different. The few preferentially rearranged gene segments within group 3 CDR3H regions may indicate specialized antibodies because this length is unique in cattle. The most important finding of this study, which was enabled by using the bioinformatics framework, is the discovery of strong evidence for gene conversion as a rare event using pseudogenes fulfilling all definitions for this particular diversification mechanism. PMID:27828971
The MocR bacterial transcriptional regulators are characterized by an N-terminal domain, 60 residues long on average, possessing the winged-helix-turn-helix (wHTH) architecture responsible for DNA recognition and binding, linked to a large C-terminal domain (350 residues on average) that is homologous to fold type-I pyridoxal 5′-phosphate (PLP) dependent enzymes like aspartate aminotransferase (AAT). These regulators are involved in the expression of genes taking part in several metabolic pathways directly or indirectly connected to PLP chemistry, many of which are still uncharacterized. A bioinformatics analysis is here reported that studied the features of a distinct group of MocR regulators predicted to be functionally linked to a family of homologous genes coding for integral membrane proteins of unknown function. This group occurs mainly in the Actinobacteria and Gammaproteobacteria phyla. An analysis of the multiple sequence alignments of their wHTH and AAT domains suggested the presence of specificity-determining positions (SDPs). Mapping of SDPs onto a homology model of the AAT domain hinted at possible structural/functional roles in effector recognition. Likewise, SDPs in wHTH domain suggested the basis of specificity of Transcription Factor Binding Site recognition. The results reported represent a framework for rational design of experiments and for bioinformatics analysis of other MocR subgroups. PMID:27446613
Background There is a rapidly growing awareness that plant peptide signalling molecules are numerous and varied and they are known to play fundamental roles in angiosperm plant growth and development. Two closely related peptide signalling molecule families are the CLAVATA3-EMBRYO-SURROUNDING REGION (CLE) and CLE-LIKE (CLEL) genes, which encode precursors of secreted peptide ligands that have roles in meristem maintenance and root gravitropism. Progress in peptide signalling molecule research in gymnosperms has lagged behind that of angiosperms. We therefore sought to identify CLE and CLEL genes in gymnosperms and conduct a comparative analysis of these gene families with angiosperms. Results We undertook a meta-analysis of the GenBank/EMBL/DDBJ gymnosperm EST database and the Picea abies and P. glauca genomes and identified 93 putative CLE genes and 11 CLEL genes among eight Pinophyta species, in the genera Cryptomeria, Pinus and Picea. The predicted conifer CLE and CLEL protein sequences had close phylogenetic relationships with their homologues in Arabidopsis. Notably, perfect conservation of the active CLE dodecapeptide in presumed orthologues of the Arabidopsis CLE41/44-TRACHEARY ELEMENT DIFFERENTIATION (TDIF) protein, an inhibitor of tracheary element (xylem) differentiation, was seen in all eight conifer species. We cloned the Pinus radiata CLE41/44-TDIF orthologues. These genes were preferentially expressed in phloem in planta as expected, but unexpectedly, also in differentiating tracheary element (TE) cultures. Surprisingly, transcript abundances of these TE differentiation-inhibitors sharply increased during early TE differentiation, suggesting that some cells differentiate into phloem cells in addition to TEs in these cultures. Applied CLE13 and CLE41/44 peptides inhibited root elongation in Pinus radiata seedlings. We show evidence that two CLEL genes are alternatively spliced via 3′-terminal acceptor exons encoding separate CLEL peptides
Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D
Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for
Pantano, Lorena; Estivill, Xavier; Martí, Eulàlia
High-throughput sequencing technologies enable direct approaches to catalog and analyze snapshots of the total small RNA content of living cells. Characterization of high-throughput sequencing data requires bioinformatic tools offering a wide perspective of the small RNA transcriptome. Here we present SeqBuster, a highly versatile and reliable web-based toolkit to process and analyze large-scale small RNA datasets. The high flexibility of this tool is illustrated by the multiple choices offered in the pre-analysis for mapping purposes and in the different analysis modules for data manipulation. To overcome the storage capacity limitations of the web-based tool, SeqBuster offers a stand-alone version that permits annotation against any custom database. SeqBuster integrates multiple analysis modules in a single platform and constitutes the first bioinformatic tool offering a deep characterization of miRNA variants (isomiRs). The application of SeqBuster to small-RNA datasets of human embryonic stem cells revealed that most miRNAs present different types of isomiRs, some of them being associated with stem cell differentiation. The exhaustive description of the isomiRs provided by SeqBuster could help to identify miRNA variants that are relevant in physiological and pathological processes. SeqBuster is available at http://estivill_lab.crg.es/seqbuster.
Searls, David B.
Online learning initiatives over the past decade have become increasingly comprehensive in their selection of courses and sophisticated in their presentation, culminating in the recent announcement of a number of consortium and startup activities that promise to make a university education on the internet, free of charge, a real possibility. At this pivotal moment it is appropriate to explore the potential for obtaining comprehensive bioinformatics training with currently existing free video resources. This article presents such a bioinformatics curriculum in the form of a virtual course catalog, together with editorial commentary, and an assessment of strengths, weaknesses, and likely future directions for open online learning in this field. PMID:23028269
Mihalas, George I; Tudor, Anca; Paralescu, Sorin; Andor, Minodora; Stoicu-Tivadar, Lacramioara
The paper refers to our methodology and experience in establishing the content of the course in bioinformatics introduced to the school of "Information Systems in Healthcare" (SIIS), master level. The syllabi of both lectures and laboratory works are presented and discussed.
Ye, Jishi; Zhang, Zongze; Wang, Yanlin; Chen, Chang; Xu, Xing; Yu, Hui; Peng, Mian
Although accumulating evidence has suggested that microRNAs (miRNAs) have a serious impact on cognitive function and are associated with the etiology of several neuropsychiatric disorders, their expression in sevoflurane-induced neurotoxicity in the developing brain has not been characterized. In the present study, the miRNA expression pattern in neonatal hippocampus samples (24 h after sevoflurane exposure) was investigated, and 9 miRNAs associated with brain development and cognition were selected for bioinformatic analysis. A previous microfluidic chip assay had detected 29 upregulated and 24 downregulated miRNAs in the neonatal rat hippocampus, of which 7 selected deregulated miRNAs were identified by quantitative polymerase chain reaction. A total of 85 targets of the selected deregulated miRNAs were analyzed using bioinformatics, and the main enriched pathways (metabolic pathways, mitogen-activated protein kinase and Wnt pathways) may have been involved in molecular mechanisms relating to the neuronal cell body, dendrite and synapse. The observations of the present study provide a novel understanding of the regulatory mechanism of miRNAs underlying sevoflurane-induced neurotoxicity, which may benefit the improvement of prevention and treatment strategies for volatile anesthetic-related neurotoxicity. PMID:27588052
Lu, Guoqing; Ni, Jun
The Second Symposium on Computations in Bioinformatics and Bioscience (SCBB07) was held in Iowa City, Iowa, USA, on August 13-15, 2007. This annual event attracted dozens of bioinformatics professionals and students, who are interested in solving emerging computational problems in bioscience, from China, Japan, Taiwan and the United States. The Scientific Committee of the symposium selected 18 peer-reviewed papers for publication in this supplemental issue of BMC Bioinformatics. These papers cover a broad spectrum of topics in computational biology and bioinformatics, including DNA, protein and genome sequence analysis, gene expression and microarray analysis, computational proteomics and protein structure classification, systems biology and machine learning.
Healy, Marion J; Tong, Weida; Ostroff, Stephen; Eichler, Hans-Georg; Patak, Alex; Neuspiel, Margaret; Deluyker, Hubert; Slikker, William
"Regulatory Bioinformatics" strives to develop and implement a standardized and transparent bioinformatic framework to support the implementation of existing and emerging technologies in regulatory decision-making. It has great potential to improve public health through the development and use of clinically important medical products and tools to manage the safety of the food supply. However, the application of regulatory bioinformatics also poses new challenges and requires new knowledge and skill sets. In the latest Global Coalition on Regulatory Science Research (GCRSR) governed conference, Global Summit on Regulatory Science (GSRS2015), regulatory bioinformatics principles were presented with respect to global trends, initiatives and case studies. The discussion revealed that datasets, analytical tools, skills and expertise are rapidly developing, in many cases via large international collaborative consortia. It also revealed that significant research is still required to realize the potential applications of regulatory bioinformatics. While there is significant excitement in the possibilities offered by precision medicine to enhance treatments of serious and/or complex diseases, there is a clear need for further development of mechanisms to securely store, curate and share data, integrate databases, and standardized quality control and data analysis procedures. A greater understanding of the biological significance of the data is also required to fully exploit vast datasets that are becoming available. The application of bioinformatics in the microbiological risk analysis paradigm is delivering clear benefits both for the investigation of food borne pathogens and for decision making on clinically important treatments. It is recognized that regulatory bioinformatics will have many beneficial applications by ensuring high quality data, validated tools and standardized processes, which will help inform the regulatory science community of the requirements
The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation, dating from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the life sciences and systems biology, as well as clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and a student symposium, the Singapore Symposium on Computational Biology (SYMBIO), were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection of tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in the undergraduate bioinformatics curriculum provides information on how to impart targeted education to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. PMID:19958508
Johnson, Kathy A.
For the purpose of this paper, bioinformatics is defined as the application of computer technology to the management of biological information. It can be thought of as the science of developing computer databases and algorithms to facilitate and expedite biological research. This is a crosscutting capability that supports nearly all human health areas ranging from computational modeling, to pharmacodynamics research projects, to decision support systems within autonomous medical care. Bioinformatics serves to increase the efficiency and effectiveness of the life sciences research program. It provides data, information, and knowledge capture which further supports management of the bioastronautics research roadmap - identifying gaps that still remain and enabling the determination of which risks have been addressed.
This paper provides an overview of methods and current applications of distributed computing in bioinformatics. Distributed computing is a strategy of dividing a large workload among multiple computers to reduce processing time, or to make use of resources such as programs and databases that are not available on all computers. Participating computers may be connected either through a local high-speed network or through the Internet.
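The idea of dividing a large workload among several workers can be sketched as below. This is a minimal illustration, not a method from the paper: a local thread pool stands in for the participating computers, and the function names (`split_workload`, `run_distributed`, `gc_count`) and the toy sequences are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_workload(items, n_workers):
    """Split items into n_workers contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks receive one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def gc_count(seqs):
    """Toy task: count G and C bases across a chunk of sequences."""
    return sum(s.count("G") + s.count("C") for s in seqs)


def run_distributed(items, n_workers, task):
    """Apply task to each chunk in parallel and return the partial results.

    A thread pool is used here only as a stand-in for separate machines.
    """
    chunks = split_workload(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(task, chunks))


partials = run_distributed(["GC", "AT", "GG", "CA"], 2, gc_count)
print(partials, sum(partials))
```

In a real deployment each chunk would be sent to a different machine and the partial results combined afterwards; the split-apply-combine structure stays the same.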
Rasheed, Zafar; Al-Shobaili, Hani A; Rasheed, Naila; Al Salloom, Abdulaziz A M; Al-Shaya, Osama; Mahmood, Amer; Alajez, Nehad M; Alghamdi, Ahmed S S; Mehana, El-Sayed E
This study was undertaken to identify and characterize the globally expressed microRNAs (miRNAs) involved in interleukin-1β (IL-1β)-induced joint damage and to predict whether miRNAs can regulate the catabolic effects in osteoarthritis (OA) chondrocytes. Out of 1347 miRNAs analyzed by microarrays in IL-1β-stimulated OA chondrocytes, 35 miRNAs were down-regulated, 1 miRNA was up-regulated, and the expression of 1311 miRNAs remained unchanged. Bioinformatics analysis showed that key inflammatory mediators and key molecular pathways are targeted by the differentially expressed miRNAs. The novel miRNAs identified could have important diagnostic and therapeutic potential in the development of new strategies for pain management in OA.
Gaudana, Sandeep B; Zarzycki, Jan; Moparthi, Vamsi K; Kerfeld, Cheryl A
Cyanobacteria have evolved a carbon-concentrating mechanism (CCM) which has enabled them to inhabit diverse environments encompassing a range of inorganic carbon (Ci: HCO3− and CO2) concentrations. Several uptake systems facilitate inorganic carbon accumulation in the cell, which can in turn be fixed by ribulose 1,5-bisphosphate carboxylase/oxygenase. Here we survey the distribution of genes encoding known Ci uptake systems in cyanobacterial genomes and, using a pfam- and gene context-based approach, identify in the marine (alpha) cyanobacteria a heretofore unrecognized number of putative counterparts to the well-known Ci transporters of beta cyanobacteria. In addition, our analysis shows that there is a huge repertoire of transport systems in cyanobacteria of unknown function, many with homology to characterized Ci transporters. These can be viewed as prospective targets for conversion into ancillary Ci transporters through bioengineering. Increasing intracellular Ci concentration coupled with efforts to increase carbon fixation will be beneficial for the downstream conversion of fixed carbon into value-added products including biofuels. In addition to CCM transporter homologs, we also survey the occurrence of rhodopsin homologs in cyanobacteria, including bacteriorhodopsin, a class of retinal-binding, light-activated proton pumps. Because they are light driven and because of the apparent ease of altering their ion selectivity, we use this as an example of re-purposing an endogenous transporter for the augmentation of Ci uptake by cyanobacteria and potentially chloroplasts.
Shtratnikova, Victoria Y; Schelkunov, Mikhail I; Fokina, Victoria V; Pekov, Yury A; Ivashina, Tanya; Donova, Marina V
Actinobacteria comprise diverse groups of bacteria capable of full degradation or modification of different steroid compounds. Steroid catabolism has been characterized best for representatives of the suborder Corynebacterineae, such as Mycobacteria, Rhodococcus and Gordonia, with a high content of mycolic acids in the cell envelope, while it is poorly understood for other steroid-transforming actinobacteria, such as representatives of the Nocardioides genus belonging to the suborder Propionibacterineae. Nocardioides simplex VKM Ac-2033D is an important biotechnological strain which is known for its ability to introduce a ∆(1)-double bond into various 1(2)-saturated 3-ketosteroids, and to perform conversion of 3β-hydroxy-5-ene steroids to 3-oxo-4-ene steroids, hydrolysis of acetylated steroids, and reduction of carbonyl groups at C-17 and C-20 of androstanes and pregnanes, respectively. The strain is also capable of utilizing cholesterol and phytosterol as carbon and energy sources. In this study, a comprehensive bioinformatics genome-wide screening was carried out to predict genes related to steroid metabolism in this organism, their clustering and possible regulation. The predicted operon structure and the number of candidate gene copies (paralogs) have been estimated. Binding sites of the steroid catabolism regulators KstR and KstR2 specific to N. simplex VKM Ac-2033D have been calculated de novo. Most of the candidate genes grouped within three main clusters, one of the predicted clusters having no analogs in other actinobacteria studied so far. The results offer a basis for further functional studies, expand the understanding of steroid catabolism by actinobacteria, and will contribute to the modification of metabolic pathways in order to generate effective biocatalysts capable of producing valuable bioactive steroids.
Lu, J C; Zhang, Y P
In this study, we examined the molecular mechanism of thyroid carcinoma (THCA) using bioinformatics. RNA-sequencing data of THCA (N = 498) and normal thyroid tissue (N = 59) were downloaded from The Cancer Genome Atlas. Next, gene expression levels were calculated using the TCC package and differentially expressed genes (DEGs) were identified using the edgeR package. A co-expression network was constructed using the EBcoexpress package and visualized by Cytoscape, and functional and pathway enrichment of DEGs in the co-expression network was analyzed with DAVID and KOBAS 2.0. Moreover, modules in the co-expression network were identified and annotated using MCODE and BiNGO plugins. Small-molecule drugs were analyzed using the cMAP database, and miRNAs and transcription factors regulating DEGs were identified by WebGestalt. A total of 254 up-regulated and 59 down-regulated DEGs were identified between THCA samples and controls. DEGs enriched in biological process terms were related to cell adhesion, death, and growth and negatively correlated with various small-molecule drugs. The co-expression network of the DEGs consisted of hub genes (ITGA3, TIMP1, KRT19, and SERPINA1) and one module (JUN, FOSB, and EGR1). Furthermore, 5 miRNAs and 5 transcription factors were identified, including E2F, HSF2, and miR-26. miR-26 may participate in THCA by targeting CITED1 and PLA2R1; E2F may participate in THCA by regulating ITGA3, TIMP1, KRT19, EGR1, and JUN; HSF2 may be involved in THCA development by regulating SERPINA1 and FOSB; and small-molecule drugs may have anti-THCA effects. Our results provide novel directions for mechanistic studies and drug design of THCA.
Song, X C; Xu, C; Yue, Z G; Wang, L; Wang, G W; Yang, F H
Myostatin, encoded by the MSTN gene (previously GDF8), is a member of the transforming growth factor-β superfamily, which normally acts to limit skeletal muscle mass by regulating the number and growth of muscle fibers. In this study, a total of 84 myostatin gene sequences with known complete coding regions (CDS) and corresponding amino acid sequences were analyzed from 17 species, and differentiation within and among species was studied using comparative genomics and bioinformatics. Characteristics of the nucleotide and amino acid sequences were also predicted. The results indicated that a total of 569 polymorphic sites, including 53 singleton variable sites and 516 parsimony informative sites, which could be sorted into 44 haplotypes, were detected from 17 species. Observed genetic diversity was higher among species than within species, and Vulpes lagopus was more polymorphic than other species. There was clear differentiation of the myostatin gene among species and the reconstructed phylogenetic tree was consistent with the NCBI taxonomy. The myostatin gene was 375-aa long in most species, except for Mus musculus (376 aa) and Danio rerio (373 aa). The amino acid sequences of myostatin were deemed hydrophilic, and had theoretical pI values of <7.0, mostly due to the acidic polypeptide. The instability index of the myostatin protein was 40.48-51.63, indicating that the polypeptide is not stable. The G+C content of the CDS nucleotide sequence in different species was 40.60-51.69%. The predicted promoter region of the Ovis aries myostatin gene was 150-220 bp upstream of the start codon.
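The G+C content reported for the coding sequences is a simple ratio and can be computed as sketched below. This is an illustrative helper, not the authors' software: the function name and the toy sequences are assumptions, and ambiguous bases (such as N) are counted in the sequence length but not as G or C.

```python
def gc_content(seq):
    """Return the G+C content of a nucleotide sequence as a percentage.

    The percentage is (count of G + count of C) / sequence length * 100.
    Input is case-insensitive; an empty sequence raises ValueError.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


# Toy sequences, not taken from the study.
print(gc_content("ATGC"))
print(round(gc_content("atgcgc"), 2))
```

Applied to each complete CDS in turn, a helper of this kind would yield the per-species values summarized in the abstract as a range.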
Hess, Jonathan L; Glatt, Stephen J
The gene that encodes zinc finger protein 804A (ZNF804A) became a candidate risk gene for schizophrenia (SZ) after surpassing genome-wide significance thresholds in replicated genome-wide association scans and meta-analyses. Much remains unknown about this reported gene expression regulator; however, preliminary work has yielded insights into functional and biological effects of ZNF804A by targeting its regulatory activities in vitro and by characterizing allele-specific interactions with its risk-conferring single nucleotide polymorphisms (SNPs). There is now strong epidemiologic evidence for a role of ZNF804A polymorphisms in both SZ and bipolar disorder (BD); however, functional links between implicated variants and susceptible biological states have not been solidified. Here we briefly review the genetic evidence implicating ZNF804A polymorphisms as genetic risk factors for both SZ and BD, and discuss the potential functional consequences of these variants on the regulation of ZNF804A and its downstream targets. Empirical work and predictive bioinformatic analyses of the alternate alleles of the two most strongly implicated ZNF804A polymorphisms suggest they might alter the affinity of the gene sequence for DNA- and/or RNA-binding proteins, which might in turn alter expression levels of the gene or particular ZNF804A isoforms. Future work should focus on clarifying the critical periods and cofactors regulating these genetic influences on ZNF804A expression, as well as the downstream biological consequences of an imbalance in the expression of ZNF804A and its various mRNA isoforms.
Xing, Jiang-feng; Zhu, Ru-nan; Qian, Yuan; Zhao, Lin-qing; Deng, Jie; Wang, Fang; Sun, Yu
The aim of this study was to characterize the N and E protein-encoding genes of a new human coronavirus (HCoV-NL63) identified in one of the clinical specimens (BJ8081) collected from a 12-year-old patient with acute respiratory infection in Beijing. The complete N and E gene sequences of HCoV-NL63 were amplified from the clinical sample by RT-PCR, cloned into the pCF-T and pUCm-T vectors, respectively, and sequenced. The complete sequences of the N and E genes were submitted to GenBank via Sequin and compared with the N and E genes of prototype HCoV-NL63 and the other coronaviruses published in GenBank. The secondary structure and characteristics of the N and E proteins of sample BJ8081 were predicted by bioinformatics. The N and E genes amplified from sample BJ8081 were 1134 bp and 234 bp in length, and the predicted proteins comprised 377 and 77 amino acids, respectively. The data suggested that amino acids 78-85 of the N protein probably form a region conserved among all coronaviruses identified so far, including HCoV-NL63. Amino acids 15-37 of the E protein probably form the transmembrane domain. In conclusion, the recombinant plasmids pCF-T-8081 N and pUCm-T-8081 E were successfully constructed and sequenced, and the data predicted by bioinformatics are helpful for the further analysis of HCoV-NL63.
The emergence of scientific disciplines, as well as the policies aimed to steer them, have geographical implications. This becomes visible in areas such as genomics and related fields. In this paper, the relation between scientific evolution, political decisions and geographical configuration is studied. The recent formation of bioinformatics in Brazil is focused on. The study involves an analysis of data collected on the website of CNPq, a funding agency attached to the Ministry of Science and Technology. Furthermore, I conducted fieldwork in four cities, interviewing 15 bioinformaticians. In the history of Brazilian bioinformatics, three periods can be identified. In the first period (1900-1996), bioinformatics was actually absent, but biology research groups were formed which would subsequently explore bioinformatics. The second period (1997-2006) was marked by the emergence of the discipline and geographical concentration of major research groups in the southern part of Brazil. A third period can be pointed to (2007-2014), in which political choices have turned geographical diffusion and institutional equality into a national target. As a consequence of the recent shifts, genomics and bioinformatics researchers have been involved in a debate, some defending the existence of few specialized research and sequencing platforms, whereas others welcoming the constitution of a scientific scenario based on decentralized platforms. I defend an intermediate solution, whereby some places would be selected to be genomics hubs. This would fit the regional diversity of this vast country, in addition to tackling the scientific weaknesses of the northern area.
Li, Haili; Tang, Wenru; Jia, Shuting; Wu, Xiaoming; Luo, Ying
Background The function of the tumor suppressor gene RASSF1A in cancer cells has been detailed in many studies. However, due to the methylation of its promoter, the expression of RASSF1A is missing in most cancers. In the literature, we found that the conclusion regarding the relationship between RASSF1A gene promoter methylation and the susceptibility and prognosis of melanoma was not unified. This study adopts the use of a meta-analysis and bioinformatics to explore the relationship between RASSF1A gene promoter methylation and the susceptibility and prognosis of melanoma. Methods Data on melanoma susceptibility were downloaded from the PubMed, Cochrane Library, Web of Science and Google Scholar databases, which were analyzed via a meta-analysis. The effect sizes were estimated by measuring an odds ratio (OR) with a 95% confidence interval (CI). We also used a chi-squared-based Q test to examine the between-study heterogeneity, and used funnel plots to evaluate publication bias. The data on melanoma prognosis, which were analyzed by bioinformatics methods, were downloaded from The Cancer Genome Atlas (TCGA) project. The effect sizes were estimated by measuring the hazard ratios (HRs) with a 95% confidence interval (CI). Results Our meta-analysis included 10 articles. We found that RASSF1A gene promoter methylation was closely related to melanoma susceptibility (OR = 12.67, 95% CI: 6.16 ∼ 26.05, z = 6.90, P<0.0001 according to a fixed effects model and OR = 9.25, 95% CI: 4.37 ∼ 19.54, z = 5.82, P<0.0001 according to a random effects model). The results of the meta-analysis did not reveal any heterogeneity (tau2 = 0.00; H = 1 [1; 1.55]; I2 = 0% [0%; 58.6%], P = 0.5158) or publication bias (t = 0.87, P = 0.4073 by Egger’s test; Z = 0.45, P = 0.6547 by Begg’s test); therefore, we believe that the results of our meta-analysis were more reliable. To explore the relationship between RASSF1A gene methylation, the prognosis of melanoma and the clinical features of
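The pooled odds ratio, its 95% CI, the Q test and I2 mentioned above can be illustrated with a short sketch. It assumes inverse-variance weighting of log odds ratios under a fixed effects model, which may differ from the estimator the authors used; the function name, the 0.5 continuity correction and the 2x2 counts are all hypothetical.

```python
import math


def fixed_effect_or(tables):
    """Pool odds ratios from 2x2 tables with inverse-variance weights.

    Each table is (a, b, c, d): methylated cases, unmethylated cases,
    methylated controls, unmethylated controls.
    """
    log_ors, weights = [], []
    for a, b, c, d in tables:
        if 0 in (a, b, c, d):
            # Continuity correction for empty cells (an assumption of this sketch).
            a, b, c, d = (x + 0.5 for x in (a, b, c, d))
        log_ors.append(math.log((a * d) / (b * c)))
        weights.append(1.0 / (1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d))

    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / total_w
    se = math.sqrt(1.0 / total_w)

    # Cochran's Q and the I2 statistic for between-study heterogeneity.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
        "z": pooled / se,
        "q": q,
        "i2": i2,
    }


# Hypothetical counts for three studies; not data from the meta-analysis.
toy_tables = [(12, 28, 1, 39), (20, 30, 2, 48), (9, 21, 0, 25)]
result = fixed_effect_or(toy_tables)
print({k: round(v, 3) for k, v in result.items()})
```

A random effects model would add a between-study variance term to each weight; that step is left out here to keep the sketch short.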
Bansard, Jean-Yves; Rebholz-Schuhman, Dietrich; Cameron, Graham; Clark, Dominic; van Mulligen, Erik; Beltrame, Francesco; Del Hoyo Barbolla, Eva; Martin-Sanchez, Fernando; Milanesi, Luciano; Tollis, Ioannis; Van der Lei, Johan; Coatrieux, Jean-Louis
This paper reports on an analysis of the bioinformatics and medical informatics literature with the objective to identify upcoming trends that are shared among both research fields to derive benefits from potential collaborative initiatives for their future. Our results present the main characteristics of the two fields and show that these domains are still relatively separated. PMID:17521073
Backofen, Rolf; Amman, Fabian; Costa, Fabrizio; Findeiß, Sven; Richter, Andreas S; Stadler, Peter F
The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes and the need to also characterize their non-coding components are heavily dependent on computational methods and workflows, many of which have been developed or at least adapted specifically for use with bacterial and archaeal data. This review provides an overview of the state of the art of RNA bioinformatics, focusing on applications to prokaryotes. PMID:24755880
Zhang, Li; Wang, Shi-Bo; Li, Qi-Gang; Song, Jian; Hao, Yu-Qi; Zhou, Ling; Zheng, Huan-Quan; Dunwell, Jim M.; Zhang, Yuan-Ming
Seed oils provide a renewable source of food, biofuel and industrial raw materials that is important for humans. Although many genes and pathways for acyl-lipid metabolism have been identified, little is known about whether there is a specific mechanism for high-oil content in high-oil plants. Based on the distinct differences in seed oil content between four high-oil dicots (20~50%) and three low-oil grasses (<3%), comparative genome, transcriptome and differential expression analyses were used to investigate this mechanism. Among 4,051 dicot-specific soybean genes identified from 252,443 genes in the seven species, 54 genes were shown to directly participate in acyl-lipid metabolism, and 93 genes were found to be associated with acyl-lipid metabolism. Among the 93 dicot-specific genes, 42 and 27 genes, including CBM20-like SBDs and GPT2, participate in carbohydrate degradation and transport, respectively. 40 genes highly up-regulated during seed oil rapid accumulation period are mainly involved in initial fatty acid synthesis, triacylglyceride assembly and oil-body formation, for example, ACCase, PP, DGAT1, PDAT1, OLEs and STEROs, which were also found to be differentially expressed between high- and low-oil soybean accessions. Phylogenetic analysis revealed distinct differences of oleosin in patterns of gene duplication and loss between high-oil dicots and low-oil grasses. In addition, seed-specific GmGRF5, ABI5 and GmTZF4 were predicted to be candidate regulators in seed oil accumulation. This study facilitates future research on lipid biosynthesis and potential genetic improvement of seed oil content. PMID:27159078
Bao, Yu-Ting; Li, Zhe-Ming; Zhou, Xiao-Jie; He, Jia-Na; Dai, Shi-Jie; Li, Chang yu
CREA levels. In the livers of the BBR group, we found 154 DEGs, including 91 genes with up-regulated expression and 63 genes with down-regulated expression. In addition, GO enrichment analysis showed significant enrichment of the DEGs in the following categories: metabolic process, localization, cellular process, biological regulation and response to stimulus process. After the gene screening, KEGG pathway analysis showed that the target genes are involved in multiple pathways, including the lysine degradation, glycosaminoglycan biosynthesis-chondroitin sulfate/dermatan sulfate and pyruvate metabolism pathways. By combining the results of PPI network and KEGG pathway analyses, we identified seven key node genes. The qRT-PCR results confirmed that the expression of the RHOA, MAPK4 and DLAT genes was significantly down-regulated compared with the levels in the DM group, whereas the expression of the SgK494, DOT1L, SETD2 and ME3 genes was significantly up-regulated in the BBR group. Conclusion Berberine can significantly improve glucose metabolism and has protective effects on liver and kidney function in ZDF rats. The qRT-PCR results for the crucial DEGs validated the microarray results. These results suggested that the RHOA, MAPK4, SGK494, DOT1L, SETD2, ME3 and DLAT genes are potential therapeutic target genes for the treatment of diabetes. PMID:27846294
Tolvanen, Martti; Vihinen, Mauno
Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material…
Cantacessi, C; Campbell, B E; Jex, A R; Young, N D; Hall, R S; Ranganathan, S; Gasser, R B
The advent and integration of high-throughput '-omics' technologies (e.g. genomics, transcriptomics, proteomics, metabolomics, glycomics and lipidomics) are revolutionizing the way biology is done, allowing the systems biology of organisms to be explored. These technologies are now providing unique opportunities for global, molecular investigations of parasites. For example, studies of a transcriptome (all transcripts in an organism, tissue or cell) have become instrumental in providing insights into aspects of gene expression, regulation and function in a parasite, which is a major step to understanding its biology. The purpose of this article was to review recent applications of next-generation sequencing technologies and bioinformatic tools to large-scale investigations of the transcriptomes of parasitic nematodes of socio-economic significance (particularly key species of the order Strongylida) and to indicate the prospects and implications of these explorations for developing novel methods of parasite intervention.
Hou, Chunyu; Wang, Fei; Liu, Xuewen; Chang, Guangming; Wang, Feng; Geng, Xin
Telomerase reverse transcriptase (TERT) is the protein component of the telomerase complex. Evidence has accumulated showing that the non-telomeric functions of TERT are independent of telomere elongation. However, the mechanisms governing the interaction between TERT and its target genes have not been clearly revealed. The biological functions of TERT are not fully elucidated and have thus far been underestimated. To further explore these functions, we investigated TERT interaction networks using multiple bioinformatic databases, including BioGRID, STRING, DAVID, GeneCards, GeneMANIA, PANTHER, miRWalk, mirTarBase, miRNet, miRDB and TargetScan. In addition, network diagrams were built using Cytoscape software. Since competing endogenous RNAs (ceRNAs) are endogenous transcripts that compete for the binding of microRNAs by using shared microRNA recognition elements (MREs), they are involved in creating widespread regulatory networks. Therefore, the ceRNA regulatory networks of TERT were also investigated in the present study. Interestingly, we found that three genes, PABPC1, SLC7A11 and TP53, were present in both the TERT interaction networks and the ceRNA target genes. It was predicted that TERT might play non-telomeric roles in the generation or development of some rare diseases, such as Rift Valley Fever and Dyscalculia. Thus, our data will help to decipher the interaction networks of TERT and reveal the unknown functions of telomerase in cancer and aging-related diseases.
Li, Zhiying; Li, Dan
The progressive addition lens is increasingly widely used because it meets the requirements of distance and near vision at the same time. Starting from the surface equations of the progressive addition lens, and combining them with the evaluation method for spherical power and cylinder power, the equations relating the surface sag to the optical power distribution are derived. According to the requirements on the difference between actual and nominal optical power in the Chinese National Standard, the tolerance analysis and evaluation of a prototype progressive addition surface with an addition of 2.5 m-1 (7.5 m-1 10 m-1) is given in detail. The tolerance analysis method provides a theoretical basis for the control accuracy of lens processing, and allows the processing feasibility of the lens to be evaluated more reasonably.
Waseem, Hassan; Williams, Maggie R; Stedtfeld, Tiffany; Chai, Benli; Stedtfeld, Robert D; Cole, James R; Tiedje, James M; Hashsham, Syed A
Virulence factor activity relationships (VFARs), a concept loosely based on quantitative structure-activity relationships (QSARs) for chemicals, were proposed as a predictive tool for ranking risks due to microorganisms relevant to water safety. A rapid increase in sequencing capabilities and bioinformatics tools has significantly increased the potential for VFAR-based analyses. This review summarizes more than 20 bioinformatics databases and tools, developed over the last decade, along with their virulence and antimicrobial resistance prediction capabilities. With the number of bacterial whole genome sequences exceeding 241 000, metagenomic analysis projects exceeding 13 000, and the ability to add genome sequences for a few hundred dollars, it is evident that further development of VFARs is not limited by the availability of information, at least at the genomic level. However, additional information related to co-occurrence, treatment response, modulation of virulence due to environmental and other factors, and economic impact must be gathered and incorporated in a manner that also addresses the associated uncertainties. Of the bioinformatics tools, a majority are either designed exclusively for virulence/resistance determination or equipped with a dedicated module. The remaining tools have the potential to be employed for evaluating virulence. This review, focusing broadly on omics technologies and tools, supports the notion that these tools are now sufficiently developed to allow the application of VFAR approaches combined with additional engineering and economic analyses to rank and prioritize organisms important to a given niche. Knowledge gaps do exist but can be filled with focused experimental and theoretical analyses that were unimaginable a decade ago. Further developments should consider the integration of the measurement of activity, risk, and uncertainty to improve the current capabilities.
Beshears, Ronald D.
Computed tomography (CT) inspection was performed on test articles additively manufactured from metallic materials. Metallic AM and machined wrought alloy test articles with programmed flaws were inspected using a 2MeV linear accelerator based CT system. Performance of CT inspection on identically configured wrought and AM components and programmed flaws was assessed using standard image analysis techniques to determine the impact of additive manufacturing on inspectability of objects with complex geometries.
Varzandian, Bahareh; Ghaderi-Zefrehei, Mostafa; Hosseinzadeh, Saeid; Sayyadi, Mostafa; Taghadosi, Vahideh; Varzandian, Sara
Cytokines are immune regulators that play an essential role in regulating the immune response against various infections. The present study focused on the possible association between mastitis and the expression level of Interleukin 10 (IL-10) in blood and milk samples of 25 healthy and 25 mastitic cows in Fars province, Iran, using a quantitative real-time PCR assay. The experimental groups were categorized according to the number of calvings. The expression level of IL-10 was significantly higher in the blood and milk samples of mastitic cows compared to the healthy ones. As the number of calvings increased, a numerical elevation in the expression of IL-10 in blood was observed (P < 0.05). The bioinformatics analysis of the IL-10 gene revealed the promoter, exon-intron regions, and nucleosome profile. The nucleosome occupancy site was predicted using NUPOP software. Our results indicated that the promoter was not exactly placed in the nucleosome region; this analysis was ultimately aimed at predicting the position and expression of the IL-10 gene in the mastitic cows.
The suppression subtractive hybridization (SSH) approach, a PCR-based approach that amplifies differentially expressed cDNAs (complementary DNAs) while simultaneously suppressing amplification of common cDNAs, was employed to identify immune-inducible genes in insects. This technique has been used as a suitable tool for experimental identification of novel genes in eukaryotes as well as prokaryotes, in species whose genomes have been sequenced as well as in species whose genomes have yet to be sequenced. In this article, I propose a method for in silico functional characterization of immune-inducible genes from insects. Apart from immune-inducible genes from insects, this method can be applied to the analysis of genes from other species, from bacteria to plants and animals. This article provides a background to the SSH-based method, taking specific examples from innate immune-inducible genes in insects, and subsequently proposes a bioinformatics pipeline for functional characterization of newly sequenced genes. The workflow presented here can also be applied to any newly sequenced species generated from Next Generation Sequencing (NGS) platforms.
Seguin, Jonathan; Otten, Patricia; Baerlocher, Loïc; Farinelli, Laurent; Pooggin, Mikhail M
In most eukaryotes, small RNA (sRNA) molecules such as miRNAs, siRNAs and piRNAs regulate gene expression and repress transposons and viruses. AGO/PIWI family proteins sort functional sRNAs based on size, 5'-nucleotide and other sequence features. In plants and some animals, viral sRNAs are extremely diverse and cover the entire viral genome sequences, which allows for de novo reconstruction of a complete viral genome by deep sequencing and bioinformatics analysis of viral sRNAs. Previously, we developed a tool, MISIS, to view and analyze sRNA maps of viruses and cellular genome regions which spawn multiple sRNAs. Here we describe a new release of MISIS, MISIS-2, which makes it possible to determine and visualize a consensus sequence and to count sRNAs of any chosen sizes and 5'-terminal nucleotide identities. Furthermore, we demonstrate the utility of MISIS-2 for identification of single nucleotide polymorphisms (SNPs) at each position of a reference sequence and reconstruction of a consensus master genome in evolving viral quasispecies. MISIS-2 is a Java standalone program. It is freely available along with the source code at the website http://www.fasteris.com/apps.
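The idea of deriving a consensus sequence and per-position variants from mapped sRNA reads can be sketched as a simple pileup. This is an illustration only and is not the MISIS-2 implementation (which is a Java program); the read representation as (start, sequence) pairs, the function names and the 20% minor-allele threshold are assumptions made for this example.

```python
from collections import Counter


def pileup(reads, ref_len):
    """Count bases per reference position from (start, sequence) reads."""
    counts = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < ref_len:
                counts[pos][base] += 1
    return counts


def consensus(counts, uncovered="N"):
    """Most frequent base at each position; 'N' where no read covers it."""
    return "".join(c.most_common(1)[0][0] if c else uncovered for c in counts)


def variant_sites(counts, min_fraction=0.2):
    """List positions where a non-consensus base reaches min_fraction of depth."""
    sites = []
    for pos, c in enumerate(counts):
        depth = sum(c.values())
        if depth == 0:
            continue
        major = c.most_common(1)[0][0]
        minors = {b: n / depth for b, n in c.items()
                  if b != major and n / depth >= min_fraction}
        if minors:
            sites.append((pos, major, minors))
    return sites


# Hypothetical reads already aligned to a 4-nt reference (0-based starts).
toy_reads = [(0, "ACGT"), (0, "ACGA"), (1, "CGT")]
counts = pileup(toy_reads, ref_len=4)
print(consensus(counts))
print(variant_sites(counts))
```

Filtering reads by length or 5'-terminal nucleotide before calling `pileup` would mimic the size and 5'-nucleotide selection described in the abstract.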
Dabdoub, Shareef M.; Fellows, Megan L.; Paropkari, Akshay D.; Mason, Matthew R.; Huja, Sarandeep S.; Tsigarida, Alexandra A.; Kumar, Purnima S.
The 16S rRNA gene is widely used for taxonomic profiling of microbial ecosystems; and recent advances in sequencing chemistry have allowed extremely large numbers of sequences to be generated from minimal amounts of biological samples. Analysis speed and resolution of data to species-level taxa are two important factors in large-scale explorations of complex microbiomes using 16S sequencing. We present here new software, Phylogenetic Tools for Analysis of Species-level Taxa (PhyloToAST), that completely integrates with the QIIME pipeline to improve analysis speed, reduce primer bias (requiring two sequencing primers), enhance species-level analysis, and add new visualization tools. The code is free and open source, and can be accessed at http://phylotoast.org. PMID:27357721
Raymond, Margaret; And Others
Describes an experiment on the simultaneous determination of chromium and magnesium by spectrophotometry modified to include the Generalized Standard Addition Method computer program, a multivariate calibration method that provides optimal multicomponent analysis in the presence of interference and matrix effects. Provides instructions for…
Greene, Casey S; Tan, Jie; Ung, Matthew; Moore, Jason H; Cheng, Chao
Recent technological advances allow for high throughput profiling of biological systems in a cost-efficient manner. The low cost of data generation is leading us to the "big data" era. The availability of big data provides unprecedented opportunities but also raises new challenges for data mining and analysis. In this review, we introduce key concepts in the analysis of big data, including both "machine learning" algorithms as well as "unsupervised" and "supervised" examples of each. We note packages for the R programming language that are available to perform machine learning analyses. In addition to programming based solutions, we review webservers that allow users with limited or no programming background to perform these analyses on large data compendia.
Skarzyńska, Agnieszka; Pawełkowicz, Magdalena; Pląder, Wojciech; Przybecki, Zbigniew
Real-time quantitative polymerase chain reaction is considered the most reliable method for gene expression studies. However, the expression of a target gene can be misinterpreted due to improper normalization. Therefore, the crucial step in the analysis of qPCR data is the selection of suitable reference genes, which should be validated experimentally. In order to choose genes with stable expression in the designed experiment, we performed a reference gene expression analysis. In this study, genes described in the literature and novel genes predicted as control genes, based on in silico analysis of transcriptome data, were used. Analysis with the geNorm and NormFinder algorithms allowed us to rank the candidate genes and indicate the best reference for the flower morphogenesis study. According to the results, the genes CACS and CYCL had the most stable expression, whereas the least suitable genes were TUA and EF.
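The ranking of candidate reference genes can be illustrated with the geNorm stability measure M: for each gene, the average standard deviation of its log2 expression ratios against every other candidate, where a lower M means more stable expression. The sketch below assumes relative expression quantities (not raw Cq values) as input and does not reproduce the stepwise exclusion of the full geNorm procedure; the gene labels and numbers are hypothetical.

```python
import math
from statistics import stdev


def genorm_m(expr):
    """Compute the geNorm stability measure M for each candidate gene.

    expr maps gene name -> list of relative expression quantities,
    with all lists in the same sample order.
    """
    genes = list(expr)
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Variation of the log2 ratio between gene j and gene k across samples.
            ratios = [math.log2(x / y) for x, y in zip(expr[j], expr[k])]
            pairwise_sd.append(stdev(ratios))
        m_values[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m_values


# Hypothetical quantities for three placeholder genes across three samples.
toy_expr = {
    "geneA": [1.0, 2.0, 4.0],
    "geneB": [2.0, 4.0, 8.0],
    "geneC": [1.0, 8.0, 2.0],
}
for gene, m in sorted(genorm_m(toy_expr).items(), key=lambda kv: kv[1]):
    print(f"{gene}: M = {m:.3f}")
```

In this toy input geneA and geneB change in proportion to each other, so they receive the lowest M, while geneC varies independently and ranks last.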
Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability
Chiu, Chih-Min; Huang, Wei-Chih; Weng, Shun-Long; Tseng, Han-Chi; Liang, Chao; Wang, Wei-Chi; Yang, Ting; Yang, Tzu-Ling; Weng, Chen-Tsung; Chang, Tzu-Hao; Huang, Hsien-Da
Eighty-one stool samples from Taiwanese were collected for analysis of the association between the gut flora and obesity. The supervised analysis showed that the most abundant genera of bacteria in normal samples (from people with a body mass index (BMI) ≤ 24) were Bacteroides (27.7%), Prevotella (19.4%), Escherichia (12%), Phascolarctobacterium (3.9%), and Eubacterium (3.5%). The most abundant genera of bacteria in case samples (with a BMI ≥ 27) were Bacteroides (29%), Prevotella (21%), Escherichia (7.4%), Megamonas (5.1%), and Phascolarctobacterium (3.8%). A principal coordinate analysis (PCoA) demonstrated that normal samples were clustered more compactly than case samples. An unsupervised analysis demonstrated that bacterial communities in the gut were clustered into two main groups: N-like and OB-like groups. Remarkably, most normal samples (78%) were clustered in the N-like group, and most case samples (81%) were clustered in the OB-like group (Fisher's P value = 1.61E − 07). The results showed that bacterial communities in the gut were highly associated with obesity. This is the first study in Taiwan to investigate the association between human gut flora and obesity, and the results provide new insights into the correlation of bacteria with the rising trend in obesity. PMID:25202708
Tian, Dashuan; Niu, Shuli
Nitrogen (N) deposition-induced soil acidification has become a global problem. However, the response patterns of soil acidification to N addition and the underlying mechanisms remain far from clear. Here, we conducted a meta-analysis of 106 studies to reveal global patterns of soil acidification in response to N addition. We found that N addition significantly reduced soil pH by 0.26 on average globally. However, the responses of soil pH varied with ecosystem type, N addition rate, N fertilization form, and experimental duration. Soil pH decreased most in grassland, whereas no decrease in response to N addition was observed in boreal forest. Soil pH decreased linearly with N addition rates. Addition of urea and NH4NO3 contributed more to soil acidification than NH4-form fertilizer. When experimental duration was longer than 20 years, N addition effects on soil acidification diminished. Environmental factors such as initial soil pH, soil carbon and nitrogen content, precipitation, and temperature all influenced the responses of soil pH. The base cations Ca2+, Mg2+ and K+ were critically important in buffering against N-induced soil acidification at the early stage. However, N addition has shifted global soils into the Al3+ buffering phase. Overall, this study indicates that acidification in global soils is very sensitive to N deposition, which is greatly modified by biotic and abiotic factors. Global soils are now at a buffering transition from base cations (Ca2+, Mg2+ and K+) to non-base cations (Mn2+ and Al3+). This calls for attention to the limitation of base cations and the toxic impact of non-base cations in terrestrial ecosystems subject to N deposition.
Lawlor, Brendan; Walsh, Paul
There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians. PMID:25996054
Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari
Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…
Jena, Jyotsnarani; Kumar, Ravindra; Dixit, Anshuman; Pandey, Sony; Das, Trupti
Simultaneous nitrate-N, phosphate and COD removal from synthetic waste water was evaluated using mixed microbial consortia in an anoxic environment under various initial carbon loads (ICL) in a batch scale reactor system. Within 6 hours of incubation, enriched DNPAOs (Denitrifying Polyphosphate Accumulating Microorganisms) were able to remove maximum COD (87%) at 2 g/L of ICL, whereas maximum nitrate-N (97%) and phosphate (87%) removal along with PHB accumulation (49 mg/L) was achieved at 8 g/L of ICL. Exhaustion of nitrate-N, beyond 6 hours of incubation, had a detrimental effect on COD and phosphate removal rate. Fresh supply of nitrate-N to the reaction medium, beyond 6 hours, helped revive the removal rates of both COD and phosphate. Therefore, it was apparent that in spite of a high carbon load, maximum COD and nutrient removal can be maintained with adequate nitrate-N availability. Denitrifying conditions in the medium were evident from an increasing pH trend. PHB accumulation by the mixed culture was directly proportional to ICL; however, accumulation took longer at higher ICL. Unlike conventional EBPR, PHB depletion did not support phosphate accumulation in this case. The unique aspect of all the batch studies was that PHB accumulation was observed along with phosphate uptake and nitrate reduction under anoxic conditions. Bioinformatics analysis followed by pyrosequencing of the mixed culture DNA from the seed sludge revealed the dominance of denitrifying populations, such as Corynebacterium, Rhodocyclus and Paracoccus (Alphaproteobacteria and Betaproteobacteria). The rarefaction curve indicated the complete bacterial population and the corresponding number of OTUs through sequence analysis. The Chao1 and Shannon (H’) indices were used to study the diversity of sampling. “UCI95” and “LCI95” indicated the upper and lower bounds of the 95% confidence interval of Chao1 for each distance. Values of the Chao1 index supported the results of the rarefaction curve. PMID:25689047
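The two diversity estimators named in the abstract above can be illustrated with a minimal Python sketch. It assumes OTU abundances are supplied as a plain list of read counts (the example counts are invented for illustration and are not data from the study), uses the bias-corrected form of Chao1, and computes Shannon H' with the natural logarithm. The abstract does not state which variants the authors used, so both choices are assumptions.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)).

    F1 and F2 are the numbers of OTUs observed exactly once and exactly twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical OTU read counts, for illustration only.
example_counts = [10, 5, 2, 2, 1, 1, 1]
print("Chao1:", chao1(example_counts))
print("Shannon H':", shannon_index(example_counts))
```

A Chao1 estimate close to the observed OTU count is the usual sign that the rarefaction curve has levelled off, which is the kind of agreement the abstract reports.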
The role of eosinophils in the development and progression of chronic allograft rejection is recognized in multiple organ transplantation settings. The CCR3 signaling pathway is one of the key regulatory pathways in eosinophil migration to the engrafted tissue. Eotaxin is a ligand for CCR3 and reflects eosinophilic inflammation, which can lead to fibrosis. We hypothesized that the CCR3 pathway would be upregulated in obliterative airway disease (OAD) in an established model of chronic airway allograft rejection. The mouse gene microarray data from a heterotopic mouse model of OAD in the NIH Gene Expression Omnibus (GEO) repository were analyzed for differentially expressed eosinophil pathways, using the Partek Suite and Ingenuity Pathway Analysis. A P value of <0.005 was defined as significant for differential expression, and P value of <0.05 for pathways. Day 25 allografts were defined as chronic allograft rejection and day 4 as acute allograft rejection. The isografts and allografts at day 25 showed significant upregulation of the eosinophil CCR3 pathway (P=0.04), based on the analysis of 1,299 uniquely expressed genes. The isografts at day 4 were compared with those at day 25 based on the identification of 1,859 unique genes, and there was a trend toward the CCR3 pathway upregulation over time (P=0.06). CCR3 pathways were not upregulated during the progression of alloimmune rejection in the allografts at day 4 versus day 25 in comparison, based on the analysis of 1,603 genes. Eotaxin was upregulated in chronic allograft rejection by 2.5-fold. The eosinophil signaling pathway CCR3 and eotaxin were significantly expressed in chronic allograft rejection and our results imply a role in controlling early alloimmune damage in controls.
Han, Kui-hua; Zhao, Jian-li; Lu, Chun-mei; Wang, Yong-zheng; Zhao, Gai-ju; Cheng, Shi-qing
The additive effects of Al2O3, Fe2O3 and MnCO3 on CaO sulfation kinetics were investigated by thermogravimetric analysis and a modified grain model. The activation energy (Ea) and pre-exponential factor (k0) of the surface reaction, and the activation energy (Ep) and pre-exponential factor (D0) of product-layer diffusion, were calculated according to the model. Addition of MnCO3 can enhance the initial reaction rate, product-layer diffusion and the final CaO conversion of sorbents, by a mechanism similar to that of Fe2O3. The method based on the isokinetic temperature Ts and activation energy cannot estimate the contribution of an additive to sulfation reactivity; the rate constant of the surface reaction (k) and the effective diffusivity of the reactant in the product layer (Ds) under given experimental conditions can reflect the effect of additives on the activation. Nonstoichiometric metal oxides may catalyze the surface reaction and promote the diffusivity of the reactant in the product layer through crystal defects and the distinct diffusion of cations and anions. Based on the mechanism and effect of additives on sulfation, the effective temperature and the stoichiometric relation of the reaction, it is possible to improve sorbent utilization by compounding more additives into the calcium-based sorbent.
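Activation energies and pre-exponential factors of this kind are conventionally tied to rate constants through the Arrhenius relation, k = k0·exp(−Ea/(R·T)) for the surface reaction and Ds = D0·exp(−Ep/(R·T)) for product-layer diffusion. The Python sketch below assumes that form, which the abstract implies but does not spell out; every numerical value is a placeholder rather than a result from the paper.

```python
import math

R = 8.314  # universal gas constant in J/(mol*K)

def arrhenius(pre_exponential, activation_energy, temperature):
    """Return pre_exponential * exp(-Ea / (R*T)).

    activation_energy is in J/mol and temperature in kelvin.
    """
    return pre_exponential * math.exp(-activation_energy / (R * temperature))

# Placeholder parameters (hypothetical, not taken from the study).
T = 1123.15  # kelvin
k = arrhenius(pre_exponential=1.0e3, activation_energy=6.0e4, temperature=T)
Ds = arrhenius(pre_exponential=1.0e-6, activation_energy=1.2e5, temperature=T)
print("surface rate constant k:", k)
print("product-layer diffusivity Ds:", Ds)
```

Comparing k and Ds at a fixed temperature, rather than Ea or Ep alone, matches the abstract's point that the rate constant and effective diffusivity are the quantities that reflect an additive's effect.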
Zhang, Quanwei; Gong, Jishang; Wang, Xueying; Wu, Xiaohu; Li, Yalan; Ma, Youji; Zhang, Yong; Zhao, Xingxu
The IGF family is essential for normal embryonic and postnatal development and plays important roles in the immune system, myogenesis, bone metabolism and other physiological functions, which makes the study of its structure and biological characteristics important. The Tianzhu white yak (Bos grunniens), domesticated under alpine hypoxic environments, is well adapted to survive and grow under severe hypoxia and cold temperatures for extended periods. In this study, a full coding sequence of the IGF2 gene of Tianzhu white yak was amplified by reverse transcription PCR and rapid amplification of cDNA ends (RACE) for the first time. The cDNA sequence revealed an open reading frame of 450 nucleotides, encoding a protein with 179 amino acids. Its expression in different tissues was also studied by real-time PCR. Phylogenetic tree analysis indicated that yak IGF2 was similar to that of Bos taurus, and its 3D structure showed high similarity with human IGF2. The putative full CDS of yak IGF2 was amplified by PCR in five tissues, and cDNA sequence analysis showed high homology to bovine IGF2. Moreover, the super-secondary structure prediction showed a 3D structure similar to that of human IGF2. Its conservation in sequence and structure has facilitated research on IGF2 and its physiological function in yak.
Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar
Lysophosphatidyl acyltransferase (LPAT) is one of the major triacylglycerol synthesis enzymes, controlling the metabolic flow of lysophosphatidic acid to phosphatidic acid. Experimental studies in Arabidopsis have shown that LPAT activity is exhibited primarily by three distinct isoforms, namely the plastid-located LPAT1, the endoplasmic reticulum-located LPAT2, and the soluble isoform of LPAT (solLPAT). In this study, 24 putative genes representing all LPAT isoforms were identified from the analysis of 11 complete genomes including green algae, red algae, diatoms and higher plants. We observed LPAT1 and solLPAT genes to be ubiquitously present in nearly all genomes examined, whereas LPAT2 genes to have evolved more recently in the plant lineage. Phylogenetic analysis indicated that LPAT1, LPAT2 and solLPAT have convergently evolved through separate evolutionary paths and belong to three different gene families, which was further evidenced by their wide divergence at gene structure and sequence level. The genome distribution supports the hypothesis that each gene encoding a LPAT is not duplicated. Mapping of exon-intron structure of LPAT genes to the domain structure of proteins across different algal and plant species indicates that exon shuffling plays no role in the evolution of LPAT genes. Besides the previously defined motifs, several conserved consensus sequences were discovered which could be useful to distinguish different LPAT isoforms. Taken together, this study will enable the generation of experimental approximations to better understand the functional role of algal LPAT in lipid accumulation.
Fang, Wai-Chi; Lue, Jaw-Chyng
A system comprising very-large-scale integrated (VLSI) circuits is being developed as a means of bioinformatics-oriented analysis and recognition of patterns of fluorescence generated in a microarray in an advanced, highly miniaturized, portable genetic-expression-assay instrument. Such an instrument implements an on-chip combination of polymerase chain reactions and electrochemical transduction for amplification and detection of deoxyribonucleic acid (DNA).
Siva Subramaniam, Nitthiya; Morgan, Eleanor; Bottomley, Steven; Tay, Sharon; Gregg, Keith; Lee, Chee Yang; Wetherall, John; Groth, David
Corneodesmosin (CDSN) is an important component of the desmosome in the epidermal cornified stratum and inner root sheath of hair follicles. DNA from a sheep BAC clone previously identified by us to contain CDSN was PCR amplified using cattle-derived primers and the product sequenced. A region of 4579 bp containing CDSN was shown to contain two exons separated by one intron and spanning 3683 bp. The DNA encodes a predicted protein of 546 amino acids. Phylogenetic analysis shows that sheep CDSN falls within a clade containing cattle and other ruminant-like species. Comparison of sequences generated from 12 unrelated merino sheep and the International Sheep Genome Consortium (ISGC) data identified 58 single nucleotide polymorphisms (SNPs) within the 4579 bp region of which 16 are contained within coding sequences (1 in 80 bp). The SNPs identified in this study will add to the Major Histocompatibility Complex (MHC) SNP panel, which will allow extensive haplotyping of the sheep MHC in future studies.
Wang, Nengding; Ozer, Egon A
Transposon insertion sequencing is a process whereby microbial fitness determinants can be identified on a genome-wide scale. This process uses high-throughput next generation sequencing to screen for changes in the composition of a pool of transposon mutants after exposure to selective conditions. One commonly used process for generating transposon insertion sequencing libraries is INSeq, which works with mutant pools produced using a modified Mariner transposon. Libraries produced using the INSeq process are sequenced on the Illumina platform. In this chapter, we describe our method for processing the raw Illumina sequencing reads, aligning the reads to a reference sequence to determine read counts, and using the online transposon insertion sequencing data analysis server, ESSENTIALS, to interpret the results.
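The read-counting step mentioned in the abstract above can be sketched in Python as a simple tally of aligned reads per insertion site. This is an illustrative sketch only: the input format (a list of hypothetical `(contig, position)` tuples already extracted from an alignment), the function names, and the counts-per-million normalization are all assumptions, and none of it reproduces the authors' pipeline or the ESSENTIALS server.

```python
from collections import Counter

def count_insertions(aligned_reads):
    """Tally reads per insertion site, given (contig, position) tuples."""
    return Counter(aligned_reads)

def counts_per_million(site_counts):
    """Scale raw site counts so that libraries of different depth are comparable."""
    total = sum(site_counts.values())
    return {site: 1e6 * n / total for site, n in site_counts.items()}

# Hypothetical aligned read positions, for illustration only.
reads = [("chr", 100), ("chr", 100), ("chr", 250), ("chr", 100)]
site_counts = count_insertions(reads)
normalized = counts_per_million(site_counts)
for site in sorted(site_counts):
    print(site, site_counts[site], normalized[site])
```

Comparing normalized counts for the same site before and after selection is the basic signal that a downstream fitness analysis would work from.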
de Jong, Anne; van Heel, Auke J.; Kuipers, Oscar P.
Bioinformatic tools can greatly improve the efficiency of bacteriocin screening efforts by limiting the number of strains to be screened. Different classes of bacteriocins can be detected in genomes by looking at different features. Finding small bacteriocins can be especially challenging due to low homology and because small open reading frames (ORFs) are often omitted from annotations. In this chapter, several bioinformatic tools/strategies to identify bacteriocins in genomes are discussed.
Ong, Quang; Nguyen, Phuc; Thao, Nguyen Phuong; Le, Ly
Advances in genomics technology have led to dramatic changes in plant biology research. Plant biologists now have easy access to enormous amounts of genomic data with which to study high-density genetic variation in plants at the molecular level. Therefore, a full understanding and skilled use of bioinformatics tools to manage and analyze these data are essential in current plant genome research. Many plant genome databases have been established and have continued to expand recently. Meanwhile, bioinformatics-based analytical methods are also well developed in many aspects of plant genomic research, including comparative genomic analysis, phylogenomics and evolutionary analysis, and genome-wide association studies. However, the constant upgrading of computational infrastructure, such as high-capacity data storage and high-performance analysis software, is the real challenge for plant genome research. This review paper focuses on the challenges and opportunities that knowledge and skills in bioinformatics can bring to plant scientists in the present plant genomics era, as well as on the future need for effective tools to facilitate the translation of knowledge from new sequencing data into enhanced plant productivity. PMID:27499685
Irizarry, Kristopher Jl; Chan, Adam; Kettle, Derek; Kezian, Steven; Ma, Dominic; Palacios, Louis; Li, Qingshun; Keeler, Calvin L; Drechsler, Yvonne
The goal of this project was to characterize the molecular and cellular roles of various gene targets regulated by 8 miRNAs in differentiating macrophages. Among a number of miRNAs that are found to be expressed in avian macrophages, we focused on eight specific miRNAs (miR-1618, miR-1586, miR-1633, miR-1627, miR-1646, miR-1649, miR-1610, miR-1647) associated with macrophage activation through Wnt signaling, ubiquitination, PPAR mediated macrophage function, vesicle mediated cytokine trafficking, and WD40 domain proteins in macrophage differentiation. The results of our analysis identified a global theme for macrophage function: Differentiation and activation of macrophages requires a comprehensive redistribution of the cell's protein repertoire. This redistribution involves two processes: 1) the degradation and recycling of unneeded cytoplasmic and membrane components and 2) the mobilization of newly synthesized cellular components via vesicular trafficking. Ultimately, this leads to a change in the membrane surface expression profile of the cell as well as to a change in proteins regulating phagosomal and lysosomal compartments facilitating increased efficiency of phagocytic activity. In this manner, a monocyte tooled with chemokine surface receptors and an internal cytoskeletal structure geared towards mobility may efficiently sense, react, and migrate toward a site of infection. Once a monocyte arrives to a site of infection, local signals induce a redistribution of resources into a pro-phagocytic phenotype. This may involve upregulating pathogen pattern recognizing receptors and increasing the efficiency of lysosomal biogenesis, while simultaneously recycling components involved in circulatory migration and leukocyte extravasation. In parallel, Wnt and NF-κB signal transduction induces expression of cytokine signals that act in an autocrine and paracrine manner, driving this process of self-differentiation as well as inducing differentiation of nearby
Stone, Nicole; Pangilinan, Faith; Molloy, Anne M.; Shane, Barry; Scott, John M.; Ueland, Per Magne; Mills, James L.; Kirke, Peader N.; Sethupathy, Praveen; Brody, Lawrence C.
One-carbon metabolism (OCM) is linked to DNA synthesis and methylation, amino acid metabolism and cell proliferation. OCM dysfunction has been associated with increased risk for various diseases, including cancer and neural tube defects. MicroRNAs (miRNAs) are ∼22 nt RNA regulators that have been implicated in a wide array of basic cellular processes, such as differentiation and metabolism. Accordingly, mis-regulation of miRNA expression and/or activity can underlie complex disease etiology. We examined the possibility of OCM regulation by miRNAs. Using computational miRNA target prediction methods and Monte-Carlo based statistical analyses, we identified two candidate miRNA “master regulators” (miR-22 and miR-125) and one candidate pair of “master co-regulators” (miR-344-5p/484 and miR-488) that may influence the expression of a significant number of genes involved in OCM. Interestingly, miR-22 and miR-125 are significantly up-regulated in cells grown under low-folate conditions. In a complementary analysis, we identified 15 single nucleotide polymorphisms (SNPs) that are located within predicted miRNA target sites in OCM genes. We genotyped these 15 SNPs in a population of healthy individuals (age 18–28, n = 2,506) that was previously phenotyped for various serum metabolites related to OCM. Prior to correction for multiple testing, we detected significant associations between TCblR rs9426 and methylmalonic acid (p = 0.045), total homocysteine levels (tHcy) (p = 0.033), serum B12 (p < 0.0001), holo transcobalamin (p < 0.0001) and total transcobalamin (p < 0.0001); and between MTHFR rs1537514 and red blood cell folate (p < 0.0001). However, upon further genetic analysis, we determined that in each case, a linked missense SNP is the more likely causative variant. Nonetheless, our Monte-Carlo based in silico simulations suggest that miRNAs could play an important role in the regulation of OCM. PMID:21765920
Puig-Butille, Joan Anton; Gimenez-Xavier, Pol; Visconti, Alessia; Nsengimana, Jérémie; Garcia-García, Francisco; Tell-Marti, Gemma; Escamez, Maria José; Newton-Bishop, Julia; Bataille, Veronique; Del Río, Marcela; Dopazo, Joaquín; Falchi, Mario; Puig, Susana
The MC1R gene plays a crucial role in pigmentation synthesis. Loss-of-function MC1R variants, which impair protein function, are associated with red hair color (RHC) phenotype and increased skin cancer risk. Cultured cutaneous cells bearing loss-of-function MC1R variants show a distinct gene expression profile compared to wild-type MC1R cultured cutaneous cells. We analysed the gene signature associated with RHC co-cultured melanocytes and keratinocytes by Protein-Protein interaction (PPI) network analysis to identify genes related with non-functional MC1R variants. From two detected networks, we selected 23 nodes as hub genes based on topological parameters. Differential expression of hub genes was then evaluated in healthy skin biopsies from RHC and black hair color (BHC) individuals. We also compared gene expression in melanoma tumors from individuals with RHC versus BHC. Gene expression in normal skin from RHC cutaneous cells showed dysregulation in 8 out of 23 hub genes (CLN3, ATG10, WIPI2, SNX2, GABARAPL2, YWHA, PCNA and GBAS). Hub genes did not differ between melanoma tumors in RHC versus BHC individuals. The study suggests that healthy skin cells from RHC individuals present a constitutive genomic deregulation associated with the red hair phenotype and identify novel genes involved in melanocyte biology.
Antosh, Michael; Whitaker, Rachel; Kroll, Adam; Hosier, Suzanne; Chang, Chengyi; Bauer, Johannes; Cooper, Leon; Neretti, Nicola; Helfand, Stephen L
A multiple comparison approach using whole genome transcriptional arrays was used to identify genes and pathways involved in calorie restriction/dietary restriction (DR) life span extension in Drosophila. A gene-centric analysis comparing the changes in common between DR and two DR-related molecular genetic life span extending manipulations, Sir2 and p53, led to a molecular confirmation of the similarity of Sir2 and p53 with DR and to the identification of a small set of commonly regulated genes. One of the identified upregulated genes, takeout, known to be involved in feeding and starvation behavior and to have sequence homology with Juvenile Hormone (JH) binding protein, was shown to directly extend life span when specifically overexpressed. Here we show that a pathway-centric approach can be used to identify shared physiological pathways between DR and the Sir2, p53 and resveratrol life span extending interventions. The set of physiological pathways in common among these life span extending interventions provides an initial step toward defining molecular genetic and physiological changes important in life span extension. The large overlap in shared pathways between DR, Sir2, p53 and resveratrol provides strong molecular evidence supporting the genetic studies linking these specific life span extending interventions.
Forcella, Matilde; Mozzi, Alessandra; Bigi, Alessandra; Parenti, Paolo; Fusi, Paola
Trehalase is involved in the control of the concentration of trehalose, the main blood sugar in insects. Here, we describe the molecular cloning of the cDNA encoding the soluble form of the trehalase from larvae of the midge Chironomus riparius, a well-known bioindicator of the quality of freshwater environments. Molecular cloning was achieved through multiple alignment of Diptera trehalase sequences, allowing the synthesis of internal homology-based primers; the complete open reading frame (ORF) was subsequently obtained through RACE-PCR (where RACE is rapid amplification of cDNA ends). The cDNA contained the 5' untranslated region (UTR), the 3' UTR including a poly(A) tail, and an ORF of 1,725 bp encoding 574 amino acid residues with a predicted molecular mass of 65,778 Da. Recombinant trehalase was successfully expressed in Escherichia coli as a His-tagged protein and purified by Ni-NTA affinity chromatography. Primary structure analysis showed a series of characteristic features shared by all insect trehalases, while three-dimensional structure prediction yielded the typical glucosidase fold, with the two key residues involved in the catalytic mechanism being conserved. Production of recombinant insect trehalases opens the way to structural characterization of the catalytic site, which might represent, among others, an element for reconsidering the enzyme as a target in the control of pest insects.
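The ORF length and residue count quoted in the abstract above are consistent with each other if the 1,725 bp include the stop codon: 1,725 / 3 = 575 codons, one of which terminates translation, leaving 574 residues. A minimal Python sketch of that check follows; the function name and the assumption that the reported ORF length includes the stop codon are illustrative and are not stated in the source.

```python
def protein_length_from_orf(orf_bp, includes_stop_codon=True):
    """Return the number of encoded residues for an ORF of orf_bp base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but adds no amino acid.
    return codons - 1 if includes_stop_codon else codons

print(protein_length_from_orf(1725))
```

The same check is a quick way to spot transcription errors when an abstract reports both an ORF length and a protein length.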
Rai, Devendra K.; Lawrence, Paul; Pauszek, Steve J.; Piccone, Maria E.; Knowles, Nick J.; Rieder, Elizabeth
Bovine rhinitis viruses (BRVs) cause mild respiratory disease of cattle. In this study, a near full-length genome sequence of a virus named RS3X (formerly classified as bovine rhinovirus type 1), isolated from infected cattle from the UK in the 1960s, was obtained and analyzed. Compared to other closely related Aphthoviruses, major differences were detected in the leader protease (Lpro), P1, 2B, and 3A proteins. Phylogenetic analysis revealed that RS3X was a member of the species bovine rhinitis A virus (BRAV). Using different codon-based and branch-site selection models for Aphthoviruses, including BRAV RS3X and foot-and-mouth disease virus, we observed no clear evidence for genomic regions undergoing positive selection. However, within each of the BRV species, multiple sites under positive selection were detected. The results also suggest that the probability (determined by Recombination Detection Program) for recombination events between BRVs and other Aphthoviruses, including foot-and-mouth disease virus was not significant. In contrast, within BRVs, the probability of recombination increases. The data reported here provide genetic information to assist in the identification of diagnostic signatures and research tools for BRAV. PMID:27081310
Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter
Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class
Lv, Changqi; Wang, Peng; Wang, Wencai; Su, Ruirui; Ge, Yadong; Zhu, Youming; Zhu, Guoping
Isocitrate dehydrogenase (IDH) is a key enzyme in the tricarboxylate (TCA) cycle, which may play an important role in the virulence of pathogenic bacteria. Here, two structurally different IDHs from the plant pathogen Xanthomonas campestris pv. campestris 8004 (XccIDH1 and XccIDH2) were characterized in detail. The recombinant XccIDH1 forms a homodimer in solution, while the recombinant XccIDH2 is a typical monomer. Phylogenetic analysis showed that XccIDH1 belongs to the type I IDH subfamily and XccIDH2 groups into the monomeric IDH clade. Kinetic characterization demonstrated that XccIDH1's specificity towards NAD(+) was 110-fold greater than towards NADP(+), while XccIDH2's specificity towards NADP(+) was 353-fold greater than towards NAD(+). The putative coenzyme-discriminating amino acids (Asp268, Ile269 and Ala275 for XccIDH1, and Lys589, His590 and Arg601 for XccIDH2) were studied by site-directed mutagenesis. The coenzyme specificities of the two mutants, mXccIDH1 and mXccIDH2, were completely reversed from NAD(+) to NADP(+), and from NADP(+) to NAD(+), respectively. Furthermore, Ser80 of XccIDH1, and Lys256 and Tyr421 of XccIDH2, were the determinants for substrate binding. The detailed biochemical properties of XccIDH1 and XccIDH2, such as optimal pH and temperature, thermostability, and metal ion effects, were further investigated. The possibility of considering the two IDHs as targets for drug development to control the plant diseases caused by Xcc 8004 is described and discussed thoroughly.
Mefford, Megan E.; Kunstman, Kevin; Wolinsky, Steven M.; Gabuzda, Dana
Macrophages express low levels of the CD4 receptor compared to T-cells. Macrophage-tropic HIV strains replicating in brain of untreated patients with HIV-associated dementia (HAD) express Envs that are adapted to overcome this restriction through mechanisms that are poorly understood. Here, bioinformatic analysis of env sequence datasets together with functional studies identified polymorphisms in the β3 strand of the HIV gp120 bridging sheet that increase M-tropism. D197, which results in loss of an N-glycan located near the HIV Env trimer apex, was detected in brain in some HAD patients, while position 200 was estimated to be under positive selection. D197 and T/V200 increased fusion and infection of cells expressing low CD4 by enhancing gp120 binding to CCR5. These results identify polymorphisms in the HIV gp120 bridging sheet that overcome the restriction to macrophage infection imposed by low CD4 through enhanced gp120–CCR5 interactions, thereby promoting infection of brain and other macrophage-rich tissues. - Highlights: • We analyze HIV Env sequences and identify amino acids in beta 3 of the gp120 bridging sheet that enhance macrophage tropism. • These amino acids at positions 197 and 200 are present in brain of some patients with HIV-associated dementia. • D197 results in loss of a glycan near the HIV Env trimer apex, which may increase exposure of V3. • These variants may promote infection of macrophages in the brain by enhancing gp120–CCR5 interactions.
Siddiqui, Huma; Chen, Tsute; Aliko, Ardita; Mydel, Piotr M; Jonsson, Roland; Olsen, Ingar
Background Reduced salivation is considered a major clinical feature of most but not all cases of primary Sjögren’s syndrome (pSS). Reduced saliva flow may lead to changes in the salivary microbiota. These changes have mainly been studied with culture, which typically recovers only 65% of the bacteria present. Objective This study aimed to use high throughput sequencing, covering both cultivated and not-yet-cultivated bacteria, to assess the bacterial microbiota of whole saliva in pSS patients with normal salivation. Methods Bacteria of whole unstimulated saliva from nine pSS patients with normal salivation flow and from nine healthy controls were examined by high throughput sequencing of the hypervariable region V1V2 of 16S rRNA using the 454 GS Junior system. Raw sequence reads were subjected to a species-level, reference-based taxonomy assignment pipeline specially designed for studying the human oral microbial community. Each of the sequence reads was BLASTN-searched against a database consisting of reference sequences representing 1,156 oral and 12,013 non-oral species. Unassigned reads were then screened for high-quality non-chimeras and subjected to de novo species-level operational taxonomic unit (OTU) calling for potential novel species. Downstream analyses, including alpha and beta diversities, were performed using the Quantitative Insights into Microbial Ecology (QIIME) pipeline. To reveal significant differences between the microbiota of control saliva and Sjögren’s saliva, a statistical method introduced in Metastats (www.metastats.cbcb.umd.edu) was used. Results Saliva of pSS patients with normal salivation had a significantly higher frequency of Firmicutes compared with controls (p=0.004). Two other major phyla, Synergistetes and Spirochaetes, were significantly depleted in pSS (p=0.001 for both). In addition, we saw a nearly 17% decrease in the number of genera in pSS (25 vs. 30). While Prevotella was almost equally abundant in both groups (25% in p
Guingab-Cagmat, J.D.; Cagmat, E.B.; Hayes, R.L.; Anagli, J.
Traumatic brain injury (TBI) is a major medical crisis without any FDA-approved pharmacological therapies that have been demonstrated to improve functional outcomes. It has been argued that discovery of disease-relevant biomarkers might help to guide successful clinical trials for TBI. Major advances in mass spectrometry (MS) have revolutionized the field of proteomic biomarker discovery and facilitated the identification of several candidate markers that are being further evaluated for their efficacy as TBI biomarkers. However, several hurdles have to be overcome even during the discovery phase, which is only the first step in the long process of biomarker development. The high-throughput nature of MS-based proteomic experiments generates a massive amount of mass spectral data, presenting great challenges in downstream interpretation. Currently, different bioinformatics platforms are available for functional analysis and data mining of MS-generated proteomic data. These tools provide a way to convert data sets to biologically interpretable results and functional outcomes. A strategy that has promise in advancing biomarker development involves the triad of proteomics, bioinformatics, and systems biology. This review gives a brief overview of how bioinformatics and systems biology tools analyze, transform, and interpret complex MS datasets into biologically relevant results. In addition, challenges and limitations of proteomics, bioinformatics, and systems biology in TBI biomarker discovery are presented. A brief survey of studies that utilized these three overlapping disciplines in TBI biomarker discovery is also presented. Finally, examples of TBI biomarkers and their applications are discussed. PMID:23750150
Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849
This analysis is prepared by the Mined Geologic Disposal System (MGDS) Waste Package Development Department (WPDD) in response to a request received via a QAP-3-12 Design Input Data Request (Ref. 5.1) from WAST Design (formerly MRSMPC Design). The request is to provide: Specific MPC access requirements for the addition of filler materials at the MGDS (i.e., location and size of access required). The objective of this analysis is to provide a response to the foregoing request. The purpose of this analysis is to provide a documented record of the basis for the response. The response is stated in Section 8 herein. The response is based upon requirements from an MGDS perspective.
Grabowska, Anna D.; Wywiał, Ewa; Dunin-Horkawicz, Stanislaw; Łasica, Anna M.; Wösten, Marc M. S. M.; Nagy-Staroń, Anna; Godlewska, Renata; Bocian-Ostrzycka, Katarzyna; Pieńkowska, Katarzyna; Łaniewski, Paweł; Bujnicki, Janusz M.; van Putten, Jos P. M.; Jagusztyn-Krynicka, E. Katarzyna
Background Bacterial Dsb enzymes are involved in the oxidative folding of many proteins, through the formation of disulfide bonds between their cysteine residues. The Dsb protein network has been well characterized in cells of the model microorganism Escherichia coli. To gain insight into the functioning of the Dsb system in epsilon-Proteobacteria, where it plays an important role in the colonization process, we studied two homologs of the main Escherichia coli Dsb oxidase (EcDsbA) that are present in the cells of the enteric pathogen Campylobacter jejuni, the most frequently reported bacterial cause of human enteritis in the world. Methods and Results Phylogenetic analysis suggests the horizontal transfer of the epsilon-Proteobacterial DsbAs from a common ancestor to gamma-Proteobacteria, which then gave rise to the DsbL lineage. Phenotype and enzymatic assays suggest that the two C. jejuni DsbAs play different roles in bacterial cells and have divergent substrate spectra. CjDsbA1 is essential for the motility and autoagglutination phenotypes, while CjDsbA2 has no impact on those processes. CjDsbA1 plays a critical role in the oxidative folding that ensures the activity of alkaline phosphatase CjPhoX, whereas CjDsbA2 is crucial for the activity of arylsulfotransferase CjAstA, encoded within the dsbA2-dsbB-astA operon. Conclusions Our results show that CjDsbA1 is the primary thiol-oxidoreductase affecting life processes associated with bacterial spread and host colonization, as well as ensuring the oxidative folding of particular protein substrates. In contrast, CjDsbA2 activity does not affect the same processes and so far its oxidative folding activity has been demonstrated for one substrate, arylsulfotransferase CjAstA. The results suggest the cooperation between CjDsbA2 and CjDsbB. In the case of the CjDsbA1, this cooperation is not exclusive and there is probably another protein to be identified in C. jejuni cells that acts to re-oxidize CjDsbA1. Altogether
Arrebola, Eva; Carrión, Víctor J.; Gutiérrez-Barranquero, José Antonio; Pérez-García, Alejandro; Ramos, Cayo; Cazorla, Francisco M.; de Vicente, Antonio
The genomes of more than 100 Pseudomonas syringae strains have been sequenced to date; however, only a few of them have been fully assembled, including P. syringae pv. syringae B728a. Different strains of pv. syringae cause different diseases and have different host specificities. UMAF0158 is a P. syringae pv. syringae strain related to B728a, but instead of being a bean pathogen it causes apical necrosis of mango trees, and the two strains belong to different phylotypes of pv. syringae and different clades of P. syringae. In this study we report the complete sequence and annotation of the P. syringae pv. syringae UMAF0158 chromosome and plasmid pPSS158. A comparative analysis was performed with the available sequenced genomes of 25 other P. syringae strains, both closed (the reference genomes DC3000, 1448A and B728a) and draft genomes. The 5.8 Mb UMAF0158 chromosome has 59.3% GC content and comprises 5017 predicted protein-coding genes. Bioinformatics analysis revealed the presence of genes potentially implicated in the virulence and epiphytic fitness of this strain. We identified several genetic features, which are absent in B728a, that may explain the ability of UMAF0158 to colonize and infect mango trees: the mangotoxin biosynthetic operon mbo, a gene cluster for cellulose production, two different type III and two type VI secretion systems, and a particular T3SS effector repertoire. A mutant strain defective in the rhizobial-like T3SS Rhc showed no differences compared to wild-type during its interaction with host and non-host plants and worms. Here we report the first complete sequence of the chromosome of a pv. syringae strain pathogenic to a woody plant host. Our data also shed light on the genetic factors that possibly determine the pathogenic and epiphytic lifestyle of UMAF0158. This work provides the basis for further analysis on specific mechanisms that enable this strain to infect woody plants and for the functional analysis of host specificity in the P
Inza, Iñaki; Calvo, Borja; Armañanzas, Rubén; Bengoetxea, Endika; Larrañaga, Pedro; Lozano, José A
The increase in the number and complexity of biological databases has raised the need for modern and powerful data analysis tools and techniques. In order to fulfill these requirements, the machine learning discipline has become an everyday tool in bio-laboratories. The use of machine learning techniques has been extended to a wide spectrum of bioinformatics applications. It is broadly used to investigate the underlying mechanisms and interactions between biological molecules in many diseases, and it is an essential tool in any biomarker discovery process. In this chapter, we provide a basic taxonomy of machine learning algorithms, and the characteristics of main data preprocessing, supervised classification, and clustering techniques are shown. Feature selection, classifier evaluation, and two supervised classification topics that have a deep impact on current bioinformatics are presented. We make the interested reader aware of a set of popular web resources, open source software tools, and benchmarking data repositories that are frequently used by the machine learning community.
Background An incremental, loosely planned development approach is often used in bioinformatic studies when dealing with custom data analysis in a rapidly changing environment. Unfortunately, the lack of a rigorous software structuring can undermine the maintainability, communicability and replicability of the process. To ameliorate this problem we propose the Leaf system, the aim of which is to seamlessly introduce the pipeline formality on top of a dynamical development process with minimum overhead for the programmer, thus providing a simple layer of software structuring. Results Leaf includes a formal language for the definition of pipelines with code that can be transparently inserted into the user’s Python code. Its syntax is designed to visually highlight dependencies in the pipeline structure it defines. While encouraging the developer to think in terms of bioinformatic pipelines, Leaf supports a number of automated features including data and session persistence, consistency checks between steps of the analysis, processing optimization and publication of the analytic protocol in the form of a hypertext. Conclusions Leaf offers a powerful balance between plan-driven and change-driven development environments in the design, management and communication of bioinformatic pipelines. Its unique features make it a valuable alternative to other related tools. PMID:23786315
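The idea of layering a formal pipeline on top of ordinary Python code can be sketched as follows. This is a minimal illustration only, not Leaf's actual language or API: the function `run_pipeline`, the step names, and the toy data are all hypothetical, and the dependency ordering relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies):
    """Run each step once, after all of its prerequisites.

    steps: maps a step name to a function taking the dict of results so far.
    dependencies: maps a step name to the set of step names it depends on.
    Both structures are hypothetical stand-ins for a pipeline definition.
    """
    # static_order() yields prerequisites before the steps that need them
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        results[name] = steps[name](results)
    return order, results


# Toy three-step analysis: load sequences, drop empty ones, count the rest.
steps = {
    "load": lambda r: ["ACGT", "", "GGCC"],
    "clean": lambda r: [s for s in r["load"] if s],
    "count": lambda r: len(r["clean"]),
}
dependencies = {"load": set(), "clean": {"load"}, "count": {"clean"}}

order, results = run_pipeline(steps, dependencies)
print(order, results["count"])
```

Declaring the dependencies separately from the step bodies is what makes the protocol explicit: the execution order is derived from the declared structure rather than from the order in which the code happens to be written.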
Rodet, Xavier; Schwarz, Diemo
The subject of this chapter is the estimation, representation, modification, and use of spectral envelopes in the context of sinusoidal-additive-plus-residual analysis/synthesis. A spectral envelope is an amplitude-vs-frequency function, which may be obtained from the envelope of a short-time spectrum (Rodet et al., 1987; Schwarz, 1998). [Precise definitions of such an envelope and short-time spectrum (STS) are given in Section 2.] The additive-plus-residual analysis/synthesis method is based on a representation of signals in terms of a sum of time-varying sinusoids and of a non-sinusoidal residual signal [e.g., see Serra (1989), Laroche et al. (1993), McAulay and Quatieri (1995), and Ding and Qian (1997)]. Many musical sound signals may be described as a combination of a nearly periodic waveform and colored noise. The nearly periodic part of the signal can be viewed as a sum of sinusoidal components, called partials, with time-varying frequency and amplitude. Such sinusoidal components are easily observed on a spectral analysis display (Fig. 5.1) as obtained, for instance, from a discrete Fourier transform.
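The additive-plus-residual representation described above can be illustrated with a short synthesis sketch. It assumes each partial is given as a pair of hypothetical functions returning instantaneous frequency (Hz) and amplitude at time t, and it uses plain white noise as a stand-in for the residual, whereas the abstract describes the residual as colored noise; the function name `synthesize` and its parameters are illustrative, not taken from the chapter.

```python
import math
import random


def synthesize(partials, residual_gain, n_samples, sample_rate, seed=0):
    """Sum of time-varying sinusoids (partials) plus a noise residual.

    partials: list of (freq_fn, amp_fn) pairs; each is a function of time in
    seconds giving instantaneous frequency in Hz and linear amplitude.
    residual_gain: scale of the white-noise residual (an assumption; a real
    residual would be spectrally shaped).
    """
    rng = random.Random(seed)
    phases = [0.0] * len(partials)
    signal = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = 0.0
        for k, (freq_fn, amp_fn) in enumerate(partials):
            sample += amp_fn(t) * math.cos(phases[k])
            # Phase is the running integral of the instantaneous frequency
            phases[k] += 2.0 * math.pi * freq_fn(t) / sample_rate
        sample += residual_gain * rng.uniform(-1.0, 1.0)
        signal.append(sample)
    return signal


# Two partials: a decaying 220 Hz fundamental and a quieter gliding overtone.
partials = [
    (lambda t: 220.0, lambda t: math.exp(-3.0 * t)),
    (lambda t: 440.0 + 20.0 * t, lambda t: 0.5 * math.exp(-5.0 * t)),
]
tone = synthesize(partials, residual_gain=0.05, n_samples=8000, sample_rate=8000)
```

Accumulating the phase sample by sample, rather than evaluating cos(2πft) directly, keeps the waveform continuous when a partial's frequency changes over time.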
Background GDSL esterases/lipases are a newly discovered subclass of lipolytic enzymes that are very important and attractive research subjects because of their multifunctional properties, such as broad substrate specificity and regiospecificity. Compared with the current knowledge regarding these enzymes in bacteria, our understanding of the plant GDSL enzymes is very limited, although the GDSL gene family includes numerous members in many fully sequenced plant genomes. Only two genes from a large rice GDSL esterase/lipase gene family were previously characterised, and the majority of the members remain unknown. In the present study, we describe the rice OsGELP (Oryza sativa GDSL esterase/lipase protein) gene family at the genomic and proteomic levels, and use this knowledge to provide insights into the multifunctionality of the rice OsGELP enzymes. Results In this study, an extensive bioinformatics analysis identified 114 genes in the rice OsGELP gene family. A complete overview of this family in rice is presented, including the chromosome locations, gene structures, phylogeny, and protein motifs. Among the OsGELPs and the plant GDSL esterase/lipase proteins of known functions, 41 motifs were found that represent the core secondary structure elements or appear specifically in different phylogenetic subclades. The specification and distribution of the identified putative conserved clade-common and clade-specific peptide motifs, and their locations on the predicted three-dimensional protein structure, may indicate their functional roles. Potentially important regions for substrate specificity are highlighted, in accordance with the three-dimensional protein model and the locations of the phylogeny-specific conserved motifs. The differential expression of some representative genes was confirmed by quantitative real-time PCR. The phylogenetic analysis, together with protein motif architectures, and the expression profiling were analysed to predict the
Leung, Anthony K L; Andersen, Jens S; Mann, Matthias; Lamond, Angus I
The nucleolus is a plurifunctional, nuclear organelle, which is responsible for ribosome biogenesis and many other functions in eukaryotes, including RNA processing, viral replication and tumour suppression. Our knowledge of the human nucleolar proteome has been expanded dramatically by the two recent MS studies on isolated nucleoli from HeLa cells [Andersen, Lyon, Fox, Leung, Lam, Steen, Mann and Lamond (2002) Curr. Biol. 12, 1-11; Scherl, Coute, Deon, Calle, Kindbeiter, Sanchez, Greco, Hochstrasser and Diaz (2002) Mol. Biol. Cell 13, 4100-4109]. Nearly 400 proteins were identified within the nucleolar proteome so far in humans. Approx. 12% of the identified proteins were previously shown to be nucleolar in human cells and, as expected, nearly all of the known housekeeping proteins required for ribosome biogenesis were identified in these analyses. Surprisingly, approx. 30% represented either novel or uncharacterized proteins. This review focuses on how to apply the derived knowledge of this newly recognized nucleolar proteome, such as their amino acid/peptide composition and their homologies across species, to explore the function and dynamics of the nucleolus, and suggests ways to identify, in silico, possible functions of the novel/uncharacterized proteins and potential interaction networks within the human nucleolus, or between the nucleolus and other nuclear organelles, by drawing resources from the public domain. PMID:14531731
Mcwilliam, Hamish; Valentin, Franck; Goujon, Mickael; Li, Weizhong; Narayanasamy, Menaka; Martin, Jenny; Miyar, Teresa; Lopez, Rodrigo
The European Bioinformatics Institute (EMBL-EBI) has been providing access to mainstream databases and tools in bioinformatics since 1997. In addition to the traditional web form based interfaces, APIs exist for core data resources such as EMBL-Bank, Ensembl, UniProt, InterPro, PDB and ArrayExpress. These APIs are based on Web Services (SOAP/REST) interfaces that allow users to systematically access databases and analytical tools. From the user's point of view, these Web Services provide the same functionality as the browser-based forms. However, the APIs free the user from web page constraints and are ideal for the analysis of large batches of data, performing text-mining tasks and the casual or systematic evaluation of mathematical models in regulatory networks. Furthermore, these services are widespread and easy to use: they require no prior knowledge of the technology and no more than basic experience in programming. Here we describe new and updated services and briefly outline developments planned for release during 2009–2010. PMID:19435877
Translational bioinformatics plays an indispensable role in transforming psychoneuroimmunology (PNI) into personalized medicine. It provides a powerful method to bridge the gaps between various knowledge domains in PNI and systems biology. Translational bioinformatics methods at various systems levels can facilitate pattern recognition, and expedite and validate the discovery of systemic biomarkers to allow their incorporation into clinical trials and outcome assessments. Analysis of the correlations between genotypes and phenotypes, including behavior-based profiles, will contribute to the transition from disease-based medicine to human-centered medicine. Translational bioinformatics would also enable the establishment of predictive models for patient responses to diseases, vaccines, and drugs. In PNI research, the development of systems biology models such as those of the neurons would play a critical role. Methods based on data integration, data mining, and knowledge representation are essential elements in building health information systems such as electronic health records and computerized decision support systems. Data integration of genes, pathophysiology, and behaviors is needed for a broad range of PNI studies. Knowledge discovery approaches such as network-based systems biology methods are valuable in studying the cross-talk among pathways in various brain regions involved in disorders such as Alzheimer's disease.
Williams, Christopher L.; Sica, Jeffrey C.; Killen, Robert T.; Balis, Ulysses G. J.
Objective: Within the information technology (IT) industry, best practices and standards are constantly evolving and being refined. In contrast, computer technology utilized within the healthcare industry often evolves at a glacial pace, with reduced opportunities for justified innovation. Although the use of timely technology refreshes within an enterprise's overall technology stack can be costly, thoughtful adoption of select technologies with a demonstrated return on investment can be very effective in increasing productivity and, at the same time, reducing the burden of maintenance often associated with older and legacy systems. In this brief technical communication, we introduce the concept of microservices as applied to the ecosystem of data analysis pipelines. Microservice architecture is a framework for dividing complex systems into easily managed parts. Each individual service is limited in functional scope, thereby conferring a higher measure of functional isolation and reliability to the collective solution. Moreover, maintenance challenges are greatly simplified by virtue of the reduced architectural complexity of each constitutive module. This fact notwithstanding, overall solutions built with a microservices-based approach provide equal or greater levels of functionality as compared to conventional programming approaches. Bioinformatics, with its ever-increasing demand for performance and new testing algorithms, is the perfect use-case for such a solution. Moreover, if promulgated within the greater development community as an open-source solution, such an approach holds potential to be transformative to current bioinformatics software development. Context: Bioinformatics relies on a nimble IT framework that can adapt to changing requirements. Aims: To present a well-established software design and deployment strategy as a solution for current challenges within bioinformatics. Conclusions: Use of the microservices framework is an effective
Bartlett, Andrew; Lewis, Jamie; Williams, Matthew L.
Bioinformatics, a specialism propelled into relevance by the Human Genome Project and the subsequent -omic turn in the life sciences, is an interdisciplinary field of research. Qualitative work on the disciplinary identities of bioinformaticians has revealed the tensions involved in work in this “borderland.” As part of our ongoing work on the emergence of bioinformatics, between 2010 and 2011, we conducted a survey of United Kingdom-based academic bioinformaticians. Building on insights drawn from our fieldwork over the past decade, we present results from this survey relevant to a discussion of disciplinary generation and stabilization. Not only is there evidence of an attitudinal divide between the different disciplinary cultures that make up bioinformatics, but there are distinctions between the forerunners, founders and the followers; as inter/disciplines mature, they face challenges that are both inter-disciplinary and inter-generational in nature. PMID:27453689
Lue, Jaw-Chyng L.; Fang, Wai-Chi
A microsystem architecture for real-time, on-site, robust bioinformatic pattern recognition and analysis has been proposed. This system is compatible with on-chip DNA analysis methods such as polymerase chain reaction (PCR) amplification. A corresponding novel artificial neural network (ANN) learning algorithm, based on the error backpropagation (EBP) algorithm and using a new sigmoid-logarithmic transfer function, has been devised. Our results show that the trained new ANN can recognize low-fluorescence patterns better than the conventional sigmoidal ANN does. A differential logarithmic imaging chip is designed for calculating the logarithm of relative intensities of fluorescence signals. The single-rail logarithmic circuit and a prototype ANN chip are designed, fabricated and characterized.
As preparation of the IPCC's Third Assessment Report takes place, one of the many observed climate variables of key interest is cloud amount. For several nations of the world, there exist records of surface-observed cloud amount dating back to the middle of the 20th Century or earlier, offering valuable information on variations and trends. Studies using such databases include Sun and Groisman (1999) and Kaiser and Razuvaev (1995) for the former Soviet Union, Angell et al. (1984) for the United States, Henderson-Sellers (1986) for Europe, Jones and Henderson-Sellers (1992) for Australia, and Kaiser (1998) for China. The findings of Kaiser (1998) differ from the other studies in that much of China appears to have experienced decreased cloudiness over recent decades (1954-1994), whereas the other land regions for the most part show evidence of increasing cloud cover. This paper expands on Kaiser (1998) by analyzing trends in additional meteorological variables for China [station pressure (p), water vapor pressure (e), and relative humidity (rh)] and extending the total cloud amount (N) analysis an additional two years (through 1996).
Pinto, Jose Miguel; Arrieta, Cristobal; Andia, Marcelo E; Uribe, Sergio; Ramos-Grez, Jorge; Vargas, Alex; Irarrazaval, Pablo; Tejos, Cristian
Additive manufacturing (AM) models are used in medical applications for surgical planning, prosthesis design and teaching. For these applications, the accuracy of the AM models is essential. Unfortunately, this accuracy is compromised due to errors introduced by each of the building steps: image acquisition, segmentation, triangulation, printing and infiltration. However, the contribution of each step to the final error remains unclear. We performed a sensitivity analysis comparing errors obtained from a reference with those obtained modifying parameters of each building step. Our analysis considered global indexes to evaluate the overall error, and local indexes to show how this error is distributed along the surface of the AM models. Our results show that the standard building process tends to overestimate the AM models, i.e. models are larger than the original structures. They also show that the triangulation resolution and the segmentation threshold are critical factors, and that the errors are concentrated at regions with high curvatures. Errors could be reduced choosing better triangulation and printing resolutions, but there is an important need for modifying some of the standard building processes, particularly the segmentation algorithms.
This book chapter describes the current Big Data problem in Bioinformatics and the resulting issues with performing reproducible computational research. The core of the chapter provides guidelines and summaries of current tools/techniques that a noncomputational researcher would need to learn to pe...
Background Network Tools and Applications in Biology (NETTAB) Workshops are a series of meetings focused on the most promising and innovative ICT tools and on their usefulness in Bioinformatics. The NETTAB 2011 workshop, held in Pavia, Italy, in October 2011, was aimed at presenting some of the most relevant methods, tools and infrastructures that are nowadays available for Clinical Bioinformatics (CBI), the research field that deals with clinical applications of bioinformatics. Methods This editorial reports the viewpoints and opinions of three world CBI leaders, who were invited to participate in a panel discussion at the NETTAB workshop on the next challenges and future opportunities of this field. These include the development of data warehouses and ICT infrastructures for data sharing, the definition of standards for sharing phenotypic data and the implementation of novel tools for efficient search computing solutions. Results Some of the most important design features of a CBI-ICT infrastructure are presented, including data warehousing, modularity and flexibility, open-source development, semantic interoperability, integrated search and retrieval of -omics information. Conclusions Clinical Bioinformatics goals are ambitious. Many factors, including the availability of high-throughput "-omics" technologies and equipment, the widespread availability of clinical data warehouses and the noteworthy increase in data storage and computational power of the most recent ICT systems, justify research and efforts in this domain, which promises to be a crucial leveraging factor for biomedical research. PMID:23095472
Hassanien, Aboul Ella; Al-Shammari, Eiman Tamah; Ghali, Neveen I
Computational intelligence (CI) is a well-established paradigm with current systems having many of the characteristics of biological computers and capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state-of-the-art in CI applications to bioinformatics and motivate research in new trend-setting directions. In this article, we present an overview of the CI techniques in bioinformatics. We will show how CI techniques including neural networks, restricted Boltzmann machine, deep belief network, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines, could be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function prediction and its structure. We discuss some representative methods to provide inspiring examples to illustrate how CI can be utilized to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented and an extensive bibliography is included.
Deng, Youping; Ni, Jun; Zhang, Chaoyang
The first symposium of computations in bioinformatics and bioscience (SCBB06) was held in Hangzhou, China on June 21-22, 2006. Twenty-six peer-reviewed papers were selected for publication in this special issue of BMC Bioinformatics. These papers cover a broad range of topics including bioinformatics theories, algorithms, applications and tool development. The main technical topics include gene expression analysis, sequence analysis, genome analysis, phylogenetic analysis, gene function prediction, molecular interaction and systems biology, genetics and population study, immune strategy, protein structure prediction and proteomics.
support programs. Subject terms: additive manufacturing, 3D printing, technology adoption. Number of pages: 69. ...this is about to change. Additive manufacturing (AM) systems (commonly known as "3D printing") could bring the organic parts manufacturing capability
Schneider, Maria V; Orchard, Sandra
We provide an overview of the state of the art for the Omics technologies, the types of omics data and the bioinformatics resources relevant and related to Omics. We also illustrate the bioinformatics challenges of dealing with high-throughput data. This overview touches on several fundamental aspects of Omics and bioinformatics: data standardisation, data sharing, storing Omics data appropriately and exploring Omics data in bioinformatics. Though the principles and concepts presented hold for the various technological fields, we concentrate on three main Omics fields, namely genomics, transcriptomics and proteomics. Finally, we address the integration of Omics data, and provide several useful links for bioinformatics and Omics.
Merklein, Marion; Junker, Daniel; Schaub, Adam; Neubauer, Franziska
With the trend toward mass customization of lightweight construction in industry, conventional manufacturing processes such as forming and machining are being pushed to their limits for economical manufacturing. More flexible processes are needed, and additive manufacturing technology provides them. This toolless production principle offers high geometrical freedom and optimized utilization of the material used. Thus, load-adjusted lightweight components can be produced economically in small lot sizes. To compensate for disadvantages such as inadequate accuracy and surface roughness, hybrid machines combining additive and subtractive manufacturing are being developed. This paper summarizes the principles of the most commonly used additive manufacturing processes for metals and their potential for integration into a hybrid production machine. It is pointed out that, in particular, the integration of deposition processes into a CNC milling center offers high potential for manufacturing larger parts with high accuracy. Furthermore, the combination of additive and subtractive manufacturing allows the production of ready-to-use products within a single machine. Additionally, current research on the integration of additive manufacturing processes into the production chain is analyzed. Given the long manufacturing time of additive production processes, combining them with conventional manufacturing processes such as sheet or bulk metal forming seems an effective solution. Large volumes in particular can be produced by conventional processes. In an additional production step, active elements can be applied by additive manufacturing. This principle is also investigated for tool production to reduce machining of the high-strength material used for forming tools. The aim is to add active elements onto a geometrically simple base using Laser Metal Deposition. That process allows the utilization of several powder materials during one process what
Wang, Youping; Sonntag, Karin; Rudloff, Eicke; Wehling, Peter; Snowdon, Rod J
Two Brassica napus-Crambe abyssinica monosomic addition lines (2n=39, AACC plus a single chromosome from C. abyssinica) were obtained from the F(2) progeny of the asymmetric somatic hybrid. The alien chromosome from C. abyssinica in the addition line was clearly distinguished by genomic in situ hybridization (GISH). Twenty-seven microspore-derived plants from the addition lines were obtained. Fourteen seedlings were determined to be diploid plants (2n=38) arising from spontaneous chromosome doubling, while 13 seedlings were confirmed as haploid plants. Doubled haploid plants produced after treatment with colchicine and two disomic chromosome addition lines (2n=40, AACC plus a single pair of homologous chromosomes from C. abyssinica) could again be identified by GISH analysis. The lines are potentially useful for molecular genetic analysis of novel C. abyssinica genes or alleles contributing to traits relevant for oilseed rape (B. napus) breeding.
Mohabatkar, Hassan; Keyhanfar, Mehrnaz; Behbahani, Mandana
Scientists have united in a common search to sequence, store and analyze genes and proteins. In this regard, rapidly evolving bioinformatics methods are providing valuable information on these newly discovered molecules. Understanding what has been done and what we can do in silico is essential in designing new experiments. The imbalance between sequence-known proteins and attribute-known proteins has called for the development of computational methods or high-throughput automated tools for quickly and reliably predicting or identifying various characteristics of uncharacterized proteins. Taking into consideration the role of viruses in causing diseases and their use in biotechnology, the present review describes the application of protein bioinformatics in virology. To that end, a number of important features of viral proteins, such as epitope prediction, protein docking, subcellular localization, viral protease cleavage sites and computer-based comparison of their aspects, are discussed. This paper also describes several tools developed principally for viral bioinformatics. Predicting viral protein features and following the advances in this field can help basic understanding of the relationship between a virus and its host.
Zales, Charlotte Rappe; Cronin, Susan J.
Sixteen high school women participated in a 5-week residential summer program designed to encourage female and minority students to choose careers in scientific fields. Students gained expertise in bioinformatics through problem-based learning in a complex learning environment of content instruction, speakers, labs, and trips. Innovative hands-on activities filled the program. Students learned biological principles in context and sophisticated bioinformatics tools for processing data. Students additionally mastered a variety of information-searching techniques. Students completed creative individual and group projects, demonstrating the successful integration of biology, information technology, and bioinformatics. Discussions with female scientists allowed students to see themselves in similar roles. Summer residential aspects fostered an atmosphere in which students matured in interacting with others and in their views of diversity.
Symeonidis, Iphigenia Sofia
This paper aims to elucidate guiding concepts for the design of powerful undergraduate bioinformatics degrees which will lead to a conceptual framework for the curriculum. "Powerful" here should be understood as having truly bioinformatics objectives rather than enrichment of existing computer science or life science degrees on which bioinformatics degrees are often based. As such, the conceptual framework will be one which aims to demonstrate intellectual honesty in regard to the field of bioinformatics. A synthesis/conceptual analysis approach was followed as elaborated by Hurd (1983). The approach takes into account the following: bioinformatics educational needs and goals as expressed by different authorities, five case studies of undergraduate bioinformatics degrees, educational implications of bioinformatics as a technoscience, and approaches to curriculum design promoting interdisciplinarity and integration. Given these considerations, guiding concepts emerged and a conceptual framework was elaborated. The practice of bioinformatics was given a closer look, which led to defining tool-integration skills and tool-thinking capacity as crucial areas of the bioinformatics activities spectrum. It was argued, finally, that a process-based curriculum as a variation of a concept-based curriculum (where the concepts are processes) might be more conducive to the teaching of bioinformatics given a foundational first year of integrated science education as envisioned by Bialek and Botstein (2004). Furthermore, the curriculum design needs to define new avenues of communication and learning which bypass the traditional disciplinary barriers of academic settings as undertaken by Tador and Tidmor (2005) for graduate studies.
Li, Meng; Chen, Yi-Bu; Clintworth, William A
Question: How can a library-based bioinformatics support program be implemented and expanded to continuously support the growing and changing needs of the research community? Setting: A program at a health sciences library serving a large academic medical center with a strong research focus is described. Methods: The bioinformatics service program was established at the Norris Medical Library in 2005. As part of program development, the library assessed users' bioinformatics needs, acquired additional funds, established and expanded service offerings, and explored additional roles in promoting on-campus collaboration. Results: Personnel and software have increased along with the number of registered software users and use of the provided services. Conclusion: With strategic efforts and persistent advocacy within the broader university environment, library-based bioinformatics service programs can become a key part of an institution's comprehensive solution to researchers' ever-increasing bioinformatics needs. PMID:24163602
Attwood, Teresa K; Bongcam-Rudloff, Erik; Brazas, Michelle E; Corpas, Manuel; Gaudet, Pascale; Lewitter, Fran; Mulder, Nicola; Palagi, Patricia M; Schneider, Maria Victoria; van Gelder, Celia W G
In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy--paradoxically, many are actually closing "niche" bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all.
Schneider, Maria V; Walter, Peter; Blatter, Marie-Claude; Watson, James; Brazas, Michelle D; Rother, Kristian; Budd, Aidan; Via, Allegra; van Gelder, Celia W G; Jacob, Joachim; Fernandes, Pedro; Nyrönen, Tommi H; De Las Rivas, Javier; Blicher, Thomas; Jimenez, Rafael C; Loveland, Jane; McDowall, Jennifer; Jones, Phil; Vaughan, Brendan W; Lopez, Rodrigo; Attwood, Teresa K; Brooksbank, Catherine
Funding bodies are increasingly recognizing the need to provide graduates and researchers with access to short intensive courses in a variety of disciplines, in order both to improve the general skills base and to provide solid foundations on which researchers may build their careers. In response to the development of 'high-throughput biology', the need for training in the field of bioinformatics, in particular, is seeing a resurgence: it has been defined as a key priority by many Institutions and research programmes and is now an important component of many grant proposals. Nevertheless, when it comes to planning and preparing to meet such training needs, tension arises between the reward structures that predominate in the scientific community which compel individuals to publish or perish, and the time that must be devoted to the design, delivery and maintenance of high-quality training materials. Conversely, there is much relevant teaching material and training expertise available worldwide that, were it properly organized, could be exploited by anyone who needs to provide training or needs to set up a new course. To do this, however, the materials would have to be centralized in a database and clearly tagged in relation to target audiences, learning objectives, etc. Ideally, they would also be peer reviewed, and easily and efficiently accessible for downloading. Here, we present the Bioinformatics Training Network (BTN), a new enterprise that has been initiated to address these needs, and review it in relation to similar initiatives and collections.
Lima, Andre O. S.; Garces, Sergio P. S.
Bioinformatics is one of the fastest growing scientific areas over the last decade. It focuses on the use of informatics tools for the organization and analysis of biological data. An example of their importance is the availability nowadays of dozens of software programs for genomic and proteomic studies. Thus, there is a growing field (private…
Sheppard, Paul R; Helsel, Dennis R; Speakman, Robert J; Ridenour, Gary; Witten, Mark L
Previously reported dendrochemical data showed temporal variability in concentration of tungsten (W) and cobalt (Co) in tree rings of Fallon, Nevada, US. Criticism of this work questioned the use of the Mann-Whitney test for determining change in element concentrations. Here, we demonstrate that Mann-Whitney is appropriate for comparing background element concentrations to possibly elevated concentrations in environmental media. Given that Mann-Whitney tests for differences in shapes of distributions, inter-tree variability (e.g., "coefficient of median variation") was calculated for each measured element across trees within subsites and time periods. For W and Co, the metals of highest interest in Fallon, inter-tree variability was always higher within versus outside of Fallon. For calibration purposes, this entire analysis was repeated at a different town, Sweet Home, Oregon, which has a known tungsten-powder facility, and inter-tree variability of W in tree rings confirmed the establishment date of that facility. Mann-Whitney testing of simulated data also confirmed its appropriateness for analysis of data affected by point-source contamination. This research adds important new dimensions to dendrochemistry of point-source contamination by adding analysis of inter-tree variability to analysis of central tendency. Fallon remains distinctive by a temporal increase in W beginning by the mid 1990s and by elevated Co since at least the early 1990s, as well as by high inter-tree variability for W and Co relative to comparison towns.
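The comparison described in the abstract above can be illustrated with a minimal sketch of the Mann-Whitney U statistic. Everything in it is an assumption made for illustration: the function names (`midranks`, `mann_whitney_u`) are hypothetical, the concentration values are invented and are not the Fallon or Sweet Home data, and the z-score uses the plain normal approximation without a tie correction. It is not the authors' procedure.

```python
import math

def midranks(values):
    """Map each distinct value to its average (mid) rank in the pooled sample."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; assign their average.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks

def mann_whitney_u(background, test):
    """Return (U_background, U_test, z) for two independent samples."""
    n1, n2 = len(background), len(test)
    ranks = midranks(list(background) + list(test))
    r1 = sum(ranks[v] for v in background)
    # U for the background sample: number of (background, test) pairs in
    # which the background value is larger, counting ties as one half.
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    # Normal approximation (no tie correction; an illustrative simplification).
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, u2, z

# Hypothetical element concentrations (arbitrary units), not measured data.
background = [0.10, 0.12, 0.09, 0.11, 0.13]
elevated = [0.15, 0.30, 0.22, 0.14, 0.45]
u1, u2, z = mann_whitney_u(background, elevated)
print(u1, u2, round(z, 2))
```

A small U for the background sample (and a strongly negative z) indicates that background values tend to rank below the comparison values. For small samples an exact test is generally preferable to the normal approximation used here.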
Tomazic, William A; Schmidt, Harold W; Tischler, Adelbert O
The effect of adding fluorine to the Vanguard first-stage oxidant was analyzed. An increase in specific impulse of 5.74 percent may be obtained with 30 percent fluorine. This increase, coupled with increased mass ratio due to greater oxidant density, gave up to 24.6-percent increase in first-stage burnout energy with 30 percent fluorine added. However, a change in tank configuration is required to accommodate the higher oxidant-fuel ratio necessary for peak specific impulse with fluorine addition.
Karikari, Thomas K.
Until recently, bioinformatics, an important discipline in the biological sciences, was largely limited to countries with advanced scientific resources. Nonetheless, several developing countries have lately been making progress in bioinformatics training and applications. In Africa, leading countries in the discipline include South Africa, Nigeria, and Kenya. However, one country that is less known when it comes to bioinformatics is Ghana. Here, I provide a first description of the development of bioinformatics activities in Ghana and how these activities contribute to the overall development of the discipline in Africa. Over the past decade, scientists in Ghana have been involved in publications incorporating bioinformatics analyses, aimed at addressing research questions in biomedical science and agriculture. Scarce research funding and inadequate training opportunities are some of the challenges that need to be addressed for Ghanaian scientists to continue developing their expertise in bioinformatics. PMID:26378921
Slotwinski, John A; Garboczi, Edward J; Hebenstreit, Keith M
Additive manufacturing techniques can produce complex, high-value metal parts, with potential applications as critical metal components such as those found in aerospace engines and as customized biomedical implants. Material porosity in these parts is undesirable for aerospace parts - since porosity could lead to premature failure - and desirable for some biomedical implants - since surface-breaking pores allow for better integration with biological tissue. Changes in a part’s porosity during an additive manufacturing build may also be an indication of an undesired change in the build process. Here, we present efforts to develop an ultrasonic sensor for monitoring changes in the porosity in metal parts during fabrication on a metal powder bed fusion system. The development of well-characterized reference samples, measurements of the porosity of these samples with multiple techniques, and correlation of ultrasonic measurements with the degree of porosity are presented. A proposed sensor design, measurement strategy, and future experimental plans on a metal powder bed fusion system are also presented. PMID:26601041
Ozdemir, Abdil; Lin, Jung-Lee; Gillig, Kent J.; Gulfen, Mustafa; Chen, Chung-Hsuan
In this work, we present the detection sensitivity improvement of electrospray ionization (ESI) mass spectrometry of neutral saccharides in a positive ion mode by the addition of various amino acids. Saccharides of a broad molecular weight range were chosen as the model compounds in the present study. Saccharides provide strong noncovalent interactions with amino acids, and the complex formation enhances the signal intensity and simplifies the mass spectra of saccharides. Polysaccharides provide a polymer-like ESI spectrum with a basic subunit difference between multiply charged chains. The protonated spectra of saccharides are not well identified because of different charge state distributions produced by the same molecules. Depending on the solvent used and other ions or molecules present in the solution, noncovalent interactions with saccharides may occur. These interactions are affected by the addition of amino acids. Amino acids with polar side groups show a strong tendency to interact with saccharides. In particular, serine shows a high tendency to interact with saccharides and significantly improves the detection sensitivity of saccharide compounds.
Hadley, Stanton W
Between 2010 and 2012 the Eastern Interconnection Planning Collaborative (EIPC) conducted a major long-term resource and transmission study of the Eastern Interconnection (EI). With guidance from a Stakeholder Steering Committee (SSC) that included representatives from the Eastern Interconnection States Planning Council (EISPC) among others, the project was conducted in two phases. Phase 1 involved a long-term capacity expansion analysis that involved creation of eight major futures plus 72 sensitivities. Three scenarios were selected for more extensive transmission- focused evaluation in Phase 2. Five power flow analyses, nine production cost model runs (including six sensitivities), and three capital cost estimations were developed during this second phase. The results from Phase 1 and 2 provided a wealth of data that could be examined further to address energy-related questions. A list of 13 topics was developed for further analysis; this paper discusses the first five.
Kossida, Sophia; Tahri, Nadia; Daizadeh, Iraj
With the completion of the human genome, and the imminent completion of other large-scale sequencing and structure-determination projects, computer-assisted bioscience is poised to become the new paradigm for conducting basic and applied research. The presence of these additional bioinformatics tools stirs great anxiety among experimental researchers (as well as pedagogues), since they are now faced with the need for wider and deeper knowledge of differing disciplines (biology, chemistry, physics, mathematics, and computer science). This review targets those individuals who are interested in using computational methods in their teaching or research. By analyzing a real-life, pharmaceutical, multicomponent, target-based example, the reader will experience this fascinating new discipline.
Sauer, David B.; Karpowich, Nathan K.; Song, Jin Mei; Wang, Da-Neng
Ex vivo stability is a valuable protein characteristic but is laborious to improve experimentally. In addition to biopharmaceutical and industrial applications, stable protein is important for biochemical and structural studies. Taking advantage of the large number of available genomic sequences and growth temperature data, we present two bioinformatic methods to identify a limited set of amino acids or positions that likely underlie thermostability. Because these methods allow thousands of homologs to be examined in silico, they have the advantage of providing both speed and statistical power. Using these methods, we introduced, via mutation, amino acids from thermoadapted homologs into an exemplar mesophilic membrane protein, and demonstrated significantly increased thermostability while preserving protein activity. PMID:26445442
Zhang, Jian Bo; Zhang, Hong; Wang, Hua Li; Zhang, Ji Yue; Luo, Peng Jie; Zhu, Lei; Wang, Zhu Tian
This study aimed to analyze the risk of sulfites in food consumed by the Chinese people and to assess the health protection capability of the maximum-permitted level (MPL) of sulfites in GB 2760-2011. Sulfites as food additives are overused or abused in many food categories. When the MPL in GB 2760-2011 was used as the sulfite content in food, the intake of sulfites in most surveyed populations was lower than the acceptable daily intake (ADI). Excess intake of sulfites was found in all the surveyed groups when food with a high-percentile sulfite content was consumed. Moreover, children aged 1-6 years are at high risk of excess sulfite intake. The primary cause of the excess intake of sulfites in Chinese people is the overuse and abuse of sulfites by the food industry. The current MPL of sulfites in GB 2760-2011 protects the health of most populations.
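The kind of comparison described in the abstract above, estimated intake against the ADI, can be sketched as simple arithmetic. This is a minimal illustration under stated assumptions, not the study's method: the function name `daily_intake_mg_per_kg_bw` is hypothetical, the food categories, consumption figures, sulfite levels and body weight are invented, and the ADI value of 0.7 mg/kg body weight per day is assumed here rather than taken from the abstract.

```python
# Assumed ADI for sulfites, in mg per kg body weight per day (not stated in
# the abstract; treat as an illustrative parameter).
ADI_MG_PER_KG_BW = 0.7

def daily_intake_mg_per_kg_bw(consumption_g_per_day, sulfite_mg_per_kg_food,
                              body_weight_kg):
    """Sum sulfite intake over food categories and normalise by body weight."""
    total_mg = 0.0
    for food, grams in consumption_g_per_day.items():
        # Convert grams of food to kilograms, then multiply by mg sulfite per kg.
        total_mg += grams / 1000 * sulfite_mg_per_kg_food[food]
    return total_mg / body_weight_kg

# Hypothetical inputs for a 20 kg child; none of these figures come from the study.
consumption = {"dried fruit": 10, "preserved vegetables": 30}   # g/day
sulfite_level = {"dried fruit": 100, "preserved vegetables": 50}  # mg/kg food

intake = daily_intake_mg_per_kg_bw(consumption, sulfite_level, body_weight_kg=20)
print(f"intake = {intake:.3f} mg/kg bw/day, "
      f"{100 * intake / ADI_MG_PER_KG_BW:.1f}% of the assumed ADI")
```

Setting the per-food sulfite level to the MPL gives an upper-bound style estimate, whereas using measured high-percentile levels corresponds to the second scenario the abstract mentions.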
Maule, Alexis L; Makey, Colleen M; Benson, Eugene B; Burrows, Isaac J; Scammell, Madeleine K
Hydraulic fracturing is used to extract natural gas from shale formations. The process involves injecting into the ground fracturing fluids that contain thousands of gallons of chemical additives. Companies are not mandated by federal regulations to disclose the identities or quantities of chemicals used during hydraulic fracturing operations on private or public lands. States have begun to regulate hydraulic fracturing fluids by mandating chemical disclosure. These laws have shortcomings including nondisclosure of proprietary or "trade secret" mixtures, insufficient penalties for reporting inaccurate or incomplete information, and timelines that allow for after-the-fact reporting. These limitations leave lawmakers, regulators, public safety officers, and the public uninformed and ill-prepared to anticipate and respond to possible environmental and human health hazards associated with hydraulic fracturing fluids. We explore hydraulic fracturing exemptions from federal regulations, as well as current and future efforts to mandate chemical disclosure at the federal and state level.
Reversible protein phosphorylation is one of the most important forms of cellular regulation. Thus, phosphoproteomic analysis of protein phosphorylation in cells is a powerful tool to evaluate cell functional status. The importance of protein kinase-regulated signal transduction pathways in human cancer has led to the development of drugs that inhibit protein kinases at the apex or intermediary levels of these pathways. Phosphoproteomic analysis of these signalling pathways will provide important insights for operation and connectivity of these pathways to facilitate identification of the best targets for cancer therapies. Enrichment of phosphorylated proteins or peptides from tissue or bodily fluid samples is required. The application of technologies such as phosphoenrichments, mass spectrometry (MS) coupled to bioinformatics tools is crucial for the identification and quantification of protein phosphorylation sites for advancing in such relevant clinical research. A combination of different phosphopeptide enrichments, quantitative techniques and bioinformatic tools is necessary to achieve good phospho-regulation data and good structural analysis of protein studies. The current and most useful proteomics and bioinformatics techniques will be explained with research examples. Our aim in this article is to be helpful for cancer research via detailing proteomics and bioinformatic tools. PMID:21967744
Olivier, Timothée; Chappuis, Pierre; Tsantoulis, Petros
Bioinformatics is essential in clinical oncology and research. Combining biology, computer science and mathematics, bioinformatics aims to derive useful information from clinical and biological data, often poorly structured, at a large scale. Bioinformatics approaches have reclassified certain cancers based on their molecular and biological presentation, improving treatment selection. Many molecular signatures have been developed and, after validation, some are now usable in clinical practice. Other applications could facilitate daily practice, reduce the risk of error and increase the precision of medical decision-making. Bioinformatics must evolve in accordance with ethical considerations and requires multidisciplinary collaboration. Its application depends on a sound technical foundation that meets strict quality requirements.
Ladics, Gregory S; Cressman, Robert F; Herouet-Guicheney, Corinne; Herman, Rod A; Privalle, Laura; Song, Ping; Ward, Jason M; McClain, Scott
Bioinformatic tools are being increasingly utilized to evaluate the degree of similarity between a novel protein and known allergens within the context of a larger allergy safety assessment process. Importantly, bioinformatics is not a predictive analysis that can determine if a novel protein will "become" an allergen, but rather a tool to assess whether the protein is a known allergen or is potentially cross-reactive with an existing allergen. Bioinformatic tools are key components of the 2009 Codex Alimentarius Commission's weight-of-evidence approach, which encompasses a variety of experimental approaches for an overall assessment of the allergenic potential of a novel protein. Bioinformatic search comparisons between novel protein sequences, as well as potential novel fusion sequences derived from the genome and transgene, and known allergens are required by all regulatory agencies that assess the safety of genetically modified (GM) products. The objective of this paper is to identify opportunities for consensus in the methods of applying bioinformatics and to outline differences that impact a consistent and reliable allergy safety assessment. The bioinformatic comparison process has some critical features, which are outlined in this paper. One of them is a curated, publicly available and well-managed database with known allergenic sequences. In this paper, the best practices, scientific value, and food safety implications of bioinformatic analyses, as they are applied to GM food crops, are discussed. Recommendations for conducting bioinformatic analysis on novel food proteins for potential cross-reactivity to known allergens are also put forth.
Akune, Yukie; Lin, Chi-Hung; Abrahams, Jodie L; Zhang, Jingyu; Packer, Nicolle H; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P
Glycan structures attached to proteins are composed of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible, the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures composed of 15 or fewer monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database.
Ozyigit, Ibrahim I.; Filiz, Ertugrul; Vatansever, Recep; Kurtoglu, Kuaybe Y.; Koc, Ibrahim; Öztürk, Münir X.; Anjum, Naser A.
Among major reactive oxygen species (ROS), hydrogen peroxide (H2O2) exhibits dual roles in plant metabolism. Low levels of H2O2 modulate many biological/physiological processes in plants, whereas high levels can damage cell structures, with severe consequences. Thus, the steady-state level of cellular H2O2 must be tightly regulated. Glutathione peroxidases (GPX) and ascorbate peroxidase (APX) are two major ROS-scavenging enzymes which catalyze the reduction of H2O2 in order to prevent potential H2O2-derived cellular damage. Employing bioinformatics approaches, this study presents a comparative evaluation of both GPX and APX in 18 different plant species, and provides valuable insights into the nature and complex regulation of these enzymes. Herein, (a) potential GPX and APX genes/proteins from 18 different plant species were identified, (b) their exon/intron organization was analyzed, (c) detailed information about their physicochemical properties was provided, (d) conserved motif signatures of GPX and APX were identified, (e) their phylogenetic trees and 3D models were constructed, (f) protein-protein interaction networks were generated, and finally (g) GPX and APX gene expression profiles were analyzed. The study outcomes shed light on GPX and APX as major H2O2-scavenging enzymes at the structural and functional levels, and could be used in future studies in this direction. PMID:27047498
Li, Wei; Zhong, Chaoqin; Jiao, Jun; Li, Peng; Cui, Baoxia; Ji, Chunyan; Ma, Daoxin
Circular RNAs (circRNAs) represent a widespread class of non-coding RNAs, which drew little attention in the past. Recently, limited data showed their promising future to act as biomarkers in human cancer, but the characteristics and functions remain largely unknown in hematopoietic malignancies, especially in leukemia. In this study, with the help of circRNA microarray, we demonstrated the expression profile of circRNAs in acute myeloid leukemia (AML) patients, and identified a large number of circRNAs possibly expressed in a leukemia specific manner. We also described a circRNA signature related to AML risk-status based on the bioinformatics prediction. In particular, a downregulated circRNA, hsa_circ_0004277, was characterized and functionally evaluated in a cohort of 115 human samples, thus offering a potential diagnostic marker and treatment target in AML. Interestingly, we found chemotherapy could significantly restore the expression of hsa_circ_0004277, indicating the increasing level of hsa_circ_0004277 was associated with successful treatment. Furthermore, a detailed circRNA–miRNA–mRNA interaction network was presented for hsa_circ_0004277, allowing us to better understand its underlying mechanisms for function in AML. PMID:28282919
Berends, Koen; Warmink, Jord; Hulscher, Suzanne
the proposed intervention. The implicit assumption underlying such analysis is that both models are commensurable. We hypothesize that they are commensurable only to a certain extent. In an idealised study we have demonstrated that a loss of prediction performance should be expected with increasingly large engineering works. When accounting for parametric uncertainty of floodplain roughness in model identification, we see the uncertainty bounds for predicted effects of interventions increase with increasing intervention scale. Calibration of these types of models therefore seems to have a shelf-life, beyond which calibration no longer improves prediction. Therefore a qualification scheme for model use is required that can be linked to model validity. In this study, we characterize model use along three dimensions: extrapolation (using the model with different external drivers), extension (using the model for different output or indicators) and modification (using modified models). Such use of models is expected to have implications for the applicability of surrogate modelling for efficient uncertainty analysis as well, which is recommended for future research. Warmink, J. J.; Straatsma, M. W.; Huthoff, F.; Booij, M. J. & Hulscher, S. J. M. H. 2013. Uncertainty of design water levels due to combined bed form and vegetation roughness in the Dutch river Waal. Journal of Flood Risk Management 6, 302-318. DOI: 10.1111/jfr3.12014
Honts, Jerry E.
Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489
Blagodatskaya, Evgenia; Blagodatsky, Sergey; Yuyukina, Tatayna; Kuzyakov, Yakov
The heterotrophic component of CO2 emitted from soil is mainly due to the respiratory activity of soil microorganisms. Field measurements of microbial respiration can be used to estimate the C budget of soil, while laboratory estimation of respiration kinetics allows the elucidation of mechanisms of soil C sequestration. Physiological approaches based on 1) time-dependent or 2) substrate-dependent respiratory response of soil microorganisms decomposing organic substrates make it possible to relate the functional properties of the soil microbial community to decomposition rates of soil organic matter (SOM). We used a novel methodology combining (i) microbial growth kinetics and (ii) enzyme affinity to the substrate to show the shift in functional properties of the soil microbial community after amendments with substrates of contrasting availability. We combined the application of 14C-labeled glucose as an easily available C source with natural isotope labeling of old and young SOM. The possible contribution of two processes, isotopic fractionation and preferential substrate utilization, to the shifts in δ13C during SOM decomposition in soil after C3-C4 vegetation change was evaluated. The specific growth rate (µ) of soil microorganisms was estimated by fitting the parameters of the equation v(t) = A + B * exp(µ*t) to the measured CO2 evolution rate (v(t)) after glucose addition, where A is the initial rate of non-growth respiration and B is the initial rate of the growing fraction of total respiration. Maximal mineralization rate (Vmax), substrate affinity of microbial enzymes (Ks) and substrate availability (Sn) were determined by Michaelis-Menten kinetics. To study the effect of plant-derived C on the δ13C signature of SOM, we compared the changes in isotopic composition of different C pools in C3 soil under grassland with C3-C4 soil where the C4 plant Miscanthus giganteus was grown for 12 years on the plot after grassland. The shift in δ13C caused by planting of M. giganteus
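The kinetic fit named in this abstract, v(t) = A + B * exp(µ*t), can be illustrated with a short sketch. It is not the authors' procedure: the fitting strategy (a grid search over µ with a closed-form least-squares solution for A and B at each candidate µ), the function names, and the synthetic data are all assumptions made for illustration. The Michaelis-Menten helper uses the textbook form v = Vmax * S / (Ks + S); how the abstract's substrate availability term Sn enters the authors' model is not stated, so it is left out.

```python
import math


def respiration_rate(t, a, b, mu):
    """CO2 evolution rate v(t) = A + B * exp(mu * t)."""
    return a + b * math.exp(mu * t)


def michaelis_menten(s, vmax, ks):
    """Textbook Michaelis-Menten rate for substrate concentration s."""
    return vmax * s / (ks + s)


def fit_growth_kinetics(times, rates, mu_grid):
    """Estimate (A, B, mu) by least squares.

    For a fixed mu the model is linear in A and B, so each candidate mu
    (hypothetical grid supplied by the caller) gets an exact two-parameter
    linear fit; the mu with the smallest squared error wins.
    """
    n = len(times)
    best = None
    for mu in mu_grid:
        x = [math.exp(mu * t) for t in times]
        sx, sy = sum(x), sum(rates)
        sxx = sum(v * v for v in x)
        sxy = sum(v * y for v, y in zip(x, rates))
        denom = n * sxx - sx * sx
        if abs(denom) < 1e-12:
            continue  # exp(mu * t) is numerically constant; skip this mu
        b = (n * sxy - sx * sy) / denom
        a = (sy - b * sx) / n
        sse = sum((a + b * v - y) ** 2 for v, y in zip(x, rates))
        if best is None or sse < best[0]:
            best = (sse, a, b, mu)
    _, a, b, mu = best
    return a, b, mu


if __name__ == "__main__":
    # Synthetic, noise-free example data (not measurements from the study).
    hours = list(range(0, 21))
    observed = [respiration_rate(t, 2.0, 0.5, 0.25) for t in hours]
    grid = [i / 1000 for i in range(1, 1001)]
    print(fit_growth_kinetics(hours, observed, grid))
```

With real, noisy measurements a nonlinear least-squares routine from a numerical library would normally replace the grid search; the grid keeps this sketch dependency-free.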
McEntire, R; Karp, P; Abernethy, N; Benton, D; Helt, G; DeJongh, M; Kent, R; Kosky, A; Lewis, S; Hodnett, D; Neumann, E; Olken, F; Pathak, D; Tarczy-Hornoch, P; Toldo, L; Topaloglou, T
Ontologies are specifications of the concepts in a given field, and of the relationships among those concepts. The development of ontologies for molecular-biology information and the sharing of those ontologies within the bioinformatics community are central problems in bioinformatics. If the bioinformatics community is to share ontologies effectively, ontologies must be exchanged in a form that uses standardized syntax and semantics. This paper reports on an effort among the authors to evaluate alternative ontology-exchange languages, and to recommend one or more languages for use within the larger bioinformatics community. The study selected a set of candidate languages, and defined a set of capabilities that the ideal ontology-exchange language should satisfy. The study scored the languages according to the degree to which they satisfied each capability. In addition, the authors performed several ontology-exchange experiments with the two languages that received the highest scores: OML and Ontolingua. The result of those experiments, and the main conclusion of this study, was that the frame-based semantic model of Ontolingua is preferable to the conceptual graph model of OML, but that the XML-based syntax of OML is preferable to the Lisp-based syntax of Ontolingua.
Koo, Hyunmin; Hakim, Joseph A; Fisher, Phillip R E; Grueneberg, Alexander; Andersen, Dale T; Bej, Asim K
In this study, we report the distribution and abundance of cold-adaptation proteins in microbial mat communities in the perennially ice-covered Lake Joyce, located in the McMurdo Dry Valleys, Antarctica. We have used MG-RAST and R code bioinformatics tools on Illumina HiSeq2000 shotgun metagenomic data and compared the filtering efficacy of these two methods on cold-adaptation proteins. Overall, the abundance of cold-shock DEAD-box protein A (CSDA), antifreeze proteins (AFPs), fatty acid desaturase (FAD), trehalose synthase (TS), and cold-shock family of proteins (CSPs) were present in all mat samples at high, moderate, or low levels, whereas the ice nucleation protein (INP) was present only in the ice and bulbous mat samples at insignificant levels. Considering the near homogeneous temperature profile of Lake Joyce (0.08-0.29 °C), the distribution and abundance of these proteins across various mat samples predictively correlated with known functional attributes necessary for microbial communities to thrive in this ecosystem. The comparison of the MG-RAST and the R code methods showed dissimilar occurrences of the cold-adaptation protein sequences, though with insignificant ANOSIM (R = 0.357; p-value = 0.012), ADONIS (R(2) = 0.274; p-value = 0.03) and STAMP (p-values = 0.521-0.984) statistical analyses. Furthermore, filtering targeted sequences using the R code accounted for taxonomic groups by avoiding sequence redundancies, whereas the MG-RAST provided total counts resulting in a higher sequence output. The results from this study revealed for the first time the distribution of cold-adaptation proteins in six different types of microbial mats in Lake Joyce, while suggesting a simpler and more manageable user-defined method of R code, as compared to a web-based MG-RAST pipeline.
This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR…
Heyer, Laurie J.
This article describes the sequence alignment problem in bioinformatics. Through examples, we formulate sequence alignment as an optimization problem and show how to compute the optimal alignment with dynamic programming. The examples and sample exercises have been used by the author in a specialized course in bioinformatics, but could be adapted…
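As a concrete illustration of sequence alignment solved by dynamic programming, here is a minimal sketch of global alignment in the Needleman-Wunsch style. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for the example, not values taken from the article.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(global_alignment("ACGT", "AGT"))
```

The table takes O(nm) time and memory. When several alignments tie for the best score, the traceback returns one of them, preferring the diagonal move.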
Torres, Angela; Nieto, Juan J.
The purpose of this paper is to present a general view of the current applications of fuzzy logic in medicine and bioinformatics. We particularly review the medical literature using fuzzy logic. We then recall the geometrical interpretation of fuzzy sets as points in a fuzzy hypercube and present two concrete illustrations in medicine (drug addictions) and in bioinformatics (comparison of genomes). PMID:16883057
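The geometrical view mentioned in this abstract treats a fuzzy set over n elements as a point in the unit hypercube [0,1]^n, with ordinary (crisp) sets sitting at the vertices. The sketch below shows that representation only. The distance used (summed absolute difference divided by summed maximum) and all function names are assumptions for illustration, and the example vectors are invented rather than derived from any genome or patient data.

```python
def is_fuzzy_point(p):
    """True if every membership degree lies in [0, 1]."""
    return all(0.0 <= x <= 1.0 for x in p)


def nearest_crisp_set(p):
    """Closest hypercube vertex: round each membership degree to 0 or 1."""
    return tuple(1.0 if x >= 0.5 else 0.0 for x in p)


def fuzzy_distance(p, q):
    """Illustrative normalized distance between two fuzzy sets.

    Assumed form: sum |p_i - q_i| / sum max(p_i, q_i); 0 for identical
    sets and 1 for sets with no overlapping membership.
    """
    if len(p) != len(q):
        raise ValueError("fuzzy sets must have the same dimension")
    num = sum(abs(x - y) for x, y in zip(p, q))
    den = sum(max(x, y) for x, y in zip(p, q))
    return 0.0 if den == 0 else num / den


if __name__ == "__main__":
    # Two made-up points in a 4-dimensional fuzzy hypercube.
    p = (0.2, 0.9, 0.5, 0.0)
    q = (0.1, 0.7, 0.5, 0.3)
    print(is_fuzzy_point(p), nearest_crisp_set(p), fuzzy_distance(p, q))
```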
Zhong, Yang; Zhang, Xiaoyan; Ma, Jian; Zhang, Liang
As the Human Genome Project experiences remarkable success and a flood of biological data is produced, bioinformatics becomes a very "hot" cross-disciplinary field, yet experienced bioinformaticians are urgently needed worldwide. This paper summarises the rapid development of bioinformatics education in China, especially related…
Explains the Human Genome Project (HGP) and efforts to sequence the human genome. Describes the role of bioinformatics in the project and considers it the genetics Swiss Army Knife, which has many different uses, for use in forensic science, medicine, agriculture, and environmental sciences. Discusses the use of bioinformatics in the high school…
Bioinformatics is a scientific discipline that applies computer science and information technology to help understand biological processes. The NIH provides a list of free online bioinformatics tutorials, either generated by the NIH Library or other institutes, which includes introductory lectures and "how to" videos on using various tools.
The exponential growth of biological data, mainly due to revolutionary developments in NGS technologies in the past couple of years, has created a multitude of challenges in downstream data analysis using bioinformatics approaches. To handle such a tsunami of data, bioinformatics analysis must be carried out in an automated and parallel fashion. A successful analysis often requires more than a few computational steps, and bootstrapping these individual steps (scripts) into components and the components into pipelines makes bioinformatics a reproducible and manageable segment of scientific research. CloVR (http://clovr.org) is one such flexible framework that facilitates the abstraction of bioinformatics workflows into executable pipelines. CloVR comes packaged with various built-in bioinformatics pipelines that can make use of multicore processing power when run on servers and/or the cloud. CloVR can be used to build custom pipelines based on individual laboratory requirements. CloVR is available as a single executable virtual image file that comes bundled with pre-installed and pre-configured bioinformatics tools and packages and thus circumvents cumbersome installation difficulties. CloVR is highly portable and can be run on traditional desktop/laptop computers, central servers and cloud compute farms. In conclusion, CloVR provides built-in automated analysis pipelines for microbial genomics with scope to develop and integrate custom workflows that make use of parallel processing power when run on compute clusters, thereby addressing the bioinformatics challenges of NGS data.
Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather
Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC (http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science. PMID:27781083
Ditty, Jayna L.; Kvaal, Christopher A.; Goodner, Brad; Freyermuth, Sharyn K.; Bailey, Cheryl; Britton, Robert A.; Gordon, Stuart G.; Heinhorst, Sabine; Reed, Kelynne; Xu, Zhaohui; Sanders-Lorenz, Erin R.; Axen, Seth; Kim, Edwin; Johns, Mitrick; Scott, Kathleen; Kerfeld, Cheryl A.
courses or independent research projects requires infrastructure for organizing and assessing student work. Here, we present a new platform for faculty to keep current with the rapidly changing field of bioinformatics, the Integrated Microbial Genomes Annotation Collaboration Toolkit (IMG-ACT). It was developed by instructors from both research-intensive and predominately undergraduate institutions in collaboration with the Department of Energy-Joint Genome Institute (DOE-JGI) as a means to innovate and update undergraduate education and faculty development. The IMG-ACT program provides a cadre of tools, including access to a clearinghouse of genome sequences, bioinformatics databases, data storage, instructor course management, and student notebooks for organizing the results of their bioinformatic investigations. In the process, IMG-ACT makes it feasible to provide undergraduate research opportunities to a greater number and diversity of students, in contrast to the traditional mentor-to-student apprenticeship model for undergraduate research, which can be too expensive and time-consuming to provide for every undergraduate. The IMG-ACT serves as the hub for the network of faculty and students that use the system for microbial genome analysis. Open access of the IMG-ACT infrastructure to participating schools ensures that all types of higher education institutions can utilize it. With the infrastructure in place, faculty can focus their efforts on the pedagogy of bioinformatics, involvement of students in research, and use of this tool for their own research agenda. What the original faculty members of the IMG-ACT development team present here is an overview of how the IMG-ACT program has affected our development in terms of teaching and research with the hopes that it will inspire more faculty to get involved.
Silveira, Nelson JF; Varuzza, Leonardo; Machado-Lima, Ariane; Lauretto, Marcelo S; Pinheiro, Daniel G; Rodrigues, Rodrigo V; Severino, Patrícia; Nobrega, Francisco G; Silva, Wilson A; de B Pereira, Carlos A; Tajara, Eloiza H
Background: Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignancies in humans. The average 5-year survival rate is one of the lowest among aggressive cancers, showing no significant improvement in recent years. When detected early, HNSCC has a good prognosis, but most patients present metastatic disease at the time of diagnosis, which significantly reduces survival rate. Despite extensive research, no molecular markers are currently available for diagnostic or prognostic purposes. Methods: Aiming to identify differentially-expressed genes involved in laryngeal squamous cell carcinoma (LSCC) development and progression, we generated individual Serial Analysis of Gene Expression (SAGE) libraries from a metastatic and non-metastatic larynx carcinoma, as well as from a normal larynx mucosa sample. Approximately 54,000 unique tags were sequenced in three libraries. Results: Statistical data analysis identified a subset of 1,216 differentially expressed tags between tumor and normal libraries, and 894 differentially expressed tags between metastatic and non-metastatic carcinomas. Three genes displaying differential regulation, one down-regulated (KRT31) and two up-regulated (BST2, MFAP2), as well as one with a non-significant differential expression pattern (GNA15) in our SAGE data were selected for real-time polymerase chain reaction (PCR) in a set of HNSCC samples. Consistent with our statistical analysis, quantitative PCR confirmed the upregulation of BST2 and MFAP2 and the downregulation of KRT31 when samples of HNSCC were compared to tumor-free surgical margins. As expected, GNA15 presented a non-significant differential expression pattern when tumor samples were compared to normal tissues. Conclusion: To the best of our knowledge, this is the first study reporting SAGE data in head and neck squamous cell tumors. Statistical analysis was effective in identifying differentially expressed genes reportedly involved in cancer development. The
The Bioinformatics Analysis of Comparative Genomics of Mycobacterium tuberculosis Complex (MTBC) Provides Insight into Dissimilarities between Intraspecific Groups Differing in Host Association, Virulence, and Epitope Diversity
Jia, Xinmiao; Yang, Li; Dong, Mengxing; Chen, Suting; Lv, Lingna; Cao, Dandan; Fu, Jing; Yang, Tingting; Zhang, Ju; Zhang, Xiangli; Shang, Yuanyuan; Wang, Guirong; Sheng, Yongjie; Huang, Hairong; Chen, Fei
Tuberculosis now exceeds HIV as the top infectious disease cause of mortality, and is caused by the Mycobacterium tuberculosis complex (MTBC). MTBC strains have highly conserved genome sequences (similarity >99%) but dramatically different phenotypes. To analyze the relationship between genotype and phenotype, we conducted the comparative genomic analysis on 12 MTBC strains representing different lineages (i.e., Mycobacterium bovis; M. bovis BCG; M. microti; M. africanum; M. tuberculosis H37Rv; M. tuberculosis H37Ra, and six M. tuberculosis clinical isolates). The analysis focused on the three aspects of pathogenicity: host association, virulence, and epitope variations. Host association analysis indicated that eight mce3 genes, two enoyl-CoA hydratases, and five PE/PPE family genes were present only in human isolates; these may have roles in host-pathogen interactions. There were 15 SNPs found on virulence factors (including five SNPs in three ESX secretion proteins) only in the Beijing strains, which might be related to their more virulent phenotype. A comparison between the virulent H37Rv and non-virulent H37Ra strains revealed three SNPs that were likely associated with the virulence attenuation of H37Ra: S219L (PhoP), A219E (MazG) and a newly identified I228M (EspK). Additionally, a comparison of animal-associated MTBC strains showed that the deletion of the first four genes (i.e., pe35, ppe68, esxB, esxA), rather than all eight genes of RD1, might play a central role in the virulence attenuation of animal isolates. Finally, by comparing epitopes among MTBC strains, we found that four epitopes were lost only in the Beijing strains; this may render them better capable of evading the human immune system, leading to enhanced virulence. Overall, our comparative genomic analysis of MTBC strains reveals the relationship between the highly conserved genotypes and the diverse phenotypes of MTBC, provides insight into pathogenic mechanisms, and facilitates the
Adebiyi, Ezekiel F.; Alzohairy, Ahmed M.; Everett, Dean; Ghedira, Kais; Ghouila, Amel; Kumuthini, Judit; Mulder, Nicola J.; Panji, Sumir; Patterton, Hugh-G.
The discipline of bioinformatics has developed rapidly since the complete sequencing of the first genomes in the 1990s. The development of many high-throughput techniques during the last decades has ensured that bioinformatics has grown into a discipline that overlaps with, and is required for, the modern practice of virtually every field in the life sciences. This has placed a scientific premium on the availability of skilled bioinformaticians, a qualification that is extremely scarce on the African continent. The reasons for this are numerous, although the absence of a skilled bioinformatician at academic institutions to initiate a training process and build sustained capacity seems to be a common African shortcoming. This dearth of bioinformatics expertise has had a knock-on effect on the establishment of many modern high-throughput projects at African institutes, including the comprehensive and systematic analysis of genomes from African populations, which are among the most genetically diverse anywhere on the planet. Recent funding initiatives from the National Institutes of Health and the Wellcome Trust are aimed at ameliorating this shortcoming. In this paper, we discuss the problems that have limited the establishment of the bioinformatics field in Africa, as well as propose specific actions that will help with the education and training of bioinformaticians on the continent. This is an absolute requirement in anticipation of a boom in high-throughput approaches to human health issues unique to data from African populations. PMID:24990350
Suplatov, Dmitry; Voevodin, Vladimir; Švedas, Vytas
The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation.
Qaadri, Kashef [Biomatters
Kashef Qaadri on "NGS for the Masses: Empowering biologists to improve bioinformatic productivity" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Wefer, Stephen H.
The proliferation of bioinformatics in modern Biology marks a new revolution in science, which promises to influence science education at all levels. This thesis examined state standards for content that articulated bioinformatics, and explored secondary students' affective and cognitive perceptions of, and performance in, a bioinformatics mini-unit. The results are presented as three studies. The first study analyzed secondary science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics at the introductory high school biology level. The bioinformatics content of each state's Biology standards was categorized into nine areas and the prevalence of each area documented. The nine areas were: The Human Genome Project, Forensics, Evolution, Classification, Nucleotide Variations, Medicine, Computer Use, Agriculture/Food Technology, and Science Technology and Society/Socioscientific Issues (STS/SSI). Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas. Recommendations are made for reworking existing standards to incorporate bioinformatics and to facilitate the goal of promoting science literacy in this emerging new field among secondary school students. The second study examined thirty-two students' affective responses to, and content mastery of, a two-week bioinformatics mini-unit. The findings indicate that the students generally were positive relative to their interest level, the usefulness of the lessons, the difficulty level of the lessons, likeliness to engage in additional bioinformatics, and were overall successful on the assessments. A discussion of the results and significance is followed by suggestions for future research and implementation for transferability. The third study presents a case study of individual differences among ten secondary school students, whose cognitive and affective percepts were
Fatumo, Segun A; Adoga, Moses P; Ojo, Opeolu O; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi
Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. In this developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups composed of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.
Microarray technology is widely used in various biomedical research areas, and the corresponding microarray data analysis is an essential step toward making the best use of array technologies. Here we review two components of microarray data analysis: low-level analysis, which emphasizes the design, quality control, and preprocessing of microarray experiments, and high-level analysis, which focuses on domain-specific microarray applications such as tumor classification, biomarker prediction, analysis of array CGH experiments, and reverse engineering of gene expression networks. Additionally, we review recent developments in building predictive models in genome expression and regulation studies. This review may help biologists gain a basic knowledge of microarray bioinformatics as well as its potential impact on the future evolution of biomedical research fields.
Habachi-Houimli, Yosra; Khalfallah, Yosra; Makni, Hanem; Makni, Mohamed; Bouktila, Dhia
In the present study, we have screened 71, 713, 525, 119 and 241 mature miRNA variants from Hordeum vulgare, Oryza sativa, Brachypodium distachyon, Triticum aestivum, and Sorghum bicolor, respectively, and classified them with respect to their conservation status and expression levels. These Poaceae non-redundant miRNA species (1,669) were distributed over a total of 625 MIR families, among which only 54 were conserved across two or more plant species, confirming the relatively recent evolutionary differentiation of miRNAs in grasses. On the other hand, we have used 257 H. vulgare, 286 T. aestivum, 119 B. distachyon, 269 O. sativa, and 139 S. bicolor NBS domains, which were either mined directly from the annotated proteomes, or predicted from whole genome sequence assemblies. The hybridization potential between miRNAs and their putative NBS gene targets was analyzed, revealing that at least 454 NBS genes from all five Poaceae were potentially regulated by 265 distinct miRNA species, most of them expressed in leaves and predominantly co-expressed in additional tissues. Based on gene ontology, we could assign these probable miRNA target genes to 16 functional groups, of which three confer resistance to bacteria (Rpm1, Xa1 and Rps2) and 13 confer resistance to fungi (Rpp8,13, Rp3, Tsn1, Lr10, Rps1-k-1, Pm3, Rpg5, and MLA1,6,10,12,13). The results of the present analysis provide a large-scale platform for a better understanding of biological control strategies of disease resistance genes in Poaceae, and will serve as an important starting point for improving crop disease resistance by means of transgenic lines with artificial miRNAs.
Shirvani-Mahdavi, Hamidreza; Shafiee, Parisa
Matrix mismatching in the quantitative analysis of materials through calibration-based laser-induced breakdown spectroscopy (LIBS) is a serious problem. In this paper, to overcome matrix mismatching, two distinct approaches named addition standardization (AS) and addition-internal combinatorial standardization (A-ICS) are demonstrated for LIBS experiments. Furthermore, in order to examine the efficiency of these methods, the concentration of calcium in ordinary garden soil without any fertilizer is measured separately by each of the two procedures. For this purpose, ten standard samples with different concentrations of calcium (as the analyte) and copper (as the internal standard) are prepared in the form of cylindrical tablets, so that the soil plays the role of the matrix in all of them. The measurements indicate that the relative error of concentration compared to a certified value derived by inductively coupled plasma optical emission spectroscopy is 3.97% and 2.23% for the AS and A-ICS methods, respectively. Furthermore, calculations related to standard deviation indicate that the A-ICS method may be more accurate than the AS method.
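The abstract does not spell out the AS or A-ICS procedures, so the following is only a sketch of the textbook standard-addition calculation that such methods are commonly built on: known amounts of analyte are added to the sample, the signal is fitted against the added concentration, and the unknown concentration is read off as intercept divided by slope. The function names and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
def standard_addition(added_conc, signal):
    """Estimate the original analyte concentration by standard addition.

    Assumes a linear response: signal = slope * (c0 + added).
    A least-squares line is fitted to (added_conc, signal) and
    c0 is recovered as intercept / slope.
    """
    n = len(added_conc)
    mean_x = sum(added_conc) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in added_conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(added_conc, signal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept / slope


def relative_error_percent(measured, reference):
    """Relative error of a measured value against a reference, in percent."""
    return abs(measured - reference) / reference * 100.0
```

With made-up spiking levels `[0, 1, 2, 3]` and signals `[3, 5, 7, 9]`, the fitted line has slope 2 and intercept 3, so the estimated original concentration is 1.5 in the same units as the spikes.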
Zhou, Shuigeng; Liao, Ruiqi; Guan, Jihong
In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.
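As a rough illustration of the MapReduce concept mentioned above, the sketch below counts k-mers in a handful of reads using explicit map, shuffle and reduce steps in plain Python. It runs in a single process and only mimics the programming model; the function names are hypothetical and it does not use Hadoop or any other real MapReduce framework.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    """Run map, shuffle and reduce over a list of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))
```

In a real deployment the map and reduce calls would be distributed across worker nodes, which is what makes the model attractive for large sequencing data sets; here `count_kmers(["ACGT", "CGTA"], 2)` simply tallies the 2-mers of two short reads.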
Sirintrapun, S Joseph; Zehir, Ahmet; Syed, Aijazuddin; Gao, JianJiong; Schultz, Nikolaus; Cheng, Donavan T
Translational bioinformatics and clinical research (biomedical) informatics are the primary domains related to informatics activities that support translational research. Translational bioinformatics focuses on computational techniques in genetics, molecular biology, and systems biology. Clinical research (biomedical) informatics involves the use of informatics in discovery and management of new knowledge relating to health and disease. This article details 3 projects that are hybrid applications of translational bioinformatics and clinical research (biomedical) informatics: The Cancer Genome Atlas, the cBioPortal for Cancer Genomics, and the Memorial Sloan Kettering Cancer Center clinical variants and results database, all designed to facilitate insights into cancer biology and clinical/therapeutic correlations.
Gillespie, Joseph J.; Wattam, Alice R.; Cammer, Stephen A.; Gabbard, Joseph L.; Shukla, Maulik P.; Dalay, Oral; Driscoll, Timothy; Hix, Deborah; Mane, Shrinivasrao P.; Mao, Chunhong; Nordberg, Eric K.; Scott, Mark; Schulman, Julie R.; Snyder, Eric E.; Sullivan, Daniel E.; Wang, Chunxia; Warren, Andrew; Williams, Kelly P.; Xue, Tian; Seung Yoo, Hyun; Zhang, Chengdong; Zhang, Yan; Will, Rebecca; Kenyon, Ronald W.; Sobral, Bruno W.
Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided. PMID:21896772
Ewels, Philip; Krueger, Felix; Käller, Max; Andrews, Simon
Pipeline tools are becoming increasingly important within the field of bioinformatics. Using a pipeline manager to manage and run workflows composed of multiple tools reduces workload and makes analysis results more reproducible. Existing tools require significant work to install and get running, typically needing pipeline scripts to be written from scratch before running any analysis. We present Cluster Flow, a simple and flexible bioinformatics pipeline tool designed to be quick and easy to install. Cluster Flow comes with 40 modules for common NGS processing steps, ready to work out of the box. Pipelines are assembled using these modules with a simple syntax that can be easily modified as required. Core helper functions automate many common NGS procedures, making running pipelines simple. Cluster Flow is available under a GNU GPLv3 license on GitHub. Documentation, examples and an online demo are available at http://clusterflow.io.
Helsens, Kenny; Van Damme, Petra; Degroeve, Sven; Martens, Lennart; Arnesen, Thomas; Vandekerckhove, Joël; Gevaert, Kris
Initiation of protein translation is a well-studied fundamental process, albeit high-throughput and more comprehensive determination of the exact translation initiation sites (TIS) was only recently made possible following the introduction of positional proteomics techniques that target protein N-termini. Precise translation initiation is of crucial importance, as truncated or extended proteins might fold, function, and locate erroneously. Still, as already shown for some proteins, alternative translation initiation can also serve as a regulatory mechanism. By applying N-terminal COFRADIC (combined fractional diagonal chromatography), we here isolated N-terminal peptides of a Saccharomyces cerevisiae proteome and analyzed both annotated and alternative TIS. We analyzed this N-terminome of S. cerevisiae which resulted in the identification of 650 unique N-terminal peptides corresponding to database annotated TIS. Furthermore, 56 unique N(α)-acetylated peptides were identified that suggest alternative TIS (MS/MS-based), while MS-based evidence of N(α)-acetylation led to an additional 33 such peptides. To improve the overall sensitivity of the analysis, we also included the 5' UTR (untranslated region) in-frame translations together with the yeast protein sequences in UniProtKB/Swiss-Prot. To ensure the quality of the individual peptide identifications, peptide-to-spectrum matches were only accepted at a 99% probability threshold and were subsequently analyzed in detail by the Peptizer tool to automatically ascertain their compliance with several expert criteria. Furthermore, we have also identified 60 MS/MS-based and 117 MS-based N(α)-acetylated peptides that point to N(α)-acetylation as a post-translational modification since these peptides did not start nor were preceded (in their corresponding protein sequence) by a methionine residue. Next, we evaluated consensus sequence features of nucleic acids and amino acids across each of these groups of peptides and
The BITS2011 meeting, held in Pisa on June 20-22, 2011, brought together more than 120 Italian researchers working in the field of Bioinformatics, as well as students in Bioinformatics, Computational Biology, Biology, Computer Sciences, and Engineering, representing a landscape of Italian bioinformatics research. This preface provides a brief overview of the meeting and introduces the peer-reviewed manuscripts that were accepted for publication in this Supplement. PMID:22536954
Coassin, Stefan; Brandstätter, Anita; Kronenberg, Florian
Genome-wide association studies (GWASs) have led to impressive advances in the elucidation of genetic factors underlying complex phenotypes and diseases. However, the ability of GWAS to identify new susceptibility loci in a hypothesis-free approach requires tools to quickly retrieve comprehensive information about a genomic region and analyze the potential effects of coding and non-coding SNPs in a candidate gene region. Furthermore, once a candidate region is chosen for resequencing and fine-mapping studies, the identification of several rare mutations is likely and requires strong bioinformatic support to properly evaluate and prioritize the mutations found for further analysis. Due to the variety of regulatory layers that can be affected by a mutation, a comprehensive in-silico evaluation of candidate SNPs can be a demanding and very time-consuming task. Although many bioinformatic tools that significantly simplify this task have been made available in recent years, their utility is often still unknown to researchers not intensively involved in bioinformatics. We present a comprehensive guide of 64 tools and databases to bioinformatically analyze gene regions of interest to predict SNP effects. In addition, we discuss tools to perform data mining of large genetic regions, predict the presence of regulatory elements, make in-silico evaluations of SNP effects and address issues ranging from interactome analysis to graphically annotated protein sequences. Finally, we exemplify the use of these tools by applying them to hits of a recently performed GWAS. Taken together, the discussed tools are summarized and constantly updated in the web-based "GenEpi Toolbox" (http://genepi_toolbox.i-med.ac.at) and can help to get a glimpse of the potential functional relevance of both large genetic regions and single nucleotide mutations, which might help to prioritize the next steps.
Merelli, Ivan; Pérez-Sánchez, Horacio; Gesing, Sandra; D'Agostino, Daniele
The explosion of data both in biomedical research and in healthcare systems demands urgent solutions. In particular, research in the omics sciences is moving from a hypothesis-driven to a data-driven approach. Healthcare is additionally always asking for a tighter integration with biomedical data in order to promote personalized medicine and to provide better treatments. Efficient analysis and interpretation of Big Data opens new avenues to explore molecular biology, new questions to ask about physiological and pathological states, and new ways to answer these open issues. Such analyses lead to better understanding of diseases and development of better and personalized diagnostics and therapeutics. However, such progress depends directly on the availability of new solutions to deal with this huge amount of information. New paradigms are needed to store and access data, for its annotation and integration and finally for inferring knowledge and making it available to researchers. Bioinformatics can be viewed as the “glue” for all these processes. A clear awareness of present high performance computing (HPC) solutions in bioinformatics, Big Data analysis paradigms for computational biology, and the issues that are still open in the biomedical and healthcare fields represents the starting point to win this challenge. PMID:25254202
Zhang, Yun; Aevermann, Brian D.; Anderson, Tavis K.; Burke, David F.; Dauphin, Gwenaelle; Gu, Zhiping; He, Sherry; Kumar, Sanjeev; Larsen, Christopher N.; Lee, Alexandra J.; Li, Xiaomei; Macken, Catherine; Mahaffey, Colin; Pickett, Brett E.; Reardon, Brian; Smith, Thomas; Stewart, Lucy; Suloway, Christian; Sun, Guangyu; Tong, Lei; Vincent, Amy L.; Walters, Bryan; Zaremba, Sam; Zhao, Hongtao; Zhou, Liwei; Zmasek, Christian; Klem, Edward B.; Scheuermann, Richard H.
The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics and therapeutics against influenza virus by providing a comprehensive collection of influenza-related data integrated from various sources, a growing suite of analysis and visualization tools for data mining and hypothesis generation, personal workbench spaces for data storage and sharing, and active user community support. Here, we describe the recent improvements in IRD including the use of cloud and high performance computing resources, analysis and visualization of user-provided sequence data with associated metadata, predictions of novel variant proteins, annotations of phenotype-associated sequence markers and their predicted phenotypic effects, hemagglutinin (HA) clade classifications, an automated tool for HA subtype numbering conversion, linkouts to disease event data and the addition of host factor and antiviral drug components. All data and tools are freely available without restriction from the IRD website at https://www.fludb.org. PMID:27679478
Dey, Sumanta; Nandy, Ashesh; Basak, Subhash C; Nandy, Papiya; Das, Sukhen
The Zika virus infections have reached epidemic proportions in the Latin American countries causing severe birth defects and neurological disorders. While several organizations have begun research into design of prophylactic vaccines and therapeutic drugs, computer assisted methods with adequate data resources can be expected to assist in these measures to reduce lead times through bioinformatics approaches. Using 60 sequences of the Zika virus envelope protein available in the GenBank database, our analysis with numerical characterization techniques and several web based bioinformatics servers identified four peptide stretches on the Zika virus envelope protein that are well conserved and surface exposed and are predicted to have reasonable epitope binding efficiency. These peptides can be expected to form the basis for a nascent peptide vaccine which, enhanced by incorporation of suitable adjuvants, can elicit immune response against the Zika virus infections.
Diniz, W J S; Canduri, F
Technological advancements in recent years have promoted marked progress in understanding the genetic basis of phenotypes. In line with these advances, genomics has shifted biological questions to a genome-wide scale, revealing an explosion of data and opening up many possibilities. On the other hand, the vast amount of information that has been generated points to the challenges that must be overcome for storage (Moore's law) and processing of biological information. In this context, bioinformatics and computational biology have sought to overcome such challenges. This review presents an overview of bioinformatics and its use in the analysis of biological data, exploring approaches, emerging methodologies, and tools that can give biological meaning to the data generated.
Jameson, Daniel; Garwood, Kevin; Garwood, Chris; Booth, Tim; Alper, Pinar; Oliver, Stephen G; Paton, Norman W
Background The systematic capture of appropriately annotated experimental data is a prerequisite for most bioinformatics analyses. Data capture is required not only for submission of data to public repositories, but also to underpin integrated analysis, archiving, and sharing – both within laboratories and in collaborative projects. The widespread requirement to capture data means that data capture and annotation are taking place at many sites, but the small scale of the literature on tools, techniques and experiences suggests that there is work to be done to identify good practice and reduce duplication of effort. Results This paper reports on experience gained in the deployment of the Pedro data capture tool in a range of representative bioinformatics applications. The paper makes explicit the requirements that have recurred when capturing data in different contexts, indicates how these requirements are addressed in Pedro, and describes case studies that illustrate where the requirements have arisen in practice. Conclusion Data capture is a fundamental activity for bioinformatics; all biological data resources build on some form of data capture activity, and many require a blend of import, analysis and annotation. Recurring requirements in data capture suggest that model-driven architectures can be used to construct data capture infrastructures that can be rapidly configured to meet the needs of individual use cases. We have described how one such model-driven infrastructure, namely Pedro, has been deployed in representative case studies, and discussed the extent to which the model-driven approach has been effective in practice. PMID:18402673
Likić, Vladimir A.; McConville, Malcolm J.; Lithgow, Trevor; Bacic, Antony
Biochemical systems biology augments more traditional disciplines, such as genomics, biochemistry and molecular biology, by championing (i) mathematical and computational modeling; (ii) the application of traditional engineering practices in the analysis of biochemical systems; and in the past decade increasingly (iii) the use of near-comprehensive data sets derived from ‘omics platform technologies, in particular “downstream” technologies relative to genome sequencing, including transcriptomics, proteomics and metabolomics. The future progress in understanding biological principles will increasingly depend on the development of temporal and spatial analytical techniques that will provide high-resolution data for systems analyses. To date, particularly successful were strategies involving (a) quantitative measurements of cellular components at the mRNA, protein and metabolite levels, as well as in vivo metabolic reaction rates, (b) development of mathematical models that integrate biochemical knowledge with the information generated by high-throughput experiments, and (c) applications to microbial organisms. The inevitable role bioinformatics plays in modern systems biology puts mathematical and computational sciences as an equal partner to analytical and experimental biology. Furthermore, mathematical and computational models are expected to become increasingly prevalent representations of our knowledge about specific biochemical systems. PMID:21331364
Machluf, Yossy; Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat
Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled 'Bioinformatics in the Service of Biotechnology'. Students' learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students' difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students' cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students' scientific 'toolbox'. For students, questions stemming from the 'old world' biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers' prediction. Analysis of students' affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher's role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum.
Accardi, Luigi; Freudenberg, Wolfgang; Ohya, Masanori
The QP-DYN algorithms / L. Accardi, M. Regoli and M. Ohya -- Study of transcriptional regulatory network based on Cis module database / S. Akasaka ... [et al.] -- On Lie group-Lie algebra correspondences of unitary groups in finite von Neumann algebras / H. Ando, I. Ojima and Y. Matsuzawa -- On a general form of time operators of a Hamiltonian with purely discrete spectrum / A. Arai -- Quantum uncertainty and decision-making in game theory / M. Asano ... [et al.] -- New types of quantum entropies and additive information capacities / V. P. Belavkin -- Non-Markovian dynamics of quantum systems / D. Chruscinski and A. Kossakowski -- Self-collapses of quantum systems and brain activities / K.-H. Fichtner ... [et al.] -- Statistical analysis of random number generators / L. Accardi and M. Gabler -- Entangled effects of two consecutive pairs in residues and its use in alignment / T. Ham, K. Sato and M. Ohya -- The passage from digital to analogue in white noise analysis and applications / T. Hida -- Remarks on the degree of entanglement / D. Chruscinski ... [et al.] -- A completely discrete particle model derived from a stochastic partial differential equation by point systems / K.-H. Fichtner, K. Inoue and M. Ohya -- On quantum algorithm for exptime problem / S. Iriyama and M. Ohya -- On sufficient algebraic conditions for identification of quantum states / A. Jamiolkowski -- Concurrence and its estimations by entanglement witnesses / J. Jurkowski -- Classical wave model of quantum-like processing in brain / A. Khrennikov -- Entanglement mapping vs. quantum conditional probability operator / D. Chruscinski ... [et al.] -- Constructing multipartite entanglement witnesses / M. Michalski -- On Kadison-Schwarz property of quantum quadratic operators on M[symbol](C) / F. Mukhamedov and A. Abduganiev -- On phase transitions in quantum Markov chains on Cayley Tree / L. Accardi, F. Mukhamedov and M. Saburov -- Space(-time) emergence as symmetry breaking effect / I. Ojima
Zexian, Liu; Yudong, Cai; Xuejiang, Guo; Ao, Li; Tingting, Li; Jianding, Qiu; Jian, Ren; Shaoping, Shi; Jiangning, Song; Minghui, Wang; Lu, Xie; Yu, Xue; Ziding, Zhang; Xingming, Zhao
Post-translational modifications (PTMs) are essential for regulating conformational changes, activities and functions of proteins, and are involved in almost all cellular pathways and processes. Identification of protein PTMs is the basis for understanding cellular and molecular mechanisms. In contrast with labor-intensive and time-consuming experiments, PTM prediction using various bioinformatics approaches can provide accurate, convenient, and efficient strategies and generate valuable information for further experimental consideration. In this review, we summarize the current progress made by Chinese bioinformaticians in the field of PTM bioinformatics, including the design and improvement of computational algorithms for predicting PTM substrates and sites, design and maintenance of online and offline tools, establishment of PTM-related databases and resources, and bioinformatics analysis of PTM proteomics data. By comparing similar studies in China and other countries, we demonstrate both advantages and limitations of current PTM bioinformatics as well as perspectives for future studies in China.
de Jong, Simone; van Eijk, Kristel R; Zeegers, Dave W L H; Strengman, Eric; Janson, Esther; Veldink, Jan H; van den Berg, Leonard H; Cahn, Wiepke; Kahn, René S; Boks, Marco P M; Ophoff, Roel A
There is genetic evidence that schizophrenia is a polygenic disorder with a large number of loci of small effect on disease susceptibility. Genome-wide association studies (GWASs) of schizophrenia have had limited success, with the best finding at the MHC locus at chromosome 6p. A recent effort of the Psychiatric GWAS consortium (PGC) yielded five novel loci for schizophrenia. In this study, we aim to highlight additional schizophrenia susceptibility loci from the PGC study by combining the top association findings from the discovery stage (9394 schizophrenia cases and 12 462 controls) with expression QTLs (eQTLs) and differential gene expression in whole blood of schizophrenia patients and controls. We examined the 6192 single-nucleotide polymorphisms (SNPs) with significance threshold at P<0.001. eQTLs were calculated for these SNPs in a sample of healthy controls (n=437). The transcripts significantly regulated by the top SNPs from the GWAS meta-analysis were subsequently tested for differential expression in an independent set of schizophrenia cases and controls (n=202). After correction for multiple testing, the eQTL analysis yielded 40 significant cis-acting effects of the SNPs. Seven of these transcripts show differential expression between cases and controls. Of these, the effect of three genes (RNF5, TRIM26 and HLA-DRB3) coincided with the direction expected from meta-analysis findings and were all located within the MHC region. Our results identify new genes of interest and highlight again the involvement of the MHC region in schizophrenia susceptibility.
Li, Yuan; Luo, Mei; Shi, Xuejiao; Lu, Zhiliang; Sun, Shouguo; Huang, Jianbing; Chen, Zhaoli; He, Jie
Enhancer of zeste homolog 2 (EZH2), a dynamic chromatin regulator in cancer, represents a potential therapeutic target showing early signs of promise in clinical trials. EZH2 ChIP sequencing data in 19 cell lines and RNA sequencing data in ten cancer types were downloaded from GEO and TCGA, respectively. Integrated ChIP sequencing analysis and co-expression analysis were conducted, and both mRNA and long noncoding RNA (lncRNA) targets were detected. We detected a median of 4,672 mRNA targets and 4,024 lncRNA targets regulated by EZH2 in 19 cell lines. Twenty mRNA targets and 27 lncRNA targets were found in all 19 cell lines. These mRNA targets were enriched in pathways in cancer, Hippo, Wnt, MAPK and PI3K-Akt pathways. Co-expression analysis confirmed numerous targets: mRNA genes (RRAS, TGFBR2, NUF2 and PRC1) and lncRNA genes (lncRNA LINC00261, DIO3OS, RP11-307C12.11 and RP11-98D18.9) were potential targets and were significantly correlated with EZH2. We predicted genome-wide potential targets and the role of EZH2 as a transcriptional suppressor or activator, which could pave the way for mechanism studies and the targeted therapy of EZH2 in cancer. PMID:27835578
Mora, Antonio; Sandve, Geir Kjetil; Gabrielsen, Odd Stokke
Enhancer–promoter regulation is a fundamental mechanism underlying differential transcriptional regulation. Spatial chromatin organization brings remote enhancers in contact with target promoters in cis to regulate gene expression. There is considerable evidence for promoter–enhancer interactions (PEIs). In recent years, genome-wide analyses have identified signatures and mapped novel enhancers; however, precisely identifying their target gene(s) requires massive biological and bioinformatics efforts. In this review, we give a short overview of the chromatin landscape and transcriptional regulation. We discuss some key concepts and problems related to chromatin interaction detection technologies, and emerging knowledge from genome-wide chromatin interaction data sets. Then, we critically review different types of bioinformatics analysis methods and tools related to representation and visualization of PEI data, raw data processing and PEI prediction. Lastly, we provide specific examples of how PEIs have been used to elucidate a functional role of non-coding single-nucleotide polymorphisms. The topic is at the forefront of epigenetic research, and by highlighting some future bioinformatics challenges in the field, this review provides a comprehensive background for future PEI studies. PMID:26586731
Blicher, Thomas; Bongcam-Rudloff, Erik; Brazas, Michelle D.; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; Fernandes, Pedro L.; van Gelder, Celia; Jacob, Joachim; Jimenez, Rafael C.; Loveland, Jane; Moran, Federico; Mulder, Nicola; Nyrönen, Tommi; Rother, Kristian; Schneider, Maria Victoria; Attwood, Teresa K.
The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists. PMID:23803301
Naulaerts, Stefan; Meysman, Pieter; Bittremieux, Wout; Vu, Trung Nghia; Vanden Berghe, Wim; Goethals, Bart; Laukens, Kris
Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur. An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques are able to efficiently capture the characteristics of (complex) data and succinctly summarize it. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about the bioinformatics applications of these techniques remains scattered. In this primer, we introduce frequent itemset mining and their derived association rules for life scientists. We give an overview of various algorithms, and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences. PMID:24162173
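The shopping-basket idea described above can be sketched in a few lines. This is a minimal brute-force illustration, not any of the algorithms the primer surveys; the gene names and the "one basket per sample" framing are hypothetical, and practical miners such as Apriori or FP-growth prune the search instead of enumerating every combination.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Count every itemset of up to max_size items and keep the frequent ones.

    Brute-force enumeration: acceptable for toy data only.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def rule_confidence(itemsets, antecedent, consequent):
    """Confidence of the association rule antecedent -> consequent."""
    both = tuple(sorted(set(antecedent) | set(consequent)))
    return itemsets[both] / itemsets[tuple(sorted(antecedent))]


# Hypothetical toy data: each "basket" holds the genes up-regulated in one sample.
samples = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB"},
    {"geneA", "geneC"},
    {"geneB", "geneC"},
    {"geneA", "geneB", "geneC"},
]

frequent = frequent_itemsets(samples, min_support=3)
confidence_a_to_b = rule_confidence(frequent, ["geneA"], ["geneB"])
```

With a minimum support of 3, each single gene and each pair is frequent, while the triple (seen in only two samples) is dropped. The rule geneA -> geneB then has confidence 3/4, since geneA occurs in four samples and geneA with geneB in three.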
Attwood, Teresa K.; Bongcam-Rudloff, Erik; Brazas, Michelle E.; Corpas, Manuel; Gaudet, Pascale; Lewitter, Fran; Mulder, Nicola; Palagi, Patricia M.; Schneider, Maria Victoria; van Gelder, Celia W. G.
In recent years, high-throughput technologies have brought big data to the life sciences. The march of progress has been rapid, leaving in its wake a demand for courses in data analysis, data stewardship, computing fundamentals, etc., a need that universities have not yet been able to satisfy—paradoxically, many are actually closing “niche” bioinformatics courses at a time of critical need. The impact of this is being felt across continents, as many students and early-stage researchers are being left without appropriate skills to manage, analyse, and interpret their data with confidence. This situation has galvanised a group of scientists to address the problems on an international scale. For the first time, bioinformatics educators and trainers across the globe have come together to address common needs, rising above institutional and international boundaries to cooperate in sharing bioinformatics training expertise, experience, and resources, aiming to put ad hoc training practices on a more professional footing for the benefit of all. PMID:25856076
Sablok, Gaurav; Luo, Chun; Lee, Wan Sin; Rahman, Farzana; Tatarinova, Tatiana V; Harikrishna, Jennifer Ann; Luo, Zhengrong
We present here a systematic analysis of the Diospyros kaki expressed sequence tags (ESTs) generated from development stage-specific libraries. A total of 2,529 tentative unigenes were identified in the MF library, whereas the OYF library displayed 3,775 tentative unigenes. In the MF library, 325 EST-simple sequence repeats (EST-SSRs) were detected in 296 putative unigenes, an occurrence of 11.7% with a frequency of 1 SSR/3.16 kb, whereas the OYF library had 407 EST-SSRs in 352 putative unigenes, an occurrence of 10.8% with a frequency of 1 SSR/2.92 kb. We observed a higher frequency of SNPs and indels in the OYF library (20.94 SNPs/indels per 100 bp) than in the MF library (0.74 SNPs/indels per 100 bp). A combined homology and secondary structure analysis approach identified a potential miRNA precursor, an ortholog of miR159, and potential miR159 targets, in the development-specific ESTs of D. kaki. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s13205-011-0005-9) contains supplementary material, which is available to authorized users.
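The abstract does not say which tool or repeat thresholds were used to call EST-SSRs, so the following is only a sketch of how such a scan can work. The minimum repeat counts in `MIN_REPEATS` and the test sequence are assumptions chosen for illustration.

```python
import re

# Assumed thresholds (motif length -> minimum number of tandem copies).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. not AGAG)."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif[:d] * (n // d) == motif:
            return False
    return True


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # already reported under its shorter unit
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical EST fragment containing an (AG)8 and a (CAT)5 repeat.
est = "TTGC" + "AG" * 8 + "CCTA" + "CAT" * 5 + "GG"
ssrs = find_ssrs(est)
```

A frequency such as "1 SSR per 3.16 kb" would then follow from dividing the total length of the scanned sequences, in kb, by the number of SSRs found.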
Tell, Robert W; Horvath, Curt M
Signal transducer and activator of transcription 3 (STAT3), a latent transcription factor associated with inflammatory signaling and innate and adaptive immune responses, is known to be aberrantly activated in a wide variety of cancers. In vitro analysis of STAT3 in human cancer cell lines has elucidated a number of specific targets associated with poor prognosis in breast cancer. However, to date, no comparison of cancer subtype and gene expression associated with STAT3 signaling in human patients has been reported. In silico analysis of human breast cancer microarray and reverse-phase protein array data was performed to identify expression patterns associated with STAT3 in basal-like and luminal breast cancers. Results indicate clearly identifiable STAT3-regulated signatures common to basal-like breast cancers but not to luminal A or luminal B cancers. Furthermore, these differentially expressed genes are associated with immune signaling and inflammation, a known phenotype of basal-like cancers. These findings demonstrate a distinct role for STAT3 signaling in basal breast cancers, and underscore the importance of considering subtype-specific molecular pathways that contribute to tissue-specific cancers.
In bioinformatics studies, supervised classification with high-dimensional input variables is frequently encountered. Examples routinely arise in genomic, epigenetic and proteomic studies. Feature selection can be employed along with classifier construction to avoid over-fitting, to generate more reliable classifiers and to provide more insights into the underlying causal relationships. In this article, we provide a review of several recently developed penalized feature selection and classification techniques, which belong to the family of embedded feature selection methods, for bioinformatics studies with high-dimensional input. Classification objective functions, penalty functions and computational algorithms are discussed. Our goal is to make interested researchers aware of these feature selection and classification methods that are applicable to high-dimensional bioinformatics data. PMID:18562478
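To see why a penalty can perform feature selection, it helps to look at the simplest case. The sketch below uses an L1 (lasso) penalty with squared-error loss and orthonormal predictors. This is a simplification for illustration and not the classification losses the review covers. In that setting the penalized estimate is the unpenalized coefficient passed through a soft-thresholding step, which sets small coefficients exactly to zero. The coefficient values are made up.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(unpenalized_coefs, lam):
    """Lasso solution for orthonormal predictors, given least-squares coefficients.

    Minimizes 0.5 * ||y - X b||^2 + lam * sum(|b_j|) when X has orthonormal columns.
    """
    return [soft_threshold(b, lam) for b in unpenalized_coefs]


# Hypothetical least-squares coefficients for four features.
coefs = [2.5, -0.25, 0.125, -1.5]
selected = lasso_orthonormal(coefs, lam=0.5)
kept_features = [j for j, b in enumerate(selected) if b != 0.0]
```

Here the two weak features are dropped and the two strong ones are kept with shrunken coefficients, so selection and estimation happen in the same step. That combination is what "embedded" refers to.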
Phosphorus additions and measurement in soil is of concern on lands where biosolids have been applied. Colorimetric analysis for plant-available P may be inadequate for the accurate assessment of soil P. Phosphate additions in a regulatory environment need to be accurately assessed as the reported...
Degl'Innocenti, Debora; Alberti, Chiara; Castellano, Giancarlo; Greco, Angela; Miranda, Claudia; Pierotti, Marco A.; Seregni, Ettore; Borrello, Maria Grazia; Canevari, Silvana; Tomassetti, Antonella
Background: Papillary thyroid carcinoma (PTC), the most frequent thyroid cancer, is usually not life threatening, but may recur or progress to aggressive forms resistant to conventional therapies. A more detailed understanding of the signaling pathways activated in PTCs may help to identify novel therapeutic approaches against these tumors. The aim of this study is to identify signaling pathways activated in PTCs. Methodology/Principal Findings: We examined coordinated gene expression patterns of ligand/receptor (L/R) pairs using the L/R database DRLP-rev1 and five publicly available thyroid cancer datasets of gene expression on a total of 41 paired PTC/normal thyroid tissues. We identified 26 (up) and 13 (down) L/R pairs coordinately and differentially expressed. The relevance of these L/R pairs was confirmed by performing the same analysis on REarranged during Transfection (RET)/PTC1-infected thyrocytes with respect to normal thyrocytes. TGFA/EGFR emerged as one of the most tightly regulated L/R pairs. Furthermore, PTC clinical samples analyzed by real-time RT-PCR expressed EGFR transcript levels similar to those of 5 normal thyroid tissues from patients with pathologies other than thyroid cancer, whereas significantly elevated levels of TGFA transcripts were only present in PTCs. Biochemical analysis of PTC cell lines demonstrated the presence of EGFR on the cell membrane and TGFA in conditioned media. Moreover, conditioned medium of the PTC cell line NIM-1 activated EGFR expressed on HeLa cells, culminating in both ERK and AKT phosphorylation. In NIM-1 cells harboring BRAF mutation, TGFA stimulated proliferation, contributing to PI3K/AKT activation independent of MEK/ERK signaling. Conclusions/Significance: We compiled a reliable list of L/R pairs associated with PTC and validated in vitro the biological role of one of the L/R pairs that emerged, TGFA/EGFR, in this cancer. These data provide a better understanding of the factors involved in the biology of PTCs and
Fierro, Francisco; García-Estrada, Carlos; Castillo, Nancy I; Rodríguez, Raquel; Velasco-Conde, Tania; Martín, Juan-Francisco
High penicillin-producing strains of Penicillium chrysogenum contain 6-14 copies of the three clustered structural biosynthetic genes, pcbAB, pcbC, and penDE [Barredo, J.L., Díez, B., Alvarez, E., Martín, J.F., 1989. Large amplification of a 35-kb DNA fragment carrying two penicillin biosynthetic genes in high penicillin producing strains of Penicillium chrysogenum. Curr. Genet. 16, 453-459; Smith, D.J., Bull, J.H., Edwards, J., Turner, G., 1989. Amplification of the isopenicillin N synthetase gene in a strain of Penicillium chrysogenum producing high levels of penicillin. Mol. Gen. Genet. 216, 492-497]. The cluster is located in a 56.8 kb DNA region bounded by a conserved TGTAAA/T hexanucleotide that undergoes amplification in tandem repeats [Fierro, F., Barredo, J.L., Díez, B., Gutiérrez, S., Fernández, F.J., Martín, J.F., 1995. The penicillin gene cluster is amplified in tandem repeats linked by conserved hexanucleotide sequences. Proc. Natl. Acad. Sci. USA 92, 6200-6204; Newbert, R.W., Barton, B., Greaves, P., Harper, J., Turner, G., 1997. Analysis of a commercially improved Penicillium chrysogenum strain series: involvement of recombinogenic regions in amplification and deletion of the penicillin biosynthesis gene cluster. J. Ind. Microbiol. Biotechnol. 19, 18-27]. Transcriptional analysis of this amplified region (AR) revealed the presence of at least eight transcripts expressed in penicillin producing conditions. Three of them correspond to the known penicillin biosynthetic genes, pcbAB, pcbC, and penDE. To locate genes related to penicillin precursor formation, or penicillin transport and regulation, we have sequenced and analyzed the 56.8 kb amplified region of P. chrysogenum AS-P-78, finding a total of 16 open reading frames. Two of these ORFs have orthologues of known function in the databases. Other ORFs showed similarities to specific domains occurring in different proteins and superfamilies, which allowed their probable function to be inferred. ORF11
Kido, Toshimi; Kurata, Hideaki; Kondo, Kazuo; Itakura, Hiroshige; Okazaki, Mitsuyo; Urata, Takeyoshi; Yokoyama, Shinji
Plasma concentrations of apoA-I, apoA-II and apoA-II-unassociated apoA-I were analyzed in 314 Japanese subjects (177 males and 137 females), including one (male) homozygote and 37 (20 males and 17 females) heterozygotes of genetic CETP deficiency. ApoA-I unassociated with apoA-II markedly and linearly increased with HDL-cholesterol, while apoA-II increased only very slightly, and the molar ratio of apoA-II-associated apoA-I to apoA-II stayed constant at 2 throughout the increase of HDL-cholesterol, among the wild type and heterozygous CETP deficiency. Thus, overall HDL concentration almost exclusively depends on HDL with apoA-I without apoA-II (LpAI), while the concentration of HDL containing apoA-I and apoA-II (LpAI:AII) is constant, with a fixed molar ratio of 2:1 regardless of total HDL and apoA-I concentration. Distribution of apoA-I between LpAI and LpAI:AII is consistent with a model of statistical partitioning regardless of sex and CETP genotype. The analysis also indicated that LpAI accommodates on average 4 apoA-I molecules and has a clearance rate indistinguishable from that of LpAI:AII. Independent evidence indicated that LpAI:AII has a diameter 20% smaller than LpAI, consistent with a model having two apoA-I and one apoA-II. The functional contribution of these particles is to be investigated. PMID:27526664
Chruszcz, Maksymilian; Ciardiello, Maria Antonietta; Osinski, Tomasz; Majorek, Karolina A; Giangrieco, Ivana; Font, Jose; Breiteneder, Heimo; Thalassinos, Konstantinos; Minor, Wladek
The allergen Act d 11, also known as kirola, is a 17 kDa protein expressed in large amounts in ripe green and yellow-fleshed kiwifruit. Ten percent of all kiwifruit-allergic individuals produce IgE specific for the protein. Using X-ray crystallography, we determined the first three-dimensional structures of Act d 11, produced from both recombinant expression in Escherichia coli and from the natural source (kiwifruit). While Act d 11 is immunologically correlated with the birch pollen allergen Bet v 1 and other members of the pathogenesis-related protein family 10 (PR-10), it has low sequence similarity to PR-10 proteins. By sequence Act d 11 appears instead to belong to the major latex/ripening-related (MLP/RRP) family, but analysis of the crystal structures shows that Act d 11 has a fold very similar to that of Bet v 1 and other PR-10 related allergens regardless of the low sequence identity. The structures of both the natural and recombinant protein include an unidentified ligand, which is relatively small (about 250 Da by mass spectrometry experiments) and most likely contains an aromatic ring. The ligand-binding cavity in Act d 11 is also significantly smaller than those in PR-10 proteins. The binding of the ligand, which we were not able to unambiguously identify, results in conformational changes in the protein that may have physiological and immunological implications. Interestingly, the residue corresponding to Glu45 in Bet v 1 (Glu46), which is important for IgE binding to the birch pollen allergen, is conserved in Act d 11, even though it is not in other allergens with significantly higher sequence identity to Bet v 1. We suggest that the so-called Gly-rich loop (or P-loop), which is conserved in all PR-10 allergens, may be responsible for IgE cross-reactivity between Bet v 1 and Act d 11.
Zhang, Tao; Wei, Dongqing
Cytochrome P450 (CYP) is predominantly responsible for human drug metabolism, which is of critical importance for drug discovery and development. Structural bioinformatics focuses on the analysis and prediction of the three-dimensional structure of biological macromolecules, the elucidation of structure-function relationships and the identification of important binding interactions. Structural bioinformatics has advanced rapidly over the last decade. With more information available for CYP structures, the methods of structural bioinformatics may be used in the CYP field. In this review, we present three previous studies on CYP that used the methods of structural bioinformatics: the investigation of reasons for the decrease in enzymatic activity of CYP1A2 caused by a peripheral mutation, the construction of a pharmacophore model specific to the active site of CYP1A2 and the prediction of the functional consequences of single-residue mutations in CYP. By illustrating these studies, we attempt to show the potential role of structural bioinformatics in CYP research and to promote a better understanding of the importance of structural bioinformatics in drug design.
Hérisson, Joan; Ferey, Nicolas; Gros, Pierre-Emmanuel; Gherbi, Rachid
Most biologists work on textual DNA sequences, which are limited to the linear representation of DNA. In this paper, we address the potential offered by Virtual Reality for 3D modeling and immersive visualization of large genomic sequences. The representation of the 3D structure of naked DNA allows biologists to observe and analyze genomes interactively at different levels. We developed a powerful software platform that provides a new point of view for sequence analysis: ADNViewer. However, a typical eukaryotic chromosome of 40 million base pairs requires about 6 Gbytes of 3D data. To manage these huge amounts of data in real time, we designed various scene management algorithms and immersive human-computer interaction for user-friendly data exploration. In addition, one bioinformatics study scenario is proposed.
Ondrej, Vladan; Dvorak, Petr
Bioinformatics, biological databases, and the worldwide use of computers have accelerated biological research in many fields, such as evolutionary biology. Here, we describe a primer of nucleotide sequence management and the construction of a phylogenetic tree with two examples; the two selected are from completely different groups of organisms:…
Kelley, Scott; Alger, Christianna; Deutschman, Douglas
The importance of Bioinformatics tools and methodology in modern biological research underscores the need for robust and effective courses at the college level. This paper describes such a course designed on the principles of cooperative learning based on a computer software industry production model called "Extreme Programming" (EP).…
Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...
Boyle, John A.
Bioinformatics has emerged as an important research tool in recent years. The ability to mine large databases for relevant information has become increasingly central to many different aspects of biochemistry and molecular biology. It is important that undergraduates be introduced to the available information and methodologies. We present a…
Belmann, Peter; Dröge, Johannes; Bremges, Andreas; McHardy, Alice C; Sczyrba, Alexander; Barton, Michael D
Software is now both central and essential to modern biology, yet lack of availability, difficult installations, and complex user interfaces make software hard to obtain and use. Containerisation, as exemplified by the Docker platform, has the potential to solve the problems associated with sharing software. We propose bioboxes: containers with standardised interfaces to make bioinformatics software interchangeable.
In recent years, new bioinformatics technologies, such as gene expression microarray, genome-wide association study, proteomics, and metabolomics, have been widely used to simultaneously identify a huge number of human genomic/genetic biomarkers, generate a tremendously large amount of data, and dramatically increase the knowledge on human…
Biological databases are having a growth spurt. Much of this results from research in genetics and biodiversity, coupled with fast-paced developments in information technology. The revolution in bioinformatics, defined by Sugden and Pennisi (2000) as the "tools and techniques for...
Bansal, Arvind K
The revolutionary growth in computation speed and memory storage capability has fueled a new era in the analysis of biological data. Hundreds of microbial genomes and many eukaryotic genomes, including a cleaner draft of the human genome, have been sequenced, raising the expectation of better control of microorganisms. The goals are as lofty as the development of rational drugs and antimicrobial agents, development of new enhanced bacterial strains for bioremediation and pollution control, development of better and easy-to-administer vaccines, the development of protein biomarkers for various bacterial diseases, and better understanding of host-bacteria interaction to prevent bacterial infections. In the last decade the development of many new bioinformatics techniques and integrated databases has facilitated the realization of these goals. Current research in bioinformatics can be classified into: (i) genomics--sequencing and comparative study of genomes to identify gene and genome functionality, (ii) proteomics--identification and characterization of protein related properties and reconstruction of metabolic and regulatory pathways, (iii) cell visualization and simulation to study and model cell behavior, and (iv) application to the development of drugs and anti-microbial agents. In this article, we will focus on the techniques and their limitations in genomics and proteomics. Bioinformatics research can be classified under three major approaches: (1) analysis based upon the available experimental wet-lab data, (2) the use of mathematical modeling to derive new information, and (3) an integrated approach that integrates search techniques with mathematical modeling. The major impact of bioinformatics research has been to automate genome sequencing, the development of integrated genomics and proteomics databases, genome comparisons to identify genome function, the derivation of metabolic pathways, gene expression analysis to derive
Chen, Minjun; Martin, Jackson; Fang, Hong; Isukapalli, Sastry; Georgopoulos, Panos G; Welsh, William J; Tong, Weida
ebTrack is being developed as an integrated bioinformatics system for environmental research and analysis by addressing the issues of integration, curation, management, first level analysis and interpretation of environmental and toxicological data from diverse sources. It is based on enhancements to the US FDA developed ArrayTrack™ system through additional analysis modules for gene expression data as well as through incorporation and linkages to modules for analysis of proteomic and metabonomic datasets that include tandem mass spectra. ebTrack uses a client-server architecture with the free and open source PostgreSQL as its database engine, and java tools for user interface, analysis, visualization, and web-based deployment. Several predictive tools that are critical for environmental health research are currently supported in ebTrack, including Significance Analysis of Microarray (SAM). Furthermore, new tools are under continuous integration, and interfaces to environmental health risk analysis tools are being developed in order to make ebTrack widely usable. These health risk analysis tools include the Modeling ENvironment for TOtal Risk studies (MENTOR) for source-to-dose exposure modeling and the DOse Response Information ANalysis system (DORIAN) for health outcome modeling. The design of ebTrack is presented in detail and steps involved in its application are summarized through an illustrative application. PMID:19278561
Ouellette, B. F. Francis
With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs. PMID:23515468
Pillai, S.; Silventoinen, V.; Kallio, K.; Senger, M.; Sobhany, S.; Tate, J.; Velankar, S.; Golovin, A.; Henrick, K.; Rice, P.; Stoehr, P.; Lopez, R.
SOAP (Simple Object Access Protocol) based Web Services technology has gained much attention as an open standard enabling interoperability among applications across heterogeneous architectures and different networks. The European Bioinformatics Institute (EBI) is using this technology to provide robust data retrieval and data analysis mechanisms to the scientific community and to enhance utilization of the biological resources it already provides [N. Harte, V. Silventoinen, E. Quevillon, S. Robinson, K. Kallio, X. Fustero, P. Patel, P. Jokinen and R. Lopez (2004) Nucleic Acids Res., 32, 3–9]. These services are available free to all users. PMID:15980463
Traditional bioinformatics methods scan primary sequences for local patterns. It is important to assess how accurate local primary sequence methods can be. We study the problem of donor pre-mRNA splice site recognition, where the sequence overlaps between real and decoy data sets can be quantified, exposing the intrinsic limitations of the performance of local primary sequence methods. We assess the accuracy of local primary sequence methods generally by studying how they scale with dataset size and demonstrate that our new Primary Sequence Ranking methods have superior performance. Our Primary Sequence Ranking analysis tools are available at http://rna.williams.edu/
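As an illustration of what a "local primary sequence method" looks like in practice, the sketch below builds a position weight matrix from a handful of aligned sites and slides it along a sequence. It is a generic textbook approach, not the Primary Sequence Ranking method of the abstract. The four training 9-mers and the query are hypothetical, and a uniform background of 0.25 per base is assumed.

```python
import math


def build_pwm(sites, pseudocount=1.0):
    """Log-odds position weight matrix from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in "ACGT"
        })
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_site(pwm, seq):
    """Start index of the highest-scoring window in seq."""
    width = len(pwm)
    return max(range(len(seq) - width + 1),
               key=lambda i: score(pwm, seq[i:i + width]))


# Hypothetical donor-like training sites (exon end followed by GT...).
training_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT"]
pwm = build_pwm(training_sites)

query = "TTCCAGGTAAGTCCTT"
best = best_site(pwm, query)
```

Because each position is scored independently, a decoy that happens to share the local pattern scores as well as a real site. That overlap between real and decoy sequences is the limitation the abstract refers to.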
Rodríguez-Segura, M. A.; Godina-Nava, J. J.; Villa-Treviño, S.
Microarrays are devices designed to analyze the simultaneous expression of thousands of genes. However, the process adds noise to the information at each stage of the study. Analyzing these thousands of data points requires bioinformatics tools. The traditional analysis begins by normalizing the data, but the results obtained are highly dependent on how the study is conducted. The need to develop new strategies for analyzing microarrays is shown. Liver tissue taken from an animal model in which cancer is chemically induced is used as an example.
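The abstract does not name the normalization it starts from. As one common example of the step, quantile normalization forces every array to share the same distribution of values. The sketch below assumes no tied values within an array and uses made-up intensities.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list per microarray).

    Each value is replaced by the mean, across arrays, of the values holding
    the same rank. Ties are not handled specially in this sketch.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(sa[r] for sa in sorted_arrays) / len(arrays) for r in range(n)]

    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from lowest to highest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for three genes on two arrays.
array_1 = [5.0, 2.0, 3.0]
array_2 = [4.0, 1.0, 6.0]
norm_1, norm_2 = quantile_normalize([array_1, array_2])
```

After normalization both arrays contain the same set of values, and only the gene-to-rank assignment differs. Choosing a different normalization changes the downstream results, which is the dependence the abstract points out.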
Biochemical, Transcriptional, and Bioinformatic Analysis of Lipid Droplets from Seeds of Date Palm (Phoenix dactylifera L.) and Their Use as Potent Sequestration Agents against the Toxic Pollutant, 2,3,7,8-Tetrachlorinated Dibenzo-p-Dioxin.
Hanano, Abdulsamie; Almousally, Ibrahem; Shaban, Mouhnad; Rahman, Farzana; Blee, Elizabeth; Murphy, Denis J
Contamination of aquatic environments with dioxins, the most toxic group of persistent organic pollutants (POPs), is a major ecological issue. Dioxins are highly lipophilic and bioaccumulate in fatty tissues of marine organisms used for seafood where they constitute a potential risk for human health. Lipid droplets (LDs) purified from date palm, Phoenix dactylifera, seeds were characterized and their capacity to extract dioxins from aquatic systems was assessed. The bioaffinity of date palm LDs toward 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), the most toxic congener of dioxins, was determined. Fractioned LDs were spheroidal with mean diameters of 2.5 µm, enclosing an oil-rich core of 392.5 mg mL⁻¹. Isolated LDs did not aggregate and/or coalesce unless placed in acidic media and were strongly associated with three major groups of polypeptides of relative mass 32–37, 20–24, and 16–18 kDa. These masses correspond to the LD-associated proteins, oleosins, caleosins, and steroleosins, respectively. Efficient partitioning of TCDD into LDs occurred with a coefficient of log KLB/w,TCDD = 7.528 ± 0.024; it was optimal at neutral pH and was dependent on the presence of the oil-rich core, but was independent of the presence of LD-associated proteins. Bioinformatic analysis of the date palm genome revealed nine oleosin-like, five caleosin-like, and five steroleosin-like sequences, with predicted structures having putative lipid-binding domains that match their LD stabilizing roles and use as bio-based encapsulation systems. Transcriptomic analysis of date palm seedlings exposed to TCDD showed strong up-regulation of several caleosin and steroleosin genes, consistent with increased LD formation. The results suggest that the plant LDs could be used in ecological remediation strategies to remove POPs from aquatic environments. Recent reports suggest that several fungal and algal species also use LDs to sequester both external and internally derived hydrophobic toxins, which
Hiew, Hong Liang; Bellgard, Matthew
Life science research faces the constant challenge of how to effectively handle an ever-growing body of bioinformatics software and online resources. The users and developers of bioinformatics resources have a diverse set of competing demands on how these resources need to be developed and organised. Unfortunately, no adequate community-wide framework exists to integrate such competing demands. The problems that arise from this include unstructured standards development, the emergence of tools that do not meet specific needs of researchers, and often a communications gap between those who use the tools and those who supply them. This paper presents an overview of the different functions and needs of bioinformatics stakeholders to determine what may be required in a community-wide framework. A Bioinformatics Reference Model is proposed as a basis for such a framework. The reference model outlines the functional relationship between research usage and technical aspects of bioinformatics resources. It separates important functions into multiple structured layers, clarifies how they relate to each other, and highlights the gaps that need to be addressed for progress towards a diverse, manageable, and sustainable body of resources. The relevance of this reference model to the bioscience research community, and its implications for progress in organising our bioinformatics resources, are discussed.
Schottler, Natalie A.; Valli-Marill, Joanne; Beck, Lisa; Beatty, Jackson
This completely computer-based module's purpose is to introduce students to bioinformatics resources. We present an easy-to-adopt module that weaves together several important bioinformatic tools so students can grasp how these tools are used in answering research questions. Students integrate information gathered from websites dealing with anatomy (Mouse Brain Library), quantitative trait locus analysis (WebQTL from GeneNetwork), bioinformatics and gene expression analyses (University of California, Santa Cruz Genome Browser, National Center for Biotechnology Information's Entrez Gene, and the Allen Brain Atlas), and information resources (PubMed). Instructors can use these various websites in concert to teach genetics from the phenotypic level to the molecular level, aspects of neuroanatomy and histology, statistics, quantitative trait locus analysis, and molecular biology (including in situ hybridization and microarray analysis), and to introduce bioinformatic resources. Students use these resources to discover 1) the region(s) of chromosome(s) influencing the phenotypic trait, 2) a list of candidate genes—narrowed by expression data, 3) the in situ pattern of a given gene in the region of interest, 4) the nucleotide sequence of the candidate gene, and 5) articles describing the gene. Teaching materials such as a detailed student/instructor's manual, PowerPoints, sample exams, and links to free Web resources can be found at http://mdcune.psych.ucla.edu/modules/bioinformatics. PMID:20516355
Poe, D.; Venkatraman, N.; Hansen, C.; Singh, G.
There is an increasing need for an effective method of teaching bioinformatics. Increased progress and availability of computer-based tools for educating students have led to the implementation of a computer-based system for teaching bioinformatics as described in this paper. Bioinformatics is a recent, hybrid field of study combining elements of…
Leclair, Benoît; Shaler, Robert; Carmody, George R; Eliason, Kristilyn; Hendrickson, Brant C; Judkins, Thad; Norton, Michael J; Sears, Christopher; Scholl, Tom
Victim identification initiatives undertaken in the wake of Mass Fatality Incidents (MFIs) where high-body fragmentation has been sustained are often dependent on DNA typing technologies to complete their mandate. The success of these endeavors is linked to the choice of DNA typing methods and the bioinformatic tools required to make the necessary associations. Several bioinformatic tools were developed to assist with the identification of the victims of the World Trade Center attacks, one of the most complex incidents to date. This report describes one of these tools, the Mass Disaster Kinship Analysis Program (MDKAP), a pair-wise comparison software designed to handle large numbers of complete or partial Short Tandem Repeats (STR) genotypes, and infer identity of, or biological relationships between tested samples. The software performs all functions required to take full advantage of the information content of processed genotypic data sets from large-scale MFIs, including the collapse of victims data sets, remains re-association, virtual genotype generation through gap-filling, parentage trio searching, and a consistency check of reported/inferred biological relationships within families. Although very few WTC victims were genetically related, the software can detect parentage trios from within a victim's genotype data set through a nontriangulated approach that screens all possible parentage trios. All software-inferred relationships from WTC data were confirmed by independent statistical analysis. With a 13 STR loci complement, a fortuitous parentage trio (FPT) involving nonrelated individuals was detected. Additional STR loci would be required to reduce the risk of an FPT going undetected in large-scale MFIs involving related individuals among the victims. Kinship analysis has proven successful in this incident but its continued success in larger scale MFIs is contingent on the use of a sufficient number of STR loci to reduce the risk of undetected FPTs, the
Romano, Paolo; Bartocci, Ezio; Bertolini, Guglielmo; De Paoli, Flavio; Marra, Domenico; Mauri, Giancarlo; Merelli, Emanuela; Milanesi, Luciano
Background: The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of researchers, who lack these skills. A portal that lets these researchers benefit from the new technologies is still missing. Results: We designed biowep, a web-based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. The biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. The main processing steps of workflows are annotated on the basis of their input and output, elaboration type and application domain by using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved. Conclusion: We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services allowing specialized software to interact with an exhaustive set of biomedical databases and analysis
Hernández, Sergio; Calvo, Alejandra; Ferragut, Gabriela; Franco, Luís; Hermoso, Antoni; Amela, Isaac; Gómez, Antonio; Querol, Enrique; Cedano, Juan
Protein multitasking or moonlighting is the capability of certain proteins to execute two or more unique biological functions. This ability to perform moonlighting functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Usually, moonlighting proteins are revealed experimentally by serendipity, and the proteins described probably represent just the tip of the iceberg. It would be helpful if bioinformatics could predict protein multifunctionality, especially because of the large amounts of sequences coming from genome projects. In the present article, we describe several approaches that use sequences, structures, interactomics and current bioinformatics algorithms and programs to try to overcome this problem. The sequence analysis has been performed: (i) by remote homology searches using PSI-BLAST, (ii) by the detection of functional motifs, and (iii) by the co-evolutionary relationship between amino acids. Programs designed to identify functional motifs/domains are basically oriented to detect the main function, but usually fail in the detection of secondary ones. Remote homology searches such as PSI-BLAST seem to be more versatile in this task, and it is a good complement for the information obtained from protein-protein interaction (PPI) databases. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can be used only in very restricted situations, but can suggest how the evolutionary process of the acquisition of the second function took place.
He, Shixuan; Xie, Wanyi; Zhang, Wei; Zhang, Liqun; Wang, Yunxia; Liu, Xiaoling; Liu, Yulong; Du, Chunlei
A novel strategy that combines the iteratively cubic spline fitting baseline correction method with discriminant partial least squares qualitative analysis is employed to analyze the surface enhanced Raman scattering (SERS) spectroscopy of banned food additives, such as Sudan I dye and Rhodamine B in food, and Malachite green residues in aquaculture fish. Multivariate qualitative analysis methods, combining iteratively cubic spline fitting (ICSF) baseline correction as spectral preprocessing with principal component analysis (PCA) and discriminant partial least squares (DPLS) classification respectively, are applied to investigate the effectiveness of SERS spectroscopy for predicting the class assignments of unknown banned food additives. PCA cannot be used to predict the class assignments of unknown samples. However, the DPLS classification can discriminate the class assignment of unknown banned additives using the information of differences in relative intensities. The results demonstrate that SERS spectroscopy combined with the ICSF baseline correction method and the exploratory analysis methodology of DPLS classification can potentially be used for distinguishing banned food additives in the field of food safety.
... 7 Agriculture 3 2011-01-01 2011-01-01 false Additional fees for appeal of analysis. 91.38 Section 91.38 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (Standards, Inspections, Marketing Practices), DEPARTMENT OF AGRICULTURE (CONTINUED)...
... 7 Agriculture 3 2010-01-01 2010-01-01 false Additional fees for appeal of analysis. 91.38 Section 91.38 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (Standards, Inspections, Marketing Practices), DEPARTMENT OF AGRICULTURE (CONTINUED)...
Yue, Kai; Peng, Yan; Peng, Changhui; Yang, Wanqin; Peng, Xin; Wu, Fuzhong
Elevated nitrogen (N) deposition alters the terrestrial carbon (C) cycle, which is likely to feed back to further climate change. However, how the overall terrestrial ecosystem C pools and fluxes respond to N addition remains unclear. By synthesizing data from multiple terrestrial ecosystems, we quantified the response of C pools and fluxes to experimental N addition using a comprehensive meta-analysis method. Our results showed that N addition significantly stimulated soil total C storage by 5.82% ([2.47%, 9.27%], 95% CI, the same below) and increased the C contents of the above- and below-ground parts of plants by 25.65% [11.07%, 42.12%] and 15.93% [6.80%, 25.85%], respectively. Furthermore, N addition significantly increased aboveground net primary production by 52.38% [40.58%, 65.19%] and litterfall by 14.67% [9.24%, 20.38%] at a global scale. However, the C influx from the plant litter to the soil through litter decomposition and the efflux from the soil due to microbial respiration and soil respiration showed insignificant responses to N addition. Overall, our meta-analysis suggested that N addition will increase soil C storage and plant C in both above- and below-ground parts, indicating that terrestrial ecosystems might act to strengthen as a C sink under increasing N deposition. PMID:26813078
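The percentages with bracketed 95% confidence intervals reported above can be illustrated with a small calculation. The abstract does not state which effect-size metric was used, so the sketch below assumes the natural-log response ratio, a common choice in ecological meta-analysis; the function names and the example means are hypothetical and are not taken from the paper.

```python
import math

def log_response_ratio(treatment_mean, control_mean):
    """Effect size for one study: ln(treatment / control). Assumed metric."""
    return math.log(treatment_mean / control_mean)

def percent_change(ln_rr):
    """Back-transform a log response ratio into a percentage change."""
    return (math.exp(ln_rr) - 1.0) * 100.0

def percent_change_ci(ln_rr, standard_error, z=1.96):
    """Back-transform the bounds of a normal-approximation 95% CI on ln RR."""
    lower = percent_change(ln_rr - z * standard_error)
    upper = percent_change(ln_rr + z * standard_error)
    return lower, upper

# Hypothetical example: a pool of 110 units under N addition vs 100 in the control.
effect = log_response_ratio(110.0, 100.0)
point = percent_change(effect)
low, high = percent_change_ci(effect, standard_error=0.02)
```

Because the back-transformation is nonlinear, the resulting interval is asymmetric around the point estimate, which matches the shape of the intervals quoted in the abstract.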
Bioinformatics and genomics are closely related disciplines that hold great promise for the advancement of research and development in complex biomedical systems, as well as public health, drug design, comparative genomics, personalized medicine and so on. Research and development in these two important areas are having an impact on science and technology. High-throughput sequencing and molecular imaging technologies marked the beginning of a new era for modern translational medicine and personalized healthcare. The impact of having the human sequence and personalized digital images in hand has also created tremendous demand for developing powerful supercomputing, statistical learning and artificial intelligence approaches to handle the massive bioinformatics and personalized healthcare data, which will have a profound effect on how biomedical research will be conducted toward the improvement of human health and prolonging of human life in the future. The International Society of Intelligent Biological Medicine (http://www.isibm.org) and its official journals, the International Journal of Functional Informatics and Personalized Medicine (http://www.inderscience.com/ijfipm) and the International Journal of Computational Biology and Drug Design (http://www.inderscience.com/ijcbdd), in collaboration with the International Conference on Bioinformatics and Computational Biology (Biocomp), touch tomorrow's bioinformatics and personalized medicine through today's efforts in promoting the research, education and awareness of the upcoming integrated inter/multidisciplinary field. The 2007 International Conference on Bioinformatics and Computational Biology (BIOCOMP07) was held in Las Vegas, the United States of America, on June 25-28, 2007. The conference attracted over 400 papers, covering broad research areas in genomics, biomedicine and bioinformatics. Biocomp 2007 provides a common platform for the cross fertilization of ideas, and to help shape knowledge and
Taylor, Ronald C.
Bioinformatics researchers are increasingly confronted with analysis of ultra large-scale data sets, a problem that will only grow, and at an alarming rate, in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte-scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employs Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date.
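To make the MapReduce programming style mentioned above concrete, here is a minimal single-process sketch that counts k-mers in sequencing reads. It does not use Hadoop or any of its APIs; the function names are hypothetical, and the shuffle step that a real framework performs across a cluster is simulated with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import chain

def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every window of length k in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]

def shuffle(pairs):
    """Shuffle step: group emitted values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def count_kmers(reads, k):
    """Run map, shuffle and reduce in sequence over a list of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))

counts = count_kmers(["ACGT", "CGTA"], 2)
```

Each map call depends only on its own read and each reduce call only on one key's values, which is the property that lets a framework distribute the work across many machines and rerun failed pieces.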
Pagliano, Enea; Meija, Juris
The combination of isotope dilution and mass spectrometry has become a ubiquitous tool of chemical analysis. Often perceived as one of the most accurate methods of chemical analysis, it is not without shortcomings. Current isotope dilution equations are not capable of fully addressing one of the key problems encountered in chemical analysis: the possible effect of sample matrix on measured isotope ratios. The method of standard addition does compensate for the effect of sample matrix by making sure that all measured solutions have identical composition. While it is impossible to attain such a condition in traditional isotope dilution, we present equations which allow for matrix-matching between all measured solutions by fusion of isotope dilution and standard addition methods.
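As background for the standard addition method named above, the following is a minimal sketch of the classical single-analyte calculation only; it is not the combined isotope dilution and standard addition equations of the paper, which the abstract does not give. It assumes a linear response, that added concentrations are expressed in the final measured solutions, and that the function name and example numbers are purely illustrative.

```python
def standard_addition(added, signals):
    """Estimate the analyte concentration already present in the sample.

    Fits signal = intercept + slope * added by ordinary least squares and
    returns intercept / slope, i.e. the magnitude of the x-axis intercept.
    """
    n = len(added)
    mean_x = sum(added) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in added)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(added, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept / slope

# Hypothetical data: sensitivity 2 signal units per concentration unit,
# 1.5 concentration units of analyte originally present.
estimate = standard_addition([0.0, 1.0, 2.0, 3.0], [3.0, 5.0, 7.0, 9.0])
```

Because every measured solution contains the same sample matrix, the fitted slope already includes any matrix effect on sensitivity, which is why the ratio of intercept to slope is insensitive to it.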
Dźwiarek, Marek; Latała, Agata
This article presents an analysis of results of 1035 serious and 341 minor accidents recorded by Poland's National Labour Inspectorate (PIP) in 2005–2011, in view of their prevention by means of additional safety measures applied by machinery users. Since the analysis aimed at formulating principles for the application of technical safety measures, the analysed accidents should bear additional attributes: the type of machine operation, technical safety measures and the type of events causing injuries. The analysis proved that the executed tasks and injury-causing events were closely connected and there was a relation between casualty events and technical safety measures. In the case of tasks consisting of manual feeding and collecting materials, the injuries usually occur because of the rotating motion of tools or crushing due to a closing motion. Numerous accidents also happened in the course of supporting actions, like removing pollutants, correcting material position, cleaning, etc. PMID:26652689
Rother, Kristian; Potrzebowski, Wojciech; Puton, Tomasz; Rother, Magdalena; Wywial, Ewa; Bujnicki, Janusz M.
Creating useful software is a major activity of many scientists, including bioinformaticians. Nevertheless, software development in an academic setting is often unsystematic, which can lead to problems associated with maintenance and long-term availability. Unfortunately, well-documented software development methodology is difficult to adopt, and technical measures that directly improve bioinformatic programming have not been described comprehensively. We have examined 22 software projects and have identified a set of practices for software development in an academic environment. We found them useful to plan a project, support the involvement of experts (e.g. experimentalists), and to promote higher quality and maintainability of the resulting programs. This article describes 12 techniques that facilitate a quick start into software engineering. We describe 3 of the 22 projects in detail and give many examples to illustrate the usage of particular techniques. We expect this toolbox to be useful for many bioinformatics programming projects and to the training of scientific programmers. PMID:21803787
Rocco, D; Critchlow, T
The transition of the World Wide Web from a paradigm of static Web pages to one of dynamic Web services provides new and exciting opportunities for bioinformatics with respect to data dissemination, transformation, and integration. However, the rapid growth of bioinformatics services, coupled with non-standardized interfaces, diminishes the potential that these Web services offer. To face this challenge, we examine the notion of a Web service class that defines the functionality provided by a collection of interfaces. These descriptions are an integral part of a larger framework that can be used to discover, classify, and wrap Web services automatically. We discuss how this framework can be used in the context of the proliferation of sites offering BLAST sequence alignment services for specialized data sets.
Although investigators using methodologies in bioinformatics have always been useful in genomic experimentation in analytic, engineering, and infrastructure support roles, only recently have bioinformaticians been able to have a primary scientific role in asking and answering questions on human health and disease. Here, I argue that this shift in role towards asking questions in medicine is now the next step needed for the field of bioinformatics. I outline four reasons why bioinformaticians are newly enabled to drive the questions in primary medical discovery: public availability of data, intersection of data across experiments, commoditization of methods, and streamlined validation. I also list four recommendations for bioinformaticians wishing to get more involved in translational research. PMID:19566916
Scholz, Sonja W.; Mhyre, Tim; Ressom, Habtom; Shah, Salim; Federoff, Howard J.
Within the last two decades, genomics and bioinformatics have profoundly impacted our understanding of the molecular mechanisms of Parkinson's disease (PD). From the description of the first PD gene in 1997 until today, we have witnessed the emergence of new technologies that have revolutionized our concepts to identify genetic mechanisms implicated in human health and disease. Driven by the publication of the human genome sequence and followed by the description of detailed maps for common genetic variability, novel applications to rapidly scrutinize the entire genome in a systematic, cost-effective manner have become a reality. As a consequence, about 30 genetic loci have been unequivocally linked to the pathogenesis of PD highlighting essential molecular pathways underlying this common disorder. Herein we discuss how neurogenomics and bioinformatics are applied to dissect the nature of this complex disease with the overall aim of developing rational therapeutic interventions. PMID:22762024
Korcsmaros, Tamas; Dunai, Zsuzsanna A; Vellai, Tibor; Csermely, Peter
The number of bioinformatics tools and resources that support molecular and cell biology approaches is continuously expanding. Moreover, systems and network biology analyses are accompanied more and more by integrated bioinformatics methods. Traditional information-centered university teaching methods often fail, as (1) it is impossible to cover all existing approaches in the frame of a single course, and (2) a large segment of the current bioinformation can become obsolete in a few years. Signaling networks offer an excellent example for teaching bioinformatics resources and tools, as they are both focused and complex at the same time. Here, we present an outline of a university bioinformatics course with four sample practices to demonstrate how signaling network studies can integrate biochemistry, genetics, cell biology and network sciences. We show that several bioinformatics resources and tools, as well as important concepts and current trends, can also be integrated into signaling network studies. The research-type hands-on experiences we show enable the students to improve key competences such as teamwork, creative and critical thinking and problem solving. Our classroom course curriculum can be re-formulated as e-learning material or applied as a part of a specific training course. The multi-disciplinary approach and the mosaic setup of the course have the additional benefit of supporting the advanced teaching of talented students.
Lakhno, V D
Mathematical biology and bioinformatics represent a new and rapidly progressing line of investigation which emerged in the course of work on the Human Genome Project. The main applied problems of these sciences are drug design, patient-specific medicine and nanobioelectronics. It is shown that progress in the technology of mass sequencing of the human genome has set the stage for starting the national program on patient-specific medicine.
Greene, Anna C.; Giffin, Kristine A.; Greene, Casey S.
Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs. PMID:25829469
Cohen, K Bretonnel; Hunter, Lawrence E
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
Mulder, Nicola J.; Adebiyi, Ezekiel; Alami, Raouf; Benkahla, Alia; Brandful, James; Doumbia, Seydou; Everett, Dean; Fadlelmola, Faisal M.; Gaboun, Fatima; Gaseitsiwe, Simani; Ghazal, Hassan; Hazelhurst, Scott; Hide, Winston; Ibrahimi, Azeddine; Jaufeerally Fakim, Yasmina; Jongeneel, C. Victor; Joubert, Fourie; Kassim, Samar; Kayondo, Jonathan; Kumuthini, Judit; Lyantagaye, Sylvester; Makani, Julie; Mansour Alzohairy, Ahmed; Masiga, Daniel; Moussa, Ahmed; Nash, Oyekanmi; Ouwe Missi Oukem-Boyer, Odile; Owusu-Dabo, Ellis; Panji, Sumir; Patterton, Hugh; Radouani, Fouzia; Sadki, Khalid; Seghrouchni, Fouad; Tastan Bishop, Özlem; Tiffin, Nicki; Ulenga, Nzovu
The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet. PMID:26627985
Eisenhaber, Frank; Sherman, Westley Arthur
The Journal of Bioinformatics and Computational Biology (JBCB) started publishing scientific articles in 2003. It has established itself as a home for solid research articles in the field (~60 per year) that are surprisingly well cited. JBCB has an important function as an alternative publishing channel in addition to other, bigger journals.
Medin, Carey L.; Nolin, Katie L.
Molecular biologists commonly use bioinformatics to map and analyze DNA and protein sequences and to align different DNA and protein sequences for comparison. Additionally, biologists can create and view 3D models of protein structures to further understand intramolecular interactions. The primary goal of this 10-week laboratory was to introduce…
Ramírez, Sergio; Muñoz-Mérida, Antonio; Karlsson, Johan; García, Maximiliano; Pérez-Pulido, Antonio J.; Claros, M. Gonzalo; Trelles, Oswaldo
The productivity of any scientist is affected by cumbersome, tedious and time-consuming tasks that try to make the heterogeneous web services compatible so that they can be useful in their research. MOWServ, the bioinformatic platform offered by the Spanish National Institute of Bioinformatics, was released to provide integrated access to databases and analytical tools. Since its release, the number of available services has grown dramatically, and it has become one of the main contributors of registered services in the EMBRACE Biocatalogue. The ontology that enables most of the web-service compatibility has been curated, improved and extended. The service discovery has been greatly enhanced by Magallanes software and biodataSF. User data are securely stored on the main server by an authentication protocol that enables the monitoring of current or already-finished user’s tasks, as well as the pipelining of successive data processing services. The BioMoby standard has been greatly extended with the new features included in the MOWServ, such as management of additional information (metadata such as extended descriptions, keywords and datafile examples), a qualified registry, error handling, asynchronous services and service replication. All of them have increased the MOWServ service quality, usability and robustness. MOWServ is available at http://www.inab.org/MOWServ/ and has a mirror at http://www.bitlab-es.com/MOWServ/. PMID:20525794
Katayama, Toshiaki; Arakawa, Kazuharu; Nakao, Mitsuteru; Ono, Keiichiro; Aoki-Kinoshita, Kiyoko F; Yamamoto, Yasunori; Yamaguchi, Atsuko; Kawashima, Shuichi; Chun, Hong-Woo; Aerts, Jan; Aranda, Bruno; Barboza, Lord Hendrix; Bonnal, Raoul Jp; Bruskiewich, Richard; Bryne, Jan C; Fernández, José M; Funahashi, Akira; Gordon, Paul Mk; Goto, Naohisa; Groscurth, Andreas; Gutteridge, Alex; Holland, Richard; Kano, Yoshinobu; Kawas, Edward A; Kerhornou, Arnaud; Kibukawa, Eri; Kinjo, Akira R; Kuhn, Michael; Lapp, Hilmar; Lehvaslaiho, Heikki; Nakamura, Hiroyuki; Nakamura, Yasukazu; Nishizawa, Tatsuya; Nobata, Chikashi; Noguchi, Tamotsu; Oinn, Thomas M; Okamoto, Shinobu; Owen, Stuart; Pafilis, Evangelos; Pocock, Matthew; Prins, Pjotr; Ranzinger, René; Reisinger, Florian; Salwinski, Lukasz; Schreiber, Mark; Senger, Martin; Shigemoto, Yasumasa; Standley, Daron M; Sugawara, Hideaki; Tashiro, Toshiyuki; Trelles, Oswaldo; Vos, Rutger A; Wilkinson, Mark D; York, William; Zmasek, Christian M; Asai, Kiyoshi; Takagi, Toshihisa
Web services have become a key technology for bioinformatics, since life science databases are globally decentralized and the exponential increase in the amount of available data demands efficient systems that do not need to transfer entire databases for every step of an analysis. However, various incompatibilities among database resources and analysis services make it difficult to connect and integrate these into interoperable workflows. To resolve this situation, we invited domain specialists from web service providers, client software developers, Open Bio* projects, the BioMoby project and researchers of emerging areas where a standard exchange data format is not well established, for an intensive collaboration entitled the BioHackathon 2008. The meeting was hosted by the Database Center for Life Science (DBCLS) and Computational Biology Research Center (CBRC) and was held in Tokyo from February 11th to 15th, 2008. In this report we highlight the work accomplished and the common issues that arose from this event, including the standardization of data exchange formats and services in the emerging fields of glycoinformatics, biological interaction networks, text mining, and phyloinformatics. In addition, common shared object development based on BioSQL, as well as technical challenges in large data management, asynchronous services, and security are discussed. Consequently, we improved interoperability of web services in several fields; however, further cooperation among major database centers and continued collaborative efforts between service providers and software developers are still necessary for an effective advance in bioinformatics web service technologies. PMID:20727200
Steed, Chad A.; Halsey, William; Dehoff, Ryan; ...
Flexible visual analysis of long, high-resolution, and irregularly sampled time series data from multiple sensor streams is a challenge in several domains. In the field of additive manufacturing, this capability is critical for realizing the full potential of large-scale 3D printers. Here, we propose a visual analytics approach that helps additive manufacturing researchers acquire a deep understanding of patterns in log and imagery data collected by 3D printers. Our specific goals include discovering patterns related to defects and system performance issues, optimizing build configurations to avoid defects, and increasing production efficiency. We introduce Falcon, a new visual analytics system that allows users to interactively explore large, time-oriented data sets from multiple linked perspectives. Falcon provides overviews, detailed views, and unique segmented time series visualizations, all with adjustable scale options. To illustrate the effectiveness of Falcon at providing thorough and efficient knowledge discovery, we present a practical case study involving experts in additive manufacturing and data from a large-scale 3D printer. The techniques described are applicable to the analysis of any quantitative time series, though the focus of this paper is on additive manufacturing.
Tinnemann, Peter; Stöber, Yvonne; Roll, Stephanie; Vauth, Christoph; Willich, Stefan N.; Greiner, Wolfgang
Background Besides clinical and radiological examination, instrumental functional analyses are performed as diagnostic procedures for craniomandibular dysfunctions. Instrumental functional analyses cause substantial costs and show considerable variability between individual dental practices. Objectives On the basis of published scientific evidence, this report determines the validity of the instrumental functional analysis for the diagnosis of craniomandibular dysfunctions compared to clinical diagnostic procedures; the differences between the various forms of the instrumental functional analysis; the existence of a dependency on additional factors; and the need for further research. In addition, the cost effectiveness of the instrumental functional analysis is analysed in a health-policy context, and social, legal and ethical aspects are considered. Methods A literature search is performed in over 27 databases and by hand. Relevant companies and institutions are contacted concerning unpublished studies. The inclusion criteria for publications are (i) diagnostic studies with the indication “craniomandibular malfunction”, (ii) a comparison between clinical and instrumental functional analysis, (iii) publications since 1990, (iv) publications in English or German. The identified literature is evaluated by two scientists regarding the relevance of content and methodological quality. Results The systematic database search resulted in 962 hits. 187 complete medical and economic publications are evaluated. Since the evaluated studies are not relevant enough to answer the medical or health economic questions, no study is included. Discussion The inconsistent terminology concerning craniomandibular dysfunctions and instrumental functional analyses results in a broad literature search in databases and an extensive search by hand. Since no relevant results concerning the validity of the instrumental functional analysis in comparison to the clinical functional analysis
Shuster, Michele; Claussen, Kira; Locke, Melly; Glazewski, Krista
At the intersection of biology and computer science is the growing field of bioinformatics, the analysis of complex datasets of biological relevance. Despite the increasing importance of bioinformatics and associated practical applications, these are not standard topics in elementary and middle school classrooms. We report on a pilot project and its evolution to support implementation of bioinformatics-based activities in elementary and middle school classrooms. Specifically, we ultimately designed a multi-day summer teacher professional development workshop, in which teachers design innovative classroom activities. By focusing on teachers, our design leverages enhanced teacher knowledge and confidence to integrate innovative instructional materials into K-8 classrooms and contributes to capacity building in STEM instruction.
Andrick, Benjamin J.; Borello, Alexa M.
Objectives. To design and implement a bioinformatics exercise that applies immunological principles to predicting rejection of protein drugs based upon patient genotype. Design. Doctor of pharmacy (PharmD) students used the Immune Epitope Database, a freely available bioinformatics tool. Over a 2-week laboratory, students interrogated whether a protein drug would be predicted to induce an immune response based upon patient genotype. Results were presented at the last laboratory session, and students completed reports discussing their findings. Assessment. Pre-lab quizzes and a final report were graded. Students answered questionnaires assessing perceived learning gains. To determine the impact on student understanding of immunity against protein drugs, the quality of student data analysis and comparisons to class data were graded. Independent measures of student learning demonstrated that students developed a greater understanding of how patient genotype could contribute to treatment failure with protein drugs. Conclusions. This study indicates that questions related to clinical immunology can be posed using bioinformatics tools. PMID:28090096
Background Bioinformatics researchers are now confronted with analysis of ultra large-scale data sets, a problem that will only increase at an alarming rate in coming years. Recent developments in open source software, that is, the Hadoop project and associated software, provide a foundation for scaling to petabyte scale data warehouses on Linux clusters, providing fault-tolerant parallelized analysis on such data using a programming style named MapReduce. Description An overview is given of the current usage within the bioinformatics community of Hadoop, a top-level Apache Software Foundation project, and of associated open source software projects. The concepts behind Hadoop and the associated HBase project are defined, and current bioinformatics software that employs Hadoop is described. The focus is on next-generation sequencing, as the leading application area to date. Conclusions Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, especially in the field of next-generation sequencing analysis, and such use is increasing. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms. PMID:21210976
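The MapReduce programming style named in the abstract above can be illustrated with a minimal, single-process sketch. It does not use Hadoop or any Hadoop API; the function names (`map_kmers`, `shuffle`, `reduce_counts`), the choice of k-mer counting as the task, and the toy reads are all assumptions made here for illustration. In a real cluster the framework would run the map and reduce steps in parallel on separate nodes and perform the grouping between them.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every window of length k in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key (done by the framework in practice)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: combine all values collected for one key into a single count."""
    return key, sum(values)


def count_kmers(reads, k=3):
    """Run map, shuffle and reduce in sequence over a list of reads."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Toy reads invented for illustration; not taken from any data set.
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads, k=3)
print(counts)
```

Because each call to `map_kmers` depends only on its own read and each call to `reduce_counts` only on its own key, both steps can be distributed without shared state, which is the property that makes the style suitable for commodity clusters.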
Bokulich, Nicholas A; Rideout, Jai Ram; Mercurio, William G; Shiffer, Arron; Wolfe, Benjamin; Maurice, Corinne F; Dutton, Rachel J; Turnbaugh, Peter J; Knight, Rob; Caporaso, J Gregory
ABSTRACT Mock communities are an important tool for validating, optimizing, and comparing bioinformatics methods for microbial community analysis. We present mockrobiota, a public resource for sharing, validating, and documenting mock community data resources, available at http://caporaso-lab.github.io/mockrobiota/. The materials contained in mockrobiota include data set and sample metadata, expected composition data (taxonomy or gene annotations or reference sequences for mock community members), and links to raw data (e.g., raw sequence data) for each mock community data set. mockrobiota does not supply physical sample materials directly, but the data set metadata included for each mock community indicate whether physical sample materials are available. At the time of this writing, mockrobiota contains 11 mock community data sets with known species compositions, including bacterial, archaeal, and eukaryotic mock communities, analyzed by high-throughput marker gene sequencing. IMPORTANCE The availability of standard and public mock community data will facilitate ongoing method optimizations, comparisons across studies that share source data, and greater transparency and access and eliminate redundancy. These are also valuable resources for bioinformatics teaching and training. This dynamic resource is intended to expand and evolve to meet the changing needs of the omics community. PMID:27822553
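As a rough illustration of how a mock community with known composition can be used to validate a method, the sketch below compares the taxa a pipeline reports against the expected taxa. This is not mockrobiota code and does not read its file formats; the function name `score_against_mock` and the taxon lists are hypothetical, and presence/absence scoring is only one of several possible evaluation choices.

```python
def score_against_mock(expected, observed):
    """Compare observed taxa with the expected composition of a mock community.

    Returns precision (fraction of observed taxa that are expected),
    recall (fraction of expected taxa that were observed), and the
    taxa that were missed or spuriously reported.
    """
    expected, observed = set(expected), set(observed)
    true_positives = expected & observed
    precision = len(true_positives) / len(observed) if observed else 0.0
    recall = len(true_positives) / len(expected) if expected else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "missed": sorted(expected - observed),
        "spurious": sorted(observed - expected),
    }


# Hypothetical genus lists, invented for illustration only.
expected_taxa = ["Bacillus", "Escherichia", "Lactobacillus", "Staphylococcus"]
observed_taxa = ["Bacillus", "Escherichia", "Staphylococcus", "Pseudomonas"]

result = score_against_mock(expected_taxa, observed_taxa)
print(result)
```

Scoring two methods against the same expected composition gives directly comparable numbers, which is the kind of cross-study comparison the abstract describes.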
IFPA Meeting 2013 Workshop Report II: use of 'omics' in understanding placental development, bioinformatics tools for gene expression analysis, planning and coordination of a placenta research network, placental imaging, evolutionary approaches to understanding pre-eclampsia.
Ackerman, W E; Adamson, L; Carter, A M; Collins, S; Cox, B; Elliot, M G; Ermini, L; Gruslin, A; Hoodless, P A; Huang, J; Kniss, D A; McGowen, M R; Post, M; Rice, G; Robinson, W; Sadovsky, Y; Salafia, C; Salomon, C; Sled, J G; Todros, T; Wildman, D E; Zamudio, S; Lash, G E
Workshops are an important part of the IFPA annual meeting as they allow for discussion of specialized topics. At the IFPA meeting 2013 twelve themed workshops were presented, five of which are summarized in this report. These workshops related to various aspects of placental biology but collectively covered areas of new technologies for placenta research: 1) use of 'omics' in understanding placental development and pathologies; 2) bioinformatics and use of omics technologies; 3) planning and coordination of a placenta research network; 4) clinical imaging and pathological outcomes; 5) placental evolution.
Gill, Supreet Kaur; Christopher, Ajay Francis; Gupta, Vikas; Bansal, Parveen
Clinical research is making strenuous efforts to promote the health and wellbeing of the population. There is a rapid increase in the number and severity of diseases like cancer, hepatitis, HIV etc., resulting in high morbidity and mortality. Clinical research involves drug discovery and development, whereas clinical trials are performed to establish the safety and efficacy of drugs. Drug discovery is a long process starting with target identification, validation and lead optimization. This is followed by preclinical trials, intensive clinical trials and eventually post-marketing vigilance for drug safety. Software and bioinformatics tools play a great role not only in drug discovery but also in drug development. This involves the use of informatics in the development of new knowledge pertaining to health and disease, data management during clinical trials and the use of clinical data for secondary research. In addition, new technologies like molecular docking, molecular dynamics simulation, proteomics and quantitative structure activity relationship in clinical research result in a faster and easier drug discovery process. During preclinical trials, software is used for randomization to remove bias and to plan study design. In clinical trials, software like electronic data capture, remote data capture and electronic case report form (eCRF) is used to store the data. eClinical and Oracle Clinical are software used for clinical data management and for statistical analysis of the data. After the drug is marketed, the safety of a drug can be monitored by drug safety software like Oracle Argus or ARISg. Therefore, software is used from the very early stages of drug designing through drug development and clinical trials to pharmacovigilance. This review describes different aspects related to the application of computers and bioinformatics in drug designing, discovery and development, formulation designing and clinical research. PMID:27453827
Gelbart, Hadas; Ben-Dor, Shifra; Yarden, Anat
Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled ‘Bioinformatics in the Service of Biotechnology’. Students’ learning outcomes and attitudes toward the bioinformatics learning environment were measured by analyzing their answers to questions embedded within the activities, questionnaires, interviews and observations. Students’ difficulties and knowledge acquisition were characterized based on four categories: the required domain-specific knowledge (declarative, procedural, strategic or situational), the scientific field that each question stems from (biology, bioinformatics or their combination), the associated cognitive-process dimension (remember, understand, apply, analyze, evaluate, create) and the type of question (open-ended or multiple choice). Analysis of students’ cognitive outcomes revealed learning gains in bioinformatics and related scientific fields, as well as appropriation of the bioinformatics approach as part of the students’ scientific ‘toolbox’. For students, questions stemming from the ‘old world’ biology field and requiring declarative or strategic knowledge were harder to deal with. This stands in contrast to their teachers’ prediction. Analysis of students’ affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher’s role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum. PMID:26801769
Alyuruk, Hakan; Cavas, Levent
Genomics and proteomics projects have produced a huge amount of raw biological data including DNA and protein sequences. Although these data have been stored in data banks, their evaluation is strictly dependent on bioinformatics tools. These tools have been developed by multidisciplinary experts for fast and robust analysis of biological data.…
White, Benjamen; Fatima, Vayani; Fatima, Nazeefa; Das, Sayoni; Rahman, Farzana; Hassan, Mehedi
Following the success of the 1st Student Symposium by ISCB RSG-UK, a 2nd Student Symposium took place on 7th October 2015 at The Genome Analysis Centre, Norwich, UK. This short report summarizes the main highlights from the 2nd Bioinformatics Student Symposium. PMID:27239284
Nenadic, Goran; Filannino, Michele; Brass, Andy; Robertson, David L.; Stevens, Robert
Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differs between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being only mentioned once each. While these results highlight the dynamic and creative nature of bioinformatics research they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371. PMID:27331905
Duck, Geraint; Nenadic, Goran; Filannino, Michele; Brass, Andy; Robertson, David L; Stevens, Robert
Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being only mentioned once each. While these results highlight the dynamic and creative nature of bioinformatics research they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371.
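The two concentration figures quoted in this abstract (the share of total usage held by the top 5% of resource names, and the fraction of names mentioned only once) can be computed from any table of mention counts. The following is a minimal sketch, not the authors' pipeline: the function name `usage_concentration` and the toy counts are invented for illustration and do not come from the study's dataset.

```python
def usage_concentration(mention_counts, top_fraction=0.05):
    """Summarise how concentrated resource usage is.

    mention_counts: dict mapping resource name -> number of mentions.
    Returns (share of all mentions held by the top `top_fraction` of names,
             fraction of names that are mentioned exactly once).
    """
    counts = sorted(mention_counts.values(), reverse=True)
    if not counts:
        raise ValueError("no resources given")
    # Always keep at least one name in the "top" group.
    n_top = max(1, round(len(counts) * top_fraction))
    top_share = sum(counts[:n_top]) / sum(counts)
    singleton_fraction = sum(1 for c in counts if c == 1) / len(counts)
    return top_share, singleton_fraction


# Toy data (hypothetical): 20 resource names, 100 mentions in total.
toy_counts = {"resource_00": 47, "resource_01": 10, "resource_02": 8,
              "resource_03": 8, "resource_04": 8, "resource_05": 5}
toy_counts.update({f"resource_{i:02d}": 1 for i in range(6, 20)})

top_share, singleton_fraction = usage_concentration(toy_counts)
print(f"top 5% share: {top_share:.2f}, mentioned once: {singleton_fraction:.2f}")
```

With real data, `mention_counts` would be built from the extracted mentions, for example by counting normalised resource names per document.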
Readhead, Ben; Dudley, Joel
Significance: A majority of therapeutic interventions occur late in the pathological process, when treatment outcome can be less predictable and effective, highlighting the need for new precise and preventive therapeutic development strategies that consider genomic and environmental context. Translational bioinformatics is well positioned to contribute to the many challenges inherent in bridging this gap between our current reactive methods of healthcare delivery and the intent of precision medicine, particularly in the area of drug development, which forms the focus of this review. Recent Advances: A variety of powerful informatics methods for organizing and leveraging the vast wealth of molecular measurements available for a broad range of disease contexts have recently emerged. These include methods for data driven disease classification, drug repositioning, identification of disease biomarkers, and the creation of disease network models, each with significant impacts on drug development approaches. Critical Issues: An important bottleneck in the application of bioinformatics methods in translational research is the lack of investigators who are versed in both biomedical domains and informatics. Efforts to nurture both sets of competencies within individuals and to increase interfield visibility will help to accelerate the adoption and increased application of bioinformatics in translational research. Future Directions: It is possible to construct predictive, multiscale network models of disease by integrating genotype, gene expression, clinical traits, and other multiscale measures using causal network inference methods. This can enable the identification of the “key drivers” of pathology, which may represent novel therapeutic targets or biomarker candidates that play a more direct role in the etiology of disease. PMID:24527359
Accardi, L.; Freudenberg, Wolfgang; Ohya, Masanori
/ H. Kamimura -- Massive collection of full-length complementary DNA clones and microarray analyses: keys to rice transcriptome analysis / S. Kikuchi -- Changes of influenza A(H5) viruses by means of entropic chaos degree / K. Sato and M. Ohya -- Basics of genome sequence analysis in bioinformatics - its fundamental ideas and problems / T. Suzuki and S. Miyazaki -- A basic introduction to gene expression studies using microarray expression data analysis / D. Wanke and J. Kilian -- Integrating biological perspectives: a quantum leap for microarray expression analysis / D. Wanke ... [et al.].
Tuffner, Francis K.; Singh, Ruchi
Distributed generators (DG) are small scale power supplying sources owned by customers or utilities and scattered throughout the power system distribution network. Distributed generation can be both renewable and non-renewable. Addition of distributed generation is primarily to increase feeder capacity and to provide peak load reduction. However, this addition comes with several impacts on the distribution feeder. Several studies have shown that addition of DG leads to reduction of feeder loss. However, most of these studies have considered lumped load and distributed load models to analyze the effects on system losses, where the dynamic variation of load due to seasonal changes is ignored. It is very important for utilities to minimize the losses under all scenarios to decrease revenue losses, promote efficient asset utilization, and therefore, increase feeder capacity. This paper will investigate an IEEE 13-node feeder populated with photovoltaic generators on detailed residential houses with water heater, Heating Ventilation and Air conditioning (HVAC) units, lights, and other plug and convenience loads. An analysis of losses for different power system components, such as transformers, underground and overhead lines, and triplex lines, will be performed. The analysis will utilize different seasons and different solar penetration levels (15%, 30%).
Narayanan, S. R.; Surampudi, S.; Attia, A. I.; Bankston, C. P.
The overcharge condition in secondary lithium batteries employing redox additives for overcharge protection has been theoretically analyzed in terms of a finite linear diffusion model. The analysis leads to expressions relating the steady-state overcharge current density and cell voltage to the concentration, diffusion coefficient, and standard reduction potential of the redox couple, and to the interelectrode distance. The model permits the estimation of the maximum permissible overcharge rate for any chosen set of system conditions. Digital simulation of the overcharge experiment leads to a numerical representation of the potential transients, and to an estimate of the influence of diffusion coefficient and interelectrode distance on the transient attainment of the steady state during overcharge. The model has been experimentally verified using 1,1'-dimethylferrocene as a redox additive. The analysis of the experimental results in terms of the theory allows the calculation of the diffusion coefficient and the formal potential of the redox couple. The model and the theoretical results may be exploited in the design and optimization of overcharge protection by the redox additive approach.
Kesh, Someswa; Raghupathi, Wullianallur
This article provides an overview of the field of bioinformatics and its implications for the various participants. Next-generation issues facing developers (programmers), users (molecular biologists), and the general public (patients) who would benefit from the potential applications are identified. The goal is to create awareness and debate on the opportunities (such as career paths) and the challenges such as privacy that arise. A triad model of the participants' roles and responsibilities is presented along with the identification of the challenges and possible solutions. PMID:18066389
Alkema, Wynand; Boekhorst, Jos; Wels, Michiel; van Hijum, Sacha A F T
In the production of fermented foods, microbes play an important role. Optimization of fermentation processes or starter culture production traditionally was a trial-and-error approach inspired by expert knowledge of the fermentation process. Current developments in high-throughput 'omics' technologies allow developing more rational approaches to improve fermentation processes both from the food functionality as well as from the food safety perspective. Here, the authors thematically review typical bioinformatics techniques and approaches to improve various aspects of the microbial production of fermented food products and food safety.
Handl, Julia; Kell, Douglas B; Knowles, Joshua
This paper reviews the application of multiobjective optimization in the fields of bioinformatics and computational biology. A survey of existing work, organized by application area, forms the main body of the review, following an introduction to the key concepts in multiobjective optimization. An original contribution of the review is the identification of five distinct "contexts," giving rise to multiple objectives: These are used to explain the reasons behind the use of multiobjective optimization in each application area and also to point the way to potential future uses of the technique.
Tenenbaum, Jessica D.
Though a relatively young discipline, translational bioinformatics (TBI) has become a key component of biomedical research in the era of precision medicine. Development of high-throughput technologies and electronic health records has caused a paradigm shift in both healthcare and biomedical research. Novel tools and methods are required to convert increasingly voluminous datasets into information and actionable knowledge. This review provides a definition and contextualization of the term TBI, describes the discipline’s brief history and past accomplishments, as well as current foci, and concludes with predictions of future directions in the field. PMID:26876718
Alkema, Wynand; Boekhorst, Jos; Wels, Michiel
In the production of fermented foods, microbes play an important role. Optimization of fermentation processes or starter culture production traditionally was a trial-and-error approach inspired by expert knowledge of the fermentation process. Current developments in high-throughput ‘omics’ technologies allow developing more rational approaches to improve fermentation processes both from the food functionality as well as from the food safety perspective. Here, the authors thematically review typical bioinformatics techniques and approaches to improve various aspects of the microbial production of fermented food products and food safety. PMID:26082168
Fang, Min; Zhang, Pei; Zhao, Yanxin; Liu, Xueyuan
APP/PS1 transgenic mice are widely used as a reliable animal model of Alzheimer disease (AD) in studies of the behavior, physiology, biochemistry and histomorphology of AD, but few studies have investigated the role of lncRNAs in this model. In this study, an lncRNA microarray was employed to detect the gene expression profile and lncRNA expression profile in the mouse brain. Then, bioinformatics was used to predict the differentially expressed genes related to AD (n=20). Among the differentially expressed lncRNAs (n=249), 99 were downregulated and 150 upregulated. A co-expression network was applied to analyze the co-expression of the differentially expressed lncRNAs and genes. In the network, lncRNA Gm13498 and lncRNA 1700030L20Rik correlated with the most genes, with degrees of 6 and 5, respectively. Then, the functions and signal transduction pathways related to the differentially co-expressed lncRNAs were analyzed with bioinformatics, and the results showed that these lncRNAs were involved in the systemic development of neurons, intercellular communication, regulation of the action potential of neurons, development and differentiation of oligodendrocytes, neurotransmitter transmission, and neuronal regeneration. Real-time PCR was employed to detect the expression of relevant lncRNAs and differentially expressed RNAs in 10 samples, and the results were consistent with the above microarray findings. PMID:28386363
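The "degree" reported for each lncRNA in the co-expression network is the number of distinct genes it is linked to. A minimal sketch of that count from an edge list follows; the edge list and the node names (`lncRNA_A`, `gene_1`, and so on) are hypothetical placeholders, and the abstract does not say how the authors built or thresholded their network.

```python
from collections import defaultdict


def node_degrees(edges):
    """Return {node: degree} for an undirected network given as (a, b) pairs.

    Duplicate edges and self-loops are ignored, so the degree is the
    number of distinct neighbours of each node.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # skip self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(linked) for node, linked in neighbours.items()}


# Hypothetical co-expression edges (lncRNA, gene); names are placeholders.
edges = [
    ("lncRNA_A", "gene_1"),
    ("lncRNA_A", "gene_2"),
    ("lncRNA_A", "gene_3"),
    ("lncRNA_B", "gene_2"),
    ("lncRNA_B", "gene_3"),
    ("lncRNA_A", "gene_1"),  # duplicate edge, counted once
]

degrees = node_degrees(edges)
# Rank nodes by degree to find the most connected ones.
hubs = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
print(hubs[:2])
```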
Liu, Xiaoqiao; Wu, Jianmin; Wang, Jun; Liu, Xiaochuan; Zhao, Shuqi; Li, Zhe; Kong, Lei; Gu, Xiaocheng; Luo, Jingchu; Gao, Ge
With the rapid progress of biological research, there is great demand for integrative knowledge-sharing systems that efficiently support collaboration among biological researchers from various fields. To fulfill such requirements, we have developed WebLab, a data-centric knowledge-sharing platform that lets biologists fetch, analyze, manipulate and share data through an intuitive web interface. Dedicated space is provided for users to store their input data and analysis results. Users can upload local data or fetch public data from remote databases, and then perform analysis using more than 260 integrated bioinformatic tools. These tools can be further organized as customized analysis workflows to accomplish complex tasks automatically. In addition to conventional biological data, WebLab also provides rich support for scientific literature, such as searching the full text of uploaded literature and exporting citations to well-known citation managers such as EndNote and BibTeX. To facilitate teamwork among colleagues, WebLab provides a powerful and flexible sharing mechanism, which allows users to share input data, analysis results, scientific literature and customized workflows with specified users or groups under sophisticated privilege settings. WebLab is publicly available at http://weblab.cbi.pku.edu.cn, with all source code released as Free Software.
Yan, Ying; Yi, Grace Y
Covariate measurement error occurs commonly in survival analysis. Under the proportional hazards model, measurement error effects have been well studied, and various inference methods have been developed to correct for error effects under such a model. In contrast, error-contaminated survival data under the additive hazards model have received relatively less attention. In this paper, we investigate this problem by exploring measurement error effects on parameter estimation and the change of the hazard function. New insights into measurement error effects are revealed, as opposed to well-documented results for the Cox proportional hazards model. We propose a class of bias correction estimators that embraces certain existing estimators as special cases. In addition, we exploit the regression calibration method to reduce measurement error effects. Theoretical results for the developed methods are established, and numerical assessments are conducted to illustrate the finite sample performance of our methods.
González-Rodríguez, M Victoria; Dopico-García, M Sonia; Noguerol-Cal, Rosalía; Carballeira-Amarelo, Tania; López-Vilariño, José M; Fernández-Martínez, Gerado
This article investigates the applicability of HPLC-UV, ultra performance LC-evaporative light-scattering detection (UPLC-ELSD), HPLC-ESI(+)-MS and HPLC-hybrid linear ion trap (LTQ) Orbitrap MS for the analysis of different non-ionic antistatic additives, Span 20, Span 60, Span 65, Span 80, Span 85 (sorbitan fatty acid esters), Atmer 129 (glycerol fatty acid ester) and Atmer 163 (ethoxylated alkylamine). Several alkyl chain lengths or different degrees of esterification of polyol derivatives can be present in commercial mixtures of these polymer additives. Therefore, their identification and quantification are complicated. Qualitative composition of the studied compounds was analysed by MS. HPLC-UV, UPLC-ELSD and HPLC-LTQ Orbitrap MS methods were applied to the quantitative determination of the different Spans, Atmer 129 and Atmer 163, respectively. Quality parameters of these methods were established and no derivatization was necessary.
With the development of the Internet and the growth of online resources, bioinformatics training for wet-lab biologists has become a necessary part of their education. This article describes a one-semester course ‘Applied Bioinformatics Course’ (ABC, http://abc.cbi.pku.edu.cn/) that the author has been teaching to biological graduate students at the Peking University and the Chinese Academy of Agricultural Sciences for the past 13 years. ABC is a hands-on practical course to teach students to use online bioinformatics resources to solve biological problems related to their ongoing research projects in molecular biology. After a brief introduction to the background of the course, detailed information about the teaching strategies of the course is outlined in the ‘How to teach’ section. The contents of the course are briefly described in the ‘What to teach’ section with some real examples. The author wishes to share his teaching experiences and the online teaching materials with colleagues working in bioinformatics education both in local and international universities. PMID:24008274
Moon, Ji-Hoi; Lee, Jae-Hyung
The human oral cavity contains a highly personalized microbiome essential to maintaining health, but capable of causing oral and systemic diseases. Thus, an in-depth definition of "healthy oral microbiome" is critical to understanding variations in disease states from preclinical conditions, and disease onset through progressive states of disease. With rapid advances in DNA sequencing and analytical technologies, population-based studies have documented the range and diversity of both taxonomic compositions and functional potentials observed in the oral microbiome in healthy individuals. Besides factors specific to the host, such as age and race/ethnicity, environmental factors also appear to contribute to the variability of the healthy oral microbiome. Here, we review bioinformatic techniques for metagenomic datasets, including their strengths and limitations. In addition, we summarize the interpersonal and intrapersonal diversity of the oral microbiome, taking into consideration the recent large-scale and longitudinal studies, including the Human Microbiome Project. [BMB Reports 2016; 49(12): 662-670].
Huynh, Tien; Rigoutsos, Isidore; Parida, Laxmi; Platt, Daniel; Shibuya, Tetsuo
We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/.
Moon, Ji-Hoi; Lee, Jae-Hyung
The human oral cavity contains a highly personalized microbiome essential to maintaining health, but capable of causing oral and systemic diseases. Thus, an in-depth definition of “healthy oral microbiome” is critical to understanding variations in disease states from preclinical conditions, and disease onset through progressive states of disease. With rapid advances in DNA sequencing and analytical technologies, population-based studies have documented the range and diversity of both taxonomic compositions and functional potentials observed in the oral microbiome in healthy individuals. Besides factors specific to the host, such as age and race/ethnicity, environmental factors also appear to contribute to the variability of the healthy oral microbiome. Here, we review bioinformatic techniques for metagenomic datasets, including their strengths and limitations. In addition, we summarize the interpersonal and intrapersonal diversity of the oral microbiome, taking into consideration the recent large-scale and longitudinal studies, including the Human Microbiome Project. PMID:27697111
Field, E. I.; Johnson, S. E.
Implementation is made of the three-dimensional family of linear, quadratic and cubic isoparametric solid elements into the NASA Structural Analysis program, NASTRAN. This work included program development, installation, testing, and documentation. The addition of these elements to NASTRAN provides a significant increase in modeling capability particularly for structures requiring specification of temperatures, material properties, displacements, and stresses which vary throughout each individual element. Complete program documentation is presented in the form of new sections and updates for direct insertion to the three NASTRAN manuals. The results of demonstration test problems are summarized. Excellent results are obtained with the isoparametric elements for static, normal mode, and buckling analyses.
Sultanov, Albert H.; Gayfulin, Renat R.; Vinogradova, Irina L.
Fiber optic telecommunication systems with duplex data transmission over a single fiber require reflection minimization. Moreover, reflections may be so high that they deactivate the system through misoperation of the conventional alarm, and the system cannot automatically adjudge the collision, so manual operator control is required. In this paper we propose a technical solution to this problem based on an additional analysis subsystem, implemented on the installed Ufa-city fiber optic CTV system "Crystal". Experience from its maintenance and results of investigations of the fault tolerance parameters are presented.
Mangan, Mary E.; Perreault-Micale, Cynthia; Lathe, Scott; Sirohi, Neeraj; Lathe, Warren C.
The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education but would benefit from more informal educational sources of learning. Effective bioinformatics education formats will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Informal sources of bioinformatics education that are effective are available, and will be explored in this review. PMID:20798181
The field of bioinformatics has allowed the interpretation of massive amounts of biological data, ushering in the era of 'omics' to biomedical research. Its potential impact on pharmacology research is enormous and it has shown some emerging successes. A full realization of this potential, however, requires standardized data annotation for large health record databases and molecular data resources. Improved standardization will further stimulate the development of system pharmacology models, using translational bioinformatics methods. This new translational bioinformatics paradigm is highly complementary to current pharmacological research fields, such as personalized medicine, pharmacoepidemiology and drug discovery. In this review, I illustrate the application of translational bioinformatics to research in numerous pharmacology subdisciplines.
Park, Sang Hyun; Jeon, Hyeong Kyu; Kim, Jin Bong
Most of the diphyllobothriid tapeworms isolated from human samples in the Republic of Korea (= Korea) have been identified as Diphyllobothrium nihonkaiense by genetic analysis. This paper reports confirmation of D. nihonkaiense infections in 4 additional human samples obtained between 1995 and 2014, which were analyzed at the Department of Parasitology, Hallym University College of Medicine, Korea. Analysis of the mitochondrial cytochrome c oxidase 1 (cox1) gene revealed 98.5-99.5% similarity with a reference D. nihonkaiense sequence in GenBank. The present report adds 4 cases of D. nihonkaiense infection to the literature, indicating that the dominant diphyllobothriid tapeworm species in Korea is D. nihonkaiense rather than D. latum. PMID:25748716
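A similarity figure such as the 98.5-99.5% quoted for the cox1 gene is commonly a percent identity over aligned positions. The sketch below shows one simple way to compute it for two pre-aligned sequences; it assumes the alignment has already been done, ignores gapped columns, and uses short made-up sequences, so it is not necessarily the procedure used in the paper.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # columns with a gap in either sequence are skipped
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Made-up 10-base fragments differing at one position (not real cox1 data).
sample = "ATGCATGCAT"
reference = "ATGCATGTAT"
identity = percent_identity(sample, reference)
print(f"{identity:.1f}% identity")
```

Other conventions (counting gaps as mismatches, or dividing by the shorter sequence length) give slightly different percentages, so the definition should be stated alongside any reported value.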
Curtis, Andrew; Li, Bin; Marx, Brian D; Mills, Jacqueline W; Pine, John
This paper analyses structural and personal exposure to Hurricane Katrina. Structural exposure is measured by flood height and building damage; personal exposure is measured by the locations of 911 calls made during the response. Using these variables, this paper characterises the geography of exposure and also demonstrates the utility of a robust analytical approach in understanding health-related challenges to disadvantaged populations during recovery. Analysis is conducted using a contemporary statistical approach, a multiple additive regression tree (MART), which displays considerable improvement over traditional regression analysis. With MART, the improvement in R-squared over standard multiple linear regression ranges from about 62 to more than 100 per cent. The most revealing finding is the modelled verification that African Americans experienced disproportionate exposure in both structural and personal contexts. Given the impact of exposure on health outcomes, this finding has implications for understanding the long-term health challenges facing this population.
Giugno, Rosalba; Pulvirenti, Alfredo
Advanced research requires intensive interaction among a multitude of actors, often possessing different expertise and usually working at a distance from each other. The field of collaborative research aims to establish suitable models and technologies to properly support these interactions. In this article, we first present the reasons for an interest of Bioinformatics in this context by also suggesting some research domains that could benefit from collaborative research. We then review the principles and some of the most relevant applications of social networking, with a special attention to networks supporting scientific collaboration, by also highlighting some critical issues, such as identification of users and standardization of formats. We then introduce some systems for collaborative document creation, including wiki systems and tools for ontology development, and review some of the most interesting biological wikis. We also review the principles of Collaborative Development Environments for software and show some examples in Bioinformatics. Finally, we present the principles and some examples of Learning Management Systems. In conclusion, we try to devise some of the goals to be achieved in the short term for the exploitation of these technologies. PMID:21984743
Wang, Lin; Qu, Moying; Chen, Yao; Zhou, Yaxiong; Wan, Zhi
Objectives: We performed a meta-analysis to explore the effects of adding statins to standard treatment in adult patients with pulmonary hypertension (PH). Methods: A systematic search up to December 2015 of Medline, EMBASE, the Cochrane Database of Systematic Reviews and the Cochrane Central Register of Controlled Trials was performed to identify randomized controlled trials of PH patients treated with statins. Results: Five studies involving 425 patients were included in this meta-analysis. The results of our analysis showed that statins did not significantly increase 6-minute walking distance (6MWD, mean difference [MD] = -0.33 [CI: -18.25 to 17.59]) or decrease the BORG dyspnea score (MD = -0.72 [CI: -2.28 to 0.85]), the clinical worsening risk (11% in statins vs. 10.1% in controls, Risk ratio = 1.06 [CI: 0.61, 1.83]), or the systolic pulmonary arterial pressure (SPAP) (MD = -0.72 [CI: -2.28 to 0.85]). Subgroup analysis for PH due to COPD or non-COPD also showed no significant effect. Conclusions: Statins have no additional beneficial effect over standard therapy for PH, but the results from the subgroup of PH due to COPD seem intriguing, and further study with a larger sample size and longer follow-up is suggested. PMID:27992469
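For readers unfamiliar with the risk ratio reported above, the following sketch computes a risk ratio and an approximate 95% confidence interval for a single two-arm comparison using the standard log-scale (Katz) method. It is only an illustration: the counts are hypothetical, and a meta-analysis pools several trials with study weights, which this single-table calculation does not do.

```python
import math


def risk_ratio(events_treated, n_treated, events_control, n_control, z=1.96):
    """Risk ratio with an approximate confidence interval (log method).

    Returns (rr, lower, upper). Requires at least one event in each arm.
    """
    if events_treated <= 0 or events_control <= 0:
        raise ValueError("the log method needs at least one event per arm")
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    rr = risk_treated / risk_control
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_treated - 1 / n_treated + 1 / events_control - 1 / n_control
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 10 of 100 patients worsen in each arm.
rr, lower, upper = risk_ratio(10, 100, 10, 100)
print(f"RR = {rr:.2f} [CI: {lower:.2f}, {upper:.2f}]")
```

An interval that contains 1, as in this toy case and in the abstract's reported interval, indicates no statistically significant difference in risk between the arms.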
Zhu, Liang; Zhao, Hui; Sun, Jianguo; Leisenring, Wendy; Robison, Leslie L
Event-history studies of recurrent events are often conducted in fields such as demography, epidemiology, medicine, and social sciences (Cook and Lawless, 2007, The Statistical Analysis of Recurrent Events. New York: Springer-Verlag; Zhao et al., 2011, Test 20, 1-42). For such analysis, two types of data have been extensively investigated: recurrent-event data and panel-count data. However, in practice, one may face a third type of data, mixed recurrent-event and panel-count data or mixed event-history data. Such data occur if some study subjects are monitored or observed continuously and thus provide recurrent-event data, while the others are observed only at discrete times and hence give only panel-count data. A more general situation is that each subject is observed continuously over certain time periods but only at discrete times over other time periods. There exists little literature on the analysis of such mixed data except that published by Zhu et al. (2013, Statistics in Medicine 32, 1954-1963). In this article, we consider the regression analysis of mixed data using the additive rate model and develop some estimating equation-based approaches to estimate the regression parameters of interest. Both finite sample and asymptotic properties of the resulting estimators are established, and the numerical studies suggest that the proposed methodology works well for practical situations. The approach is applied to a Childhood Cancer Survivor Study that motivated this study.
Grady, Joseph E.; Haller, William J.; Poinsatte, Philip E.; Halbig, Michael C.; Schnulo, Sydney L.; Singh, Mrityunjay; Weir, Don; Wali, Natalie; Vinup, Michael; Jones, Michael G.; Patterson, Clark; Santelle, Tom; Mehl, Jeremy
The research and development activities reported in this publication were carried out under a NASA Aeronautics Research Institute (NARI) funded project entitled "A Fully Nonmetallic Gas Turbine Engine Enabled by Additive Manufacturing." The objective of the project was to evaluate emerging materials and manufacturing technologies that will enable fully nonmetallic gas turbine engines. The results of the activities are described in a three-part report. The first part of the report contains the data and analysis of engine system trade studies, which were carried out to estimate the reduction in engine emissions and fuel burn enabled by advanced materials and manufacturing processes. A number of key engine components were identified in which advanced materials and additive manufacturing processes would provide the most significant benefits to engine operation. The technical scope of activities included an assessment of the feasibility of using additive manufacturing technologies to fabricate gas turbine engine components from polymer and ceramic matrix composites, which was accomplished by fabricating prototype engine components and testing them in simulated engine operating conditions. The manufacturing process parameters were developed and optimized for polymer and ceramic composites (described in detail in the second and third parts of the report). A number of prototype components (inlet guide vane (IGV), acoustic liners, engine access door) were additively manufactured using high temperature polymer materials. Ceramic matrix composite components included turbine nozzle components. In addition, IGVs and acoustic liners were tested in simulated engine conditions in test rigs. The test results are reported and discussed in detail.
Despite their diverse pharmacological effects, polyphenols perform poorly as drugs, which has traditionally been ascribed to their low bioavailability. However, Baell and co-workers recently proposed that the redox potential of polyphenols also plays an important role in this, because redox reactions cause promiscuous actions on various protein targets and thus produce non-specific pharmacological effects. To investigate whether redox reactivity is a critical factor in polyphenol promiscuity, we performed a chemical bioinformatics analysis of the structure-activity relationships of twenty polyphenols. It was found that the gene expression profiles of human cell lines induced by polyphenols were not correlated with the presence or absence of redox moieties in the polyphenols, but were significantly correlated with their molecular structures. Therefore, it is concluded that the promiscuous actions of polyphenols are likely to result from their inherent structural features rather than their redox potential.
Zhou, Yinhua; Datta, Saheli; Salter, Charlotte
The governments of China, India, and the United Kingdom are unanimous in their belief that bioinformatics should supply the link between basic life sciences research and its translation into health benefits for the population and the economy. Yet at the same time, as ambitious states vying for position in the future global bioeconomy they differ considerably in the strategies adopted in pursuit of this goal. At the heart of these differences lies the interaction between epistemic change within the scientific community itself and the apparatus of the state. Drawing on desk-based research and thirty-two interviews with scientists and policy makers in the three countries, this article analyzes the politics that shape this interaction. From this analysis emerges an understanding of the variable capacities of different kinds of states and political systems to work with science in harnessing the potential of new epistemic territories in global life sciences innovation. PMID:27546935
Williams, Jason J; Teal, Tracy K
In biology, a missing link connecting data generation and data-driven discovery is the training that prepares researchers to effectively manage and analyze data. National and international cyberinfrastructure along with evolving private sector resources place biologists and students within reach of the tools needed for data-intensive biology, but training is still required to make effective use of them. In this concept paper, we review a number of opportunities and challenges that can inform the creation of a national bioinformatics training infrastructure capable of servicing the large number of emerging and existing life scientists. While college curricula are slower to adapt, grassroots startup-spirited organizations, such as Software and Data Carpentry, have made impressive inroads in training on the best practices of software use, development, and data analysis. Given the transformative potential of biology and medicine as full-fledged data sciences, more support is needed to organize, amplify, and assess these efforts and their impacts.
Poirion, Olivier B.; Zhu, Xun; Ching, Travers; Garmire, Lana
The emerging single-cell RNA-Seq (scRNA-Seq) technology holds the promise to revolutionize our understanding of diseases and associated biological processes at an unprecedented resolution. It opens the door to revealing intercellular heterogeneity and has been applied in a variety of settings, ranging from characterizing cancer cell subpopulations to elucidating tumor resistance mechanisms. In parallel with improving experimental protocols to deal with technological issues, deriving new analytical methods to interpret the complexity of scRNA-Seq data is just as challenging. Here, we review current state-of-the-art bioinformatics tools and methods for scRNA-Seq analysis, and address some critical analytical challenges that the field faces. PMID:27708664
Milan, Thomas; Wilhelm, Brian T
The development of next-generation sequencing technologies has had a profound impact on the field of cancer genomics. With the enormous quantities of data being generated from tumor samples, researchers have had to rapidly adapt tools or develop new ones to analyse the raw data to maximize its value. While much of this effort has been focused on improving specific algorithms to get faster and more precise results, the accessibility of the final data for the research community remains a significant problem. Large amounts of data exist but are not easily available to researchers who lack the resources and experience to download and reanalyze them. In this article, we focus on RNA-seq analysis in the context of cancer genomics and discuss the bioinformatic tools available to explore these data. We also highlight the importance of developing new and more intuitive tools to provide easier access to public data and discuss the related issues of data sharing and patient privacy.
Kisiela, Michael; Skarka, Adam; Ebert, Bettina; Maser, Edmund
Steroidal compounds including cholesterol, bile acids and steroid hormones play a central role in various physiological processes such as cell signaling, growth, reproduction, and energy homeostasis. Hydroxysteroid dehydrogenases (HSDs), which belong to the superfamily of short-chain dehydrogenases/reductases (SDR) or aldo-keto reductases (AKR), are important enzymes involved in steroid hormone metabolism. HSDs function as an enzymatic switch that controls the access of receptor-active steroids to nuclear hormone receptors and thereby mediates a fine-tuning of the steroid response. The aim of this study was the identification of classified functional HSDs and the bioinformatic annotation of these proteins in all completely sequenced bacterial genomes, followed by a phylogenetic analysis. For the bioinformatic annotation we constructed specific hidden Markov models in an iterative approach to provide a reliable identification for the specific catalytic groups of HSDs. Here, we show a detailed phylogenetic analysis of 3α-, 7α-, 12α-HSDs and two further functionally related enzymes (3-ketosteroid-Δ(1)-dehydrogenase, 3-ketosteroid-Δ(4)(5α)-dehydrogenase) from the superfamily of SDRs. For some bacteria that have been previously reported to possess a specific HSD activity, we could annotate the corresponding HSD protein. The dominant phyla identified as expressing HSDs were Actinobacteria, Proteobacteria, and Firmicutes. Moreover, some evolutionarily more ancient microorganisms (e.g., Cyanobacteria and Euryarchaeota) were found as well. A large number of HSD-expressing bacteria constitute the normal human gastro-intestinal flora. Another group of bacteria was originally isolated from natural habitats like seawater, soil, marine and permafrost sediments. These bacteria include polycyclic aromatic hydrocarbon-degrading species such as Pseudomonas, Burkholderia and Rhodococcus. In conclusion, HSDs are found in a wide variety of microorganisms including
de la Calle, Guillermo; García-Remesal, Miguel; Chiesa, Stefano; de la Iglesia, Diana; Maojo, Victor
Background The rapid evolution of Internet technologies and the collaborative approaches that dominate the field have stimulated the development of numerous bioinformatics resources. To address this new framework, several initiatives have tried to organize these services and resources. In this paper, we present the BioInformatics Resource Inventory (BIRI), a new approach for automatically discovering and indexing available public bioinformatics resources using information extracted from the scientific literature. The index generated can be automatically updated by adding additional manuscripts describing new resources. We have developed web services and applications to test and validate our approach. It has not been designed to replace current indexes but to extend their capabilities with richer functionalities. Results We developed a web service to provide a set of high-level query primitives to access the index. The web service can be used by third-party web services or web-based applications. To test the web service, we created a pilot web application to access a preliminary knowledge base of resources. We tested our tool using an initial set of 400 abstracts. Almost 90% of the resources described in the abstracts were correctly classified. More than 500 descriptions of functionalities were extracted. Conclusion These experiments suggest the feasibility of our approach for automatically discovering and indexing current and future bioinformatics resources. Given the domain-independent characteristics of this tool, it is currently being applied by the authors in other areas, such as medical nanoinformatics. BIRI is available at . PMID:19811635
Probabilistic topic models have been developed for applications in various domains such as text mining, information retrieval, computer vision, and bioinformatics. In this thesis, we focus on developing novel probabilistic topic models for image mining and bioinformatics studies. Specifically, a probabilistic topic-connection (PTC) model…
Howard, David R.; Miskowski, Jennifer A.; Grunwald, Sandra K.; Abler, Michael L.
At the University of Wisconsin-La Crosse, we have undertaken a program to integrate the study of bioinformatics across the undergraduate life science curricula. Our efforts have included incorporating bioinformatics exercises into courses in the biology, microbiology, and chemistry departments, as well as coordinating the efforts of faculty within…
Ramlo, Susan E.; McConnell, David; Duan, Zhong-Hui; Moore, Francisco B.
Faculty at a Midwestern metropolitan public university recently developed a course on bioinformatics that emphasized collaboration and inquiry. Bioinformatics, essentially the application of computational tools to biological data, is inherently interdisciplinary. Thus part of the challenge of creating this course was serving the needs and…
Jungck, John R; Donovan, Samuel S; Weisstein, Anton E; Khiripet, Noppadon; Everse, Stephen J
Bioinformatics is central to biology education in the 21st century. With the generation of terabytes of data per day, the application of computer-based tools to stored and distributed data is fundamentally changing research and its application to problems in medicine, agriculture, conservation and forensics. In light of this 'information revolution,' undergraduate biology curricula must be redesigned to prepare the next generation of informed citizens as well as those who will pursue careers in the life sciences. The BEDROCK initiative (Bioinformatics Education Dissemination: Reaching Out, Connecting and Knitting together) has fostered an international community of bioinformatics educators. The initiative's goals are to: (i) Identify and support faculty who can take leadership roles in bioinformatics education; (ii) Highlight and distribute innovative approaches to incorporating evolutionary bioinformatics data and techniques throughout undergraduate education; (iii) Establish mechanisms for the broad dissemination of bioinformatics resource materials and teaching models; (iv) Emphasize phylogenetic thinking and problem solving; and (v) Develop and publish new software tools to help students develop and test evolutionary hypotheses. Since 2002, BEDROCK has offered more than 50 faculty workshops around the world, published many resources and supported an environment for developing and sharing bioinformatics education approaches. The BEDROCK initiative builds on the established pedagogical philosophy and academic community of the BioQUEST Curriculum Consortium to assemble the diverse intellectual and human resources required to sustain an international reform effort in undergraduate bioinformatics education.
Stevens, R; Miller, C
Bioinformaticians seeking to provide services to working biologists are faced with the twin problems of distribution and diversity of resources. Bioinformatics databases are distributed around the world and exist in many kinds of storage forms, platforms and access paradigms. To provide adequate services to biologists, these distributed and diverse resources have to interoperate seamlessly within single applications. The Common Object Request Broker Architecture (CORBA) offers one technical solution to these problems. The key component of CORBA is its use of object orientation as an intermediate form to translate between different representations. This paper concentrates on an explanation of object orientation and how it can be used to overcome the problems of distribution and diversity by describing the interfaces between objects.
Mochida, Keiichi; Shinozaki, Kazuo
Recent remarkable innovations in platforms for omics-based research and application development provide crucial resources to promote research in model and applied plant species. A combinatorial approach using multiple omics platforms and integration of their outcomes is now an effective strategy for clarifying molecular systems integral to improving plant productivity. Furthermore, promotion of comparative genomics among model and applied plants allows us to grasp the biological properties of each species and to accelerate gene discovery and functional analyses of genes. Bioinformatics platforms and their associated databases are also essential for the effective design of approaches making the best use of genomic resources, including resource integration. We review recent advances in research platforms and resources in plant omics together with related databases and advances in technology. PMID:20208064
Moore, Alyssa C.; Winkjer, Jonathan S.; Tseng, Tsai-Tien
Biomarker identification is often associated with the diagnosis and evaluation of various diseases. Recently, the role of microRNA (miRNA) has been implicated in the development of diseases, particularly cancer. With the advent of next-generation sequencing, the amount of data on miRNA has increased tremendously in the last decade, requiring new bioinformatics approaches for processing and storing new information. New strategies have been developed in mining these sequencing datasets to allow better understanding toward the actions of miRNAs. As a result, many databases have also been established to disseminate these findings. This review focuses on several curated databases of miRNAs and their targets from both predicted and validated sources. PMID:26819547
Gregory, Philip C.; Lawler, Samantha M.; Gladman, Brett
A re-analysis of Gliese 667C HARPS precision radial velocity data was carried out with a Bayesian multi-planet Kepler periodogram (from 0 to 7 planets) based on a fusion Markov chain Monte Carlo algorithm. The most probable number of signals detected is six with a Bayesian false alarm probability of 0.012. The residuals were shown to be consistent with white noise. The six signals detected include two previously reported with periods of 7.198 (b) and 28.14 (c) days, plus additional periods of 30.82, 38.82, 53.22, and 91.3 days. The existence of these Keplerian-like signals suggests the possibility of additional planets in the habitable zone of Gl 667C, although some of the signals could be artifacts arising from the sampling or stellar surface activity. N-body orbital integrations are being undertaken to determine which of these signals are consistent with a stable planetary system. Preliminary results demonstrate that four of the signals, with periods of 7.2, 28.1, 38.8, and 91 d, are consistent with a stable 4 planet system on time scales of 10^7 yr. The M sin i values are ~5.5, 4.4, 1.9, and 4.7 M⊕, respectively.
Lunder, Sonya; Woodruff, Tracey J; Axelrad, Daniel A
There are 188 air toxics listed as hazardous air pollutants (HAPs) in the Clean Air Act (CAA), based on their potential to adversely impact public health. This paper presents several analyses performed to screen potential candidates for addition to the HAPs list. We analyzed 1086 HAPs and potential HAPs, including chemicals regulated by the state of California or with emissions reported to the Toxics Release Inventory (TRI). HAPs and potential HAPs were ranked by their emissions to air, and by toxicity-weighted (tox-wtd) emissions for cancer and noncancer, using emissions information from the TRI and toxicity information from state and federal agencies. Separate consideration was given for persistent, bioaccumulative toxins (PBTs), reproductive or developmental toxins, and chemicals under evaluation for regulation as toxic air contaminants in California. Forty-four pollutants were identified as candidate HAPs based on three ranking analyses and whether they were a PBT or a reproductive or developmental toxin. Of these, nine qualified in two or three different rankings (ammonia [NH3], copper [Cu], Cu compounds, nitric acid [HNO3], N-methyl-2-pyrrolidone, sulfuric acid [H2SO4], vanadium [V] compounds, zinc [Zn], and Zn compounds). This analysis suggests further evaluation of several pollutants for possible addition to the CAA list of HAPs.
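The ranking step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: the chemical names, emission figures, and toxicity weights below are hypothetical placeholders, and the helper names `top_k` and `multi_ranking_candidates` are invented for this sketch. It ranks chemicals by raw emissions and by toxicity-weighted emissions (emissions multiplied by a weight), then keeps those that appear near the top of at least two rankings.

```python
from collections import Counter

# Hypothetical records: none of these values come from the TRI or any agency.
RECORDS = [
    {"name": "chem_A", "emissions": 1000.0, "cancer_weight": 0.0, "noncancer_weight": 0.5},
    {"name": "chem_B", "emissions": 50.0, "cancer_weight": 100.0, "noncancer_weight": 2.0},
    {"name": "chem_C", "emissions": 400.0, "cancer_weight": 1.0, "noncancer_weight": 10.0},
]

# Three ranking criteria: raw emissions and two toxicity-weighted variants.
RANKINGS = {
    "emissions": lambda r: r["emissions"],
    "cancer_tox_wtd": lambda r: r["emissions"] * r["cancer_weight"],
    "noncancer_tox_wtd": lambda r: r["emissions"] * r["noncancer_weight"],
}


def top_k(records, key, k):
    """Return the names of the k highest-scoring records under one criterion."""
    ranked = sorted(records, key=key, reverse=True)
    return [r["name"] for r in ranked[:k]]


def multi_ranking_candidates(records, k, min_rankings=2):
    """Names that reach the top k in at least `min_rankings` of the rankings."""
    counts = Counter()
    for key in RANKINGS.values():
        counts.update(top_k(records, key, k))
    return sorted(name for name, n in counts.items() if n >= min_rankings)


if __name__ == "__main__":
    for label, key in RANKINGS.items():
        print(label, top_k(RECORDS, key, 2))
    print("in two or more rankings:", multi_ranking_candidates(RECORDS, 2))
```

A low-volume but highly toxic chemical (here `chem_B`) can lead a toxicity-weighted ranking while sitting last by raw emissions, which is why the two kinds of ranking are kept separate.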
Seifert, Jana; Herbst, Florian-Alexander; Halkjaer Nielsen, Per; Planes, Francisco J; Jehmlich, Nico; Ferrer, Manuel; von Bergen, Martin
Metaproteomics of microbial communities promises to add functional information to the blueprint of genes derived from metagenomics. From the very beginning, the achievements and developments in metaproteomics have been closely interlinked with metagenomics. In addition, the evaluation, visualization, and interpretation of metaproteome data have demanded developments in bioinformatics. This review gives an overview of recent strategies that use genomic data, either from public databases or from organism-specific genomes/metagenomes, to increase the number of proteins identified by mass spectrometric measurements. We review different published metaproteogenomic approaches with respect to the MS pipeline and the protein identification workflow used. Furthermore, different approaches to data visualization and strategies for phylogenetic interpretation of metaproteome data are discussed, as well as approaches for functional mapping of the results to the investigated biological systems. This information will ultimately allow a comprehensive analysis of interactions and interdependencies within microbial communities.
George Priya Doss, C; Nagasundaram, N; Tanwar, Himani
Functional alteration in SMAD proteins leads to dysregulation of their mechanism, which raises the risk of serious diseases such as fibrosis, cancer, and juvenile polyposis. Studying single nucleotide polymorphisms (SNPs) in SMAD genes helps explain the malfunction of these proteins. In this study, we focused on identifying deleterious nsSNPs in SMAD genes at both the structural and functional levels using publicly available bioinformatics tools: SIFT, PolyPhen, SNPs&GO, I-Mutant 3.0, MUpro and PANTHER. Structure analysis was carried out for the major mutations occurring in the native proteins coded by SMAD genes (R358W, K306S, R310G, S433R and R361C). SRide was used to check the stability of the native and mutant modelled proteins. In addition, we used MAPPER to identify SNPs present in transcription factor binding sites. These findings demonstrate that in silico approaches can be used efficiently to identify potential candidate SNPs in large-scale analysis.
Ulrich, Andrea; Wichser, Adrian
Fuel additives used in particle traps have to comply with environmental directives and should not support the formation of additional toxic substances. The emission of metal additives from diesel engines with downstream particle traps has been studied. Aspects of the optimisation of the sampling procedure, sample preparation and analysis are described. Exemplary results in the form of a mass balance calculation are presented. The results demonstrate the high retention rate of the studied filter system but also possible deposition of additive metals in the engine.
This study examines students' procedural and conceptual achievement in fraction addition in England and Taiwan. A total of 1209 participants (561 British students and 648 Taiwanese students) aged 12 and 13 were recruited from England and Taiwan to take part in the study. A quantitative design based on a self-designed written test is central to the methodology. The test has two major parts: the concept part and the skill part. The former concerns students' conceptual knowledge of fraction addition and the latter concerns students' procedural competence when adding fractions. There were statistically significant differences in both the concept and skill parts between the British and Taiwanese groups, with the latter scoring higher. The analysis of the students' responses to the skill section indicates that Taiwanese students outperform their British peers procedurally because most of the former are able to apply algorithms for adding fractions far more successfully than the latter. Earlier, Hart reported that around 30% of the British students in their study used an erroneous strategy (adding tops and bottoms, for example, 2/3 + 1/7 = 3/10) while adding fractions. This study also finds that nearly the same percentage of the British group still used this erroneous strategy to add fractions as Hart found in 1981. The study also provides evidence that students' understanding of fractions is confused and incomplete, even among those who are able to perform operations successfully. More research is needed to help students make sense of the operations and eventually attain computational competence with meaningful grounding in the domain of fractions.
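To make the erroneous strategy concrete, the sketch below contrasts "adding tops and bottoms" with the standard common-denominator algorithm on the example quoted above. It is an illustration only, not part of the study's test instrument; the function names are invented for this sketch.

```python
from fractions import Fraction


def add_tops_and_bottoms(a, b):
    """Erroneous strategy: add numerators and denominators separately."""
    return Fraction(a.numerator + b.numerator, a.denominator + b.denominator)


def add_with_common_denominator(a, b):
    """Standard algorithm: rewrite both fractions over a common denominator."""
    common = a.denominator * b.denominator
    numerator = a.numerator * b.denominator + b.numerator * a.denominator
    return Fraction(numerator, common)  # Fraction reduces to lowest terms


if __name__ == "__main__":
    a, b = Fraction(2, 3), Fraction(1, 7)
    print("tops and bottoms:", add_tops_and_bottoms(a, b))           # 3/10
    print("common denominator:", add_with_common_denominator(a, b))  # 17/21
```

The erroneous result 3/10 is smaller than 2/3, one of the two addends, which is a quick sanity check that a student with sound conceptual understanding could apply.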
Argyropoulos, Christos; Unruh, Mark L.
Background Randomized Controlled Trials almost invariably utilize the hazard ratio (HR) calculated with a Cox proportional hazard model as a treatment efficacy measure. Despite the widespread adoption of HRs, these provide a limited understanding of the treatment effect and may even provide a biased estimate when the assumption of proportional hazards in the Cox model is not verified by the trial data. Additional treatment effect measures on the survival probability or the time scale may be used to supplement HRs but a framework for the simultaneous generation of these measures is lacking. Methods By splitting follow-up time at the nodes of a Gauss Lobatto numerical quadrature rule, techniques for Poisson Generalized Additive Models (PGAM) can be adopted for flexible hazard modeling. Straightforward simulation post-estimation transforms PGAM estimates for the log hazard into estimates of the survival function. These in turn were used to calculate relative and absolute risks or even differences in restricted mean survival time between treatment arms. We illustrate our approach with extensive simulations and in two trials: IPASS (in which the proportionality of hazards was violated) and HEMO, a long-duration study conducted under evolving standards of care on a heterogeneous patient population. Findings PGAM can generate estimates of the survival function and the hazard ratio that are essentially identical to those obtained by Kaplan Meier curve analysis and the Cox model. PGAMs can simultaneously provide multiple measures of treatment efficacy after a single data pass. Furthermore, PGAMs supported unadjusted (overall treatment effect) as well as subgroup and adjusted analyses, while incorporating multiple time scales and accounting for non-proportional hazards in survival data. Conclusions By augmenting the HR conventionally reported, PGAMs have the potential to support the inferential goals of multiple stakeholders involved in the evaluation and appraisal of clinical trial
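The link between a log-hazard estimate and the survival function, S(t) = exp(-∫ h(u) du) over (0, t), can be sketched with a fixed Gauss-Lobatto rule. This is a minimal illustration under stated assumptions, not the authors' implementation: no GAM is fitted, the log-hazard function is a hypothetical closed form standing in for a model estimate, a 5-point rule is chosen arbitrarily, and the helper names are invented for this sketch.

```python
import math

# Standard 5-point Gauss-Lobatto rule on [-1, 1] (exact for polynomials up to degree 7).
NODES = (-1.0, -math.sqrt(3.0 / 7.0), 0.0, math.sqrt(3.0 / 7.0), 1.0)
WEIGHTS = (1.0 / 10.0, 49.0 / 90.0, 32.0 / 45.0, 49.0 / 90.0, 1.0 / 10.0)


def split_follow_up(t):
    """Map the rule to (0, t): return (time, weight) pairs for follow-up time t."""
    half = 0.5 * t
    return [(half * (x + 1.0), half * w) for x, w in zip(NODES, WEIGHTS)]


def survival(log_hazard, t):
    """Approximate S(t) = exp(-cumulative hazard) by quadrature of the hazard."""
    cumulative_hazard = sum(w * math.exp(log_hazard(u)) for u, w in split_follow_up(t))
    return math.exp(-cumulative_hazard)


def example_log_hazard(u):
    """Hypothetical log hazard: h(u) = 0.1 + 0.2 * u, a stand-in for a fitted model."""
    return math.log(0.1 + 0.2 * u)


if __name__ == "__main__":
    # For this hazard the cumulative hazard at t = 2 is 0.1*2 + 0.1*2**2 = 0.6.
    print(survival(example_log_hazard, 2.0), math.exp(-0.6))
```

Because the endpoints of the interval are themselves nodes of a Lobatto rule, the hazard is evaluated at the start and at the end of follow-up, which is one plausible reason such a rule suits splitting follow-up time.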
Guo, Mei; Rupe, Mary A; Yang, Xiaofeng; Crasta, Oswald; Zinselmeier, Christopher; Smith, Oscar S; Bowen, Ben
Heterosis, or hybrid vigor, has been widely exploited in plant breeding for many decades, but the molecular mechanisms underlying the phenomenon remain unknown. In this study, we applied genome-wide transcript profiling to gain a global picture of the ways in which a large proportion of genes are expressed in the immature ear tissues of a series of 16 maize hybrids that vary in their degree of heterosis. Key observations include: (1) the proportion of allelic additively expressed genes is positively associated with hybrid yield and heterosis; (2) the proportion of genes that exhibit a bias towards the expression level of the paternal parent is negatively correlated with hybrid yield and heterosis; and (3) there is no correlation between the over- or under-expression of specific genes in maize hybrids with either yield or heterosis. The relationship of the expression patterns with hybrid performance is substantiated by analysis of a genetically improved modern hybrid (Pioneer hybrid 3394) versus a less improved older hybrid (Pioneer hybrid 3306) grown at different levels of plant density stress. The proportion of allelic additively expressed genes is positively associated with the modern high yielding hybrid, heterosis and high yielding environments, whereas the converse is true for the paternally biased gene expression. The dynamic changes of gene expression in hybrids responding to genotype and environment may result from differential regulation of the two parental alleles. Our findings suggest that differential allele regulation may play an important role in hybrid yield or heterosis, and provide a new insight to the molecular understanding of the underlying mechanisms of heterosis.
Zhang, Jun; Wang, Wenhai; Yang, Fengying; Zhou, Xinwen; Jin, Hong; Yang, Peng-yuan
The human hepatoma 3B cell line was chosen as an experimental model for in vitro drug screening. The drugs included chlorophyllin and its derivatives such as fluo-chlorophyllin, sodium copper chlorophyllin, and sodium iron chlorophyllin. The 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyl-tetrazolium bromide (MTT) method was used in this study to obtain the primary screening results. The results showed that sodium iron chlorophyllin had the best LC(50) value. Proteomic analysis was then performed to further investigate the effect of adding sodium iron chlorophyllin to the Hep 3B cell line. The proteins identified from a total protein extract of Hep 3B before and after the drug addition were compared by two-dimensional gel electrophoresis. Then 32 three-fold differentially expressed proteins were successfully identified by MALDI-TOF-TOF-MS. Among those identified, 29 are unique proteins. These proteins include proliferating cell nuclear antigen (PCNA), T-complex protein, heterogeneous nuclear protein, nucleophosmin, heat shock protein A5 (HspA5) and peroxiredoxin. HspA5 is one of the proteins involved in protecting cultured cancer cells against stress-induced apoptosis through various mechanisms. Peroxiredoxin has an anti-oxidant function and is related to cell proliferation and signal transduction. It can protect other proteins from oxidation. Peroxiredoxin has a close relationship with cancer and could eventually become a disease biomarker. This might help to develop a novel treatment method for carcinoma.
Yu, Hui-Min; Luo, Hui; Shi, Yue; Sun, Xu-Dong; Shen, Zhong-Yao
Industrial biocatalysis is currently attracting much attention as a way to rebuild or replace traditional production processes for chemicals and drugs. One of the key focuses in industrial biocatalysis is the biocatalyst, which is usually a microbial enzyme. Recently, new bioinformatics technologies have played, and will continue to play, increasingly significant roles in research on industrial bioc