Science.gov

Sample records for personal genomics bioinformatics

  1. Bioinformatics for personal genome interpretation.

    PubMed

    Capriotti, Emidio; Nehrt, Nathan L; Kann, Maricel G; Bromberg, Yana

    2012-07-01

    An international consortium released the first draft sequence of the human genome 10 years ago. Although the analysis of this data has suggested the genetic underpinnings of many diseases, we have not yet been able to fully quantify the relationship between genotype and phenotype. Thus, a major current effort of the scientific community focuses on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. Many resources aim to identify and annotate the specific genes responsible for the observed phenotypes. Some of these use intra-species genetic variability as a means for better understanding this relationship. In addition, several online resources are now dedicated to collecting single nucleotide variants and other types of variants, and annotating their functional effects and associations with phenotypic traits. This information has enabled researchers to develop bioinformatics tools to analyze the rapidly increasing amount of newly extracted variation data and to predict the effect of uncharacterized variants. In this work, we review the most important developments in the field--the databases and bioinformatics tools that will be of utmost importance in our concerted effort to interpret the human variome.

  2. Bioinformatics Workflow for Clinical Whole Genome Sequencing at Partners HealthCare Personalized Medicine

    PubMed Central

    Tsai, Ellen A.; Shakbatyan, Rimma; Evans, Jason; Rossetti, Peter; Graham, Chet; Sharma, Himanshu; Lin, Chiao-Feng; Lebo, Matthew S.

    2016-01-01

    Effective implementation of precision medicine will be enhanced by a thorough understanding of each patient’s genetic composition to better treat his or her presenting symptoms or mitigate the onset of disease. This ideally includes the sequence information of a complete genome for each individual. At Partners HealthCare Personalized Medicine, we have developed a clinical process for whole genome sequencing (WGS) with application in both healthy individuals and those with disease. In this manuscript, we will describe our bioinformatics strategy to efficiently process and deliver genomic data to geneticists for clinical interpretation. We describe the handling of data from FASTQ to the final variant list for clinical review for the final report. We will also discuss our methodology for validating this workflow and the cost implications of running WGS. PMID:26927186

  4. Getting personalized cancer genome analysis into the clinic: the challenges in bioinformatics

    PubMed Central

    2012-01-01

    Progress in genomics has raised expectations in many fields, and particularly in personalized cancer research. The new technologies available make it possible to combine information about potential disease markers, altered function and accessible drug targets, which, coupled with pathological and medical information, will help produce more appropriate clinical decisions. The accessibility of such experimental techniques makes it all the more necessary to improve and adapt computational strategies to the new challenges. This review focuses on the critical issues associated with the standard pipeline, which includes: DNA sequencing analysis; analysis of mutations in coding regions; the study of genome rearrangements; extrapolating information on mutations to the functional and signaling level; and predicting the effects of therapies using mouse tumor models. We describe the possibilities, limitations and future challenges of current bioinformatics strategies for each of these issues. Furthermore, we emphasize the need for the collaboration between the bioinformaticians who implement the software and use the data resources, the computational biologists who develop the analytical methods, and the clinicians, the systems' end users and those ultimately responsible for taking medical decisions. Finally, the different steps in cancer genome analysis are illustrated through examples of applications in cancer genome analysis. PMID:22839973

  5. Genomics, molecular imaging, bioinformatics, and bio-nano-info integration are synergistic components of translational medicine and personalized healthcare research.

    PubMed

    Yang, Jack Y; Yang, Mary Qu; Arabnia, Hamid R; Deng, Youping

    2008-01-01

    Supported by the National Science Foundation (NSF), International Society of Intelligent Biological Medicine (ISIBM), International Journal of Computational Biology and Drug Design and International Journal of Functional Informatics and Personalized Medicine, IEEE 7th Bioinformatics and Bioengineering attracted more than 600 papers and 500 researchers and medical doctors. It was the only synergistic inter/multidisciplinary IEEE conference with 24 Keynote Lectures, 7 Tutorials, 5 Cutting-Edge Research Workshops and 32 Scientific Sessions including 11 Special Research Interest Sessions that were designed dynamically at Harvard in response to the current research trends and advances. The committee was very grateful for the IEEE Plenary Keynote Lectures given by: Dr. A. Keith Dunker (Indiana), Dr. Jun Liu (Harvard), Dr. Brian Athey (Michigan), Dr. Mark Borodovsky (Georgia Tech and President of ISIBM), Dr. Hamid Arabnia (Georgia and Vice-President of ISIBM), Dr. Ruzena Bajcsy (Berkeley and Member of United States National Academy of Engineering and Member of United States Institute of Medicine of the National Academies), Dr. Mary Yang (United States National Institutes of Health and Oak Ridge, DOE), Dr. Chih-Ming Ho (UCLA and Member of United States National Academy of Engineering and Academician of Academia Sinica), Dr. Andy Baxevanis (United States National Institutes of Health), Dr. Arif Ghafoor (Purdue), Dr. John Quackenbush (Harvard), Dr. Eric Jakobsson (UIUC), Dr. Vladimir Uversky (Indiana), Dr. Laura Elnitski (United States National Institutes of Health) and other world-class scientific leaders. The Harvard meeting was a large academic event fully sponsored by IEEE, both financially and academically. After a rigorous peer-review process, the committee selected 27 high-quality research papers from 600 submissions. The committee is grateful for contributions from keynote speakers Dr. Russ Altman (IEEE BIBM conference keynote lecturer on combining simulation and machine

  6. Bioinformatics Approach in Plant Genomic Research.

    PubMed

    Ong, Quang; Nguyen, Phuc; Thao, Nguyen Phuong; Le, Ly

    2016-08-01

    Advances in genomics technology have dramatically changed plant biology research. Plant biologists can now easily access enormous amounts of genomic data to study high-density genetic variation in plants at the molecular level. A thorough understanding of, and skill with, the bioinformatics tools used to manage and analyze these data is therefore essential in current plant genome research. Many plant genome databases have been established and continue to expand. Meanwhile, bioinformatics-based analytical methods are well developed in many areas of plant genomic research, including comparative genomic analysis, phylogenomics and evolutionary analysis, and genome-wide association studies. However, the constant need to upgrade computational infrastructure, such as high-capacity data storage and high-performance analysis software, remains a real challenge for plant genome research. This review focuses on the challenges and opportunities that bioinformatics knowledge and skills bring to plant scientists in the present plant genomics era, and on the critical future need for effective tools to translate knowledge from new sequencing data into enhanced plant productivity. PMID:27499685

  7. [Bioinformatics in Cancer Clinical Sequencing -- An Emerging Field of Cancer Personalized Medicine].

    PubMed

    Kato, Mamoru

    2016-04-01

    Thus far, bioinformatics has mostly been applied in basic science research. It was initially used to analyze protein sequences in unicellular organisms, aiding discoveries in basic biology. Following the completion of human genome sequencing, it has also facilitated numerous discoveries in basic medicine. Recently, several clinical applications of bioinformatics have been reported. Most relevantly, bioinformatics has been applied to clinical sequencing - an emerging field of personalized medicine, or precision medicine. In this review, I will introduce basic techniques of bioinformatics used in clinical sequencing, avoiding excessive technical details. I will also discuss future directions for data analysis using bioinformatics in the field of personalized medicine.

  8. Personalized medicine: challenges and opportunities for translational bioinformatics

    PubMed Central

    Overby, Casey Lynnette; Tarczy-Hornoch, Peter

    2013-01-01

    Personalized medicine can be defined broadly as a model of healthcare that is predictive, personalized, preventive and participatory. Two US President’s Council of Advisors on Science and Technology reports illustrate challenges in personalized medicine (in a 2008 report) and in use of health information technology (in a 2010 report). Translational bioinformatics is a field that can help address these challenges and is defined by the American Medical Informatics Association as “the development of storage, analytic and interpretive methods to optimize the transformation of increasing voluminous biomedical data into proactive, predictive, preventative and participatory health.” This article discusses barriers to implementing genomics applications and current progress toward overcoming barriers, describes lessons learned from early experiences of institutions engaged in personalized medicine and provides example areas for translational bioinformatics research inquiry. PMID:24039624

  9. Bioinformatics tools for analysing viral genomic data.

    PubMed

    Orton, R J; Gu, Q; Hughes, J; Maabar, M; Modha, S; Vattipally, S B; Wilkie, G S; Davison, A J

    2016-04-01

    The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing.

  10. Bioinformatics for analysis of poxvirus genomes.

    PubMed

    Da Silva, Melissa; Upton, Chris

    2012-01-01

    In recent years, there have been numerous unprecedented technological advances in the field of molecular biology; these include DNA sequencing, mass spectrometry of proteins, and microarray analysis of mRNA transcripts. Perhaps, however, it is the area of genomics, which has now generated the complete genome sequences of more than 100 poxviruses, that has had the greatest impact on the average virology researcher because the DNA sequence data is in constant use in many different ways by almost all molecular virologists. As this data resource grows, so does the importance of the availability of databases and software tools to enable the bench virologist to work with and make use of this (valuable/expensive) DNA sequence information. Thus, providing researchers with intuitive software to first select and reformat genomics data from large databases, second, to compare/analyze genomics data, and third, to view and interpret large and complex sets of results has become pivotal in enabling progress to be made in modern virology. This chapter is directed at the bench virologist and describes the software required for a number of common bioinformatics techniques that are useful for comparing and analyzing poxvirus genomes. In a number of examples, we also highlight the Viral Orthologous Clusters database system and integrated tools that we developed for the management and analysis of complete viral genomes.

  11. Pharmacogenetics and personal genomes

    PubMed Central

    Wagner, Michael J

    2010-01-01

    While pharmacogenetics - the correlation of genotype and response to medicines - currently has a small but measurable impact on the prescribing practice of clinicians, the advent of the 'personal genome' is likely to change this significantly. Advances in high-throughput technologies aimed at characterizing human genetic variation, including chip-based genotyping and next-generation sequencing, are poised to provide a flood of information that will affect both pharmacogenetic discovery and pharmacogenetic application in clinical practice. In order for this flood of information to not overwhelm both researchers and clinicians alike, a variety of new and expanded information management tools will be needed, including electronic medical records, bioinformatic algorithms for analyzing sequence data, information management systems for storing, retrieving and interpreting whole-genome sequence data, and pharmacogenetic decision tools for prescribers. PMID:20190862

  12. Genomics and Bioinformatics Resources for Crop Improvement

    PubMed Central

    Mochida, Keiichi; Shinozaki, Kazuo

    2010-01-01

    Recent remarkable innovations in platforms for omics-based research and application development provide crucial resources to promote research in model and applied plant species. A combinatorial approach using multiple omics platforms and integration of their outcomes is now an effective strategy for clarifying molecular systems integral to improving plant productivity. Furthermore, promotion of comparative genomics among model and applied plants allows us to grasp the biological properties of each species and to accelerate gene discovery and functional analyses of genes. Bioinformatics platforms and their associated databases are also essential for the effective design of approaches making the best use of genomic resources, including resource integration. We review recent advances in research platforms and resources in plant omics together with related databases and advances in technology. PMID:20208064

  13. 2K09 and thereafter : the coming era of integrative bioinformatics, systems biology and intelligent computing for functional genomics and personalized medicine research.

    PubMed

    Yang, Jack Y; Niemierko, Andrzej; Bajcsy, Ruzena; Xu, Dong; Athey, Brian D; Zhang, Aidong; Ersoy, Okan K; Li, Guo-Zheng; Borodovsky, Mark; Zhang, Joe C; Arabnia, Hamid R; Deng, Youping; Dunker, A Keith; Liu, Yunlong; Ghafoor, Arif

    2010-12-01

    Significant interest exists in establishing synergistic research in bioinformatics, systems biology and intelligent computing. Supported by the United States National Science Foundation (NSF), International Society of Intelligent Biological Medicine (http://www.ISIBM.org), International Journal of Computational Biology and Drug Design (IJCBDD) and International Journal of Functional Informatics and Personalized Medicine, the ISIBM International Joint Conferences on Bioinformatics, Systems Biology and Intelligent Computing (ISIBM IJCBS 2009) attracted more than 300 papers and 400 researchers and medical doctors world-wide. It was the only inter/multidisciplinary conference aimed to promote synergistic research and education in bioinformatics, systems biology and intelligent computing. The conference committee was very grateful for the valuable advice and suggestions from honorary chairs, steering committee members and scientific leaders including Dr. Michael S. Waterman (USC, Member of United States National Academy of Sciences), Dr. Chih-Ming Ho (UCLA, Member of United States National Academy of Engineering and Academician of Academia Sinica), Dr. Wing H. Wong (Stanford, Member of United States National Academy of Sciences), Dr. Ruzena Bajcsy (UC Berkeley, Member of United States National Academy of Engineering and Member of United States Institute of Medicine of the National Academies), Dr. Mary Qu Yang (United States National Institutes of Health and Oak Ridge, DOE), Dr. Andrzej Niemierko (Harvard), Dr. A. Keith Dunker (Indiana), Dr. Brian D. Athey (Michigan), Dr. Weida Tong (FDA, United States Department of Health and Human Services), Dr. Cathy H. Wu (Georgetown), Dr. Dong Xu (Missouri), Drs. Arif Ghafoor and Okan K Ersoy (Purdue), Dr. Mark Borodovsky (Georgia Tech, President of ISIBM), Dr. Hamid R. Arabnia (UGA, Vice-President of ISIBM), and other scientific leaders. The committee presented the 2009 ISIBM Outstanding Achievement Awards to Dr. Joydeep Ghosh (UT

  15. Bacterial bioinformatics: pathogenesis and the genome.

    PubMed

    Paine, Kelly; Flower, Darren R

    2002-07-01

    As the number of completed microbial genome sequences continues to grow, there is a pressing need for the exploitation of this wealth of data through a synergistic interaction between the well-established science of bacteriology and the emergent discipline of bioinformatics. Antibiotic resistance and pathogenicity in virulent bacteria have become an increasing problem, with even the strongest drugs useless against some species, such as multi-drug resistant Enterococcus faecium and Mycobacterium tuberculosis. The global spread of Human Immunodeficiency Virus (HIV) and Acquired Immune Deficiency Syndrome (AIDS) has contributed to the re-emergence of tuberculosis and the threat from new and emergent diseases. To address these problems, bacterial pathogenicity requires redefinition as Koch's postulates become obsolete. This review discusses how the use of bacterial genomic information, and the in silico tools available at present, may aid in determining the definition of a current pathogen. The combination of both fields should provide a rapid and efficient way of assisting in the future development of antimicrobial therapies. PMID:12125816

  16. Online Bioinformatics Tutorials | Office of Cancer Genomics

    Cancer.gov

    Bioinformatics is a scientific discipline that applies computer science and information technology to help understand biological processes. The NIH provides a list of free online bioinformatics tutorials, either generated by the NIH Library or other institutes, which includes introductory lectures and "how to" videos on using various tools.

  17. Genomics and bioinformatics resources for translational science in Rosaceae.

    PubMed

    Jung, Sook; Main, Dorrie

    2014-01-01

    Recent technological advances in biology promise unprecedented opportunities for rapid and sustainable advancement of crop quality. Following this trend, the Rosaceae research community continues to generate large amounts of genomic, genetic and breeding data. These include annotated whole genome sequences, transcriptome and expression data, proteomic and metabolomic data, genotypic and phenotypic data, and genetic and physical maps. Analysis, storage, integration and dissemination of these data using bioinformatics tools and databases are essential to provide utility of the data for basic, translational and applied research. This review discusses the currently available genomics and bioinformatics resources for the Rosaceae family.

  18. Design and bioinformatics analysis of genome-wide CLIP experiments

    PubMed Central

    Wang, Tao; Xiao, Guanghua; Chu, Yongjun; Zhang, Michael Q.; Corey, David R.; Xie, Yang

    2015-01-01

    The past decades have witnessed a surge of discoveries revealing RNA regulation as a central player in cellular processes. RNAs are regulated by RNA-binding proteins (RBPs) at all post-transcriptional stages, including splicing, transportation, stabilization and translation. Defects in the functions of these RBPs underlie a broad spectrum of human pathologies. Systematic identification of RBP functional targets is among the key biomedical research questions and provides a new direction for drug discovery. The advent of cross-linking immunoprecipitation coupled with high-throughput sequencing (genome-wide CLIP) technology has recently enabled the investigation of genome-wide RBP–RNA binding at single base-pair resolution. This technology has evolved through the development of three distinct versions: HITS-CLIP, PAR-CLIP and iCLIP. Meanwhile, numerous bioinformatics pipelines for handling the genome-wide CLIP data have also been developed. In this review, we discuss the genome-wide CLIP technology and focus on bioinformatics analysis. Specifically, we compare the strengths and weaknesses, as well as the scopes, of various bioinformatics tools. To assist readers in choosing optimal procedures for their analysis, we also review experimental design and procedures that affect bioinformatics analyses. PMID:25958398

  20. The bioinformatics of psychosocial genomics in alternative and complementary medicine.

    PubMed

    Rossi, E

    2003-06-01

    The bioinformatics of alternative and complementary medicine is outlined in 3 hypotheses that extend the molecular-genomic revolution initiated by Watson and Crick 50 years ago to include psychology in the new discipline of psychosocial and cultural genomics. Stress-induced changes in the alternative splicing of genes demonstrate how psychosomatic stress in humans modulates activity-dependent gene expression, protein formation, physiological function, and psychological experience. The molecular messengers generated by stress, injury, and disease can activate immediate early genes within stem cells so that they then signal the target genes required to synthesize the proteins that will transform (differentiate) stem cells into mature well-functioning tissues. Such activity-dependent gene expression and its consequent activity-dependent neurogenesis and stem cell healing is proposed as the molecular-genomic-cellular basis of rehabilitative medicine, physical, and occupational therapy as well as the many alternative and complementary approaches to mind-body healing. The therapeutic replaying of enriching life experiences that evoke the novelty-numinosum-neurogenesis effect during creative moments of art, music, dance, drama, humor, literature, poetry, and spirituality, as well as cultural rituals of life transitions (birth, puberty, marriage, illness, healing, and death) can optimize consciousness, personal relationships, and healing in a manner that has much in common with the psychogenomic foundations of naturalistic and complementary medicine. The entire history of alternative and complementary approaches to healing is consistent with this new neuroscience world view about the role of psychological arousal and fascination in modulating gene expression, neurogenesis, and healing via the psychosocial and cultural rites of human societies. PMID:12853721

  1. Incorporating Genomics and Bioinformatics across the Life Sciences Curriculum

    SciTech Connect

    Ditty, Jayna L.; Kvaal, Christopher A.; Goodner, Brad; Freyermuth, Sharyn K.; Bailey, Cheryl; Britton, Robert A.; Gordon, Stuart G.; Heinhorst, Sabine; Reed, Kelynne; Xu, Zhaohui; Sanders-Lorenz, Erin R.; Axen, Seth; Kim, Edwin; Johns, Mitrick; Scott, Kathleen; Kerfeld, Cheryl A.

    2011-08-01

    Undergraduate life sciences education needs an overhaul, as clearly described in the National Research Council of the National Academies publication BIO 2010: Transforming Undergraduate Education for Future Research Biologists. Among BIO 2010's top recommendations is the need to involve students in working with real data and tools that reflect the nature of life sciences research in the 21st century. Education research studies support the importance of utilizing primary literature, designing and implementing experiments, and analyzing results in the context of a bona fide scientific question in cultivating the analytical skills necessary to become a scientist. Incorporating these basic scientific methodologies in undergraduate education leads to increased undergraduate and post-graduate retention in the sciences. Toward this end, many undergraduate teaching organizations offer training and suggestions for faculty to update and improve their teaching approaches to help students learn as scientists, through design and discovery (e.g., Council of Undergraduate Research [www.cur.org] and Project Kaleidoscope [www.pkal.org]). With the advent of genome sequencing and bioinformatics, many scientists now formulate biological questions and interpret research results in the context of genomic information. Just as the use of bioinformatic tools and databases changed the way scientists investigate problems, it must change how scientists teach to create new opportunities for students to gain experiences reflecting the influence of genomics, proteomics, and bioinformatics on modern life sciences research. Educators have responded by incorporating bioinformatics into diverse life science curricula. While these published exercises in, and guidelines for, bioinformatics curricula are helpful and inspirational, faculty new to the area of bioinformatics inevitably need training in the theoretical underpinnings of the algorithms. 
Moreover, effectively integrating bioinformatics into

  2. Bioinformatics tools for small genomes, such as hepatitis B virus.

    PubMed

    Bell, Trevor G; Kramvis, Anna

    2015-02-01

    DNA sequence analysis is undertaken in many biological research laboratories. The workflow consists of several steps involving the bioinformatic processing of biological data. We have developed a suite of web-based online bioinformatic tools to assist with processing, analysis and curation of DNA sequence data. Most of these tools are genome-agnostic, with two tools specifically designed for hepatitis B virus sequence data. Tools in the suite are able to process sequence data from Sanger sequencing, ultra-deep amplicon resequencing (pyrosequencing) and chromatograph (trace files), as appropriate. The tools are available online at no cost and are aimed at researchers without specialist technical computer knowledge. The tools can be accessed at http://hvdr.bioinf.wits.ac.za/SmallGenomeTools, and the source code is available online at https://github.com/DrTrevorBell/SmallGenomeTools. PMID:25690798

  3. Making sense of genomes of parasitic worms: Tackling bioinformatic challenges.

    PubMed

    Korhonen, Pasi K; Young, Neil D; Gasser, Robin B

    2016-01-01

    Billions of people and animals are infected with parasitic worms (helminths). Many of these worms cause diseases that have a major socioeconomic impact worldwide, and are challenging to control because existing treatment methods are often inadequate. There is, therefore, a need to work toward developing new intervention methods, built on a sound understanding of parasitic worms at molecular level, the relationships that they have with their animal hosts and/or the diseases that they cause. Decoding the genomes and transcriptomes of these parasites brings us a step closer to this goal. The key focus of this article is to critically review and discuss bioinformatic tools used for the assembly and annotation of these genomes and transcriptomes, as well as various post-genomic analyses of transcription profiles, biological pathways, synteny, phylogeny, biogeography and the prediction and prioritisation of drug target candidates. Bioinformatic pipelines implemented and established recently provide practical and efficient tools for the assembly and annotation of genomes of parasitic worms, and will be applicable to a wide range of other parasites and eukaryotic organisms. Future research will need to assess the utility of long-read sequence data sets for enhanced genomic assemblies, and develop improved algorithms for gene prediction and post-genomic analyses, to enable comprehensive systems biology explorations of parasitic organisms.

  4. A Required Course in Human Genomics, Pharmacogenomics, and Bioinformatics

    PubMed Central

    Brazeau, Daniel A.; Brazeau, Gayle A.

    2006-01-01

    Objectives: To provide students with an understanding of the principles and applications of human genetics and genomics in drug therapy optimization, patient care, and counseling. Design: A 2-credit hour course entitled Principles of the Human Genome, Pharmacogenomics, and Bioinformatics was offered to third-professional year PharmD students. Written examinations, in-class exercises, and a written paper evaluating the current literature were used to evaluate student learning. Assessment: Student course ratings on the pedagogical format of the course and the relevance of course material to professional practice have improved significantly since first implementation in 2002. Conclusion: This course provided pharmacy students with an understanding of pharmacogenetics ranging from genetic principles and the inheritance of complex traits to specific examples of pharmacogenomics in drug therapy. PMID:17332851

  5. [Genomics and personalized medicine].

    PubMed

    Mooser, Vincent

    2014-05-01

    Personalized medicine has a substantial potential to transform the way diseases will be predicted, prevented and treated. The field will greatly benefit from novel DNA sequencing technologies, in particular commoditization of individual whole genome sequencing. This evolution cannot be stopped, and the medical and scientific community, as well as the society at large, have the responsibility to anticipate the expected benefits from this revolution, but also the potential risks associated with it. Massive investments will be needed for the potential of personalized medicine to be realized, and for the field to come to maturity. In particular, a paradigm change in the way clinical research is done is needed. Switzerland and its Western part pro-actively anticipate these changes.

  6. Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud

    PubMed Central

    Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew

    2015-01-01

    Background: Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. Results: We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. Conclusions: This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and

  7. Personal genomes: no bad news?

    PubMed

    Chadwick, Ruth

    2011-02-01

    Issues in genetics and genomics have been centre stage in Bioethics for much of its history, and have given rise to both negative and positive imagined futures. Ten years after the completion of the Human Genome Project, it is a good time to assess developments. The promise of whole genome sequencing of individuals requires reflection on personalization, genetic determinism, and privacy.

  8. Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace

    PubMed Central

    Thorvaldsdottir, Helga; Liefeld, Ted; Ocana, Marco; Borges-Rivera, Diego; Pochet, Nathalie; Robinson, James T.; Demchak, Barry; Hull, Tim; Ben-Artzi, Gil; Blankenberg, Daniel; Barber, Galt P.; Lee, Brian T.; Kuhn, Robert M.; Nekrutenko, Anton; Segal, Eran; Ideker, Trey; Reich, Michael; Regev, Aviv; Chang, Howard Y.; Mesirov, Jill P.

    2015-01-01

    Integrative analysis of multiple data types to address complex biomedical questions requires the use of multiple software tools in concert and remains an enormous challenge for most of the biomedical research community. Here we introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource. Seeded as a collaboration of six of the most popular genomics analysis tools, GenomeSpace now supports the streamlined interaction of 20 bioinformatics tools and data resources. To help non-programming users leverage GenomeSpace in integrative analysis, it offers a growing set of ‘recipes’, short workflows involving a few tools and steps to guide investigators through high utility analysis tasks. PMID:26780094

  9. Genomics and natural products: role of bioinformatics and recent patents.

    PubMed

    Preuss, Charles; Das, Malay K; Pathak, Yashwant V

    2014-01-01

    The post genomic era has promised major breakthroughs in personalized medicine which will improve a patient's health by selecting treatments including diet based on the patient's unique DNA sequence. The post genomic era is allowing scientists and clinicians to examine an individual's DNA and then recommend the best diet in order to remain healthy and attenuate disease processes to which the individual might be predisposed because of their genetic make-up, e.g., cardiovascular disease. Nutrigenomics and nutrigenetics are terms related to pharmacogenomics and pharmacogenetics, with an emphasis on diet or nutrition. There has been increasing interest among consumers in natural medicines or nutraceuticals in order to remain healthy. The post genomic era will allow a patient to visit their physician, who will screen the patient's DNA on a silicon chip. This will indicate which of the patient's genes have polymorphisms, e.g., a single nucleotide polymorphism (SNP), that might make the patient more susceptible to certain diseases, and the physician could then prescribe the appropriate dietary supplements to prevent or diminish these potential diseases. Several recently published patents are discussed in the article covering recent developments in the field. PMID:25185982

  11. Computational and Bioinformatics Frameworks for Next-Generation Whole Exome and Genome Sequencing

    PubMed Central

    Dolled-Filhart, Marisa P.; Lee, Michael; Ou-yang, Chih-wen; Haraksingh, Rajini Rani; Lin, Jimmy Cheng-Ho

    2013-01-01

    It has become increasingly apparent that one of the major hurdles in the genomic age will be the bioinformatics challenges of next-generation sequencing. We provide an overview of a general framework of bioinformatics analysis. For each of the three stages of (1) alignment, (2) variant calling, and (3) filtering and annotation, we describe the analysis required and survey the different software packages that are used. Furthermore, we discuss possible future developments as data sources grow and highlight opportunities for new bioinformatics tools to be developed. PMID:23365548
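
    The third stage named above (filtering) can be illustrated with a minimal sketch. It is not taken from the reviewed software packages: the `Variant` type, the helper names, and the thresholds (`min_qual=30`, `min_depth=10`) are hypothetical, and the parser assumes simplified VCF-style lines with a numeric QUAL column and a `DP` entry in INFO.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal record for one called variant."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    depth: int


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Variant]:
    """Parse simplified VCF data lines; lines starting with '#' are skipped.

    Assumption: QUAL is numeric (real VCF files may use '.' for missing).
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]
        # INFO is a ';'-separated list of KEY=VALUE pairs (flags are ignored).
        info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
        yield Variant(chrom, int(pos), ref, alt, float(qual),
                      int(info_map.get("DP", 0)))


def filter_variants(variants: Iterable[Variant],
                    min_qual: float = 30.0,
                    min_depth: int = 10) -> List[Variant]:
    """Keep variants that pass both illustrative thresholds."""
    return [v for v in variants if v.qual >= min_qual and v.depth >= min_depth]


EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50.0\t.\tDP=25",   # passes both thresholds
    "chr1\t2000\t.\tC\tT\t12.5\t.\tDP=40",   # fails on quality
    "chr2\t3000\t.\tG\tA\t99.0\t.\tDP=4",    # fails on depth
]

kept = filter_variants(parse_vcf_lines(EXAMPLE_VCF))
print([(v.chrom, v.pos) for v in kept])
```

    In practice the thresholds depend on the sequencing platform and the variant caller, so the defaults here should be read as placeholders.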

  13. Genomics Politics through Space and Time: The Case of Bioinformatics in Brazil.

    PubMed

    Bicudo, Edison

    2016-01-01

    The emergence of scientific disciplines, as well as the policies aimed at steering them, has geographical implications. This becomes visible in areas such as genomics and related fields. This paper studies the relation between scientific evolution, political decisions and geographical configuration, focusing on the recent formation of bioinformatics in Brazil. The study involves an analysis of data collected on the website of CNPq, a funding agency attached to the Ministry of Science and Technology. Furthermore, I conducted fieldwork in four cities, interviewing 15 bioinformaticians. In the history of Brazilian bioinformatics, three periods can be identified. In the first period (1900-1996), bioinformatics was actually absent, but biology research groups were formed which would subsequently explore bioinformatics. The second period (1997-2006) was marked by the emergence of the discipline and geographical concentration of major research groups in the southern part of Brazil. A third period can be identified (2007-2014), in which political choices have turned geographical diffusion and institutional equality into a national target. As a consequence of the recent shifts, genomics and bioinformatics researchers have been involved in a debate, some defending the existence of a few specialized research and sequencing platforms, whereas others welcome the constitution of a scientific scenario based on decentralized platforms. I defend an intermediate solution, whereby some places would be selected to be genomics hubs. This would fit the regional diversity of this vast country, in addition to tackling the scientific weaknesses of the northern area. PMID:26890397

  14. A Critical Analysis of Assessment Quality in Genomics and Bioinformatics Education Research

    ERIC Educational Resources Information Center

    Campbell, Chad E.; Nehm, Ross H.

    2013-01-01

    The growing importance of genomics and bioinformatics methods and paradigms in biology has been accompanied by an explosion of new curricula and pedagogies. An important question to ask about these educational innovations is whether they are having a meaningful impact on students' knowledge, attitudes, or skills. Although assessments are…

  15. The discrepancies in the results of bioinformatics tools for genomic structural annotation

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena; Nowak, Robert; Osipowski, Paweł; Rymuszka, Jacek; Świerkula, Katarzyna; Wojcieszek, Michał; Przybecki, Zbigniew

    2014-11-01

    A major focus of sequencing projects is to identify genes in genomes. However, it is necessary to define the variety of genes and the criteria for identifying them. In this work we present discrepancies and dependencies arising from the application of different bioinformatic programs for structural annotation to the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used Fgenesh, GenScan and GeneMark for automated structural annotation, and compared the results to the reference annotation.
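
    One common way to compare a gene predictor's output with a reference annotation is nucleotide-level overlap. The sketch below is an illustration only and is not the authors' procedure: the function names are hypothetical, intervals are assumed to be 1-based inclusive exon coordinates on a single sequence, and the example coordinates are invented.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def covered_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of positions they cover."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_level_accuracy(predicted: Iterable[Interval],
                              reference: Iterable[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, precision) of predicted vs. reference coding positions."""
    pred = covered_positions(predicted)
    ref = covered_positions(reference)
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    precision = true_positives / len(pred) if pred else 0.0
    return sensitivity, precision


# Invented example: two reference exons and two predicted exons.
reference_exons = [(100, 199), (300, 399)]
predicted_exons = [(150, 199), (300, 449)]

sensitivity, precision = nucleotide_level_accuracy(predicted_exons, reference_exons)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

    Expanding intervals into position sets is simple but memory-hungry; for genome-scale annotations an interval-merging approach would be a more realistic choice.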

  16. Silicon Era of Carbon-Based Life: Application of Genomics and Bioinformatics in Crop Stress Research

    PubMed Central

    Li, Man-Wah; Qi, Xinpeng; Ni, Meng; Lam, Hon-Ming

    2013-01-01

    Abiotic and biotic stresses lead to massive reprogramming of different life processes and are the major limiting factors hampering crop productivity. Omics-based research platforms allow for a holistic and comprehensive survey on crop stress responses and hence may bring forth better crop improvement strategies. Since high-throughput approaches generate considerable amounts of data, bioinformatics tools will play an essential role in storing, retrieving, sharing, processing, and analyzing them. Genomic and functional genomic studies in crops still lag far behind similar studies in humans and other animals. In this review, we summarize some useful genomics and bioinformatics resources available to crop scientists. In addition, we also discuss the major challenges and advancements in the “-omics” studies, with an emphasis on their possible impacts on crop stress research and crop improvement. PMID:23759993

  17. Databases, models, and algorithms for functional genomics: a bioinformatics perspective.

    PubMed

    Singh, Gautam B; Singh, Harkirat

    2005-02-01

    A variety of patterns have been observed in DNA and protein sequences that serve as control points for gene expression and cellular functions. Owing to the vital role of such patterns discovered in biological sequences, they are generally cataloged and maintained within internationally shared databases. Furthermore, the variability in a family of observed patterns is often represented using computational models in order to facilitate their search within an uncharacterized biological sequence. As biological data comprise a mosaic of sequence-level motifs, it is important to unravel the synergies of macromolecular coordination utilized in cell-specific differential synthesis of proteins. This article provides an overview of the various pattern representation methodologies and surveys the pattern databases available to molecular biologists. Our aim is to describe the principles behind the computational modeling and analysis techniques utilized in bioinformatics research, with the objective of providing insight necessary to better understand and effectively utilize the available databases and analysis tools. We also provide a detailed review of DNA sequence level patterns responsible for structural conformations within the Scaffold or Matrix Attachment Regions (S/MARs).
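
    A position weight matrix (PWM) is one widely used computational model for representing variability within a family of sequence patterns; whether the article covers it in this form is an assumption. The sketch below builds a log-odds PWM from a few aligned example sites and scans a sequence for the best match. The function names, the uniform background of 0.25, the pseudocount, and the example sites are all illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"


def build_pwm(sites: Sequence[str],
              pseudocount: float = 1.0,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Build a log2-odds matrix from equal-length aligned sites."""
    pwm = []
    for col in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background)
                    for base in BASES})
    return pwm


def best_match(pwm: List[Dict[str, float]], sequence: str) -> Tuple[int, float]:
    """Return (0-based position, score) of the highest-scoring window."""
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[pos + i]] for i in range(width)), pos)
        for pos in range(len(sequence) - width + 1)
    ]
    score, pos = max(scored)
    return pos, score


# Invented aligned sites resembling a promoter-like hexamer.
example_sites = ["TATAAT", "TATAAT", "TACAAT", "TATACT"]
pwm = build_pwm(example_sites)
position, score = best_match(pwm, "GGCGTATAATGCC")
print(position, round(score, 2))
```

    The pseudocount keeps unseen bases from producing a log of zero; its value affects scores for short training sets, so it should be treated as a tunable assumption.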

  18. Public Access for Teaching Genomics, Proteomics, and Bioinformatics

    ERIC Educational Resources Information Center

    Campbell, A. Malcolm

    2003-01-01

    When the human genome project was conceived, its leaders wanted all researchers to have equal access to the data and associated research tools. Their vision of equal access provides an unprecedented teaching opportunity. Teachers and students have free access to the same databases that researchers are using. Furthermore, the recent movement to…

  19. Meet me halfway: when genomics meets structural bioinformatics.

    PubMed

    Gong, Sungsam; Worth, Catherine L; Cheng, Tammy M K; Blundell, Tom L

    2011-06-01

    The DNA sequencing technology developed by Frederick Sanger in the 1970s established genomics as the basis of comparative genetics. The recent invention of next-generation sequencing (NGS) platforms has added a new dimension to genome research by generating ultra-fast and high-throughput sequencing data in an unprecedented manner. The advent of NGS technology also provides the opportunity to study genetic diseases where sequence variants or mutations are sought to establish a causal relationship with disease phenotypes. However, it is not a trivial task to seek genetic variants responsible for genetic diseases and even harder for complex diseases such as diabetes and cancers. In such polygenic diseases, multiple genes and alleles, which can exist in healthy individuals, come together to contribute to common disease phenotypes in a complex manner. Hence, it is desirable to have an approach that integrates omics data with both knowledge of protein structure and function and an understanding of networks/pathways, i.e. functional genomics and systems biology; in this way, genotype-phenotype relationships can be better understood. In this review, we bring this 'bottom-up' approach alongside the current NGS-driven genetic study of genetic variations and disease aetiology. We describe experimental and computational techniques for assessing genetic variants and their deleterious effects on protein structure and function. PMID:21350909

  1. Tissue Banking, Bioinformatics, and Electronic Medical Records: The Front-End Requirements for Personalized Medicine

    PubMed Central

    Suh, K. Stephen; Sarojini, Sreeja; Youssif, Maher; Nalley, Kip; Milinovikj, Natasha; Elloumi, Fathi; Russell, Steven; Pecora, Andrew; Schecter, Elyssa; Goy, Andre

    2013-01-01

    Personalized medicine promises patient-tailored treatments that enhance patient care and decrease overall treatment costs by focusing on genetics and “-omics” data obtained from patient biospecimens and records to guide therapy choices that generate good clinical outcomes. The approach relies on diagnostic and prognostic use of novel biomarkers discovered through combinations of tissue banking, bioinformatics, and electronic medical records (EMRs). The analytical power of bioinformatic platforms combined with patient clinical data from EMRs can reveal potential biomarkers and clinical phenotypes that allow researchers to develop experimental strategies using selected patient biospecimens stored in tissue banks. For cancer, high-quality biospecimens collected at diagnosis, first relapse, and various treatment stages provide crucial resources for study designs. To enlarge biospecimen collections, patient education regarding the value of specimen donation is vital. One approach for increasing consent is to offer publicly available illustrations and game-like engagements demonstrating how wider sample availability facilitates development of novel therapies. The critical value of tissue bank samples, bioinformatics, and EMR in the early stages of the biomarker discovery process for personalized medicine is often overlooked. The data obtained also require cross-disciplinary collaborations to translate experimental results into clinical practice and diagnostic and prognostic use in personalized medicine. PMID:23818899

  2. VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics.

    PubMed

    Megy, Karine; Emrich, Scott J; Lawson, Daniel; Campbell, David; Dialynas, Emmanuel; Hughes, Daniel S T; Koscielny, Gautier; Louis, Christos; Maccallum, Robert M; Redmond, Seth N; Sheehan, Andrew; Topalis, Pantelis; Wilson, Derek

    2012-01-01

    VectorBase (http://www.vectorbase.org) is a NIAID-supported bioinformatics resource for invertebrate vectors of human pathogens. It hosts data for nine genomes: mosquitoes (three Anopheles gambiae genomes, Aedes aegypti and Culex quinquefasciatus), tick (Ixodes scapularis), body louse (Pediculus humanus), kissing bug (Rhodnius prolixus) and tsetse fly (Glossina morsitans). Hosted data range from genomic features and expression data to population genetics and ontologies. We describe improvements and integration of new data that expand our taxonomic coverage. Releases are bi-monthly and include the delivery of preliminary data for emerging genomes. Frequent updates of the genome browser provide VectorBase users with increasing options for visualizing their own high-throughput data. One major development is a new population biology resource for storing genomic variations, insecticide resistance data and their associated metadata. It takes advantage of improved ontologies and controlled vocabularies. Combined, these new features ensure timely release of multiple types of data in the public domain while helping overcome the bottlenecks of bioinformatics and annotation by engaging with our user community. PMID:22135296

  3. proBAMsuite, a Bioinformatics Framework for Genome-Based Representation and Analysis of Proteomics Data

    PubMed Central

    Wang, Xiaojing; Slebos, Robbert J. C.; Chambers, Matthew C.; Tabb, David L.; Liebler, Daniel C.; Zhang, Bing

    2016-01-01

    To facilitate genome-based representation and analysis of proteomics data, we developed a new bioinformatics framework, proBAMsuite, in which a central component is the protein BAM (proBAM) file format for organizing peptide spectrum matches (PSMs) within the context of the genome. proBAMsuite also includes two R packages, proBAMr and proBAMtools, for generating and analyzing proBAM files, respectively. Applying proBAMsuite to three recently published proteomics datasets, we demonstrated its utility in facilitating efficient genome-based sharing, interpretation, and integration of proteomics data. First, the interpretation of proteomics data is significantly enhanced with the rich genomic annotation information. Second, PSMs can be easily reannotated using user-specified gene annotation schemes and assembled into both protein and gene identifications. Third, using the genome as a common reference, proBAMsuite facilitates seamless proteomics and proteogenomics data integration. Finally, proBAM files can be readily visualized in genome browsers and thus bring proteomics data analysis to a general audience beyond the proteomics community. Results from this study establish proBAMsuite as a useful bioinformatics framework for proteomics and proteogenomics research. PMID:26657539

  4. Mudi, a web tool for identifying mutations by bioinformatics analysis of whole-genome sequence.

    PubMed

    Iida, Naoko; Yamao, Fumiaki; Nakamura, Yasukazu; Iida, Tetsushi

    2014-06-01

    In forward genetics, identification of mutations is a time-consuming and laborious process. Modern whole-genome sequencing, coupled with bioinformatics analysis, has enabled fast and cost-effective mutation identification. However, for many experimental researchers, bioinformatics analysis is still a difficult aspect of whole-genome sequencing. To address this issue, we developed a browser-accessible and easy-to-use bioinformatics tool called Mutation discovery (Mudi; http://naoii.nig.ac.jp/mudi_top.html), which enables 'one-click' identification of causative mutations from whole-genome sequence data. In this study, we optimized Mudi for pooled-linkage analysis aimed at identifying mutants in yeast model systems. After raw sequencing data are uploaded, Mudi performs sequential analysis, including mapping, detection of variant alleles, filtering and removal of background polymorphisms, prioritization, and annotation. In an example study of suppressor mutants of ptr1-1 in the fission yeast Schizosaccharomyces pombe, pooled-linkage analysis with Mudi identified mip1(+), a component of Target of Rapamycin Complex 1 (TORC1), as a novel component involved in RNA interference (RNAi)-related cell-cycle control. The accessibility of Mudi will accelerate systematic mutation analysis in forward genetics.

  5. Genome-wide variant analysis of simplex autism families with an integrative clinical-bioinformatics pipeline

    PubMed Central

    Jiménez-Barrón, Laura T.; O'Rawe, Jason A.; Wu, Yiyang; Yoon, Margaret; Fang, Han; Iossifov, Ivan; Lyon, Gholson J.

    2015-01-01

    Autism spectrum disorders (ASDs) are a group of developmental disabilities that affect social interaction and communication and are characterized by repetitive behaviors. There is now a large body of evidence that suggests a complex role of genetics in ASDs, in which many different loci are involved. Although many current population-scale genomic studies have been demonstrably fruitful, these studies generally focus on analyzing a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis of genome-wide perturbations that may contribute to the development and severity of ASD-related phenotypes. To overcome these limitations, we have developed and utilized an integrative clinical and bioinformatics pipeline for generating a more complete and reliable set of genomic variants for downstream analyses. Our study focuses on the analysis of three simplex autism families consisting of one affected child, unaffected parents, and one unaffected sibling. All members were clinically evaluated and widely phenotyped. Genotyping arrays and whole-genome sequencing were performed on each member, and the resulting sequencing data were analyzed using a variety of available bioinformatics tools. We searched for rare variants of putative functional impact that were found to be segregating according to de novo, autosomal recessive, X-linked, mitochondrial, and compound heterozygote transmission models. The resulting candidate variants included three small heterozygous copy-number variations (CNVs), a rare heterozygous de novo nonsense mutation in MYBBP1A located within exon 1, and a novel de novo missense variant in LAMB3. Our work demonstrates how more comprehensive analyses that include rich clinical data and whole-genome sequencing data can generate reliable results for use in downstream investigations. PMID:27148569
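
    Two of the transmission models mentioned above (de novo and autosomal recessive) can be expressed as simple genotype rules for a family of affected child, parents, and unaffected sibling. The sketch below is a simplified illustration, not the pipeline described in the abstract: the function name and site labels are hypothetical, genotypes are assumed to be error-free alternate-allele counts (0, 1, 2) at autosomal sites, and compound heterozygous, X-linked, and mitochondrial models are not covered.

```python
from typing import Optional


def classify_transmission(child: int, mother: int, father: int,
                          sibling: Optional[int] = None) -> Optional[str]:
    """Classify one autosomal site by a simple transmission rule.

    Genotypes are alternate-allele counts: 0 = hom-ref, 1 = het, 2 = hom-alt.
    Returns a model name, or None when neither rule applies.
    """
    # De novo: the child carries an allele that neither parent carries.
    if child == 1 and mother == 0 and father == 0:
        return "de_novo"
    # Recessive: child is hom-alt, both parents are carriers, and an
    # unaffected sibling (if genotyped) is not hom-alt.
    if child == 2 and mother == 1 and father == 1 and sibling != 2:
        return "autosomal_recessive"
    return None


# Invented sites: (label, child, mother, father, sibling)
example_sites = [
    ("site_A", 1, 0, 0, 0),  # de novo candidate
    ("site_B", 2, 1, 1, 1),  # recessive candidate
    ("site_C", 1, 1, 0, 0),  # inherited heterozygous, no rule applies
    ("site_D", 2, 1, 1, 2),  # unaffected sibling is also hom-alt, excluded
]

candidates = {}
for label, child, mother, father, sibling in example_sites:
    model = classify_transmission(child, mother, father, sibling)
    if model is not None:
        candidates[label] = model
print(candidates)
```

    Real de novo calling also has to account for genotyping error and low coverage in the parents, which this rule-based sketch ignores.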

  6. Resequencing studies of nonmodel organisms using closely related reference genomes: optimal experimental designs and bioinformatics approaches for population genomics.

    PubMed

    Nevado, B; Ramos-Onsins, S E; Perez-Enciso, M

    2014-04-01

    Decreasing costs of next-generation sequencing (NGS) experiments have made a wide range of genomic questions open for study with nonmodel organisms. However, experimental designs and analysis of NGS data from less well-known species are challenging because of the lack of genomic resources. In this work, we investigate the performance of alternative experimental designs and bioinformatics approaches in estimating variability and neutrality tests based on the site-frequency-spectrum (SFS) from individual resequencing data. We pay particular attention to challenges faced in the study of nonmodel organisms, in particular the absence of a species-specific reference genome, although phylogenetically close genomes are assumed to be available. We compare the performance of three alternative bioinformatics approaches – genotype calling, genotype–haplotype calling and direct estimation without calling genotypes. We find that relying on genotype calls provides biased estimates of population genetic statistics at low to moderate read depth (2–8X). Genotype–haplotype calling returns more accurate estimates irrespective of the divergence to the reference genome, but requires moderate depth (8–20X). Direct estimation without calling genotypes returns the most accurate estimates of variability and of most SFS tests investigated, including at low read depth (2–4X). Studies without a species-specific reference genome should thus aim for low read depth and avoid genotype calling whenever individual genotypes are not essential. Otherwise, aiming for moderate to high depth at the expense of number of individuals, and using genotype–haplotype calling, is recommended. PMID:24795998
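
    For readers unfamiliar with the site-frequency spectrum, the sketch below computes an unfolded SFS and two standard variability estimators (Watterson's theta and nucleotide diversity) from called genotypes. It corresponds to the genotype-calling approach, which the abstract reports as biased at low to moderate depth, so it assumes error-free genotypes and a known ancestral state. The function names and example data are illustrative and not from the paper.

```python
from typing import List, Sequence


def site_frequency_spectrum(genotypes: Sequence[Sequence[int]],
                            ploidy: int = 2) -> List[int]:
    """Unfolded SFS: sfs[i] = number of sites with i derived alleles.

    `genotypes` holds one row per site; each entry is an individual's
    derived-allele count (0..ploidy).
    """
    n = ploidy * len(genotypes[0])  # number of sampled chromosomes
    sfs = [0] * (n + 1)
    for site in genotypes:
        sfs[sum(site)] += 1
    return sfs


def watterson_theta(sfs: Sequence[int]) -> float:
    """theta_W = S / a1, with S segregating sites and a1 = sum_{k=1}^{n-1} 1/k."""
    n = len(sfs) - 1
    segregating = sum(sfs[1:n])
    a1 = sum(1.0 / k for k in range(1, n))
    return segregating / a1


def nucleotide_diversity(sfs: Sequence[int]) -> float:
    """pi = sum_i sfs[i] * 2 * i * (n - i) / (n * (n - 1)), summed over sites."""
    n = len(sfs) - 1
    return sum(count * 2 * i * (n - i) for i, count in enumerate(sfs)) / (n * (n - 1))


# Invented data: 4 sites genotyped in 3 diploid individuals (n = 6 chromosomes).
example_genotypes = [
    [0, 1, 0],  # singleton
    [1, 1, 0],  # doubleton
    [2, 2, 2],  # fixed derived, not segregating
    [0, 0, 1],  # singleton
]

sfs = site_frequency_spectrum(example_genotypes)
theta_w = watterson_theta(sfs)
pi = nucleotide_diversity(sfs)
print(sfs, round(theta_w, 3), round(pi, 3))
```

    Both estimators here are per-region totals; dividing by the number of sites surveyed would give per-site values.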

  7. CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes

    PubMed Central

    2012-01-01

    Background: It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools. Results: Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage. Conclusions: To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID’s predictions were successfully validated in vitro. PMID:22901030

  8. [Bioinformatics Analysis of Clustered Regularly Interspaced Short Palindromic Repeats in the Genomes of Shigella].

    PubMed

    Wang, Pengfei; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Guo, Xiangjiao; Yang, Haiyan; Xi, Yuanlin

    2015-04-01

    This study aimed to explore the features of clustered regularly interspaced short palindromic repeat (CRISPR) structures in Shigella using bioinformatics. We used bioinformatics methods, including BLAST, alignment and RNA structure prediction, to analyze the CRISPR structures of Shigella genomes. The results showed that CRISPRs were present in the four groups of Shigella, and that the flanking sequences upstream of the CRISPRs could be classified into the same groups as those downstream. We also found some relatively conserved palindromic motifs in the leader sequences. Repeat sequences fell into the same groups as their corresponding flanking sequences and could be classified into two types by their RNA secondary structures, which contain a "stem" and a "ring". Some spacers were found to be homologous to partial sequences of plasmids or phages. The study indicated that there were correlations between repeat sequences and flanking sequences, and that the repeats might act as a kind of recognition mechanism to mediate the interaction between foreign genetic elements and Cas proteins.

  9. Widening participation would be key in enhancing bioinformatics and genomics research in Africa

    PubMed Central

    Karikari, Thomas K.; Quansah, Emmanuel; Mohamed, Wael M.Y.

    2015-01-01

    Bioinformatics and genome science (BGS) are gradually gaining roots in Africa, contributing to studies that are leading to improved understanding of health, disease, agriculture and food security. While a few African countries have established foundations for research and training in these areas, BGS appear to be limited to only a few institutions in specific African countries. However, improving the disciplines in Africa will require pragmatic efforts to expand training and research partnerships to scientists in yet-unreached institutions. Here, we discuss the need to expand BGS programmes in Africa, and propose mechanisms to do so. PMID:26767163

  10. Bioinformatic Genome Comparisons for Taxonomic and Phylogenetic Assignments Using Aeromonas as a Test Case

    PubMed Central

    Colston, Sophie M.; Fullmer, Matthew S.; Beka, Lidia; Lamy, Brigitte

    2014-01-01

    ABSTRACT Prokaryotic taxonomy is the underpinning of microbiology, as it provides a framework for the proper identification and naming of organisms. The “gold standard” of bacterial species delineation is the overall genome similarity determined by DNA-DNA hybridization (DDH), a technically rigorous yet sometimes variable method that may produce inconsistent results. Improvements in next-generation sequencing have resulted in an upsurge of bacterial genome sequences and bioinformatic tools that compare genomic data, such as average nucleotide identity (ANI), correlation of tetranucleotide frequencies, and the genome-to-genome distance calculator, or in silico DDH (isDDH). Here, we evaluate ANI and isDDH in combination with phylogenetic studies using Aeromonas, a taxonomically challenging genus with many described species and several strains that were reassigned to different species as a test case. We generated improved, high-quality draft genome sequences for 33 Aeromonas strains and combined them with 23 publicly available genomes. ANI and isDDH distances were determined and compared to phylogenies from multilocus sequence analysis of housekeeping genes, ribosomal proteins, and expanded core genes. The expanded core phylogenetic analysis suggested relationships between distant Aeromonas clades that were inconsistent with studies using fewer genes. ANI values of ≥96% and isDDH values of ≥70% consistently grouped genomes originating from strains of the same species together. Our study confirmed known misidentifications, validated the recent revisions in the nomenclature, and revealed that a number of genomes deposited in GenBank are misnamed. In addition, two strains were identified that may represent novel Aeromonas species. PMID:25406383
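
    To make the ANI threshold in this abstract concrete, the sketch below groups genomes whose pairwise ANI is at or above 96% using single-linkage clustering. The clustering rule and the function name are assumptions chosen for illustration; the abstract does not say how the authors formed their groups, and computing ANI itself is outside the scope of the example.

```python
def group_by_ani(pairs, threshold=96.0):
    """Single-linkage grouping of genomes from pairwise ANI values.

    pairs: iterable of (genome_a, genome_b, ani_percent) tuples.
    Returns a sorted list of sorted genome-name lists.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, ani in pairs:
        root_a, root_b = find(a), find(b)
        # Merge the two groups only when the pair meets the threshold.
        if ani >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for genome in list(parent):
        groups.setdefault(find(genome), set()).add(genome)
    return sorted(sorted(members) for members in groups.values())
```

    Single linkage can chain distinct species together through intermediate genomes, so a stricter rule (for example, requiring every pair in a group to meet the threshold) may be preferable for real data.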

  11. Bioinformatics visualization and integration with open standards: the Bluejay genomic browser.

    PubMed

    Turinsky, Andrei L; Ah-Seng, Andrew C; Gordon, Paul M K; Stromer, Julie N; Taschuk, Morgan L; Xu, Emily W; Sensen, Christoph W

    2005-01-01

    We have created a new Java-based integrated computational environment for the exploration of genomic data, called Bluejay. The system is capable of using almost any XML file related to genomic data. Non-XML data sources can be accessed via a proxy server. Bluejay has several features, which are new to Bioinformatics, including an unlimited semantic zoom capability, coupled with Scalable Vector Graphics (SVG) outputs; an implementation of the XLink standard, which features access to MAGPIE Genecards as well as any BioMOBY service accessible over the Internet; and the integration of gene chip analysis tools with the functional assignments. The system can be used as a signed web applet, Web Start, and a local stand-alone application, with or without connection to the Internet. It is available free of charge and as open source via http://bluejay.ucalgary.ca. PMID:15972014

  12. Importance of databases of nucleic acids for bioinformatic analysis focused to genomics

    NASA Astrophysics Data System (ADS)

    Jimenez-Gutierrez, L. R.; Barrios-Hernández, C. J.; Pedraza-Ferreira, G. R.; Vera-Cala, L.; Martinez-Perez, F.

    2016-08-01

    Bioinformatics has recently become a field of science in its own right, indispensable for analyzing the millions of nucleic acid sequences currently deposited in international databases (public or private); these databases contain information on genes, RNA, ORFs, proteins and intergenic regions, including entire genomes of some species. Analyzing this information requires computer programs, which have been renewed through new mathematical methods and the introduction of artificial intelligence, together with the constant creation of supercomputing units built to withstand the heavy workload of sequence analysis. However, innovation is still needed in platforms that allow faster and more effective genomic analyses, with a technological understanding of all biological processes.

  13. GénoPlante-Info (GPI): a collection of databases and bioinformatics resources for plant genomics

    PubMed Central

    Samson, Delphine; Legeai, Fabrice; Karsenty, Emmanuelle; Reboux, Sébastien; Veyrieras, Jean-Baptiste; Just, Jeremy; Barillot, Emmanuel

    2003-01-01

    Génoplante is a partnership program between public French institutes (INRA, CIRAD, IRD and CNRS) and private companies (Biogemma, Bayer CropScience and Bioplante) that aims at developing genome analysis programs for crop species (corn, wheat, rapeseed, sunflower and pea) and model plants (Arabidopsis and rice). The outputs of these programs form a wealth of information (genomic sequence, transcriptome, proteome, allelic variability, mapping and synteny, and mutation data) and tools (databases, interfaces, analysis software), that are being integrated and made public at the public bioinformatics resource centre of Génoplante: GénoPlante-Info (GPI). This continuous flood of data and tools is regularly updated and will grow continuously during the coming two years. Access to the GPI databases and tools is available at http://genoplante-info.infobiogen.fr/. PMID:12519976

  14. A Critical Analysis of Assessment Quality in Genomics and Bioinformatics Education Research

    PubMed Central

    Campbell, Chad E.; Nehm, Ross H.

    2013-01-01

    The growing importance of genomics and bioinformatics methods and paradigms in biology has been accompanied by an explosion of new curricula and pedagogies. An important question to ask about these educational innovations is whether they are having a meaningful impact on students’ knowledge, attitudes, or skills. Although assessments are necessary tools for answering this question, their outputs are dependent on their quality. Our study 1) reviews the central importance of reliability and construct validity evidence in the development and evaluation of science assessments and 2) examines the extent to which published assessments in genomics and bioinformatics education (GBE) have been developed using such evidence. We identified 95 GBE articles (out of 226) that contained claims of knowledge increases, affective changes, or skill acquisition. We found that 1) the purpose of most of these studies was to assess summative learning gains associated with curricular change at the undergraduate level, and 2) a minority (<10%) of studies provided any reliability or validity evidence, and only one study out of the 95 sampled mentioned both validity and reliability. Our findings raise concerns about the quality of evidence derived from these instruments. We end with recommendations for improving assessment quality in GBE. PMID:24006400

  15. Edge Bioinformatics

    SciTech Connect

    Lo, Chien-Chi

    2015-08-03

    Edge Bioinformatics is a developmental bioinformatics and data management platform which seeks to supply laboratories with bioinformatics pipelines for analyzing data associated with common samples case goals. Edge Bioinformatics enables sequencing as a solution and forward-deployed situations where human resources, space, bandwidth, and time are limited. The Edge bioinformatics pipeline was designed around the following use cases and is specific to Illumina sequencing reads. 1. Assay performance adjudication (PCR): Analysis of an existing PCR assay in a genomic context, and automated design of a new assay to resolve conflicting results; 2. Clinical presentation with extreme symptoms: Characterization of a known pathogen or co-infection with a. Novel emerging disease outbreak or b. Environmental surveillance

  16. Edge Bioinformatics

    2015-08-03

    Edge Bioinformatics is a developmental bioinformatics and data management platform which seeks to supply laboratories with bioinformatics pipelines for analyzing data associated with common samples case goals. Edge Bioinformatics enables sequencing as a solution and forward-deployed situations where human resources, space, bandwidth, and time are limited. The Edge bioinformatics pipeline was designed around the following use cases and is specific to Illumina sequencing reads. 1. Assay performance adjudication (PCR): Analysis of an existing PCR assay in a genomic context, and automated design of a new assay to resolve conflicting results; 2. Clinical presentation with extreme symptoms: Characterization of a known pathogen or co-infection with a. Novel emerging disease outbreak or b. Environmental surveillance

  17. Personal genomes, quantitative dynamic omics and personalized medicine

    PubMed Central

    Mias, George I.; Snyder, Michael

    2015-01-01

    The rapid technological developments following the Human Genome Project have made possible the availability of personalized genomes. As the focus now shifts from characterizing genomes to making personalized disease associations, in combination with the availability of other omics technologies, the next big push will be not only to obtain a personalized genome, but to quantitatively follow other omics. This will include transcriptomes, proteomes, metabolomes, antibodyomes, and new emerging technologies, enabling the profiling of thousands of molecular components in individuals. Furthermore, omics profiling performed longitudinally can probe the temporal patterns associated with both molecular changes and associated physiological health and disease states. Such data necessitates the development of computational methodology to not only handle and descriptively assess such data, but also construct quantitative biological models. Here we describe the availability of personal genomes and developing omics technologies that can be brought together for personalized implementations and how these novel integrated approaches may effectively provide a precise personalized medicine that focuses on not only characterization and treatment but ultimately the prevention of disease. PMID:25798291

  18. Basics of Genome Sequence Analysis in Bioinformatics -- its Fundamental Ideas and Problems

    NASA Astrophysics Data System (ADS)

    Suzuki, Tomonori; Miyazaki, Satoru

    2009-02-01

    Genome sequences are among the most fundamental data in omics analyses, and basic bioinformatics tools have been developed to handle them. The first step of genome sequence analysis is to predict or assign "genes" on genome sequences. In eukaryotes, genes can be identified by aligning full-length cDNA sequences with local alignment tools such as search, blast and fasta. However, it is difficult to capture mRNAs (transcripts) in prokaryotes. Therefore, computational gene prediction is the first choice for starting genome sequence analysis. In this review, we first cover methods for computational gene prediction. Once genes are predicted, the next step is to assign functions to the proteins or RNAs encoded by each gene. How the distance between gene sequences is defined is then very important for further analysis, so we describe the basic mathematical concepts for gene comparison. We also introduce our novel concept for biological sequence comparison from the viewpoint of information theory. In the post-genome era, many researchers are interested not only in gene functions but also in gene regulation, the information for which is also encoded in genome sequences. Cis-regulatory elements, however, are too short for mathematical rules to be found. Therefore, computationally predicted cis-elements tend to include many false positives. To reduce the proportion of false positives, we need a reliable database of sets of cis-regulatory elements, called cis-regulatory modules, for each gene. We are therefore developing the Cis-Regulatory Elements Module Reference Database. In the third section, we introduce the procedure for constructing the Cis-Regulatory Elements Module Reference Database and its user interfaces.

  19. Genomic and personalized medicine: foundations and applications.

    PubMed

    Ginsburg, Geoffrey S; Willard, Huntington F

    2009-12-01

    The last decade has witnessed a steady embrace of genomic and personalized medicine by senior government officials, industry leadership, health care providers, and the public. Genomic medicine, which is the use of information from genomes and their derivatives (RNA, proteins, and metabolites) to guide medical decision making, is a key component of personalized medicine, which is a rapidly advancing field of health care that is informed by each person's unique clinical, genetic, genomic, and environmental information. As medicine begins to embrace genomic tools that enable more precise prediction and treatment of disease, which include "whole genome" interrogation of sequence variation, transcription, proteins, and metabolites, the fundamentals of genomic and personalized medicine will require the development, standardization, and integration of several important tools into health systems and clinical workflows. These tools include health risk assessment, family health history, and clinical decision support for complex risk and predictive information. Together with genomic information, these tools will enable a paradigm shift to a comprehensive approach that will identify individual risks and guide clinical management and decision making, all of which form the basis for a more informed and effective approach to patient care. DNA-based risk assessment for common complex disease, molecular signatures for cancer diagnosis and prognosis, and genome-guided therapy and dose selection are just a few of the important examples for which genome information has already enabled personalized health care along the continuum from health to disease. In addition, information from individual genomes, which is a fast-moving area of technological development, is spawning a social and information revolution among consumers that will undoubtedly affect health care decision making. Although these and other scientific findings are making their way from the genome to the clinic, the full

  20. [Ethical issues in personal genome research].

    PubMed

    Kato, Kazuto; Minari, Jusaku

    2013-03-01

    The rapid expansion of techniques for studying human genomics has remarkably changed research and practice. It is expected that more progress will be made in the field of medical and biological research owing to the technological advances. Genomics researchers collect human genetic material, including DNA and cells, from a large number of individuals and carry out "personal genome analysis"; as a result, new types of ethical, legal, and social issues (ELSI) have arisen, including issues such as informed consent procedures, data sharing, protection of genetic information, and return of research results. To address these issues, many large research projects have established specialist groups that are devoted to manage ELSI of their research. The guidelines for genomics research set by the government are also expected to be revised accordingly. In this paper, we present an overview of ELSI of personal genome research and discuss necessary measures to tackle these issues.

  1. Challenges, Solutions, and Quality Metrics of Personal Genome Assembly in Advancing Precision Medicine

    PubMed Central

    Xiao, Wenming; Wu, Leihong; Yavas, Gokhan; Simonyan, Vahan; Ning, Baitang; Hong, Huixiao

    2016-01-01

    -response, tailoring drug therapy and detecting tumors. We believe the precision medicine would largely benefit from bioinformatics solutions, particularly for personal genome assembly. PMID:27110816

  2. The Translational Genomics Core at Partners Personalized Medicine: Facilitating the Transition of Research towards Personalized Medicine.

    PubMed

    Blau, Ashley; Brown, Alison; Mahanta, Lisa; Amr, Sami S

    2016-01-01

    The Translational Genomics Core (TGC) at Partners Personalized Medicine (PPM) serves as a fee-for-service core laboratory for Partners Healthcare researchers, providing access to technology platforms and analysis pipelines for genomic, transcriptomic, and epigenomic research projects. The interaction of the TGC with various components of PPM provides it with a unique infrastructure that allows for greater IT and bioinformatics opportunities, such as sample tracking and data analysis. The following article describes some of the unique opportunities available to an academic research core operating within PPM, such as the ability to develop analysis pipelines with a dedicated bioinformatics team and maintain a flexible Laboratory Information Management System (LIMS) with the support of an internal IT team, as well as the operational challenges encountered in responding to emerging technologies, diverse investigator needs, and high staff turnover. In addition, the implementation and operational role of the TGC in the Partners Biobank genotyping project of over 25,000 samples is presented as an example of core activities working with other components of PPM. PMID:26927185

  3. The Translational Genomics Core at Partners Personalized Medicine: Facilitating the Transition of Research towards Personalized Medicine

    PubMed Central

    Blau, Ashley; Brown, Alison; Mahanta, Lisa; Amr, Sami S.

    2016-01-01

    The Translational Genomics Core (TGC) at Partners Personalized Medicine (PPM) serves as a fee-for-service core laboratory for Partners Healthcare researchers, providing access to technology platforms and analysis pipelines for genomic, transcriptomic, and epigenomic research projects. The interaction of the TGC with various components of PPM provides it with a unique infrastructure that allows for greater IT and bioinformatics opportunities, such as sample tracking and data analysis. The following article describes some of the unique opportunities available to an academic research core operating within PPM, such as the ability to develop analysis pipelines with a dedicated bioinformatics team and maintain a flexible Laboratory Information Management System (LIMS) with the support of an internal IT team, as well as the operational challenges encountered in responding to emerging technologies, diverse investigator needs, and high staff turnover. In addition, the implementation and operational role of the TGC in the Partners Biobank genotyping project of over 25,000 samples is presented as an example of core activities working with other components of PPM. PMID:26927185

  4. Computational biology of genome expression and regulation--a review of microarray bioinformatics.

    PubMed

    Wang, Junbai

    2008-01-01

    Microarray technology is being used widely in various biomedical research areas; the corresponding microarray data analysis is an essential step toward making the best use of array technologies. Here we review two components of microarray data analysis: low-level analysis, which emphasizes the design, quality control, and preprocessing of microarray experiments, and high-level analysis, which focuses on domain-specific microarray applications such as tumor classification, biomarker prediction, analysis of array CGH experiments, and reverse engineering of gene expression networks. Additionally, we review recent developments in building predictive models in genome expression and regulation studies. This review may help biologists grasp the basics of microarray bioinformatics as well as its potential impact on the future development of biomedical research fields.

  5. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package

    PubMed Central

    2012-01-01

    Background: Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results: In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions: Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org. PMID:23281941

  6. Advantages of genomic complexity: bioinformatics opportunities in microRNA cancer signatures

    PubMed Central

    Stadler, Walter M; Chen, James L

    2011-01-01

    MicroRNAs, small non-coding RNAs, may act as tumor suppressors or oncogenes, and each regulate their own transcription and that of hundreds of genes, often in a tissue-dependent manner. This creates a tightly interwoven network regulating and underlying oncogenesis and cancer biology. Although protein-coding gene signatures and single protein pathway markers have proliferated over the past decade, routine adoption of the former has been hampered by interpretability, reproducibility, and dimensionality, whereas the single molecule–phenotype reductionism of the latter is often overly simplistic to account for complex phenotypes. MicroRNA-derived biomarkers offer a powerful alternative; they have both the flexibility of gene expression signature classifiers and the desirable mechanistic transparency of single protein biomarkers. Furthermore, several advances have recently demonstrated the robust detection of microRNAs from various biofluids, thus providing an additional opportunity for obtaining bioinformatically derived biomarkers to accelerate the identification of individual patients for personalized therapy. PMID:22101905

  7. Empowered genome community: leveraging a bioinformatics platform as a citizen-scientist collaboration tool.

    PubMed

    Wendelsdorf, Katherine; Shah, Sohela

    2015-09-01

    There is an ongoing effort in the biomedical research community to leverage Next Generation Sequencing (NGS) technology to identify genetic variants that affect our health. The main challenge facing researchers is getting enough samples from individuals, either sick or healthy, to be able to reliably identify the few variants that are causal for a phenotype among all other variants typically seen among individuals. At the same time, more and more individuals are having their genome sequenced either out of curiosity or to identify the cause of an illness. These individuals may benefit from a way to view and understand their data. QIAGEN's Ingenuity Variant Analysis is an online application that allows users with and without extensive bioinformatics training to incorporate information from published experiments, genetic databases, and a variety of statistical models to identify variants, from a long list of candidates, that are most likely causal for a phenotype as well as annotate variants with what is already known about them in the literature and databases. Ingenuity Variant Analysis is also an information sharing platform where users may exchange samples and analyses. The Empowered Genome Community (EGC) is a new program in which QIAGEN is making this online tool freely available to any individual who wishes to analyze their own genetic sequence. EGC members are then able to make their data available to other Ingenuity Variant Analysis users to be used in research. Here we present and describe the Empowered Genome Community in detail. We also present a preliminary, proof-of-concept study that utilizes the 200 genomes currently available through the EGC. The goal of this program is to allow individuals to access and understand their own data as well as facilitate citizen-scientist collaborations that can drive research forward and spur quality scientific dialogue in the general public. PMID:27054071

  8. Empowered genome community: leveraging a bioinformatics platform as a citizen-scientist collaboration tool.

    PubMed

    Wendelsdorf, Katherine; Shah, Sohela

    2015-09-01

    There is an ongoing effort in the biomedical research community to leverage Next Generation Sequencing (NGS) technology to identify genetic variants that affect our health. The main challenge facing researchers is getting enough samples from individuals, either sick or healthy, to be able to reliably identify the few variants that are causal for a phenotype among all other variants typically seen among individuals. At the same time, more and more individuals are having their genome sequenced either out of curiosity or to identify the cause of an illness. These individuals may benefit from a way to view and understand their data. QIAGEN's Ingenuity Variant Analysis is an online application that allows users with and without extensive bioinformatics training to incorporate information from published experiments, genetic databases, and a variety of statistical models to identify variants, from a long list of candidates, that are most likely causal for a phenotype as well as annotate variants with what is already known about them in the literature and databases. Ingenuity Variant Analysis is also an information sharing platform where users may exchange samples and analyses. The Empowered Genome Community (EGC) is a new program in which QIAGEN is making this online tool freely available to any individual who wishes to analyze their own genetic sequence. EGC members are then able to make their data available to other Ingenuity Variant Analysis users to be used in research. Here we present and describe the Empowered Genome Community in detail. We also present a preliminary, proof-of-concept study that utilizes the 200 genomes currently available through the EGC. The goal of this program is to allow individuals to access and understand their own data as well as facilitate citizen-scientist collaborations that can drive research forward and spur quality scientific dialogue in the general public.

  9. Genomes, Populations and Diseases: Ethnic Genomics and Personalized Medicine

    PubMed Central

    Stepanov, V.A.

    2010-01-01

    This review discusses the progress of ethnic genetics, the genetics of common diseases, and the concepts of personalized medicine. We show the relationship between the structure of genetic diversity in human populations and the varying frequencies of Mendelian and multifactor diseases. We also examine the population basis of pharmacogenetics and evaluate the effectiveness of pharmacotherapy, along with a review of new achievements and prospects in personalized genomics. PMID:22649660

  10. Accurate and comprehensive sequencing of personal genomes.

    PubMed

    Ajay, Subramanian S; Parker, Stephen C J; Abaan, Hatice Ozel; Fajardo, Karin V Fuentes; Margulies, Elliott H

    2011-09-01

    As whole-genome sequencing becomes commoditized and we begin to sequence and analyze personal genomes for clinical and diagnostic purposes, it is necessary to understand what constitutes a complete sequencing experiment for determining genotypes and detecting single-nucleotide variants. Here, we show that the current recommendation of ∼30× coverage is not adequate to produce genotype calls across a large fraction of the genome with acceptably low error rates. Our results are based on analyses of a clinical sample sequenced on two related Illumina platforms, GAII(x) and HiSeq 2000, to a very high depth (126×). We used these data to establish genotype-calling filters that dramatically increase accuracy. We also empirically determined how the callable portion of the genome varies as a function of the amount of sequence data used. These results help provide a "sequencing guide" for future whole-genome sequencing decisions and metrics by which coverage statistics should be reported.
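    The relationship between sequencing depth and the callable fraction of the genome can be illustrated with a small calculation. The sketch below is not the authors' method: it assumes an idealized model in which per-site depth follows a Poisson distribution around the mean coverage, and the function names and the depth threshold are hypothetical choices for illustration. Real data are more uneven than this model, which is consistent with the abstract's point that nominal ∼30× coverage can leave many sites below a usable depth.

```python
import math

def mean_coverage(n_reads, read_length, genome_size):
    """Average depth: total sequenced bases divided by genome length."""
    return n_reads * read_length / genome_size

def frac_sites_at_least(depth, mean_cov):
    """P(X >= depth) for X ~ Poisson(mean_cov), an idealized depth model.

    Terms are computed in log space to avoid overflow at high coverage.
    """
    cdf = sum(
        math.exp(-mean_cov + k * math.log(mean_cov) - math.lgamma(k + 1))
        for k in range(depth)
    )
    return 1.0 - cdf

if __name__ == "__main__":
    # Hypothetical run: 900 million 100-bp reads on a 3-Gb genome.
    cov = mean_coverage(900_000_000, 100, 3_000_000_000)
    print(f"mean coverage: {cov:.0f}x")
    # Fraction of sites reaching an assumed minimum depth of 20 reads.
    for c in (15, 30, 126):
        print(c, round(frac_sites_at_least(20, c), 4))
```

    Under these assumptions, raising the mean coverage increases the share of sites that reach the chosen minimum depth; the threshold of 20 reads is an arbitrary example, not a value taken from the paper.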

  11. Using Informatics-, Bioinformatics- and Genomics-Based Approaches for the Molecular Surveillance and Detection of Biothreat Agents

    NASA Astrophysics Data System (ADS)

    Seto, Donald

    The convergence and wealth of informatics, bioinformatics and genomics methods and associated resources allow a comprehensive and rapid approach for the surveillance and detection of bacterial and viral organisms. Coupled with the continuing race for the fastest, most cost-efficient and highest-quality DNA sequencing technology, that is, "next generation sequencing", the detection of biological threat agents by "cheaper and faster" means is possible. With the application of improved bioinformatic tools for the understanding of these genomes and for parsing unique pathogen genome signatures, along with "state-of-the-art" informatics which include faster computational methods, equipment and databases, it is feasible to apply new algorithms to biothreat agent detection. Two such methods are high-throughput DNA sequencing-based and resequencing microarray-based identification. These are illustrated and validated by two examples involving human adenoviruses, both from real-world test beds.

  12. Forward Individualized Medicine from Personal Genomes to Interactomes

    PubMed Central

    Zhang, Xiang; Kuivenhoven, Jan A.; Groen, Albert K.

    2015-01-01

    When considering the variation in the genome, transcriptome, proteome and metabolome, and their interaction with the environment, every individual can be rightfully considered as a unique biological entity. Individualized medicine promises to take this uniqueness into account to optimize disease treatment and thereby improve health benefits for every patient. The success of individualized medicine relies on a precise understanding of the genotype-phenotype relationship. Although omics technologies advance rapidly, there are several challenges that need to be overcome: Next generation sequencing can efficiently decipher genomic sequences, epigenetic changes, and transcriptomic variation in patients, but it does not automatically indicate how or whether the identified variation will cause pathological changes. This is likely due to the inability to account for (1) the consequences of gene-gene and gene-environment interactions, and (2) (post)transcriptional as well as (post)translational processes that eventually determine the concentration of key metabolites. The technologies to accurately measure changes in these latter layers are still under development, and such measurements in humans are also mainly restricted to blood and circulating cells. Despite these challenges, it is already possible to track dynamic changes in the human interactome in healthy and diseased states by using the integration of multi-omics data. In this review, we evaluate the potential value of current major bioinformatics and systems biology-based approaches, including genome wide association studies, epigenetics, gene regulatory and protein-protein interaction networks, and genome-scale metabolic modeling. Moreover, we address the question whether integrative analysis of personal multi-omics data will help understanding of personal genotype-phenotype relationships. PMID:26696898

  13. The Human Genome Project, and recent advances in personalized genomics.

    PubMed

    Wilson, Brenda J; Nicholls, Stuart G

    2015-01-01

    The language of "personalized medicine" and "personal genomics" has now entered the common lexicon. The idea of personalized medicine is the integration of genomic risk assessment alongside other clinical investigations. Consistent with this approach, testing is delivered by health care professionals who are not medical geneticists, and where results represent risks, as opposed to clinical diagnosis of disease, to be interpreted alongside the entirety of a patient's health and medical data. In this review we consider the evidence concerning the application of such personalized genomics within the context of population screening, and potential implications that arise from this. We highlight two general approaches which illustrate potential uses of genomic information in screening. The first is a narrowly targeted approach in which genetic profiling is linked with standard population-based screening for diseases; the second is a broader targeting of variants associated with multiple single gene disorders, performed opportunistically on patients being investigated for unrelated conditions. In doing so we consider the organization and evaluation of tests and services, the challenge of interpretation with less targeted testing, professional confidence, barriers in practice, and education needs. We conclude by discussing several issues pertinent to health policy, namely: avoiding the conflation of genetics with biological determinism, resisting the "technological imperative", due consideration of the organization of screening services, the need for professional education, as well as informed decision making and public understanding.

  14. Personal genomes in progress: from the human genome project to the personal genome project.

    PubMed

    Lunshof, Jeantine E; Bobe, Jason; Aach, John; Angrist, Misha; Thakuria, Joseph V; Vorhaus, Daniel B; Hoehe, Margret R; Church, George M

    2010-01-01

    The cost of a diploid human genome sequence has dropped from about $70M to $2000 since 2007, even as the standards for redundancy have increased from 7x to 40x in order to improve call rates. Coupled with the low return on investment for common single-nucleotide polymorphisms, this has caused a significant rise in interest in correlating genome sequences with comprehensive environmental and trait data (GET). The costs of electronic health records, imaging, and microbial, immunological, and behavioral data are also dropping quickly. Sharing such integrated GET datasets and their interpretations with a diversity of researchers and research subjects highlights the need for informed-consent models capable of addressing novel privacy and other issues, as well as for flexible data-sharing resources that make materials and data available with minimum restrictions on use. This article examines the Personal Genome Project's effort to develop a GET database as a public genomics resource broadly accessible to both researchers and research participants, while pursuing the highest standards in research ethics.

  15. Personal genomes in progress: from the Human Genome Project to the Personal Genome Project

    PubMed Central

    Lunshof, Jeantine E. (Co-first author); Bobe, Jason (Co-first author); Aach, John; Angrist, Misha; Thakuria, Joseph V.; Vorhaus, Daniel B.; Hoehe, Margret R. (Co-last author); Church, George M. (Co-last author)

    2010-01-01

    The cost of a diploid human genome sequence has dropped from about $70M to $2000 since 2007, even as the standards for redundancy have increased from 7x to 40x in order to improve call rates. Coupled with the low return on investment for common single-nucleotide polymorphisms, this has caused a significant rise in interest in correlating genome sequences with comprehensive environmental and trait data (GET). The costs of electronic health records, imaging, and microbial, immunological, and behavioral data are also dropping quickly. Sharing such integrated GET datasets and their interpretations with a diversity of researchers and research subjects highlights the need for informed-consent models capable of addressing novel privacy and other issues, as well as for flexible data-sharing resources that make materials and data available with minimum restrictions on use. This article examines the Personal Genome Project's effort to develop a GET database as a public genomics resource broadly accessible to both researchers and research participants, while pursuing the highest standards in research ethics. PMID:20373666

  16. Elucidating ANTs in worms using genomic and bioinformatic tools--biotechnological prospects?

    PubMed

    Hu, Min; Zhong, Weiwei; Campbell, Bronwyn E; Sternberg, Paul W; Pellegrino, Mark W; Gasser, Robin B

    2010-01-01

    Adenine nucleotide translocators (ANTs) belong to the mitochondrial carrier family (MCF) of proteins. ATP production and consumption are tightly linked to ANTs, the kinetics of which have been proposed to play a key regulatory role in mitochondrial oxidative phosphorylation. ANTs are also recognized as a central component of the mitochondrial permeability transition pore associated with apoptosis. Although ANTs have been investigated in a range of vertebrates, including human, mouse and cattle, and invertebrates, such as Drosophila melanogaster (vinegar fly), Saccharomyces cerevisiae (yeast) and Caenorhabditis elegans (free-living nematode), there has been a void of information on these molecules for parasitic nematodes of socio-economic importance. Exploring ANTs in nematodes has the potential to lead to a better understanding of their fundamental roles in key biological pathways and might provide an avenue for the identification of targets for the rational design of nematocidal drugs. In the present article, we describe the discovery of an ANT from Haemonchus contortus (one of the most economically important parasitic nematodes of sheep and goats), conduct a comparative analysis of key ANTs and their genes (particularly ant-1.1) in nematodes and other organisms, predict the functional roles utilizing a combined genomic-bioinformatic approach and propose ANTs and associated molecules as possible drug targets, with the potential for biotechnological outcomes. PMID:19770033

  17. The Human Genome Project, and recent advances in personalized genomics

    PubMed Central

    Wilson, Brenda J; Nicholls, Stuart G

    2015-01-01

    The language of “personalized medicine” and “personal genomics” has now entered the common lexicon. The idea of personalized medicine is the integration of genomic risk assessment alongside other clinical investigations. Consistent with this approach, testing is delivered by health care professionals who are not medical geneticists, and where results represent risks, as opposed to clinical diagnosis of disease, to be interpreted alongside the entirety of a patient’s health and medical data. In this review we consider the evidence concerning the application of such personalized genomics within the context of population screening, and potential implications that arise from this. We highlight two general approaches which illustrate potential uses of genomic information in screening. The first is a narrowly targeted approach in which genetic profiling is linked with standard population-based screening for diseases; the second is a broader targeting of variants associated with multiple single gene disorders, performed opportunistically on patients being investigated for unrelated conditions. In doing so we consider the organization and evaluation of tests and services, the challenge of interpretation with less targeted testing, professional confidence, barriers in practice, and education needs. We conclude by discussing several issues pertinent to health policy, namely: avoiding the conflation of genetics with biological determinism, resisting the “technological imperative”, due consideration of the organization of screening services, the need for professional education, as well as informed decision making and public understanding. PMID:25733939

  18. Whole genome identification of Mycobacterium tuberculosis vaccine candidates by comprehensive data mining and bioinformatic analyses

    PubMed Central

    Zvi, Anat; Ariel, Naomi; Fulkerson, John; Sadoff, Jerald C; Shafferman, Avigdor

    2008-01-01

    Background Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), infects ~8 million annually culminating in ~2 million deaths. Moreover, about one third of the population is latently infected, 10% of which develop disease during lifetime. Current approved prophylactic TB vaccines (BCG and derivatives thereof) are of variable efficiency in adult protection against pulmonary TB (0%–80%), and directed essentially against early phase infection. Methods A genome-scale dataset was constructed by analyzing published data of: (1) global gene expression studies under conditions which simulate intra-macrophage stress, dormancy, persistence and/or reactivation; (2) cellular and humoral immunity, and vaccine potential. This information was compiled along with revised annotation/bioinformatic characterization of selected gene products and in silico mapping of T-cell epitopes. Protocols for scoring, ranking and prioritization of the antigens were developed and applied. Results Cross-matching of literature and in silico-derived data, in conjunction with the prioritization scheme and biological rationale, allowed for selection of 189 putative vaccine candidates from the entire genome. Within the 189 set, the relative distribution of antigens in 3 functional categories differs significantly from their distribution in the whole genome, with reduction in the Conserved hypothetical category (due to improved annotation) and enrichment in Lipid and in Virulence categories. Other prominent representatives in the 189 set are the PE/PPE proteins; iron sequestration, nitroreductases and proteases, all within the Intermediary metabolism and respiration category; ESX secretion systems, resuscitation promoting factors and lipoproteins, all within the Cell wall category. 
    Application of a ranking scheme based on qualitative and quantitative scores resulted in a list of 45 best-scoring antigens, of which: 74% belong to the dormancy/reactivation/resuscitation classes; 30% belong

  19. Optimal Drug Prediction from Personal Genomics Profiles

    PubMed Central

    Sheng, Jianting; Li, Fuhai; Wong, Stephen T.C.

    2015-01-01

    Cancer patients often show heterogeneous drug responses such that only a small subset of patients is sensitive to a given anti-cancer drug. With the availability of large-scale genomic profiling via next generation sequencing (NGS), it is now economically feasible to profile the whole transcriptome and genome of individual patients in order to identify their unique genetic mutations and differentially expressed genes, which are believed to be responsible for heterogeneous drug responses. Although subtyping analysis has identified patient subgroups sharing common biomarkers, there is no effective method to predict the drug response of individual patients precisely and reliably. Herein, we propose a novel computational algorithm to predict the drug response of individual patients based on personal genomic profiles, as well as pharmacogenomic and drug sensitivity data. Specifically, more than 600 cancer cell lines (viewed as individual patients) across over 50 types of cancers and their responses to 75 drugs were obtained from the Genomics of Drug Sensitivity in Cancer (GDSC) database. The drug-specific sensitivity signatures were determined from the changes in genomic profiles of individual cell lines in response to a specific drug. The optimal drugs for individual cell lines were predicted by integrating the votes from other cell lines. The experimental results show that the proposed drug prediction algorithm can be used to improve greatly the reliability of finding optimal drugs for individual patients and will thus form a key component in the precision medicine infrastructure for oncology care. PMID:25781964

  20. Revisiting Respect for Persons in Genomic Research

    PubMed Central

    Mathews, Debra J.H.; Jamal, Leila

    2014-01-01

    The risks and benefits of research using large databases of personal information are evolving in an era of ubiquitous, internet-based data exchange. In addition, information technology has facilitated a shift in the relationship between individuals and their personal data, enabling increased individual control over how (and how much) personal data are used in research, and by whom. This shift in control has created new opportunities to engage members of the public as partners in the research enterprise on more equal and transparent terms. Here, we consider how some of the technological advances driving and paralleling developments in genomics can also be used to supplement the practice of informed consent with other strategies to ensure that the research process as a whole honors the notion of respect for persons upon which human research subjects protections are premised. Further, we suggest that technological advances can help the research enterprise achieve a more thoroughgoing respect for persons than was possible when current policies governing human subject research were developed. Questions remain about the best way to revise policy to accommodate these changes. PMID:24705284

  1. 2010 Translational bioinformatics year in review

    PubMed Central

    Miller, Katharine S

    2011-01-01

    A review of 2010 research in translational bioinformatics provides much to marvel at. We have seen notable advances in personal genomics, pharmacogenetics, and sequencing. At the same time, the infrastructure for the field has burgeoned. While acknowledging that, according to researchers, the members of this field tend to be overly optimistic, the authors predict a bright future. PMID:21672905

  2. Ethical Considerations Regarding Classroom Use of Personal Genomic Information

    PubMed Central

    Parker, Lisa S.; Grubs, Robin

    2014-01-01

    Rapidly decreasing costs of genetic technologies—especially next-generation sequencing—and intensifying need for a clinical workforce trained in genomic medicine have increased interest in having students use personal genomic information to motivate and enhance genomics education. Numerous ethical issues attend classroom/pedagogical use of students’ personal genomic information, including their informed decision to participate, pressures to participate, privacy concerns, and psychosocial sequelae of learning genomic information. This paper addresses these issues, advocates explicit discussion of these issues to cultivate students’ ethical reasoning skills, suggests ways to mitigate potential harms, and recommends collection of ethically relevant data regarding pedagogical use of personal genomic information. PMID:25574277

  3. MSeqDR: A Centralized Knowledge Repository and Bioinformatics Web Resource to Facilitate Genomic Investigations in Mitochondrial Disease.

    PubMed

    Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T; van Oven, Mannis; Wallace, Douglas C; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J; Gai, Xiaowu

    2016-06-01

    MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse genome browser supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and mitochondrial disease. MSeqDR-LSDB is a locus-specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar compliant variant annotations. PhenoTips will be used for phenotypic data submission on deidentified patients using human phenotype ontology terminology. The development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources.

  4. MSeqDR: A Centralized Knowledge Repository and Bioinformatics Web Resource to Facilitate Genomic Investigations in Mitochondrial Disease.

    PubMed

    Shen, Lishuang; Diroma, Maria Angela; Gonzalez, Michael; Navarro-Gomez, Daniel; Leipzig, Jeremy; Lott, Marie T; van Oven, Mannis; Wallace, Douglas C; Muraresku, Colleen Clarke; Zolkipli-Cunningham, Zarazuela; Chinnery, Patrick F; Attimonelli, Marcella; Zuchner, Stephan; Falk, Marni J; Gai, Xiaowu

    2016-06-01

    MSeqDR is the Mitochondrial Disease Sequence Data Resource, a centralized and comprehensive genome and phenome bioinformatics resource built by the mitochondrial disease community to facilitate clinical diagnosis and research investigations of individual patient phenotypes, genomes, genes, and variants. A central Web portal (https://mseqdr.org) integrates community knowledge from expert-curated databases with genomic and phenotype data shared by clinicians and researchers. MSeqDR also functions as a centralized application server for Web-based tools to analyze data across both mitochondrial and nuclear DNA, including investigator-driven whole exome or genome dataset analyses through MSeqDR-Genesis. MSeqDR-GBrowse genome browser supports interactive genomic data exploration and visualization with custom tracks relevant to mtDNA variation and mitochondrial disease. MSeqDR-LSDB is a locus-specific database that currently manages 178 mitochondrial diseases, 1,363 genes associated with mitochondrial biology or disease, and 3,711 pathogenic variants in those genes. MSeqDR Disease Portal allows hierarchical tree-style disease exploration to evaluate their unique descriptions, phenotypes, and causative variants. Automated genomic data submission tools are provided that capture ClinVar compliant variant annotations. PhenoTips will be used for phenotypic data submission on deidentified patients using human phenotype ontology terminology. The development of a dynamic informed patient consent process to guide data access is underway to realize the full potential of these resources. PMID:26919060

  5. Cake: a bioinformatics pipeline for the integrated analysis of somatic variants in cancer genomes

    PubMed Central

    Rashid, Mamunur; Robles-Espinoza, Carla Daniela; Rust, Alistair G.; Adams, David J.

    2013-01-01

    Summary: We have developed Cake, a bioinformatics software pipeline that integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants with higher sensitivity and accuracy than any one algorithm alone. Cake can be run on a high-performance computer cluster or used as a stand-alone application. Availability: Cake is open-source and is available from http://cakesomatic.sourceforge.net/ Contact: da1@sanger.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:23803469
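    The idea of combining several variant callers can be sketched as a simple vote. The abstract does not say how Cake merges its four callers, so the rule below (keep a variant reported by at least `min_callers` call sets) is an assumption for illustration only, and the function name, the tuple layout `(chrom, pos, ref, alt)`, and the example calls are all hypothetical.

```python
from collections import Counter

def consensus_variants(call_sets, min_callers=2):
    """Return variants reported by at least `min_callers` of the call sets.

    Each call set is an iterable of (chrom, pos, ref, alt) tuples.
    A variant is counted once per caller, even if listed twice.
    """
    counts = Counter()
    for calls in call_sets:
        counts.update(set(calls))
    return sorted(v for v, n in counts.items() if n >= min_callers)

if __name__ == "__main__":
    # Made-up calls from four hypothetical callers.
    caller_a = [("chr1", 100, "A", "T"), ("chr2", 500, "G", "C")]
    caller_b = [("chr1", 100, "A", "T")]
    caller_c = [("chr1", 100, "A", "T"), ("chr3", 42, "C", "G")]
    caller_d = [("chr2", 500, "G", "C")]
    print(consensus_variants([caller_a, caller_b, caller_c, caller_d]))
```

    Raising `min_callers` trades sensitivity for specificity; a threshold of 1 is the union of all callers and a threshold equal to the number of callers is their intersection.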

  6. Genomic expression profiling and bioinformatics analysis on diabetic nephrology with ginsenoside Rg3

    PubMed Central

    Wang, Juan; Cui, Chunli; Fu, Li; Xiao, Zili; Xie, Nanzi; Liu, Yang; Yu, Lu; Wang, Haifeng; Luo, Bangzhen

    2016-01-01

    Diabetic nephropathy (DN), a common diabetes-related complication, is the leading cause of progressive chronic kidney disease (CKD) and end-stage renal disease. Despite the rapid development in the treatment of DN, currently available therapies used in early DN cannot prevent progressive CKD. The exact pathogenic mechanisms and the molecular events underlying DN development remain unclear. Ginsenoside Rg3 is a herbal medicine with numerous pharmacological effects. To gain a greater understanding of the molecular mechanism and signaling pathway underlying the effect of ginsenoside Rg3 in DN therapy, an RNA sequencing approach was performed to screen differential gene expression in a rat model of DN treated with ginsenoside Rg3. A combined bioinformatics analysis was then conducted to obtain insights into the underlying molecular mechanisms of the disease development, in order to identify potential novel targets for the treatment of DN. Six Sprague-Dawley male rats were randomly divided into 3 groups: Normal control group, DN group and ginsenoside-Rg3 treatment group, with two rats in each group. RNA sequencing was adopted for transcriptome profiling of cells from the renal cortex of DN rat model. Differentially expressed genes were screened out. Cluster analysis, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis were used to analyze the differentially expressed genes. In total, 78 differentially expressed genes in the DN control group were identified when compared with the normal control group, of which 52 genes were upregulated and 26 genes were downregulated. Differential expression of 43 genes was observed in the ginsenoside-Rg3 treatment group when compared with the DN control group, consisting of 10 upregulated genes and 33 downregulated genes. 
    Notably, 21 genes that were downregulated in the DN control group compared with the control were then shown to be upregulated in the ginsenoside-Rg3 treatment group compared with the DN

  7. Genomic Discoveries and Personalized Medicine in Neurological Diseases

    PubMed Central

    Zhang, Li; Hong, Huixiao

    2015-01-01

    In the past decades, we have witnessed dramatic changes in clinical diagnoses and treatments due to the revolutions in genomics and personalized medicine. Undoubtedly, we have also met many challenges in using those advanced technologies in drug discovery and development. In this review, we describe when genomic information is applied in personal healthcare in general. We illustrate some case examples of genomic discoveries and promising personalized medicine applications in the area of neurological disease in particular. Available data suggest that individual genomics can be applied to better treat patients in the near future. PMID:26690205

  8. Integrated Bioinformatics, Environmental Epidemiologic and Genomic Approaches to Identify Environmental and Molecular Links between Endometriosis and Breast Cancer

    PubMed Central

    Roy, Deodutta; Morgan, Marisa; Yoo, Changwon; Deoraj, Alok; Roy, Sandhya; Yadav, Vijay Kumar; Garoub, Mohannad; Assaggaf, Hamza; Doke, Mayur

    2015-01-01

    We present a combined environmental epidemiologic, genomic, and bioinformatics approach to identify: exposure to environmental chemicals with estrogenic activity; epidemiologic associations between endocrine disrupting chemicals (EDCs) and health effects, such as breast cancer or endometriosis; and gene-EDC interactions and disease associations. Human exposure measurement and modeling confirmed estrogenic activity of three selected classes of environmental chemicals: polychlorinated biphenyls (PCBs), bisphenols (BPs), and phthalates. Meta-analysis showed that exposure to PCBs, but not to bisphenol A (BPA) or phthalates, increased the summary odds ratio for breast cancer and endometriosis. Bioinformatics analysis of gene-EDC interactions and disease associations identified several hundred genes that were altered by exposure to PCBs, phthalates, or BPA. EDC-modified genes in breast neoplasms and endometriosis are part of steroid hormone signaling and inflammation pathways. All three EDCs (PCB 153, phthalates, and BPA) influenced five common genes (CYP19A1, EGFR, ESR2, FOS, and IGF1) in breast cancer as well as in endometriosis. These genes are environmentally and estrogen responsive, altered in human breast and uterine tumors and endometriosis lesions, and part of Mitogen Activated Protein Kinase (MAPK) signaling pathways in cancer. Our findings suggest that breast cancer and endometriosis share some common environmental and molecular risk factors. PMID:26512648

  9. Integrated Bioinformatics, Environmental Epidemiologic and Genomic Approaches to Identify Environmental and Molecular Links between Endometriosis and Breast Cancer.

    PubMed

    Roy, Deodutta; Morgan, Marisa; Yoo, Changwon; Deoraj, Alok; Roy, Sandhya; Yadav, Vijay Kumar; Garoub, Mohannad; Assaggaf, Hamza; Doke, Mayur

    2015-01-01

    We present a combined environmental epidemiologic, genomic, and bioinformatics approach to identify: exposure to environmental chemicals with estrogenic activity; epidemiologic associations between endocrine disrupting chemicals (EDCs) and health effects, such as breast cancer or endometriosis; and gene-EDC interactions and disease associations. Human exposure measurement and modeling confirmed estrogenic activity of three selected classes of environmental chemicals: polychlorinated biphenyls (PCBs), bisphenols (BPs), and phthalates. Meta-analysis showed that exposure to PCBs, but not to bisphenol A (BPA) or phthalates, increased the summary odds ratio for breast cancer and endometriosis. Bioinformatics analysis of gene-EDC interactions and disease associations identified several hundred genes that were altered by exposure to PCBs, phthalates, or BPA. EDC-modified genes in breast neoplasms and endometriosis are part of steroid hormone signaling and inflammation pathways. All three EDCs (PCB 153, phthalates, and BPA) influenced five common genes (CYP19A1, EGFR, ESR2, FOS, and IGF1) in breast cancer as well as in endometriosis. These genes are environmentally and estrogen responsive, altered in human breast and uterine tumors and endometriosis lesions, and part of Mitogen Activated Protein Kinase (MAPK) signaling pathways in cancer. Our findings suggest that breast cancer and endometriosis share some common environmental and molecular risk factors. PMID:26512648

  10. Integrated Bioinformatics, Environmental Epidemiologic and Genomic Approaches to Identify Environmental and Molecular Links between Endometriosis and Breast Cancer.

    PubMed

    Roy, Deodutta; Morgan, Marisa; Yoo, Changwon; Deoraj, Alok; Roy, Sandhya; Yadav, Vijay Kumar; Garoub, Mohannad; Assaggaf, Hamza; Doke, Mayur

    2015-10-23

    We present a combined environmental epidemiologic, genomic, and bioinformatics approach to identify: exposure to environmental chemicals with estrogenic activity; epidemiologic associations between endocrine disrupting chemicals (EDCs) and health effects, such as breast cancer or endometriosis; and gene-EDC interactions and disease associations. Human exposure measurement and modeling confirmed estrogenic activity of three selected classes of environmental chemicals: polychlorinated biphenyls (PCBs), bisphenols (BPs), and phthalates. Meta-analysis showed that exposure to PCBs, but not to bisphenol A (BPA) or phthalates, increased the summary odds ratio for breast cancer and endometriosis. Bioinformatics analysis of gene-EDC interactions and disease associations identified several hundred genes that were altered by exposure to PCBs, phthalates, or BPA. EDC-modified genes in breast neoplasms and endometriosis are part of steroid hormone signaling and inflammation pathways. All three EDCs (PCB 153, phthalates, and BPA) influenced five common genes (CYP19A1, EGFR, ESR2, FOS, and IGF1) in breast cancer as well as in endometriosis. These genes are environmentally and estrogen responsive, altered in human breast and uterine tumors and endometriosis lesions, and part of Mitogen Activated Protein Kinase (MAPK) signaling pathways in cancer. Our findings suggest that breast cancer and endometriosis share some common environmental and molecular risk factors.

  11. From genomic landscapes to personalized cancer management-is there a roadmap?

    PubMed

    Swanton, Charles; Caldas, Carlos

    2010-10-01

    Despite rapid progress in annotating the human genome, progress in biomarker discovery has been limited, in part, due to the restricted adoption of biomarker analysis in clinical trials. In this short review we present a roadmap to drive progress in the field of personalized cancer management and patient stratification. We suggest that improved understanding of disease biology and drug response in advance of clinical trial design would enable novel biomarkers to be identified and prospectively evaluated during early phase trials; there will also be value in banked material from completed clinical trials to identify and validate biomarkers. Such progress requires standardized tissue collection protocols, novel bioinformatics strategies integrated with functional genomics analysis, and next generation sequencing technologies. We argue that the failure to adopt these methods rapidly into clinical trial design will increase late stage drug attrition, waste trial resources, and risk patient harm within unselected cohorts.

  12. Challenges of web-based personal genomic data sharing.

    PubMed

    Shabani, Mahsa; Borry, Pascal

    2015-01-01

    In order to study the relationship between genes and diseases, the increasing availability and sharing of phenotypic and genotypic data have been promoted as an imperative within the scientific community. In parallel with data sharing practices by clinicians and researchers, recent initiatives have been observed in which individuals are sharing personal genomic data. The involvement of individuals in such initiatives is facilitated by the increased accessibility of personal genomic data, offered by private test providers along with availability of online networks. Personal webpages and on-line data sharing platforms such as Consent to Research (Portable Legal Consent), Free the Data, and Genomes Unzipped are being utilized to host and share genotypes, electronic health records and family history uploaded by individuals. Although personal genomic data sharing initiatives vary in nature, the emphasis on individuals' control over their data in order to benefit research and ultimately health care has been seen as a key theme across these initiatives. In line with the growing practice of personal genomic data sharing, this paper aims to shed light on the potential challenges surrounding these initiatives. Because individuals in these initiatives are asked to balance the risks and benefits of sharing their genomic data on their own, their awareness of the implications of personal genomic data sharing for themselves and their family members is a necessity. Furthermore, given the sensitivity of genomic data and the controversies around their complete de-identifiability, potential privacy risks and harms originating from unintended uses of data have to be taken into consideration.

  13. New bioinformatic tool for quick identification of functionally relevant endogenous retroviral inserts in human genome.

    PubMed

    Garazha, Andrew; Ivanova, Alena; Suntsova, Maria; Malakhova, Galina; Roumiantsev, Sergey; Zhavoronkov, Alex; Buzdin, Anton

    2015-01-01

    Endogenous retroviruses (ERVs) and LTR retrotransposons (LRs) occupy ∼8% of the human genome. Deep sequencing technologies provide clues to understanding the functional relevance of individual ERVs/LRs by enabling direct identification of transcription factor binding sites (TFBS) and other landmarks of functional genomic elements. Here, we performed the genome-wide identification of human ERVs/LRs containing TFBS according to the ENCODE project. We created the first interactive ERV/LR database that groups the individual inserts according to their familial nomenclature, number of mapped TFBS and divergence from their consensus sequence. Information on any particular element can be easily extracted by the user. We also created a genome browser tool, which enables quick mapping of any ERV/LR insert according to genomic coordinates, known human genes and TFBS. These tools can be used to easily explore functionally relevant individual ERV/LRs, and for studying their impact on the regulation of human genes. Overall, we identified ∼110,000 ERV/LR genomic elements having TFBS. We propose a hypothesis of "domestication" of ERV/LR TFBS by the genome milieu including subsequent stages of initial epigenetic repression, partial functional release, and further mutation-driven reshaping of TFBS in tight coevolution with the enclosing genomic loci.

  14. New bioinformatic tool for quick identification of functionally relevant endogenous retroviral inserts in human genome

    PubMed Central

    Garazha, Andrew; Ivanova, Alena; Suntsova, Maria; Malakhova, Galina; Roumiantsev, Sergey; Zhavoronkov, Alex; Buzdin, Anton

    2015-01-01

    Endogenous retroviruses (ERVs) and LTR retrotransposons (LRs) occupy ∼8% of the human genome. Deep sequencing technologies provide clues to understanding the functional relevance of individual ERVs/LRs by enabling direct identification of transcription factor binding sites (TFBS) and other landmarks of functional genomic elements. Here, we performed the genome-wide identification of human ERVs/LRs containing TFBS according to the ENCODE project. We created the first interactive ERV/LR database that groups the individual inserts according to their familial nomenclature, number of mapped TFBS and divergence from their consensus sequence. Information on any particular element can be easily extracted by the user. We also created a genome browser tool, which enables quick mapping of any ERV/LR insert according to genomic coordinates, known human genes and TFBS. These tools can be used to easily explore functionally relevant individual ERV/LRs, and for studying their impact on the regulation of human genes. Overall, we identified ∼110,000 ERV/LR genomic elements having TFBS. We propose a hypothesis of “domestication” of ERV/LR TFBS by the genome milieu including subsequent stages of initial epigenetic repression, partial functional release, and further mutation-driven reshaping of TFBS in tight coevolution with the enclosing genomic loci. PMID:25853282

  15. A Novel Bioinformatics Method for Efficient Knowledge Discovery by BLSOM from Big Genomic Sequence Data

    PubMed Central

    Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi

    2014-01-01

    With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of big sequence data. The Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data such as oligonucleotide composition on one map. By modifying the conventional SOM, we have previously developed the Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, depending solely on the oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes and then the compositions in the human and mouse genomes in order to investigate an efficient method for detecting differences between closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a “genome signature,” and the specific regions enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data). PMID:24804244
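
    The input that a method like the one in this abstract works on, the pentanucleotide composition of fixed-size sequence windows, can be sketched as follows. This is a minimal illustration only, not the authors' BLSOM code: the function names are hypothetical, the map training itself is not shown, and the abstract does not say how ambiguous bases or complementary strands are handled, so this sketch simply skips k-mers containing non-ACGT characters.

```python
from collections import Counter
from itertools import product


def kmer_composition(sequence, k=5):
    """Return the relative frequency of every k-mer over A, C, G, T."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Fixed word order so vectors from different windows are comparable.
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    # Assumption: k-mers containing N or other symbols are ignored.
    total = sum(counts[w] for w in words)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


def windowed_compositions(sequence, window=100_000, k=5):
    """Yield one composition vector per non-overlapping window."""
    for start in range(0, len(sequence) - window + 1, window):
        yield kmer_composition(sequence[start:start + window], k)
```

    Each yielded dictionary has 4^k entries (1,024 for pentanucleotides) and could serve as one input vector for a clustering method.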

  16. Personal utility in genomic testing: is there such a thing?

    PubMed

    Bunnik, Eline M; Janssens, A Cecile J W; Schermer, Maartje H N

    2015-04-01

    In ethical and regulatory discussions on new applications of genomic testing technologies, the notion of 'personal utility' has been mentioned repeatedly. It has been used to justify direct access to commercially offered genomic testing or feedback of individual research results to research or biobank participants. Sometimes research participants or consumers claim a right to genomic information with an appeal to personal utility. As of yet, no systematic account of the umbrella notion of personal utility has been given. This paper offers a definition of personal utility that places it in the middle of the spectrum between clinical utility and personal perceptions of utility, and that acknowledges its normative charge. The paper discusses two perspectives on personal utility, the healthcare perspective and the consumer perspective, and argues that these are too narrow and too wide, respectively. Instead, it proposes a normative definition of personal utility that postulates information and potential use as necessary conditions of utility. This definition entails that perceived utility does not equal personal utility, and that expert judgment may be necessary to help determine whether a genomic test can have personal utility for someone. Two examples of genomic tests are presented to illustrate the discrepancies between perceived utility and our proposed definition of personal utility. The paper concludes that while there is room for the notion of personal utility in the ethical evaluation and regulation of genomic tests, the justificatory role of personal utility is not unlimited. For in the absence of clinical validity and reasonable potential use of information, there is no personal utility.

  17. WordSeeker: concurrent bioinformatics software for discovering genome-wide patterns and word-based genomic signatures

    PubMed Central

    2010-01-01

    Background An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets. Methods This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model. Results A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center. Conclusion WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data. PMID:21210985
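
    The two worker steps named in this abstract, enumerating DNA words and scoring them against a Markov chain model, can be illustrated on a single machine as below. This is a sketch under stated assumptions rather than WordSeeker's implementation: the abstract does not give the model order or the scoring statistic, so a first-order chain and an observed/expected ratio are assumed here, all function names are hypothetical, and the distribution of work across nodes is omitted.

```python
from collections import Counter


def count_words(sequence, k):
    """Count every overlapping word of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def markov_expected(word, mono, di, positions):
    """Expected count of `word` under a first-order Markov chain.

    P(word) = P(w1) * product of P(w[i+1] | w[i]), estimated from the
    mononucleotide and dinucleotide counts of the same sequence.
    """
    prob = mono[word[0]] / sum(mono.values())
    for a, b in zip(word, word[1:]):
        # Number of observed transitions leaving base `a`.
        out_of_a = sum(c for pair, c in di.items() if pair[0] == a)
        if out_of_a == 0:
            return 0.0
        prob *= di[a + b] / out_of_a
    return prob * positions


def score_words(sequence, k):
    """Map each observed word to its observed/expected ratio (assumed score)."""
    sequence = sequence.upper()
    mono = count_words(sequence, 1)
    di = count_words(sequence, 2)
    observed = count_words(sequence, k)
    positions = len(sequence) - k + 1
    scores = {}
    for word, obs in observed.items():
        expected = markov_expected(word, mono, di, positions)
        if expected > 0:
            scores[word] = obs / expected
    return scores
```

    Ratios above 1 mark words seen more often than the background model predicts. In a distributed setting, each worker would score only its own subset of the word space.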

  18. A bioinformatics approach to reanalyze the genome annotation of kinetoplastid protozoan parasite Leishmania donovani.

    PubMed

    Pawar, Harsh; Kulkarni, Aditi; Dixit, Tanwi; Chaphekar, Deepa; Patole, Milind S

    2014-12-01

    Leishmania donovani is a kinetoplastid protozoan parasite which causes the fatal disease visceral leishmaniasis in humans. Genome sequencing of L. donovani revealed information about the arrangement of genes and genome architecture. After curation of the genome sequence, many genes in L. donovani were assigned as truncated or "partial" genes by the genome sequencing group. In the present study, we have carried out an extensive analysis and attempted to improve the gene models of these partial genes. Our analysis resulted in the identification of 308 partial genes in L. donovani, which were further categorized as C-terminal extensions, joining of genes, tandemly repeated paralogs and wrong chromosomal assignments. We have analyzed each of these genes from these categories and have improved the annotation of existing gene models in L. donovani. Some of these corrections have been confirmed by mass spectrometry derived peptide data from our previous comparative proteogenomics study in L. donovani.

  19. Personal Genomic Information Management and Personalized Medicine: Challenges, Current Solutions, and Roles of HIM Professionals

    PubMed Central

    Alzu'bi, Amal; Zhou, Leming; Watzlaf, Valerie

    2014-01-01

    In recent years, the term personalized medicine has received more and more attention in the field of healthcare. The increasing use of this term is closely related to the astonishing advancement in DNA sequencing technologies and other high-throughput biotechnologies. A large amount of personal genomic data can be generated by these technologies in a short time. Consequently, the need to manage, analyze, and interpret these personal genomic data to facilitate personalized care has escalated. In this article, we discuss the challenges for implementing genomics-based personalized medicine in healthcare, current solutions to these challenges, and the roles of health information management (HIM) professionals in genomics-based personalized medicine. PMID:24808804

  20. Overview of personalized medicine in the disease genomic era.

    PubMed

    Hong, Kyung-Won; Oh, Bermseok

    2010-10-01

    Sir William Osler (1849-1919) recognized that "variability is the law of life, and as no two faces are the same, so no two bodies are alike, and no two individuals react alike and behave alike under the abnormal conditions we know as disease". Accordingly, the traditional methods of medicine are not always best for all patients. Over the last decade, the study of genomes and their derivatives (RNA, protein and metabolite) has rapidly advanced to the point that genomic research now serves as the basis for many medical decisions and public health initiatives. Genomic tools such as sequence variation, transcription and, more recently, personal genome sequencing enable the precise prediction and treatment of disease. At present, DNA-based risk assessment for common complex diseases, application of molecular signatures for cancer diagnosis and prognosis, genome-guided therapy, and dose selection of therapeutic drugs are the important issues in personalized medicine. In order to make personalized medicine effective, these genomic techniques must be standardized and integrated into health systems and clinical workflow. In addition, full application of personalized or genomic medicine requires dramatic changes in regulatory and reimbursement policies as well as legislative protection related to privacy. This review aims to provide a general overview of these topics in the field of personalized medicine.

  1. The personal genome browser: visualizing functions of genetic variants.

    PubMed

    Juan, Liran; Teng, Mingxiang; Zang, Tianyi; Hao, Yafeng; Wang, Zhenxing; Yan, Chengwu; Liu, Yongzhuang; Li, Jie; Zhang, Tianjiao; Wang, Yadong

    2014-07-01

    Advances in high-throughput sequencing technologies have brought us into the individual genome era. Projects such as the 1000 Genomes Project have made individual genome sequencing increasingly popular. How to visualize, analyse and annotate individual genomes with knowledge bases to support genome studies and personalized healthcare remains a major challenge. The Personal Genome Browser (PGB) was developed to provide comprehensive functional annotation and visualization for individual genomes based on the genetic-molecular-phenotypic model. Investigators can easily view individual genetic variants, such as single nucleotide variants (SNVs), INDELs and structural variations (SVs), as well as genomic features and phenotypes associated with the individual genetic variants. The PGB especially highlights potential functional variants using the PGB built-in method or SIFT/PolyPhen2 scores. Moreover, the functional risks of genes can be evaluated by scanning individual genetic variants on the whole genome, a chromosome, or a cytoband based on the functional implications of the variants. Investigators can then navigate to high-risk genes on the scanned individual genome. The PGB accepts Variant Call Format (VCF) and Genetic Variation Format (GVF) files as input. The functional annotation of input individual genome variants can be visualized in real time by well-defined symbols and shapes. The PGB is available at http://www.pgbrowser.org/. PMID:24799434
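
    As a rough illustration of the variant classes mentioned in this abstract, the sketch below reads the fixed tab-separated columns of VCF data lines (CHROM, POS, ID, REF, ALT) and sorts each alternate allele into SNV, INDEL, or SV. It is not the PGB's parser, which the abstract does not describe; the function names and the classification rules are assumptions made for illustration.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT pair (illustrative rules, not the PGB's)."""
    if alt in (".", "*"):
        return "OTHER"   # no alternate allele, or allele removed by a deletion
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SV"      # symbolic allele or breakend notation
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"       # e.g. multi-nucleotide substitution


def read_vcf_records(lines):
    """Yield (chrom, pos, ref, alt, variant_class) for each ALT allele."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue     # skip meta-information, header, and blank lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            yield chrom, int(pos), ref, alt, classify_variant(ref, alt)
```

    The function accepts any iterable of lines, so an open file handle can be passed directly.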

  2. Atlas2 Cloud: a framework for personal genome analysis in the cloud

    PubMed Central

    2012-01-01

    Background Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remains a serious bottleneck. To this end, the cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. Results We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. Conclusions We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms. PMID:23134663

  3. Bioinformatic genome comparisons for taxonomic and phylogenic assignments using Aeromonas as a test case

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Prokaryotic taxonomy is the underpinning of microbiology, providing a framework for the proper identification and naming of organisms. The 'gold standard' of bacterial species delineation is the overall genome similarity as determined by DNA-DNA hybridization (DDH), a technically rigorous yet someti...

  4. Genome-scale analysis of replication timing: from bench to bioinformatics.

    PubMed

    Ryba, Tyrone; Battaglia, Dana; Pope, Benjamin D; Hiratani, Ichiro; Gilbert, David M

    2011-06-01

    Replication timing profiles are cell type-specific and reflect genome organization changes during differentiation. In this protocol, we describe how to analyze genome-wide replication timing (RT) in mammalian cells. Asynchronously cycling cells are pulse labeled with the nucleotide analog 5-bromo-2-deoxyuridine (BrdU) and sorted into S-phase fractions on the basis of DNA content using flow cytometry. BrdU-labeled DNA from each fraction is immunoprecipitated, amplified, differentially labeled and co-hybridized to a whole-genome comparative genomic hybridization microarray, which is currently more cost effective than high-throughput sequencing and equally capable of resolving features at the biologically relevant level of tens to hundreds of kilobases. We also present a guide to analyzing the resulting data sets based on methods we use routinely. Subjects include normalization, scaling and data quality measures, LOESS (local polynomial) smoothing of RT values, segmentation of data into domains and assignment of timing values to gene promoters. Finally, we cover clustering methods and means to relate changes in the replication program to gene expression and other genetic and epigenetic data sets. Some experience with R or similar programming languages is assumed. All together, the protocol takes ∼3 weeks per batch of samples.

  5. Teaching Synthetic Biology, Bioinformatics and Engineering to Undergraduates: The Interdisciplinary Build-a-Genome Course

    PubMed Central

    Dymond, Jessica S.; Scheifele, Lisa Z.; Richardson, Sarah; Lee, Pablo; Chandrasegaran, Srinivasan; Bader, Joel S.; Boeke, Jef D.

    2009-01-01

    A major challenge in undergraduate life science curricula is the continual evaluation and development of courses that reflect the constantly shifting face of contemporary biological research. Synthetic biology offers an excellent framework within which students may participate in cutting-edge interdisciplinary research and is therefore an attractive addition to the undergraduate biology curriculum. This new discipline offers the promise of a deeper understanding of gene function, gene order, and chromosome structure through the de novo synthesis of genetic information, much as synthetic approaches informed organic chemistry. While considerable progress has been achieved in the synthesis of entire viral and prokaryotic genomes, fabrication of eukaryotic genomes requires synthesis on a scale that is orders of magnitude higher. These high-throughput but labor-intensive projects serve as an ideal way to introduce undergraduates to hands-on synthetic biology research. We are pursuing synthesis of Saccharomyces cerevisiae chromosomes in an undergraduate laboratory setting, the Build-a-Genome course, thereby exposing students to the engineering of biology on a genomewide scale while focusing on a limited region of the genome. A synthetic chromosome III sequence was designed, ordered from commercial suppliers in the form of oligonucleotides, and subsequently assembled by students into ∼750-bp fragments. Once trained in assembly of such DNA “building blocks” by PCR, the students accomplish high-yield gene synthesis, becoming not only technically proficient but also constructively critical and capable of adapting their protocols as independent researchers. Regular “lab meeting” sessions help prepare them for future roles in laboratory science. PMID:19015540

  6. Genome-Scale Analysis of Replication Timing: from Bench to Bioinformatics

    PubMed Central

    Ryba, Tyrone; Battaglia, Dana; Pope, Benjamin D.; Hiratani, Ichiro; Gilbert, David M.

    2011-01-01

    SUMMARY Replication timing profiles are cell type-specific and reflect genome organization changes upon differentiation. In this protocol we describe how to analyze replication timing genome-wide in mammalian cells. Asynchronously cycling cells are pulse labeled with the nucleotide analog 5-bromo-2-deoxyuridine (BrdU) and sorted into S-phase fractions based on DNA content using flow cytometry. BrdU-labeled DNA from each fraction is immunoprecipitated, amplified, differentially labeled, and co-hybridized to a whole-genome CGH microarray, which is currently more cost effective than high-throughput sequencing and equally capable of resolving features at the biologically relevant level of tens to hundreds of kilobases. We also present a guide to analyzing the resulting datasets, based on methods we use routinely. Subjects include normalization, scaling, and data quality measures, loess (local polynomial) smoothing of replication timing values, segmentation of data into domains, and assignment of timing values to gene promoters. Finally, we cover clustering methods and means to relate changes in the replication program to gene expression and other genetic and epigenetic datasets. Some experience with R or similar programming languages is assumed. Altogether, the protocol takes approximately 3 weeks to complete. PMID:21637205

  7. Direct-to-consumer personalized genomic testing

    PubMed Central

    Bloss, Cinnamon S.; Darst, Burcu F.; Topol, Eric J.; Schork, Nicholas J.

    2011-01-01

    Over the past 18 months, there have been notable developments in the direct-to-consumer (DTC) genomic testing arena, in particular with regard to issues surrounding governmental regulation in the USA. While commentaries continue to proliferate on this topic, actual empirical research remains relatively scant. In terms of DTC genomic testing for disease susceptibility, most of the research has centered on uptake, perceptions and attitudes toward testing among health care professionals and consumers. Only a few available studies have examined actual behavioral response among consumers, and we are not aware of any studies that have examined response to DTC genetic testing for ancestry or for drug response. We propose that further research in this area is desperately needed, despite challenges in designing appropriate studies given the rapid pace at which the field is evolving. Ultimately, DTC genomic testing for common markers and conditions is only a precursor to the eventual cost-effectiveness and wide availability of whole genome sequencing of individuals, although it remains unclear whether DTC genomic information will still be attainable. Either way, however, current knowledge needs to be extended and enhanced with respect to the delivery, impact and use of increasingly accurate and comprehensive individualized genomic data. PMID:21828075

  8. Challenges of web-based personal genomic data sharing.

    PubMed

    Shabani, Mahsa; Borry, Pascal

    2015-01-01

    In order to study the relationship between genes and diseases, the increasing availability and sharing of phenotypic and genotypic data have been promoted as an imperative within the scientific community. In parallel with data sharing practices by clinicians and researchers, recent initiatives have been observed in which individuals are sharing personal genomic data. The involvement of individuals in such initiatives is facilitated by the increased accessibility of personal genomic data, offered by private test providers along with availability of online networks. Personal webpages and on-line data sharing platforms such as Consent to Research (Portable Legal Consent), Free the Data, and Genomes Unzipped are being utilized to host and share genotypes, electronic health records and family history uploaded by individuals. Although personal genomic data sharing initiatives vary in nature, the emphasis on individuals' control over their data in order to benefit research and ultimately health care has been seen as a key theme across these initiatives. In line with the growing practice of personal genomic data sharing, this paper aims to shed light on the potential challenges surrounding these initiatives. Because individuals in these initiatives are asked to balance the risks and benefits of sharing their genomic data on their own, their awareness of the implications of personal genomic data sharing for themselves and their family members is a necessity. Furthermore, given the sensitivity of genomic data and the controversies around their complete de-identifiability, potential privacy risks and harms originating from unintended uses of data have to be taken into consideration. PMID:26085313

  9. Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics.

    PubMed

    Phan, John H; Quo, Chang-Feng; Wang, May D

    2006-01-01

    The goal of this chapter is to introduce some of the available computational methods for expression analysis. Genomic and proteomic experimental techniques are briefly discussed to help the reader understand these methods and results better in context with the biological significance. Furthermore, a case study is presented that will illustrate the use of these analytical methods to extract significant biomarkers from high-throughput microarray data. Genomic and proteomic data analysis is essential for understanding the underlying factors that are involved in human disease. Currently, such experimental data are generally obtained by high-throughput microarray or mass spectrometry technologies among others. The sheer amount of raw data obtained using these methods warrants specialized computational methods for data analysis. Biomarker discovery for neurological diagnosis and prognosis is one such example. By extracting significant genomic and proteomic biomarkers in controlled experiments, we come closer to understanding how biological mechanisms contribute to neural degenerative diseases such as Alzheimer's and how drug treatments interact with the nervous system. In the biomarker discovery process, there are several computational methods that must be carefully considered to accurately analyze genomic or proteomic data. These methods include quality control, clustering, classification, feature ranking, and validation. Data quality control and normalization methods reduce technical variability and ensure that discovered biomarkers are statistically significant. Preprocessing steps must be carefully selected since they may adversely affect the results of the following expression analysis steps, which generally fall into two categories: unsupervised and supervised. Unsupervised or clustering methods can be used to group similar genomic or proteomic profiles and therefore can elucidate relationships within sample groups. These methods can also assign biomarkers to sub...

  10. lobSTR: A short tandem repeat profiler for personal genomes

    PubMed Central

    Gymrek, Melissa; Golan, David; Rosset, Saharon; Erlich, Yaniv

    2012-01-01

    Short tandem repeats (STRs) have a wide range of applications, including medical genetics, forensics, and genetic genealogy. High-throughput sequencing (HTS) has the potential to profile hundreds of thousands of STR loci. However, mainstream bioinformatics pipelines are inadequate for the task. These pipelines treat STR mapping as gapped alignment, which results in cumbersome processing times and a biased sampling of STR alleles. Here, we present lobSTR, a novel method for profiling STRs in personal genomes. lobSTR harnesses concepts from signal processing and statistical learning to avoid gapped alignment and to address the specific noise patterns in STR calling. The speed and reliability of lobSTR exceed the performance of current mainstream algorithms for STR profiling. We validated lobSTR's accuracy by measuring its consistency in calling STRs from whole-genome sequencing of two biological replicates from the same individual, by tracing Mendelian inheritance patterns in STR alleles in whole-genome sequencing of a HapMap trio, and by comparing lobSTR results to traditional molecular techniques. Encouraged by the speed and accuracy of lobSTR, we used the algorithm to conduct a comprehensive survey of STR variations in a deeply sequenced personal genome. We traced the mutation dynamics of close to 100,000 STR loci and observed more than 50,000 STR variations in a single genome. lobSTR's implementation is an end-to-end solution. The package accepts raw sequencing reads and provides the user with the genotyping results. It is written in C/C++, includes multi-threading capabilities, and is compatible with the BAM format. PMID:22522390
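
    One of the validation ideas mentioned in this abstract, tracing Mendelian inheritance of STR alleles in a trio, comes down to a simple consistency check at each locus. The sketch below is illustrative only and is not part of lobSTR: the function name is hypothetical, and genotypes are assumed to be pairs of repeat counts at a diploid autosomal locus.

```python
def is_mendelian_consistent(child, mother, father):
    """Return True if the child's two alleles can be explained by
    one allele inherited from the mother and one from the father.

    Each genotype is a 2-tuple of allele lengths (e.g. repeat counts).
    """
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


def mendelian_error_rate(trio_calls):
    """Fraction of loci whose trio genotypes violate Mendelian inheritance.

    `trio_calls` is an iterable of (child, mother, father) genotype tuples.
    """
    calls = list(trio_calls)
    if not calls:
        return 0.0
    errors = sum(1 for c, m, f in calls if not is_mendelian_consistent(c, m, f))
    return errors / len(calls)
```

    An inconsistent locus can reflect either a genotyping error or a genuine de novo mutation, so this rate is only an upper bound on the genotyping error it suggests.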

  11. AnnoTALE: bioinformatics tools for identification, annotation, and nomenclature of TALEs from Xanthomonas genomic sequences.

    PubMed

    Grau, Jan; Reschke, Maik; Erkes, Annett; Streubel, Jana; Morgan, Richard D; Wilson, Geoffrey G; Koebnik, Ralf; Boch, Jens

    2016-01-01

    Transcription activator-like effectors (TALEs) are virulence factors, produced by the bacterial plant-pathogen Xanthomonas, that function as gene activators inside plant cells. Although the contribution of individual TALEs to infectivity has been shown, the specific roles of most TALEs and the overall TALE diversity in Xanthomonas spp. are not known. TALEs possess a highly repetitive DNA-binding domain, which is notoriously difficult to sequence. Here, we describe an improved method for characterizing TALE genes by the use of PacBio sequencing. We present 'AnnoTALE', a suite of applications for the analysis and annotation of TALE genes from Xanthomonas genomes, and for grouping similar TALEs into classes. Based on these classes, we propose a unified nomenclature for Xanthomonas TALEs that reveals similarities pointing to related functionalities. This new classification enables us to compare related TALEs and to identify base substitutions responsible for the evolution of TALE specificities. PMID:26876161

  12. Integrative bioinformatics for functional genome annotation: trawling for G protein-coupled receptors.

    PubMed

    Flower, Darren R; Attwood, Teresa K

    2004-12-01

    G protein-coupled receptors (GPCRs) are amongst the best studied and most functionally diverse types of cell-surface protein. The importance of GPCRs as mediators of cell function and organismal development underlies their involvement in key physiological roles and their prominence as targets for pharmacological therapeutics. In this review, we highlight the requirement for integrated protocols which underline the different perspectives offered by different sequence analysis methods. BLAST and FastA offer broad brush strokes. Motif-based search methods add the fine detail. Structural modelling offers another perspective which allows us to elucidate the physicochemical properties that underlie ligand binding. Together, these different views provide a more informative and a more detailed picture of GPCR structure and function. Many GPCRs remain orphan receptors with no identified ligand, yet as computer-driven functional genomics starts to elaborate their functions, a new understanding of their roles in cell and developmental biology will follow. PMID:15561589

  13. Genome Science and Personalized Cancer Treatment

    SciTech Connect

    Gray, Joe

    2009-08-04

    Summer Lecture Series 2009: Results from the Human Genome Project are enabling scientists to understand how individual cancers form and progress. This information, when combined with newly developed drugs, can optimize the treatment of individual cancers. Joe Gray, director of Berkeley Lab's Life Sciences Division and Associate Laboratory Director for Life and Environmental Sciences, will focus on this approach, its promise, and its current roadblocks, particularly with regard to breast cancer.

  14. Genome Science and Personalized Cancer Treatment

    SciTech Connect

    Gray, Joe

    2009-08-07

    August 4, 2009 Berkeley Lab lecture: Results from the Human Genome Project are enabling scientists to understand how individual cancers form and progress. This information, when combined with newly developed drugs, can optimize the treatment of individual cancers. Joe Gray, director of Berkeley Lab's Life Sciences Division and Associate Laboratory Director for Life and Environmental Sciences, will focus on this approach, its promise, and its current roadblocks, particularly with regard to breast cancer.

  15. Genome Science and Personalized Cancer Treatment

    ScienceCinema

    Gray, Joe

    2016-07-12

    August 4, 2009 Berkeley Lab lecture: Results from the Human Genome Project are enabling scientists to understand how individual cancers form and progress. This information, when combined with newly developed drugs, can optimize the treatment of individual cancers. Joe Gray, director of Berkeley Lab's Life Sciences Division and Associate Laboratory Director for Life and Environmental Sciences, will focus on this approach, its promise, and its current roadblocks, particularly with regard to breast cancer.

  16. [Genome-wide identification and bioinformatic analysis of PPR gene family in tomato].

    PubMed

    Ding, Anming; Li, Ling; Qu, Xu; Sun, Tingting; Chen, Yaqiong; Zong, Peng; Li, Zunqiang; Gong, Daping; Sun, Yuhe

    2014-01-01

    Pentatricopeptide repeat (PPR) genes constitute one of the largest gene families in plants and play a broad and essential role in plant growth and development. In this study, the protein sequences annotated by the tomato (S. lycopersicum L.) genome project were screened with the Pfam PPR sequences. A total of 471 putative PPR-encoding genes were identified. Based on the motifs defined in A. thaliana L., protein structure and conserved sequences for each tomato motif were analyzed. We also analyzed the phylogenetic relationships, subcellular localization, expression and GO annotation of the identified gene sequences. Our results demonstrate that the tomato PPR gene family contains two subfamilies, P and PLS, each accounting for half of the family. The PLS subfamily can be divided into four subclasses, i.e., PLS, E, E+ and DYW. Each subclass of sequences forms a clade in the phylogenetic tree. The PPR motifs were found to be highly conserved among plants. The tomato PPR genes were distributed over 12 chromosomes and most of them lack introns. The majority of PPR proteins harbor mitochondrial or chloroplast localization sequences, whereas GO analysis showed that most PPR proteins participate in RNA-related biological processes.

  17. Genomic and bioinformatic analysis of NADPH-cytochrome P450 reductase in Anopheles stephensi (Diptera: Culicidae).

    PubMed

    Suwanchaichinda, C; Brattsten, L B

    2014-01-01

    The cytochrome P450 monooxygenase (P450) enzyme system is a major mechanism of xenobiotic biotransformation. The nicotinamide adenine dinucleotide phosphate (NADPH)-cytochrome P450 reductase (CPR) is required for transfer of electrons from NADPH to P450. One CPR gene was identified in the genome of the malaria-transmitting mosquito Anopheles stephensi Liston (Diptera: Culicidae). The gene encodes a polypeptide containing highly conserved flavin mononucleotide-, flavin adenine dinucleotide-, and NADPH-binding domains, a unique characteristic of the reductase. Phylogenetic analysis revealed that the A. stephensi and other known mosquito CPRs belong to a monophyletic group distinctly separated from other insects in the same order, Diptera. Amino acid residues of CPRs involved in binding of P450 and cytochrome c are conserved between A. stephensi and the Norway rat Rattus norvegicus Berkenhout (Rodentia: Muridae). However, gene structure, particularly within the coding region, is evidently different between the two organisms. Such differences might have arisen during evolution, as also seen in the differences in P450 families and isoforms found in these organisms. CPR in the mosquito A. stephensi is expected to be active and serve as an essential component of the P450 system.

  18. The predictive capacity of personal genome sequencing.

    PubMed

    Roberts, Nicholas J; Vogelstein, Joshua T; Parmigiani, Giovanni; Kinzler, Kenneth W; Vogelstein, Bert; Velculescu, Victor E

    2012-05-01

    New DNA sequencing methods will soon make it possible to identify all germline variants in any individual at a reasonable cost. However, the ability of whole-genome sequencing to predict predisposition to common diseases in the general population is unknown. To estimate this predictive capacity, we use the concept of a "genometype." A specific genometype represents the genomes in the population conferring a specific level of genetic risk for a specified disease. Using this concept, we estimated the maximum capacity of whole-genome sequencing to identify individuals at clinically significant risk for 24 different diseases. Our estimates were derived from the analysis of large numbers of monozygotic twin pairs; twins of a pair share the same genometype and therefore identical genetic risk factors. Our analyses indicate that (i) for 23 of the 24 diseases, most of the individuals will receive negative test results; (ii) these negative test results will, in general, not be very informative, because the risk of developing 19 of the 24 diseases in those who test negative will still be, at minimum, 50 to 80% of that in the general population; and (iii) on the positive side, in the best-case scenario, more than 90% of tested individuals might be alerted to a clinically significant predisposition to at least one disease. These results have important implications for the valuation of genetic testing by industry, health insurance companies, public policy-makers, and consumers. PMID:22472521

  19. Moving beyond genome sequencing into personalized genomic medicine: biological and computing challenges

    PubMed Central

    2011-01-01

    A report of the second annual Beyond the Genome conference, held on 19-22 September 2011 at The Universities at Shady Grove, Rockville, Maryland, USA, where increases in computing that may help make personal genomics a reality were a major focus. PMID:22023790

  20. Assessing Student Understanding of the "New Biology": Development and Evaluation of a Criterion-Referenced Genomics and Bioinformatics Assessment

    NASA Astrophysics Data System (ADS)

    Campbell, Chad Edward

    Over the past decade, hundreds of studies have introduced genomics and bioinformatics (GB) curricula and laboratory activities at the undergraduate level. While these publications have facilitated the teaching and learning of cutting-edge content, there has yet to be an evaluation of the accompanying assessment tools to determine if they are meeting the quality control benchmarks set forth by the educational research community. An analysis of these assessment tools indicated that <10% referenced any quality control criteria and that none of the assessments met more than one of the quality control benchmarks. In the absence of evidence that these benchmarks had been met, it is unclear whether these assessment tools are capable of generating valid and reliable inferences about student learning. To remedy this situation, the development of a robust GB assessment aligned with the quality control benchmarks was undertaken in order to ensure evidence-based evaluation of student learning outcomes. Content validity is a central piece of construct validity, and it must be used to guide instrument and item development. This study reports on: (1) the correspondence of content validity evidence gathered from independent sources; (2) the process of item development using this evidence; (3) the results from a pilot administration of the assessment; (4) the subsequent modification of the assessment based on the pilot administration results; and (5) the results from the second administration of the assessment. Twenty-nine different subtopics within GB (Appendix B: Genomics and Bioinformatics Expert Survey) were developed based on preliminary GB textbook analyses. These subtopics were analyzed using two methods designed to gather content validity evidence: (1) a survey of GB experts (n=61) and (2) a detailed content analysis of GB textbooks (n=6). By including only the subtopics that were shown to have robust support across these sources, 22 GB subtopics were established for inclusion in the

  1. Novel Bioinformatics Method for Identification of Genome-Wide Non-Canonical Spliced Regions Using RNA-Seq Data

    PubMed Central

    Ziyar, Ahdad; Li, Philip; Wright, Zachary; Menon, Rajasree; Omenn, Gilbert S.; Cavalcoli, James D.; Kaufman, Randal J.; Sartor, Maureen A.

    2014-01-01

    Setting: During endoplasmic reticulum (ER) stress, the endoribonuclease (RNase) Ire1α initiates removal of a 26 nt region from the mRNA encoding the transcription factor Xbp1 via an unconventional mechanism (atypically within the cytosol). This causes an open reading frame-shift that leads to altered transcriptional regulation of numerous downstream genes in response to ER stress as part of the unfolded protein response (UPR). Strikingly, other examples of targeted, unconventional splicing of short mRNA regions have yet to be reported. Objective: Our goal was to develop an approach to identify non-canonical, possibly very short, splicing regions using RNA-Seq data and apply it to ER stress-induced Ire1α heterozygous and knockout mouse embryonic fibroblast (MEF) cell lines to identify additional Ire1α targets. Results: We developed a bioinformatics approach called the Read-Split-Walk (RSW) pipeline, and evaluated it using two Ire1α heterozygous and two Ire1α-null samples. The 26 nt non-canonical splice site in Xbp1 was detected as the top hit by our RSW pipeline in heterozygous samples but not in the negative control Ire1α knockout samples. We compared the Xbp1 results from our approach with results from the alignment programs BWA, Bowtie2, STAR and Exonerate and from the Unix “grep” command. We then applied our RSW pipeline to RNA-Seq data from the SKBR3 human breast cancer cell line. RSW reported a large number of non-canonical spliced regions for 108 genes in chromosome 17, which were identified by an independent study. Conclusions: We conclude that our RSW pipeline is a practical approach for identifying non-canonical splice junction sites on a genome-wide level. We demonstrate that our pipeline can detect novel splice sites in RNA-Seq data generated under similar conditions for multiple species, in our case mouse and human. PMID:24991935

  2. MEMOSys 2.0: an update of the bioinformatics database for genome-scale models and genomic data.

    PubMed

    Pabinger, Stephan; Snajder, Rene; Hardiman, Timo; Willi, Michaela; Dander, Andreas; Trajanoski, Zlatko

    2014-01-01

    The MEtabolic MOdel research and development System (MEMOSys) is a versatile database for the management, storage and development of genome-scale models (GEMs). Since its initial release, the database has undergone major improvements, and the new version introduces several new features. First, the novel concept of derived models allows users to create model hierarchies that automatically propagate modifications along their order. Second, all stored components can now be easily enhanced with additional annotations that can be directly extracted from a supplied Systems Biology Markup Language (SBML) file. Third, the web application has been substantially revised and now features new query mechanisms, an easy search system for reactions and new link-out services to publicly available databases. Fourth, the updated database now contains 20 publicly available models, which can be easily exported into standardized formats for further analysis. Fifth, MEMOSys 2.0 is now also available as a fully configured virtual image and can be found online at http://www.icbi.at/memosys and http://memosys.i-med.ac.at. Database URL: http://memosys.i-med.ac.at.

  3. Genome Paths: A Way to Personalized and Predictive Medicine

    PubMed Central

    2009-01-01

    The review is devoted to the impact of human genome research on progress in modern medicine. Basic achievements in genome research have resulted in the deciphering of the human genome and creation of a molecular landmarks map of the human haploid genome (HapMap Project), which has made a tremendous contribution to our understanding of common genetic and multifactorial (complex) disorders. Current genome studies mainly focus on genetic testing and gene association studies of multifactorial (complex) diseases, with the aim of efficient diagnosis and prevention. Identification of candidate ("predisposition") genes participating in the functional genetic modules underlying each common disorder, and the use of this genetic background to elaborate sophisticated measures to efficiently prevent such disorders, constitute a major goal in personalized molecular medicine. The concept of a genetic pass as an individual DNA databank reflecting inherited human predisposition to different complex and monogenic disorders, with special emphasis on its present state, and the numerous difficulties related to the practical implementation of personalized medicine are outlined. The problems related to the uncertainty of the results of genetic testing could be overcome at least partly by means of new technological achievements in genome research methods, such as genome-wide association studies (GWAS), massively parallel DNA sequencing, and genetic and epigenetic profiling. The basic task of genomics today is to properly estimate the clinical value of genetic testing and its applicability in clinical practice. Feasible ways towards the gradual implementation of personal genetic data, in line with routine laboratory tests, for the benefit of clinical practice are discussed. PMID:22649616

  4. Human genome and the African personality: implications for social work.

    PubMed

    Mickel, Elijah; Miller, Sheila D

    2011-01-01

    The integration of the human genome with the African personality should be viewed as an interdependent whole. The African personality, for purposes of this article, comprises Black experiences, Negritude, and an Africa-centered axiology and epistemology. The outcome results in a spiritually focused collective consciousness. Anthropologically, historically, and (with the Human Genome Project) genetically, Africa has proven to be the source of all human life. Viewed through the African personality, humankind, wherever it exists on the planet, must be seen as interconnected. Although racism and its progeny, discrimination, predate the Human Genome Project (HGP), the human genome provides an evidence-based rationale for the end to all policy and subsequent practice based on race and racism. Policy must be based on evidence to be competent practice. It would be remiss, if not irresponsible, for social work and the other behavioral sciences concerned with intervention and prevention not to incorporate the findings of the HGP. The African personality is a concept that provides a wholistic way to evaluate human behavior from an African worldview.

  5. Re-Examining the Gene in Personalized Genomics

    ERIC Educational Resources Information Center

    Bartol, Jordan

    2013-01-01

    Personalized genomics companies (PG; also called "direct-to-consumer genetics") are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept…

  6. Getting up close and personal with your genome.

    PubMed

    Bonetta, Laura

    2008-05-30

    A new type of company is offering to scan a person's genome and reveal the information it holds for as little as $1000. Are these services fun novelty items or do they provide valuable information that will help people take better care of their health? PMID:18510915

  7. Anticipation of Personal Genomics Data Enhances Interest and Learning Environment in Genomics and Molecular Biology Undergraduate Courses.

    PubMed

    Weber, K Scott; Jensen, Jamie L; Johnson, Steven M

    2015-01-01

    An important discussion at colleges is centered on determining more effective models for teaching undergraduates. As personalized genomics has become more common, we hypothesized it could be a valuable tool to make science education more hands-on, personal, and engaging for college undergraduates. We hypothesized that providing students with personal genome testing kits would enhance the learning experience of students in two undergraduate courses at Brigham Young University: Advanced Molecular Biology and Genomics. These courses have an emphasis on personal genomics in the last two weeks of the semester. Students taking these courses were given the option to receive personal genomics kits in 2014, whereas in 2015 they were not. Students sent their personal genomics samples in on their own and received the data after the course ended. We surveyed students in these courses before and after the two-week emphasis on personal genomics to collect data on whether anticipation of obtaining their own personal genomic data impacted undergraduate student learning. We also tested to see if specific personal genomic assignments improved the learning experience by analyzing the data from the undergraduate students who completed both the pre- and post-course surveys. Anticipation of personal genomic data significantly enhanced student interest and the learning environment based on the time students spent researching personal genomic material and their self-reported attitudes compared to those who did not anticipate getting their own data. Personal genomics homework assignments significantly enhanced the undergraduate student interest and learning based on the same criteria and a personal genomics quiz. We found that for the undergraduate students in both molecular biology and genomics courses, incorporation of personal genomic testing can be an effective educational tool in undergraduate science education. PMID:26241308

  9. Making Personalized Health Care Even More Personalized: Insights From Activities of the IOM Genomics Roundtable

    PubMed Central

    David, Sean P.; Johnson, Samuel G.; Berger, Adam C.; Feero, W. Gregory; Terry, Sharon F.; Green, Larry A.; Phillips, Robert L.; Ginsburg, Geoffrey S.

    2015-01-01

    Genomic research has generated much new knowledge into mechanisms of human disease, with the potential to catalyze novel drug discovery and development, prenatal and neonatal screening, clinical pharmacogenomics, more sensitive risk prediction, and enhanced diagnostics. Genomic medicine, however, has been limited by critical evidence gaps, especially those related to clinical utility and applicability to diverse populations. Genomic medicine may have the greatest impact on health care if it is integrated into primary care, where most health care is received and where evidence supports the value of personalized medicine grounded in continuous healing relationships. Redesigned primary care is the most relevant setting for clinically useful genomic medicine research. Taking insights gained from the activities of the Institute of Medicine (IOM) Roundtable on Translating Genomic-Based Research for Health, we apply lessons learned from the patient-centered medical home national experience to implement genomic medicine in a patient-centered, learning health care system. PMID:26195686

  10. Discover hidden splicing variations by mapping personal transcriptomes to personal genomes.

    PubMed

    Stein, Shayna; Lu, Zhi-Xiang; Bahrami-Samani, Emad; Park, Juw Won; Xing, Yi

    2015-12-15

    RNA-seq has become a popular technology for studying genetic variation of pre-mRNA alternative splicing. Commonly used RNA-seq aligners rely on the consensus splice site dinucleotide motifs to map reads across splice junctions. Consequently, genomic variants that create novel splice site dinucleotides may produce splice junction RNA-seq reads that cannot be mapped to the reference genome. We developed and evaluated an approach to identify 'hidden' splicing variations in personal transcriptomes, by mapping personal RNA-seq data to personal genomes. Computational analysis and experimental validation indicate that this approach identifies personal specific splice junctions at a low false positive rate. Applying this approach to an RNA-seq data set of 75 individuals, we identified 506 personal specific splice junctions, among which 437 were novel splice junctions not documented in current human transcript annotations. 94 splice junctions had splice site SNPs associated with GWAS signals of human traits and diseases. These involve genes whose splicing variations have been implicated in diseases (such as OAS1), as well as novel associations between alternative splicing and diseases (such as ICA1). Collectively, our work demonstrates that the personal genome approach to RNA-seq read alignment enables the discovery of a large but previously unknown catalog of splicing variations in human populations.
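
    The central idea, that a single variant can create a splice-site dinucleotide absent from the reference, can be sketched as follows. This is a minimal illustration assuming the canonical GT (donor) and AG (acceptor) motifs and 0-based positions; the function names are hypothetical and this is not the authors' pipeline.

```python
def apply_snp(ref, pos, alt):
    """Return ref with the base at 0-based position pos replaced by alt."""
    return ref[:pos] + alt + ref[pos + 1:]


def novel_splice_dinucleotides(ref, pos, alt, motifs=("GT", "AG")):
    """List (start, ref_dinucleotide, personal_dinucleotide) for motifs
    that the SNP creates in the personal sequence but that are absent
    at the same place in the reference."""
    personal = apply_snp(ref, pos, alt)
    novel = []
    # A SNP at pos can only alter the two dinucleotides that overlap it:
    # the one starting at pos - 1 and the one starting at pos.
    for start in (pos - 1, pos):
        if start < 0 or start + 2 > len(ref):
            continue
        before = ref[start:start + 2]
        after = personal[start:start + 2]
        if after in motifs and before != after:
            novel.append((start, before, after))
    return novel


if __name__ == "__main__":
    # Reference GC becomes GT in the personal genome (C>T at position 3).
    print(novel_splice_dinucleotides("CCGCAA", 3, "T"))
```

    An aligner that only accepts junctions flanked by these motifs in the reference would see GC at this position and could fail to place reads spliced there, which is the situation the abstract describes.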

  11. Discover hidden splicing variations by mapping personal transcriptomes to personal genomes

    PubMed Central

    Stein, Shayna; Lu, Zhi-xiang; Bahrami-Samani, Emad; Park, Juw Won; Xing, Yi

    2015-01-01

    RNA-seq has become a popular technology for studying genetic variation of pre-mRNA alternative splicing. Commonly used RNA-seq aligners rely on the consensus splice site dinucleotide motifs to map reads across splice junctions. Consequently, genomic variants that create novel splice site dinucleotides may produce splice junction RNA-seq reads that cannot be mapped to the reference genome. We developed and evaluated an approach to identify ‘hidden’ splicing variations in personal transcriptomes, by mapping personal RNA-seq data to personal genomes. Computational analysis and experimental validation indicate that this approach identifies personal specific splice junctions at a low false positive rate. Applying this approach to an RNA-seq data set of 75 individuals, we identified 506 personal specific splice junctions, among which 437 were novel splice junctions not documented in current human transcript annotations. 94 splice junctions had splice site SNPs associated with GWAS signals of human traits and diseases. These involve genes whose splicing variations have been implicated in diseases (such as OAS1), as well as novel associations between alternative splicing and diseases (such as ICA1). Collectively, our work demonstrates that the personal genome approach to RNA-seq read alignment enables the discovery of a large but previously unknown catalog of splicing variations in human populations. PMID:26578562

  12. Company strategies for using bioinformatics.

    PubMed

    Bains, W

    1996-08-01

    Bioinformatics enables biotechnology companies to access and analyse their growing databases of experimental results, and to exploit public data from genome programmes and other sources. Traditionally occupying the domain of a 'guru' supplying answers to infrequent research questions, corporate bioinformatics is breaking down under the flood of data. New, more robust, professional and expandable systems will give scientists effective access to new tools. This review outlines how companies have evolved beyond the 'guru', and have organized their bioinformatics by acquiring or developing bioinformatics resources. It also describes why the biologist must be central to this process, and why this is a problem for computer professionals to solve, not for 'gurus'.

  13. Education and personalized genomics: deciphering the public's genetic health report

    PubMed Central

    Lamb, Neil E; Myers, Richard M; Gunter, Chris

    2010-01-01

    Where do members of the public turn to understand what genetic tests mean in terms of their own health? Now that genome-wide association studies and complete genome sequencing are widely available, the importance of education in personalized genomics cannot be overstated. Although some media have introduced the concept of genetic testing to better understand health and disease, the public's understanding of the scope and impact of genetic variation has not kept up with the pace of the science or technology. Unfortunately, the likely sources to which the public turns for guidance – their physician and the media – are often no better prepared. We examine several venues for information, including print and online guides for both lay and health-oriented audiences, and summarize selected resources in multiple formats. We also note the roadblocks to progress and discuss ways to remove them, as urgent action is needed to connect people with their genomes in a meaningful way. PMID:20161675

  14. Identification of conserved and polymorphic STRs for personal genomes

    PubMed Central

    2014-01-01

    Background: Short tandem repeats (STRs) are abundant in human genomes. Numerous STRs have been shown to be associated with genetic diseases and gene regulatory functions, and have been selected as genetic markers for evolutionary and forensic analyses. High-throughput next generation sequencers have fostered new cutting-edge computing techniques for genome-scale analyses, and cross-genome comparisons have facilitated the efficient identification of polymorphic STR markers for various applications. Results: An automated and efficient system for detecting human polymorphic STRs at the genome scale is proposed in this study. Assembled contigs from next generation sequencing data were aligned and calibrated according to selected reference sequences. To verify identified polymorphic STRs, human genomes from the 1000 Genomes Project were employed for comprehensive analyses, and STR markers from the Combined DNA Index System (CODIS) and disease-related STR motifs were also applied as cases for evaluation. In addition, we analyzed STR variations for highly conserved homologous genes and human-unique genes. In total 477 polymorphic STRs were identified from 492 human-unique genes, among which 26 STRs were retrieved and clustered into three different groups for efficient comparison. Conclusions: We have developed an online system that efficiently identifies polymorphic STRs and provides novel distinguishable STR biomarkers for different levels of specificity. Candidate polymorphic STRs within a personal genome could be easily retrieved and compared to the constructed STR profile through query keywords, gene names, or assembled contigs. PMID:25560225

  15. Bioinformatics for Genome Analysis

    SciTech Connect

    Gary J. Olsen

    2005-06-30

    Nesbo, Boucher and Doolittle (2001) used phylogenetic trees of four taxa to assess whether euryarchaeal genes share a common history. They reported that, among the 521 genes examined, each of the three possible tree topologies relating the four taxa was supported an essentially equal number of times. They suggest that this might be the result of numerous horizontal gene transfer events, essentially randomizing the relationships between gene histories (as inferred in the 521 gene trees) and organismal relationships (which would be a single underlying tree). Motivated by the fact that the order in which sequences are added to a multiple sequence alignment influences the alignment, and ultimately the inferred tree, the investigators were interested in the extent to which the variations among inferred trees might be due to variations in the alignment order. This bears directly on their efforts to evaluate and improve upon methods of multiple sequence alignment. They set out to analyze the influence of alignment order on the tree inferred for 43 genes shared among these same 4 taxa. Because alignments produced by CLUSTALW are directed by a rooted guide tree (the dendrogram), there are 15 possible alignment orders of 4 taxa. For each gene they tested all 15 alignment orders, and as a 16th option, allowed CLUSTALW to generate its own guide tree. With all 15 possible rooted guide trees supplied, they expected at least one of them to perform as well as CLUSTAL's own guide tree, but most of the time the results differed (sometimes being better than with CLUSTAL's default tree and sometimes being worse). The difference seems to be that the user-supplied tree is not given meaningful branch lengths, which affect the assumed probability of amino acid changes. They examined the practicality of modifying CLUSTALW to improve its treatment of user-supplied guide trees. This work became increasingly bogged down in finding and repairing minor bugs in the CLUSTALW code. This effort was put on hold because the investigators judged that their other proposed approaches would ultimately be better.
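
    The figure of 15 alignment orders follows from counting rooted binary trees on four labelled taxa: (2n-3)!! = 5 × 3 × 1 = 15 for n = 4. As a small check, the sketch below enumerates them as nested tuples. The function name and the taxon labels A to D are placeholders, not identifiers from the report.

```python
def rooted_trees(taxa):
    """Yield every rooted binary tree on the given taxa as nested tuples."""
    taxa = list(taxa)
    if len(taxa) == 1:
        yield taxa[0]
        return
    first, rest = taxa[0], taxa[1:]
    # Split the taxa into two non-empty groups. Keeping `first` in the left
    # group means each unordered split is generated exactly once.
    # The excluded mask (all bits set) would leave the right group empty.
    for mask in range(2 ** len(rest) - 1):
        left = [first] + [t for i, t in enumerate(rest) if (mask >> i) & 1]
        right = [t for i, t in enumerate(rest) if not (mask >> i) & 1]
        for left_tree in rooted_trees(left):
            for right_tree in rooted_trees(right):
                yield (left_tree, right_tree)


trees = list(rooted_trees("ABCD"))
for tree in trees:
    print(tree)
```

    Each printed tuple corresponds to one rooted guide-tree shape with a specific labelling; writing these as guide-tree files for an aligner is a separate step not shown here.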

  16. GenePING: secure, scalable management of personal genomic data

    PubMed Central

    Adida, Ben; Kohane, Isaac S

    2006-01-01

    Background Patient genomic data are rapidly becoming part of clinical decision making. Within a few years, full genome expression profiling and genotyping will be affordable enough to perform on every individual. The management of such sizeable, yet fine-grained, data in compliance with privacy laws and best practices presents significant security and scalability challenges. Results We present the design and implementation of GenePING, an extension to the PING personal health record system that supports secure storage of large, genome-sized datasets, as well as efficient sharing and retrieval of individual datapoints (e.g. SNPs, rare mutations, gene expression levels). Even with full access to the raw GenePING storage, an attacker cannot discover any stored genomic datapoint on any single patient. Given a large-enough number of patient records, an attacker cannot discover which data corresponds to which patient, or even the size of a given patient's record. The computational overhead of GenePING's security features is a small constant, making the system usable, even in emergency care, on today's hardware. Conclusion GenePING is the first personal health record management system to support the efficient and secure storage and sharing of large genomic datasets. GenePING is available online at , licensed under the LGPL. PMID:16638151

  17. Systematic evaluation of personal genome services for Japanese individuals.

    PubMed

    Kido, Takashi; Kawashima, Minae; Nishino, Seiji; Swan, Melanie; Kamatani, Naoyuki; Butte, Atul J

    2013-11-01

    Disease risk prediction (DRP) is one of the most important challenges in personal genome research. Although many direct-to-consumer genetic test (DTC) companies have begun to offer personal genome services for DRP, there is still no consensus on what constitutes a gold-standard service. Here, we systematically evaluated the distributions of DRPs from three DTC companies, that is, 23andMe, Navigenics and deCODEme, for 22 diseases using three Japanese samples. We systematically quantified and analyzed the differences between each DTC company's DRPs. Our independence test showed that the overall prediction results were correlated with each other, but not perfectly matched; less than one-third mismatching of the opposite direction occurred in eight diseases. Moreover, we found that the differences could mainly be attributed to four factors: (1) single nucleotide polymorphism (SNP) selection, (2) average risk estimation, (3) the disease risk calculation algorithm and (4) ethnicity adjustment. In particular, only 7.1% of SNPs over 22 diseases were reviewed by all three companies. Therefore, development of a universal core SNP list for non-Caucasian samples will be important for achieving better prediction capacity for Japanese samples. This systematic methodology provides useful insights for improving the capacity of DRPs in future personal genome services. PMID:24067293

  18. Attitudes regarding privacy of genomic information in personalized cancer therapy

    PubMed Central

    Rogith, Deevakar; Yusuf, Rafeek A; Hovick, Shelley R; Peterson, Susan K; Burton-Chase, Allison M; Li, Yisheng; Meric-Bernstam, Funda; Bernstam, Elmer V

    2014-01-01

    Objective To evaluate attitudes regarding privacy of genomic data in a sample of patients with breast cancer. Methods Female patients with breast cancer (n=100) completed a questionnaire assessing attitudes regarding concerns about privacy of genomic data. Results Most patients (83%) indicated that genomic data should be protected. However, only 13% had significant concerns regarding privacy of such data. Patients expressed more concern about insurance discrimination than employment discrimination (43% vs 28%, p<0.001). They expressed less concern about research institutions protecting the security of their molecular data than government agencies or drug companies (20% vs 38% vs 44%; p<0.001). Most did not express concern regarding the association of their genomic data with their name and personal identity (49% concerned), billing and insurance information (44% concerned), or clinical data (27% concerned). Significantly fewer patients were concerned about the association with clinical data than other data types (p<0.001). In the absence of direct benefit, patients were more willing to consent to sharing of deidentified than identified data with researchers not involved in their care (76% vs 60%; p<0.001). Most (85%) patients were willing to consent to DNA banking. Discussion While patients are opposed to indiscriminate release of genomic data, privacy does not appear to be their primary concern. Furthermore, we did not find any specific predictors of privacy concerns. Conclusions Patients generally expressed low levels of concern regarding privacy of genomic data, and many expressed willingness to consent to sharing their genomic data with researchers. PMID:24737606

  19. Biology in 'silico': The Bioinformatics Revolution.

    ERIC Educational Resources Information Center

    Bloom, Mark

    2001-01-01

    Explains the Human Genome Project (HGP) and efforts to sequence the human genome. Describes the role of bioinformatics in the project and characterizes it as the Swiss Army knife of genetics, with uses in forensic science, medicine, agriculture, and environmental sciences. Discusses the use of bioinformatics in the high school…

  20. Virtual bioinformatics distance learning suite*.

    PubMed

    Tolvanen, Martti; Vihinen, Mauno

    2004-05-01

    Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material over the Internet. Currently, we provide two fully computer-based courses, "Introduction to Bioinformatics" and "Bioinformatics in Functional Genomics." Here we will discuss the application of distance learning in bioinformatics training and our experiences gained during the 3 years that we have run the courses, with about 400 students from a number of universities. The courses are available at bioinf.uta.fi.

  1. TIARA: a database for accurate analysis of multiple personal genomes based on cross-technology

    PubMed Central

    Hong, Dongwan; Park, Sung-Soo; Ju, Young Seok; Kim, Sheehyun; Shin, Jong-Yeon; Kim, Sujung; Yu, Saet-Byeol; Lee, Won-Chul; Lee, Seungbok; Park, Hansoo; Kim, Jong-Il; Seo, Jeong-Sun

    2011-01-01

    High-throughput genomic technologies have been used to explore personal human genomes for the past few years. Although the integration of technologies is important for high-accuracy detection of personal genomic variations, no databases have been prepared to systematically archive genomes and to facilitate the comparison of personal genomic data sets prepared using a variety of experimental platforms. We describe here the Total Integrated Archive of Short-Read and Array (TIARA; http://tiara.gmi.ac.kr) database, which contains personal genomic information obtained from next generation sequencing (NGS) techniques and ultra-high-resolution comparative genomic hybridization (CGH) arrays. This database improves the accuracy of detecting personal genomic variations, such as SNPs, short indels and structural variants (SVs). At present, 36 individual genomes have been archived and may be displayed in the database. TIARA supports a user-friendly genome browser, which retrieves read-depths (RDs) and log2 ratios from NGS and CGH arrays, respectively. In addition, this database provides information on all genomic variants and the raw data, including short reads and feature-level CGH data, through anonymous file transfer protocol. More personal genomes will be archived as more individuals are analyzed by NGS or CGH array. TIARA provides a new approach to the accurate interpretation of personal genomes for genome research. PMID:21051338

  2. Illuminating the Black Box of Genome Sequence Assembly: A Free Online Tool to Introduce Students to Bioinformatics

    ERIC Educational Resources Information Center

    Taylor, D. Leland; Campbell, A. Malcolm; Heyer, Laurie J.

    2013-01-01

    Next-generation sequencing technologies have greatly reduced the cost of sequencing genomes. With the current sequencing technology, a genome is broken into fragments and sequenced, producing millions of "reads." A computer algorithm pieces these reads together in the genome assembly process. PHAST is a set of online modules…

  3. Diagnosis of an imprinted-gene syndrome by a novel bioinformatics analysis of whole-genome sequences from a family trio.

    PubMed

    Bodian, Dale L; Solomon, Benjamin D; Khromykh, Alina; Thach, Dzung C; Iyer, Ramaswamy K; Link, Kathleen; Baker, Robin L; Baveja, Rajiv; Vockley, Joseph G; Niederhuber, John E

    2014-11-01

    Whole-genome sequencing and whole-exome sequencing are becoming more widely applied in clinical medicine to help diagnose rare genetic diseases. Identification of the underlying causative mutations by genome-wide sequencing is greatly facilitated by concurrent analysis of multiple family members, most often the mother-father-proband trio, using bioinformatics pipelines that filter genetic variants by mode of inheritance. However, current pipelines are limited to Mendelian inheritance patterns and do not specifically address disorders caused by mutations in imprinted genes, such as forms of Angelman syndrome and Beckwith-Wiedemann syndrome. Using publicly available tools, we implemented a genetic inheritance search mode to identify imprinted-gene mutations. Application of this search mode to whole-genome sequences from a family trio led to a diagnosis for a proband for whom extensive clinical testing and Mendelian inheritance-based sequence analysis were nondiagnostic. The condition in this patient, IMAGe syndrome, is likely caused by the heterozygous mutation c.832A>G (p.Lys278Glu) in the imprinted gene CDKN1C. The genotypes and disease status of six members of the family are consistent with maternal expression of the gene, and allele-biased expression was confirmed by RNA-Seq for the heterozygotes. This analysis demonstrates that an imprinted-gene search mode is a valuable addition to genome sequence analysis pipelines for identifying disease-causative variants. PMID:25614875
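
    The abstract does not give the implementation of the imprinted-gene search mode, so the following is only a rough sketch of what such a filter could look like. It assumes VCF-style genotype strings ("0/0", "0/1", "1/1") and a hypothetical lookup table of imprinted genes. The CDKN1C entry reflects the maternal expression reported in the abstract; all other names are illustrative.

```python
# Hypothetical table: imprinted gene -> parent whose allele is expressed.
EXPRESSED_PARENT = {"CDKN1C": "mother"}


def is_het(genotype):
    """True for an unphased heterozygous call such as '0/1'."""
    return genotype in ("0/1", "1/0")


def carries_alt(genotype):
    """True if the genotype contains at least one alternate allele."""
    return "1" in genotype


def imprinted_candidates(variants):
    """Keep heterozygous proband variants in imprinted genes that are
    carried only by the parent whose copy of the gene is expressed."""
    hits = []
    for variant in variants:
        parent = EXPRESSED_PARENT.get(variant["gene"])
        if parent is None:
            continue  # gene is not in the imprinted-gene table
        other = "father" if parent == "mother" else "mother"
        if (is_het(variant["proband"])
                and carries_alt(variant[parent])
                and not carries_alt(variant[other])):
            hits.append(variant)
    return hits


# Made-up trio calls for illustration only.
trio_calls = [
    {"gene": "CDKN1C", "proband": "0/1", "mother": "0/1", "father": "0/0"},
    {"gene": "CDKN1C", "proband": "0/1", "mother": "0/0", "father": "0/1"},
    {"gene": "BRCA2", "proband": "0/1", "mother": "0/1", "father": "0/0"},
]
candidates = imprinted_candidates(trio_calls)
```

    Under this rule a heterozygous carrier mother can be unaffected, which a dominant Mendelian filter would typically reject. The sketch would miss a de novo mutation on the expressed allele, since that cannot be resolved without phasing.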

  4. Diagnosis of an imprinted-gene syndrome by a novel bioinformatics analysis of whole-genome sequences from a family trio

    PubMed Central

    Bodian, Dale L; Solomon, Benjamin D; Khromykh, Alina; Thach, Dzung C; Iyer, Ramaswamy K; Link, Kathleen; Baker, Robin L; Baveja, Rajiv; Vockley, Joseph G; Niederhuber, John E

    2014-01-01

    Whole-genome sequencing and whole-exome sequencing are becoming more widely applied in clinical medicine to help diagnose rare genetic diseases. Identification of the underlying causative mutations by genome-wide sequencing is greatly facilitated by concurrent analysis of multiple family members, most often the mother–father–proband trio, using bioinformatics pipelines that filter genetic variants by mode of inheritance. However, current pipelines are limited to Mendelian inheritance patterns and do not specifically address disorders caused by mutations in imprinted genes, such as forms of Angelman syndrome and Beckwith–Wiedemann syndrome. Using publicly available tools, we implemented a genetic inheritance search mode to identify imprinted-gene mutations. Application of this search mode to whole-genome sequences from a family trio led to a diagnosis for a proband for whom extensive clinical testing and Mendelian inheritance-based sequence analysis were nondiagnostic. The condition in this patient, IMAGe syndrome, is likely caused by the heterozygous mutation c.832A>G (p.Lys278Glu) in the imprinted gene CDKN1C. The genotypes and disease status of six members of the family are consistent with maternal expression of the gene, and allele-biased expression was confirmed by RNA-Seq for the heterozygotes. This analysis demonstrates that an imprinted-gene search mode is a valuable addition to genome sequence analysis pipelines for identifying disease-causative variants. PMID:25614875

  5. Bioinformatic tools for using whole genome sequencing as a rapid high resolution diagnostic typing tool when tracing bioterror organisms in the food and feed chain.

    PubMed

    Segerman, Bo; De Medici, Dario; Ehling Schulz, Monika; Fach, Patrick; Fenicia, Lucia; Fricker, Martina; Wielinga, Peter; Van Rotterdam, Bart; Knutsson, Rickard

    2011-03-01

    The rapid technological development in the field of parallel sequencing offers new opportunities when tracing and tracking microorganisms in the food and feed chain. If a bioterror organism is deliberately spread, it is of crucial importance to obtain as much information as possible about the strain as quickly as possible to aid the decision process and select suitable controls, tracing and tracking tools. Considerable effort has been made to sequence multiple strains of potential bioterror organisms, so a relatively large set of reference genomes is available. This study is focused on how to use parallel sequencing for rapid phylogenomic analysis and to screen for genetic modifications. A bioinformatic methodology has been developed to rapidly analyze sequence data with minimal post-processing. Instead of assembling the genome, defining genes, defining orthologous relations and calculating distances, the present method can achieve a similar high resolution directly from the raw sequence data. The method defines orthologous sequence reads instead of orthologous genes and the average similarity of the core genome (ASC) is calculated. The sequence reads from the core and from the non-conserved genomic regions can also be separated for further analysis. Finally, the comparison algorithm is used to visualize the phylogenomic diversity of the bacterial bioterror organisms Bacillus anthracis and Clostridium botulinum using heat plot diagrams.

  6. Re-examining the Gene in Personalized Genomics

    NASA Astrophysics Data System (ADS)

    Bartol, Jordan

    2013-10-01

    Personalized genomics companies (PG; also called 'direct-to-consumer genetics') are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept presented to customers and the relation between the information given and the science behind PG. Two quite different gene concepts are present in company rhetoric, but only one features in the science. To explain this, we must appreciate the delicate tension between PG, academic science, public expectation, and market forces.

  7. hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms.

    PubMed

    Xie, Qingjun; Tzfadia, Oren; Levy, Matan; Weithorn, Efrat; Peled-Zehavi, Hadas; Van Parys, Thomas; Van de Peer, Yves; Galili, Gad

    2016-05-01

    Most of the proteins that are specifically turned over by selective autophagy are recognized by the presence of short Atg8 interacting motifs (AIMs) that facilitate their association with the autophagy apparatus. Such AIMs can be identified by bioinformatics methods based on their defined degenerate consensus F/W/Y-X-X-L/I/V sequences in which X represents any amino acid. Achieving reliability and/or fidelity of the prediction of such AIMs on a genome-wide scale represents a major challenge. Here, we present a bioinformatics approach, high fidelity AIM (hfAIM), which uses additional sequence requirements (the presence of acidic amino acids and the absence of positively charged amino acids in certain positions) to reliably identify AIMs in proteins. We demonstrate that the use of the hfAIM method allows for in silico high fidelity prediction of AIMs in AIM-containing proteins (ACPs) on a genome-wide scale in various organisms. Furthermore, by using hfAIM to identify putative AIMs in the Arabidopsis proteome, we illustrate a potential contribution of selective autophagy to various biological processes. More specifically, we identified 9 peroxisomal PEX proteins that contain hfAIM motifs, among which AtPEX1, AtPEX6 and AtPEX10 possess evolutionary-conserved AIMs. Bimolecular fluorescence complementation (BiFC) results verified that AtPEX6 and AtPEX10 indeed interact with Atg8 in planta. In addition, we show that mutations occurring within or near hfAIMs in PEX1, PEX6 and PEX10 caused defects in the growth and development of various organisms. Taken together, the above results suggest that the hfAIM tool can be used to effectively perform genome-wide in silico screens of proteins that are potentially regulated by selective autophagy. The hfAIM system is a web tool that can be accessed at http://bioinformatics.psb.ugent.be/hfAIM/. PMID:27071037
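
    The degenerate core consensus quoted in the abstract, F/W/Y-X-X-L/I/V, translates directly into a regular expression. The sketch below scans a protein sequence for that core motif only. It does not implement hfAIM's additional requirements on acidic and positively charged residues, whose exact positions the abstract does not specify, so it would over-predict relative to the published tool. The function name and test peptides are made up for illustration.

```python
import re

# Core consensus only: [F/W/Y]-X-X-[L/I/V]. The lookahead lets
# overlapping motifs be reported separately.
CORE_AIM = re.compile(r"(?=([FWY]..[LIV]))")


def find_core_aims(protein):
    """Return (1-based position, motif) for each core AIM match."""
    return [
        (match.start() + 1, match.group(1))
        for match in CORE_AIM.finditer(protein.upper())
    ]


single = find_core_aims("MSDEWEELAA")      # one motif, at the W
overlapping = find_core_aims("AYFAVLA")    # two motifs sharing residues
```

    Because a four-residue motif with two free positions occurs by chance in many proteins, matches of this kind are best treated as candidates for further filtering rather than as predictions.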

  8. Bioinformatics pipelines for targeted resequencing and whole-exome sequencing of human and mouse genomes: a virtual appliance approach for instant deployment.

    PubMed

    Li, Jason; Doyle, Maria A; Saeed, Isaam; Wong, Stephen Q; Mar, Victoria; Goode, David L; Caramia, Franco; Doig, Ken; Ryland, Georgina L; Thompson, Ella R; Hunter, Sally M; Halgamuge, Saman K; Ellul, Jason; Dobrovic, Alexander; Campbell, Ian G; Papenfuss, Anthony T; McArthur, Grant A; Tothill, Richard W

    2014-01-01

    Targeted resequencing by massively parallel sequencing has become an effective and affordable way to survey small to large portions of the genome for genetic variation. Despite the rapid development in open source software for analysis of such data, the practical implementation of these tools through construction of sequencing analysis pipelines still remains a challenging and laborious activity, and a major hurdle for many small research and clinical laboratories. We developed TREVA (Targeted REsequencing Virtual Appliance), making pre-built pipelines immediately available as a virtual appliance. Based on virtual machine technologies, TREVA is a solution for rapid and efficient deployment of complex bioinformatics pipelines to laboratories of all sizes, enabling reproducible results. The analyses that are supported in TREVA include: somatic and germline single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses such as pathway and significantly mutated genes analyses. TREVA is flexible and easy to use, and can be customised by Linux-based extensions if required. TREVA can also be deployed on the cloud (cloud computing), enabling instant access without investment overheads for additional hardware. TREVA is available at http://bioinformatics.petermac.org/treva/. PMID:24752294

  9. Bioinformatics Pipelines for Targeted Resequencing and Whole-Exome Sequencing of Human and Mouse Genomes: A Virtual Appliance Approach for Instant Deployment

    PubMed Central

    Li, Jason; Doyle, Maria A.; Saeed, Isaam; Wong, Stephen Q.; Mar, Victoria; Goode, David L.; Caramia, Franco; Doig, Ken; Ryland, Georgina L.; Thompson, Ella R.; Hunter, Sally M.; Halgamuge, Saman K.; Ellul, Jason; Dobrovic, Alexander; Campbell, Ian G.; Papenfuss, Anthony T.; McArthur, Grant A.; Tothill, Richard W.

    2014-01-01

    Targeted resequencing by massively parallel sequencing has become an effective and affordable way to survey small to large portions of the genome for genetic variation. Despite the rapid development in open source software for analysis of such data, the practical implementation of these tools through construction of sequencing analysis pipelines still remains a challenging and laborious activity, and a major hurdle for many small research and clinical laboratories. We developed TREVA (Targeted REsequencing Virtual Appliance), making pre-built pipelines immediately available as a virtual appliance. Based on virtual machine technologies, TREVA is a solution for rapid and efficient deployment of complex bioinformatics pipelines to laboratories of all sizes, enabling reproducible results. The analyses that are supported in TREVA include: somatic and germline single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses such as pathway and significantly mutated genes analyses. TREVA is flexible and easy to use, and can be customised by Linux-based extensions if required. TREVA can also be deployed on the cloud (cloud computing), enabling instant access without investment overheads for additional hardware. TREVA is available at http://bioinformatics.petermac.org/treva/. PMID:24752294

  10. Genome-wide association study of antisocial personality disorder.

    PubMed

    Rautiainen, M-R; Paunio, T; Repo-Tiihonen, E; Virkkunen, M; Ollila, H M; Sulkava, S; Jolanki, O; Palotie, A; Tiihonen, J

    2016-01-01

    The pathophysiology of antisocial personality disorder (ASPD) remains unclear. Although the most consistent biological finding is reduced grey matter volume in the frontal cortex, about 50% of the total liability to developing ASPD has been attributed to genetic factors. The contributing genes remain largely unknown. Therefore, we sought to study the genetic background of ASPD. We conducted a genome-wide association study (GWAS) and a replication analysis of Finnish criminal offenders fulfilling DSM-IV criteria for ASPD (N=370, N=5850 for controls, GWAS; N=173, N=3766 for controls and replication sample). The GWAS resulted in suggestive associations of two clusters of single-nucleotide polymorphisms at 6p21.2 and at 6p21.32 at the human leukocyte antigen (HLA) region. Imputation of HLA alleles revealed an independent association with DRB1*01:01 (odds ratio (OR)=2.19 (1.53-3.14), P=1.9 × 10^-5). Two polymorphisms at 6p21.2 LINC00951-LRFN2 gene region were replicated in a separate data set, and rs4714329 reached genome-wide significance (OR=1.59 (1.37-1.85), P=1.6 × 10^-9) in the meta-analysis. The risk allele was also associated with antisocial features in the general population conditioned for severe problems in childhood family (β=0.68, P=0.012). Functional analysis in brain tissue in open access GTEx and Braineac databases revealed eQTL associations of rs4714329 with LINC00951 and LRFN2 in cerebellum. In humans, LINC00951 and LRFN2 are both expressed in the brain, especially in the frontal cortex, which is intriguing considering the role of the frontal cortex in behavior and the neuroanatomical findings of reduced gray matter volume in ASPD. To our knowledge, this is the first study showing genome-wide significant and replicable findings on genetic variants associated with any personality disorder.
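
    As a consistency check on the reported statistics, an odds ratio and its confidence interval can be converted back to an approximate P value. The sketch below assumes the bracketed ranges are 95% Wald intervals that are symmetric on the log-odds scale, which the abstract does not state. The function name is illustrative.

```python
import math


def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Approximate z statistic and two-sided P value from an odds ratio
    and its confidence interval (assumed: 95% Wald interval on log scale)."""
    log_or = math.log(odds_ratio)
    # The interval spans 2 * z_crit standard errors on the log scale.
    std_err = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_or / std_err
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Values quoted in the abstract.
z_hla, p_hla = p_from_or_ci(2.19, 1.53, 3.14)   # DRB1*01:01
z_snp, p_snp = p_from_or_ci(1.59, 1.37, 1.85)   # rs4714329, meta-analysis
print(f"DRB1*01:01: z={z_hla:.2f}, P={p_hla:.1e}")
print(f"rs4714329:  z={z_snp:.2f}, P={p_snp:.1e}")
```

    Under these assumptions both recomputed P values come out at the same order of magnitude as those reported. Small differences are expected because the published odds ratios and interval bounds are rounded to two decimals.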

  11. Genomic resources for a commercial flatfish, the Senegalese sole (Solea senegalensis): EST sequencing, oligo microarray design, and development of the Soleamold bioinformatic platform

    PubMed Central

    Cerdà, Joan; Mercadé, Jaume; Lozano, Juan José; Manchado, Manuel; Tingaud-Sequeira, Angèle; Astola, Antonio; Infante, Carlos; Halm, Silke; Viñas, Jordi; Castellana, Barbara; Asensio, Esther; Cañavate, Pedro; Martínez-Rodríguez, Gonzalo; Piferrer, Francesc; Planas, Josep V; Prat, Francesc; Yúfera, Manuel; Durany, Olga; Subirada, Francesc; Rosell, Elisabet; Maes, Tamara

    2008-01-01

    Background The Senegalese sole, Solea senegalensis, is a highly prized flatfish of growing commercial interest for aquaculture in Southern Europe. However, despite the industrial production of Senegalese sole being hampered primarily by lack of information on the physiological mechanisms involved in reproduction, growth and immunity, very limited genomic information is available on this species. Results Sequencing of a S. senegalensis multi-tissue normalized cDNA library, from adult tissues (brain, stomach, intestine, liver, ovary, and testis), larval stages (pre-metamorphosis, metamorphosis), juvenile stages (post-metamorphosis, abnormal fish), and undifferentiated gonads, generated 10,185 expressed sequence tags (ESTs). Clones were sequenced from the 3'-end to identify isoform specific sequences. Assembly of the entire EST collection into contigs gave 5,208 unique sequences of which 1,769 (34%) had matches in GenBank, thus showing a low level of redundancy. The sequence of the 5,208 unigenes was used to design and validate an oligonucleotide microarray representing 5,087 unique Senegalese sole transcripts. Finally, a novel interactive bioinformatic platform, Soleamold, was developed for the Senegalese sole EST collection as well as microarray and ISH data. Conclusion New genomic resources have been developed for S. senegalensis, an economically important fish in aquaculture, which include a collection of expressed genes, an oligonucleotide microarray, and a publicly available bioinformatic platform that can be used to study gene expression in this species. These resources will help elucidate transcriptional regulation in wild and captive Senegalese sole for optimization of its production under intensive culture conditions. PMID:18973667

  12. Genomics and Bioinformatics in Undergraduate Curricula: Contexts for Hybrid Laboratory/Lecture Courses for Entering and Advanced Science Students

    ERIC Educational Resources Information Center

    Temple, Louise; Cresawn, Steven G.; Monroe, Jonathan D.

    2010-01-01

    Emerging interest in genomics in the scientific community prompted biologists at James Madison University to create two courses at different levels to modernize the biology curriculum. The courses are hybrids of classroom and laboratory experiences. An upper level class uses raw sequence of a genome (plasmid or virus) as the subject on which to…

  13. Playing with heart and soul…and genomes: sports implications and applications of personal genomics

    PubMed Central

    2013-01-01

    Whether the integration of genetic/omic technologies in sports contexts will facilitate player success, promote player safety, or spur genetic discrimination depends largely upon the game rules established by those currently designing genomic sports medicine programs. The integration has already begun, but there is not yet a playbook for best practices. Thus far discussions have focused largely on whether the integration would occur and how to prevent the integration from occurring, rather than how it could occur in such a way that maximizes benefits, minimizes risks, and avoids the exacerbation of racial disparities. Previous empirical research has identified members of the personal genomics industry offering sports-related DNA tests, and previous legal research has explored the impact of collective bargaining in professional sports as it relates to the employment protections of the Genetic Information Nondiscrimination Act (GINA). Building upon that research and upon participant observations with specific sports-related DNA tests purchased from four direct-to-consumer companies in 2011 and broader personal genomics (PGx) services, this anthropological, legal, and ethical (ALE) discussion highlights fundamental issues that must be addressed by those developing personal genomic sports medicine programs, either independently or through collaborations with commercial providers. For example, the vulnerability of student-athletes creates a number of issues that require careful, deliberate consideration. More broadly, however, this ALE discussion highlights potential sports-related implications (that ultimately might mitigate or, conversely, exacerbate racial disparities among athletes) of whole exome/genome sequencing conducted by biomedical researchers and clinicians for non-sports purposes. For example, the possibility that exome/genome sequencing of individuals who are considered to be non-patients, asymptomatic, normal, etc. will reveal the presence of variants of

  14. Playing with heart and soul…and genomes: sports implications and applications of personal genomics.

    PubMed

    Wagner, Jennifer K

    2013-01-01

    Whether the integration of genetic/omic technologies in sports contexts will facilitate player success, promote player safety, or spur genetic discrimination depends largely upon the game rules established by those currently designing genomic sports medicine programs. The integration has already begun, but there is not yet a playbook for best practices. Thus far discussions have focused largely on whether the integration would occur and how to prevent the integration from occurring, rather than how it could occur in such a way that maximizes benefits, minimizes risks, and avoids the exacerbation of racial disparities. Previous empirical research has identified members of the personal genomics industry offering sports-related DNA tests, and previous legal research has explored the impact of collective bargaining in professional sports as it relates to the employment protections of the Genetic Information Nondiscrimination Act (GINA). Building upon that research and upon participant observations with specific sports-related DNA tests purchased from four direct-to-consumer companies in 2011 and broader personal genomics (PGx) services, this anthropological, legal, and ethical (ALE) discussion highlights fundamental issues that must be addressed by those developing personal genomic sports medicine programs, either independently or through collaborations with commercial providers. For example, the vulnerability of student-athletes creates a number of issues that require careful, deliberate consideration. More broadly, however, this ALE discussion highlights potential sports-related implications (that ultimately might mitigate or, conversely, exacerbate racial disparities among athletes) of whole exome/genome sequencing conducted by biomedical researchers and clinicians for non-sports purposes. For example, the possibility that exome/genome sequencing of individuals who are considered to be non-patients, asymptomatic, normal, etc. will reveal the presence of variants of

  15. The Genome Sequencer FLX System--longer reads, more applications, straight forward bioinformatics and more complete data sets.

    PubMed

    Droege, Marcus; Hill, Brendon

    2008-08-31

    The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven applications categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research, having recently re-sequenced the genome of an individual human, currently re-sequencing the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and detected low-frequency somatic mutations linked to cancer. PMID:18616967

  16. Personal genomics and individual identities: motivations and moral imperatives of early users

    PubMed Central

    McGowan, Michelle L.; Fishman, Jennifer R.; Lambrix, Marcie A.

    2010-01-01

    Since 2007, consumer genomics companies have marketed personal genome scanning services to assess users’ genetic predispositions to a variety of complex diseases and traits. This study investigates early users’ reasons for utilizing personal genome services, their evaluation of the technology, how they interpret the results, and how they incorporate the results into health-related decision-making. The analysis contextualizes early users’ relationships to the technology, the knowledge generated by it, and how it mediates their relationship to their own health and to biomedicine more broadly. The results reveal that early users approach personal genome scanning with both optimism for genomic research and scepticism about the technology’s current capabilities, which runs contrary to concerns that consumers may be ill equipped to interpret and understand genome scan results. These findings provide important qualitative insight into early users’ conceptualizations of personal genomic risk assessment and illuminate their involvement in configuring this technology in the making. PMID:21076647

  17. Integration of bioinformatics to biodegradation

    PubMed Central

    2014-01-01

    Bioinformatics and biodegradation are two primary scientific fields in applied microbiology and biotechnology. The present review describes development of various bioinformatics tools that may be applied in the field of biodegradation. Several databases, including the University of Minnesota Biocatalysis/Biodegradation database (UM-BBD), a database of biodegradative oxygenases (OxDBase), Biodegradation Network-Molecular Biology Database (Bionemo), MetaCyc, and BioCyc have been developed to enable access to information related to biochemistry and genetics of microbial degradation. In addition, several bioinformatics tools for predicting toxicity and biodegradation of chemicals have been developed. Furthermore, the whole genomes of several potential degrading bacteria have been sequenced and annotated using bioinformatics tools. PMID:24808763

  18. Similarity-based disease risk assessment for personal genomes: proof of concept.

    PubMed

    Woo, Jung Hoon; Lai, Albert M; Chung, Wendy K; Weng, Chunhua

    2011-01-01

    The increasing availability of personal genome data has led to escalating needs by consumers to understand the implications of their gene sequences. At present, poorly integrated genetic knowledge has not met these needs. This proof-of-concept study proposes a similarity-based approach to assess the disease risk predisposition for personal genomes. We hypothesize that the semantic similarity between a personal genome and a disease can indicate the disease risks in the person. We developed a knowledge network that integrates existing knowledge of genes, diseases, and symptoms from six sources using the Semantic Web standard, Resource Description Framework (RDF). We then used latent relationships between genes and diseases derived from our knowledge network to measure the semantic similarity between a personal genome and a genetic disease. For demonstration, we showed the feasibility of assessing the disease risks in one personal genome and discussed related methodology issues.

  19. Bioinformatics analysis of circulating cell-free DNA sequencing data.

    PubMed

    Chan, Landon L; Jiang, Peiyong

    2015-10-01

    The discovery of cell-free DNA molecules in plasma has opened up numerous opportunities in noninvasive diagnosis. Cell-free DNA molecules have become increasingly recognized as promising biomarkers for detection and management of many diseases. The advent of next generation sequencing has provided unprecedented opportunities to scrutinize the characteristics of cell-free DNA molecules in plasma in a genome-wide fashion and at single-base resolution. Consequently, clinical applications of circulating cell-free DNA analysis have not only revolutionized noninvasive prenatal diagnosis but also facilitated cancer detection and monitoring toward an era of blood-based personalized medicine. With the remarkably increasing throughput and lowering cost of next generation sequencing, bioinformatics analysis becomes increasingly demanding to understand the large amount of data generated by these sequencing platforms. In this Review, we highlight the major bioinformatics algorithms involved in the analysis of cell-free DNA sequencing data. Firstly, we briefly describe the biological properties of these molecules and provide an overview of the general bioinformatics approach for the analysis of cell-free DNA. Then, we discuss the specific upstream bioinformatics considerations concerning the analysis of sequencing data of circulating cell-free DNA, followed by further detailed elaboration on each key clinical situation in noninvasive prenatal diagnosis and cancer management where downstream bioinformatics analysis is heavily involved. We also discuss bioinformatics analysis as well as clinical applications of the newly developed massively parallel bisulfite sequencing of cell-free DNA. Finally, we offer our perspectives on the future development of bioinformatics in noninvasive diagnosis.

  20. Genome-wide characterization of Nuclear Factor Y (NF-Y) gene family of sorghum [Sorghum bicolor (L.) Moench]: a bioinformatics approach.

    PubMed

    Malviya, Neha; Jaiswal, Parul; Yadav, Dinesh

    2016-01-01

    Nuclear factor Y (NF-Y) is a heterotrimeric transcription factor (TF) complex with preferential binding to CCAAT elements of promoters, regulating gene expression in most of the higher eukaryotes. The availability of plant genome sequences have revealed multiple number of genes coding for the three subunits, namely NF-YA, NF-YB and NF-YC in contrast to single NF-Y gene for each subunit reported in yeast and animals. A total of 33 NF-YTF comprising of 8 NF-YA, 11 NF-YB and 14 NF-YC subunits were accessed from the sorghum genome. The bioinformatic characterization of NF-Y gene family of sorghum for gene structure, chromosome location, protein motif, phylogeny, gene duplication and in-silico expression under abiotic stresses have been attempted in the present study. The identified SbNF-Y genes are distributed on all the 10 chromosomes of sorghum with variability in the frequency and 18 out of 33 SbNF-Ys were found to be intronless. Segmental duplication event was found to be predominant feature based on gene duplication pattern study. Several orthologs and paralogs groups were disclosed through the comprehensive phylogenetic analysis of SbNF-Y proteins along with 36 Arabidopsis and 28 rice NF-Y proteins. In-silico expression analysis under abiotic stresses using rice transcriptome data revealed several of the sorghum NF-Y genes to be associated with salt, drought, cold and heat stresses.

  1. Genome-wide association study of antisocial personality disorder

    PubMed Central

    Rautiainen, M-R; Paunio, T; Repo-Tiihonen, E; Virkkunen, M; Ollila, H M; Sulkava, S; Jolanki, O; Palotie, A; Tiihonen, J

    2016-01-01

    The pathophysiology of antisocial personality disorder (ASPD) remains unclear. Although the most consistent biological finding is reduced grey matter volume in the frontal cortex, about 50% of the total liability to developing ASPD has been attributed to genetic factors. The contributing genes remain largely unknown. Therefore, we sought to study the genetic background of ASPD. We conducted a genome-wide association study (GWAS) and a replication analysis of Finnish criminal offenders fulfilling DSM-IV criteria for ASPD (N=370, N=5850 for controls, GWAS; N=173, N=3766 for controls and replication sample). The GWAS resulted in suggestive associations of two clusters of single-nucleotide polymorphisms at 6p21.2 and at 6p21.32 at the human leukocyte antigen (HLA) region. Imputation of HLA alleles revealed an independent association with DRB1*01:01 (odds ratio (OR)=2.19 (1.53–3.14), P=1.9 × 10⁻⁵). Two polymorphisms at 6p21.2 LINC00951–LRFN2 gene region were replicated in a separate data set, and rs4714329 reached genome-wide significance (OR=1.59 (1.37–1.85), P=1.6 × 10⁻⁹) in the meta-analysis. The risk allele also associated with antisocial features in the general population conditioned for severe problems in childhood family (β=0.68, P=0.012). Functional analysis in brain tissue in open access GTEx and Braineac databases revealed eQTL associations of rs4714329 with LINC00951 and LRFN2 in cerebellum. In humans, LINC00951 and LRFN2 are both expressed in the brain, especially in the frontal cortex, which is intriguing considering the role of the frontal cortex in behavior and the neuroanatomical findings of reduced gray matter volume in ASPD. To our knowledge, this is the first study showing genome-wide significant and replicable findings on genetic variants associated with any personality disorder. PMID:27598967
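
    The odds ratios, 95% confidence intervals, and P values quoted in this abstract can be cross-checked against each other. The sketch below is a back-of-envelope reader's check, not the authors' analysis: it assumes the intervals are Wald-type 95% intervals that are symmetric on the log-odds scale, and the function name `z_and_p_from_ci` is hypothetical. Under that assumption the standard error is recovered from the interval width, and the z statistic and two-sided P value follow from the normal distribution.

```python
import math

def z_and_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover a Wald z statistic and two-sided P value from an OR and its 95% CI.

    Assumes the interval is symmetric on the log-odds scale (an assumption,
    not something stated in the abstract).
    """
    log_or = math.log(odds_ratio)
    # The width of a 95% CI on the log scale is 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Values as quoted in the abstract.
reported = {
    "DRB1*01:01": (2.19, 1.53, 3.14),
    "rs4714329": (1.59, 1.37, 1.85),
}

for label, (or_, low, high) in reported.items():
    z, p = z_and_p_from_ci(or_, low, high)
    print(f"{label}: z = {z:.2f}, two-sided P = {p:.1e}")
```

    With the rounded figures from the abstract, both recovered P values land close to the reported ones, and the rs4714329 value falls below the conventional genome-wide threshold of 5 × 10⁻⁸. Small differences are expected because the published OR and interval bounds are rounded to two decimals.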

  2. Genome-wide association study of antisocial personality disorder.

    PubMed

    Rautiainen, M-R; Paunio, T; Repo-Tiihonen, E; Virkkunen, M; Ollila, H M; Sulkava, S; Jolanki, O; Palotie, A; Tiihonen, J

    2016-01-01

    The pathophysiology of antisocial personality disorder (ASPD) remains unclear. Although the most consistent biological finding is reduced grey matter volume in the frontal cortex, about 50% of the total liability to developing ASPD has been attributed to genetic factors. The contributing genes remain largely unknown. Therefore, we sought to study the genetic background of ASPD. We conducted a genome-wide association study (GWAS) and a replication analysis of Finnish criminal offenders fulfilling DSM-IV criteria for ASPD (N=370, N=5850 for controls, GWAS; N=173, N=3766 for controls and replication sample). The GWAS resulted in suggestive associations of two clusters of single-nucleotide polymorphisms at 6p21.2 and at 6p21.32 at the human leukocyte antigen (HLA) region. Imputation of HLA alleles revealed an independent association with DRB1*01:01 (odds ratio (OR)=2.19 (1.53-3.14), P=1.9 × 10(-5)). Two polymorphisms at 6p21.2 LINC00951-LRFN2 gene region were replicated in a separate data set, and rs4714329 reached genome-wide significance (OR=1.59 (1.37-1.85), P=1.6 × 10(-9)) in the meta-analysis. The risk allele also associated with antisocial features in the general population conditioned for severe problems in childhood family (β=0.68, P=0.012). Functional analysis in brain tissue in open access GTEx and Braineac databases revealed eQTL associations of rs4714329 with LINC00951 and LRFN2 in cerebellum. In humans, LINC00951 and LRFN2 are both expressed in the brain, especially in the frontal cortex, which is intriguing considering the role of the frontal cortex in behavior and the neuroanatomical findings of reduced gray matter volume in ASPD. To our knowledge, this is the first study showing genome-wide significant and replicable findings on genetic variants associated with any personality disorder. PMID:27598967

  3. Whole Genome Sequencing and a New Bioinformatics Platform Allow for Rapid Gene Identification in D. melanogaster EMS Screens.

    PubMed

    Gonzalez, Michael A; Van Booven, Derek; Hulme, William; Ulloa, Rick H; Lebrigio, Rafael F Acosta; Osterloh, Jeannette; Logan, Mary; Freeman, Marc; Zuchner, Stephan

    2012-01-01

    Forward genetic screens in Drosophila melanogaster using ethyl methanesulfonate (EMS) mutagenesis are a powerful approach for identifying genes that modulate specific biological processes in an in vivo setting. The mapping of genes that contain randomly-induced point mutations has become more efficient in Drosophila thanks to the maturation and availability of many types of genetic tools. However, classic approaches to gene mapping are relatively slow and ultimately require extensive Sanger sequencing of candidate chromosomal loci. With the advent of new high-throughput sequencing techniques, it is increasingly efficient to directly re-sequence the whole genome of model organisms. This approach, in combination with traditional chromosomal mapping, has the potential to greatly simplify and accelerate mutation identification in mutants generated in EMS screens. Here we show that next-generation sequencing (NGS) is an accurate and efficient tool for high-throughput sequencing and mutation discovery in Drosophila melanogaster. As a test case, mutant strains of Drosophila that exhibited long-term survival of severed peripheral axons were identified in a forward EMS mutagenesis. All mutants were recessive and fell into a single lethal complementation group, which suggested that a single gene was responsible for the protective axon degenerative phenotype. Whole genome sequencing of these genomes identified the underlying gene ect4. To improve the process of genome wide mutation identification, we developed Genomes Management Application (GEM.app, https://genomics.med.miami.edu), a graphical online user interface to a custom query framework. Using a custom GEM.app query, we were able to identify that each mutant carried a unique non-sense mutation in the gene ect4 (dSarm), which was recently shown by Osterloh et al. to be essential for the activation of axonal degeneration. Our results demonstrate the current advantages and limitations of NGS in Drosophila and we introduce…

  4. Rapid Development of Bioinformatics Education in China

    ERIC Educational Resources Information Center

    Zhong, Yang; Zhang, Xiaoyan; Ma, Jian; Zhang, Liang

    2003-01-01

    As the Human Genome Project experiences remarkable success and a flood of biological data is produced, bioinformatics becomes a very "hot" cross-disciplinary field, yet experienced bioinformaticians are urgently needed worldwide. This paper summarises the rapid development of bioinformatics education in China, especially related undergraduate…

  5. Bioinformatics of prokaryotic RNAs.

    PubMed

    Backofen, Rolf; Amman, Fabian; Costa, Fabrizio; Findeiß, Sven; Richter, Andreas S; Stadler, Peter F

    2014-01-01

    The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also harbors dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes and the need to characterize also their non-coding components is heavily dependent on computational methods and workflows, many of which have been developed or at least adapted specifically for the use with bacterial and archaeal data. This review provides an overview on the state-of-the-art of RNA bioinformatics focusing on applications to prokaryotes.

  6. Bioinformatics of prokaryotic RNAs

    PubMed Central

    Backofen, Rolf; Amman, Fabian; Costa, Fabrizio; Findeiß, Sven; Richter, Andreas S; Stadler, Peter F

    2014-01-01

    The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also harbors dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes and the need to characterize also their non-coding components is heavily dependent on computational methods and workflows, many of which have been developed or at least adapted specifically for the use with bacterial and archaeal data. This review provides an overview on the state-of-the-art of RNA bioinformatics focusing on applications to prokaryotes. PMID:24755880

  8. Crowdsourcing for bioinformatics

    PubMed Central

    Good, Benjamin M.; Su, Andrew I.

    2013-01-01

    Motivation: Bioinformatics is faced with a variety of problems that require human involvement. Tasks like genome annotation, image analysis, knowledge-base population and protein structure determination all benefit from human input. In some cases, people are needed in vast quantities, whereas in others, we need just a few with rare abilities. Crowdsourcing encompasses an emerging collection of approaches for harnessing such distributed human intelligence. Recently, the bioinformatics community has begun to apply crowdsourcing in a variety of contexts, yet few resources are available that describe how these human-powered systems work and how to use them effectively in scientific domains. Results: Here, we provide a framework for understanding and applying several different types of crowdsourcing. The framework considers two broad classes: systems for solving large-volume ‘microtasks’ and systems for solving high-difficulty ‘megatasks’. Within these classes, we discuss system types, including volunteer labor, games with a purpose, microtask markets and open innovation contests. We illustrate each system type with successful examples in bioinformatics and conclude with a guide for matching problems to crowdsourcing solutions that highlights the positives and negatives of different approaches. Contact: bgood@scripps.edu PMID:23782614

  9. Genome-wide identification and evolutionary analysis of algal LPAT genes involved in TAG biosynthesis using bioinformatic approaches.

    PubMed

    Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar

    2014-12-01

    Lysophosphatidyl acyltransferase (LPAT) is one of the major triacylglycerol synthesis enzymes, controlling the metabolic flow of lysophosphatidic acid to phosphatidic acid. Experimental studies in Arabidopsis have shown that LPAT activity is exhibited primarily by three distinct isoforms, namely the plastid-located LPAT1, the endoplasmic reticulum-located LPAT2, and the soluble isoform of LPAT (solLPAT). In this study, 24 putative genes representing all LPAT isoforms were identified from the analysis of 11 complete genomes including green algae, red algae, diatoms and higher plants. We observed LPAT1 and solLPAT genes to be ubiquitously present in nearly all genomes examined, whereas LPAT2 genes to have evolved more recently in the plant lineage. Phylogenetic analysis indicated that LPAT1, LPAT2 and solLPAT have convergently evolved through separate evolutionary paths and belong to three different gene families, which was further evidenced by their wide divergence at gene structure and sequence level. The genome distribution supports the hypothesis that each gene encoding a LPAT is not duplicated. Mapping of exon-intron structure of LPAT genes to the domain structure of proteins across different algal and plant species indicates that exon shuffling plays no role in the evolution of LPAT genes. Besides the previously defined motifs, several conserved consensus sequences were discovered which could be useful to distinguish different LPAT isoforms. Taken together, this study will enable the generation of experimental approximations to better understand the functional role of algal LPAT in lipid accumulation.

  10. Genomic insights into ayurvedic and western approaches to personalized medicine.

    PubMed

    Prasher, Bhavana; Gibson, Greg; Mukerji, Mitali

    2016-03-01

    Ayurveda, an ancient Indian system of medicine documented and practised since 1500 B.C., follows a systems approach that has interesting parallels with contemporary personalized genomic medicine approaches to the understanding and management of health and disease. It is based on the trisutra, which are the three aspects of causes, features and therapeutics that are interconnected through a common organizing principle termed 'tridosha'. Tridosha comprise three ascertainable physiological entities; vata (kinetic), pitta (metabolic) and kapha (potential) that are pervasive across systems, work in conjunction with each other, respond to the external environment and maintain homeostasis. Each individual is born with a specific proportion of tridosha that are not only genetically determined but also influenced by the environment during foetal development. Jointly they determine a person's basic constitution, which is termed their 'prakriti'. Development and progression of different diseases with their subtypes are thought to depend on the origin and mechanism of perturbation of the doshas, and the aim of therapeutic practice is to ensure that the doshas retain their homeostatic state. Similarly, western systems biology epitomized by translational P4 medicine envisages the integration of multiscalar genetic, cellular, physiological and environmental networks to predict phenotypic outcomes of perturbations. In this perspective article, we aim to outline the shape of a unifying scaffold that may allow the two intellectual traditions to enhance one another. Specifically, we illustrate how a unique integrative 'Ayurgenomics' approach can be used to integrate the trisutra concept of Ayurveda with genomics. We observe biochemical and molecular correlates of prakriti and show how these differ significantly in processes that are linked to intermediate patho-phenotypes, known to take different course in diseases. We also observe a significant enrichment of the highly connected…

  12. A probabilistic disease-gene finder for personal genomes.

    PubMed

    Yandell, Mark; Huff, Chad; Hu, Hao; Singleton, Marc; Moore, Barry; Xing, Jinchuan; Jorde, Lynn B; Reese, Martin G

    2011-09-01

    VAAST (the Variant Annotation, Analysis & Search Tool) is a probabilistic search tool for identifying damaged genes and their disease-causing variants in personal genome sequences. VAAST builds on existing amino acid substitution (AAS) and aggregative approaches to variant prioritization, combining elements of both into a single unified likelihood framework that allows users to identify damaged genes and deleterious variants with greater accuracy, and in an easy-to-use fashion. VAAST can score both coding and noncoding variants, evaluating the cumulative impact of both types of variants simultaneously. VAAST can identify rare variants causing rare genetic diseases, and it can also use both rare and common variants to identify genes responsible for common diseases. VAAST thus has a much greater scope of use than any existing methodology. Here we demonstrate its ability to identify damaged genes using small cohorts (n = 3) of unrelated individuals, wherein no two share the same deleterious variants, and for common, multigenic diseases using as few as 150 cases.

  13. A novel pathway for the biosynthesis of heme in Archaea: genome-based bioinformatic predictions and experimental evidence.

    PubMed

    Storbeck, Sonja; Rolfes, Sarah; Raux-Deery, Evelyne; Warren, Martin J; Jahn, Dieter; Layer, Gunhild

    2010-12-13

    Heme is an essential prosthetic group for many proteins involved in fundamental biological processes in all three domains of life. In Eukaryota and Bacteria heme is formed via a conserved and well-studied biosynthetic pathway. Surprisingly, in Archaea heme biosynthesis proceeds via an alternative route which is poorly understood. In order to formulate a working hypothesis for this novel pathway, we searched 59 completely sequenced archaeal genomes for the presence of gene clusters consisting of established heme biosynthetic genes and colocalized conserved candidate genes. Within the majority of archaeal genomes it was possible to identify such heme biosynthesis gene clusters. From this analysis we have been able to identify several novel heme biosynthesis genes that are restricted to archaea. Intriguingly, several of the encoded proteins display similarity to enzymes involved in heme d(1) biosynthesis. To initiate an experimental verification of our proposals two Methanosarcina barkeri proteins predicted to catalyze the initial steps of archaeal heme biosynthesis were recombinantly produced, purified, and their predicted enzymatic functions verified.

  14. Genome-Wide Profiling of RNA from Dried Blood Spots: Convergence with Bioinformatic Results Derived from Whole Venous Blood and Peripheral Blood Mononuclear Cells.

    PubMed

    McDade, Thomas W; Ross, Kharah M; Fried, Ruby L; Arevalo, Jesusa M G; Ma, Jeffrey; Miller, Gregory E; Cole, Steve W

    2016-01-01

    Genome-wide transcriptional profiling has emerged as a powerful tool for analyzing biological mechanisms underlying social gradients in health, but utilization in population-based studies has been hampered by logistical constraints and costs associated with venipuncture blood sampling. Dried blood spots (DBS) provide a minimally invasive, low-cost alternative to venipuncture, and in this article we evaluate how closely the substantive results from DBS transcriptional profiling correspond to those derived from parallel analyses of gold-standard venous blood samples (PAXgene whole blood and peripheral blood mononuclear cells [PBMC]). Analyses focused on differences in gene expression between African-Americans and Caucasians in a community sample of 82 healthy adults (age 18-70 years; mean 35). Across 19,679 named gene transcripts, DBS-derived values correlated r = .85 with both PAXgene and PBMC values. Results from bioinformatics analyses of gene expression derived from DBS samples were concordant with PAXgene and PBMC samples in identifying increased Type I interferon signaling and up-regulated activity of monocytes and natural killer (NK) cells in African-Americans compared to Caucasian participants. These findings demonstrate the feasibility of DBS in field-based studies of gene expression and encourage future studies of human transcriptome dynamics in larger, more representative samples than are possible with clinic- or lab-based research designs. PMID:27337553

  15. [Construction and application of bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer].

    PubMed

    Xiang, Fang; Ningqiu, Li; Xiaozhe, Fu; Kaibin, Li; Qiang, Lin; Lihui, Liu; Cunbin, Shi; Shuqin, Wu

    2015-07-01

    As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the requirement of high-performance computers rather than common personal computers for constructing a bioinformatics platform significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules, including genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analysis on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via Blast searches, GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on outer membrane protein A of Aeromonas hydrophila, and the changes of system temperature, total energy, root mean square deviation and conformation of the loops during equilibration were also observed. These results showed that the bioinformatic analysis platform for aquatic pathogen has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platform for other subjects. PMID:26351170

  16. [Construction and application of bioinformatic analysis platform for aquatic pathogen based on the MilkyWay-2 supercomputer].

    PubMed

    Xiang, Fang; Ningqiu, Li; Xiaozhe, Fu; Kaibin, Li; Qiang, Lin; Lihui, Liu; Cunbin, Shi; Shuqin, Wu

    2015-07-01

    As a key component of life science, bioinformatics has been widely applied in genomics, transcriptomics, and proteomics. However, the requirement of high-performance computers rather than common personal computers for constructing a bioinformatics platform has significantly limited the application of bioinformatics in aquatic science. In this study, we constructed a bioinformatic analysis platform for aquatic pathogens based on the MilkyWay-2 supercomputer. The platform consisted of three functional modules: genomic and transcriptomic sequencing data analysis, protein structure prediction, and molecular dynamics simulations. To validate the practicability of the platform, we performed bioinformatic analysis on aquatic pathogenic organisms. For example, genes of Flavobacterium johnsoniae M168 were identified and annotated via Blast searches and GO and InterPro annotations. Protein structural models for five small segments of grass carp reovirus HZ-08 were constructed by homology modeling. Molecular dynamics simulations were performed on outer membrane protein A of Aeromonas hydrophila, and the changes in system temperature, total energy, root mean square deviation and loop conformation during equilibration were observed. These results showed that the bioinformatic analysis platform for aquatic pathogens has been successfully built on the MilkyWay-2 supercomputer. This study will provide insights into the construction of bioinformatic analysis platforms for other subjects.

  17. Bioinformatics Analysis of the Complete Genome Sequence of the Mango Tree Pathogen Pseudomonas syringae pv. syringae UMAF0158 Reveals Traits Relevant to Virulence and Epiphytic Lifestyle

    PubMed Central

    Arrebola, Eva; Carrión, Víctor J.; Gutiérrez-Barranquero, José Antonio; Pérez-García, Alejandro; Ramos, Cayo; Cazorla, Francisco M.; de Vicente, Antonio

    2015-01-01

    The genomes of more than 100 Pseudomonas syringae strains have been sequenced to date; however, only a few of them have been fully assembled, including P. syringae pv. syringae B728a. Different strains of pv. syringae cause different diseases and have different host specificities. UMAF0158 is a P. syringae pv. syringae strain related to B728a, but instead of being a bean pathogen it causes apical necrosis of mango trees, and the two strains belong to different phylotypes of pv. syringae and clades of P. syringae. In this study we report the complete sequence and annotation of the P. syringae pv. syringae UMAF0158 chromosome and plasmid pPSS158. A comparative analysis with the available sequenced genomes of 25 other P. syringae strains, both closed (the reference genomes DC3000, 1448A and B728a) and draft genomes, was performed. The 5.8 Mb UMAF0158 chromosome has 59.3% GC content and comprises 5017 predicted protein-coding genes. Bioinformatics analysis revealed the presence of genes potentially implicated in the virulence and epiphytic fitness of this strain. We identified several genetic features, which are absent in B728a, that may explain the ability of UMAF0158 to colonize and infect mango trees: the mangotoxin biosynthetic operon mbo, a gene cluster for cellulose production, two different type III and two type VI secretion systems, and a particular T3SS effector repertoire. A mutant strain defective in the rhizobial-like T3SS Rhc showed no differences compared to wild-type during its interaction with host and non-host plants and worms. Here we report the first complete sequence of the chromosome of a pv. syringae strain pathogenic to a woody plant host. Our data also shed light on the genetic factors that possibly determine the pathogenic and epiphytic lifestyle of UMAF0158. This work provides the basis for further analysis on specific mechanisms that enable this strain to infect woody plants and for the functional analysis of host specificity in the P
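
    As an illustration of the GC-content figure quoted in the record above (59.3% for the 5.8 Mb UMAF0158 chromosome), the following minimal Python sketch shows how such a percentage is commonly computed from a nucleotide sequence. It is not the authors' pipeline: the function name gc_content and the toy sequence are assumptions made for illustration, and the toy sequence is not UMAF0158 data.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C bases among the A/C/G/T bases of a sequence."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


if __name__ == "__main__":
    toy_sequence = "ATGCGCGGCCTAAT"  # hypothetical example, not taken from UMAF0158
    print(f"{gc_content(toy_sequence):.1f}% GC")
```

    For a real genome the same calculation would be applied to the full chromosome sequence, typically read from a FASTA file; how ambiguous bases are treated is a design choice, and here they are simply excluded from both numerator and denominator.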

  18. Multifunctionality and diversity of GDSL esterase/lipase gene family in rice (Oryza sativa L. japonica) genome: new insights from bioinformatics analysis

    PubMed Central

    2012-01-01

    Background GDSL esterases/lipases are a newly discovered subclass of lipolytic enzymes that are very important and attractive research subjects because of their multifunctional properties, such as broad substrate specificity and regiospecificity. Compared with the current knowledge regarding these enzymes in bacteria, our understanding of the plant GDSL enzymes is very limited, although the GDSL gene family includes numerous members in many fully sequenced plant genomes. Only two genes from a large rice GDSL esterase/lipase gene family were previously characterised, and the majority of the members remain unknown. In the present study, we describe the rice OsGELP (Oryza sativa GDSL esterase/lipase protein) gene family at the genomic and proteomic levels, and use this knowledge to provide insights into the multifunctionality of the rice OsGELP enzymes. Results In this study, an extensive bioinformatics analysis identified 114 genes in the rice OsGELP gene family. A complete overview of this family in rice is presented, including the chromosome locations, gene structures, phylogeny, and protein motifs. Among the OsGELPs and the plant GDSL esterase/lipase proteins of known functions, 41 motifs were found that represent the core secondary structure elements or appear specifically in different phylogenetic subclades. The specification and distribution of the identified putative conserved clade-common and clade-specific peptide motifs, and their location on the predicted protein three-dimensional structure, may signify their functional roles. Potentially important regions for substrate specificity are highlighted, in accordance with the protein three-dimensional model and the location of the phylogenetically specific conserved motifs. The differential expression of some representative genes was confirmed by quantitative real-time PCR. The phylogenetic analysis, together with protein motif architectures, and the expression profiling were analysed to predict the

  19. Translational Bioinformatics and Clinical Research (Biomedical) Informatics.

    PubMed

    Sirintrapun, S Joseph; Zehir, Ahmet; Syed, Aijazuddin; Gao, JianJiong; Schultz, Nikolaus; Cheng, Donavan T

    2016-03-01

    Translational bioinformatics and clinical research (biomedical) informatics are the primary domains related to informatics activities that support translational research. Translational bioinformatics focuses on computational techniques in genetics, molecular biology, and systems biology. Clinical research (biomedical) informatics involves the use of informatics in discovery and management of new knowledge relating to health and disease. This article details 3 projects that are hybrid applications of translational bioinformatics and clinical research (biomedical) informatics: The Cancer Genome Atlas, the cBioPortal for Cancer Genomics, and the Memorial Sloan Kettering Cancer Center clinical variants and results database, all designed to facilitate insights into cancer biology and clinical/therapeutic correlations.

  20. Translational Bioinformatics and Clinical Research (Biomedical) Informatics.

    PubMed

    Sirintrapun, S Joseph; Zehir, Ahmet; Syed, Aijazuddin; Gao, JianJiong; Schultz, Nikolaus; Cheng, Donavan T

    2015-06-01

    Translational bioinformatics and clinical research (biomedical) informatics are the primary domains related to informatics activities that support translational research. Translational bioinformatics focuses on computational techniques in genetics, molecular biology, and systems biology. Clinical research (biomedical) informatics involves the use of informatics in discovery and management of new knowledge relating to health and disease. This article details 3 projects that are hybrid applications of translational bioinformatics and clinical research (biomedical) informatics: The Cancer Genome Atlas, the cBioPortal for Cancer Genomics, and the Memorial Sloan Kettering Cancer Center clinical variants and results database, all designed to facilitate insights into cancer biology and clinical/therapeutic correlations.

  1. CGAT: a model for immersive personalized training in computational genomics.

    PubMed

    Sims, David; Ponting, Chris P; Heger, Andreas

    2016-01-01

    How should the next generation of genomics scientists be trained while simultaneously pursuing high quality and diverse research? CGAT, the Computational Genomics Analysis and Training programme, was set up in 2010 by the UK Medical Research Council to complement its investment in next-generation sequencing capacity. CGAT was conceived around the twin goals of training future leaders in genome biology and medicine, and providing much needed capacity to UK science for analysing genome scale data sets. Here we outline the training programme employed by CGAT and describe how it dovetails with collaborative research projects to launch scientists on the road towards independent research careers in genomics.

  2. CGAT: a model for immersive personalized training in computational genomics

    PubMed Central

    Sims, David; Ponting, Chris P.

    2016-01-01

    How should the next generation of genomics scientists be trained while simultaneously pursuing high quality and diverse research? CGAT, the Computational Genomics Analysis and Training programme, was set up in 2010 by the UK Medical Research Council to complement its investment in next-generation sequencing capacity. CGAT was conceived around the twin goals of training future leaders in genome biology and medicine, and providing much needed capacity to UK science for analysing genome scale data sets. Here we outline the training programme employed by CGAT and describe how it dovetails with collaborative research projects to launch scientists on the road towards independent research careers in genomics. PMID:25981124

  3. Applying genomic and bioinformatic resources to human adenovirus genomes for use in vaccine development and for applications in vector development for gene delivery.

    PubMed

    Seto, Jason; Walsh, Michael P; Mahadevan, Padmanabhan; Zhang, Qiwei; Seto, Donald

    2010-01-01

    Technological advances and increasingly cost-effective methodologies in DNA sequencing and computational analysis are providing genome and proteome data for human adenovirus research. Applying these tools, data and derived knowledge to the development of vaccines against these pathogens will provide effective prophylactics. The same data and approaches can be applied to vector development for gene delivery in gene therapy and vaccine delivery protocols. Examination of several field strain genomes and their analyses provides examples of data that are available using these approaches. An example of the development of HAdV-B3 both as a vaccine and as a vector is presented.

  4. Integrative bioinformatics analysis of genomic and proteomic approaches to understand the transcriptional regulatory program in coronary artery disease pathways.

    PubMed

    Vangala, Rajani Kanth; Ravindran, Vandana; Ghatge, Madan; Shanker, Jayashree; Arvind, Prathima; Bindu, Hima; Shekar, Meghala; Rao, Veena S

    2013-01-01

    Patients with cardiovascular disease show a panel of differentially regulated serum biomarkers indicative of modulation of several pathways from disease onset to progression. Few of these biomarkers have been proposed for multimarker risk prediction methods. However, the underlying mechanism of the expression changes and modulation of the pathways has not yet been addressed in its entirety. Our present work focuses on understanding the regulatory mechanisms at the transcriptional level by identifying the core and specific transcription factors that regulate the coronary artery disease associated pathways. Using the principles of systems biology, we integrated the genomics and proteomics data with computational tools. We selected biomarkers from 7 different pathways based on their association with the disease, assayed 24 biomarkers along with gene expression studies, and built network modules that are highly regulated by 5 core regulators: PPARG, EGR1, ETV1, KLF7 and ESRRA. These network modules in turn comprise biomarkers from different pathways, showing that the core regulatory transcription factors may work together in differential regulation of several pathways potentially leading to the disease. This kind of analysis can enhance the elucidation of mechanisms in the disease and give better strategies for developing multimarker module based risk predictions.

  5. Generations of interdisciplinarity in bioinformatics

    PubMed Central

    Bartlett, Andrew; Lewis, Jamie; Williams, Matthew L.

    2016-01-01

    Bioinformatics, a specialism propelled into relevance by the Human Genome Project and the subsequent -omic turn in the life sciences, is an interdisciplinary field of research. Qualitative work on the disciplinary identities of bioinformaticians has revealed the tensions involved in work in this “borderland.” As part of our ongoing work on the emergence of bioinformatics, between 2010 and 2011, we conducted a survey of United Kingdom-based academic bioinformaticians. Building on insights drawn from our fieldwork over the past decade, we present results from this survey relevant to a discussion of disciplinary generation and stabilization. Not only is there evidence of an attitudinal divide between the different disciplinary cultures that make up bioinformatics, but there are distinctions between the forerunners, founders and the followers; as inter/disciplines mature, they face challenges that are both inter-disciplinary and inter-generational in nature. PMID:27453689

  6. Bioinformatic Analyses of Integral Membrane Transport Proteins Encoded Within the Genome of the Planctomycetes species, Rhodopirellula baltica

    PubMed Central

    Paparoditis, Philipp; Vastermark, Ake; Le, Andrew J.; Fuerst, John A.; Saier, Milton H.

    2013-01-01

    Rhodopirellula baltica (R. baltica) is a Planctomycete, known to have intracellular membranes. Because of its unusual cell structure and ecological significance, we have conducted comprehensive analyses of its transmembrane transport proteins. The complete proteome of R. baltica was screened against the Transporter Classification Database (TCDB) to identify recognizable integral membrane transport proteins. 342 proteins were identified with a high degree of confidence, and these fell into several different classes. R. baltica encodes in its genome channels (12%), secondary carriers (33%), and primary active transport proteins (41%) in addition to classes represented in smaller numbers. Relative to most non-marine bacteria, R. baltica possesses a larger number of sodium-dependent symporters but fewer proton-dependent symporters, and it has dimethylsulfoxide (DMSO) and trimethyl-amine-oxide (TMAO) reductases, consistent with its Na+-rich marine environment. R. baltica also possesses a Na+-translocating NADH:quinone dehydrogenase (Na+-NDH), a Na+ efflux decarboxylase, two Na+-exporting ABC pumps, two Na+-translocating F-type ATPases, two Na+:H+ antiporters and two K+:H+ antiporters. Flagellar motility probably depends on the sodium electrochemical gradient. Surprisingly, R. baltica also has a complete set of H+-translocating electron transport complexes similar to those present in α-proteobacteria and eukaryotic mitochondria. The transport proteins identified proved to be typical of the bacterial domain with little or no indication of the presence of eukaryotic-type transporters. However, novel functionally uncharacterized multispanning membrane proteins were identified, some of which are found only in Rhodopirellula species, but others of which are widely distributed in bacteria. The analyses lead to predictions regarding the physiology, ecology and evolution of R. baltica. PMID:23969110
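
    The class percentages in the record above can be converted into approximate protein counts. The sketch below assumes only the figures given in the abstract (342 transport proteins; 12% channels, 33% secondary carriers, 41% primary active transport proteins). The variable and function names are hypothetical, and because the reported percentages are rounded, the resulting counts are estimates rather than values reported by the authors.

```python
TOTAL_TRANSPORTERS = 342  # number of transport proteins stated in the abstract

# Rounded class shares (percent) as quoted in the abstract.
REPORTED_SHARES = {
    "channels": 12,
    "secondary carriers": 33,
    "primary active transport proteins": 41,
}


def approximate_counts(total: int, shares_percent: dict) -> dict:
    """Estimate per-class counts from rounded percentages of a known total."""
    return {name: round(total * pct / 100) for name, pct in shares_percent.items()}


def remaining_share(shares_percent: dict) -> int:
    """Percentage left over for the classes 'represented in smaller numbers'."""
    return 100 - sum(shares_percent.values())


if __name__ == "__main__":
    for name, count in approximate_counts(TOTAL_TRANSPORTERS, REPORTED_SHARES).items():
        print(f"{name}: about {count} proteins")
    print(f"other classes: about {remaining_share(REPORTED_SHARES)}% of the total")
```

    The three named classes account for 86% of the identified proteins, which leaves roughly 14% for the smaller classes mentioned in the abstract.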

  7. Bioinformatic analyses of integral membrane transport proteins encoded within the genome of the planctomycetes species, Rhodopirellula baltica.

    PubMed

    Paparoditis, Philipp; Västermark, Ake; Le, Andrew J; Fuerst, John A; Saier, Milton H

    2014-01-01

    Rhodopirellula baltica (R. baltica) is a Planctomycete, known to have intracellular membranes. Because of its unusual cell structure and ecological significance, we have conducted comprehensive analyses of its transmembrane transport proteins. The complete proteome of R. baltica was screened against the Transporter Classification Database (TCDB) to identify recognizable integral membrane transport proteins. 342 proteins were identified with a high degree of confidence, and these fell into several different classes. R. baltica encodes in its genome channels (12%), secondary carriers (33%), and primary active transport proteins (41%) in addition to classes represented in smaller numbers. Relative to most non-marine bacteria, R. baltica possesses a larger number of sodium-dependent symporters but fewer proton-dependent symporters, and it has dimethylsulfoxide (DMSO) and trimethyl-amine-oxide (TMAO) reductases, consistent with its Na(+)-rich marine environment. R. baltica also possesses a Na(+)-translocating NADH:quinone dehydrogenase (Na(+)-NDH), a Na(+) efflux decarboxylase, two Na(+)-exporting ABC pumps, two Na(+)-translocating F-type ATPases, two Na(+):H(+) antiporters and two K(+):H(+) antiporters. Flagellar motility probably depends on the sodium electrochemical gradient. Surprisingly, R. baltica also has a complete set of H(+)-translocating electron transport complexes similar to those present in α-proteobacteria and eukaryotic mitochondria. The transport proteins identified proved to be typical of the bacterial domain with little or no indication of the presence of eukaryotic-type transporters. However, novel functionally uncharacterized multispanning membrane proteins were identified, some of which are found only in Rhodopirellula species, but others of which are widely distributed in bacteria. The analyses lead to predictions regarding the physiology, ecology and evolution of R. baltica. PMID:23969110

  8. Bioinformatics meets clinical informatics.

    PubMed

    Smith, Jeremy; Protti, Denis

    2005-01-01

    The field of bioinformatics has exploded over the past decade. Hopes have run high for the impact on preventive, diagnostic, and therapeutic capabilities of genomics and proteomics. As time has progressed, so has our understanding of this field. Although the mapping of the human genome will certainly have an impact on health care, it is a complex web to unweave. Addressing simpler "Single Nucleotide Polymorphisms" (SNPs) is not new; however, the complexity and importance of polygenic disorders and the greater role of the far more complex field of proteomics have become clearer. Proteomics operates much closer to the actual cellular level of human structure, and proteins are very sensitive markers of health. However, because the proteome is so much more complex than the genome, and changes with time and environmental factors, mapping it and using the data in direct care delivery is even harder than for the genome. For these reasons of complexity, the expected utopia of a single gene chip or protein chip capable of analyzing an individual's genetic make-up and producing a cornucopia of useful diagnostic information still appears a distant hope. When, and if, this happens, perhaps a genetic profile of each individual will be stored with their medical record; however, in the meantime, this type of information is unlikely to prove highly useful on a broad scale. To address the more complex "polygenic" diseases and those related to protein variations, other tools will be developed in the shorter term. "Top-down" analysis of populations and diseases is likely to produce earlier wins in this area. Detailed computer-generated models will map a wide array of human and environmental factors that indicate the presence of a disease or the relative impact of a particular treatment. These models may point to an underlying genomic or proteomic cause, for which genomic or proteomic testing or therapies could then be applied for confirmation and/or treatment. These types of

  9. Personal Genome Sequencing in Ostensibly Healthy Individuals and the PeopleSeq Consortium.

    PubMed

    Linderman, Michael D; Nielsen, Daiva E; Green, Robert C

    2016-03-25

    Thousands of ostensibly healthy individuals have had their exome or genome sequenced, but a much smaller number of these individuals have received any personal genomic results from that sequencing. We term those projects in which ostensibly healthy participants can receive sequencing-derived genetic findings and may also have access to their genomic data as participatory predispositional personal genome sequencing (PPGS). Here we are focused on genome sequencing applied in a pre-symptomatic context and so define PPGS to exclude diagnostic genome sequencing intended to identify the molecular cause of suspected or diagnosed genetic disease. In this report we describe the design of completed and underway PPGS projects, briefly summarize the results reported to date and introduce the PeopleSeq Consortium, a newly formed collaboration of PPGS projects designed to collect much-needed longitudinal outcome data.

  10. Personal Genome Sequencing in Ostensibly Healthy Individuals and the PeopleSeq Consortium

    PubMed Central

    Linderman, Michael D.; Nielsen, Daiva E.; Green, Robert C.

    2016-01-01

    Thousands of ostensibly healthy individuals have had their exome or genome sequenced, but a much smaller number of these individuals have received any personal genomic results from that sequencing. We term those projects in which ostensibly healthy participants can receive sequencing-derived genetic findings and may also have access to their genomic data as participatory predispositional personal genome sequencing (PPGS). Here we are focused on genome sequencing applied in a pre-symptomatic context and so define PPGS to exclude diagnostic genome sequencing intended to identify the molecular cause of suspected or diagnosed genetic disease. In this report we describe the design of completed and underway PPGS projects, briefly summarize the results reported to date and introduce the PeopleSeq Consortium, a newly formed collaboration of PPGS projects designed to collect much-needed longitudinal outcome data. PMID:27023617

  11. Personalized health care in 2013: a status report on the impact of genomics.

    PubMed

    Snyderman, Ralph

    2013-01-01

    This issue of the NCMJ describes the impact that genomics has had on the practice of medicine in the decade since the full sequencing of the human genome was completed in 2003. Specifically, it reports on how genomics is affecting health care delivery, describes the concept of personalized health care, and discusses the role that genomics plays in such care. The commentaries and sidebars that follow highlight the opportunities and challenges of bringing genomics into clinical practice. Reading these articles will hopefully give clinicians and others a better understanding of the benefits and limitations of genomic technologies. Emerging capabilities, resulting in part from genomic research, are providing an opportunity to move health care from a reactive, disease-focused model to one that is personalized, predictive, proactive, precise, and patient-centered. Genomics and related technologies have already changed many approaches to care, particularly in the field of oncology, and I believe they will help to transform our overall approach to the delivery of health care. With the rapidly accumulating capabilities being developed and the focus on patient-centered and personalized care, I expect that the practice of medicine will become proactive and personalized within the next decade.

  12. Informing the Design of Direct-to-Consumer Interactive Personal Genomics Reports

    PubMed Central

    Shaer, Orit; Okerlund, Johanna; Balestra, Martina; Stowell, Elizabeth; Ascher, Laura; Bi, Joanna; Schlenker, Claire; Ball, Madeleine

    2015-01-01

    Background In recent years, people who sought direct-to-consumer genetic testing services have been increasingly confronted with an unprecedented amount of personal genomic information, which influences their decisions, emotional state, and well-being. However, these users of direct-to-consumer genetic services, who vary in their education and interests, frequently have little relevant experience or tools for understanding, reasoning about, and interacting with their personal genomic data. Online interactive techniques can play a central role in making personal genomic data useful for these users. Objective We sought to (1) identify the needs of diverse users as they make sense of their personal genomic data, (2) consequently develop effective interactive visualizations of genomic trait data to address these users’ needs, and (3) evaluate the effectiveness of the developed visualizations in facilitating comprehension. Methods The first two user studies, conducted with 63 volunteers in the Personal Genome Project and with 36 personal genomic users who participated in a design workshop, respectively, employed surveys and interviews to identify the needs and expectations of diverse users. Building on the two initial studies, the third study was conducted with 730 Amazon Mechanical Turk users and employed a controlled experimental design to examine the effectiveness of different design interventions on user comprehension. Results The first two studies identified searching, comparing, sharing, and organizing data as fundamental to users’ understanding of personal genomic data. The third study demonstrated that interactive and visual design interventions could improve the understandability of personal genomic reports for consumers. In particular, results showed that a new interactive bubble chart visualization designed for the study resulted in the highest comprehension scores, as well as the highest perceived comprehension scores. These scores were significantly

  13. Motivations and Perceptions of Early Adopters of Personalized Genomics: Perspectives from Research Participants

    PubMed Central

    Gollust, S.E.; Gordon, E.S.; Zayac, C.; Griffin, G.; Christman, M.F.; Pyeritz, R.E.; Wawak, L.; Bernhardt, B.A.

    2011-01-01

    Background/Aims: To predict the potential public health impact of personal genomics, empirical research on public perceptions of these services is needed. In this study, ‘early adopters’ of personal genomics were surveyed to assess their motivations, perceptions and intentions. Methods: Participants were recruited from everyone who registered to attend an enrollment event for the Coriell Personalized Medicine Collaborative, a United States-based (Camden, N.J.) research study of the utility of personalized medicine, between March 31, 2009 and April 1, 2010 (n = 369). Participants completed an Internet-based survey about their motivations, awareness of personalized medicine, perceptions of study risks and benefits, and intentions to share results with health care providers. Results: Respondents were motivated to participate for their own curiosity and to find out their disease risk to improve their health. Fewer than 10% expressed deterministic perspectives about genetic risk, but 32% had misperceptions about the research study or personal genomic testing. Most respondents perceived the study to have health-related benefits. Nearly all (92%) intended to share their results with physicians, primarily to request specific medical recommendations. Conclusion: Early adopters of personal genomics are prospectively enthusiastic about using genomic profiling information to improve their health, in close consultation with their physicians. This suggests that early users (i.e. through direct-to-consumer companies or research) may follow up with the health care system. Further research should address whether intentions to seek care match actual behaviors. PMID:21654153

  14. Taking Bioinformatics to Systems Medicine.

    PubMed

    van Kampen, Antoine H C; Moerland, Perry D

    2016-01-01

    Systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. In this chapter we discuss how bioinformatics critically contributes to systems medicine. First, we explain the role of bioinformatics in the management and analysis of data. In particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. Second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiological molecular mechanisms, and facilitate personalized medicine. Third, we focus on network analysis and discuss how gene networks can be constructed from omics data and how these networks can be decomposed into smaller modules. We discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, and lead to predictive models. Throughout, we provide several examples demonstrating how bioinformatics contributes to systems medicine and discuss future challenges in bioinformatics that need to be addressed to enable the advancement of systems medicine.

  15. Design, methods, and participant characteristics of the Impact of Personal Genomics (PGen) Study, a prospective cohort study of direct-to-consumer personal genomic testing customers.

    PubMed

    Carere, Deanna Alexis; Couper, Mick P; Crawford, Scott D; Kalia, Sarah S; Duggan, Jake R; Moreno, Tanya A; Mountain, Joanna L; Roberts, J Scott; Green, Robert C

    2014-01-01

    Designed in collaboration with 23andMe and Pathway Genomics, the Impact of Personal Genomics (PGen) Study serves as a model for academic-industry partnership and provides a longitudinal dataset for studying psychosocial, behavioral, and health outcomes related to direct-to-consumer personal genomic testing (PGT). Web-based surveys administered at three time points, and linked to individual-level PGT results, provide data on 1,464 PGT customers, of whom 71% completed each follow-up survey and 64% completed all three surveys. The cohort includes 15.7% individuals of non-white ethnicity, and encompasses a range of income, education, and health levels. Over 90% of participants agreed to re-contact for future research. PMID:25484922

  16. Bioinformatics meets parasitology.

    PubMed

    Cantacessi, C; Campbell, B E; Jex, A R; Young, N D; Hall, R S; Ranganathan, S; Gasser, R B

    2012-05-01

    The advent and integration of high-throughput '-omics' technologies (e.g. genomics, transcriptomics, proteomics, metabolomics, glycomics and lipidomics) are revolutionizing the way biology is done, allowing the systems biology of organisms to be explored. These technologies are now providing unique opportunities for global, molecular investigations of parasites. For example, studies of a transcriptome (all transcripts in an organism, tissue or cell) have become instrumental in providing insights into aspects of gene expression, regulation and function in a parasite, which is a major step to understanding its biology. The purpose of this article was to review recent applications of next-generation sequencing technologies and bioinformatic tools to large-scale investigations of the transcriptomes of parasitic nematodes of socio-economic significance (particularly key species of the order Strongylida) and to indicate the prospects and implications of these explorations for developing novel methods of parasite intervention.

  17. Microbial bioinformatics 2020.

    PubMed

    Pallen, Mark J

    2016-09-01

    Microbial bioinformatics in 2020 will remain a vibrant, creative discipline, adding value to the ever-growing flood of new sequence data, while embracing novel technologies and fresh approaches. Databases and search strategies will struggle to cope and manual curation will not be sustainable during the scale-up to the million-microbial-genome era. Microbial taxonomy will have to adapt to a situation in which most microorganisms are discovered and characterised through the analysis of sequences. Genome sequencing will become a routine approach in clinical and research laboratories, with fresh demands for interpretable user-friendly outputs. The "internet of things" will penetrate healthcare systems, so that even a piece of hospital plumbing might have its own IP address that can be integrated with pathogen genome sequences. Microbiome mania will continue, but the tide will turn from molecular barcoding towards metagenomics. Crowd-sourced analyses will collide with cloud computing, but eternal vigilance will be the price of preventing the misinterpretation and overselling of microbial sequence data. Output from hand-held sequencers will be analysed on mobile devices. Open-source training materials will address the need for the development of a skilled labour force. As we boldly go into the third decade of the twenty-first century, microbial sequence space will remain the final frontier! PMID:27471065

  18. Genomic medicine, precision medicine, personalized medicine: what's in a name?

    PubMed

    Roden, D M; Tyndale, R F

    2013-08-01

    This issue of Clinical Pharmacology & Therapeutics is devoted to genomic medicine, and a reader may reasonably ask what we mean when we use those words. In the initial issue of the journal Genomics in 1987, McKusick and Ruddle pointed out that the descriptor "genome" had been coined in 1920 as a hybrid of "gene" and "chromosome," and that their new journal would focus on the "newly-developing discipline of mapping/sequencing (including analysis of the information)." A key milestone in the field was the generation of the first draft of a human genome in 2000, but this success really represents only one of many milestones in the journey from Mendel to MiSeq.

  19. Privacy Preserving PCA on Distributed Bioinformatics Datasets

    ERIC Educational Resources Information Center

    Li, Xin

    2011-01-01

    In recent years, new bioinformatics technologies, such as gene expression microarray, genome-wide association study, proteomics, and metabolomics, have been widely used to simultaneously identify a huge number of human genomic/genetic biomarkers, generate a tremendously large amount of data, and dramatically increase the knowledge on human…

  20. Ethical considerations of research policy for personal genome analysis: the approach of the Genome Science Project in Japan.

    PubMed

    Minari, Jusaku; Shirai, Tetsuya; Kato, Kazuto

    2014-12-01

    As evidenced by high-throughput sequencers, genomic technologies have recently undergone radical advances. These technologies enable comprehensive sequencing of personal genomes considerably more efficiently and less expensively than heretofore. These developments present a challenge to the conventional framework of biomedical ethics; under these changing circumstances, each research project has to develop a pragmatic research policy. Based on the experience with a new large-scale project, the Genome Science Project, this article presents a novel approach to conducting a specific policy for personal genome research in the Japanese context. In creating an original informed-consent form template for the project, we present a two-tiered process: making the draft of the template following an analysis of national and international policies; refining the draft template in conjunction with genome project researchers for practical application. Through practical use of the template, we have gained valuable experience in addressing challenges in the ethical review process, such as the importance of sharing details of the latest developments in genomics with members of research ethics committees. We discuss certain limitations of the conventional concept of informed consent and its governance system and suggest the potential of an alternative process using information technology.

  1. Basic principles of yeast genomics, a personal recollection.

    PubMed

    Dujon, Bernard

    2015-08-01

    The genomes of many yeast species or strain isolates have now been sequenced with an accelerating momentum that quickly relegates initial data to history, albeit that they are less than two decades old. Today, novel yeast genomes are entirely sequenced for a variety of reasons, often only to identify a few expected genes of specific interest, thus providing a wealth of data, heterogeneous in quality and completion but informative about the origin and evolution of this heterogeneous collection of unicellular modern fungi. However, how many scientists fully appreciate the important conceptual and technological roles played by yeasts in the extraordinary development of today's genomics? Novel notions of general significance emerged from the very first eukaryote sequenced, Saccharomyces cerevisiae, and were successively refined and extended over time. Tools with general applications were originally developed with this yeast; and surprises emerged from the results. Here, I have tried to recollect the gradual building up of knowledge as yeast genomics developed, and then briefly summarize our present views about the basic nature of yeast genomes, based on the most recent data.

  2. Personal utility is inherent to direct-to-consumer genomic testing.

    PubMed

    Chung, Matthew Wai Heng; Ng, Joseph Chi Fung

    2016-10-01

    People for and against direct-to-consumer (DTC) genomic tests are arguing around two issues: first, on whether an autonomy-based account can justify the tests; second, on whether the tests bring any personal utility. Bunnik et al, in an article published in this journal, were doubtful on the latter, especially in clinically irrelevant and uninterpretable sequences, and how far this claim could go in the justification. Here we argue that personal utility is inherent to DTC genomic tests and their results. We discuss Bunnik et al's account of personal utility and identify problems in its motivation and application. We then explore concepts like utility and entertainment which suggest that DTC genomic tests bring personal utility to their consumers, both in the motivation and the content of the tests. This points to an alternative account of personal utility which entails that entertainment value alone is adequate to justify DTC genomic tests, given appropriate strategies to communicate test results to consumers. It supports the autonomy-based justification of the tests by showing that a DTC genomic test itself stands as a valuable option and facilitates meaningful choice by people.

  3. Simultaneous Whole Mitochondrial Genome Sequencing with Short Overlapping Amplicons Suitable for Degraded DNA Using the Ion Torrent Personal Genome Machine

    PubMed Central

    Chaitanya, Lakshmi; Ralf, Arwin; van Oven, Mannis; Kupiec, Tomasz; Chang, Joseph; Lagacé, Robert

    2015-01-01

    Whole mitochondrial (mt) genome analysis enables a considerable increase in analysis throughput, and improves the discriminatory power to the maximum possible phylogenetic resolution. Most established protocols on the different massively parallel sequencing (MPS) platforms, however, invariably involve the PCR amplification of large fragments, typically several kilobases in size, which may fail due to mtDNA fragmentation in the available degraded materials. We introduce a MPS tiling approach for simultaneous whole human mt genome sequencing using 161 short overlapping amplicons (average 200 bp) with the Ion Torrent Personal Genome Machine. We illustrate the performance of this new method by sequencing 20 DNA samples belonging to different worldwide mtDNA haplogroups. Additional quality control, particularly regarding the potential detection of nuclear insertions of mtDNA (NUMTs), was performed by comparative MPS analysis using the conventional long-range amplification method. Preliminary sensitivity testing revealed that detailed haplogroup inference was feasible with 100 pg genomic input DNA. Complete mt genome coverage was achieved from DNA samples experimentally degraded down to genomic fragment sizes of about 220 bp, and up to 90% coverage from naturally degraded samples. Overall, we introduce a new approach for whole mt genome MPS analysis from degraded and nondegraded materials relevant to resolve and infer maternal genetic ancestry at complete resolution in anthropological, evolutionary, medical, and forensic applications. PMID:26387877

  4. Simultaneous Whole Mitochondrial Genome Sequencing with Short Overlapping Amplicons Suitable for Degraded DNA Using the Ion Torrent Personal Genome Machine.

    PubMed

    Chaitanya, Lakshmi; Ralf, Arwin; van Oven, Mannis; Kupiec, Tomasz; Chang, Joseph; Lagacé, Robert; Kayser, Manfred

    2015-12-01

    Whole mitochondrial (mt) genome analysis enables a considerable increase in analysis throughput, and improves the discriminatory power to the maximum possible phylogenetic resolution. Most established protocols on the different massively parallel sequencing (MPS) platforms, however, invariably involve the PCR amplification of large fragments, typically several kilobases in size, which may fail due to mtDNA fragmentation in the available degraded materials. We introduce a MPS tiling approach for simultaneous whole human mt genome sequencing using 161 short overlapping amplicons (average 200 bp) with the Ion Torrent Personal Genome Machine. We illustrate the performance of this new method by sequencing 20 DNA samples belonging to different worldwide mtDNA haplogroups. Additional quality control, particularly regarding the potential detection of nuclear insertions of mtDNA (NUMTs), was performed by comparative MPS analysis using the conventional long-range amplification method. Preliminary sensitivity testing revealed that detailed haplogroup inference was feasible with 100 pg genomic input DNA. Complete mt genome coverage was achieved from DNA samples experimentally degraded down to genomic fragment sizes of about 220 bp, and up to 90% coverage from naturally degraded samples. Overall, we introduce a new approach for whole mt genome MPS analysis from degraded and nondegraded materials relevant to resolve and infer maternal genetic ancestry at complete resolution in anthropological, evolutionary, medical, and forensic applications.
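
    The tiling idea in the two records above can be illustrated with a small sketch. It is not the authors' primer design: it assumes evenly spaced amplicons of exactly 200 bp on a circular genome of 16,569 bp (the length of the human mtDNA reference sequence), whereas the abstract only states 161 amplicons with an average length of 200 bp. The function names are hypothetical.

```python
MT_GENOME_LENGTH = 16_569  # human mtDNA reference length in bp (assumption: rCRS)


def tile_circular_genome(genome_length, n_amplicons, amplicon_length):
    """Return evenly spaced (start, end) tiles, 0-based and half-open.

    An end coordinate larger than genome_length means the amplicon
    wraps past the origin of the circular molecule.
    """
    tiles = []
    for i in range(n_amplicons):
        start = (i * genome_length) // n_amplicons
        tiles.append((start, start + amplicon_length))
    return tiles


def min_coverage(genome_length, tiles):
    """Smallest number of amplicons that cover any single position."""
    depth = [0] * genome_length
    for start, end in tiles:
        for pos in range(start, end):
            depth[pos % genome_length] += 1  # modulo handles the wrap-around
    return min(depth)


tiles = tile_circular_genome(MT_GENOME_LENGTH, 161, 200)
mean_step = MT_GENOME_LENGTH / 161   # average distance between amplicon starts
mean_overlap = 200 - mean_step       # average overlap between neighbours

print(f"mean step {mean_step:.1f} bp, mean overlap {mean_overlap:.1f} bp")
print("every position covered:", min_coverage(MT_GENOME_LENGTH, tiles) >= 1)
```

    Under these assumptions neighbouring amplicons overlap by roughly half their length, which is what lets a template fragmented to about the amplicon size still yield complete coverage.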

  5. [Ethical issues of personal genome: a legal perspective--ethical and legal ramifications of personal genome research].

    PubMed

    Maruyama, Eiji

    2009-06-01

    Whole-genome research projects, especially those involving whole-genome sequencing, tend to raise intractable ethical and legal challenges. In this kind of research, genetic and genomic data obtained by typing or sequencing are usually put in open or limited access scientific databases on the Internet to promote studies by many researchers. Once data become available on the Internet, it will be virtually meaningless to withdraw the information, effectively nullifying participants' right to revoke consent. Although the author favors the governance system that will assure research subjects of the right to withdraw their participation, considering these characteristics of whole-genome research, he finds those recommendations offered in Caulfield T, et al: Research ethics recommendations for whole-genome research: Consensus statement. PLoS Biol 6(3): e73(2008), especially to the effect that the consent process should include information about data security and the governance structure and, in particular, the mechanism for considering future research protocols, well reasoned and acceptable. PMID:19507516

  6. [Ethical issues of personal genome: a legal perspective--ethical and legal ramifications of personal genome research].

    PubMed

    Maruyama, Eiji

    2009-06-01

    Whole-genome research projects, especially those involving whole-genome sequencing, tend to raise intractable ethical and legal challenges. In this kind of research, genetic and genomic data obtained by typing or sequencing are usually put in open or limited access scientific databases on the Internet to promote studies by many researchers. Once data become available on the Internet, it will be virtually meaningless to withdraw the information, effectively nullifying participants' right to revoke consent. Although the author favors the governance system that will assure research subjects of the right to withdraw their participation, considering these characteristics of whole-genome research, he finds those recommendations offered in Caulfield T, et al: Research ethics recommendations for whole-genome research: Consensus statement. PLoS Biol 6(3): e73(2008), especially to the effect that the consent process should include information about data security and the governance structure and, in particular, the mechanism for considering future research protocols, well reasoned and acceptable.

  7. Bioinformatics and Moonlighting Proteins.

    PubMed

    Hernández, Sergio; Franco, Luís; Calvo, Alejandra; Ferragut, Gabriela; Hermoso, Antoni; Amela, Isaac; Gómez, Antonio; Querol, Enrique; Cedano, Juan

    2015-01-01

    Multitasking or moonlighting is the capability of some proteins to execute two or more biochemical functions. Usually, moonlighting proteins are revealed experimentally by serendipity. For this reason, it would be helpful if bioinformatics could predict this multifunctionality, especially given the large number of sequences from genome projects. In the present work, we analyze and describe several approaches that use sequences, structures, interactomics, and current bioinformatics algorithms and programs to try to overcome this problem. Among these approaches are (a) remote homology searches using Psi-Blast, (b) detection of functional motifs and domains, (c) analysis of data from protein-protein interaction (PPI) databases, (d) matching the query protein sequence against 3D databases (e.g., with algorithms such as PISITE), and (e) mutation correlation analysis between amino acids with algorithms such as MISTIC. Programs designed to identify functional motifs/domains detect mainly the canonical function but usually fail to detect the moonlighting one, Pfam and ProDom being the best methods. Remote homology search by Psi-Blast combined with data from interactomics databases (PPIs) has the best performance. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can only be used in very specific situations, since it requires the existence of multialigned family protein sequences, but it can suggest how the evolutionary process of second function acquisition took place. The multitasking protein database MultitaskProtDB (http://wallace.uab.es/multitask/), previously published by our group, has been used as a benchmark for all of the analyses. PMID:26157797
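
    Mutation correlation analysis, approach (e) above, is commonly built on the mutual information between pairs of columns in a multiple sequence alignment. The sketch below shows only that core quantity on a made-up four-sequence alignment; it is not MISTIC's implementation, and the corrections that production tools apply on top of raw mutual information are omitted. The function names and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column(alignment, index):
    """Extract one alignment column as a list of residues."""
    return [seq[index] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Toy alignment (assumption): columns 0 and 1 always change together,
# column 2 is fully conserved, column 3 varies independently of column 0.
toy_alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]

covarying = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
independent = mutual_information(column(toy_alignment, 0), column(toy_alignment, 3))
print(covarying, independent)
```

    A high value flags a pair of positions that tend to mutate together; a conserved column carries no signal, which is one reason the method needs a sufficiently diverse set of aligned family sequences.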

  8. Getting Personal: Head and Neck Cancer Management in the Era of Genomic Medicine

    PubMed Central

    Birkeland, Andrew C.; Uhlmann, Wendy R.; Brenner, J. Chad; Shuman, Andrew G.

    2015-01-01

    Background Genetic testing is rapidly becoming an important tool in the management of patients with head and neck cancer. As we enter the era of genomics and personalized medicine, providers should be aware of testing options, counseling resources, and the benefits, limitations and future of personalized therapy. Methods This manuscript offers a primer to assist clinicians treating patients in anticipating and managing the inherent practical and ethical challenges of cancer care in the genomic era. Results Clinical applications of genomics for head and neck cancer are emerging. We discuss the indications for genetic testing, types of testing available, implications for care, privacy/disclosure concerns and ethical considerations. Hereditary genetic syndromes associated with head and neck neoplasms are reviewed, and online genetics resources are provided. Conclusions This article summarizes and contextualizes the evolving diagnostic and therapeutic options that impact the care of patients with head and neck cancer in the genomic era. PMID:25995036

  9. Genome Science and Personalized Cancer Treatment (LBNL Summer Lecture Series)

    SciTech Connect

    Gray, Joe

    2009-08-04

    Summer Lecture Series 2009: Results from the Human Genome Project are enabling scientists to understand how individual cancers form and progress. This information, when combined with newly developed drugs, can optimize the treatment of individual cancers. Joe Gray, director of Berkeley Lab's Life Sciences Division and Associate Laboratory Director for Life and Environmental Sciences, will focus on this approach, its promise, and its current roadblocks, particularly with regard to breast cancer.

  10. Genome Science and Personalized Cancer Treatment (LBNL Summer Lecture Series)

    ScienceCinema

    Gray, Joe

    2016-07-12

    Summer Lecture Series 2009: Results from the Human Genome Project are enabling scientists to understand how individual cancers form and progress. This information, when combined with newly developed drugs, can optimize the treatment of individual cancers. Joe Gray, director of Berkeley Lab's Life Sciences Division and Associate Laboratory Director for Life and Environmental Sciences, will focus on this approach, its promise, and its current roadblocks, particularly with regard to breast cancer.

  11. Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation.

    PubMed

    Kidd, Jeffrey M; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F; Peckham, Heather E; Omberg, Larsson; Bormann Chung, Christina A; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G; Russell, Archie; Reynolds, Andy; Clark, Andrew G; Reese, Martin G; Lincoln, Stephen E; Butte, Atul J; De La Vega, Francisco M; Bustamante, Carlos D

    2012-10-01

    Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago.

  12. Population Genetic Inference from Personal Genome Data: Impact of Ancestry and Admixture on Human Genomic Variation

    PubMed Central

    Kidd, Jeffrey M.; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D.; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F.; Peckham, Heather E.; Omberg, Larsson; Bormann Chung, Christina A.; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G.; Russell, Archie; Reynolds, Andy; Clark, Andrew G.; Reese, Martin G.; Lincoln, Stephen E.; Butte, Atul J.; De La Vega, Francisco M.; Bustamante, Carlos D.

    2012-01-01

    Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas—70% of the European ancestry in today’s African Americans dates back to European gene flow happening only 7–8 generations ago. PMID:23040495
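
    The heterozygosity figures quoted in the two records above can be checked with a few lines of arithmetic. The sketch below counts heterozygous sites per kilobase for a made-up list of diploid genotype calls and then recomputes the fold range from the two rates given in the abstract (1 and 0.57 heterozygous sites per kilobase). The function name, the toy genotypes, and the 2,000-base denominator are illustrative assumptions, not data from the study.

```python
def heterozygous_sites_per_kb(genotypes, callable_bases):
    """Heterozygous calls per 1,000 callable bases.

    genotypes: iterable of (allele_1, allele_2) pairs for a diploid sample.
    callable_bases: number of bases at which a genotype could be called.
    """
    het_count = sum(1 for allele_1, allele_2 in genotypes if allele_1 != allele_2)
    return het_count / (callable_bases / 1000)


# Toy data (assumption): two heterozygous calls across 2,000 callable bases.
toy_genotypes = [("A", "G"), ("C", "C"), ("T", "C"), ("G", "G")]
rate = heterozygous_sites_per_kb(toy_genotypes, 2000)

# Fold range implied by the highest and lowest rates in the abstract.
fold_range = 1.0 / 0.57

print(f"{rate:.2f} het sites per kb; fold range {fold_range:.2f}")
```

    The ratio of the two reported extremes reproduces the 1.75-fold range stated in the abstract.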

  13. Attitudes towards personal genomics among older Swiss adults: An exploratory study

    PubMed Central

    Mählmann, Laura; Röcke, Christina; Brand, Angela; Hafen, Ernst; Vayena, Effy

    2016-01-01

    Objectives To explore attitudes of Swiss older adults towards personal genomics (PG). Methods Using an anonymized voluntary paper-and-pencil survey, data were collected from 151 men and women aged 60–89 years attending the Seniorenuniversität Zurich, Switzerland (Seniors' University). Analyses were conducted using descriptive and inferential statistics. Results One third of the respondents were aware of PG, and more than half indicated interest in undergoing PG testing. The primary motivation provided was respondents' interest in finding out about their own disease risk, followed by willingness to contribute to scientific research. Forty-four percent were not interested in undergoing testing because results might be worrisome, or due to concerns about the validity of the results. Only a minority of respondents mentioned privacy-related concerns. Further, 66% were interested in undergoing clinic-based PG motivated by the opportunity to contribute to scientific research (78%) and 75% of all study participants indicated strong preferences to donate genomic data to public research institutions. Conclusion This study indicates a relatively positive overall attitude towards personal genomic testing among older Swiss adults, a group not typically represented in surveys about personal genomics. Genomic data of older adults can be highly relevant to late life health and maintenance of quality of life. In addition they can be an invaluable source for better understanding of longevity, health and disease. Understanding the attitudes of this population towards genomic analyses, although important, remains under-examined. PMID:27047754

  14. Knowledge and attitudes to personal genomics testing for complex diseases among Nigerians

    PubMed Central

    2014-01-01

    Background The study examined the knowledge of and attitudes towards personal genomics testing for complex diseases among Nigerians and identified how knowledge and attitudes vary with gender, age, religion, education and related factors. Methods Data were collected using qualitative methods in 2 districts of the Federal Capital Territory. Eight focus group discussions (FGDs) and twenty-seven key informant interviews (KIIs) were conducted. Participants were recruited among healthy Nigerians, individuals with complex diseases, health care professionals, community leaders and health policy makers. Results Most respondents in both FGDs and KIIs initially had limited knowledge of genomic testing. Their understanding improved, however, after the concept was explained. Participants showed a positive attitude towards genomic tests. Nevertheless, they expressed fear over direct-to-consumer personal genomics testing, testing of unborn babies and disclosure of results to third parties. Culture and religion were found to influence respondents' perspectives on genomic testing, particularly those aspects that could either directly contradict their beliefs and practices or lead to actions that contradict them. Conclusion Most Nigerians interviewed had limited knowledge of genomic testing but showed a supportive attitude towards its use in predicting future risk of complex diseases once they understood the concept. Genomics testing for complex diseases was not a common practice in Nigeria. PMID:24766930

  15. Computational biology and bioinformatics in Nigeria.

    PubMed

    Fatumo, Segun A; Adoga, Moses P; Ojo, Opeolu O; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-04-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries.

  16. Computational Biology and Bioinformatics in Nigeria

    PubMed Central

    Fatumo, Segun A.; Adoga, Moses P.; Ojo, Opeolu O.; Oluwagbemi, Olugbenga; Adeoye, Tolulope; Ewejobi, Itunuoluwa; Adebiyi, Marion; Adebiyi, Ezekiel; Bewaji, Clement; Nashiru, Oyekanmi

    2014-01-01

    Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological data generated by the scientific community. The critical need to process and analyze such a deluge of data and turn it into useful knowledge has caused bioinformatics to gain prominence and importance. Bioinformatics is an interdisciplinary research area that applies techniques, methodologies, and tools in computer and information science to solve biological problems. In Nigeria, bioinformatics has recently played a vital role in the advancement of biological sciences. As a developing country, the importance of bioinformatics is rapidly gaining acceptance, and bioinformatics groups comprised of biologists, computer scientists, and computer engineers are being constituted at Nigerian universities and research institutes. In this article, we present an overview of bioinformatics education and research in Nigeria. We also discuss professional societies and academic and research institutions that play central roles in advancing the discipline in Nigeria. Finally, we propose strategies that can bolster bioinformatics education and support from policy makers in Nigeria, with potential positive implications for other developing countries. PMID:24763310

  17. openSNP–A Crowdsourced Web Resource for Personal Genomics

    PubMed Central

    Greshake, Bastian; Bayer, Philipp E.; Rausch, Helge; Reda, Julia

    2014-01-01

    Genome-Wide Association Studies are widely used to correlate phenotypic traits with genetic variants. These studies usually compare the genetic variation between two groups to single out certain Single Nucleotide Polymorphisms (SNPs) that are linked to a phenotypic variation in one of the groups. However, it is necessary to have a large enough sample size to find statistically significant correlations. Direct-To-Consumer (DTC) genetic testing can supply additional data: DTC companies offer the analysis of a large number of SNPs for an individual at low cost without the need to consult a physician or geneticist. Over 100,000 people have already been genotyped through Direct-To-Consumer genetic testing companies. However, these data are not public for a variety of reasons and thus cannot be used in research. It seems reasonable to create a central open data repository for such data. Here we present the web platform openSNP, an open database which allows participants of Direct-To-Consumer genetic testing to publish their genetic data at no cost along with phenotypic information. Through this crowdsourced effort of collecting genetic and phenotypic information, openSNP has become a resource for a wide area of studies, including Genome-Wide Association Studies. openSNP is hosted at http://www.opensnp.org, and the code is released under MIT-license at http://github.com/gedankenstuecke/snpr. PMID:24647222
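
    The two-group comparison described in this abstract can be sketched, for a single SNP, as a chi-square test on a 2x2 table of allele counts. This is a generic textbook test shown for illustration, not code from openSNP; the allele counts are invented, and the function assumes every row and column total is non-zero.

```python
def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele table.

    Rows are the two groups (cases, controls); columns are allele counts
    (alternate, reference). Assumes all marginal totals are non-zero.
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Invented counts (assumption): 50 cases and 50 controls, two alleles each.
chi2 = allelic_chi_square(case_alt=60, case_ref=40, control_alt=40, control_ref=60)

# 3.841 is the chi-square critical value for 1 degree of freedom at p = 0.05.
print(f"chi-square = {chi2:.1f}; exceeds 3.841: {chi2 > 3.841}")
```

    A genome-wide scan repeats such a test at every SNP and therefore needs a far stricter significance threshold than 0.05, which is why sample size matters as much as the abstract says.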

  18. Sequencing and analysis of a South Asian-Indian personal genome

    PubMed Central

    2012-01-01

    Background With over 1.3 billion people, India is estimated to contain three times more genetic diversity than does Europe. Next-generation sequencing technologies have facilitated the understanding of diversity by enabling whole genome sequencing at greater speed and lower cost. While genomes from people of European and Asian descent have been sequenced, only recently has a single male genome from the Indian subcontinent been published at sufficient depth and coverage. In this study we have sequenced and analyzed the genome of a South Asian Indian female (SAIF) from the Indian state of Kerala. Results We identified over 3.4 million SNPs in this genome including over 89,873 private variations. Comparison of the SAIF genome with several published personal genomes revealed that this individual shared ~50% of the SNPs with each of these genomes. Analysis of the SAIF mitochondrial genome showed that it was closely related to the U1 haplogroup which has been previously observed in Kerala. We assessed the SAIF genome for SNPs with health and disease consequences and found that the individual was at a higher risk for multiple sclerosis and a few other diseases. In analyzing SNPs that modulate drug response, we found a variation that predicts a favorable response to metformin, a drug used to treat diabetes. SNPs predictive of adverse reaction to warfarin indicated that the SAIF individual is not at risk for bleeding if treated with typical doses of warfarin. In addition, we report the presence of several additional SNPs of medical relevance. Conclusions This is the first study to report the complete whole genome sequence of a female from the state of Kerala in India. The availability of this complete genome and variants will further aid studies aimed at understanding genetic diversity, identifying clinically relevant changes and assessing disease burden in the Indian population. PMID:22938532

  19. BreCAN-DB: a repository cum browser of personalized DNA breakpoint profiles of cancer genomes.

    PubMed

    Narang, Pankaj; Dhapola, Parashar; Chowdhury, Shantanu

    2016-01-01

    BreCAN-DB (http://brecandb.igib.res.in) is a repository cum browser of whole genome somatic DNA breakpoint profiles of cancer genomes, mapped at single nucleotide resolution using deep sequencing data. These breakpoints are associated with deletions, insertions, inversions, tandem duplications, translocations and a combination of these structural genomic alterations. The current release of BreCAN-DB features breakpoint profiles from 99 cancer-normal pairs, comprising five cancer types. We identified DNA breakpoints across genomes using high-coverage next-generation sequencing data obtained from TCGA and dbGaP. Further, in these cancer genomes, we methodically identified breakpoint hotspots which were significantly enriched with somatic structural alterations. To visualize the breakpoint profiles, a next-generation genome browser was integrated with BreCAN-DB. Moreover, we also included previously reported breakpoint profiles from 138 cancer-normal pairs, spanning 10 cancer types, in the browser. Additionally, BreCAN-DB allows one to identify breakpoint hotspots in user-uploaded data sets. We have also included functionality to query the overlap of any breakpoint profile with regions of the user's interest. Users can download breakpoint profiles from the database or may submit their data to be integrated in BreCAN-DB. We believe that BreCAN-DB will be a useful resource for the genomics scientific community and is a step towards personalized cancer genomics. PMID:26586806
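
    The overlap query mentioned in this abstract can be illustrated with a minimal sketch. This is not BreCAN-DB's implementation or API: the `Region` type, the `breakpoints_in_regions` function, the half-open coordinate convention and the sample coordinates are all hypothetical, chosen only to show what "overlap of a breakpoint profile with regions of interest" might mean for single-nucleotide breakpoints.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """A hypothetical region of interest (0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int


def breakpoints_in_regions(
    breakpoints: List[Tuple[str, int]], regions: List[Region]
) -> List[Tuple[str, int]]:
    """Return the (chrom, position) breakpoints that fall inside any region."""
    hits = []
    for chrom, pos in breakpoints:
        # A breakpoint overlaps a region when the chromosome matches and
        # the position lies within the half-open interval [start, end).
        if any(r.chrom == chrom and r.start <= pos < r.end for r in regions):
            hits.append((chrom, pos))
    return hits


# Made-up example data, for illustration only.
breakpoints = [("chr1", 150), ("chr1", 900), ("chr2", 150)]
regions = [Region("chr1", 100, 200), Region("chr2", 500, 600)]
hits = breakpoints_in_regions(breakpoints, regions)
print(hits)
```

    A linear scan like this is adequate for small inputs; for genome-scale profiles an interval tree or sorted sweep would usually be preferred.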

  20. BreCAN-DB: a repository cum browser of personalized DNA breakpoint profiles of cancer genomes

    PubMed Central

    Narang, Pankaj; Dhapola, Parashar; Chowdhury, Shantanu

    2016-01-01

    BreCAN-DB (http://brecandb.igib.res.in) is a repository cum browser of whole genome somatic DNA breakpoint profiles of cancer genomes, mapped at single nucleotide resolution using deep sequencing data. These breakpoints are associated with deletions, insertions, inversions, tandem duplications, translocations and a combination of these structural genomic alterations. The current release of BreCAN-DB features breakpoint profiles from 99 cancer-normal pairs, comprising five cancer types. We identified DNA breakpoints across genomes using high-coverage next-generation sequencing data obtained from TCGA and dbGaP. Further, in these cancer genomes, we methodically identified breakpoint hotspots which were significantly enriched with somatic structural alterations. To visualize the breakpoint profiles, a next-generation genome browser was integrated with BreCAN-DB. Moreover, we also included previously reported breakpoint profiles from 138 cancer-normal pairs, spanning 10 cancer types, in the browser. Additionally, BreCAN-DB allows one to identify breakpoint hotspots in user-uploaded data sets. We have also included functionality to query the overlap of any breakpoint profile with regions of the user's interest. Users can download breakpoint profiles from the database or may submit their data to be integrated in BreCAN-DB. We believe that BreCAN-DB will be a useful resource for the genomics scientific community and is a step towards personalized cancer genomics. PMID:26586806

  1. Bioinformatics: promises and progress.

    PubMed

    Gupta, Shipra; Misra, Gauri; Khurana, S M Paul

    2015-01-01

    Bioinformatics is a multidisciplinary science that analyzes and solves biological problems. With the explosion in biomedical data, the demand for bioinformatics has steadily increased. The present paper provides an overview of the various ways in which biologists and biological researchers in neurology, structural and functional biology, evolutionary biology, clinical science, etc., use bioinformatics applications for data analysis to summarise their research. A new perspective is used to classify the knowledge available in the field, which should help a general audience understand the applications of bioinformatics.

  2. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    PubMed Central

    Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students’ attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484

  3. A survey of scholarly literature describing the field of bioinformatics education and bioinformatics educational research.

    PubMed

    Magana, Alejandra J; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the potential advancement of research and development in complex biomedical systems has created a need for an educated workforce in bioinformatics. However, effectively integrating bioinformatics education through formal and informal educational settings has been a challenge due in part to its cross-disciplinary nature. In this article, we seek to provide an overview of the state of bioinformatics education. This article identifies: 1) current approaches of bioinformatics education at the undergraduate and graduate levels; 2) the most common concepts and skills being taught in bioinformatics education; 3) pedagogical approaches and methods of delivery for conveying bioinformatics concepts and skills; and 4) assessment results on the impact of these programs, approaches, and methods in students' attitudes or learning. Based on these findings, it is our goal to describe the landscape of scholarly work in this area and, as a result, identify opportunities and challenges in bioinformatics education. PMID:25452484

  4. Genomics and epigenomics: new promises of personalized medicine for cancer patients.

    PubMed

    Schweiger, Michal-Ruth; Barmeyer, Christian; Timmermann, Bernd

    2013-09-01

    Recent years have brought about a marked extension of our understanding of the somatic basis of cancer. Parallel to the large-scale investigation of diverse tumor genomes the knowledge arose that cancer pathologies are most often not restricted to single genomic events. In contrast, a large number of different alterations in the genomes and epigenomes come together and promote the malignant transformation. The combination of mutations, structural variations and epigenetic alterations differs between each tumor, making individual diagnosis and treatment strategies necessary. This view is summarized in the new discipline of personalized medicine. To satisfy the ideas of this approach each tumor needs to be fully characterized and individual diagnostic and therapeutic strategies designed. Here, we will discuss the power of high-throughput sequencing technologies for genomic and epigenomic analyses. We will provide insight into the current status and how these technologies can be transferred to routine clinical usage.

  5. Direct-to-consumer personal genomic testing: a case study and practical recommendations for “genomic counseling”.

    PubMed

    Sturm, Amy C; Manickam, Kandamurugu

    2012-06-01

    Technological advances and information-seeking consumers have pushed forward the movement of direct-to-consumer (DTC) genetic testing. As with other types of testing, there are potential risks, benefits and limitations. A major limitation of DTC testing is the incomplete view it provides regarding lifetime risk for common, complex diseases, since most tests only analyze 1–2 single nucleotide polymorphisms (SNPs) and do not include evaluation of medical or family histories, which is necessary for risk assessment. Further, it is not currently well-established whether personal genomic testing results will lead toward improved health behaviors, adverse psychological effects or potential overuse of the health care system. To illustrate these and other issues, we present an in-depth case study of an individual who ordered DTC genetic testing and subsequently sought genetic counseling. This case presents a unique learning experience for the field of genomic counseling, as the patient did not fit the typical assumptions regarding ‘early adopters’ of DTC testing. It also allowed the genetics health care providers involved in the case to identify gaps in current genetic counseling practice that need to be filled and approaches to employ for successful delivery of genomic counseling. Based on our experience, we developed practical recommendations for genomic counseling, which include novel approaches to case preparation, use of electronic tools during the counseling session, and focusing on education as the major component of the genomic counseling session, in order to provide patients with the knowledge necessary to independently interpret and understand large amounts of genomic testing information provided to them.

  6. Informed consent in direct-to-consumer personal genome testing: the outline of a model between specific and generic consent.

    PubMed

    Bunnik, Eline M; Janssens, A Cecile J W; Schermer, Maartje H N

    2014-09-01

    Broad genome-wide testing is increasingly finding its way to the public through the online direct-to-consumer marketing of so-called personal genome tests. Personal genome tests estimate genetic susceptibilities to multiple diseases and other phenotypic traits simultaneously. Providers commonly make use of Terms of Service agreements rather than informed consent procedures. However, to protect consumers from the potential physical, psychological and social harms associated with personal genome testing and to promote autonomous decision-making with regard to the testing offer, we argue that current practices of information provision are insufficient and that there is a place--and a need--for informed consent in personal genome testing, also when it is offered commercially. The increasing quantity, complexity and diversity of most testing offers, however, pose challenges for information provision and informed consent. Both specific and generic models for informed consent fail to meet its moral aims when applied to personal genome testing. Consumers should be enabled to know the limitations, risks and implications of personal genome testing and should be given control over the genetic information they do or do not wish to obtain. We present the outline of a new model for informed consent which can meet both the norm of providing sufficient information and the norm of providing understandable information. The model can be used for personal genome testing, but will also be applicable to other, future forms of broad genetic testing or screening in commercial and clinical settings.

  7. Bioinformatics in the information age

    SciTech Connect

    Spengler, Sylvia J.

    2000-02-01

    There is a well-known story about the blind man examining the elephant: the part of the elephant examined determines his perception of the whole beast. Perhaps bioinformatics--the shotgun marriage between biology and mathematics, computer science, and engineering--is like an elephant that occupies a large chair in the scientific living room. Given the demand for and shortage of researchers with the computer skills to handle large volumes of biological data, where exactly does the bioinformatics elephant sit? There are probably many biologists who feel that a major product of this bioinformatics elephant is large piles of waste material. If you have tried to plow through Web sites and software packages in search of a specific tool for analyzing and collating large amounts of research data, you may well feel the same way. But there has been progress with major initiatives to develop more computing power, educate biologists about computers, increase funding, and set standards. For our purposes, bioinformatics is not simply a biologically inclined rehash of information theory (1) nor is it a hodgepodge of computer science techniques for building, updating, and accessing biological data. Rather bioinformatics incorporates both of these capabilities into a broad interdisciplinary science that involves both conceptual and practical tools for the understanding, generation, processing, and propagation of biological information. As such, bioinformatics is the sine qua non of 21st-century biology. Analyzing gene expression using cDNA microarrays immobilized on slides or other solid supports (gene chips) is set to revolutionize biology and medicine and, in so doing, generate vast quantities of data that have to be accurately interpreted (Fig. 1). As discussed at a meeting a few months ago (Microarray Algorithms and Statistical Analysis: Methods and Standards; Tahoe City, California; 9-12 November 1999), experiments with cDNA arrays must be subjected to quality control

  8. Biggest challenges in bioinformatics

    PubMed Central

    Fuller, Jonathan C; Khoueiry, Pierre; Dinkel, Holger; Forslund, Kristoffer; Stamatakis, Alexandros; Barry, Joseph; Budd, Aidan; Soldatos, Theodoros G; Linssen, Katja; Rajput, Abdul Mateen

    2013-01-01

    The third Heidelberg Unseminars in Bioinformatics (HUB) was held on 18th October 2012, at Heidelberg University, Germany. HUB brought together around 40 bioinformaticians from academia and industry to discuss the ‘Biggest Challenges in Bioinformatics' in a ‘World Café' style event. PMID:23492829

  9. A tiered-layered-staged model for informed consent in personal genome testing.

    PubMed

    Bunnik, Eline M; Janssens, A Cecile J W; Schermer, Maartje H N

    2013-06-01

    In recent years, developments in genomics technologies have led to the rise of commercial personal genome testing (PGT): broad genome-wide testing for multiple diseases simultaneously. While some commercial providers require physicians to order a personal genome test, others can be accessed directly. All providers advertise directly to consumers and offer genetic risk information about dozens of diseases in one single purchase. The quantity and the complexity of risk information pose challenges to adequate pre-test and post-test information provision and informed consent. There are currently no guidelines for what should constitute informed consent in PGT or how adequate informed consent can be achieved. In this paper, we propose a tiered-layered-staged model for informed consent. First, the proposed model is tiered as it offers choices between categories of diseases that are associated with distinct ethical, personal or societal issues. Second, the model distinguishes layers of information with a first layer offering minimal, indispensable information that is material to all consumers, and additional layers offering more detailed information made available upon request. Finally, the model stages informed consent as a process by feeding information to consumers in each subsequent stage of the process of undergoing a test, and by accommodating renewed consent for test result updates, resulting from the ongoing development of the science underlying PGT. A tiered-layered-staged model for informed consent with a focus on the consumer perspective can help overcome the ethical problems of information provision and informed consent in direct-to-consumer PGT.

  10. Deep brain stimulation, brain maps and personalized medicine: lessons from the human genome project.

    PubMed

    Fins, Joseph J; Shapiro, Zachary E

    2014-01-01

    Although the appellation of personalized medicine is generally attributed to advanced therapeutics in molecular medicine, deep brain stimulation (DBS) can also be so categorized. Like its medical counterpart, DBS is a highly personalized intervention that needs to be tailored to a patient's individual anatomy. And because of this, DBS, like more conventional personalized medicine, can be highly specific where the object of care is an N = 1. But that is where the similarities end. Besides their differing medical and surgical provenances, these two varieties of personalized medicine have had strikingly different impacts. The molecular variant, though of a more recent vintage, has thrived and is experiencing explosive growth, while DBS still struggles to find a sustainable therapeutic niche. Despite its promise, and success as a vetted treatment for drug-resistant Parkinson's Disease, DBS has lagged in broadening its development, often encountering regulatory hurdles and financial barriers to mounting an adequate number of quality trials. In this paper we will consider why DBS (or better yet, neuromodulation) has encountered these challenges and contrast this experience with the more successful advance of personalized medicine. We will suggest that the differential performance of personalized medicine and DBS can be explained as a matter of timing and complexity. We believe that DBS has struggled because it has been a journey of scientific exploration conducted without a map. In contrast to molecular personalized medicine, which followed the mapping of the human genome and the Human Genome Project, DBS preceded plans for the mapping of the human brain. We believe that this sequence has given personalized medicine a distinct advantage and that the fullest potential of DBS will be realized both as a cartographical or electrophysiological probe and as a modality of personalized medicine.

  11. [Applied problems of mathematical biology and bioinformatics].

    PubMed

    Lakhno, V D

    2011-01-01

    Mathematical biology and bioinformatics represent a new and rapidly progressing line of investigation which emerged in the course of work on the "Human Genome" project. The main applied problems of these sciences are drug design, patient-specific medicine and nanobioelectronics. It is shown that progress in the technology of mass sequencing of the human genome has set the stage for starting the national program on patient-specific medicine.

  12. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

    PubMed

    Du, Jiang; Bjornson, Robert D; Zhang, Zhengdong D; Kong, Yong; Snyder, Michael; Gerstein, Mark B

    2009-07-01

    The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at

  13. Highlighting computations in bioscience and bioinformatics: review of the Symposium of Computations in Bioinformatics and Bioscience (SCBB07)

    PubMed Central

    Lu, Guoqing; Ni, Jun

    2008-01-01

    The Second Symposium on Computations in Bioinformatics and Bioscience (SCBB07) was held in Iowa City, Iowa, USA, on August 13–15, 2007. This annual event attracted dozens of bioinformatics professionals and students, who are interested in solving emerging computational problems in bioscience, from China, Japan, Taiwan and the United States. The Scientific Committee of the symposium selected 18 peer-reviewed papers for publication in this supplemental issue of BMC Bioinformatics. These papers cover a broad spectrum of topics in computational biology and bioinformatics, including DNA, protein and genome sequence analysis, gene expression and microarray analysis, computational proteomics and protein structure classification, systems biology and machine learning. PMID:18541044

  14. Bioinformatics education in India.

    PubMed

    Kulkarni-Kale, Urmila; Sawant, Sangeeta; Chavan, Vishwas

    2010-11-01

    An account of bioinformatics education in India is presented along with future prospects. The establishment of the BTIS network by the Department of Biotechnology (DBT), Government of India, in the 1980s was a systematic effort to develop bioinformatics infrastructure in India and to provide services to the scientific community. Advances in the field of bioinformatics underscored the need for well-trained professionals with skills in information technology and biotechnology. As a result, programmes for capacity building in terms of human resource development were initiated. Educational programmes gradually evolved from the organisation of short-term workshops to the institution of formal diploma/degree programmes. A case study of the Master's degree course offered at the Bioinformatics Centre, University of Pune is discussed. Currently, many universities and institutes offer bioinformatics courses at different levels, with variations in course content and degree of detail. The BioInformatics National Certification (BINC) examination, initiated in 2005 by DBT, provides a common yardstick to assess the knowledge and skill sets of students graduating from various institutions. The potential for broadening the scope of bioinformatics to transform it into a data-intensive discovery discipline is discussed. This necessitates amendments to the existing curricula to accommodate upcoming developments.

  15. Targeting the undruggable: immunotherapy meets personalized oncology in the genomic era.

    PubMed

    Martin, S D; Coukos, G; Holt, R A; Nelson, B H

    2015-12-01

    Owing to recent advances in genomic technologies, personalized oncology is poised to fundamentally alter cancer therapy. In this paradigm, the mutational and transcriptional profiles of tumors are assessed, and personalized treatments are designed based on the specific molecular abnormalities relevant to each patient's cancer. To date, such approaches have yielded impressive clinical responses in some patients. However, a major limitation of this strategy has also been revealed: the vast majority of tumor mutations are not targetable by current pharmacological approaches. Immunotherapy offers a promising alternative to exploit tumor mutations as targets for clinical intervention. Mutated proteins can give rise to novel antigens (called neoantigens) that are recognized with high specificity by patient T cells. Indeed, neoantigen-specific T cells have been shown to underlie clinical responses to many standard treatments and immunotherapeutic interventions. Moreover, studies in mouse models targeting neoantigens, and early results from clinical trials, have established proof of concept for personalized immunotherapies targeting next-generation sequencing identified neoantigens. Here, we review basic immunological principles related to T-cell recognition of neoantigens, and we examine recent studies that use genomic data to design personalized immunotherapies. We discuss the opportunities and challenges that lie ahead on the road to improving patient outcomes by incorporating immunotherapy into the paradigm of personalized oncology.

  16. Personalizing dermatology: the future of genomic expression profiling to individualize dermatologic therapy.

    PubMed

    Rizzo, Amilcar Ezequiel; Maibach, Howard I

    2012-06-01

    At the start of the 21st century, the human genome project provided the scientific community with an enormous array of information as genetic blueprints. It was a landmark period, yet its potential contribution to medicine at the time was limited and unknown. However, with new technological advances, the benefits of identifying genomic profiles became apparent. This article reviews the historical accomplishments made by the human genome project, future applications of genomic expression profiles with the use of microarray gene chip technology, and the pharmacogenomic translational application of these models to dermatology. A new scientific movement in dermatology has begun with intentions of discovering individual genomic profiles responsible for dermatologic disease and drug metabolism, so that medical management can be personalized towards the genome rather than the disease. This review shows how pharmacogenomics has taken the lead in forming a basic framework for revealing specific drug metabolic pathways in the skin that can consequently be altered to maximize therapeutic efficacy and minimize side effects. Dermatology, as a model field in medicine, has started to take advantage of these discoveries, in which deciphered genetic profiles can be used to enhance medical treatment.

  17. The UCSC Genome Browser

    PubMed Central

    Karolchik, Donna; Hinrichs, Angie S.; Kent, W. James

    2011-01-01

    The University of California Santa Cruz (UCSC) Genome Browser is a popular Web-based tool for quickly displaying a requested portion of a genome at any scale, accompanied by a series of aligned annotation “tracks.” The annotations generated by the UCSC Genome Bioinformatics Group and external collaborators include gene predictions, mRNA and expressed sequence tag alignments, simple nucleotide polymorphisms, expression and regulatory data, phenotype and variation data, and pairwise and multiple-species comparative genomics data. All information relevant to a region is presented in one window, facilitating biological analysis and interpretation. The database tables underlying the Genome Browser tracks can be viewed, downloaded, and manipulated using another Web-based application, the UCSC Table Browser. Users can upload personal datasets in a wide variety of formats as custom annotation tracks in both browsers for research or educational purposes. PMID:21975940
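
    As a small illustration of the custom annotation tracks mentioned above, the sketch below assembles a track in BED format, one of the formats the Genome Browser accepts for custom tracks. The helper name `make_custom_track`, the track name and the feature coordinates are hypothetical; the only conventions relied on are the leading `track` line and tab-separated chrom/start/end/name columns with 0-based, half-open coordinates.

```python
from typing import Iterable, Tuple


def make_custom_track(
    name: str, description: str, features: Iterable[Tuple[str, int, int, str]]
) -> str:
    """Build the text of a minimal BED custom track.

    Each feature is (chrom, start, end, label); BED starts are 0-based
    and ends are exclusive.
    """
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in features:
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Made-up features, for illustration only.
track_text = make_custom_track(
    "myVariants",
    "Example personal variants",
    [("chr1", 1000, 1001, "snp_a"), ("chr2", 5000, 5200, "del_b")],
)
print(track_text, end="")
```

    The resulting text could be saved to a file or pasted into the browser's custom-track upload form.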

  18. Pathway analysis of genome-wide association datasets of personality traits.

    PubMed

    Kim, H-N; Kim, B-H; Cho, J; Ryu, S; Shin, H; Sung, J; Shin, C; Cho, N H; Sung, Y A; Choi, B-O; Kim, H-L

    2015-04-01

    Although several genome-wide association (GWA) studies of human personality have been recently published, genetic variants that are highly associated with certain personality traits remain unknown, due to difficulty reproducing results. To further investigate these genetic variants, we assessed biological pathways using GWA datasets. Pathway analysis using GWA data was performed on 1089 Korean women whose personality traits were measured with the Revised NEO Personality Inventory for the 5-factor model of personality. A total of 1042 pathways containing 8297 genes were included in our study. Of these, 14 pathways were highly enriched with association signals that were validated in 1490 independent samples. These pathways include association of: Neuroticism with axon guidance [L1 cell adhesion molecule (L1CAM) interactions]; Extraversion with neuronal system and voltage-gated potassium channels; Agreeableness with L1CAM interaction, neurotransmitter receptor binding and downstream transmission in postsynaptic cells; and Conscientiousness with the interferon-gamma and platelet-derived growth factor receptor beta polypeptide pathways. Several genes that contribute to top-ranked pathways in this study were previously identified in GWA studies or by pathway analysis in schizophrenia or other neuropsychiatric disorders. Here we report the first pathway analysis of all five personality traits. Importantly, our analysis identified novel pathways that contribute to understanding the etiology of personality traits. PMID:25809424

  19. Pathway analysis of genome-wide association datasets of personality traits.

    PubMed

    Kim, H-N; Kim, B-H; Cho, J; Ryu, S; Shin, H; Sung, J; Shin, C; Cho, N H; Sung, Y A; Choi, B-O; Kim, H-L

    2015-04-01

    Although several genome-wide association (GWA) studies of human personality have been recently published, genetic variants that are highly associated with certain personality traits remain unknown, due to difficulty reproducing results. To further investigate these genetic variants, we assessed biological pathways using GWA datasets. Pathway analysis using GWA data was performed on 1089 Korean women whose personality traits were measured with the Revised NEO Personality Inventory for the 5-factor model of personality. A total of 1042 pathways containing 8297 genes were included in our study. Of these, 14 pathways were highly enriched with association signals that were validated in 1490 independent samples. These pathways include association of: Neuroticism with axon guidance [L1 cell adhesion molecule (L1CAM) interactions]; Extraversion with neuronal system and voltage-gated potassium channels; Agreeableness with L1CAM interaction, neurotransmitter receptor binding and downstream transmission in postsynaptic cells; and Conscientiousness with the interferon-gamma and platelet-derived growth factor receptor beta polypeptide pathways. Several genes that contribute to top-ranked pathways in this study were previously identified in GWA studies or by pathway analysis in schizophrenia or other neuropsychiatric disorders. Here we report the first pathway analysis of all five personality traits. Importantly, our analysis identified novel pathways that contribute to understanding the etiology of personality traits.

  20. Identification of transposon insertion polymorphisms by computational comparative analysis of next generation personal genome data

    NASA Astrophysics Data System (ADS)

    Luo, Xuemei; Dehne, Frank; Liang, Ping

    2011-11-01

    Structural variations (SVs) in a genome are now known as a prominent and important type of genetic variation. Among all types of SVs, the identification of transposon insertion polymorphisms (TIPs) is more challenging due to the highly repetitive nature of transposon sequences. We developed a computational method, TIP-finder, to identify TIPs through analysis of next generation personal genome data and their extremely large copy numbers. We tested the efficiency of TIP-finder with simulated data and were able to detect about 88% of TIPs with precision of ≥91%. Using TIP-finder to analyze the Solexa paired-end sequence data at deep coverage for six genomes representing two trio families, we identified a total of 5569 TIPs, consisting of 4881, 456, 91, and 141 insertions from Alu, L1, SVA and HERV, respectively, representing the most comprehensive analysis of this type of genetic variation.

  1. The potential of translational bioinformatics approaches for pharmacology research.

    PubMed

    Li, Lang

    2015-10-01

    The field of bioinformatics has allowed the interpretation of massive amounts of biological data, ushering in the era of 'omics' to biomedical research. Its potential impact on pharmacology research is enormous and it has shown some emerging successes. A full realization of this potential, however, requires standardized data annotation for large health record databases and molecular data resources. Improved standardization will further stimulate the development of system pharmacology models, using translational bioinformatics methods. This new translational bioinformatics paradigm is highly complementary to current pharmacological research fields, such as personalized medicine, pharmacoepidemiology and drug discovery. In this review, I illustrate the application of transformational bioinformatics to research in numerous pharmacology subdisciplines.

  2. From "Personalized" to "Precision" Medicine: The Ethical and Social Implications of Rhetorical Reform in Genomic Medicine.

    PubMed

    Juengst, Eric; McGowan, Michelle L; Fishman, Jennifer R; Settersten, Richard A

    2016-09-01

    Since the late 1980s, the human genetics and genomics research community has been promising to usher in a "new paradigm for health care," one that uses molecular profiling to identify human genetic variants implicated in multifactorial health risks. After the completion of the Human Genome Project in 2003, a wide range of stakeholders became committed to this "paradigm shift," creating a confluence of investment, advocacy, and enthusiasm that bears all the marks of a "scientific/intellectual social movement" within biomedicine. Proponents of this movement usually offer four ways in which their approach to medical diagnosis and health care improves upon current practices, arguing that it is more "personalized," "predictive," "preventive," and "participatory" than the medical status quo. Initially, it was personalization that seemed to best sum up the movement's appeal. By 2012, however, powerful opinion leaders were abandoning "personalized medicine" in favor of a new label: "precision medicine." The new label received a decisive seal of approval when, in January 2015, President Obama unveiled plans for a national "precision medicine initiative" to promote the development and use of genomic tools in health care. PMID:27649826

  3. Perceptions of genetic counseling services in direct-to-consumer personal genomic testing.

    PubMed

    Darst, B F; Madlensky, L; Schork, N J; Topol, E J; Bloss, C S

    2013-10-01

    The purpose of this research is to describe consumers' perceptions of genetic counseling services in the context of direct-to-consumer personal genomic testing. Utilizing data from the Scripps Genomic Health Initiative, we assessed direct-to-consumer genomic test consumers' utilization and perceptions of genetic counseling services. At long-term follow-up, approximately 14 months post-testing, participants were asked to respond to several items gauging their interactions, if any, with a Navigenics genetic counselor, and their perceptions of those interactions. Out of 1325 individuals who completed long-term follow-up, 187 (14.1%) indicated that they had spoken with a genetic counselor. The most commonly given reason for not utilizing the counseling service was a lack of need due to the perception of already understanding one's results (55.6%). The most common reasons for utilizing the service included wanting to take advantage of a free service (43.9%) and wanting more information on risk calculations (42.2%). Among those who utilized the service, a large fraction reported that counseling improved their understanding of their results (54.5%) and genetics in general (43.9%). A relatively small proportion of participants utilized genetic counseling after direct-to-consumer personal genomic testing. Among those individuals who did utilize the service, however, a large fraction perceived it to be informative, and thus presumably beneficial.

  4. Eyes wide open: the personal genome project, citizen science and veracity in informed consent.

    PubMed

    Angrist, Misha

    2009-11-01

    I am a close observer of the Personal Genome Project (PGP) and one of the original ten participants. The PGP was originally conceived as a way to test novel DNA sequencing technologies on human samples and to begin to build a database of human genomes and traits. However, its founder, Harvard geneticist George Church, was concerned about the fact that DNA is the ultimate digital identifier - individuals and many of their traits can be identified. Therefore, he believed that promising participants privacy and confidentiality would be impractical and disingenuous. Moreover, deidentification of samples would impoverish both genotypic and phenotypic data. As a result, the PGP has arguably become best known for its unprecedented approach to informed consent. All participants must pass an exam testing their knowledge of genomic science and privacy issues and agree to forgo the privacy and confidentiality of their genomic data and personal health records. Church aims to scale up to 100,000 participants. This special report discusses the impetus for the project, its early history and its potential to have a lasting impact on the treatment of human subjects in biomedical research.

  5. Perceptions of genetic counseling services in direct-to-consumer personal genomic testing.

    PubMed

    Darst, B F; Madlensky, L; Schork, N J; Topol, E J; Bloss, C S

    2013-10-01

    The purpose of this research is to describe consumers' perceptions of genetic counseling services in the context of direct-to-consumer personal genomic testing. Utilizing data from the Scripps Genomic Health Initiative, we assessed direct-to-consumer genomic test consumers' utilization and perceptions of genetic counseling services. At long-term follow-up, approximately 14 months post-testing, participants were asked to respond to several items gauging their interactions, if any, with a Navigenics genetic counselor, and their perceptions of those interactions. Out of 1325 individuals who completed long-term follow-up, 187 (14.1%) indicated that they had spoken with a genetic counselor. The most commonly given reason for not utilizing the counseling service was a lack of need due to the perception of already understanding one's results (55.6%). The most common reasons for utilizing the service included wanting to take advantage of a free service (43.9%) and wanting more information on risk calculations (42.2%). Among those who utilized the service, a large fraction reported that counseling improved their understanding of their results (54.5%) and genetics in general (43.9%). A relatively small proportion of participants utilized genetic counseling after direct-to-consumer personal genomic testing. Among those individuals who did utilize the service, however, a large fraction perceived it to be informative, and thus presumably beneficial. PMID:23590221

  6. Eyes wide open: the personal genome project, citizen science and veracity in informed consent

    PubMed Central

    Angrist, Misha

    2012-01-01

    I am a close observer of the Personal Genome Project (PGP) and one of the original ten participants. The PGP was originally conceived as a way to test novel DNA sequencing technologies on human samples and to begin to build a database of human genomes and traits. However, its founder, Harvard geneticist George Church, was concerned about the fact that DNA is the ultimate digital identifier – individuals and many of their traits can be identified. Therefore, he believed that promising participants privacy and confidentiality would be impractical and disingenuous. Moreover, deidentification of samples would impoverish both genotypic and phenotypic data. As a result, the PGP has arguably become best known for its unprecedented approach to informed consent. All participants must pass an exam testing their knowledge of genomic science and privacy issues and agree to forgo the privacy and confidentiality of their genomic data and personal health records. Church aims to scale up to 100,000 participants. This special report discusses the impetus for the project, its early history and its potential to have a lasting impact on the treatment of human subjects in biomedical research. PMID:22328898

  7. Primary care providers’ experiences with and perceptions of personalized genomic medicine

    PubMed Central

    Carroll, June C.; Makuwaza, Tutsirai; Manca, Donna P.; Sopcak, Nicolette; Permaul, Joanne A.; O’Brien, Mary Ann; Heisey, Ruth; Eisenhauer, Elizabeth A.; Easley, Julie; Krzyzanowska, Monika K.; Miedema, Baukje; Pruthi, Sandhya; Sawka, Carol; Schneider, Nancy; Sussman, Jonathan; Urquhart, Robin; Versaevel, Catarina; Grunfeld, Eva

    2016-01-01

    Objective: To assess primary care providers’ (PCPs’) experiences with, perceptions of, and desired role in personalized medicine, with a focus on cancer. Design: Qualitative study involving focus groups. Setting: Urban and rural interprofessional primary care team practices in Alberta and Ontario. Participants: Fifty-one PCPs. Methods: Semistructured focus groups were conducted and audiorecorded. Recordings were transcribed and analyzed using techniques informed by grounded theory including coding, interpretations of patterns in the data, and constant comparison. Main findings: Five focus groups with the 51 participants were conducted; 2 took place in Alberta and 3 in Ontario. Primary care providers described limited experience with personalized medicine, citing breast cancer and prenatal care as main areas of involvement. They expressed concern over their lack of knowledge, in some circumstances relying on personal experiences to inform their attitudes and practice. Participants anticipated an inevitable role in personalized medicine primarily because patients seek and trust their advice; however, there was underlying concern about the magnitude of information and pace of discovery in this area, particularly in direct-to-consumer personal genomic testing. Increased knowledge, closer ties to genetics specialists, and relevant, reliable personalized medicine resources accessible at the point of care were reported as important for successful implementation of personalized medicine. Conclusion: Primary care providers are prepared to discuss personalized medicine, but they require better resources. Models of care that support a more meaningful relationship between PCPs and genetics specialists should be pursued. Continuing education strategies need to address knowledge gaps including direct-to-consumer genetic testing, a relatively new area provoking PCP concern. Primary care providers should be mindful of using personal experiences to guide care. PMID:27737998

  8. A tiered-layered-staged model for informed consent in personal genome testing.

    PubMed

    Bunnik, Eline M; Janssens, A Cecile J W; Schermer, Maartje H N

    2013-06-01

    In recent years, developments in genomics technologies have led to the rise of commercial personal genome testing (PGT): broad genome-wide testing for multiple diseases simultaneously. While some commercial providers require physicians to order a personal genome test, others can be accessed directly. All providers advertise directly to consumers and offer genetic risk information about dozens of diseases in one single purchase. The quantity and the complexity of risk information pose challenges to adequate pre-test and post-test information provision and informed consent. There are currently no guidelines for what should constitute informed consent in PGT or how adequate informed consent can be achieved. In this paper, we propose a tiered-layered-staged model for informed consent. First, the proposed model is tiered as it offers choices between categories of diseases that are associated with distinct ethical, personal or societal issues. Second, the model distinguishes layers of information with a first layer offering minimal, indispensable information that is material to all consumers, and additional layers offering more detailed information made available upon request. Finally, the model stages informed consent as a process by feeding information to consumers in each subsequent stage of the process of undergoing a test, and by accommodating renewed consent for test result updates, resulting from the ongoing development of the science underlying PGT. A tiered-layered-staged model for informed consent with a focus on the consumer perspective can help overcome the ethical problems of information provision and informed consent in direct-to-consumer PGT. PMID:23169494

  9. A genome-wide scan for Eysenckian personality dimensions in adolescent twin sibships: psychoticism, extraversion, neuroticism, and lie.

    PubMed

    Gillespie, Nathan A; Zhu, Gu; Evans, David M; Medland, Sarah E; Wright, Margie J; Martin, Nick G

    2008-12-01

    We report the first genome-wide scan of adolescent personality. We conducted a genome-wide scan to detect linkage for measures of adolescent Psychoticism, Extraversion, Neuroticism, and Lie from the Junior Eysenck Personality Questionnaire. Data are based on 1,280 genotyped Australian adolescent twins and their siblings. The highest linkage peaks were found on chromosomes 16 and 19 for Neuroticism, on chromosomes 1, 7, 10, 13 m, and 18 for Psychoticism, and on chromosomes 2 and 3 for Extraversion.

  10. Integrated Database And Knowledge Base For Genomic Prospective Cohort Study In Tohoku Medical Megabank Toward Personalized Prevention And Medicine.

    PubMed

    Ogishima, Soichi; Takai, Takako; Shimokawa, Kazuro; Nagaie, Satoshi; Tanaka, Hiroshi; Nakaya, Jun

    2015-01-01

    The Tohoku Medical Megabank project is a national project to revitalize the area of the Tohoku region struck by the Great East Japan Earthquake, and it has conducted a large-scale prospective genome-cohort study. Along with the prospective genome-cohort study, we have developed an integrated database and knowledge base, which will be a key database for realizing personalized prevention and medicine.

  11. Neuroinformatics: from bioinformatics to databasing the brain.

    PubMed

    Morse, Thomas M

    2008-01-01

    Neuroinformatics seeks to create and maintain web-accessible databases of experimental and computational data, together with innovative software tools, essential for understanding the nervous system in its normal function and in neurological disorders. Neuroinformatics includes traditional bioinformatics of gene and protein sequences in the brain; atlases of brain anatomy and localization of genes and proteins; imaging of brain cells; brain imaging by positron emission tomography (PET), functional magnetic resonance imaging (fMRI), electroencephalography (EEG), magnetoencephalography (MEG) and other methods; many electrophysiological recording methods; and clinical neurological data, among others. Building neuroinformatics databases and tools presents difficult challenges because they span a wide range of spatial scales and types of data stored and analyzed. Traditional bioinformatics, by comparison, focuses primarily on genomic and proteomic data (which of course also presents difficult challenges). Much of bioinformatics analysis focuses on sequences (DNA, RNA, and protein molecules), as the type of data that are stored, compared, and sometimes modeled. Bioinformatics is undergoing explosive growth with the addition, for example, of databases that catalog interactions between proteins, of databases that track the evolution of genes, and of systems biology databases which contain models of all aspects of organisms. This commentary briefly reviews neuroinformatics with clarification of its relationship to traditional and modern bioinformatics.

  12. Adopting Genetics: Motivations and Outcomes of Personal Genomic Testing in Adult Adoptees

    PubMed Central

    Baptista, Natalie M.; Christensen, Kurt D.; Carere, Deanna Alexis; Broadley, Simon A.; Roberts, J. Scott; Green, Robert C.

    2015-01-01

    Purpose: American adult adoptees may possess limited amounts of information about their biological families and turn to direct-to-consumer personal genomic testing (PGT) for genealogical and medical information. We investigated the motivations and outcomes of adoptees undergoing PGT using data from the Impact of Personal Genomics (PGen) Study. Methods: The PGen Study surveyed new 23andMe and Pathway Genomics customers prior to and 6 months after receiving PGT results. Exploratory analyses compared adoptees’ and non-adoptees’ PGT attitudes, expectations, and experiences. We evaluated the association of adoption status with motivations for testing and post-disclosure actions using logistic regression models. Results: Of 1607 participants, 80 (5%) were adopted. As compared to non-adoptees, adoptees were more likely to cite limited family health history knowledge (OR = 10.1; 95% CI = 5.7–19.5) and the opportunity to learn genetic disease risks (OR = 2.7; 95% CI = 1.6–4.8) as strong motivations for PGT. Of 922 participants who completed 6-month follow-up, there was no significant association between adoption status and PGT-motivated healthcare utilization or health behavior change. Conclusion: PGT allows adoptees to gain otherwise inaccessible information about their genetic disease risks and ancestry, helping them to fill the void of an incomplete family health history. PMID:26820063

  13. Towards personalized agriculture: what chemical genomics can bring to plant biotechnology

    PubMed Central

    Stokes, Michael E.; McCourt, Peter

    2014-01-01

    In contrast to the dominant drug paradigm in which compounds were developed to “fit all,” new models focused around personalized medicine are appearing in which treatments are developed and customized for individual patients. The agricultural biotechnology industry (Ag-biotech) should also think about these new personalized models. For example, most common herbicides are generic in action, which led to the development of genetically modified crops to add specificity. The ease and accessibility of modern genomic analysis, when wedded to accessible large chemical space, should facilitate the discovery of chemicals that are more selective in their utility. Is it possible to develop species-selective herbicides and growth regulators? More generally put, is plant research at a stage where chemicals can be developed that streamline plant development and growth to various environments? We believe the advent of chemical genomics now opens up these and other opportunities to “personalize” agriculture. Furthermore, chemical genomics does not necessarily require genetically tractable plant models, which in principle should allow quick translation to practical applications. For this to happen, however, will require collaboration between the Ag-biotech industry and academic labs for early stage research and development, a situation that has proven very fruitful for Big Pharma. PMID:25183965

  14. Genome-wide association study of personality traits in bipolar patients

    PubMed Central

    Alliey-Rodriguez, Ney; Zhang, Dandan; Badner, Judith A.; Lahey, Benjamin B.; Zhang, Xiaotong; Dinwiddie, Stephen; Romanos, Benjamin; Plenys, Natalie; Liu, Chunyu; Gershon, Elliot S.

    2011-01-01

    Objective: Genome-wide association study was carried out on personality traits among bipolar patients as possible endophenotypes for gene discovery in bipolar disorder. Methods: The subscales of Cloninger’s Temperament and Character Inventory (TCI) and the Zuckerman–Kuhlman Personality Questionnaire (ZKPQ) were used as quantitative phenotypes. The genotyping platform was the Affymetrix 6.0 SNP array. The sample consisted of 944 individuals for TCI and 1007 for ZKPQ, all of European ancestry, diagnosed with bipolar disorder by Diagnostic and Statistical Manual of Mental Disorders-IV criteria. Results: Genome-wide significant association was found for two subscales of the TCI, rs10479334 with the ‘Social Acceptance versus Social Intolerance’ subscale (Bonferroni P = 0.014) in an intergenic region, and rs9419788 with the ‘Spiritual Acceptance versus Rational Materialism’ subscale (Bonferroni P = 0.036) in PLCE1 gene. Although genome-wide significance was not reached for ZKPQ scales, lowest P values pinpointed to genes, RXRG for Sensation Seeking, GRM7 and ITK for Neuroticism Anxiety, and SPTLC3 gene for Aggression Hostility. Conclusion: After correction for the 25 subscales in TCI and four scales plus two subscales in ZKPQ, phenotype-wide significance was not reached. PMID:21368711

  15. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    ERIC Educational Resources Information Center

    Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…

  16. Analysis of the whole mitochondrial genome: translation of the Ion Torrent Personal Genome Machine system to the diagnostic bench?

    PubMed

    Seneca, Sara; Vancampenhout, Kim; Van Coster, Rudy; Smet, Joél; Lissens, Willy; Vanlander, Arnaud; De Paepe, Boel; Jonckheere, An; Stouffs, Katrien; De Meirleir, Linda

    2015-01-01

    Next-generation sequencing (NGS), an innovative sequencing technology that enables the successful analysis of numerous gene sequences in a massive parallel sequencing approach, has revolutionized the field of molecular biology. Although NGS was introduced in a rather recent past, the technology has already demonstrated its potential and effectiveness in many research projects, and is now on the verge of being introduced into the diagnostic setting of routine laboratories to delineate the molecular basis of genetic disease in undiagnosed patient samples. We tested a benchtop device on retrospective genomic DNA (gDNA) samples of controls and patients with a clinical suspicion of a mitochondrial DNA disorder. This Ion Torrent Personal Genome Machine platform is a high-throughput sequencer with a fast turnaround time and reasonable running costs. We challenged the chemistry and technology with the analysis and processing of a mutational spectrum composed of samples with single-nucleotide substitutions, indels (insertions and deletions) and large single or multiple deletions, occasionally in heteroplasmy. The output data were compared with previously obtained conventional dideoxy sequencing results and the mitochondrial revised Cambridge Reference Sequence (rCRS). We were able to identify the majority of all nucleotide alterations, but three false-negative results were also encountered in the data set. At the same time, the poor performance of the PGM instrument in regions associated with homopolymeric stretches generated many false-positive miscalls demanding additional manual curation of the data.

  17. Genomic research and data-mining technology: implications for personal privacy and informed consent.

    PubMed

    Tavani, Herman T

    2004-01-01

    This essay examines issues involving personal privacy and informed consent that arise at the intersection of information and communication technology (ICT) and population genomics research. I begin by briefly examining the ethical, legal, and social implications (ELSI) program requirements that were established to guide researchers working on the Human Genome Project (HGP). Next I consider a case illustration involving deCODE Genetics, a privately owned genetic company in Iceland, which raises some ethical concerns that are not clearly addressed in the current ELSI guidelines. The deCODE case also illustrates some ways in which an ICT technique known as data mining has both aided and posed special challenges for researchers working in the field of population genomics. On the one hand, data-mining tools have greatly assisted researchers in mapping the human genome and in identifying certain "disease genes" common in specific populations (which, in turn, has accelerated the process of finding cures for diseases that affect those populations). On the other hand, this technology has significantly threatened the privacy of research subjects participating in population genomics studies, who may, unwittingly, contribute to the construction of new groups (based on arbitrary and non-obvious patterns and statistical correlations) that put those subjects at risk for discrimination and stigmatization. In the final section of this paper I examine some ways in which the use of data mining in the context of population genomics research poses a critical challenge for the principle of informed consent, which traditionally has played a central role in protecting the privacy interests of research subjects participating in epidemiological studies.

  18. Emerging landscape of genomics in the Electronic Health Record for personalized medicine.

    PubMed

    Ullman-Cullere, Mollie H; Mathew, Jomol P

    2011-05-01

    The Information Technology (IT) roadmap for personalized medicine requires Electronic Health Records (EHRs), extension of Healthcare IT (HIT) standards, and understanding of how genetics/genomics should be integrated into the clinical applications. For reduced overall costs and development times, these three initiatives should run in parallel. EHRs must contain structured data and infrastructure that enables quality analysis, Clinical Decision Support (CDS) and messaging within the healthcare information network. Fortunately, as a result of sustained financial commitment to nongenetic-based healthcare, the industry has HIT data standards and understanding of EHR functionality that improves patient safety and outcomes while reducing overall healthcare costs. However, the HIT standards and EHR functional requirements, needed for personalized medicine, are only beginning to support simple genetic tests and need significant extension. In addition, our understanding of the clinical implications of genomic data is evolving and translation of new discovery into clinical care remains a challenge. Therefore, priority areas include CDS, educational resources, and knowledgebases for the EHR, clinical and research data warehouses, messaging frameworks, and continued review of healthcare policies and regulations supporting personalized medicine. Where core infrastructure remains to be developed and implemented, funding is needed for pilot projects, data standards, policy, and stakeholder collaboration. PMID:21309042

  19. A national clinical decision support infrastructure to enable the widespread and consistent practice of genomic and personalized medicine

    PubMed Central

    2009-01-01

    Background: In recent years, the completion of the Human Genome Project and other rapid advances in genomics have led to increasing anticipation of an era of genomic and personalized medicine, in which an individual's health is optimized through the use of all available patient data, including data on the individual's genome and its downstream products. Genomic and personalized medicine could transform healthcare systems and catalyze significant reductions in morbidity, mortality, and overall healthcare costs. Discussion: Critical to the achievement of more efficient and effective healthcare enabled by genomics is the establishment of a robust, nationwide clinical decision support infrastructure that assists clinicians in their use of genomic assays to guide disease prevention, diagnosis, and therapy. Requisite components of this infrastructure include the standardized representation of genomic and non-genomic patient data across health information systems; centrally managed repositories of computer-processable medical knowledge; and standardized approaches for applying these knowledge resources against patient data to generate and deliver patient-specific care recommendations. Here, we provide recommendations for establishing a national decision support infrastructure for genomic and personalized medicine that fulfills these needs, leverages existing resources, and is aligned with the Roadmap for National Action on Clinical Decision Support commissioned by the U.S. Office of the National Coordinator for Health Information Technology. Critical to the establishment of this infrastructure will be strong leadership and substantial funding from the federal government. Summary: A national clinical decision support infrastructure will be required for reaping the full benefits of genomic and personalized medicine. Essential components of this infrastructure include standards for data representation; centrally managed knowledge repositories; and standardized approaches for

  20. Translating Mendelian and complex inheritance of Alzheimer's disease genes for predicting unique personal genome variants

    PubMed Central

    Regan, Kelly; Wang, Kanix; Doughty, Emily; Li, Haiquan; Li, Jianrong; Lee, Younghee; Kann, Maricel G

    2012-01-01

    Objective: Although trait-associated genes identified as complex versus single-gene inheritance differ substantially in odds ratio, the authors nonetheless posit that their mechanistic concordance can reveal fundamental properties of the genetic architecture, allowing the automated interpretation of unique polymorphisms within a personal genome. Materials and methods: An analytical method, SPADE-gen, spanning three biological scales was developed to demonstrate the mechanistic concordance between Mendelian and complex inheritance of Alzheimer's disease (AD) genes: biological functions (BP), protein interaction modeling, and protein domain implicated in the disease-associated polymorphism. Results: Among Gene Ontology (GO) biological processes (BP) enriched at a false detection rate <5% in 15 AD genes of Mendelian inheritance (Online Mendelian Inheritance in Man) and independently in those of complex inheritance (25 host genes of intragenic AD single-nucleotide polymorphisms confirmed in genome-wide association studies), 16 overlapped (empirical p=0.007) and 45 were similar (empirical p<0.009; information theory). SPAN network modeling extended the canonical pathway of AD (KEGG) with 26 new protein interactions (empirical p<0.0001). Discussion: The study prioritized new AD-associated biological mechanisms and focused the analysis on previously unreported interactions associated with the biological processes of polymorphisms that affect specific protein domains within characterized AD genes and their direct interactors using (1) concordant GO-BP and (2) domain interactions within STRING protein–protein interactions corresponding to the genomic location of the AD polymorphism (eg, EPHA1, APOE, and CD2AP). Conclusion: These results are in line with unique-event polymorphism theory, indicating how disease-associated polymorphisms of Mendelian or complex inheritance relate genetically to those observed as ‘unique personal variants’. They also provide insight for

  1. An Online Bioinformatics Curriculum

    PubMed Central

    Searls, David B.

    2012-01-01

    Online learning initiatives over the past decade have become increasingly comprehensive in their selection of courses and sophisticated in their presentation, culminating in the recent announcement of a number of consortium and startup activities that promise to make a university education on the internet, free of charge, a real possibility. At this pivotal moment it is appropriate to explore the potential for obtaining comprehensive bioinformatics training with currently existing free video resources. This article presents such a bioinformatics curriculum in the form of a virtual course catalog, together with editorial commentary, and an assessment of strengths, weaknesses, and likely future directions for open online learning in this field. PMID:23028269

  2. Bioinformatics software resources.

    PubMed

    Gilbert, Don

    2004-09-01

    This review looks at internet archives, repositories and lists for obtaining popular and useful biology and bioinformatics software. Resources include collections of free software, services for the collaborative development of new programs, software news media and catalogues of links to bioinformatics software and web tools. Problems with such resources arise from needs for continued curator effort to collect and update these, combined with less than optimal community support, funding and collaboration. Despite some problems, the available software repositories provide needed public access to many tools that are a foundation for analyses in bioscience research efforts.

  3. Meta-analysis of genome-wide association studies for personality.

    PubMed

    de Moor, M H M; Costa, P T; Terracciano, A; Krueger, R F; de Geus, E J C; Toshiko, T; Penninx, B W J H; Esko, T; Madden, P A F; Derringer, J; Amin, N; Willemsen, G; Hottenga, J-J; Distel, M A; Uda, M; Sanna, S; Spinhoven, P; Hartman, C A; Sullivan, P; Realo, A; Allik, J; Heath, A C; Pergadia, M L; Agrawal, A; Lin, P; Grucza, R; Nutile, T; Ciullo, M; Rujescu, D; Giegling, I; Konte, B; Widen, E; Cousminer, D L; Eriksson, J G; Palotie, A; Peltonen, L; Luciano, M; Tenesa, A; Davies, G; Lopez, L M; Hansell, N K; Medland, S E; Ferrucci, L; Schlessinger, D; Montgomery, G W; Wright, M J; Aulchenko, Y S; Janssens, A C J W; Oostra, B A; Metspalu, A; Abecasis, G R; Deary, I J; Räikkönen, K; Bierut, L J; Martin, N G; van Duijn, C M; Boomsma, D I

    2012-03-01

    Personality can be thought of as a set of characteristics that influence people's thoughts, feelings and behavior across a variety of settings. Variation in personality is predictive of many outcomes in life, including mental health. Here we report on a meta-analysis of genome-wide association (GWA) data for personality in 10 discovery samples (17,375 adults) and five in silico replication samples (3294 adults). All participants were of European ancestry. Personality scores for Neuroticism, Extraversion, Openness to Experience, Agreeableness and Conscientiousness were based on the NEO Five-Factor Inventory. Genotype data of ≈ 2.4M single-nucleotide polymorphisms (SNPs; directly typed and imputed using HapMap data) were available. In the discovery samples, classical association analyses were performed under an additive model followed by meta-analysis using the weighted inverse variance method. Results showed genome-wide significance for Openness to Experience near the RASA1 gene on 5q14.3 (rs1477268 and rs2032794, P=2.8 × 10^-8 and 3.1 × 10^-8) and for Conscientiousness in the brain-expressed KATNAL2 gene on 18q21.1 (rs2576037, P=4.9 × 10^-8). We further conducted a gene-based test that confirmed the association of KATNAL2 to Conscientiousness. In silico replication did not, however, show significant associations of the top SNPs with Openness and Conscientiousness, although the direction of effect of the KATNAL2 SNP on Conscientiousness was consistent in all replication samples. Larger scale GWA studies and alternative approaches are required for confirmation of KATNAL2 as a novel gene affecting Conscientiousness.
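
    The weighted inverse variance method named in this abstract can be illustrated with a minimal sketch. It assumes a fixed-effect model in which each cohort reports an effect estimate and its standard error for one SNP; the function name and the example numbers are hypothetical and are not taken from the study.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance-weighted meta-analysis for one SNP.

    betas: per-cohort effect estimates (hypothetical inputs).
    standard_errors: per-cohort standard errors, same order as betas.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need equally sized, non-empty inputs")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Illustrative values only: two cohorts with equal precision.
beta, se, z, p = inverse_variance_meta([0.10, 0.20], [0.1, 0.1])
print(beta, se, z, p)
```

    More precise cohorts (smaller standard errors) pull the pooled estimate toward their own value, which is why large discovery samples tend to dominate the combined result.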

  4. Personalized medicine approaches for colon cancer driven by genomics and systems biology: OncoTrack

    PubMed Central

    Henderson, David; Ogilvie, Lesley A; Hoyle, Nicholas; Keilholz, Ulrich; Lange, Bodo; Lehrach, Hans

    2014-01-01

    The post-genomic era promises to pave the way to a personalized understanding of disease processes, with technological and analytical advances helping to solve some of the world's health challenges. Despite extraordinary progress in our understanding of cancer pathogenesis, the disease remains one of the world's major medical problems. New therapies and diagnostic procedures to guide their clinical application are urgently required. OncoTrack, a consortium between industry and academia, supported by the Innovative Medicines Initiative, signifies a new era in personalized medicine, which synthesizes current technological advances in omics techniques, systems biology approaches, and mathematical modeling. A truly personalized molecular imprint of the tumor micro-environment and subsequent diagnostic and therapeutic insight is gained, with the ultimate goal of matching the “right” patient to the “right” drug and identifying predictive biomarkers for clinical application. This comprehensive mapping of the colon cancer molecular landscape in tandem with crucial, clinical functional annotation for systems biology analysis provides unprecedented insight and predictive power for colon cancer management. Overall, we show that major biotechnological developments in tandem with changes in clinical thinking have laid the foundations for the OncoTrack approach and the future clinical application of a truly personalized approach to colon cancer theranostics. PMID:25074435

  5. Bioinformatics strategies for the analysis of lipids.

    PubMed

    Wheelock, Craig E; Goto, Susumu; Yetukuri, Laxman; D'Alexandri, Fabio Luiz; Klukas, Christian; Schreiber, Falk; Oresic, Matej

    2009-01-01

    Owing to their importance in cellular physiology and pathology as well as to recent technological advances, the study of lipids has reemerged as a major research target. However, the structural diversity of lipids presents a number of analytical and informatics challenges. The field of lipidomics is a new postgenome discipline that aims to develop comprehensive methods for lipid analysis, necessitating concomitant developments in bioinformatics. The evolving research paradigm requires that new bioinformatics approaches accommodate genomic as well as high-level perspectives, integrating genome, protein, chemical and network information. The incorporation of lipidomics information into these data structures will provide mechanistic understanding of lipid functions and interactions in the context of cellular and organismal physiology. Accordingly, it is vital that specific bioinformatics methods be developed to analyze the wealth of lipid data being acquired. Herein, we present an overview of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and application of its tools to the analysis of lipid data. We also describe a series of software tools and databases (KGML-ED, VANTED, MZmine, and LipidDB) that can be used for the processing of lipidomics data and biochemical pathway reconstruction, an important next step in the development of the lipidomics field.

  6. Bioinformatics Methods and Tools to Advance Clinical Care

    PubMed Central

    Lecroq, T.

    2015-01-01

    Summary Objectives To summarize excellent current research in the field of Bioinformatics and Translational Informatics with application in the health domain and clinical care. Method We provide a synopsis of the articles selected for the IMIA Yearbook 2015, from which we attempt to derive a synthetic overview of current and future activities in the field. As last year, a first step of selection was performed by querying MEDLINE with a list of MeSH descriptors completed by a list of terms adapted to the section. Each section editor separately evaluated the set of 1,594 articles, and the evaluation results were merged to retain 15 articles for peer review. Results The selection and evaluation process of this Yearbook’s section on Bioinformatics and Translational Informatics yielded four excellent articles regarding data management and genome medicine that are mainly tool-based papers. In the first article, the authors present PPISURV, a tool for uncovering the role of specific genes in cancer survival outcome. The second article describes the classifier PredictSNP, which combines six performing tools for predicting disease-related mutations. In the third article, by presenting a high-coverage map of the human proteome using high resolution mass spectrometry, the authors highlight the need for using mass spectrometry to complement genome annotation. The fourth article is also related to patient survival and decision support. The authors present data mining methods for large-scale datasets of past transplants. The objective is to identify chances of survival. Conclusions The current research activities still attest to the continuous convergence of Bioinformatics and Medical Informatics, with a focus this year on dedicated tools and methods to advance clinical care. Indeed, there is a need for powerful tools for managing and interpreting complex, large-scale genomic and biological datasets, but also a need for user-friendly tools developed for the clinicians in their

  7. Bioinformatics and School Biology

    ERIC Educational Resources Information Center

    Dalpech, Roger

    2006-01-01

    The rapidly changing field of bioinformatics is fuelling the need for suitably trained personnel with skills in relevant biological "sub-disciplines" such as proteomics, transcriptomics and metabolomics, etc. But because of the complexity--and sheer weight of data--associated with these new areas of biology, many school teachers feel…

  8. Bioinformatics prediction of miRNAs in the Prunus persica genome with validation of their precise sequences by miR-RACE.

    PubMed

    Zhang, Yanping; Bai, Youhuang; Han, Jian; Chen, Ming; Kayesh, Emrul; Jiang, Weibing; Fang, Jinggui

    2013-01-01

    We predicted 262 potential microRNAs (miRNAs) belonging to 70 miRNA families from the peach (Prunus persica) genome. Two specific 5' and 3' miRNA rapid amplification of cDNA ends (miR-RACE) PCR reactions and sequence-directed cloning were employed to accurately validate 61 unique P. persica miRNA (Ppe-miRNA) sequences belonging to 61 families comprising 97 Ppe-miRNAs. Validation of the terminal nucleotides in particular can define the real sequences of the Ppe-miRNAs on the peach genome. Comparison between predicted and validated Ppe-miRNAs through alignment revealed that 43 unique orthologous sequences were identical, while the remaining 18 exhibited some divergence at their terminal nucleotides. Quantitative real-time polymerase chain reaction (qRT-PCR) was further employed to analyze the expression of all 61 miRNAs and 10 putative targets of 8 randomly selected Ppe-miRNAs in peach leaves, flowers and fruits at different stages of development, where both the miRNAs and the putative target genes showed tissue-specific expression.

  9. Towards a career in bioinformatics

    PubMed Central

    2009-01-01

    The 2009 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia's oldest bioinformatics organisation, dating from 1998, was organized as the 8th International Conference on Bioinformatics (InCoB), Sept. 9-11, 2009 at Biopolis, Singapore. InCoB has actively engaged researchers from the life sciences and systems biology, as well as clinicians, to facilitate greater synergy between these groups. To encourage bioinformatics students and new researchers, tutorials and a student symposium, the Singapore Symposium on Computational Biology (SYMBIO), were organized, along with the Workshop on Education in Bioinformatics and Computational Biology (WEBCB) and the Clinical Bioinformatics (CBAS) Symposium. However, to many students and young researchers, pursuing a career in a multi-disciplinary area such as bioinformatics poses a Himalayan challenge. A collection of tips is presented here to provide signposts on the road to a career in bioinformatics. An overview of the application of bioinformatics to traditional and emerging areas, published in this supplement, is also presented to provide possible future avenues of bioinformatics investigation. A case study on the application of e-learning tools in the undergraduate bioinformatics curriculum provides information on how to impart targeted education, to sustain bioinformatics in the Asia-Pacific region. The next InCoB is scheduled to be held in Tokyo, Japan, Sept. 26-28, 2010. PMID:19958508

  10. From Molecules to Patients: The Clinical Applications of Translational Bioinformatics

    PubMed Central

    Regan, K.

    2015-01-01

    Summary Objective In order to realize the promise of personalized medicine, Translational Bioinformatics (TBI) research will need to continue to address implementation issues across the clinical spectrum. In this review, we aim to evaluate the expanding field of TBI towards clinical applications, and define common themes and current gaps in order to motivate future research. Methods Here we present the state-of-the-art of clinical implementation of TBI-based tools and resources. Our thematic analyses of a targeted literature search of recent TBI-related articles ranged across topics in genomics, data management, hypothesis generation, molecular epidemiology, diagnostics, therapeutics and personalized medicine. Results Open areas of clinically-relevant TBI research identified in this review include developing data standards and best practices, publicly available resources, integrative systems-level approaches, user-friendly tools for clinical support, cloud computing solutions, emerging technologies and means to address pressing legal, ethical and social issues. Conclusions There is a need for further research bridging the gap from foundational TBI-based theories and methodologies to clinical implementation. We have organized the topic themes presented in this review into four conceptual foci – domain analyses, knowledge engineering, computational architectures and computation methods alongside three stages of knowledge development in order to orient future TBI efforts to accelerate the goals of personalized medicine. PMID:26293863

  11. Genetics, Genomics and Cancer Risk Assessment: State of the art and future directions in the era of personalized medicine

    PubMed Central

    Weitzel, Jeffrey N.; Blazer, Kathleen R.; MacDonald, Deborah J.; Culver, Julie O.; Offit, Kenneth

    2012-01-01

    Scientific and technologic advances are revolutionizing our approach to genetic cancer risk assessment, cancer screening and prevention, and targeted therapy, fulfilling the promise of personalized medicine. In this monograph we review the evolution of scientific discovery in cancer genetics and genomics, and describe current approaches, benefits and barriers to the translation of this information to the practice of preventive medicine. Summaries of known hereditary cancer syndromes and highly penetrant genes are provided and contrasted with recently-discovered genomic variants associated with modest increases in cancer risk. We describe the scope of knowledge, tools, and expertise required for the translation of complex genetic and genomic test information into clinical practice. The challenges of genomic counseling include the need for genetics and genomics professional education and multidisciplinary team training, the need for evidence-based information regarding the clinical utility of testing for genomic variants, the potential dangers posed by premature marketing of first-generation genomic profiles, and the need for new clinical models to improve access to and responsible communication of complex disease-risk information. We conclude that given the experiences and lessons learned in the genetics era, the multidisciplinary model of genetic cancer risk assessment and management will serve as a solid foundation to support the integration of personalized genomic information into the practice of cancer medicine. PMID:21858794

  12. STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud.

    PubMed

    Karczewski, Konrad J; Fernald, Guy Haskin; Martin, Alicia R; Snyder, Michael; Tatonetti, Nicholas P; Dudley, Joel T

    2014-01-01

    The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5-10 hours to process a full exome sequence and $30 and 3-8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2. PMID:24454756

  13. STORMSeq: An Open-Source, User-Friendly Pipeline for Processing Personal Genomics Data in the Cloud

    PubMed Central

    Karczewski, Konrad J.; Fernald, Guy Haskin; Martin, Alicia R.; Snyder, Michael; Tatonetti, Nicholas P.; Dudley, Joel T.

    2014-01-01

    The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5–10 hours to process a full exome sequence and $30 and 3–8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2. PMID:24454756

  14. Advancing standards for bioinformatics activities: persistence, reproducibility, disambiguation and Minimum Information About a Bioinformatics investigation (MIABi).

    PubMed

    Tan, Tin Wee; Tong, Joo Chuan; Khan, Asif M; de Silva, Mark; Lim, Kuan Siong; Ranganathan, Shoba

    2010-12-02

    The 2010 International Conference on Bioinformatics (InCoB2010), the annual conference of the Asia-Pacific Bioinformatics Network (APBioNet), has agreed to publish conference papers in compliance with the Minimum Information about a Bioinformatics investigation (MIABi) standard proposed in June 2009. Authors of the conference supplements in BMC Bioinformatics, BMC Genomics and Immunome Research have consented to cooperate in this process, which will include the procedures described herein, where appropriate, to ensure data and software persistence and perpetuity, database and resource re-instantiability and reproducibility of results, author and contributor identity disambiguation and MIABi-compliance. Wherever possible, datasets and databases will be submitted to depositories with standardized terminologies. As standards are evolving, this process is intended as a prelude to the 100 BioDatabases (BioDB100) initiative whereby APBioNet collaborators will contribute exemplar databases to demonstrate the feasibility of standards-compliance and participate in refining the process for peer-review of such publications and validation of scientific claims and standards compliance. This testbed represents another step in advancing standards-based processes in the bioinformatics community, which is essential to the growing interoperability of biological data, information, knowledge and computational resources.

  15. Feature selection in bioinformatics

    NASA Astrophysics Data System (ADS)

    Wang, Lipo

    2012-06-01

    In bioinformatics, there are often a large number of input features. For example, there are millions of single nucleotide polymorphisms (SNPs) that are genetic variations which determine the difference between any two unrelated individuals. In microarrays, thousands of genes can be profiled in each test. It is important to find out which input features (e.g., SNPs or genes) are useful in classification of a certain group of people or diagnosis of a given disease. In this paper, we investigate some powerful feature selection techniques and apply them to problems in bioinformatics. We are able to identify a very small number of input features sufficient for tasks at hand and we demonstrate this with some real-world data.
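
    The abstract does not say which feature selection techniques were investigated. As a generic illustration of the idea only, the sketch below ranks features with a simple signal-to-noise filter (gap between the two class means divided by the summed class standard deviations). The scoring rule, the function name and the toy data are assumptions chosen for illustration, not the paper's method.

```python
from statistics import mean, pstdev


def rank_features(samples, labels, top_k):
    """Return indices of the top_k features by a signal-to-noise score.

    samples: list of equal-length numeric feature vectors (hypothetical data).
    labels: list of 0/1 class labels, one per sample.
    """
    group0 = [s for s, y in zip(samples, labels) if y == 0]
    group1 = [s for s, y in zip(samples, labels) if y == 1]
    if not group0 or not group1:
        raise ValueError("both classes must be present")
    scored = []
    for j in range(len(samples[0])):
        a = [s[j] for s in group0]
        b = [s[j] for s in group1]
        gap = abs(mean(a) - mean(b))
        spread = pstdev(a) + pstdev(b)
        if spread > 0:
            score = gap / spread
        else:
            # No within-class variation: perfectly separating or useless.
            score = float("inf") if gap > 0 else 0.0
        scored.append((score, j))
    # Highest score first; ties broken by feature index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scored[:top_k]]


# Toy data: feature 0 separates the classes, feature 1 is constant.
toy_samples = [[0, 5, 1.0], [1, 5, 1.2], [10, 5, 0.9], [11, 5, 1.1]]
toy_labels = [0, 0, 1, 1]
print(rank_features(toy_samples, toy_labels, 2))
```

    A filter like this scores each feature independently, so it scales to very many SNPs or genes but ignores interactions between features.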

  16. Forensic DNA and bioinformatics.

    PubMed

    Bianchi, Lucia; Liò, Pietro

    2007-03-01

    The field of forensic science is increasingly based on biomolecular data and many European countries are establishing forensic databases to store DNA profiles of crime scenes of known offenders and apply DNA testing. The field is boosted by statistical and technological advances such as DNA microarray sequencing, TFT biosensors, and machine learning algorithms, in particular Bayesian networks, which provide an effective way of evidence organization and inference. The aim of this article is to discuss the state-of-the-art potential of bioinformatics in forensic DNA science. We also discuss how bioinformatics will address issues related to privacy rights such as those raised from large scale integration of crime, public health and population genetic susceptibility-to-diseases databases.

  17. Phylogenetic trees in bioinformatics

    SciTech Connect

    Burr, Tom L

    2008-01-01

    Genetic data is often used to infer evolutionary relationships among a collection of viruses, bacteria, animal or plant species, or other operational taxonomic units (OTU). A phylogenetic tree depicts such relationships and provides a visual representation of the estimated branching order of the OTUs. Tree estimation is unique for several reasons, including: the types of data used to represent each OTU; the use of probabilistic nucleotide substitution models; the inference goals involving both tree topology and branch length, and the huge number of possible trees for a given sample of a very modest number of OTUs, which implies that finding the best tree(s) to describe the genetic data for each OTU is computationally demanding. Bioinformatics is too large a field to review here. We focus on that aspect of bioinformatics that includes study of similarities in genetic data from multiple OTUs. Although research questions are diverse, a common underlying challenge is to estimate the evolutionary history of the OTUs. Therefore, this paper reviews the role of phylogenetic tree estimation in bioinformatics, available methods and software, and identifies areas for additional research and development.
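
    The "huge number of possible trees" mentioned in this abstract can be made concrete with a short sketch. It relies on the standard counting result for strictly bifurcating trees with labeled tips, (2n-5)!! unrooted and (2n-3)!! rooted topologies for n OTUs. The abstract itself does not state this formula, and the function names are illustrative.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 (or 2)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_otus):
    """Number of unrooted bifurcating topologies for n labeled OTUs."""
    if n_otus < 3:
        raise ValueError("need at least 3 OTUs for an unrooted tree")
    return double_factorial(2 * n_otus - 5)


def count_rooted_trees(n_otus):
    """Number of rooted bifurcating topologies for n labeled OTUs."""
    if n_otus < 2:
        raise ValueError("need at least 2 OTUs for a rooted tree")
    return double_factorial(2 * n_otus - 3)


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

    Ten OTUs already give 2,027,025 unrooted topologies, which is why exhaustive search is replaced by heuristic tree search in practice.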

  18. Tapping CD4 T cells for cancer immunotherapy: the choice of personalized genomics.

    PubMed

    Zanetti, Maurizio

    2015-03-01

    Cellular immune responses that protect against tumors typically have been attributed to CD8 T cells. However, CD4 T cells also play a central role. It was shown recently that, in a patient with metastatic cholangiocarcinoma, CD4 T cells specific for a peptide from a mutated region of ERBB2IP could arrest tumor progression. This and other recent findings highlight new opportunities for CD4 T cells in cancer immunotherapy. In this article, I discuss the role and regulation of CD4 T cells in response to tumor Ags. Emphasis is placed on the types of Ags and mechanisms that elicit tumor-protective responses. I discuss the advantages and drawbacks of cancer immunotherapy through personalized genomics. These considerations should help to guide the design of next-generation therapeutic cancer vaccines.

  19. Communication about DTC testing: commentary on a 'family experience of personal genomics'.

    PubMed

    Middleton, Anna

    2012-06-01

    This paper provides a commentary on 'Family Experience of Personal Genomics' (Corpas 2012). An overview is offered on the communication literature available to help support individuals and families to communicate about genetic information. Despite there being a wealth of evidence, built on years of genetic counseling practice, this does not appear to have been translated clearly to the Direct to Consumer (DTC) testing market. In many countries it is possible to order a DTC genetic test without the involvement of any health professional; there has been heated debate about whether this is appropriate or not. Much of the focus surrounding this has been on whether it is necessary to have a health professional available to offer their clinical knowledge and help with interpreting the DTC genetic test data. What has been missed from this debate is the importance of enabling customers of DTC testing services access to the abundance of information about how to communicate their genetic risks to others, including immediate family. Family communication about health and indeed genetics can be fraught with difficulty. Genetic health professionals, specifically genetic counselors, have particular expertise in family communication about genetics. Such information could be incredibly useful to kinships as they grapple with knowing how to communicate their genomic information with relatives.

  20. Agile parallel bioinformatics workflow management using Pwrake

    PubMed Central

    2011-01-01

    Background In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be over-weighted for actual bioinformatics practices. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method through iterative development phases after trial and error. Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows. Findings We implemented the Pwrake workflows to process next-generation sequencing data using the Genome Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows. Conclusions Pwrake enables agile management of scientific workflows in the bioinformatics domain. The internal domain specific language design built on Ruby gives the flexibility of rakefiles for writing scientific workflows. Furthermore, readability

  1. Bioinformatics Pipeline for Transcriptome Sequencing Analysis.

    PubMed

    Djebali, Sarah; Wucher, Valentin; Foissac, Sylvain; Hitte, Christophe; Corre, Evan; Derrien, Thomas

    2017-01-01

    The development of High Throughput Sequencing (HTS) for RNA profiling (RNA-seq) has shed light on the diversity of transcriptomes. While RNA-seq is becoming a de facto standard for monitoring the population of expressed transcripts in a given condition at a specific time, processing the huge amount of data it generates requires dedicated bioinformatics programs. Here, we describe a standard bioinformatics protocol using state-of-the-art tools, the STAR mapper to align reads onto a reference genome, Cufflinks to reconstruct the transcriptome, and RSEM to quantify expression levels of genes and transcripts. We present the workflow using human transcriptome sequencing data from two biological replicates of the K562 cell line produced as part of the ENCODE3 project. PMID:27662878

  2. Genome-Wide Association Analysis of Eating Disorder-Related Symptoms, Behaviors, and Personality Traits

    PubMed Central

    Boraska, Vesna; Davis, Oliver SP; Cherkas, Lynn F; Helder, Sietske G; Harris, Juliette; Krug, Isabel; Pei-Chi Liao, Thomas; Treasure, Janet; Ntalla, Ioanna; Karhunen, Leila; Keski-Rahkonen, Anna; Christakopoulou, Danai; Raevuori, Anu; Shin, So-Youn; Dedoussis, George V; Kaprio, Jaakko; Soranzo, Nicole; Spector, Tim D; Collier, David A; Zeggini, Eleftheria

    2012-01-01

    Eating disorders (EDs) are common, complex psychiatric disorders thought to be caused by both genetic and environmental factors. They share many symptoms, behaviors, and personality traits, which may have overlapping heritability. The aim of the present study is to perform a genome-wide association scan (GWAS) of six ED phenotypes comprising three symptom traits from the Eating Disorders Inventory 2 [Drive for Thinness (DT), Body Dissatisfaction (BD), and Bulimia], Weight Fluctuation symptom, Breakfast Skipping behavior and Childhood Obsessive-Compulsive Personality Disorder trait (CHIRP). Investigated traits were derived from standardized self-report questionnaires completed by the TwinsUK population-based cohort. We tested 283,744 directly typed SNPs across six phenotypes of interest in the TwinsUK discovery dataset and followed-up signals from various strata using a two-stage replication strategy in two independent cohorts of European ancestry. We meta-analyzed a total of 2,698 individuals for DT, 2,680 for BD, 2,789 (821 cases/1,968 controls) for Bulimia, 1,360 (633 cases/727 controls) for Childhood Obsessive-Compulsive Personality Disorder trait, 2,773 (761 cases/2,012 controls) for Breakfast Skipping, and 2,967 (798 cases/2,169 controls) for Weight Fluctuation symptom. In this GWAS analysis of six ED-related phenotypes, we detected association of eight genetic variants with P < 10^-5. Genetic variants that showed suggestive evidence of association were previously associated with several psychiatric disorders and ED-related phenotypes. Our study indicates that larger-scale collaborative studies will be needed to achieve the necessary power to detect loci underlying ED-related traits. © 2012 Wiley Periodicals, Inc. PMID:22911880

  3. Genome-wide association analysis of eating disorder-related symptoms, behaviors, and personality traits.

    PubMed

    Boraska, Vesna; Davis, Oliver S P; Cherkas, Lynn F; Helder, Sietske G; Harris, Juliette; Krug, Isabel; Liao, Thomas Pei-Chi; Treasure, Janet; Ntalla, Ioanna; Karhunen, Leila; Keski-Rahkonen, Anna; Christakopoulou, Danai; Raevuori, Anu; Shin, So-Youn; Dedoussis, George V; Kaprio, Jaakko; Soranzo, Nicole; Spector, Tim D; Collier, David A; Zeggini, Eleftheria

    2012-10-01

    Eating disorders (EDs) are common, complex psychiatric disorders thought to be caused by both genetic and environmental factors. They share many symptoms, behaviors, and personality traits, which may have overlapping heritability. The aim of the present study is to perform a genome-wide association scan (GWAS) of six ED phenotypes comprising three symptom traits from the Eating Disorders Inventory 2 [Drive for Thinness (DT), Body Dissatisfaction (BD), and Bulimia], Weight Fluctuation symptom, Breakfast Skipping behavior and Childhood Obsessive-Compulsive Personality Disorder trait (CHIRP). Investigated traits were derived from standardized self-report questionnaires completed by the TwinsUK population-based cohort. We tested 283,744 directly typed SNPs across six phenotypes of interest in the TwinsUK discovery dataset and followed-up signals from various strata using a two-stage replication strategy in two independent cohorts of European ancestry. We meta-analyzed a total of 2,698 individuals for DT, 2,680 for BD, 2,789 (821 cases/1,968 controls) for Bulimia, 1,360 (633 cases/727 controls) for Childhood Obsessive-Compulsive Personality Disorder trait, 2,773 (761 cases/2,012 controls) for Breakfast Skipping, and 2,967 (798 cases/2,169 controls) for Weight Fluctuation symptom. In this GWAS analysis of six ED-related phenotypes, we detected association of eight genetic variants with P < 10^-5. Genetic variants that showed suggestive evidence of association were previously associated with several psychiatric disorders and ED-related phenotypes. Our study indicates that larger-scale collaborative studies will be needed to achieve the necessary power to detect loci underlying ED-related traits.

  4. THE PATIENT AS PERSON IN AN INCREASINGLY GENE-CENTRIC UNIVERSE: HOW HEALTHCARE PROFESSIONALS SHOULD THINK ABOUT GENOMICS AND EVOLUTION

    PubMed Central

    Jackson, Timothy P.

    2009-01-01

    In the past, the primary threat to the patient as person was a medical utilitarianism that would sacrifice the individual for the collective, that would coercively (ab)use a person for the sake of an in-group’s health or happiness. Today, the threat is not only from vainglorious social groups but also from valorized genes and genomes. An over-valuation of genes risks making persons seem epiphenomenal. A central thesis of this paper is that religious healthcare professionals have unique resources to combat this. PMID:19170083

  5. Bioinformatics-Aided Venomics

    PubMed Central

    Kaas, Quentin; Craik, David J.

    2015-01-01

    Venomics is a modern approach that combines transcriptomics and proteomics to explore the toxin content of venoms. This review will give an overview of computational approaches that have been created to classify and consolidate venomics data, as well as algorithms that have helped discovery and analysis of toxin nucleic acid and protein sequences, toxin three-dimensional structures and toxin functions. Bioinformatics is used to tackle specific challenges associated with the identification and annotations of toxins. Recognizing toxin transcript sequences among second generation sequencing data cannot rely only on basic sequence similarity because toxins are highly divergent. Mass spectrometry sequencing of mature toxins is challenging because toxins can display a large number of post-translational modifications. Identifying the mature toxin region in toxin precursor sequences requires the prediction of the cleavage sites of proprotein convertases, most of which are unknown or not well characterized. Tracing the evolutionary relationships between toxins should consider specific mechanisms of rapid evolution as well as interactions between predatory animals and prey. Rapidly determining the activity of toxins is the main bottleneck in venomics discovery, but some recent bioinformatics and molecular modeling approaches give hope that accurate predictions of toxin specificity could be made in the near future. PMID:26110505

  6. Pattern recognition in bioinformatics.

    PubMed

    de Ridder, Dick; de Ridder, Jeroen; Reinders, Marcel J T

    2013-09-01

    Pattern recognition is concerned with the development of systems that learn to solve a given problem using a set of example instances, each represented by a number of features. These problems include clustering, the grouping of similar instances; classification, the task of assigning a discrete label to a given instance; and dimensionality reduction, combining or selecting features to arrive at a more useful representation. The use of statistical pattern recognition algorithms in bioinformatics is pervasive. Classification and clustering are often applied to high-throughput measurement data arising from microarray, mass spectrometry and next-generation sequencing experiments for selecting markers, predicting phenotype and grouping objects or genes. Less explicitly, classification is at the core of a wide range of tools such as predictors of genes, protein function, functional or genetic interactions, etc., and used extensively in systems biology. A course on pattern recognition (or machine learning) should therefore be at the core of any bioinformatics education program. In this review, we discuss the main elements of a pattern recognition course, based on material developed for courses taught at the BSc, MSc and PhD levels to an audience of bioinformaticians, computer scientists and life scientists. We pay attention to common problems and pitfalls encountered in applications and in interpretation of the results obtained.
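
    As a concrete illustration of classification as defined above (assigning a discrete label to an instance represented by features), here is a minimal nearest-centroid classifier in plain Python. The feature values, the labels "control" and "case", and the function names are hypothetical; this is a teaching sketch, not material from the course the review describes.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def fit_centroids(instances, labels):
    """Learn one centroid (mean feature vector) per class from example instances."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(instances, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + v for s, v in zip(sums[label], features)]
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in total)
            for label, total in sums.items()}


def predict(centroids, features):
    """Assign the label whose centroid is closest to the given instance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


if __name__ == "__main__":
    # Hypothetical two-feature measurements (e.g. two marker intensities).
    train_x = [(1.0, 1.2), (0.8, 1.0), (3.0, 3.1), (3.2, 2.9)]
    train_y = ["control", "control", "case", "case"]
    model = fit_centroids(train_x, train_y)
    print(predict(model, (1.1, 0.9)))
    print(predict(model, (2.9, 3.3)))
```

    A nearest-centroid rule is chosen here only because it fits in a few lines; the same fit/predict split applies to the other classifiers such a course would cover.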

  7. Integration of bioinformatics into an undergraduate biology curriculum and the impact on development of mathematical skills.

    PubMed

    Wightman, Bruce; Hark, Amy T

    2012-01-01

    The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this study, we deliberately integrated bioinformatics instruction at multiple course levels into an existing biology curriculum. Students in an introductory biology course, intermediate lab courses, and advanced project-oriented courses all participated in new course components designed to sequentially introduce bioinformatics skills and knowledge, as well as computational approaches that are common to many bioinformatics applications. In each course, bioinformatics learning was embedded in an existing disciplinary instructional sequence, as opposed to having a single course where all bioinformatics learning occurs. We designed direct and indirect assessment tools to follow student progress through the course sequence. Our data show significant gains in both student confidence and ability in bioinformatics during individual courses and as course level increases. Despite evidence of substantial student learning in both bioinformatics and mathematics, students were skeptical about the link between learning bioinformatics and learning mathematics. While our approach resulted in substantial learning gains, student "buy-in" and engagement might be better in longer project-based activities that demand application of skills to research problems. Nevertheless, in situations where a concentrated focus on project-oriented bioinformatics is not possible or desirable, our approach of integrating multiple smaller components into an existing curriculum provides an alternative.

  8. The UCSC Genome Browser.

    PubMed

    Karolchik, Donna; Hinrichs, Angie S; Kent, W James

    2012-12-01

    The University of California Santa Cruz (UCSC) Genome Browser is a popular Web-based tool for quickly displaying a requested portion of a genome at any scale, accompanied by a series of aligned annotation "tracks." The annotations generated by the UCSC Genome Bioinformatics Group and external collaborators include gene predictions, mRNA and expressed sequence tag alignments, simple nucleotide polymorphisms, expression and regulatory data, phenotype and variation data, and pairwise and multiple-species comparative genomics data. All information relevant to a region is presented in one window, facilitating biological analysis and interpretation. The database tables underlying the Genome Browser tracks can be viewed, downloaded, and manipulated using another Web-based application, the UCSC Table Browser. Users can upload personal datasets in a wide variety of formats as custom annotation tracks in both browsers for research or educational purposes. This unit describes how to use the Genome Browser and Table Browser for genome analysis, download the underlying database tables, and create and display custom annotation tracks.
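
    Custom annotation tracks of the kind mentioned above are commonly supplied as plain text: a "track" header line followed by BED records (chromosome, 0-based start, exclusive end, name). The sketch below builds such a text block in Python. The track name, description, and coordinates are hypothetical, and only the most basic track-line attributes are used; consult the browser's own documentation for the full set of options.

```python
def format_custom_track(name, description, features):
    """Return the text of a minimal BED custom track.

    `features` is an iterable of (chrom, start, end, label) tuples, where
    start is 0-based and end is exclusive, as in the BED format.
    """
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in features:
        if not 0 <= start < end:
            raise ValueError(f"invalid interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical regions, for illustration only.
    regions = [
        ("chr1", 999, 2000, "regionA"),
        ("chr1", 4999, 5500, "regionB"),
    ]
    print(format_custom_track("demoTrack", "Hypothetical regions", regions), end="")
```

    The resulting text can be saved to a file or pasted into the custom-track upload form.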

  9. Integration of Bioinformatics into an Undergraduate Biology Curriculum and the Impact on Development of Mathematical Skills

    ERIC Educational Resources Information Center

    Wightman, Bruce; Hark, Amy T.

    2012-01-01

    The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this…

  10. Making Bioinformatics Projects a Meaningful Experience in an Undergraduate Biotechnology or Biomedical Science Programme

    ERIC Educational Resources Information Center

    Sutcliffe, Iain C.; Cummings, Stephen P.

    2007-01-01

    Bioinformatics has emerged as an important discipline within the biological sciences that allows scientists to decipher and manage the vast quantities of data (such as genome sequences) that are now available. Consequently, there is an obvious need to provide graduates in biosciences with generic, transferable skills in bioinformatics. We present…

  11. The potential of translational bioinformatics approaches for pharmacology research

    PubMed Central

    Li, Lang

    2015-01-01

    The field of bioinformatics has allowed the interpretation of massive amounts of biological data, ushering in the era of ‘omics’ to biomedical research. Its potential impact on pharmacology research is enormous and it has shown some emerging successes. A full realization of this potential, however, requires standardized data annotation for large health record databases and molecular data resources. Improved standardization will further stimulate the development of systems pharmacology models, using translational bioinformatics methods. This new translational bioinformatics paradigm is highly complementary to current pharmacological research fields, such as personalized medicine, pharmacoepidemiology and drug discovery. In this review, I illustrate the application of translational bioinformatics to research in numerous pharmacology subdisciplines. PMID:25753093

  12. Virtual Bioinformatics Distance Learning Suite

    ERIC Educational Resources Information Center

    Tolvanen, Martti; Vihinen, Mauno

    2004-01-01

    Distance learning as a computer-aided concept allows students to take courses from anywhere at any time. In bioinformatics, computers are needed to collect, store, process, and analyze massive amounts of biological and biomedical data. We have applied the concept of distance learning in virtual bioinformatics to provide university course material…

  13. Channelrhodopsins: a bioinformatics perspective.

    PubMed

    Del Val, Coral; Royuela-Flor, José; Milenkovic, Stefan; Bondar, Ana-Nicoleta

    2014-05-01

    Channelrhodopsins are microbial-type rhodopsins that function as light-gated cation channels. Understanding how the detailed architecture of the protein governs its dynamics and specificity for ions is important, because it has the potential to assist in designing site-directed channelrhodopsin mutants for specific neurobiology applications. Here we use bioinformatics methods to derive accurate alignments of channelrhodopsin sequences, assess the sequence conservation patterns and find conserved motifs in channelrhodopsins, and use homology modeling to construct three-dimensional structural models of channelrhodopsins. The analyses reveal that helices C and D of channelrhodopsins contain Cys, Ser, and Thr groups that can engage in both intra- and inter-helical hydrogen bonds. We propose that these polar groups participate in inter-helical hydrogen-bonding clusters important for the protein conformational dynamics and for the local water interactions. This article is part of a Special Issue entitled: Retinal Proteins - You can teach an old dog new tricks. PMID:24252597

  14. Intrageneric Primer Design: Bringing Bioinformatics Tools to the Class

    ERIC Educational Resources Information Center

    Lima, Andre O. S.; Garces, Sergio P. S.

    2006-01-01

    Bioinformatics is one of the fastest growing scientific areas over the last decade. It focuses on the use of informatics tools for the organization and analysis of biological data. An example of their importance is the availability nowadays of dozens of software programs for genomic and proteomic studies. Thus, there is a growing field (private…

  15. Advancing Pharmacogenomics Education in the Core PharmD Curriculum through Student Personal Genomic Testing

    PubMed Central

    Adams, Solomon M.; Anderson, Kacey B.; Coons, James C.; Smith, Randall B.; Meyer, Susan M.; Parker, Lisa S.

    2016-01-01

    Objective. To develop, implement, and evaluate “Test2Learn,” a program to enhance pharmacogenomics education through the use of personal genomic testing (PGT) and real genetic data. Design. One hundred twenty-two second-year doctor of pharmacy (PharmD) students in a required course were offered PGT as part of a larger program approach to teach pharmacogenomics within a robust ethical framework. The program added novel learning objectives, lecture materials, analysis tools, and exercises using individual-level and population-level genetic data. Outcomes were assessed with objective measures and pre/post survey instruments. Assessment. One hundred students (82%) underwent PGT. Knowledge significantly improved on multiple assessments. Genotyped students reported a greater increase in confidence in understanding test results by the end of the course. Similarly, undergoing PGT improved students’ self-perceived ability to empathize with patients compared to those not genotyped. Most students (71%) reported feeling PGT was an important part of the course, and 60% reported they had a better understanding of pharmacogenomics specifically because of the opportunity. Conclusion. Implementation of PGT in the core pharmacy curriculum was feasible, well-received, and enhanced student learning of pharmacogenomics. PMID:26941429

  16. Direct-to-Consumer Genetic Testing and Personal Genomics Services: A Review of Recent Empirical Studies

    PubMed Central

    Ostergren, Jenny

    2013-01-01

    Direct-to-consumer genetic testing (DTC-GT) has sparked much controversy and undergone dramatic changes in its brief history. Debates over appropriate health policies regarding DTC-GT would benefit from empirical research on its benefits, harms, and limitations. We review the recent literature (2011-present) and summarize findings across (1) content analyses of DTC-GT websites, (2) studies of consumer perspectives and experiences, and (3) surveys of relevant health care providers. Findings suggest that neither the health benefits envisioned by DTC-GT proponents (e.g., significant improvements in positive health behaviors) nor the worst fears expressed by its critics (e.g., catastrophic psychological distress and misunderstanding of test results, undue burden on the health care system) have materialized to date. However, research in this area is in its early stages and possesses numerous key limitations. We note needs for future studies to illuminate the impact of DTC-GT and thereby guide practice and policy regarding this rapidly evolving approach to personal genomics. PMID:24058877

  17. Social Networkers’ Attitudes Toward Direct-to-Consumer Personal Genome Testing

    PubMed Central

    McGuire, Amy L.; Diaz, Christina M.; Wang, Tao; Hilsenbeck, Susan G.

    2009-01-01

    Purpose. This study explores social networkers’ interest in and attitudes toward personal genome testing (PGT), focusing on expectations related to the clinical integration of PGT results. Methods. An online survey of 1,087 social networking users was conducted to assess 1) use of and interest in PGT; 2) attitudes toward PGT companies and test results; and 3) expectations for the clinical integration of PGT. Descriptive statistics were calculated to summarize respondents’ characteristics and responses. Results. Six percent of respondents have used PGT, 64% would consider using PGT, and 30% would not use PGT. Of those who would consider using PGT, 74% would use it to gain knowledge about disease in their family. Of all respondents, 34% consider the information obtained from PGT to be a medical diagnosis. Of those who would consider using PGT, 78% would ask their physician for help interpreting test results, and 61% of all respondents believe that physicians have a professional obligation to help individuals interpret PGT results. Conclusion. Respondents express interest in using PGT services, primarily for purposes related to their medical care, and expect physicians to help interpret PGT results. Physicians should therefore be prepared for patient demands for information and counsel on the basis of PGT results. PMID:19998099

  18. Multiplex Y-STRs analysis using the ion torrent personal genome machine (PGM).

    PubMed

    Zhao, Xueying; Ma, Ke; Li, Hui; Cao, Yu; Liu, Wenbin; Zhou, Huaigu; Ping, Yuan

    2015-11-01

    Massively parallel sequencing (MPS) technologies allow parallel sequencing analyses of many targeted regions of multiple samples at desirable depth of coverage. Routine use of MPS for forensic genetics is on the horizon. In this study, we explore the application of MPS technology in forensic Y-STR analysis. We designed a multiplex assay with 13 Y-STR loci (DYS19, DYS389 I, DYS389 II, DYS390, DYS391, DYS392, DYS437, DYS438, DYS439, DYS448, DYS456, DYS635, GATA-H4) for the purpose of MPS. The multiplex Y-STR assay was amplified in 42 unrelated male individuals and amplicons were sequenced simultaneously using the Ion Torrent Personal Genome Machine (PGM) system. All loci were detected successfully, except for DYS389 II, which exhibited a failure rate of 1.8% due to its relatively long amplicon size. We observed 7, 3, 2, 6 and 5 new alleles, respectively, in DYS389 II, DYS390, DYS437, DYS448 and DYS635 due to the presence of sub-repeat composition differences, and a new allele in DYS438 because of a nucleotide substitution. One allele of DYS390 was inconsistent with the allele call from conventional capillary electrophoresis (CE) because of 4 bp deletions upstream of the core repeat unit. This study demonstrates that Y-STR typing by MPS can provide more genetic information, holding the promise of high discriminatory power. PMID:26247785

  19. Erosion of Conserved Binding Sites in Personal Genomes Points to Medical Histories.

    PubMed

    Guturu, Harendra; Chinchali, Sandeep; Clarke, Shoa L; Bejerano, Gill

    2016-02-01

    Although many human diseases have a genetic component involving many loci, the majority of studies are statistically underpowered to isolate the many contributing variants, raising the question of the existence of alternate processes to identify disease mutations. To address this question, we collect ancestral transcription factor binding sites disrupted by an individual's variants and then look for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each is invariably reflective of their very different medical histories. For example, our method implicates "abnormal cardiac output" for a patient with a longstanding family history of heart disease, "decreased circulating sodium level" for an individual with hypertension, and other biologically appealing links for medical histories spanning narcolepsy to axonal neuropathy. Our results suggest that erosion of gene regulation by mutation load significantly contributes to observed heritable phenotypes that manifest in the medical history. The test we developed exposes a hitherto hidden layer of personal variants that promise to shed new light on human disease penetrance, expressivity and the sensitivity with which we can detect them. PMID:26845687
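
    The abstract above describes looking for a significant congregation of disrupted binding sites next to a group of functionally related genes, but it does not specify the statistic used. One generic way to score such over-representation is a one-sided hypergeometric test, sketched below in Python. This is an assumption-labeled illustration, not the authors' method; the function name and all counts are hypothetical.

```python
from math import comb  # Python 3.8+


def hypergeom_enrichment_p(hits_in_set, set_size, total_hits, population):
    """One-sided hypergeometric P-value: the probability of seeing at least
    `hits_in_set` hits inside a gene set of `set_size` genes when
    `total_hits` hits are drawn at random from `population` genes."""
    if not (0 <= set_size <= population and 0 <= total_hits <= population):
        raise ValueError("set_size and total_hits must lie within the population")
    upper = min(set_size, total_hits)
    denominator = comb(population, total_hits)
    tail = sum(
        comb(set_size, i) * comb(population - set_size, total_hits - i)
        for i in range(hits_in_set, upper + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical counts: 20,000 genes, 300 of them near disrupted sites,
    # and a 50-gene functional set containing 6 of those 300.
    p = hypergeom_enrichment_p(hits_in_set=6, set_size=50,
                               total_hits=300, population=20_000)
    print(f"enrichment P-value: {p:.3g}")
```

    In practice such a test would be repeated over many gene sets, so the resulting P-values would need multiple-testing correction.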

  20. Erosion of Conserved Binding Sites in Personal Genomes Points to Medical Histories

    PubMed Central

    Guturu, Harendra; Chinchali, Sandeep; Clarke, Shoa L.; Bejerano, Gill

    2016-01-01

    Although many human diseases have a genetic component involving many loci, the majority of studies are statistically underpowered to isolate the many contributing variants, raising the question of the existence of alternate processes to identify disease mutations. To address this question, we collect ancestral transcription factor binding sites disrupted by an individual’s variants and then look for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each is invariably reflective of their very different medical histories. For example, our method implicates “abnormal cardiac output” for a patient with a longstanding family history of heart disease, “decreased circulating sodium level” for an individual with hypertension, and other biologically appealing links for medical histories spanning narcolepsy to axonal neuropathy. Our results suggest that erosion of gene regulation by mutation load significantly contributes to observed heritable phenotypes that manifest in the medical history. The test we developed exposes a hitherto hidden layer of personal variants that promise to shed new light on human disease penetrance, expressivity and the sensitivity with which we can detect them. PMID:26845687

  1. Genome-wide association study of the five-factor model of personality in young Korean women.

    PubMed

    Kim, Han-Na; Roh, Seung-Ju; Sung, Yeon Ah; Chung, Hye Won; Lee, Jong-Young; Cho, Juhee; Shin, Hocheol; Kim, Hyung-Lae

    2013-10-01

    Personality is a determinant of behavior and lifestyle associated with health and human diseases. Although personality is known to be a heritable trait, its polygenic nature has made the identification of genetic variants elusive. We performed a genome-wide association study on 1089 Korean women aged 18-40 years whose personality traits were measured with the Revised NEO Personality Inventory for the five-factor model of personality. To reduce environmental factors that may influence personality traits, this study was restricted to young adult women. In the discovery phase, we identified variants of PTPRD (protein tyrosine phosphatase, receptor type D) that associated this gene with the Openness domain. Other genes that were previously reported to be associated with neurological phenotypes were also associated with personality traits. In particular, DRD1 and OR1A2 were linked to Neuroticism, NKAIN2 with Extraversion, HTR5A with Openness and DRD3 with Agreeableness. Data from our replication study of 2090 subjects confirmed the association between OR1A2 and Neuroticism. We first identified and confirmed a novel region on OR1A2 associated with Neuroticism [corrected]. Candidate genes for psychiatric disorders were also enriched. These findings contribute to our understanding of the genetic architecture of personality traits and provide critical clues to the neurobiological mechanisms that influence them.

  2. Bioinformatics of cardiovascular miRNA biology.

    PubMed

    Kunz, Meik; Xiao, Ke; Liang, Chunguang; Viereck, Janika; Pachel, Christina; Frantz, Stefan; Thum, Thomas; Dandekar, Thomas

    2015-12-01

    MicroRNAs (miRNAs) are small ~22 nucleotide non-coding RNAs and are highly conserved among species. Moreover, miRNAs regulate gene expression of a large number of genes associated with important biological functions and signaling pathways. Recently, several miRNAs have been found to be associated with cardiovascular diseases. Thus, investigating the complex regulatory effect of miRNAs may lead to a better understanding of their functional role in the heart. To achieve this, bioinformatics approaches have to be coupled with validation and screening experiments to understand the complex interactions of miRNAs with the genome. This will boost the subsequent development of diagnostic markers and our understanding of the physiological and therapeutic role of miRNAs in cardiac remodeling. In this review, we focus on and explain different bioinformatics strategies and algorithms for the identification and analysis of miRNAs and their regulatory elements to better understand cardiac miRNA biology. Starting with the biogenesis of miRNAs, we present approaches such as LocARNA and miRBase for combining sequence and structure analysis including phylogenetic comparisons as well as detailed analysis of RNA folding patterns, functional target prediction, signaling pathway as well as functional analysis. We also show how far bioinformatics helps to tackle the unprecedented level of complexity and systemic effects by miRNA, underlining the strong therapeutic potential of miRNA and miRNA target structures in cardiovascular disease. In addition, we discuss drawbacks and limitations of bioinformatics algorithms and the necessity of experimental approaches for miRNA target identification. This article is part of a Special Issue entitled 'Non-coding RNAs'.

  3. Bioinformatics in High School Biology Curricula: A Study of State Science Standards

    PubMed Central

    Sheppard, Keith

    2008-01-01

    The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics content of each state's biology standards was analyzed and categorized into nine areas: Human Genome Project/genomics, forensics, evolution, classification, nucleotide variations, medicine, computer use, agriculture/food technology, and science technology and society/socioscientific issues. Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas, with Human Genome Project/genomics and computer use being the lowest (8%), and evolution being the highest (64%) among states' science frameworks. This essay concludes with recommendations for reworking/rewording existing standards to facilitate the goal of promoting science literacy among secondary school students. PMID:18316818

  4. Bioinformatics in high school biology curricula: a study of state science standards.

    PubMed

    Wefer, Stephen H; Sheppard, Keith

    2008-01-01

    The proliferation of bioinformatics in modern biology marks a modern revolution in science that promises to influence science education at all levels. This study analyzed secondary school science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics. The bioinformatics content of each state's biology standards was analyzed and categorized into nine areas: Human Genome Project/genomics, forensics, evolution, classification, nucleotide variations, medicine, computer use, agriculture/food technology, and science technology and society/socioscientific issues. Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas, with Human Genome Project/genomics and computer use being the lowest (8%), and evolution being the highest (64%) among states' science frameworks. This essay concludes with recommendations for reworking/rewording existing standards to facilitate the goal of promoting science literacy among secondary school students.

  5. Translational bioinformatics in psychoneuroimmunology: methods and applications.

    PubMed

    Yan, Qing

    2012-01-01

    Translational bioinformatics plays an indispensable role in transforming psychoneuroimmunology (PNI) into personalized medicine. It provides a powerful method to bridge the gaps between various knowledge domains in PNI and systems biology. Translational bioinformatics methods at various systems levels can facilitate pattern recognition, and expedite and validate the discovery of systemic biomarkers to allow their incorporation into clinical trials and outcome assessments. Analysis of the correlations between genotypes and phenotypes, including behavior-based profiles, will contribute to the transition from disease-based medicine to human-centered medicine. Translational bioinformatics would also enable the establishment of predictive models for patient responses to diseases, vaccines, and drugs. In PNI research, the development of systems biology models such as those of the neurons would play a critical role. Methods based on data integration, data mining, and knowledge representation are essential elements in building health information systems such as electronic health records and computerized decision support systems. Data integration of genes, pathophysiology, and behaviors is needed for a broad range of PNI studies. Knowledge discovery approaches such as network-based systems biology methods are valuable in studying the cross-talk among pathways in various brain regions involved in disorders such as Alzheimer's disease.

  6. The Information Technology Infrastructure for the Translational Genomics Core and the Partners Biobank at Partners Personalized Medicine.

    PubMed

    Boutin, Natalie; Holzbach, Ana; Mahanta, Lisa; Aldama, Jackie; Cerretani, Xander; Embree, Kevin; Leon, Irene; Rathi, Neeta; Vickers, Matilde

    2016-01-01

    The Biobank and Translational Genomics core at Partners Personalized Medicine requires robust software and hardware. This Information Technology (IT) infrastructure enables the storage and transfer of large amounts of data, drives efficiencies in the laboratory, maintains data integrity from the time of consent to the time that genomic data is distributed for research, and enables the management of complex genetic data. Here, we describe the functional components of the research IT infrastructure at Partners Personalized Medicine and how they integrate with existing clinical and research systems, review some of the ways in which this IT infrastructure maintains data integrity and security, and discuss some of the challenges inherent to building and maintaining such infrastructure.

  7. The Information Technology Infrastructure for the Translational Genomics Core and the Partners Biobank at Partners Personalized Medicine

    PubMed Central

    Boutin, Natalie; Holzbach, Ana; Mahanta, Lisa; Aldama, Jackie; Cerretani, Xander; Embree, Kevin; Leon, Irene; Rathi, Neeta; Vickers, Matilde

    2016-01-01

    The Biobank and Translational Genomics core at Partners Personalized Medicine requires robust software and hardware. This Information Technology (IT) infrastructure enables the storage and transfer of large amounts of data, drives efficiencies in the laboratory, maintains data integrity from the time of consent to the time that genomic data is distributed for research, and enables the management of complex genetic data. Here, we describe the functional components of the research IT infrastructure at Partners Personalized Medicine and how they integrate with existing clinical and research systems, review some of the ways in which this IT infrastructure maintains data integrity and security, and discuss some of the challenges inherent to building and maintaining such infrastructure. PMID:26805892

  8. CattleTickBase: An integrated Internet-based bioinformatics resource for Rhipicephalus (Boophilus) microplus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Rhipicephalus microplus genome is large and complex in structure, making a genome sequence difficult to assemble and costly to resource the required bioinformatics. In light of this, a consortium of international collaborators was formed to pool resources to begin sequencing this genome. We have...

  10. Bioethical and clinical dilemmas of direct-to-consumer personal genomic testing: the problem of misattributed equivalence.

    PubMed

    Eng, Charis; Sharp, Richard R

    2010-02-01

    A number of for-profit companies now provide personal genomic testing services to clients directly, without input from a physician or other health care provider, and the results of these tests include predictions about a broad spectrum of disease risks and traits. Validated clinical genetic testing and direct-to-consumer (DTC) genomic tests differ substantially in their reliability and usefulness, raising many clinical, ethical, and societal challenges, which are discussed in this Commentary. Of special concern is the problem of misattributed equivalence, which occurs when a patient or physician mistakenly views alternative methods of genetic evaluation as equivalent in their results and analytic rigor. Despite the many challenges raised by DTC genomic testing, we are reminded that commercial interests have sometimes acted as a disruptive force or technology that drives nonconventional approaches to difficult problems. PMID:20371476

  11. Bioinformatic Primer for Clinical and Translational Science

    PubMed Central

    Faustino, Randolph S.; Chiriac, Anca; Terzic, Andre

    2009-01-01

    The advent of high-throughput technologies has accelerated generation and expansion of genomic, transcriptomic, and proteomic data. Acquisition of high-dimensional datasets requires archival systems that permit efficiency of storage and retrieval, and so, multiple electronic repositories have been initiated and maintained to meet this demand. Bioinformatic science has evolved, from these intricate bodies of dynamically updated information and the tools to manage them, as a necessity to harness and decipher the inherent complexity of high-volume data. Large datasets are associated with a variable degree of stochastic noise that contributes to the balance of an ordered, multistable state with the capacity to evolve in response to stimulus, thus exhibiting a hallmark feature of biological criticality. In this context, the network theory has become an invaluable tool to map relationships that integrate discrete elements that collectively direct global function within a particular –omic category, and indeed, the prioritized focus on the functional whole of the genomic, transcriptomic, or proteomic strata over single molecules is a primary tenet of systems biology analyses. This new biology perspective allows inspection and prediction of disease conditions, not limited to a monogenic challenge, but as a combination of individualized molecular permutations acting in concert to effect a phenotypic outcome. Bioinformatic integration of multidimensional data within and between biological layers thus harbors the potential to identify unique biological signatures, providing an enabling platform for advances in clinical and translational science. PMID:19690627

  12. Bioinformatics for cancer immunology and immunotherapy.

    PubMed

    Charoentong, Pornpimol; Angelova, Mihaela; Efremova, Mirjana; Gallasch, Ralf; Hackl, Hubert; Galon, Jerome; Trajanoski, Zlatko

    2012-11-01

    Recent mechanistic insights obtained from preclinical studies and the approval of the first immunotherapies have motivated an increasing number of academic investigators and pharmaceutical/biotech companies to further elucidate the role of immunity in tumor pathogenesis and to reconsider the role of immunotherapy. Additionally, technological advances (e.g., next-generation sequencing) are providing unprecedented opportunities to draw a comprehensive picture of the tumor genomics landscape and ultimately enable individualized treatment. However, the increasing complexity of the generated data and the plethora of bioinformatics methods and tools pose considerable challenges to both tumor immunologists and clinical oncologists. In this review, we describe current concepts and future challenges for the management and analysis of data for cancer immunology and immunotherapy. We first highlight publicly available databases with specific focus on cancer immunology including databases for somatic mutations and epitope databases. We then give an overview of the bioinformatics methods for the analysis of next-generation sequencing data (whole-genome and exome sequencing), epitope prediction tools as well as methods for integrative data analysis and network modeling. Mathematical models are powerful tools that can predict and explain important patterns in the genetic and clinical progression of cancer. Therefore, a survey of mathematical models for tumor evolution and tumor-immune cell interaction is included. Finally, we discuss future challenges for individualized immunotherapy and suggest how combined computational/experimental approaches can lead to new insights into the molecular mechanisms of cancer, improve diagnosis and prognosis of the disease, and pinpoint novel therapeutic targets.

  13. Personality.

    PubMed

    Funder, D C

    2001-01-01

    Personality psychology is as active today as at any point in its history. The classic psychoanalytic and trait paradigms are active areas of research, the behaviorist paradigm has evolved into a new social-cognitive paradigm, and the humanistic paradigm is a basis of current work on cross-cultural psychology. Biology and evolutionary theory have also attained the status of new paradigms for personality. Three challenges for the next generation of research are to integrate these disparate approaches to personality (particularly the trait and social-cognitive paradigms), to remedy the imbalance in the person-situation-behavior triad by conceptualizing the basic properties of situations and behaviors, and to add to personality psychology's thin inventory of basic facts concerning the relations between personality and behavior.

  14. Incorporating a Collaborative Web-Based Virtual Laboratory in an Undergraduate Bioinformatics Course

    ERIC Educational Resources Information Center

    Weisman, David

    2010-01-01

    Face-to-face bioinformatics courses commonly include a weekly, in-person computer lab to facilitate active learning, reinforce conceptual material, and teach practical skills. Similarly, fully-online bioinformatics courses employ hands-on exercises to achieve these outcomes, although students typically perform this work offsite. Combining a…

  15. Utilizing the Molecular Gateway: The Path to Personalized Cancer Management

    PubMed Central

    Overdevest, Jonathan B.; Theodorescu, Dan; Lee, Jae K.

    2015-01-01

    BACKGROUND Personalized medicine is the provision of focused prevention, detection, prognostic, and therapeutic efforts according to an individual’s genetic composition. The actualization of personalized medicine will require combining a patient’s conventional clinical data with bioinformatics-based molecular-assessment profiles. This synergistic approach offers tangible benefits, such as heightened specificity in the molecular classification of cancer subtypes, improved prognostic accuracy, targeted development of new therapies, novel applications for old therapies, and tailored selection and delivery of chemotherapeutics. CONTENT Our ability to personalize cancer management is rapidly expanding through biotechnological advances in the postgenomic era. The platforms of genomics, proteomics, single-nucleotide polymorphism profiling and haplotype mapping, high-throughput genomic sequencing, and pharmacogenomics constitute the mechanisms for the molecular assessment of a patient’s tumor. The complementary data derived during these assessments is processed through bioinformatics analysis to offer unique insights for linking expression profiles to disease detection, tumor response to chemotherapy, and patient survival. Together, these approaches permit improved physician capacity to assess risk, target therapies, and tailor a chemotherapeutic treatment course. SUMMARY Personalized medicine is poised for rapid growth as the insights provided by new bioinformatics models are integrated with current procedures for assessing and treating cancer patients. Integration of these biological platforms will require refinement of tissue-processing and analysis techniques, particularly in clinical pathology, to overcome obstacles in customizing our ability to treat cancer. PMID:19246616

  16. Borderline personality disorder and childhood maltreatment: a genome-wide methylation analysis.

    PubMed

    Prados, J; Stenz, L; Courtet, P; Prada, P; Nicastro, R; Adouan, W; Guillaume, S; Olié, E; Aubry, J-M; Dayer, A; Perroud, N

    2015-02-01

    Early life adversity plays a critical role in the emergence of borderline personality disorder (BPD) and this could occur through epigenetic programming. In this perspective, we aimed to determine whether childhood maltreatment could durably modify epigenetic processes by the means of a whole-genome methylation scan of BPD subjects. Using the Illumina Infinium® HumanMethylation450 BeadChip, global methylation status of DNA extracted from peripheral blood leucocytes was correlated to the severity of childhood maltreatment in 96 BPD subjects suffering from a high level of child adversity and 93 subjects suffering from major depressive disorder (MDD) and reporting a low rate of child maltreatment. Several CpGs within or near the following genes (IL17RA, miR124-3, KCNQ2, EFNB1, OCA2, MFAP2, RPH3AL, WDR60, CST9L, EP400, A2ML1, NT5DC2, FAM163A and SPSB2) were found to be differently methylated, either in BPD compared with MDD or in relation to the severity of childhood maltreatment. A highly relevant biological result was observed for cg04927004 close to miR124-3 that was significantly associated with BPD and severity of childhood maltreatment. miR124-3 codes for a microRNA (miRNA) targeting several genes previously found to be associated with BPD such as NR3C1. Our results highlight the potentially important role played by miRNAs in the etiology of neuropsychiatric disorders such as BPD and the usefulness of using methylome-wide association studies to uncover such candidate genes. Moreover, they offer new understanding of the impact of maltreatments on biological processes leading to diseases and may ultimately result in the identification of relevant biomarkers. PMID:25612291

  17. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software.

    PubMed

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians.

  18. Engineering bioinformatics: building reliability, performance and productivity into bioinformatics software

    PubMed Central

    Lawlor, Brendan; Walsh, Paul

    2015-01-01

    There is a lack of software engineering skills in bioinformatic contexts. We discuss the consequences of this lack, examine existing explanations and remedies to the problem, point out their shortcomings, and propose alternatives. Previous analyses of the problem have tended to treat the use of software in scientific contexts as categorically different from the general application of software engineering in commercial settings. In contrast, we describe bioinformatic software engineering as a specialization of general software engineering, and examine how it should be practiced. Specifically, we highlight the difference between programming and software engineering, list elements of the latter and present the results of a survey of bioinformatic practitioners which quantifies the extent to which those elements are employed in bioinformatics. We propose that the ideal way to bring engineering values into research projects is to bring engineers themselves. We identify the role of Bioinformatic Engineer and describe how such a role would work within bioinformatic research teams. We conclude by recommending an educational emphasis on cross-training software engineers into life sciences, and propose research on Domain Specific Languages to facilitate collaboration between engineers and bioinformaticians. PMID:25996054

  19. String Mining in Bioinformatics

    NASA Astrophysics Data System (ADS)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced neither in the data mining nor in the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word “data-mining” is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].
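
    The two tasks named in this abstract, detecting repeated segments within one sequence and spotting common segments between two sequences, can be illustrated with a minimal Python sketch. This is not the authors' method: the function names `repeated_segments` and `longest_common_segment` are hypothetical, and the simple hashing and dynamic-programming approaches shown here stand in for the index structures (such as suffix trees or arrays) that string-mining work typically relies on.

```python
from collections import defaultdict


def repeated_segments(seq, k):
    """Map each k-mer that occurs at least twice in seq to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    # Keep only k-mers seen more than once (intra-molecular repeats).
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def longest_common_segment(a, b):
    """Return one longest substring shared by a and b (inter-molecular similarity).

    Classic dynamic programming in O(len(a) * len(b)) time; illustrative only.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common segment ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    print(repeated_segments("ACGTACGT", 4))
    print(longest_common_segment("GATTACA", "TTACGG"))
```

    Both functions are quadratic or hash-based toys meant for short strings; genome-scale inputs would need the more efficient data structures the chapter alludes to.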

  1. The Scientific Foundation for Personal Genomics: Recommendations from a National Institutes of Health–Centers for Disease Control and Prevention Multidisciplinary Workshop

    PubMed Central

    Khoury, Muin J.; McBride, Colleen M.; Schully, Sheri D.; Ioannidis, John P. A.; Feero, W. Gregory; Janssens, A. Cecile J. W.; Gwinn, Marta; Simons-Morton, Denise G.; Bernhardt, Jay M.; Cargill, Michele; Chanock, Stephen J.; Church, George M.; Coates, Ralph J.; Collins, Francis S.; Croyle, Robert T.; Davis, Barry R.; Downing, Gregory J.; DuRoss, Amy; Friedman, Susan; Gail, Mitchell H.; Ginsburg, Geoffrey S.; Green, Robert C.; Greene, Mark H.; Greenland, Philip; Gulcher, Jeffrey R.; Hsu, Andro; Hudson, Kathy L.; Kardia, Sharon L. R.; Kimmel, Paul L.; Lauer, Michael S.; Miller, Amy M.; Offit, Kenneth; Ransohoff, David F.; Roberts, J. Scott; Rasooly, Rebekah S.; Stefansson, Kari; Terry, Sharon F.; Teutsch, Steven M.; Trepanier, Angela; Wanke, Kay L.; Witte, John S.; Xu, Jianfeng

    2010-01-01

    The increasing availability of personal genomic tests has led to discussions about the validity and utility of such tests and the balance of benefits and harms. A multidisciplinary workshop was convened by the National Institutes of Health and the Centers for Disease Control and Prevention to review the scientific foundation for using personal genomics in risk assessment and disease prevention and to develop recommendations for targeted research. The clinical validity and utility of personal genomics is a moving target with rapidly developing discoveries but little translation research to close the gap between discoveries and health impact. Workshop participants made recommendations in five domains: (1) developing and applying scientific standards for assessing personal genomic tests; (2) developing and applying a multidisciplinary research agenda, including observational studies and clinical trials to fill knowledge gaps in clinical validity and utility; (3) enhancing credible knowledge synthesis and information dissemination to clinicians and consumers; (4) linking scientific findings to evidence-based recommendations for use of personal genomics; and (5) assessing how the concept of personal utility can affect health benefits, costs, and risks by developing appropriate metrics for evaluation. To fulfill the promise of personal genomics, a rigorous multidisciplinary research agenda is needed. PMID:19617843

  2. Personalization.

    ERIC Educational Resources Information Center

    Shore, Rebecca Martin

    1996-01-01

    Describes how a typical high school in Huntington Beach, California, curbed disruptive student behavior by personalizing the school experience for "problem" students. Through mostly volunteer efforts, an adopt-a-kid program was initiated that matched kids' learning styles to adults' personality styles and resulted in fewer suspensions and numerous…

  3. Bioinformatic Identification of Conserved Cis-Sequences in Coregulated Genes.

    PubMed

    Bülow, Lorenz; Hehl, Reinhard

    2016-01-01

    Bioinformatics tools can be employed to identify conserved cis-sequences in sets of coregulated plant genes as more and more gene expression and genomic sequence data become available. Knowledge of the specific cis-sequences, and of their enrichment and arrangement within promoters, facilitates the design of functional synthetic plant promoters that are responsive to specific stresses. The present chapter illustrates the bioinformatic identification of conserved Arabidopsis thaliana cis-sequences enriched in drought stress-responsive genes. This workflow can be applied to identify cis-sequences in any set of coregulated genes. The workflow includes detailed protocols for determining sets of coregulated genes, extracting the corresponding promoter sequences, and installing and running a software package that identifies overrepresented motifs. Further bioinformatic analyses that can be performed with the results are discussed. PMID:27557771
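
    The idea of an "overrepresented motif" can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the software package or protocol used in the chapter: the function names, the toy promoter strings, and the example motif are all hypothetical, only the forward strand is scanned, and a one-sided hypergeometric test is assumed as the enrichment statistic.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n promoters are drawn from N, of which K carry the motif."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def motif_enrichment(motif, coregulated, other):
    """Count promoters containing motif and test for enrichment in the coregulated set.

    coregulated and other are disjoint lists of promoter sequences (assumption).
    Returns (hits in coregulated set, hits overall, one-sided p-value).
    """
    k = sum(motif in seq for seq in coregulated)
    K = k + sum(motif in seq for seq in other)
    n = len(coregulated)
    N = n + len(other)
    return k, K, hypergeom_tail(k, n, K, N)


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    coregulated = ["TTACGTGGC", "AAACGTGTT", "CCACGTGAA"]
    other = ["TTTTTTTTT", "GGGGGGGGG", "ACACACACA", "CACGTGTTT"]
    print(motif_enrichment("ACGTG", coregulated, other))
```

    A real analysis would also scan the reverse strand, enumerate candidate motifs rather than test a single one, and correct the resulting p-values for multiple testing.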

  4. A services oriented system for bioinformatics applications on the grid.

    PubMed

    Aloisio, Giovanni; Cafaro, Massimo; Epicoco, Italo; Fiore, Sandro; Mirto, Maria

    2007-01-01

    This paper describes the evolution of the main services of the ProGenGrid (Proteomics & Genomics Grid) system, a distributed and ubiquitous grid environment ("virtual laboratory"), based on Workflow and supporting the design, execution and monitoring of "in silico" experiments in bioinformatics. ProGenGrid is a Grid-based Problem Solving Environment that allows the composition of data sources and bioinformatics programs wrapped as Web Services (WS). The use of WS provides ease of use and fosters re-use. The resulting workflow of WS is then scheduled on the Grid, leveraging Grid-middleware services. In particular, ProGenGrid offers a modular bag of services and currently is focused on the biological simulation of two important bioinformatics problems: prediction of the secondary structure of proteins, and sequence alignment of proteins. Both services are based on an enhanced data access service.

  5. Proteomics, genomics and the future of medical education.

    PubMed

    Pike, Linda J; Sadler, J Evan

    2004-01-01

    The completion of the human genome project in 2003 ushered in the era of genomics, the systematic study of our DNA sequence. Proteomics, the study of the full complement of proteins present in a cell, is a natural extension of genomics. Together, the information obtainable through genomics and proteomics has tremendous potential to change clinical practice. The application of such information to medical diagnosis and treatment will require significant changes in the training of physicians. All students and physicians in training will need to acquire enough knowledge of the underlying science, including medical genetics, epidemiology, bioinformatics and statistics, so they will intuitively understand the technology and recognize the strengths and limitations of genomic/proteomic tests. Because genomic or proteomic testing may yield extensive information about a person's genetic makeup and disease risks, consideration will need to be given throughout the medical curriculum to the ethical issues raised by the application of this new technology to the diagnosis and treatment of patients.

  6. GIW and InCoB, two premier bioinformatics conferences in Asia with a combined 40 years of history.

    PubMed

    Schönbach, Christian; Horton, Paul; Yiu, Siu-Ming; Tan, Tin Wee; Ranganathan, Shoba

    2015-01-01

    Knowledge discovery in bioinformatics thrives on joint and inclusive efforts of stakeholders. Similarly, knowledge dissemination is expected to be more effective and scalable through joint efforts. Therefore, the International Conference on Bioinformatics (InCoB) and the International Conference on Genome Informatics (GIW) were organized as a joint conference for the first time in 13 years of coexistence. The Asia-Pacific Bioinformatics Network (APBioNet) and the Japanese Society for Bioinformatics (JSBi) collaborated to host GIW/InCoB2015 in Tokyo, September 9-11, 2015. The joint endeavour yielded 51 research articles published in seven journals, 78 poster and 89 oral presentations, showcasing bioinformatics research in the Asia-Pacific region. Encouraged by the results and reduced organizational overheads, APBioNet will collaborate with other bioinformatics societies in organizing co-located bioinformatics research and training meetings in the future. InCoB2016 will be hosted in Singapore, September 21-23, 2016.

  8. Motivations, concerns and preferences of personal genome sequencing research participants: Baseline findings from the HealthSeq project.

    PubMed

    Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A; Diaz, George A; Zinberg, Randi E; Ferryman, Kadija; Wasserstein, Melissa; Kasarskis, Andrew; Schadt, Eric E

    2016-01-01

    Whole exome/genome sequencing (WES/WGS) is increasingly offered to ostensibly healthy individuals. Understanding the motivations and concerns of research participants seeking out personal WGS and their preferences regarding return-of-results and data sharing will help optimize protocols for WES/WGS. Baseline interviews including both qualitative and quantitative components were conducted with research participants (n=35) in the HealthSeq project, a longitudinal cohort study of individuals receiving personal WGS results. Data sharing preferences were recorded during informed consent. In the qualitative interview component, the dominant motivations that emerged were obtaining personal disease risk information, satisfying curiosity, contributing to research, self-exploration and interest in ancestry, and the dominant concern was the potential psychological impact of the results. In the quantitative component, 57% endorsed concerns about privacy. Most wanted to receive all personal WGS results (94%) and their raw data (89%); a third (37%) consented to having their data shared to the Database of Genotypes and Phenotypes (dbGaP). Early adopters of personal WGS in the HealthSeq project express a variety of health- and non-health-related motivations. Almost all want all available findings, while also expressing concerns about the psychological impact and privacy of their results. PMID:26036856

  9. Motivations, concerns and preferences of personal genome sequencing research participants: Baseline findings from the HealthSeq project

    PubMed Central

    Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A; Diaz, George A; Zinberg, Randi E; Ferryman, Kadija; Wasserstein, Melissa; Kasarskis, Andrew; Schadt, Eric E

    2016-01-01

    Whole exome/genome sequencing (WES/WGS) is increasingly offered to ostensibly healthy individuals. Understanding the motivations and concerns of research participants seeking out personal WGS and their preferences regarding return-of-results and data sharing will help optimize protocols for WES/WGS. Baseline interviews including both qualitative and quantitative components were conducted with research participants (n=35) in the HealthSeq project, a longitudinal cohort study of individuals receiving personal WGS results. Data sharing preferences were recorded during informed consent. In the qualitative interview component, the dominant motivations that emerged were obtaining personal disease risk information, satisfying curiosity, contributing to research, self-exploration and interest in ancestry, and the dominant concern was the potential psychological impact of the results. In the quantitative component, 57% endorsed concerns about privacy. Most wanted to receive all personal WGS results (94%) and their raw data (89%); a third (37%) consented to having their data shared to the Database of Genotypes and Phenotypes (dbGaP). Early adopters of personal WGS in the HealthSeq project express a variety of health- and non-health-related motivations. Almost all want all available findings, while also expressing concerns about the psychological impact and privacy of their results. PMID:26036856

  10. Refining genome-wide linkage intervals using a meta-analysis of genome-wide association studies identifies loci influencing personality dimensions.

    PubMed

    Amin, Najaf; Hottenga, Jouke-Jan; Hansell, Narelle K; Janssens, A Cecile J W; de Moor, Marleen H M; Madden, Pamela A F; Zorkoltseva, Irina V; Penninx, Brenda W; Terracciano, Antonio; Uda, Manuela; Tanaka, Toshiko; Esko, Tonu; Realo, Anu; Ferrucci, Luigi; Luciano, Michelle; Davies, Gail; Metspalu, Andres; Abecasis, Goncalo R; Deary, Ian J; Raikkonen, Katri; Bierut, Laura J; Costa, Paul T; Saviouk, Viatcheslav; Zhu, Gu; Kirichenko, Anatoly V; Isaacs, Aaron; Aulchenko, Yurii S; Willemsen, Gonneke; Heath, Andrew C; Pergadia, Michele L; Medland, Sarah E; Axenovich, Tatiana I; de Geus, Eco; Montgomery, Grant W; Wright, Margaret J; Oostra, Ben A; Martin, Nicholas G; Boomsma, Dorret I; van Duijn, Cornelia M

    2013-08-01

    Personality traits are complex phenotypes related to psychosomatic health. Individually, various gene finding methods have not achieved much success in finding genetic variants associated with personality traits. We performed a meta-analysis of four genome-wide linkage scans (N=6149 subjects) of five basic personality traits assessed with the NEO Five-Factor Inventory. We compared the significant regions from the meta-analysis of linkage scans with the results of a meta-analysis of genome-wide association studies (GWAS) (N∼17 000). We found significant evidence of linkage of neuroticism to chromosome 3p14 (rs1490265, LOD=4.67) and to chromosome 19q13 (rs628604, LOD=3.55); of extraversion to 14q32 (ATGG002, LOD=3.3); and of agreeableness to 3p25 (rs709160, LOD=3.67) and to two adjacent regions on chromosome 15, including 15q13 (rs970408, LOD=4.07) and 15q14 (rs1055356, LOD=3.52) in the individual scans. In the meta-analysis, we found strong evidence of linkage of extraversion to 4q34, 9q34, 10q24 and 11q22, openness to 2p25, 3q26, 9p21, 11q24, 15q26 and 19q13 and agreeableness to 4q34 and 19p13. Significant evidence of association in the GWAS was detected between openness and rs677035 at 11q24 (P-value=2.6 × 10⁻⁶, KCNJ1). The findings of our linkage meta-analysis and those of the GWAS suggest that 11q24 is a susceptible locus for openness, with KCNJ1 as the possible candidate gene. PMID:23211697

  12. Group-based and personalized care in an age of genomic and evidence-based medicine: a reappraisal.

    PubMed

    Maglo, Koffi N

    2012-01-01

    This article addresses the philosophical and moral foundations of group-based and individualized therapy in connection with population care equality. The U.S. Food and Drug Administration (FDA) recently modified its public health policy by seeking to enhance the efficacy and equality of care through the approval of group-specific prescriptions and doses for some drugs. In the age of genomics, when individualization of care increasingly has become a major concern, investigating the relationship between population health, stratified medicine, and personalized therapy can improve our understanding of the ethical and biomedical implications of genomic medicine. I suggest that the need to optimize population health through population substructure-sensitive research and the need to individualize care through genetically targeted therapies are not necessarily incompatible. Accordingly, the article reconceptualizes a unified goal for modern scientific medicine in terms of individualized equal care. PMID:22643722

  13. Bioinformatics Education—Perspectives and Challenges out of Africa

    PubMed Central

    Adebiyi, Ezekiel F.; Alzohairy, Ahmed M.; Everett, Dean; Ghedira, Kais; Ghouila, Amel; Kumuthini, Judit; Mulder, Nicola J.; Panji, Sumir; Patterton, Hugh-G.

    2015-01-01

    The discipline of bioinformatics has developed rapidly since the complete sequencing of the first genomes in the 1990s. The development of many high-throughput techniques during the last decades has ensured that bioinformatics has grown into a discipline that overlaps with, and is required for, the modern practice of virtually every field in the life sciences. This has placed a scientific premium on the availability of skilled bioinformaticians, a qualification that is extremely scarce on the African continent. The reasons for this are numerous, although the absence of a skilled bioinformatician at academic institutions to initiate a training process and build sustained capacity seems to be a common African shortcoming. This dearth of bioinformatics expertise has had a knock-on effect on the establishment of many modern high-throughput projects at African institutes, including the comprehensive and systematic analysis of genomes from African populations, which are among the most genetically diverse anywhere on the planet. Recent funding initiatives from the National Institutes of Health and the Wellcome Trust are aimed at ameliorating this shortcoming. In this paper, we discuss the problems that have limited the establishment of the bioinformatics field in Africa, as well as propose specific actions that will help with the education and training of bioinformaticians on the continent. This is an absolute requirement in anticipation of a boom in high-throughput approaches to human health issues unique to data from African populations. PMID:24990350

  14. ExPASy: SIB bioinformatics resource portal.

    PubMed

    Artimo, Panu; Jonnalagedda, Manohar; Arnold, Konstantin; Baratin, Delphine; Csardi, Gabor; de Castro, Edouard; Duvaud, Séverine; Flegel, Volker; Fortier, Arnaud; Gasteiger, Elisabeth; Grosdidier, Aurélien; Hernandez, Céline; Ioannidis, Vassilios; Kuznetsov, Dmitry; Liechti, Robin; Moretti, Sébastien; Mostaguir, Khaled; Redaschi, Nicole; Rossier, Grégoire; Xenarios, Ioannis; Stockinger, Heinz

    2012-07-01

    ExPASy (http://www.expasy.org) has a worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved, becoming an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can now seamlessly access a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a 'decentralized' way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across 'selected' resources. Additionally, the availability and usage of resources are monitored. The portal is aimed at both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy.

  15. The OAuth 2.0 Web Authorization Protocol for the Internet Addiction Bioinformatics (IABio) Database.

    PubMed

    Choi, Jeongseok; Kim, Jaekwon; Lee, Dong Kyun; Jang, Kwang Soo; Kim, Dai-Jin; Choi, In Young

    2016-03-01

    Internet addiction (IA) has become a widespread and problematic phenomenon as smart devices pervade society. Moreover, internet gaming disorder leads to increases in social expenditures for both individuals and nations alike. Although the prevention and treatment of IA are getting more important, the diagnosis of IA remains problematic. Understanding the neurobiological mechanism of behavioral addictions is essential for the development of specific and effective treatments. Although there are many databases related to other addictions, a database for IA has not been developed yet. In addition, bioinformatics databases, especially genetic databases, require a high level of security and should be designed based on medical information standards. In this respect, our study proposes the OAuth standard protocol for database access authorization. The proposed IA Bioinformatics (IABio) database system is based on internet user authentication, which is a guideline for medical information standards, and uses OAuth 2.0 for access control technology. This study designed and developed the system requirements and configuration. The OAuth 2.0 protocol is expected to establish the security of personal medical information and be applied to genomic research on IA. PMID:27103887

  17. The OAuth 2.0 Web Authorization Protocol for the Internet Addiction Bioinformatics (IABio) Database

    PubMed Central

    Choi, Jeongseok; Kim, Jaekwon; Lee, Dong Kyun; Jang, Kwang Soo; Kim, Dai-Jin

    2016-01-01

    Internet addiction (IA) has become a widespread and problematic phenomenon as smart devices pervade society. Moreover, internet gaming disorder leads to increases in social expenditures for both individuals and nations alike. Although the prevention and treatment of IA are getting more important, the diagnosis of IA remains problematic. Understanding the neurobiological mechanism of behavioral addictions is essential for the development of specific and effective treatments. Although there are many databases related to other addictions, a database for IA has not been developed yet. In addition, bioinformatics databases, especially genetic databases, require a high level of security and should be designed based on medical information standards. In this respect, our study proposes the OAuth standard protocol for database access authorization. The proposed IA Bioinformatics (IABio) database system is based on internet user authentication, which is a guideline for medical information standards, and uses OAuth 2.0 for access control technology. This study designed and developed the system requirements and configuration. The OAuth 2.0 protocol is expected to establish the security of personal medical information and be applied to genomic research on IA. PMID:27103887

  18. A critical appraisal of the scientific basis of commercial genomic profiles used to assess health risks and personalize health interventions.

    PubMed

    Janssens, A Cecile J W; Gwinn, Marta; Bradley, Linda A; Oostra, Ben A; van Duijn, Cornelia M; Khoury, Muin J

    2008-03-01

    Predictive genomic profiling used to produce personalized nutrition and other lifestyle health recommendations is currently offered directly to consumers. By examining previous meta-analyses and HuGE reviews, we assessed the scientific evidence supporting the purported gene-disease associations for genes included in genomic profiles offered online. We identified seven companies that offer predictive genomic profiling. We searched PubMed for meta-analyses and HuGE reviews of studies of gene-disease associations published from 2000 through June 2007 in which the genotypes of people with a disease were compared with those of a healthy or general-population control group. The seven companies tested at least 69 different polymorphisms in 56 genes. Of the 56 genes tested, 24 (43%) were not reviewed in meta-analyses. For the remaining 32 genes, we found 260 meta-analyses that examined 160 unique polymorphism-disease associations, of which only 60 (38%) were found to be statistically significant. Even the 60 significant associations, which involved 29 different polymorphisms and 28 different diseases, were generally modest, with synthetic odds ratios ranging from 0.54 to 0.88 for protective variants and from 1.04 to 3.2 for risk variants. Furthermore, genes in cardiogenomic profiles were more frequently associated with noncardiovascular diseases than with cardiovascular diseases, and though two of the five genes of the osteogenomic profiles did show significant associations with disease, the associations were not with bone diseases. There is insufficient scientific evidence to conclude that genomic profiles are useful in measuring genetic risk for common diseases or in developing personalized diet and lifestyle recommendations for disease prevention.
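
    The percentages quoted in the abstract above can be rechecked from the counts it gives. The following is a minimal sketch in Python, not part of the original record; the variable names and the `percent` helper are illustrative assumptions, and only the four counts are taken from the abstract.

```python
import math

# Counts quoted in the abstract; the names are illustrative, not the authors'.
genes_tested = 56
genes_not_reviewed = 24
associations_examined = 160
associations_significant = 60


def percent(part: int, whole: int) -> int:
    """Return part/whole as a whole-number percentage, rounding halves up."""
    return math.floor(100 * part / whole + 0.5)


# Genes that were covered by at least one meta-analysis.
genes_reviewed = genes_tested - genes_not_reviewed

pct_not_reviewed = percent(genes_not_reviewed, genes_tested)
pct_significant = percent(associations_significant, associations_examined)

print(f"genes reviewed: {genes_reviewed}")
print(f"genes not reviewed: {pct_not_reviewed}%")
print(f"significant associations: {pct_significant}%")
```

    Under these assumptions the computed values agree with the abstract's 32 remaining genes, 43% and 38%.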

  19. Extending information retrieval methods to personalized genomic-based studies of disease.

    PubMed

    Ye, Shuyun; Dawson, John A; Kendziorski, Christina

    2014-01-01

    Genomic-based studies of disease now involve diverse types of data collected on large groups of patients. A major challenge facing statistical scientists is how best to combine the data, extract important features, and comprehensively characterize the ways in which they affect an individual's disease course and likelihood of response to treatment. We have developed a survival-supervised latent Dirichlet allocation (survLDA) modeling framework to address these challenges. Latent Dirichlet allocation (LDA) models have proven extremely effective at identifying themes common across large collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a "document" with "text" detailing his/her clinical events and genomic state. We then further extend the framework to allow for supervision by a time-to-event response. The model enables the efficient identification of collections of clinical and genomic features that co-occur within patient subgroups, and then characterizes each patient by those features. An application of survLDA to The Cancer Genome Atlas ovarian project identifies informative patient subgroups showing differential response to treatment, and validation in an independent cohort demonstrates the potential for patient-specific inference. PMID:25733795

  1. Bioinformatics by Example: From Sequence to Target

    NASA Astrophysics Data System (ADS)

    Kossida, Sophia; Tahri, Nadia; Daizadeh, Iraj

    2002-12-01

    With the completion of the human genome, and the imminent completion of other large-scale sequencing and structure-determination projects, computer-assisted bioscience is set to become the new paradigm for conducting basic and applied research. The presence of these additional bioinformatics tools stirs great anxiety for experimental researchers (as well as for pedagogues), since they now need a wider and deeper knowledge of differing disciplines (biology, chemistry, physics, mathematics, and computer science). This review targets those individuals who are interested in using computational methods in their teaching or research. By analyzing a real-life, pharmaceutical, multicomponent, target-based example the reader will experience this fascinating new discipline.

  2. Rapid Bioinformatic Identification of Thermostabilizing Mutations

    PubMed Central

    Sauer, David B.; Karpowich, Nathan K.; Song, Jin Mei; Wang, Da-Neng

    2015-01-01

    Ex vivo stability is a valuable protein characteristic but is laborious to improve experimentally. In addition to biopharmaceutical and industrial applications, stable protein is important for biochemical and structural studies. Taking advantage of the large number of available genomic sequences and growth temperature data, we present two bioinformatic methods to identify a limited set of amino acids or positions that likely underlie thermostability. Because these methods allow thousands of homologs to be examined in silico, they have the advantage of providing both speed and statistical power. Using these methods, we introduced, via mutation, amino acids from thermoadapted homologs into an exemplar mesophilic membrane protein, and demonstrated significantly increased thermostability while preserving protein activity. PMID:26445442

  3. The European Bioinformatics Institute's data resources.

    PubMed

    Brooksbank, Catherine; Camon, Evelyn; Harris, Midori A; Magrane, Michele; Martin, Maria Jesus; Mulder, Nicola; O'Donovan, Claire; Parkinson, Helen; Tuli, Mary Ann; Apweiler, Rolf; Birney, Ewan; Brazma, Alvis; Henrick, Kim; Lopez, Rodrigo; Stoesser, Guenter; Stoehr, Peter; Cameron, Graham

    2003-01-01

    As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics Institute (EBI) hosts six core databases, which store information on DNA sequences (EMBL-Bank), protein sequences (SWISS-PROT and TrEMBL), protein structure (MSD), whole genomes (Ensembl) and gene expression (ArrayExpress). But just as a cell would be useless if it couldn't transcribe DNA or translate RNA, our resources would be compromised if each existed in isolation. We have therefore developed a range of tools that not only facilitate the deposition and retrieval of biological information, but also allow users to carry out searches that reflect the interconnectedness of biological information. The EBI's databases and tools are all available on our website at www.ebi.ac.uk. PMID:12519944

  4. The 20th anniversary of EMBnet: 20 years of bioinformatics for the Life Sciences community

    PubMed Central

    D'Elia, Domenica; Gisel, Andreas; Eriksson, Nils-Einar; Kossida, Sophia; Mattila, Kimmo; Klucar, Lubos; Bongcam-Rudloff, Erik

    2009-01-01

    The EMBnet Conference 2008, focusing on 'Leading Applications and Technologies in Bioinformatics', was organized by the European Molecular Biology network (EMBnet) to celebrate its 20th anniversary. Since its foundation in 1988, EMBnet has been working to promote collaborative development of bioinformatics services and tools to serve the European community of molecular biology laboratories. This conference was the first meeting organized by the network that was open to the international scientific community outside EMBnet. The conference covered a broad range of research topics in bioinformatics with a main focus on new achievements and trends in emerging technologies supporting genomics, transcriptomics and proteomics analyses such as high-throughput sequencing and data managing, text and data-mining, ontologies and Grid technologies. Papers selected for publication, in this supplement to BMC Bioinformatics, cover a broad range of the topics treated, providing also an overview of the main bioinformatics research fields that the EMBnet community is involved in. PMID:19534734

  5. Evolving Strategies for the Incorporation of Bioinformatics Within the Undergraduate Cell Biology Curriculum

    PubMed Central

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in three courses, beginning with an introductory course in cell biology. The exercises and projects that were used to help students develop literacy in bioinformatics are described. In a recently offered course in bioinformatics, students developed their own simple sequence analysis tool using the Perl programming language. These experiences are described from the point of view of the instructor as well as the students. A preliminary assessment has been made of the degree to which students had developed a working knowledge of bioinformatics concepts and methods. Finally, some conclusions have been drawn from these courses that may be helpful to instructors wishing to introduce bioinformatics within the undergraduate biology curriculum. PMID:14673489

  6. Genomic and molecular aberrations in malignant peripheral nerve sheath tumor and their roles in personalized target therapy.

    PubMed

    Yang, Jilong; Du, Xiaoling

    2013-09-01

    Malignant peripheral nerve sheath tumors (MPNSTs) are malignant tumors with a high rate of local recurrence and a significant tendency to metastasize. Their dismal outcome points to the urgent need to establish better therapeutic strategies for patients harboring MPNSTs. Investigations of genomic and molecular aberrations in MPNSTs, which have detected many chromosomal aberrations, pathway abnormalities, and specific aberrant molecular events, could supply multiple potential therapeutic targets and contribute to the achievement of personalized medicine. The genes involved in the significant gain aberrations include BIRC5, CCNE2, DAB2, DDX15, EGFR, MSH2, CDK6, HGF, ITGB4, KCNK12, LAMA3, LOXL2, MET, and PDGFRA. The genes involved in the significant deletion aberrations include CDH1, GLTSCR2, EGR1, CTSB, GATA3, SULT2A1, HMMR/RHAMM, LICAM2, MMP13, p16/INK4a, RASSF2, NM-23H1, and TP53. These genetic aberrations are involved in several important signaling pathways, such as the TFF, EGFR, ARF, and IGF1R signaling pathways. The genomic and molecular aberrations of EGFR, IGF1R, SOX9, EYA4, TOP2A, ETV4, and BIRC5 exhibit great promise as personalized therapeutic targets for MPNST patients. PMID:23830351

  7. A new approach to assessing affect and the emotional implications of personal genomic testing for common disease risk

    PubMed Central

    O'Neill, Suzanne C.; Tercyak, Kenneth P.; Baytop, Chanza; Alford, Sharon Hensley; McBride, Colleen M.

    2015-01-01

    Aims: Personal genomic testing (PGT) for common disease risk is becoming increasingly frequent, but little is known about people's array of emotional reactions to learning their genomic risk profiles and the psychological harms/benefits of PGT. We conducted a study of post-PGT affect, including positive, neutral, and negative states that may arise after testing. Methods: Two hundred twenty-eight healthy adults received PGT for common disease variants and completed a semi-structured research interview within two weeks of disclosure. Study participants reported how PGT results made them feel in their own words. Using an iterative coding process, responses were organized into three broad affective categories (Negative, Neutral, and Positive affect). Results: Neutral affect was the most prevalent response (53.9%), followed by Positive affect (26.9%) and Negative affect (19.2%). We found no differences by gender, race or education. Conclusions: While <20% of participants reported negative affect in response to learning their genomic risk profile for common disease, a majority experienced either neutral or positive emotions. These findings contribute to the growing evidence that PGT does not impose significant psychological harms. Moreover, they point to a need to better link theories and assessments in both emotional and cognitive processing to capitalize on PGT information for healthy behavior change. PMID:25612474

  8. [A review on the bioinformatics pipelines for metagenomic research].

    PubMed

    Ye, Dan-Dan; Fan, Meng-Meng; Guan, Qiong; Chen, Hong-Ju; Ma, Zhan-Shan

    2012-12-01

    Metagenome, a term first coined by Handelsman in 1998 as "the genomes of the total microbiota found in nature", refers to sequence data directly sampled from the environment (which may be any habitat in which microbes live, such as the guts of humans and animals, milk, soil, lakes, glaciers, and oceans). Metagenomic technologies originated from environmental microbiology studies and their wide application has been greatly facilitated by next-generation high throughput sequencing technologies. As in genomics studies, the bottleneck of metagenomic research is how to effectively and efficiently analyze the gigantic amount of metagenomic sequence data using bioinformatics pipelines to obtain meaningful biological insights. In this article, we briefly review the state-of-the-art bioinformatics software tools in metagenomic research. Due to the differences between the metagenomic data obtained from whole genome sequencing (i.e., shotgun metagenomics) and amplicon sequencing (i.e., 16S-rRNA and gene-targeted metagenomics) methods, there are significant differences between the corresponding bioinformatics tools for these data; accordingly, we review the computational pipelines separately for these two types of data. PMID:23266976

  9. Computational intelligence techniques in bioinformatics.

    PubMed

    Hassanien, Aboul Ella; Al-Shammari, Eiman Tamah; Ghali, Neveen I

    2013-12-01

    Computational intelligence (CI) is a well-established paradigm with current systems having many of the characteristics of biological computers and capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state of the art in CI applications to bioinformatics and motivate research in new trend-setting directions. In this article, we present an overview of the CI techniques in bioinformatics. We will show how CI techniques, including neural networks, restricted Boltzmann machines, deep belief networks, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines, could be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function and structure prediction. We discuss some representative methods to provide inspiring examples to illustrate how CI can be utilized to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented and an extensive bibliography is included. PMID:23891719

  10. Visualising "Junk" DNA through Bioinformatics

    ERIC Educational Resources Information Center

    Elwess, Nancy L.; Latourelle, Sandra M.; Cauthorn, Olivia

    2005-01-01

    One of the hottest areas of science today is the field in which biology, information technology, and computer science are merged into a single discipline called bioinformatics. This field enables the discovery and analysis of biological data, including nucleotide and amino acid sequences that are easily accessed through the use of computers. As…

  11. Reproducible Bioinformatics Research for Biologists

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This book chapter describes the current Big Data problem in Bioinformatics and the resulting issues with performing reproducible computational research. The core of the chapter provides guidelines and summaries of current tools/techniques that a noncomputational researcher would need to learn to pe...

  12. Bioinformatics and the Undergraduate Curriculum

    ERIC Educational Resources Information Center

    Maloney, Mark; Parker, Jeffrey; LeBlanc, Mark; Woodard, Craig T.; Glackin, Mary; Hanrahan, Michael

    2010-01-01

    Recent advances involving high-throughput techniques for data generation and analysis have made familiarity with basic bioinformatics concepts and programs a necessity in the biological sciences. Undergraduate students increasingly need training in methods related to finding and retrieving information stored in vast databases. The rapid rise of…

  13. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa

    PubMed Central

    Mulder, Nicola J.; Adebiyi, Ezekiel; Alami, Raouf; Benkahla, Alia; Brandful, James; Doumbia, Seydou; Everett, Dean; Fadlelmola, Faisal M.; Gaboun, Fatima; Gaseitsiwe, Simani; Ghazal, Hassan; Hazelhurst, Scott; Hide, Winston; Ibrahimi, Azeddine; Jaufeerally Fakim, Yasmina; Jongeneel, C. Victor; Joubert, Fourie; Kassim, Samar; Kayondo, Jonathan; Kumuthini, Judit; Lyantagaye, Sylvester; Makani, Julie; Mansour Alzohairy, Ahmed; Masiga, Daniel; Moussa, Ahmed; Nash, Oyekanmi; Ouwe Missi Oukem-Boyer, Odile; Owusu-Dabo, Ellis; Panji, Sumir; Patterton, Hugh; Radouani, Fouzia; Sadki, Khalid; Seghrouchni, Fouad; Tastan Bishop, Özlem; Tiffin, Nicki; Ulenga, Nzovu

    2016-01-01

    The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet. PMID:26627985

  14. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa.

    PubMed

    Mulder, Nicola J; Adebiyi, Ezekiel; Alami, Raouf; Benkahla, Alia; Brandful, James; Doumbia, Seydou; Everett, Dean; Fadlelmola, Faisal M; Gaboun, Fatima; Gaseitsiwe, Simani; Ghazal, Hassan; Hazelhurst, Scott; Hide, Winston; Ibrahimi, Azeddine; Jaufeerally Fakim, Yasmina; Jongeneel, C Victor; Joubert, Fourie; Kassim, Samar; Kayondo, Jonathan; Kumuthini, Judit; Lyantagaye, Sylvester; Makani, Julie; Mansour Alzohairy, Ahmed; Masiga, Daniel; Moussa, Ahmed; Nash, Oyekanmi; Ouwe Missi Oukem-Boyer, Odile; Owusu-Dabo, Ellis; Panji, Sumir; Patterton, Hugh; Radouani, Fouzia; Sadki, Khalid; Seghrouchni, Fouad; Tastan Bishop, Özlem; Tiffin, Nicki; Ulenga, Nzovu

    2016-02-01

    The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet. PMID:26627985

  15. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa.

    PubMed

    Mulder, Nicola J; Adebiyi, Ezekiel; Alami, Raouf; Benkahla, Alia; Brandful, James; Doumbia, Seydou; Everett, Dean; Fadlelmola, Faisal M; Gaboun, Fatima; Gaseitsiwe, Simani; Ghazal, Hassan; Hazelhurst, Scott; Hide, Winston; Ibrahimi, Azeddine; Jaufeerally Fakim, Yasmina; Jongeneel, C Victor; Joubert, Fourie; Kassim, Samar; Kayondo, Jonathan; Kumuthini, Judit; Lyantagaye, Sylvester; Makani, Julie; Mansour Alzohairy, Ahmed; Masiga, Daniel; Moussa, Ahmed; Nash, Oyekanmi; Ouwe Missi Oukem-Boyer, Odile; Owusu-Dabo, Ellis; Panji, Sumir; Patterton, Hugh; Radouani, Fouzia; Sadki, Khalid; Seghrouchni, Fouad; Tastan Bishop, Özlem; Tiffin, Nicki; Ulenga, Nzovu

    2016-02-01

    The application of genomics technologies to medicine and biomedical research is increasing in popularity, made possible by new high-throughput genotyping and sequencing technologies and improved data analysis capabilities. Some of the greatest genetic diversity among humans, animals, plants, and microbiota occurs in Africa, yet genomic research outputs from the continent are limited. The Human Heredity and Health in Africa (H3Africa) initiative was established to drive the development of genomic research for human health in Africa, and through recognition of the critical role of bioinformatics in this process, spurred the establishment of H3ABioNet, a pan-African bioinformatics network for H3Africa. The limitations in bioinformatics capacity on the continent have been a major contributory factor to the lack of notable outputs in high-throughput biology research. Although pockets of high-quality bioinformatics teams have existed previously, the majority of research institutions lack experienced faculty who can train and supervise bioinformatics students. H3ABioNet aims to address this dire need, specifically in the area of human genetics and genomics, but knock-on effects are ensuring this extends to other areas of bioinformatics. Here, we describe the emergence of genomics research and the development of bioinformatics in Africa through H3ABioNet.

  16. Genomic profiling of murine mammary tumors identifies potential personalized drug targets for p53-deficient mammary cancers

    PubMed Central

    Agrawal, Yash N.; Koboldt, Daniel C.; Kanchi, Krishna L.; Herschkowitz, Jason I.; Mardis, Elaine R.; Rosen, Jeffrey M.; Perou, Charles M.

    2016-01-01

    Targeted therapies against basal-like breast tumors, which are typically ‘triple-negative breast cancers (TNBCs)’, remain an important unmet clinical need. Somatic TP53 mutations are the most common genetic event in basal-like breast tumors and TNBC. To identify additional drivers and possible drug targets of this subtype, a comparative study between human and murine tumors was performed by utilizing a murine Trp53-null mammary transplant tumor model. We show that two subsets of murine Trp53-null mammary transplant tumors resemble aspects of the human basal-like subtype. DNA-microarray, whole-genome and exome-based sequencing approaches were used to interrogate the secondary genetic aberrations of these tumors, which were then compared to human basal-like tumors to identify conserved somatic genetic features. DNA copy-number variation produced the largest number of conserved candidate personalized drug targets. These candidates were filtered using a DNA-RNA Pearson correlation cut-off and a requirement that the gene was deemed essential in at least 5% of human breast cancer cell lines from an RNA-mediated interference screen database. Five potential personalized drug target genes, which were spontaneously amplified loci in both murine and human basal-like tumors, were identified: Cul4a, Lamp1, Met, Pnpla6 and Tubgcp3. As a proof of concept, inhibition of Met using crizotinib caused Met-amplified murine tumors to initially undergo complete regression. This study identifies Met as a promising drug target in a subset of murine Trp53-null tumors, thus identifying a potential shared driver with a subset of human basal-like breast cancers. Our results also highlight the importance of comparative genomic studies for discovering personalized drug targets and for providing a preclinical model for further investigations of key tumor signaling pathways. PMID:27149990

  17. Bioinformatics and the allergy assessment of agricultural biotechnology products: industry practices and recommendations.

    PubMed

    Ladics, Gregory S; Cressman, Robert F; Herouet-Guicheney, Corinne; Herman, Rod A; Privalle, Laura; Song, Ping; Ward, Jason M; McClain, Scott

    2011-06-01

    Bioinformatic tools are being increasingly utilized to evaluate the degree of similarity between a novel protein and known allergens within the context of a larger allergy safety assessment process. Importantly, bioinformatics is not a predictive analysis that can determine if a novel protein will "become" an allergen, but rather a tool to assess whether the protein is a known allergen or is potentially cross-reactive with an existing allergen. Bioinformatic tools are key components of the 2009 Codex Alimentarius Commission's weight-of-evidence approach, which encompasses a variety of experimental approaches for an overall assessment of the allergenic potential of a novel protein. Bioinformatic search comparisons between novel protein sequences, as well as potential novel fusion sequences derived from the genome and transgene, and known allergens are required by all regulatory agencies that assess the safety of genetically modified (GM) products. The objective of this paper is to identify opportunities for consensus in the methods of applying bioinformatics and to outline differences that impact a consistent and reliable allergy safety assessment. The bioinformatic comparison process has some critical features, which are outlined in this paper. One of them is a curated, publicly available and well-managed database with known allergenic sequences. In this paper, the best practices, scientific value, and food safety implications of bioinformatic analyses, as they are applied to GM food crops, are discussed. Recommendations for conducting bioinformatic analysis on novel food proteins for potential cross-reactivity to known allergens are also put forth.

  18. No-boundary thinking in bioinformatics research.

    PubMed

    Huang, Xiuzhen; Bruce, Barry; Buchan, Alison; Congdon, Clare Bates; Cramer, Carole L; Jennings, Steven F; Jiang, Hongmei; Li, Zenglu; McClure, Gail; McMullen, Rick; Moore, Jason H; Nanduri, Bindu; Peckham, Joan; Perkins, Andy; Polson, Shawn W; Rekepalli, Bhanu; Salem, Saeed; Specker, Jennifer; Wunsch, Donald; Xiong, Donghai; Zhang, Shuzhong; Zhao, Zhongming

    2013-11-06

    Currently there are definitions from many agencies and research societies defining "bioinformatics" as deriving knowledge from computational analysis of large volumes of biological and biomedical data. Should this be the bioinformatics research focus? We will discuss this issue in this review article. We would like to promote the idea of supporting human-infrastructure (HI) with no-boundary thinking (NT) in bioinformatics (HINT).

  19. Genomic and metabolomic advances in the identification of disease and adverse event biomarkers.

    PubMed

    Mendrick, Donna L; Schnackenberg, Laura

    2009-10-01

    Incomplete knowledge of tissue pathogenesis is hampering the identification of biomarkers for the appropriate therapeutic targets to prevent or inhibit disease processes, and the prediction and diagnosis of injury due to disease and adverse events of drug therapy. The revolution in genomics and metabolomics, combined with advanced bioinformatics and computational methods for mining such large, complex data sets, are beginning to provide critical insights into tissue injury. Such results will move us closer to the promise of personalized medicine.

  20. Broad issues to consider for library involvement in bioinformatics

    PubMed Central

    Geer, Renata C.

    2006-01-01

    Background: The information landscape in biological and medical research has grown far beyond literature to include a wide variety of databases generated by research fields such as molecular biology and genomics. The traditional role of libraries to collect, organize, and provide access to information can expand naturally to encompass these new data domains. Methods: This paper discusses the current and potential role of libraries in bioinformatics using empirical evidence and experience from eleven years of work in user services at the National Center for Biotechnology Information. Findings: Medical and science libraries over the last decade have begun to establish educational and support programs to address the challenges users face in the effective and efficient use of a plethora of molecular biology databases and retrieval and analysis tools. As more libraries begin to establish a role in this area, the issues they face include assessment of user needs and skills, identification of existing services, development of plans for new services, recruitment and training of specialized staff, and establishment of collaborations with bioinformatics centers at their institutions. Conclusions: Increasing library involvement in bioinformatics can help address information needs of a broad range of students, researchers, and clinicians and ultimately help realize the power of bioinformatics resources in making new biological discoveries. PMID:16888662

  1. Robust enzyme design: bioinformatic tools for improved protein stability.

    PubMed

    Suplatov, Dmitry; Voevodin, Vladimir; Švedas, Vytas

    2015-03-01

    The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation.

  2. A Bioinformatics Facility for NASA

    NASA Technical Reports Server (NTRS)

    Schweighofer, Karl; Pohorille, Andrew

    2006-01-01

    Building on an existing prototype, we have fielded a facility with bioinformatics technologies that will help NASA meet its unique requirements for biological research. This facility consists of a cluster of computers capable of performing computationally intensive tasks, software tools, databases and knowledge management systems. Novel computational technologies for analyzing and integrating new biological data and already existing knowledge have been developed. With continued development and support, the facility will fulfill NASA's strategic bioinformatics needs in astrobiology and space exploration. As a demonstration of these capabilities, we will present a detailed analysis of how spaceflight factors impact gene expression in the liver and kidney for mice flown aboard shuttle flight STS-108. We have found that many genes involved in signal transduction, cell cycle, and development respond to changes in microgravity, but that most metabolic pathways appear unchanged.

  3. USDA Stakeholder Workshop on Animal Bioinformatics: Summary and Recommendations.

    PubMed

    Hamernik, Debora L; Adelson, David L

    2003-01-01

    An electronic workshop was conducted on 4 November-13 December 2002 to discuss current issues and needs in animal bioinformatics. The electronic (e-mail listserver) format was chosen to provide a relatively speedy process that is broad in scope, cost-efficient and easily accessible to all participants. Approximately 40 panelists with diverse species and discipline expertise communicated through the panel e-mail listserver. The panel included scientists from academia, industry and government, in the USA, Australia and the UK. A second 'stakeholder' e-mail listserver was used to obtain input from a broad audience with general interests in animal genomics. The objectives of the electronic workshop were: (a) to define priorities for animal genome database development; and (b) to recommend ways in which the USDA could provide leadership in the area of animal genome database development. E-mail messages from panelists and stakeholders are archived at http://genome.cvm.umn.edu/bioinfo/. Priorities defined for animal genome database development included: (a) data repository; (b) tools for genome analysis; (c) annotation; (d) practical application of genomic data; and (e) a biological framework for DNA sequence. A stable source of funding, such as the USDA Agricultural Research Service (ARS), was recommended to support maintenance of data repositories and data curation. Continued support for competitive grants programs within the USDA Cooperative State Research, Education and Extension Service (CSREES) was recommended for tool development and hypothesis-driven research projects in genome analysis. Additional stakeholder input will be required to continuously refine priorities and maximize the use of limited resources for animal bioinformatics within the USDA. PMID:18629125

  4. Protein bioinformatics applied to virology.

    PubMed

    Mohabatkar, Hassan; Keyhanfar, Mehrnaz; Behbahani, Mandana

    2012-09-01

    Scientists have united in a common search to sequence, store and analyze genes and proteins. In this regard, rapidly evolving bioinformatics methods are providing valuable information on these newly discovered molecules. Understanding what has been done and what we can do in silico is essential in designing new experiments. The imbalance between sequence-known proteins and attribute-known proteins has called for the development of computational methods or high-throughput automated tools for rapidly and reliably predicting or identifying various characteristics of uncharacterized proteins. Taking into consideration the role of viruses in causing diseases and their use in biotechnology, the present review describes the application of protein bioinformatics in virology. A number of important features of viral proteins, such as epitope prediction, protein docking, subcellular localization, viral protease cleavage sites and computer-based comparison of their aspects, are discussed. This paper also describes several tools principally developed for viral bioinformatics. Predicting viral protein features and keeping up with advances in this field can aid basic understanding of the relationship between a virus and its host.

  5. Balancing Benefits and Risks of Immortal Data: Participants' Views of Open Consent in the Personal Genome Project.

    PubMed

    Zarate, Oscar A; Brody, Julia Green; Brown, Phil; Ramirez-Andreotta, Mónica D; Perovich, Laura; Matz, Jacob

    2016-01-01

    An individual's health, genetic, or environmental-exposure data, placed in an online repository, creates a valuable shared resource that can accelerate biomedical research and even open opportunities for crowd-sourcing discoveries by members of the public. But these data become "immortalized" in ways that may create lasting risk as well as benefit. Once shared on the Internet, the data are difficult or impossible to redact, and identities may be revealed by a process called data linkage, in which online data sets are matched to each other. Reidentification (re-ID), the process of associating an individual's name with data that were considered deidentified, poses risks such as insurance or employment discrimination, social stigma, and breach of the promises often made in informed-consent documents. At the same time, re-ID poses risks to researchers and indeed to the future of science, should re-ID end up undermining the trust and participation of potential research participants. The ethical challenges of online data sharing are heightened as so-called big data becomes an increasingly important research tool and driver of new research structures. Big data is shifting research to include large numbers of researchers and institutions as well as large numbers of participants providing diverse types of data, so the participants' consent relationship is no longer with a person or even a research institution. In addition, consent is further transformed because big data analysis often begins with descriptive inquiry and generation of a hypothesis, and the research questions cannot be clearly defined at the outset and may be unforeseeable over the long term. In this article, we consider how expanded data sharing poses new challenges, illustrated by genomics and the transition to new models of consent. We draw on the experiences of participants in an open data platform, the Personal Genome Project, to allow study participants to contribute their voices to inform ethical consent

  7. Genomic Analysis as the First Step toward Personalized Treatment in Renal Cell Carcinoma

    PubMed Central

    Bielecka, Zofia Felicja; Czarnecka, Anna Małgorzata; Szczylik, Cezary

    2014-01-01

    Drug resistance mechanisms in renal cell carcinoma (RCC) still remain elusive. Although most patients initially respond to targeted therapy, acquired resistance can still develop eventually. Most of the patients suffer from intrinsic (genetic) resistance as well, suggesting that there is substantial need to broaden our knowledge in the field of RCC genetics. As molecular abnormalities occur for various reasons, ranging from single nucleotide polymorphisms to large chromosomal defects, conducting whole-genome association studies using high-throughput techniques seems inevitable. In principle, data obtained via genome-wide research should be continued and performed on a large scale for the purposes of drug development and identification of biological pathways underlying cancerogenesis. Genetic alterations are mostly unique for each histological RCC subtype. According to recently published data, RCC is a highly heterogeneous tumor. In this paper, the authors discuss the following: (1) current state-of-the-art knowledge on the potential biomarkers of RCC subtypes; (2) significant obstacles encountered in the translational research on RCC; and (3) recent molecular findings that may have a crucial impact on future therapeutic approaches. PMID:25120953

  8. Challenges and disparities in the application of personalized genomic medicine to populations with African ancestry

    PubMed Central

    Kessler, Michael D.; Yerges-Armstrong, Laura; Taub, Margaret A.; Shetty, Amol C.; Maloney, Kristin; Jeng, Linda Jo Bone; Ruczinski, Ingo; Levin, Albert M.; Williams, L. Keoki; Beaty, Terri H.; Mathias, Rasika A.; Barnes, Kathleen C.; Boorgula, Meher Preethi; Campbell, Monica; Chavan, Sameer; Ford, Jean G.; Foster, Cassandra; Gao, Li; Hansel, Nadia N.; Horowitz, Edward; Huang, Lili; Ortiz, Romina; Potee, Joseph; Rafaels, Nicholas; Scott, Alan F.; Vergara, Candelaria; Gao, Jingjing; Hu, Yijuan; Johnston, Henry Richard; Qin, Zhaohui S.; Padhukasahasram, Badri; Dunston, Georgia M.; Faruque, Mezbah U.; Kenny, Eimear E.; Gietzen, Kimberly; Hansen, Mark; Genuario, Rob; Bullis, Dave; Lawley, Cindy; Deshpande, Aniket; Grus, Wendy E.; Locke, Devin P.; Foreman, Marilyn G.; Avila, Pedro C.; Grammer, Leslie; Kim, Kwang-Youn A.; Kumar, Rajesh; Schleimer, Robert; Bustamante, Carlos; De La Vega, Francisco M.; Gignoux, Chris R.; Shringarpure, Suyash S.; Musharoff, Shaila; Wojcik, Genevieve; Burchard, Esteban G.; Eng, Celeste; Gourraud, Pierre-Antoine; Hernandez, Ryan D.; Lizee, Antoine; Pino-Yanes, Maria; Torgerson, Dara G.; Szpiech, Zachary A.; Torres, Raul; Nicolae, Dan L.; Ober, Carole; Olopade, Christopher O.; Olopade, Olufunmilayo; Oluwole, Oluwafemi; Arinola, Ganiyu; Song, Wei; Abecasis, Goncalo; Correa, Adolfo; Musani, Solomon; Wilson, James G.; Lange, Leslie A.; Akey, Joshua; Bamshad, Michael; Chong, Jessica; Fu, Wenqing; Nickerson, Deborah; Reiner, Alexander; Hartert, Tina; Ware, Lorraine B.; Bleecker, Eugene; Meyers, Deborah; Ortega, Victor E.; Pissamai, Maul R. N.; Trevor, Maul R. N.; Watson, Harold; Araujo, Maria Ilma; Oliveira, Ricardo Riccio; Caraballo, Luis; Marrugo, Javier; Martinez, Beatriz; Meza, Catherine; Ayestas, Gerardo; Herrera-Paz, Edwin Francisco; Landaverde-Torres, Pamela; Erazo, Said Omar Leiva; Martinez, Rosella; Mayorga, Alvaro; Mayorga, Luis F.; Mejia-Mejia, Delmy-Aracely; Ramos, Hector; Saenz, Allan; Varela, Gloria; Vasquez, Olga Marina; Ferguson, Trevor; Knight-Madden, Jennifer; Samms-Vaughan, Maureen; Wilks, Rainford J.; Adegnika, Akim; Ateba-Ngoa, Ulysse; Yazdanbakhsh, Maria; O'Connor, Timothy D.

    2016-01-01

    To characterize the extent and impact of ancestry-related biases in precision genomic medicine, we use 642 whole-genome sequences from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) project to evaluate typical filters and databases. We find significant correlations between estimated African ancestry proportions and the number of variants per individual in all variant classification sets but one. The source of these correlations is highlighted in more detail by looking at the interaction between filtering criteria and the ClinVar and Human Gene Mutation databases. ClinVar's correlation, representing African ancestry-related bias, has changed over time amidst monthly updates, with the most extreme switch happening between March and April of 2014 (r=0.733 to r=−0.683). We identify 68 SNPs as the major drivers of this change in correlation. As long as ancestry-related bias when using these clinical databases is minimally recognized, the genetics community will face challenges with implementation, interpretation and cost-effectiveness when treating minority populations. PMID:27725664

  9. Type 2 diabetes, genomics, and nursing: necessary next steps to advance the science into improved, personalized care.

    PubMed

    Underwood, Patricia C

    2011-01-01

    Type 2 diabetes mellitus (T2DM) is an inherited, chronic disorder with long-term complications, including cardiovascular disease, the leading cause of mortality in the United States. The prevalence of T2DM and its complications is on the rise in the United States, highlighting the need for improved individualized prevention and treatment strategies. Exciting advancements in the field of genomics have led to the recent discovery of numerous genetic markers for T2DM, completing a promising first step toward improved, individualized prevention and treatment strategies for T2DM. These genomic markers, identified using genome-wide association studies (GWAS), candidate gene, and rare variant methodology, identify new physiologic pathways underlying the development of T2DM. Much more work is needed to successfully translate the identification of genetic markers for T2DM into improved, individualized prevention and treatment strategies. As front line providers and leaders of prevention and treatment strategies for chronic disease, nurses, nurse practitioners, and nurse scientists must contribute to this translational effort. Thus, it is important for nurses at all levels to (a) be aware of the current science of genetics and T2DM and (b) participate in the translation of this genetic information into improved, personalized patient care. The aim of this review is to (a) provide an overview of the current state of the science of genetic markers and T2DM and (b) highlight essential next steps to successfully translate the identification of genetic markers for T2DM into improved prevention and treatment strategies, focusing particularly on the role of nursing in this process.

  10. BioWarehouse: a bioinformatics database warehouse toolkit

    PubMed Central

    Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

    2006-01-01

    Background: This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results: We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion: BioWarehouse embodies significant progress on the database integration problem for

  11. Data Mining for Grammatical Inference with Bioinformatics Criteria

    NASA Astrophysics Data System (ADS)

    López, Vivian F.; Aguilar, Ramiro; Alonso, Luis; Moreno, María N.; Corchado, Juan M.

    In this paper we describe both theoretical and practical results of a novel data mining process that combines hybrid techniques of association analysis and classical sequentiation algorithms of genomics to generate grammatical structures of a specific language. We used an application of a compiler generator system that allows the development of a practical application within the area of grammarware, where the concepts of language analysis are applied to other disciplines, such as bioinformatics. The tool allows the complexity of the obtained grammar to be measured automatically from textual data. A technique of incremental discovery of sequential patterns is presented to obtain simplified production rules, which are compacted with bioinformatics criteria to make up a grammar.

  12. A survey on evolutionary algorithm based hybrid intelligence in bioinformatics.

    PubMed

    Li, Shan; Kang, Liying; Zhao, Xing-Ming

    2014-01-01

    With the rapid advance in genomics, proteomics, metabolomics, and other types of omics technologies during the past decades, a tremendous amount of data related to molecular biology has been produced. It is becoming a big challenge for the bioinformatists to analyze and interpret these data with conventional intelligent techniques, for example, support vector machines. Recently, the hybrid intelligent methods, which integrate several standard intelligent approaches, are becoming more and more popular due to their robustness and efficiency. Specifically, the hybrid intelligent approaches based on evolutionary algorithms (EAs) are widely used in various fields due to the efficiency and robustness of EAs. In this review, we give an introduction about the applications of hybrid intelligent methods, in particular those based on evolutionary algorithm, in bioinformatics. In particular, we focus on their applications to three common problems that arise in bioinformatics, that is, feature selection, parameter estimation, and reconstruction of biological networks.

  13. A Survey on Evolutionary Algorithm Based Hybrid Intelligence in Bioinformatics

    PubMed Central

    Li, Shan; Zhao, Xing-Ming

    2014-01-01

    With the rapid advance in genomics, proteomics, metabolomics, and other types of omics technologies during the past decades, a tremendous amount of data related to molecular biology has been produced. It is becoming a big challenge for the bioinformatists to analyze and interpret these data with conventional intelligent techniques, for example, support vector machines. Recently, the hybrid intelligent methods, which integrate several standard intelligent approaches, are becoming more and more popular due to their robustness and efficiency. Specifically, the hybrid intelligent approaches based on evolutionary algorithms (EAs) are widely used in various fields due to the efficiency and robustness of EAs. In this review, we give an introduction about the applications of hybrid intelligent methods, in particular those based on evolutionary algorithm, in bioinformatics. In particular, we focus on their applications to three common problems that arise in bioinformatics, that is, feature selection, parameter estimation, and reconstruction of biological networks. PMID:24729969

  14. The Roots of Bioinformatics in Theoretical Biology

    PubMed Central

    Hogeweg, Paulien

    2011-01-01

    From the late 1980s onward, the term “bioinformatics” mostly has been used to refer to computational methods for comparative analysis of genome data. However, the term was originally more widely defined as the study of informatic processes in biotic systems. In this essay, I will trace this early history (from a personal point of view) and I will argue that the original meaning of the term is re-emerging. PMID:21483479

  15. Bioinformatics in Africa: The Rise of Ghana?

    PubMed

    Karikari, Thomas K

    2015-09-01

    Until recently, bioinformatics, an important discipline in the biological sciences, was largely limited to countries with advanced scientific resources. Nonetheless, several developing countries have lately been making progress in bioinformatics training and applications. In Africa, leading countries in the discipline include South Africa, Nigeria, and Kenya. However, one country that is less known when it comes to bioinformatics is Ghana. Here, I provide a first description of the development of bioinformatics activities in Ghana and how these activities contribute to the overall development of the discipline in Africa. Over the past decade, scientists in Ghana have been involved in publications incorporating bioinformatics analyses, aimed at addressing research questions in biomedical science and agriculture. Scarce research funding and inadequate training opportunities are some of the challenges that need to be addressed for Ghanaian scientists to continue developing their expertise in bioinformatics.

  16. Bioinformatics in Africa: The Rise of Ghana?

    PubMed Central

    Karikari, Thomas K.

    2015-01-01

    Until recently, bioinformatics, an important discipline in the biological sciences, was largely limited to countries with advanced scientific resources. Nonetheless, several developing countries have lately been making progress in bioinformatics training and applications. In Africa, leading countries in the discipline include South Africa, Nigeria, and Kenya. However, one country that is less known when it comes to bioinformatics is Ghana. Here, I provide a first description of the development of bioinformatics activities in Ghana and how these activities contribute to the overall development of the discipline in Africa. Over the past decade, scientists in Ghana have been involved in publications incorporating bioinformatics analyses, aimed at addressing research questions in biomedical science and agriculture. Scarce research funding and inadequate training opportunities are some of the challenges that need to be addressed for Ghanaian scientists to continue developing their expertise in bioinformatics. PMID:26378921

  17. Where are we in genomics?

    PubMed

    Hocquette, J F

    2005-06-01

    Genomic studies provide scientists with methods to quickly analyse genes and their products en masse. The first high-throughput techniques to be developed were sequencing methods. A great number of genomes from different organisms have thus been sequenced. Genomics is now shifting to the study of gene expression and function. In the past 5-10 years genomics, proteomics and high-throughput microarray technologies have fundamentally changed our ability to study the molecular basis of cells and tissues in health and disease, giving a new comprehensive view. For example, in cancer research we have seen new diagnostic opportunities for tumour classification and prognostication. Exciting new developments are metabolomics and lab-on-a-chip techniques (which combine miniaturization and automation) for metabolic studies. However, to interpret the large amount of data, extensive computational development is required. In the coming years, we will see the study of biological networks dominating the scene in physiology. The great accumulation of genomics information will be used in computer programs to simulate biological processes. Originally developed for genome analysis, bioinformatics now encompasses a wide range of fields in biology from gene studies to integrated biology (i.e. combination of different data sets from genes to metabolites). This is systems biology, which aims to study biological organisms as a whole. In medicine, scientific results and applied biotechnologies arising from genomics will be used for effective prediction of diseases and risk associated with drugs. Preventive medicine and medical therapy will be personalized. Widespread applications of genomics for personalized medicine will require associations of gene expression patterns with diagnoses, treatment and clinical data. This will help in the discovery and development of drugs. In agriculture and animal science, the outcomes of genomics will include improvement in food safety, in crop yield, in

  19. [Bioinformatics: a key role in oncology].

    PubMed

    Olivier, Timothée; Chappuis, Pierre; Tsantoulis, Petros

    2016-05-18

    Bioinformatics is essential in clinical oncology and research. Combining biology, computer science and mathematics, bioinformatics aims to derive useful information from clinical and biological data, often poorly structured, at a large scale. Bioinformatics approaches have reclassified certain cancers based on their molecular and biological presentation, improving treatment selection. Many molecular signatures have been developed and, after validation, some are now usable in clinical practice. Other applications could facilitate daily practice, reduce the risk of error and increase the precision of medical decision-making. Bioinformatics must evolve in accordance with ethical considerations and requires multidisciplinary collaboration. Its application depends on a sound technical foundation that meets strict quality requirements. PMID:27424424

  20. Race, risk, and recreation in personal genomics: the limits of play.

    PubMed

    Lee, Sandra Soo-Jin

    2013-12-01

    Despite the mantra that genetics has moved beyond race, the burgeoning industry of genetic ancestry reveals how genetics has offered new technology through which individuals can link to intersections in time and space in complex ways that recapitulate understandings of racial order, origins, and group membership. This article focuses on the trope of "recreation" asserted in the marketing of ancestry genetic tests and examines the suggestion of self-discovery through the recovery of lost kin. Themes of recreation and re-creation paradoxically suggest both passivity of self-revelation and the power to re-act and re-create one's self in light of a different, more enlightened future. Direct-to-consumer personal genetics testing companies play guardian to this consumer play, providing tailored genetic scripts and highlighting how consumers might use their information. This article critically examines the play with concepts of ancestry, ethnicity, and genetic variation and their implications for public understanding of the relationship between race and genetics.

  1. Bioinformatics and Microarray Data Analysis on the Cloud.

    PubMed

    Calabrese, Barbara; Cannataro, Mario

    2016-01-01

    High-throughput platforms such as microarrays, mass spectrometry, and next-generation sequencing are producing an increasing volume of omics data that requires large data storage and computing power. Cloud computing offers massively scalable computing and storage, data sharing, and on-demand access to resources and applications anytime and anywhere; thus, it may represent the key technology for addressing those issues. Indeed, in recent years it has been adopted for the deployment of different bioinformatics solutions and services in both academia and industry. Despite this, cloud computing raises several issues regarding the security and privacy of data, which are particularly important when analyzing patients' data, as in personalized medicine. This chapter reviews the main academic and industrial cloud-based bioinformatics solutions, with a special focus on microarray data analysis, and underlines the main issues and problems related to the use of such platforms for the storage and analysis of patients' data. PMID:25863787

  3. Bioinformatics in the secondary science classroom: A study of state content standards and students' perceptions of, and performance in, bioinformatics lessons

    NASA Astrophysics Data System (ADS)

    Wefer, Stephen H.

    The proliferation of bioinformatics in modern biology marks a new revolution in science, which promises to influence science education at all levels. This thesis examined state standards for content that articulated bioinformatics, and explored secondary students' affective and cognitive perceptions of, and performance in, a bioinformatics mini-unit. The results are presented as three studies. The first study analyzed secondary science standards of 49 U.S. states (Iowa has no science framework) and the District of Columbia for content related to bioinformatics at the introductory high school biology level. The bioinformatics content of each state's biology standards was categorized into nine areas and the prevalence of each area documented. The nine areas were: The Human Genome Project, Forensics, Evolution, Classification, Nucleotide Variations, Medicine, Computer Use, Agriculture/Food Technology, and Science Technology and Society/Socioscientific Issues (STS/SSI). Findings indicated a generally low representation of bioinformatics-related content, which varied substantially across the different areas. Recommendations are made for reworking existing standards to incorporate bioinformatics and to facilitate the goal of promoting science literacy in this emerging new field among secondary school students. The second study examined thirty-two students' affective responses to, and content mastery of, a two-week bioinformatics mini-unit. The findings indicate that the students generally were positive relative to their interest level, the usefulness of the lessons, the difficulty level of the lessons, likeliness to engage in additional bioinformatics, and were overall successful on the assessments. A discussion of the results and significance is followed by suggestions for future research and implementation for transferability. The third study presents a case study of individual differences among ten secondary school students, whose cognitive and affective percepts were

  4. Genomic Measures to Predict Adaptation to Novel Sensorimotor Environments and Improve Personalization of Countermeasure Design

    NASA Technical Reports Server (NTRS)

    Kreutzberg, G. A.; Zanello, S.; Seidler, R. D.; Peters, B.; De Dios, Y. E.; Gadd, N. E.; Bloomberg, J. J.; Mulavara, A. P.

    2016-01-01

    Introduction. Astronauts experience sensorimotor disturbances during their initial exposure to microgravity and during the re-adaptation phase following a return to an Earth-gravitational environment. These alterations may affect crewmembers' ability to perform mission-critical functional tasks. Interestingly, astronauts have shown significant inter-subject variation in adaptive capability during gravitational transitions. The ability to predict the manner and degree to which individual astronauts would be affected would improve the efficacy of personalized countermeasure training programs designed to enhance sensorimotor adaptability. The success of such an approach depends on the development of predictive measures of sensorimotor adaptation, which would ascertain each crewmember's adaptive capacity. The goal of this study is to determine whether specific genetic polymorphisms have significant influence on sensorimotor adaptability, which can help inform the design of personalized training countermeasures. Methods. Subjects (n=15) were tested on their ability to negotiate a complex obstacle course for ten test trials while wearing up-down vision-displacing goggles. This presented a visuomotor challenge while doing a full body task. The first test trial time and the recovery rate over the ten trials were used as adaptability performance metrics. Four single nucleotide polymorphisms (SNPs) were selected for their role in neural pathways underlying sensorimotor adaptation and were identified in subjects' DNA extracted from saliva samples: catechol-O-methyl transferase (COMT, rs4680), dopamine receptor D2 (DRD2, rs1076560), brain-derived neurotrophic factor genes (BDNF, rs6265), and the DraI polymorphism of the alpha-2 adrenergic receptor. The relationship between the SNPs and test performance was assessed by assigning subjects a rank score based on their adaptability performance metrics and comparing gene expression between the top half and bottom half performers

  5. Biophysics and bioinformatics of transcription regulation in bacteria and bacteriophages

    NASA Astrophysics Data System (ADS)

    Djordjevic, Marko

    2005-11-01

    Due to the rapid accumulation of biological data, bioinformatics has become a very important branch of biological research. In this thesis, we develop novel bioinformatic approaches and aid the design of biological experiments by using ideas and methods from statistical physics. Identification of transcription factor binding sites within the regulatory segments of genomic DNA is an important step toward understanding the regulatory circuits that control expression of genes. We propose a novel, biophysics-based algorithm for the supervised detection of transcription factor (TF) binding sites. The method classifies potential binding sites by explicitly estimating the sequence-specific binding energy and the chemical potential of a given TF. In contrast with the widely used information-theory-based weight matrix method, our approach correctly incorporates saturation in the transcription factor/DNA binding probability. This results in a significant reduction in the number of expected false positives, and in the explicit appearance (and determination) of a binding threshold. The new method was used to identify likely genomic binding sites for the Escherichia coli TFs, and to examine the relationship between TF binding specificity and degree of pleiotropy (number of regulatory targets). We next address how parameters of protein-DNA interactions can be obtained from data on protein binding to random oligos under controlled conditions (SELEX experiment data). We show that 'robust' generation of an appropriate data set is achieved by a suitable modification of the standard SELEX procedure, and propose a novel bioinformatic algorithm for analysis of such data. Finally, we use quantitative data analysis, bioinformatic methods and kinetic modeling to analyze gene expression strategies of bacterial viruses. We study bacteriophage Xp10 that infects the rice pathogen Xanthomonas oryzae. Xp10 is an unusual bacteriophage, which has morphology and genome organization that most closely
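
    The saturation effect mentioned in this abstract can be illustrated with a minimal sketch. It assumes a two-state (bound/unbound) site whose occupancy follows p = 1 / (1 + exp(E - mu)), with the binding energy E and chemical potential mu expressed in units of kT, and an additive per-position energy matrix. The matrix values, the function names, and the value of mu are illustrative assumptions, not quantities taken from the thesis.

```python
import math

# Hypothetical additive energy matrix (units of kT) for a 3-bp site.
# The consensus base at each position costs 0; other bases cost more.
TOY_ENERGY_MATRIX = [
    {"A": 0.0, "C": 2.0, "G": 2.5, "T": 1.5},
    {"A": 3.0, "C": 0.0, "G": 2.0, "T": 2.5},
    {"A": 1.0, "C": 2.0, "G": 0.0, "T": 3.0},
]


def binding_energy(site, energy_matrix):
    """Sum the per-position energy contributions of a candidate site."""
    return sum(energy_matrix[pos][base] for pos, base in enumerate(site))


def binding_probability(energy, mu):
    """Two-state occupancy; saturates at 1 when energy is far below mu."""
    return 1.0 / (1.0 + math.exp(energy - mu))


if __name__ == "__main__":
    mu = 1.0  # assumed chemical potential, in units of kT
    for site in ("ACG", "ACA", "TTT"):
        e = binding_energy(site, TOY_ENERGY_MATRIX)
        print(site, e, round(binding_probability(e, mu), 3))
```

    In this form the chemical potential acts as a threshold: sites with E well below mu are almost always occupied, sites with E well above mu are almost never occupied, and E = mu gives an occupancy of exactly one half.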

  7. An optimized and low-cost FPGA-based DNA sequence alignment--a step towards personal genomics.

    PubMed

    Shah, Hurmat Ali; Hasan, Laiq; Ahmad, Nasir

    2013-01-01

    DNA sequence alignment is a cardinal process in computational biology, but it is computationally very expensive on traditional platforms such as CPUs. Of the many off-the-shelf platforms explored for speeding up the computation, the FPGA stands out as the best candidate because of its performance per dollar spent and performance per watt. These two advantages make the FPGA the most appropriate choice for realizing the aim of personal genomics. The previous implementation of DNA sequence alignment did not take into consideration the price of the device on which the optimization was performed. This paper presents optimizations over a previous FPGA implementation that improve both the overall speed-up achieved and the price incurred by the platform that was optimized. The optimizations are: (1) the array of processing elements runs on changes in the input value rather than on the clock, eliminating the need for tight clock synchronization; (2) the implementation is not restricted by the size of the sequences to be aligned; (3) the waiting time required for the sequences to load onto the FPGA is reduced to the minimum possible; and (4) an efficient method is devised to store the output matrix, which makes it possible to save the diagonal elements for use in the next pass, in parallel with the computation of the output matrix. Implemented on a Spartan3 FPGA, this implementation achieved a 20-fold performance improvement in terms of CUPS over a GPP implementation. PMID:24110283
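
    The role of the diagonal elements and of the CUPS metric (cell updates per second) can be sketched in software. The example below assumes a Smith-Waterman style local alignment with a linear gap penalty, which the abstract does not name explicitly. It fills the score matrix one anti-diagonal at a time, because every cell on an anti-diagonal depends only on the two preceding anti-diagonals, and that independence is what an array of processing elements can exploit in parallel. The scoring values and function names are illustrative assumptions, and this is not the authors' hardware design.

```python
def sw_score_by_antidiagonals(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of cell updates).

    Cells with i + j == d are mutually independent, so each inner
    loop below could in principle run as one parallel step.
    """
    m, n = len(a), len(b)
    h = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for d in range(2, m + n + 1):  # anti-diagonal index d = i + j
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(
                0,
                h[i - 1][j - 1] + s,  # from anti-diagonal d - 2
                h[i - 1][j] + gap,    # from anti-diagonal d - 1
                h[i][j - 1] + gap,    # from anti-diagonal d - 1
            )
            best = max(best, h[i][j])
    return best, m * n


def cups(cell_updates, seconds):
    """Cell updates per second, the throughput metric quoted above."""
    return cell_updates / seconds


if __name__ == "__main__":
    score, cells = sw_score_by_antidiagonals("ACGT", "ACGT")
    print(score, cells, cups(cells, 2.0))
```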

  8. Systems genetics, bioinformatics and eQTL mapping.

    PubMed

    Li, Hong; Deng, Hongwen

    2010-10-01

    Jansen and Nap (Trends Genet 17(7):388-391, 2001) and Jansen (Nat Rev Genet 4:145-151, 2003) first proposed the concept of genetical genomics, or genome-wide genetic analysis of gene expression data, which is also called transcriptome mapping. In this approach, microarrays are used for measuring gene expression levels across genetic mapping populations. These gene expression patterns have been used for genome-wide association analysis, an analysis referred to as expression QTL (eQTL) mapping. Recent progress in genomics and experimental biology has brought exponential growth of the biological information available for computational analysis in public genomics databases. Bioinformatics is essential to genome-wide analysis of gene expression data and used as an effective tool for eQTL mapping. The use of Plabsoft database, EcoTILLING, GNARE and FastMap allowed for dramatic reduction of time in genome analysis. Some web-based tools (e.g., Lirnet, eQTL Viewer) provide efficient and intuitive ways for biologists to explore transcriptional regulation patterns, and to generate hypotheses on the genetic basis of transcriptional regulations. Expression quantitative trait loci (eQTL) mapping concerns finding genomic variation to elucidate variation of expression traits. This problem poses significant challenges due to high dimensionality of both the gene expression and the genomic marker data. The core challenges in understanding and explaining eQTL associations are the fine mapping and the lack of mechanistic explanation. But with the development of genetical genomics and computer technology, many new approaches for eQTL mapping will emerge. The statistical methods used for the analysis of expression QTL will become mature in the future.
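
    As a rough illustration of what eQTL mapping computes, the sketch below regresses the expression of one gene on the allele count (0, 1, or 2) at each marker and ranks markers by the fraction of expression variance explained. It is a deliberately simplified assumption: real analyses add covariates, significance testing, and multiple-testing correction, and the function name, marker names, and data here are hypothetical.

```python
from statistics import mean


def eqtl_scan(genotypes, expression):
    """Rank markers by r^2 from a simple linear regression.

    genotypes:  dict mapping marker id -> list of allele counts (0/1/2),
                one per individual
    expression: list of expression values for one gene, same order
    Returns a list of (marker, (slope, r_squared)), best marker first.
    """
    y_bar = mean(expression)
    ss_y = sum((y - y_bar) ** 2 for y in expression)
    results = {}
    for marker, g in genotypes.items():
        g_bar = mean(g)
        ss_g = sum((x - g_bar) ** 2 for x in g)
        if ss_g == 0 or ss_y == 0:
            continue  # no variation, so no association can be estimated
        cov = sum((x - g_bar) * (y - y_bar) for x, y in zip(g, expression))
        results[marker] = (cov / ss_g, cov * cov / (ss_g * ss_y))
    return sorted(results.items(), key=lambda kv: kv[1][1], reverse=True)


if __name__ == "__main__":
    toy_genotypes = {
        "marker_a": [0, 1, 2, 0, 1, 2],
        "marker_b": [0, 0, 1, 1, 2, 2],
        "marker_c": [1, 1, 1, 1, 1, 1],  # monomorphic, skipped
    }
    toy_expression = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
    for marker, (slope, r2) in eqtl_scan(toy_genotypes, toy_expression):
        print(marker, round(slope, 3), round(r2, 3))
```

    The high dimensionality noted in the abstract shows up directly here: the scan is repeated for every gene and every marker, so the number of regressions grows as the product of the two.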

  9. A Mathematical Optimization Problem in Bioinformatics

    ERIC Educational Resources Information Center

    Heyer, Laurie J.

    2008-01-01

    This article describes the sequence alignment problem in bioinformatics. Through examples, we formulate sequence alignment as an optimization problem and show how to compute the optimal alignment with dynamic programming. The examples and sample exercises have been used by the author in a specialized course in bioinformatics, but could be adapted…
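
    The dynamic programming formulation described in this record can be sketched as follows. This is a minimal global alignment in the Needleman-Wunsch style with a linear gap penalty. The scoring values and the function name are illustrative assumptions and are not taken from the article.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (optimal score, aligned a, aligned b) for a global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Each cell is the best of: substitute, gap in b, gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

    The optimization view is visible in the recurrence: each cell stores the maximum score over all alignments of the two prefixes, so the bottom-right cell holds the optimum for the full sequences.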

  10. Using "Arabidopsis" Genetic Sequences to Teach Bioinformatics

    ERIC Educational Resources Information Center

    Zhang, Xiaorong

    2009-01-01

    This article describes a new approach to teaching bioinformatics using "Arabidopsis" genetic sequences. Several open-ended and inquiry-based laboratory exercises have been designed to help students grasp key concepts and gain practical skills in bioinformatics, using "Arabidopsis" leucine-rich repeat receptor-like kinase (LRR RLK) genetic…

  11. Incorporating a New Bioinformatics Component into Genetics at a Historically Black College: Outcomes and Lessons

    ERIC Educational Resources Information Center

    Holtzclaw, J. David; Eisen, Arri; Whitney, Erika M.; Penumetcha, Meera; Hoey, J. Joseph; Kimbro, K. Sean

    2006-01-01

    Many students at minority-serving institutions are underexposed to Internet resources such as the human genome project, PubMed, NCBI databases, and other Web-based technologies because of a lack of financial resources. To change this, we designed and implemented a new bioinformatics component to supplement the undergraduate Genetics course at…

  12. An "in silico" Bioinformatics Laboratory Manual for Bioscience Departments: "Prediction of Glycosylation Sites in Phosphoethanolamine Transferases"

    ERIC Educational Resources Information Center

    Alyuruk, Hakan; Cavas, Levent

    2014-01-01

    Genomics and proteomics projects have produced a huge amount of raw biological data including DNA and protein sequences. Although these data have been stored in data banks, their evaluation is strictly dependent on bioinformatics tools. These tools have been developed by multidisciplinary experts for fast and robust analysis of biological data.…

  13. Evolving Strategies for the Incorporation of Bioinformatics within the Undergraduate Cell Biology Curriculum

    ERIC Educational Resources Information Center

    Honts, Jerry E.

    2003-01-01

    Recent advances in genomics and structural biology have resulted in an unprecedented increase in biological data available from Internet-accessible databases. In order to help students effectively use this vast repository of information, undergraduate biology students at Drake University were introduced to bioinformatics software and databases in…

  14. Strategies for Using Peer-Assisted Learning Effectively in an Undergraduate Bioinformatics Course

    ERIC Educational Resources Information Center

    Shapiro, Casey; Ayon, Carlos; Moberg-Parker, Jordan; Levis-Fitzgerald, Marc; Sanders, Erin R.

    2013-01-01

    This study used a mixed methods approach to evaluate hybrid peer-assisted learning approaches incorporated into a bioinformatics tutorial for a genome annotation research project. Quantitative and qualitative data were collected from undergraduates who enrolled in a research-based laboratory course during two different academic terms at UCLA.…

  15. Highlights of the 2nd Bioinformatics Student Symposium by ISCB RSG-UK

    PubMed Central

    White, Benjamen; Fatima, Vayani; Fatima, Nazeefa; Das, Sayoni; Rahman, Farzana; Hassan, Mehedi

    2016-01-01

    Following the success of the 1st Student Symposium by ISCB RSG-UK, a 2nd Student Symposium took place on 7th October 2015 at The Genome Analysis Centre, Norwich, UK. This short report summarizes the main highlights from the 2nd Bioinformatics Student Symposium. PMID:27239284

  16. The 2016 Bioinformatics Open Source Conference (BOSC)

    PubMed Central

    Harris, Nomi L.; Cock, Peter J.A.; Chapman, Brad; Fields, Christopher J.; Hokamp, Karsten; Lapp, Hilmar; Muñoz-Torres, Monica; Wiencko, Heather

    2016-01-01

    Message from the ISCB: The Bioinformatics Open Source Conference (BOSC) is a yearly meeting organized by the Open Bioinformatics Foundation (OBF), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development and Open Science within the biological research community. BOSC has been run since 2000 as a two-day Special Interest Group (SIG) before the annual ISMB conference. The 17th annual BOSC (http://www.open-bio.org/wiki/BOSC_2016) took place in Orlando, Florida in July 2016. As in previous years, the conference was preceded by a two-day collaborative coding event open to the bioinformatics community. The conference brought together nearly 100 bioinformatics researchers, developers and users of open source software to interact and share ideas about standards, bioinformatics software development, and open and reproducible science. PMID:27781083

  17. Incorporating a collaborative web-based virtual laboratory in an undergraduate bioinformatics course.

    PubMed

    Weisman, David

    2010-01-01

    Face-to-face bioinformatics courses commonly include a weekly, in-person computer lab to facilitate active learning, reinforce conceptual material, and teach practical skills. Similarly, fully-online bioinformatics courses employ hands-on exercises to achieve these outcomes, although students typically perform this work offsite. Combining a face-to-face lecture course with a web-based virtual laboratory presents new opportunities for collaborative learning of the conceptual material, and for fostering peer support of technical bioinformatics questions. To explore this combination, an in-person lecture-only undergraduate bioinformatics course was augmented with a remote web-based laboratory, and tested with a large class. This study hypothesized that the collaborative virtual lab would foster active learning and peer support, and tested this hypothesis by conducting a student survey near the end of the semester. Respondents broadly reported strong benefits from the online laboratory, and strong benefits from peer-provided technical support. In comparison with traditional in-person teaching labs, students preferred the virtual lab by a factor of two. Key aspects of the course architecture and design are described to encourage further experimentation in teaching collaborative online bioinformatics laboratories. PMID:21567782

  18. Development of Bioinformatics Pipeline for Analyzing Clinical Pediatric NGS Data

    PubMed Central

    Crowgey, Erin L.; Kolb, Anders; Wu, Cathy H.

    2015-01-01

    Using an Illumina exome sequencing dataset generated from pediatric Acute Myeloid Leukemia patients (AML; type FLT3/ITD+) a comprehensive bioinformatics pipeline was developed to aid in a better clinical understanding of the genetic data associated with the clinical phenotype. The pipeline starts with raw next generation sequencing reads and using both publicly available resources and custom scripts, analyzes the genomic data for variants associated with pediatric AML. By incorporating functional information such as Gene Ontology annotation and protein-protein interactions, the methodology prioritizes genomic variants and returns disease specific results and knowledge maps. Furthermore, it compares the somatic mutations at diagnosis with the somatic mutations at relapse and outputs variants and functional annotations that are specific for the relapse state. PMID:26306272
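
    The final comparison step in this pipeline, reporting variants that are specific to the relapse state, amounts to a set difference once each variant is reduced to a comparable key. The sketch below assumes variants are represented as (chromosome, position, reference allele, alternate allele) tuples. The function name and the toy coordinates are hypothetical and do not come from the study.

```python
def relapse_specific_variants(diagnosis, relapse):
    """Return variants called at relapse but absent at diagnosis, sorted."""
    return sorted(set(relapse) - set(diagnosis))


if __name__ == "__main__":
    # Toy calls with made-up coordinates, for illustration only.
    diagnosis_calls = [("chr13", 100, "A", "T"), ("chr2", 200, "C", "T")]
    relapse_calls = [("chr2", 200, "C", "T"), ("chr17", 300, "G", "A")]
    print(relapse_specific_variants(diagnosis_calls, relapse_calls))
```

    Keying on all four fields matters: two different substitutions at the same position are treated as distinct variants.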

  19. When cloud computing meets bioinformatics: a review.

    PubMed

    Zhou, Shuigeng; Liao, Ruiqi; Guan, Jihong

    2013-10-01

    In the past decades, with the rapid development of high-throughput technologies, biology research has generated an unprecedented amount of data. In order to store and process such a great amount of data, cloud computing and MapReduce were applied to many fields of bioinformatics. In this paper, we first introduce the basic concepts of cloud computing and MapReduce, and their applications in bioinformatics. We then highlight some problems challenging the applications of cloud computing and MapReduce to bioinformatics. Finally, we give a brief guideline for using cloud computing in biology research.
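
    For readers unfamiliar with the MapReduce model that this review introduces, the sketch below imitates its three phases (map, shuffle, reduce) in a single Python process, using k-mer counting over sequencing reads as the example task. It assumes nothing about any particular framework: the function names are hypothetical, and a real deployment would distribute the map and reduce calls across many machines.

```python
from collections import defaultdict


def map_kmers(read, k):
    """Map phase: emit a (k-mer, 1) pair for every window in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: collapse the values for one key into a total."""
    return key, sum(values)


def count_kmers(reads, k):
    """Run the three phases in sequence and return a k-mer -> count dict."""
    mapped = (pair for read in reads for pair in map_kmers(read, k))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(count_kmers(["ACGT", "CGTA"], 2))
```

    Because each read is mapped independently and each key is reduced independently, both phases parallelize without shared state, which is the property that makes the model attractive for large sequencing data sets.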

  20. Bioinformatics for Diagnostics, Forensics, and Virulence Characterization and Detection

    SciTech Connect

    Gardner, S; Slezak, T

    2005-04-05

    We summarize four of our group's high-risk/high-payoff research projects funded by the Intelligence Technology Innovation Center (ITIC) in conjunction with our DHS-funded pathogen informatics activities. These are (1) quantitative assessment of genomic sequencing needs to predict high quality DNA and protein signatures for detection, and comparison of draft versus finished sequences for diagnostic signature prediction; (2) development of forensic software to identify SNP and PCR-RFLP variations from a large number of viral pathogen sequences and optimization of the selection of markers for maximum discrimination of those sequences; (3) prediction of signatures for the detection of virulence, antibiotic resistance, and toxin genes and genetic engineering markers in bacteria; (4) bioinformatic characterization of virulence factors to rapidly screen genomic data for potential genes with similar functions and to elucidate potential health threats in novel organisms. The results of (1) are being used by policy makers to set national sequencing priorities. Analyses from (2) are being used in collaborations with the CDC to genotype and characterize many variola strains, and reports from these collaborations have been made to the President. We also determined SNPs for serotype and strain discrimination of 126 foot-and-mouth disease virus (FMDV) genomes. For (3), currently >1000 probes have been predicted for the specific detection of >4000 virulence, antibiotic resistance, and genetic engineering vector sequences, and we expect to complete the bioinformatic design of a comprehensive "virulence detection chip" by August 2005. Results of (4) will be a system to rapidly predict potential virulence pathways and phenotypes in organisms based on their genomic sequences.

  1. Bioinformatic challenges in targeted proteomics.

    PubMed

    Reker, Daniel; Malmström, Lars

    2012-09-01

    Selected reaction monitoring mass spectrometry is an emerging targeted proteomics technology that allows for the investigation of complex protein samples with high sensitivity and efficiency. It requires extensive knowledge about the sample so that the many parameters needed to carry out the experiment can be set appropriately. Most studies today rely on parameter estimation from prior studies, public databases, or from measuring synthetic peptides. This is efficient and sound, but in the absence of prior data, de novo parameter estimation is necessary. Computational methods can be used to create an automated framework to address this problem. However, the number of available applications is still small. This review aims at giving an orientation on the various bioinformatical challenges. To this end, we state the problems in classical machine learning and data mining terms, give examples of implemented solutions and provide some room for alternatives. This will hopefully lead to an increased momentum for the development of algorithms and serve the needs of the community for computational methods. We note that the combination of such methods in an assisted workflow will ease both the usage of targeted proteomics in experimental studies and the further development of computational approaches. PMID:22866949

  2. Evolution in bioinformatic resources: 2009 update on the Bioinformatics Links Directory.

    PubMed

    Brazas, Michelle D; Yamada, Joseph Tadashi; Ouellette, B F Francis

    2009-07-01

    All of the life science research web servers published in this and previous issues of Nucleic Acids Research, together with other useful tools, databases and resources for bioinformatics and molecular biology research are freely accessible online through the Bioinformatics Links Directory, http://bioinformatics.ca/links_directory/. Entirely dependent on user feedback and community input, the Bioinformatics Links Directory exemplifies an open access research tool and resource. With 112 websites featured in the July 2009 Web Server Issue of Nucleic Acids Research, the 2009 update brings the total number of servers listed in the Bioinformatics Links Directory close to an impressive 1400 links. A complete list of all links listed in this Nucleic Acids Research 2009 Web Server Issue can be accessed online at http://bioinformatics.ca/links_directory/narweb2009/. The 2009 update of the Bioinformatics Links Directory, which includes the Web Server list and summaries, is also available online at the Nucleic Acids Research website, http://nar.oxfordjournals.org/.

  3. Bioinformatics and its applications in plant biology.

    PubMed

    Rhee, Seung Yon; Dickerson, Julie; Xu, Dong

    2006-01-01

    Bioinformatics plays an essential role in today's plant science. As the amount of data grows exponentially, there is a parallel growth in the demand for tools and methods in data management, visualization, integration, analysis, modeling, and prediction. At the same time, many researchers in biology are unfamiliar with available bioinformatics methods, tools, and databases, which could lead to missed opportunities or misinterpretation of the information. In this review, we describe some of the key concepts, methods, software packages, and databases used in bioinformatics, with an emphasis on those relevant to plant science. We also cover some fundamental issues related to biological sequence analyses, transcriptome analyses, computational proteomics, computational metabolomics, bio-ontologies, and biological databases. Finally, we explore a few emerging research topics in bioinformatics.

  4. Bioinformatics Visualisation Tools: An Unbalanced Picture.

    PubMed

    Broască, Laura; Ancuşa, Versavia; Ciocârlie, Horia

    2016-01-01

    Visualization tools represent a key element in triggering human creativity while being supported with the analysis power of the machine. This paper analyzes free network visualization tools for bioinformatics, frames them in domain specific requirements and compares them. PMID:27577488

  5. Bioinformatics in Italy: BITS2011, the Eighth Annual Meeting of the Italian Society of Bioinformatics

    PubMed Central

    2012-01-01

    The BITS2011 meeting, held in Pisa on June 20-22, 2011, brought together more than 120 Italian researchers working in the field of Bioinformatics, as well as students in Bioinformatics, Computational Biology, Biology, Computer Sciences, and Engineering, representing a landscape of Italian bioinformatics research. This preface provides a brief overview of the meeting and introduces the peer-reviewed manuscripts that were accepted for publication in this Supplement. PMID:22536954

  6. No-boundary thinking in bioinformatics research

    PubMed Central

    2013-01-01

    Currently there are definitions from many agencies and research societies defining “bioinformatics” as deriving knowledge from computational analysis of large volumes of biological and biomedical data. Should this be the bioinformatics research focus? We will discuss this issue in this review article. We would like to promote the idea of supporting human-infrastructure (HI) with no-boundary thinking (NT) in bioinformatics (HINT). PMID:24192339

  7. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    PubMed

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  8. Challenges of the next decade for the Asia Pacific region: 2010 International Conference in Bioinformatics (InCoB 2010)

    PubMed Central

    2010-01-01

    The 2010 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia’s oldest bioinformatics organisation formed in 1998, was organized as the 9th International Conference on Bioinformatics (InCoB), Sept. 26-28, 2010 in Tokyo, Japan. Initially, APBioNet created InCoB as a forum to foster bioinformatics in the Asia Pacific region. Given the growing importance of interdisciplinary research, InCoB2010 included topics targeting scientists in the fields of genomic medicine, immunology and chemoinformatics, supporting translational research. Peer-reviewed manuscripts that were accepted for publication in this supplement represent key areas of research interest that have emerged in our region. We also highlight some of the current challenges bioinformatics is facing in the Asia Pacific region and conclude our report with the announcement of APBioNet’s 100 BioDatabases (BioDB100) initiative. BioDB100 will comply with the database criteria set out earlier in our proposal for Minimum Information about a Bioinformatics Investigation (MIABi), setting the standards for biocuration and bioinformatics research, on which we will report at the next InCoB, Nov. 27 – Dec. 2, 2011 at Kuala Lumpur, Malaysia. PMID:21143792

  9. GIW and InCoB, two premier bioinformatics conferences in Asia with a combined 40 years of history

    PubMed Central

    2015-01-01

    Knowledge discovery in bioinformatics thrives on joint and inclusive efforts of stakeholders. Similarly, knowledge dissemination is expected to be more effective and scalable through joint efforts. Therefore, the International Conference on Bioinformatics (InCoB) and the International Conference on Genome Informatics (GIW) were organized as a joint conference for the first time in 13 years of coexistence. The Asia-Pacific Bioinformatics Network (APBioNet) and the Japanese Society for Bioinformatics (JSBi) collaborated to host GIW/InCoB2015 in Tokyo, September 9-11, 2015. The joint endeavour yielded 51 research articles published in seven journals, 78 poster and 89 oral presentations, showcasing bioinformatics research in the Asia-Pacific region. Encouraged by the results and reduced organizational overheads, APBioNet will collaborate with other bioinformatics societies in organizing co-located bioinformatics research and training meetings in the future. InCoB2016 will be hosted in Singapore, September 21-23, 2016. PMID:26679412

  10. Will solid-state drives accelerate your bioinformatics? In-depth profiling, performance analysis and beyond.

    PubMed

    Lee, Sungmin; Min, Hyeyoung; Yoon, Sungroh

    2016-07-01

    A wide variety of large-scale data have been produced in bioinformatics. In response, the need for efficient handling of biomedical big data has been partly met by parallel computing. However, the time demand of many bioinformatics programs still remains high for large-scale practical uses because of factors that hinder acceleration by parallelization. Recently, new generations of storage devices have emerged, such as NAND flash-based solid-state drives (SSDs), and with the renewed interest in near-data processing, they are increasingly becoming acceleration methods that can accompany parallel processing. In certain cases, a simple drop-in replacement of hard disk drives by SSDs results in dramatic speedup. Despite the various advantages and continuous cost reduction of SSDs, there has been little review of SSD-based profiling and performance exploration of important but time-consuming bioinformatics programs. For an informative review, we perform in-depth profiling and analysis of 23 key bioinformatics programs using multiple types of devices. Based on the insight we obtain from this research, we further discuss issues related to designing and optimizing bioinformatics algorithms and pipelines to fully exploit SSDs. The programs we profile cover traditional and emerging areas of importance, such as alignment, assembly, mapping, expression analysis, variant calling and metagenomics. We explain how acceleration by parallelization can be combined with SSDs for improved performance and also how using SSDs can expedite important bioinformatics pipelines, such as variant calling by the Genome Analysis Toolkit and transcriptome analysis using RNA sequencing. We hope that this review can provide useful directions and tips to accompany future bioinformatics algorithm design procedures that properly consider new generations of powerful storage devices. PMID:26330577

  11. Regulatory bioinformatics for food and drug safety.

    PubMed

    Healy, Marion J; Tong, Weida; Ostroff, Stephen; Eichler, Hans-Georg; Patak, Alex; Neuspiel, Margaret; Deluyker, Hubert; Slikker, William

    2016-10-01

    "Regulatory Bioinformatics" strives to develop and implement a standardized and transparent bioinformatic framework to support the implementation of existing and emerging technologies in regulatory decision-making. It has great potential to improve public health through the development and use of clinically important medical products and tools to manage the safety of the food supply. However, the application of regulatory bioinformatics also poses new challenges and requires new knowledge and skill sets. In the latest Global Coalition on Regulatory Science Research (GCRSR) governed conference, Global Summit on Regulatory Science (GSRS2015), regulatory bioinformatics principles were presented with respect to global trends, initiatives and case studies. The discussion revealed that datasets, analytical tools, skills and expertise are rapidly developing, in many cases via large international collaborative consortia. It also revealed that significant research is still required to realize the potential applications of regulatory bioinformatics. While there is significant excitement in the possibilities offered by precision medicine to enhance treatments of serious and/or complex diseases, there is a clear need for further development of mechanisms to securely store, curate and share data, integrate databases, and standardized quality control and data analysis procedures. A greater understanding of the biological significance of the data is also required to fully exploit vast datasets that are becoming available. The application of bioinformatics in the microbiological risk analysis paradigm is delivering clear benefits both for the investigation of food borne pathogens and for decision making on clinically important treatments. It is recognized that regulatory bioinformatics will have many beneficial applications by ensuring high quality data, validated tools and standardized processes, which will help inform the regulatory science community of the requirements

  13. Providing web servers and training in Bioinformatics: 2010 update on the Bioinformatics Links Directory.

    PubMed

    Brazas, Michelle D; Yamada, Joseph T; Ouellette, B F Francis

    2010-07-01

    The Links Directory at Bioinformatics.ca continues its collaboration with Nucleic Acids Research to jointly publish and compile a freely accessible, online collection of tools, databases and resource materials for bioinformatics and molecular biology research. The July 2010 Web Server issue of Nucleic Acids Research adds an additional 115 web server tools and 7 updates to the directory at http://bioinformatics.ca/links_directory/, bringing the total number of servers listed close to an impressive 1500 links. The Bioinformatics Links Directory represents an excellent community resource for locating bioinformatic tools and databases to aid one's research, and in this context bioinformatic education needs and initiatives are discussed. A complete list of all links featured in this Nucleic Acids Research 2010 Web Server issue can be accessed online at http://bioinformatics.ca/links_directory/narweb2010/. The 2010 update of the Bioinformatics Links Directory, which includes the Web Server list and summaries, is also available online at the Nucleic Acids Research website, http://nar.oxfordjournals.org/.

  14. A Guide to Bioinformatics for Immunologists

    PubMed Central

    Whelan, Fiona J.; Yap, Nicholas V. L.; Surette, Michael G.; Golding, G. Brian; Bowdish, Dawn M. E.

    2013-01-01

    Bioinformatics includes a suite of methods that are cheap and approachable, many of which are easily accessible without any sort of specialized bioinformatic training. Yet, despite this, bioinformatic tools are under-utilized by immunologists. Herein, we review a representative set of publicly available, easy-to-use bioinformatic tools using our own research on an under-annotated human gene, SCARA3, as an example. SCARA3 shares an evolutionary relationship with the class A scavenger receptors, but preliminary research showed that it was divergent enough that its function remained unclear. In our quest for more information about this gene – did it share gene sequence similarities to other scavenger receptors? Did it contain conserved protein domains? Where was it expressed in the human body? – we discovered the power and informative potential of publicly available bioinformatic tools designed with the novice in mind, which allowed us to hypothesize on the regulation, structure, and function of this protein. We argue that these tools are largely applicable to many facets of immunology research. PMID:24363654

  15. The MPI Bioinformatics Toolkit for protein sequence analysis

    PubMed Central

    Biegert, Andreas; Mayer, Christian; Remmert, Michael; Söding, Johannes; Lupas, Andrei N.

    2006-01-01

    The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at . PMID:16845021

  16. Can bioinformatics help in the identification of moonlighting proteins?

    PubMed

    Hernández, Sergio; Calvo, Alejandra; Ferragut, Gabriela; Franco, Luís; Hermoso, Antoni; Amela, Isaac; Gómez, Antonio; Querol, Enrique; Cedano, Juan

    2014-12-01

    Protein multitasking or moonlighting is the capability of certain proteins to execute two or more unique biological functions. This ability to perform moonlighting functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Usually, moonlighting proteins are revealed experimentally by serendipity, and the proteins described probably represent just the tip of the iceberg. It would be helpful if bioinformatics could predict protein multifunctionality, especially because of the large amounts of sequences coming from genome projects. In the present article, we describe several approaches that use sequences, structures, interactomics and current bioinformatics algorithms and programs to try to overcome this problem. The sequence analysis has been performed: (i) by remote homology searches using PSI-BLAST, (ii) by the detection of functional motifs, and (iii) by the co-evolutionary relationship between amino acids. Programs designed to identify functional motifs/domains are basically oriented to detect the main function, but usually fail in the detection of secondary ones. Remote homology searches such as PSI-BLAST seem to be more versatile in this task, and it is a good complement for the information obtained from protein-protein interaction (PPI) databases. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can be used only in very restricted situations, but can suggest how the evolutionary process of the acquisition of the second function took place. PMID:25399591

  17. The MPI Bioinformatics Toolkit for protein sequence analysis.

    PubMed

    Biegert, Andreas; Mayer, Christian; Remmert, Michael; Söding, Johannes; Lupas, Andrei N

    2006-07-01

    The MPI Bioinformatics Toolkit is an interactive web service which offers access to a great variety of public and in-house bioinformatics tools. They are grouped into different sections that support sequence searches, multiple alignment, secondary and tertiary structure prediction and classification. Several public tools are offered in customized versions that extend their functionality. For example, PSI-BLAST can be run against regularly updated standard databases, customized user databases or selectable sets of genomes. Another tool, Quick2D, integrates the results of various secondary structure, transmembrane and disorder prediction programs into one view. The Toolkit provides a friendly and intuitive user interface with an online help facility. As a key feature, various tools are interconnected so that the results of one tool can be forwarded to other tools. One could run PSI-BLAST, parse out a multiple alignment of selected hits and send the results to a cluster analysis tool. The Toolkit framework and the tools developed in-house will be packaged and freely available under the GNU Lesser General Public Licence (LGPL). The Toolkit can be accessed at http://toolkit.tuebingen.mpg.de.

  18. Building International Genomics Collaboration for Global Health Security

    PubMed Central

    Cui, Helen H.; Erkkila, Tracy; Chain, Patrick S. G.; Vuyisich, Momchilo

    2015-01-01

    Genome science and technologies are transforming life sciences globally in many ways and becoming a highly desirable area for international collaboration to strengthen global health. The Genome Science Program at the Los Alamos National Laboratory is leveraging a long history of expertise in genomics research to assist multiple partner nations in advancing their genomics and bioinformatics capabilities. The capability development objectives focus on providing a molecular genomics-based scientific approach for pathogen detection, characterization, and biosurveillance applications. The general approaches include introduction of basic principles in genomics technologies, training on laboratory methodologies and bioinformatic analysis of resulting data, procurement, and installation of next-generation sequencing instruments, establishing bioinformatics software capabilities, and exploring collaborative applications of the genomics capabilities in public health. Genome centers have been established with public health and research institutions in the Republic of Georgia, Kingdom of Jordan, Uganda, and Gabon; broader collaborations in genomics applications have also been developed with research institutions in many other countries. PMID:26697418

  19. Building international genomics collaboration for global health security

    SciTech Connect

    Cui, Helen H.; Erkkila, Tracy; Chain, Patrick S. G.; Vuyisich, Momchilo

    2015-12-07

    Genome science and technologies are transforming life sciences globally in many ways and becoming a highly desirable area for international collaboration to strengthen global health. The Genome Science Program at the Los Alamos National Laboratory is leveraging a long history of expertise in genomics research to assist multiple partner nations in advancing their genomics and bioinformatics capabilities. The capability development objectives focus on providing a molecular genomics-based scientific approach for pathogen detection, characterization, and biosurveillance applications. The general approaches include introduction of basic principles in genomics technologies, training on laboratory methodologies and bioinformatic analysis of resulting data, procurement, and installation of next-generation sequencing instruments, establishing bioinformatics software capabilities, and exploring collaborative applications of the genomics capabilities in public health. Genome centers have been established with public health and research institutions in the Republic of Georgia, Kingdom of Jordan, Uganda, and Gabon; broader collaborations in genomics applications have also been developed with research institutions in many other countries.

  20. Molecular Dynamics: New Frontier in Personalized Medicine.

    PubMed

    Sneha, P; Doss, C George Priya

    2016-01-01

    The field of drug discovery has seen substantial development over the last decade, driven by the demand for novel, efficient lead compounds. Although the development of novel compounds in this field has a high failure rate, a breakthrough in this area might be the establishment of personalized medicine. Personalized medicine has grown rapidly and become a hot topic since the successful completion of the Human Genome Project and the 1000 Genomes pilot project. Genomic variants such as SNPs play a vital role in inter-individual differences in disease susceptibility and drug response. Hence, such genetic variants have to be identified before a drug is administered. This process requires high-end techniques to understand the complexity of the molecules, which might bring insight into the compounds at the molecular level. To support this, the field of bioinformatics plays a crucial role in revealing the molecular mechanism of a mutation and thereby in designing a drug for an individual in a fast and affordable manner. High-end computational methods such as molecular dynamics (MD) simulation have proved to be a constitutive approach to detecting the minor changes associated with an SNP, for a better understanding of the structural and functional relationship. The parameters used in molecular dynamics simulation elucidate different properties of a macromolecule, such as protein stability and flexibility. MD along with docking analysis can reveal the synergetic effect of an SNP on protein-ligand interaction and provides a foundation for designing a particular drug molecule for an individual. This compelling application of computational power and the advent of other technologies have paved a promising way toward personalized medicine. In this in-depth review, we highlight the different facets of MD as applied to personalized medicine. PMID:26827606

  2. One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies.

    PubMed

    Yuan, Shuai; Johnston, H Richard; Zhang, Guosheng; Li, Yun; Hu, Yi-Juan; Qin, Zhaohui S

    2015-08-01

    With the rapid decline of sequencing cost, researchers today rush to embrace whole genome sequencing (WGS) or whole exome sequencing (WES) as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This is an important issue because incorrectly mapped reads affect the downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, the majority of them use the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable to factor them into the read mapping procedure. In this work, we developed a novel strategy that utilizes genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped using array-based genotyping platforms previously or concurrently, we believe the proposed strategy will be of high value in practice; it can also be applied to the scenario where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor.
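    The core idea in this abstract, turning a haploid reference into a personalized diploid one from known genotypes, can be sketched in a few lines. This is not RefEditor's implementation: the function name, the input format (a dict from 0-based position to a pair of alleles), and the restriction to phased single-nucleotide variants are all simplifying assumptions made for illustration.

```python
def personalize(reference, genotypes):
    """Build two haplotype sequences from a haploid reference.

    reference: reference sequence as a string.
    genotypes: dict mapping a 0-based position to (allele_hap1, allele_hap2).
               Assumes phased single-nucleotide variants only (no indels).
    Returns (haplotype_1, haplotype_2) as strings.
    """
    hap1 = list(reference)
    hap2 = list(reference)
    for pos, (allele1, allele2) in genotypes.items():
        if not 0 <= pos < len(reference):
            raise ValueError(f"position {pos} is outside the reference")
        # Substitute the individual's allele on each haplotype.
        hap1[pos] = allele1
        hap2[pos] = allele2
    return "".join(hap1), "".join(hap2)
```

    Reads would then be mapped against both haplotypes rather than the universal reference; handling indels and unphased genotypes requires coordinate bookkeeping that this sketch leaves out.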

  3. The genetic association between personality and major depression or bipolar disorder. A polygenic score analysis using genome-wide association data

    PubMed Central

    Middeldorp, C M; de Moor, M H M; McGrath, L M; Gordon, S D; Blackwood, D H; Costa, P T; Terracciano, A; Krueger, R F; de Geus, E J C; Nyholt, D R; Tanaka, T; Esko, T; Madden, P A F; Derringer, J; Amin, N; Willemsen, G; Hottenga, J-J; Distel, M A; Uda, M; Sanna, S; Spinhoven, P; Hartman, C A; Ripke, S; Sullivan, P F; Realo, A; Allik, J; Heath, A C; Pergadia, M L; Agrawal, A; Lin, P; Grucza, R A; Widen, E; Cousminer, D L; Eriksson, J G; Palotie, A; Barnett, J H; Lee, P H; Luciano, M; Tenesa, A; Davies, G; Lopez, L M; Hansell, N K; Medland, S E; Ferrucci, L; Schlessinger, D; Montgomery, G W; Wright, M J; Aulchenko, Y S; Janssens, A C J W; Oostra, B A; Metspalu, A; Abecasis, G R; Deary, I J; Räikkönen, K; Bierut, L J; Martin, N G; Wray, N R; van Duijn, C M; Smoller, J W; Penninx, B W J H; Boomsma, D I

    2011-01-01

    The relationship between major depressive disorder (MDD) and bipolar disorder (BD) remains controversial. Previous research has reported differences and similarities in risk factors for MDD and BD, such as predisposing personality traits. For example, high neuroticism is related to both disorders, whereas openness to experience is specific for BD. This study examined the genetic association between personality and MDD and BD by applying polygenic scores for neuroticism, extraversion, openness to experience, agreeableness and conscientiousness to both disorders. Polygenic scores reflect the weighted sum of multiple single-nucleotide polymorphism alleles associated with the trait for an individual and were based on a meta-analysis of genome-wide association studies for personality traits including 13 835 subjects. Polygenic scores were tested for MDD in the combined Genetic Association Information Network (GAIN-MDD) and MDD2000+ samples (N=8921) and for BD in the combined Systematic Treatment Enhancement Program for Bipolar Disorder and Wellcome Trust Case–Control Consortium samples (N=6329) using logistic regression analyses. At the phenotypic level, personality dimensions were associated with MDD and BD. Polygenic neuroticism scores were significantly positively associated with MDD, whereas polygenic extraversion scores were significantly positively associated with BD. The explained variance of MDD and BD, ∼0.1%, was highly comparable to the variance explained by the polygenic personality scores in the corresponding personality traits themselves (between 0.1 and 0.4%). This indicates that the proportions of variance explained in mood disorders are at the upper limit of what could have been expected. This study suggests shared genetic risk factors for neuroticism and MDD on the one hand and for extraversion and BD on the other. PMID:22833196
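
    The abstract defines a polygenic score as the weighted sum of trait-associated SNP alleles carried by an individual. A minimal Python sketch of that definition follows; the SNP identifiers, weights, and dosages are made-up illustrative values, and the function name `polygenic_score` is an assumption, not taken from the study.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of allele dosages; SNPs missing from `dosages` are skipped."""
    return sum(
        (w * dosages[snp] for snp, w in weights.items() if snp in dosages), 0.0
    )


# Hypothetical per-SNP effect sizes (e.g. from a GWAS meta-analysis) and one
# individual's effect-allele counts (0, 1 or 2 copies per SNP).
weights = {"snp1": 0.5, "snp2": -0.25, "snp3": 0.125}
dosages = {"snp1": 2, "snp2": 1, "snp3": 0}
score = polygenic_score(dosages, weights)
print(score)  # 0.5*2 + (-0.25)*1 + 0.125*0 = 0.75
```

    In an analysis like the one described, such a score would be computed for every individual in the target sample and then entered as a predictor in a logistic regression on case or control status.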

  4. Adapting bioinformatics curricula for big data.

    PubMed

    Greene, Anna C; Giffin, Kristine A; Greene, Casey S; Moore, Jason H

    2016-01-01

    Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs.

  5. Bioinformatic characterization of plant networks

    SciTech Connect

    McDermott, Jason E.; Samudrala, Ram

    2008-06-30

    Cells and organisms are governed by networks of interactions, genetic, physical and metabolic. Large-scale experimental studies of interactions between components of biological systems have been performed for a variety of eukaryotic organisms. However, there is a dearth of such data for plants. Computational methods for prediction of relationships between proteins, primarily based on comparative genomics, provide a useful systems-level view of cellular functioning and can be used to extend information about other eukaryotes to plants. We have predicted networks for Arabidopsis thaliana, Oryza sativa indica and japonica and several plant pathogens using the Bioverse (http://bioverse.compbio.washington.edu) and show that they are similar to experimentally-derived interaction networks. Predicted interaction networks for plants can be used to provide novel functional annotations and predictions about plant phenotypes and aid in rational engineering of biosynthesis pathways.

  6. The GMOD Drupal Bioinformatic Server Framework

    PubMed Central

    Papanicolaou, Alexie; Heckel, David G.

    2010-01-01

    Motivation: Next-generation sequencing technologies have led to the widespread use of -omic applications. As a result, there is now a pronounced bioinformatic bottleneck. The general model organism database (GMOD) tool kit (http://gmod.org) has produced a number of resources aimed at addressing this issue. It lacks, however, a robust online solution that can deploy heterogeneous data and software within a Web content management system (CMS). Results: We present a bioinformatic framework for the Drupal CMS. It consists of three modules. First, GMOD-DBSF is an application programming interface module for the Drupal CMS that simplifies the programming of bioinformatic Drupal modules. Second, the Drupal Bioinformatic Software Bench (biosoftware_bench) allows for a rapid and secure deployment of bioinformatic software. An innovative graphical user interface (GUI) guides both use and administration of the software, including the secure provision of pre-publication datasets. Third, we present genes4all_experiment, which exemplifies how our work supports the wider research community. Conclusion: Given the infrastructure presented here, the Drupal CMS may become a powerful new tool set for bioinformaticians. The GMOD-DBSF base module is an expandable community resource that decreases development time of Drupal modules for bioinformatics. The biosoftware_bench module can already enhance biologists' ability to mine their own data. The genes4all_experiment module has already been responsible for archiving of more than 150 studies of RNAi from Lepidoptera, which were previously unpublished. Availability and implementation: Implemented in PHP and Perl. Freely available under the GNU Public License 2 or later from http://gmod-dbsf.googlecode.com Contact: alexie@butterflybase.org PMID:20971988

  7. The European Bioinformatics Institute’s data resources 2014

    PubMed Central

    Brooksbank, Catherine; Bergman, Mary Todd; Apweiler, Rolf; Birney, Ewan; Thornton, Janet

    2014-01-01

    Molecular Biology has been at the heart of the ‘big data’ revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff’s ‘Atlas of Protein Sequence and Structure’ through the Human Genome Project in the late 1990s and early 2000s to today’s population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provides free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI’s database collection to complement the reviews of individual databases provided elsewhere in this issue. PMID:24271396

  8. Prioritizing the human genome: knowledge management for drug discovery.

    PubMed

    Golden, James B

    2003-05-01

    This review covers recent methods to create a manageable subset of drug targets for development by prioritizing novel genes from the Human Genome Project. The ability to organize genomic data into a distinct set of drug discovery assets can be viewed as a form of knowledge management. While bioinformatics systems have been built to manage genomics-based data, the central theme in creating any bioinformatics infrastructure should be organization-specific knowledge management. PMID:12833662

  9. Bioinformatics and cancer: an essential alliance.

    PubMed

    Dopazo, Joaquín

    2006-06-01

    Modern cancer research has been revolutionized by the introduction of new high-throughput methodologies such as DNA microarrays. Keeping pace with these technologies, bioinformatics offers new solutions for data analysis and, more importantly, makes it possible to formulate a new class of hypotheses inspired by systems biology and oriented toward blocks of functionally related genes. Although software implementations of these new methodologies are recent, some options are already available. Bioinformatic solutions for other high-throughput techniques, such as array-CGH or large-scale genotyping, are also reviewed.

  10. Bioinformatics Analysis of MAPKKK Family Genes in Medicago truncatula.

    PubMed

    Li, Wei; Xu, Hanyun; Liu, Ying; Song, Lili; Guo, Changhong; Shu, Yongjun

    2016-04-04

    Mitogen-activated protein kinase kinase kinase (MAPKKK) is a component of the MAPK cascade pathway that plays an important role in plant growth, development, and response to abiotic stress, the functions of which have been well characterized in several plant species, such as Arabidopsis, rice, and maize. In this study, we performed genome-wide and systemic bioinformatics analysis of MAPKKK family genes in Medicago truncatula. In total, there were 73 MAPKKK family members identified by search of homologs, and they were classified into three subfamilies, MEKK, ZIK, and RAF. Based on the genomic duplication function, 72 MtMAPKKK genes were located throughout all chromosomes, but they cluster in different chromosomes. Using microarray data and high-throughput sequencing-data, we assessed their expression profiles in growth and development processes; these results provided evidence for exploring their important functions in developmental regulation, especially in the nodulation process. Furthermore, we investigated their expression in abiotic stresses by RNA-seq, which confirmed their critical roles in signal transduction and regulation processes under stress. In summary, our genome-wide, systemic characterization and expressional analysis of MtMAPKKK genes will provide insights that will be useful for characterizing the molecular functions of these genes in M. truncatula.

  11. Bioinformatics Analysis of MAPKKK Family Genes in Medicago truncatula

    PubMed Central

    Li, Wei; Xu, Hanyun; Liu, Ying; Song, Lili; Guo, Changhong; Shu, Yongjun

    2016-01-01

    Mitogen-activated protein kinase kinase kinase (MAPKKK) is a component of the MAPK cascade pathway that plays an important role in plant growth, development, and response to abiotic stress, the functions of which have been well characterized in several plant species, such as Arabidopsis, rice, and maize. In this study, we performed genome-wide and systemic bioinformatics analysis of MAPKKK family genes in Medicago truncatula. In total, there were 73 MAPKKK family members identified by search of homologs, and they were classified into three subfamilies, MEKK, ZIK, and RAF. Based on the genomic duplication function, 72 MtMAPKKK genes were located throughout all chromosomes, but they cluster in different chromosomes. Using microarray data and high-throughput sequencing-data, we assessed their expression profiles in growth and development processes; these results provided evidence for exploring their important functions in developmental regulation, especially in the nodulation process. Furthermore, we investigated their expression in abiotic stresses by RNA-seq, which confirmed their critical roles in signal transduction and regulation processes under stress. In summary, our genome-wide, systemic characterization and expressional analysis of MtMAPKKK genes will provide insights that will be useful for characterizing the molecular functions of these genes in M. truncatula. PMID:27049397

  12. The genomic landscapes of human breast and colorectal cancers.

    PubMed

    Wood, Laura D; Parsons, D Williams; Jones, Siân; Lin, Jimmy; Sjöblom, Tobias; Leary, Rebecca J; Shen, Dong; Boca, Simina M; Barber, Thomas; Ptak, Janine; Silliman, Natalie; Szabo, Steve; Dezso, Zoltan; Ustyanksky, Vadim; Nikolskaya, Tatiana; Nikolsky, Yuri; Karchin, Rachel; Wilson, Paul A; Kaminker, Joshua S; Zhang, Zemin; Croshaw, Randal; Willis, Joseph; Dawson, Dawn; Shipitsin, Michail; Willson, James K V; Sukumar, Saraswati; Polyak, Kornelia; Park, Ben Ho; Pethiyagoda, Charit L; Pant, P V Krishna; Ballinger, Dennis G; Sparks, Andrew B; Hartigan, James; Smith, Douglas R; Suh, Erick; Papadopoulos, Nickolas; Buckhaults, Phillip; Markowitz, Sanford D; Parmigiani, Giovanni; Kinzler, Kenneth W; Velculescu, Victor E; Vogelstein, Bert

    2007-11-16

    Human cancer is caused by the accumulation of mutations in oncogenes and tumor suppressor genes. To catalog the genetic changes that occur during tumorigenesis, we isolated DNA from 11 breast and 11 colorectal tumors and determined the sequences of the genes in the Reference Sequence database in these samples. Based on analysis of exons representing 20,857 transcripts from 18,191 genes, we conclude that the genomic landscapes of breast and colorectal cancers are composed of a handful of commonly mutated gene "mountains" and a much larger number of gene "hills" that are mutated at low frequency. We describe statistical and bioinformatic tools that may help identify mutations with a role in tumorigenesis. These results have implications for understanding the nature and heterogeneity of human cancers and for using personal genomics for tumor diagnosis and therapy.

  13. Proceedings of the Second Annual Conference of the MidSouth Computational Biology and Bioinformatics Society

    PubMed Central

    Wren, Jonathan D; Slikker, William

    2005-01-01

    The MCBIOS 2004 conference brought together regional researchers and students in biology, computer science and bioinformatics on October 7th-9th 2004 to present their latest work. This editorial describes the conference itself and introduces the twelve peer-reviewed manuscripts accepted for publication in the Proceedings of the MCBIOS 2004 Conference. These manuscripts included new methods for analysis of high-throughput gene expression experiments, EST clustering, analysis of mass spectrometry data and genomic analysis. PMID:16026594

  14. BIOINFORMATIC RESOURCES FOR SOYBEAN GENETIC AND GENOMIC RESEARCH

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In the last 10 years, soybean researchers have produced huge amounts of sequence-based data. The molecular genetic map has expanded to include over 2,000 RFLP, RAPD, SSR and SNP markers. Over a thousand QTL have been mapped representing ~90 agronomically important traits. Over 350,000 Expressed S...

  15. Sequencing an Ashkenazi reference panel supports population-targeted personal genomics and illuminates Jewish and European origins

    PubMed Central

    Carmi, Shai; Hui, Ken Y.; Kochav, Ethan; Liu, Xinmin; Xue, James; Grady, Fillan; Guha, Saurav; Upadhyay, Kinnari; Ben-Avraham, Dan; Mukherjee, Semanti; Bowen, B. Monica; Thomas, Tinu; Vijai, Joseph; Cruts, Marc; Froyen, Guy; Lambrechts, Diether; Plaisance, Stéphane; Van Broeckhoven, Christine; Van Damme, Philip; Van Marck, Herwig; Barzilai, Nir; Darvasi, Ariel; Offit, Kenneth; Bressman, Susan; Ozelius, Laurie J.; Peter, Inga; Cho, Judy H.; Ostrer, Harry; Atzmon, Gil; Clark, Lorraine N.; Lencz, Todd; Pe’er, Itsik

    2014-01-01

    The Ashkenazi Jewish (AJ) population is a genetic isolate close to European and Middle Eastern groups, with genetic diversity patterns conducive to disease mapping. Here we report high-depth sequencing of 128 complete genomes of AJ controls. Compared with European samples, our AJ panel has 47% more novel variants per genome and is eightfold more effective at filtering benign variants out of AJ clinical genomes. Our panel improves imputation accuracy for AJ SNP arrays by 28%, and covers at least one haplotype in ≈67% of any AJ genome with long, identical-by-descent segments. Reconstruction of recent AJ history from such segments confirms a recent bottleneck of merely ≈350 individuals. Modelling of ancient histories for AJ and European populations using their joint allele frequency spectrum determines AJ to be an even admixture of European and likely Middle Eastern origins. We date the split between the two ancestral populations to ≈12–25 Kyr, suggesting a predominantly Near Eastern source for the repopulation of Europe after the Last Glacial Maximum. PMID:25203624

  16. Sequencing an Ashkenazi reference panel supports population-targeted personal genomics and illuminates Jewish and European origins.

    PubMed

    Carmi, Shai; Hui, Ken Y; Kochav, Ethan; Liu, Xinmin; Xue, James; Grady, Fillan; Guha, Saurav; Upadhyay, Kinnari; Ben-Avraham, Dan; Mukherjee, Semanti; Bowen, B Monica; Thomas, Tinu; Vijai, Joseph; Cruts, Marc; Froyen, Guy; Lambrechts, Diether; Plaisance, Stéphane; Van Broeckhoven, Christine; Van Damme, Philip; Van Marck, Herwig; Barzilai, Nir; Darvasi, Ariel; Offit, Kenneth; Bressman, Susan; Ozelius, Laurie J; Peter, Inga; Cho, Judy H; Ostrer, Harry; Atzmon, Gil; Clark, Lorraine N; Lencz, Todd; Pe'er, Itsik

    2014-09-09

    The Ashkenazi Jewish (AJ) population is a genetic isolate close to European and Middle Eastern groups, with genetic diversity patterns conducive to disease mapping. Here we report high-depth sequencing of 128 complete genomes of AJ controls. Compared with European samples, our AJ panel has 47% more novel variants per genome and is eightfold more effective at filtering benign variants out of AJ clinical genomes. Our panel improves imputation accuracy for AJ SNP arrays by 28%, and covers at least one haplotype in ≈ 67% of any AJ genome with long, identical-by-descent segments. Reconstruction of recent AJ history from such segments confirms a recent bottleneck of merely ≈ 350 individuals. Modelling of ancient histories for AJ and European populations using their joint allele frequency spectrum determines AJ to be an even admixture of European and likely Middle Eastern origins. We date the split between the two ancestral populations to ≈ 12-25 Kyr, suggesting a predominantly Near Eastern source for the repopulation of Europe after the Last Glacial Maximum.

  17. Sequencing an Ashkenazi reference panel supports population-targeted personal genomics and illuminates Jewish and European origins.

    PubMed

    Carmi, Shai; Hui, Ken Y; Kochav, Ethan; Liu, Xinmin; Xue, James; Grady, Fillan; Guha, Saurav; Upadhyay, Kinnari; Ben-Avraham, Dan; Mukherjee, Semanti; Bowen, B Monica; Thomas, Tinu; Vijai, Joseph; Cruts, Marc; Froyen, Guy; Lambrechts, Diether; Plaisance, Stéphane; Van Broeckhoven, Christine; Van Damme, Philip; Van Marck, Herwig; Barzilai, Nir; Darvasi, Ariel; Offit, Kenneth; Bressman, Susan; Ozelius, Laurie J; Peter, Inga; Cho, Judy H; Ostrer, Harry; Atzmon, Gil; Clark, Lorraine N; Lencz, Todd; Pe'er, Itsik

    2014-01-01

    The Ashkenazi Jewish (AJ) population is a genetic isolate close to European and Middle Eastern groups, with genetic diversity patterns conducive to disease mapping. Here we report high-depth sequencing of 128 complete genomes of AJ controls. Compared with European samples, our AJ panel has 47% more novel variants per genome and is eightfold more effective at filtering benign variants out of AJ clinical genomes. Our panel improves imputation accuracy for AJ SNP arrays by 28%, and covers at least one haplotype in ≈ 67% of any AJ genome with long, identical-by-descent segments. Reconstruction of recent AJ history from such segments confirms a recent bottleneck of merely ≈ 350 individuals. Modelling of ancient histories for AJ and European populations using their joint allele frequency spectrum determines AJ to be an even admixture of European and likely Middle Eastern origins. We date the split between the two ancestral populations to ≈ 12-25 Kyr, suggesting a predominantly Near Eastern source for the repopulation of Europe after the Last Glacial Maximum. PMID:25203624

  18. Virus Pathogen Database and Analysis Resource (ViPR): A Comprehensive Bioinformatics Database and Analysis Resource for the Coronavirus Research Community

    PubMed Central

    Pickett, Brett E.; Greer, Douglas S.; Zhang, Yun; Stewart, Lucy; Zhou, Liwei; Sun, Guangyu; Gu, Zhiping; Kumar, Sanjeev; Zaremba, Sam; Larsen, Christopher N.; Jen, Wei; Klem, Edward B.; Scheuermann, Richard H.

    2012-01-01

    Several viruses within the Coronaviridae family have been categorized as either emerging or re-emerging human pathogens, with Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) being the most well known. The NIAID-sponsored Virus Pathogen Database and Analysis Resource (ViPR, www.viprbrc.org) supports bioinformatics workflows for a broad range of human virus pathogens and other related viruses, including the entire Coronaviridae family. ViPR provides access to sequence records, gene and protein annotations, immune epitopes, 3D structures, host factor data, and other data types through an intuitive web-based search interface. Records returned from these queries can then be subjected to web-based analyses including: multiple sequence alignment, phylogenetic inference, sequence variation determination, BLAST comparison, and metadata-driven comparative genomics statistical analysis. Additional tools exist to display multiple sequence alignments, view phylogenetic trees, visualize 3D protein structures, transfer existing reference genome annotations to new genomes, and store or share results from any search or analysis within personal private ‘Workbench’ spaces for future access. All of the data and integrated analysis and visualization tools in ViPR are made available without charge as a service to the Coronaviridae research community to facilitate the research and development of diagnostics, prophylactics, vaccines and therapeutics against these human pathogens. PMID:23202522

  19. Proteomics, genomics and the future of medical education.

    PubMed

    Pike, Linda J; Sadler, J Evan

    2004-01-01

    The completion of the human genome project in 2003 ushered in the era of genomics, the systematic study of our DNA sequence. Proteomics, the study of the full complement of proteins present in a cell, is a natural extension of genomics. Together, the information obtainable through genomics and proteomics has tremendous potential to change clinical practice. The application of such information to medical diagnosis and treatment will require significant changes in the training of physicians. All students and physicians in training will need to acquire enough knowledge of the underlying science, including medical genetics, epidemiology, bioinformatics and statistics, so they will intuitively understand the technology and recognize the strengths and limitations of genomic/proteomic tests. Because genomic or proteomic testing may yield extensive information about a person's genetic makeup and disease risks, consideration will need to be given throughout the medical curriculum to the ethical issues raised by the application of this new technology to the diagnosis and treatment of patients. PMID:15535026

  20. High-throughput next-generation sequencing technologies foster new cutting-edge computing techniques in bioinformatics.

    PubMed

    Yang, Mary Qu; Athey, Brian D; Arabnia, Hamid R; Sung, Andrew H; Liu, Qingzhong; Yang, Jack Y; Mao, Jinghe; Deng, Youping

    2009-07-07

    The advent of high-throughput next-generation sequencing technologies has fostered enormous potential applications of supercomputing techniques in genome sequencing, epigenetics, metagenomics, personalized medicine, and the discovery of non-coding RNAs and protein-binding sites. To this end, the 2008 International Conference on Bioinformatics and Computational Biology (Biocomp) - 2008 World Congress on Computer Science, Computer Engineering and Applied Computing (Worldcomp) was designed to promote synergistic inter/multidisciplinary research and education in response to current research trends and advances. The conference attracted more than two thousand scientists, medical doctors, engineers, professors and students, who gathered in Las Vegas, Nevada, USA during July 14-17, and was a great success. Supported by the International Society of Intelligent Biological Medicine (ISIBM), the International Journal of Computational Biology and Drug Design (IJCBDD), the International Journal of Functional Informatics and Personalized Medicine (IJFIPM) and leading research laboratories from Harvard, M.I.T., Purdue, UIUC, UCLA, Georgia Tech, UT Austin, U. of Minnesota, U. of Iowa, etc., the conference received thousands of research papers. Each submitted paper was reviewed by at least three reviewers, and accepted papers were required to address the reviewers' comments. Finally, the review board and the committee selected only 19 high-quality research papers for inclusion in this supplement to BMC Genomics, based solely on the peer reviews. The conference committee was very grateful for the Plenary Keynote Lectures given by: Dr. Brian D. Athey (University of Michigan Medical School), Dr. Vladimir N. Uversky (Indiana University School of Medicine), Dr. David A. Patterson (Member of United States National Academy of Sciences and National Academy of Engineering, University of California at Berkeley) and Anousheh Ansari (Prodea Systems, Space Ambassador). The theme of the conference to promote

  1. Bioinformatics: A History of Evolution "In Silico"

    ERIC Educational Resources Information Center

    Ondrej, Vladan; Dvorak, Petr

    2012-01-01

    Bioinformatics, biological databases, and the worldwide use of computers have accelerated biological research in many fields, such as evolutionary biology. Here, we describe a primer of nucleotide sequence management and the construction of a phylogenetic tree with two examples; the two selected are from completely different groups of organisms:…

  2. Bioinformatics in Undergraduate Education: Practical Examples

    ERIC Educational Resources Information Center

    Boyle, John A.

    2004-01-01

    Bioinformatics has emerged as an important research tool in recent years. The ability to mine large databases for relevant information has become increasingly central to many different aspects of biochemistry and molecular biology. It is important that undergraduates be introduced to the available information and methodologies. We present a…

  4. Bioboxes: standardised containers for interchangeable bioinformatics software.

    PubMed

    Belmann, Peter; Dröge, Johannes; Bremges, Andreas; McHardy, Alice C; Sczyrba, Alexander; Barton, Michael D

    2015-01-01

    Software is now both central and essential to modern biology, yet lack of availability, difficult installations, and complex user interfaces make software hard to obtain and use. Containerisation, as exemplified by the Docker platform, has the potential to solve the problems associated with sharing software. We propose bioboxes: containers with standardised interfaces to make bioinformatics software interchangeable. PMID:26473029

  5. Implementing bioinformatic workflows within the bioextract server

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Computational workflows in bioinformatics are becoming increasingly important in the achievement of scientific advances. These workflows typically require the integrated use of multiple, distributed data sources and analytic tools. The BioExtract Server (http://bioextract.org) is a distributed servi...

  6. "Extreme Programming" in a Bioinformatics Class

    ERIC Educational Resources Information Center

    Kelley, Scott; Alger, Christianna; Deutschman, Douglas

    2009-01-01

    The importance of Bioinformatics tools and methodology in modern biological research underscores the need for robust and effective courses at the college level. This paper describes such a course designed on the principles of cooperative learning based on a computer software industry production model called "Extreme Programming" (EP). The…

  7. SPECIES DATABASES AND THE BIOINFORMATICS REVOLUTION.

    EPA Science Inventory

    Biological databases are having a growth spurt. Much of this results from research in genetics and biodiversity, coupled with fast-paced developments in information technology. The revolution in bioinformatics, defined by Sugden and Pennisi (2000) as the "tools and techniques for...

  8. Analysis of genomic DNA with the UCSC genome browser.

    PubMed

    Pevsner, Jonathan

    2009-01-01

    Genomic DNA is being sequenced and annotated at a rapid rate, with terabases of DNA currently deposited in GenBank and other repositories. Genome browsers provide an essential collection of resources to visualize and analyze chromosomal DNA. The University of California, Santa Cruz (UCSC) Genome Browser provides annotations from the level of single nucleotides to whole chromosomes for four dozen metazoan and other species. The Genome Browser may be used to address a wide range of problems in bioinformatics (e.g., sequence analysis), comparative genomics, and evolution.

  9. An integrated framework for reporting clinically relevant biomarkers from paired tumor/normal genomic and transcriptomic sequencing data in support of clinical trials in personalized medicine.

    PubMed

    Nasser, Sara; Kurdolgu, Ahmet A; Izatt, Tyler; Aldrich, Jessica; Russell, Megan L; Christoforides, Alexis; Tembe, Wiabhav; Keifer, Jeffery A; Corneveaux, Jason J; Byron, Sara A; Forman, Karen M; Zuccaro, Clarice; Keats, Jonathan J; Lorusso, Patricia M; Carpten, John D; Trent, Jeffrey M; Craig, David W

    2015-01-01

    The ability to rapidly sequence the tumor and germline DNA of an individual holds the eventual promise of revolutionizing our ability to match targeted therapies to tumors harboring the associated genetic biomarkers. Analyzing high-throughput genomic data consisting of millions of base pairs and discovering alterations in clinically actionable genes in a structured and real-time manner is at the crux of personalized testing. This requires a computational architecture that can monitor and track a system within a regulated environment as terabytes of data are reduced to a small number of therapeutically relevant variants, delivered as a diagnostic laboratory developed test. These high-complexity assays require data structures that enable real-time and retrospective ad hoc analysis, with the capability of updating to keep up with rapidly changing genomic and therapeutic options, all within a regulated environment that is relevant under both CMS and FDA depending on application. We describe a flexible computational framework that uses a paired tumor/normal sample, allowing for complete analysis and reporting in approximately 24 hours and providing identification of single nucleotide changes, small insertions and deletions, chromosomal rearrangements, gene fusions and gene expression with positive predictive values over 90%. In this paper we present the challenges in integrating clinical, genomic and annotation databases to provide interpreted draft reports, which we utilize within ongoing clinical research protocols. We demonstrate the need to move away from existing performance measurements of accuracy and specificity and to measure metrics that are meaningful to a genomic diagnostic environment. This paper presents a three-tier infrastructure that is currently being used to analyze an individual genome and provide available therapeutic options via a clinical report. Our framework utilizes a non-relational variant-centric database that is scalable to a large amount of data and

  10. Navigating the changing learning landscape: perspective from bioinformatics.ca

    PubMed Central

    Ouellette, B. F. Francis

    2013-01-01

    With the advent of YouTube channels in bioinformatics, open platforms for problem solving in bioinformatics, active web forums in computing analyses and online resources for learning to code or use a bioinformatics tool, the more traditional continuing education bioinformatics training programs have had to adapt. Bioinformatics training programs that solely rely on traditional didactic methods are being superseded by these newer resources. Yet such face-to-face instruction is still invaluable in the learning continuum. Bioinformatics.ca, which hosts the Canadian Bioinformatics Workshops, has blended more traditional learning styles with current online and social learning styles. Here we share our growing experiences over the past 12 years and look toward what the future holds for bioinformatics training programs. PMID:23515468

  11. Tuberculosis: from genome to vaccine.

    PubMed

    de Jonge, Marien I; Brosch, Roland; Brodin, Priscille; Demangel, Caroline; Cole, Stewart T

    2005-08-01

    The availability of mycobacterial genome sequences has paved the way to identifying potential tuberculosis vaccine candidates in order to replace the currently used bacillus Calmette-Guérin (BCG) vaccines that show variable protective efficacy in adults. Genomics provides the basis for bioinformatic, transcriptomic and proteomic analysis, increases screening efficiency and enables valuable information concerning the biology and virulence of the mycobacterial species to be extracted by comparative genomics. Although in silico results must be confirmed in vitro and in vivo, bioinformatic analysis of the genomes is highlighting candidates for testing. For designing subunit vaccines, attenuated or improved recombinant whole-cell live vaccines, information from the genomes of the human host and pathogenic mycobacterial species is of great help.

  12. Component-Based Approach for Educating Students in Bioinformatics

    ERIC Educational Resources Information Center

    Poe, D.; Venkatraman, N.; Hansen, C.; Singh, G.

    2009-01-01

    There is an increasing need for an effective method of teaching bioinformatics. Increased progress and availability of computer-based tools for educating students have led to the implementation of a computer-based system for teaching bioinformatics as described in this paper. Bioinformatics is a recent, hybrid field of study combining elements of…

  13. Candidate genes for nicotine dependence via linkage, epistasis, and bioinformatics.

    PubMed

    Sullivan, Patrick F; Neale, Benjamin M; van den Oord, Edwin; Miles, Michael F; Neale, Michael C; Bulik, Cynthia M; Joyce, Peter R; Straub, Richard E; Kendler, Kenneth S

    2004-04-01

    Many smoking-related phenotypes are substantially heritable. One genome scan of nicotine dependence (ND) has been published and several others are in progress and should be completed in the next 5 years. The goal of this hypothesis-generating study was two-fold. First, we present further analyses of our genome scan data for ND published by Straub et al. [1999: Mol Psychiatry 4:129-144] (PMID: 10208445). Second, we used the method described by Cox et al. [1999: Nat Genet 21:213-215] (PMID: 9988276) to search for epistatic loci across the markers used in the genome scan. The overall results of the genome scan nearly reached the rigorous Lander and Kruglyak [1995: Nat Genet 11:241-247] criteria for "significant" linkage with the best findings on chromosomes 10 and 2. We then looked for correspondence between genes located in the 10 regions implicated in affected sibling pair (ASP) and epistatic linkage analyses with a list of genes suggested by microarray studies of experimental nicotine exposure and candidate genes from the literature. We found correspondence between linkage and microarray/candidate gene studies for genes involved with the mitogen-activated protein kinase (MAPK) signaling system, nuclear factor kappa B (NFKB) complex, neuropeptide Y (NPY) neurotransmission, a nicotinic receptor subunit (CHRNA2), the vesicular monoamine transporter (SLC18A2), genes in pathways implicated in human anxiety (HTR7, TDO2, and the endozepine-related protein precursor, DKFZP434A2417), and the mu 1-opioid receptor (OPRM1). Although the hypotheses resulting from these linkage and bioinformatic analyses are plausible and intriguing, their ultimate worth depends on replication in additional linkage samples and in future experimental studies. PMID:15048644

  14. Automatic Discovery and Inferencing of Complex Bioinformatics Web Interfaces

    SciTech Connect

    Ngu, A; Rocco, D; Critchlow, T; Buttler, D

    2003-12-22

    The World Wide Web provides a vast resource to genomics researchers in the form of web-based access to distributed data sources, e.g., BLAST sequence homology search interfaces. However, the process of seeking the desired scientific information is still very tedious and frustrating. While there are several well-known servers for genomic data (e.g., GenBank, EMBL, NCBI) that are shared and accessed frequently, new data sources are created each day in laboratories all over the world. The sharing of these newly discovered genomics results is hindered by the lack of a common interface or data exchange mechanism. Moreover, the number of autonomous genomics sources and their rate of change outpace the speed at which they can be manually identified, meaning that the available data is not being utilized to its full potential. An automated system that can find, classify, describe and wrap new sources without tedious and low-level coding of source-specific wrappers is needed to help scientists access hundreds of dynamically changing bioinformatics web data sources through a single interface. A correct classification of any kind of Web data source must address both the capability of the source and the conversation/interaction semantics inherent in the design of the Web data source. In this paper, we propose an automatic approach to classify Web data sources that takes into account both the capability and the conversational semantics of the source. The ability to discover the interaction pattern of a Web source leads to increased accuracy in the classification process. At the same time, it facilitates the extraction of process semantics, which is necessary for the automatic generation of wrappers that can interact correctly with the sources.

  15. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis.

    PubMed

    Noar, Roslyn D; Daub, Margaret E

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that they may encode

  16. Bioinformatics Prediction of Polyketide Synthase Gene Clusters from Mycosphaerella fijiensis

    PubMed Central

    Noar, Roslyn D.; Daub, Margaret E.

    2016-01-01

    Mycosphaerella fijiensis, causal agent of black Sigatoka disease of banana, is a Dothideomycete fungus closely related to fungi that produce polyketides important for plant pathogenicity. We utilized the M. fijiensis genome sequence to predict PKS genes and their gene clusters and make bioinformatics predictions about the types of compounds produced by these clusters. Eight PKS gene clusters were identified in the M. fijiensis genome, placing M. fijiensis into the 23rd percentile for the number of PKS genes compared to other Dothideomycetes. Analysis of the PKS domains identified three of the PKS enzymes as non-reducing and two as highly reducing. Gene clusters contained types of genes frequently found in PKS clusters including genes encoding transporters, oxidoreductases, methyltransferases, and non-ribosomal peptide synthases. Phylogenetic analysis identified a putative PKS cluster encoding melanin biosynthesis. None of the other clusters were closely aligned with genes encoding known polyketides, however three of the PKS genes fell into clades with clusters encoding alternapyrone, fumonisin, and solanapyrone produced by Alternaria and Fusarium species. A search for homologs among available genomic sequences from 103 Dothideomycetes identified close homologs (>80% similarity) for six of the PKS sequences. One of the PKS sequences was not similar (< 60% similarity) to sequences in any of the 103 genomes, suggesting that it encodes a unique compound. Comparison of the M. fijiensis PKS sequences with those of two other banana pathogens, M. musicola and M. eumusae, showed that these two species have close homologs to five of the M. fijiensis PKS sequences, but three others were not found in either species. RT-PCR and RNA-Seq analysis showed that the melanin PKS cluster was down-regulated in infected banana as compared to growth in culture. Three other clusters, however were strongly upregulated during disease development in banana, suggesting that they may encode

  17. Genomics and privacy: implications of the new reality of closed data for the field.

    PubMed

    Greenbaum, Dov; Sboner, Andrea; Mu, Xinmeng Jasmine; Gerstein, Mark

    2011-12-01

    Open source and open data have been driving forces in bioinformatics in the past. However, privacy concerns may soon change the landscape, limiting future access to important data sets, including personal genomics data. Here we survey this situation in some detail, describing, in particular, how the large scale of the data from personal genomic sequencing makes it especially hard to share data, exacerbating the privacy problem. We also go over various aspects of genomic privacy: first, there is basic identifiability of subjects having their genome sequenced. However, even for individuals who have consented to be identified, there is the prospect of very detailed future characterization of their genotype, which, unanticipated at the time of their consent, may be more personal and invasive than the release of their medical records. We go over various computational strategies for dealing with the issue of genomic privacy. One can "slice" and reformat datasets to allow them to be partially shared while securing the most private variants. This is particularly applicable to functional genomics information, which can be largely processed without variant information. For handling the most private data there are a number of legal and technological approaches, for example, modifying the informed consent procedure to acknowledge that privacy cannot be guaranteed, and/or employing a secure cloud computing environment. Cloud computing in particular may allow access to the data in a more controlled fashion than the current practice of downloading and computing on large datasets. Furthermore, it may be particularly advantageous for small labs, given that the burden of many privacy issues falls disproportionately on them in comparison to large corporations and genome centers. Finally, we discuss how education of future genetics researchers will be important, with curriculums emphasizing privacy and data security. However, teaching personal genomics with identifiable subjects in the

  18. Ergatis: a web interface and scalable software system for bioinformatics workflows

    PubMed Central

    Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.

    2010-01-01

    Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634

  19. Bioinformatic analysis of functional proteins involved in obesity associated with diabetes.

    PubMed

    Rao, Allam Appa; Tayaru, N Manga; Thota, Hanuman; Changalasetty, Suresh Babu; Thota, Lalitha Saroja; Gedela, Srinubabu

    2008-03-01

    The twin epidemics of diabetes and obesity pose daunting challenges worldwide. The dramatic rise in obesity-associated diabetes has resulted in an alarming increase in the incidence and prevalence of obesity, an important complication of diabetes. Differences among individuals in their susceptibility to both these conditions probably reflect their genetic constitutions. The dramatic improvements in genomic and bioinformatic resources are accelerating the pace of gene discovery. It is tempting to speculate about the key susceptibility genes/proteins that bridge diabetes mellitus and obesity. In this regard, we evaluated the role of several genes/proteins that are believed to be involved in the evolution of obesity-associated diabetes by performing multiple sequence alignment with the ClustalW tool and constructing a phylogram from functional protein sequences extracted from NCBI. The phylogram was constructed using the Neighbor-Joining algorithm, a bioinformatic tool. Our bioinformatic analysis reports the resistin gene as an ominous link with obesity-associated diabetes. This bioinformatic study will be useful for future studies towards therapeutic inventions for obesity-associated type 2 diabetes. PMID:23675069

  20. Comparative modeling of proteins: a method for engaging students' interest in bioinformatics tools.

    PubMed

    Badotti, Fernanda; Barbosa, Alan Sales; Reis, André Luiz Martins; do Valle, Italo Faria; Ambrósio, Lara; Bitar, Mainá

    2014-01-01

    The huge increase in data being produced in the genomic era has produced a need to incorporate computers into the research process. Sequence generation, its subsequent storage, interpretation, and analysis are now entirely computer-dependent tasks. Universities from all over the world have been challenged to find ways of encouraging students to acquire computational and bioinformatics skills from the undergraduate level onward in order to understand biological processes. The aim of this article is to report the experience of awakening students' interest in bioinformatics tools during a course focused on comparative modeling of proteins. The authors start by giving a full description of the course environmental context and students' backgrounds. Then they detail each class and present a general overview of the protein modeling protocol. The positive and negative aspects of the course are also reported, and some of the results generated in class and in projects outside the classroom are discussed. In the last section of the article, general perspectives about the course from students' point of view are given. This work can serve as a guide for professors who teach subjects for which bioinformatics tools are useful and for universities that plan to incorporate bioinformatics into the curriculum. PMID:24167006

  1. Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics

    PubMed Central

    Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.

    2012-01-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849
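
    To make the idea of programmatic access to a CouchDB-backed resource more concrete, here is a minimal sketch that builds a query URL following CouchDB's general HTTP view convention (`/{db}/_design/{ddoc}/_view/{view}` with a JSON-encoded `key` parameter). The host, database, design document and view names below are hypothetical placeholders; they are not the actual geneSmash, drugBase or HapMap-CN endpoints, which the abstract does not give.

```python
import json
from urllib.parse import quote, urlencode


def couchdb_view_url(base_url, database, design_doc, view, key):
    """Build a CouchDB view query URL for a single key.

    CouchDB expects the key as JSON, so a string key is sent with quotes.
    """
    query = urlencode({"key": json.dumps(key)})
    path = "/".join(
        [
            quote(database, safe=""),
            "_design",
            quote(design_doc, safe=""),
            "_view",
            quote(view, safe=""),
        ]
    )
    return f"{base_url}/{path}?{query}"


# Hypothetical names, for illustration only (assumption).
url = couchdb_view_url(
    "http://localhost:5984", "genes", "annot", "by_symbol", "TP53"
)
print(url)
```

    A client would then issue a plain HTTP GET on this URL and parse the JSON response, which is what makes a document store of this kind easy to expose as a web service.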

  2. Nanoinformatics: an emerging area of information technology at the intersection of bioinformatics, computational chemistry and nanobiotechnology.

    PubMed

    González-Nilo, Fernando; Pérez-Acle, Tomás; Guínez-Molinos, Sergio; Geraldo, Daniela A; Sandoval, Claudia; Yévenes, Alejandro; Santos, Leonardo S; Laurie, V Felipe; Mendoza, Hegaly; Cachau, Raúl E

    2011-01-01

    After the progress made during the genomics era, bioinformatics was tasked with supporting the flow of information generated by nanobiotechnology efforts. This challenge requires adapting classical bioinformatic and computational chemistry tools to store, standardize, analyze, and visualize nanobiotechnological information. Thus, old and new bioinformatic and computational chemistry tools have been merged into a new sub-discipline: nanoinformatics. This review takes a second look at the development of this new and exciting area as seen from the perspective of the evolution of nanobiotechnology applied to the life sciences. The knowledge obtained at the nano-scale level implies answers to new questions and the development of new concepts in different fields. The rapid convergence of technologies around nanobiotechnologies has spun off collaborative networks and web platforms created for sharing and discussing the knowledge generated in nanobiotechnology. The implementation of new database schemes suitable for storage, processing and integrating physical, chemical, and biological properties of nanoparticles will be a key element in achieving the promises in this convergent field. In this work, we will review some applications of nanobiotechnology to life sciences in generating new requirements for diverse scientific fields, such as bioinformatics and computational chemistry.

  3. The Complete Mitochondrial Genome of the Foodborne Parasitic Pathogen Cyclospora cayetanensis

    PubMed Central

    Cinar, Hediye Nese; Gopinath, Gopal; Jarvis, Karen; Murphy, Helen R.

    2015-01-01

    Cyclospora cayetanensis is a human-specific coccidian parasite responsible for several food and water-related outbreaks around the world, including the most recent ones involving over 900 persons in 2013 and 2014 outbreaks in the USA. Multicopy organellar DNA such as mitochondrion genomes have been particularly informative for detection and genetic traceback analysis in other parasites. We sequenced the C. cayetanensis genomic DNA obtained from stool samples from patients infected with Cyclospora in Nepal using the Illumina MiSeq platform. By bioinformatically filtering out the metagenomic reads of non-coccidian origin and concentrating the reads by targeted alignment, we were able to obtain contigs containing Eimeria-like mitochondrial, apicoplastic and some chromosomal genomic fragments. A mitochondrial genomic sequence was assembled and confirmed by cloning and sequencing targeted PCR products amplified from Cyclospora DNA using primers based on our draft assembly sequence. The results show that the C. cayetanensis mitochondrion genome is 6274 bp in length, with 33% GC content, and likely exists in concatemeric arrays as in Eimeria mitochondrial genomes. Phylogenetic analysis of the C. cayetanensis mitochondrial genome places this organism in a tight cluster with Eimeria species. The mitochondrial genome of C. cayetanensis contains three protein coding genes, cytochrome b (cytb), cytochrome C oxidase subunit 1 (cox1), and cytochrome C oxidase subunit 3 (cox3), in addition to 14 large subunit (LSU) and nine small subunit (SSU) fragmented rRNA genes. PMID:26042787
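
    The GC content quoted above is the fraction of G and C bases among all called bases; for a 6274 bp genome, 33% corresponds to roughly 2,070 G or C positions. A minimal sketch of the calculation follows. The sequence used is a made-up toy string, not the C. cayetanensis assembly, and skipping ambiguous bases such as N is an assumption of this sketch.

```python
def gc_content(sequence):
    """Return the fraction of G/C among unambiguous bases (A, C, G, T)."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        return 0.0
    gc_count = sum(1 for base in called if base in "GC")
    return gc_count / len(called)


# Toy sequence for illustration only (assumption).
toy_sequence = "ATGCATTTAAGCATTA"
print(f"GC content: {gc_content(toy_sequence):.0%}")
```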

  4. The 2011 Bioinformatics Links Directory update: more resources, tools and databases and features to empower the bioinformatics community.

    PubMed

    Brazas, Michelle D; Yim, David S; Yamada, Joseph T; Ouellette, B F Francis

    2011-07-01

    The Bioinformatics Links Directory continues its collaboration with Nucleic Acids Research to publish and compile a freely accessible, online collection of tools, databases and resource materials for bioinformatics and molecular biology research. The July 2011 Web Server issue of Nucleic Acids Research adds 78 web server tools and 14 updates to the directory at http://bioinformatics.ca/links_directory/.

  5. Genomic Profiling of Metastatic Gastroenteropancreatic Neuroendocrine Tumor (GEP-NET) Patients in the Personalized-Medicine Era

    PubMed Central

    Kim, Seung Tae; Lee, Su Jin; Park, Se Hoon; Park, Joon Oh; Lim, Ho Yeong; Kang, Won Ki; Lee, Jeeyun; Park, Young Suk

    2016-01-01

    Background: We have conducted molecular profiling through a high-throughput molecular test as part of our clinical practice for patients with advanced gastrointestinal (GI) cancer or rare cancers including gastroenteropancreatic neuroendocrine tumors (GEP-NETs). Herein, we report on the molecular characterization of 14 metastatic GEP-NET patients. Methods: We conducted the Ion AmpliSeq Cancer Hotspot Panel v2 (detecting 2,855 oncogenic mutations in 50 commonly mutated genes) and nCounter Copy Number Variation Assay, which was designed with 21 genes based on available targeted agents, as a high throughput genomic platform in 14 patients with metastatic GEP-NETs. Results: Among the 14 GEP-NET patients analyzed in this study, 8 patients had grade III neuroendocrine carcinoma (NEC) and 6 had grade I/II NET. Primary sites included pancreas (n=3), small intestine and ascending colon (n=3), distal colon and rectum (n=5), and unknown primary origin (n=3). The most common metastatic site was the liver. Of 14 GEP-NET patients available for mutational profiling, 7 (50.0%) patients had one or more aberrations detected. Common aberrations were as follows: SMARCB1 mutation (n=2), TP53 mutation (n=2), STK11 mutation (n=1), RET mutation (n=1), and BRAF mutation (n=1). Gene amplification by nCounter was detected in only 1 patient, showing CCNE1 amplification, and this patient also had a TP53 mutation. Conclusions: This high throughput genomic test may be useful to identify new drug targets in metastatic GEP-NET patients. Currently, we plan to conduct further genomic analysis to develop predictive and prognostic biomarkers in a larger number of GEP-NET patients. PMID:27326246

  6. [Ethical issues raised by direct-to-consumer personal genome analysis and whole body scans: discussion and contextualisation of a report by the Nuffield Council on Bioethics].

    PubMed

    Buyx, Alena M; Strech, Daniel; Schmidt, Harald

    2012-01-01

    The paradigm of personalised medicine has many different facets, further to the application of pharmacogenetics. We examine here (direct-to-consumer) personal genome analysis and whole body scans and summarise findings from the Nuffield Council on Bioethics' recent report "Medical profiling and online medicine: the ethics of 'personalised healthcare' in a consumer age". We describe the current situation in Germany with regard to access to such services, and contextualise the Nuffield Council's report with summaries of position statements by German professional bodies. We conclude with three points that merit examination further to the analyses of the Nuffield Council's report and the German professional bodies. These concern the role of indirect evidence in considering restrictive policies, the question of whether regulations should require commercial providers to contribute to the generation of better evidence, and the option of using data from evaluations in combination with indirect evidence in justifying restrictive policies.

  7. Should We Have Blind Faith in Bioinformatics Software? Illustrations from the SNAP Web-Based Tool

    PubMed Central

    Robiou-du-Pont, Sébastien; Li, Aihua; Christie, Shanice; Sohani, Zahra N.; Meyre, David

    2015-01-01

    Bioinformatics tools have gained popularity in biology but little is known about their validity. We aimed to assess the early contribution of 415 single nucleotide polymorphisms (SNPs) associated with eight cardio-metabolic traits at the genome-wide significance level in adults in the Family Atherosclerosis Monitoring In earLY Life (FAMILY) birth cohort. We used the popular web-based tool SNAP to assess the availability of the 415 SNPs in the Illumina Cardio-Metabochip genotyped in the FAMILY study participants. We then compared the SNAP output with the Cardio-Metabochip file provided by Illumina using chromosome and chromosomal positions of SNPs from NCBI Human Genome Browser (Genome Reference Consortium Human Build 37). With the HapMap 3 release 2 reference, 201 out of 415 SNPs were reported as missing in the Cardio-Metabochip by the SNAP output. However, the Cardio-Metabochip file revealed that 152 of these 201 SNPs were in fact present in the Cardio-Metabochip array (false negative rate of 36.6%). With the more recent 1000 Genomes Project release, we found a false-negative rate of 17.6% by comparing the outputs of SNAP and the Illumina product file. We did not find any ‘false positive’ SNPs (SNPs specified as available in the Cardio-Metabochip by SNAP, but not by the Cardio-Metabochip Illumina file). The Cohen’s Kappa coefficient, which calculates the percentage of agreement between both methods, indicated that the validity of SNAP was fair to moderate depending on the reference used (the HapMap 3 or 1000 Genomes). In conclusion, we demonstrate that the SNAP outputs for the Cardio-Metabochip are invalid. This study illustrates the importance of systematically assessing the validity of bioinformatics tools in an independent manner. We propose a series of guidelines to improve practices in the fast-moving field of bioinformatics software implementation. PMID:25742008
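
    The figures in this abstract can be checked with a short calculation. The sketch below rebuilds a 2x2 agreement table from the HapMap 3 counts stated above (415 SNPs, 201 reported missing by SNAP, 152 of those actually present, no false positives) and computes Cohen's kappa, which corrects the observed agreement for agreement expected by chance. The table is a reconstruction from the abstract's counts, not the authors' own data file, and the function name is hypothetical. The reported 36.6% matches 152 divided by all 415 SNPs.

```python
def cohen_kappa(both_yes, only_a_yes, only_b_yes, both_no):
    """Cohen's kappa for two raters (A and B) making yes/no calls."""
    n = both_yes + only_a_yes + only_b_yes + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + only_a_yes) / n
    b_yes = (both_yes + only_b_yes) / n
    # Agreement expected if A and B made their calls independently.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# Counts taken from the abstract (HapMap 3 release 2 reference).
total_snps = 415
snap_missing = 201
wrongly_missing = 152                            # present on the chip, missed by SNAP
snap_present = total_snps - snap_missing         # 214, all confirmed by the chip file
truly_missing = snap_missing - wrongly_missing   # 49

false_negative_share = wrongly_missing / total_snps

# Rater A = SNAP, rater B = Illumina chip file; "yes" means "SNP is on the chip".
kappa = cohen_kappa(
    both_yes=snap_present,
    only_a_yes=0,
    only_b_yes=wrongly_missing,
    both_no=truly_missing,
)
print(f"share wrongly reported missing: {false_negative_share:.1%}")
print(f"Cohen's kappa: {kappa:.2f}")
```

    Under these reconstructed counts kappa comes out near 0.25, which falls in the range conventionally labelled "fair" agreement and is consistent with the abstract's "fair to moderate" description.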

  8. Should we have blind faith in bioinformatics software? Illustrations from the SNAP web-based tool.

    PubMed

    Robiou-du-Pont, Sébastien; Li, Aihua; Christie, Shanice; Sohani, Zahra N; Meyre, David

    2015-01-01

    Bioinformatics tools have gained popularity in biology but little is known about their validity. We aimed to assess the early contribution of 415 single nucleotide polymorphisms (SNPs) associated with eight cardio-metabolic traits at the genome-wide significance level in adults in the Family Atherosclerosis Monitoring In earLY Life (FAMILY) birth cohort. We used the popular web-based tool SNAP to assess the availability of the 415 SNPs in the Illumina Cardio-Metabochip genotyped in the FAMILY study participants. We then compared the SNAP output with the Cardio-Metabochip file provided by Illumina using chromosome and chromosomal positions of SNPs from NCBI Human Genome Browser (Genome Reference Consortium Human Build 37). With the HapMap 3 release 2 reference, 201 out of 415 SNPs were reported as missing in the Cardio-Metabochip by the SNAP output. However, the Cardio-Metabochip file revealed that 152 of these 201 SNPs were in fact present in the Cardio-Metabochip array (false negative rate of 36.6%). With the more recent 1000 Genomes Project release, we found a false-negative rate of 17.6% by comparing the outputs of SNAP and the Illumina product file. We did not find any 'false positive' SNPs (SNPs specified as available in the Cardio-Metabochip by SNAP, but not by the Cardio-Metabochip Illumina file). The Cohen's Kappa coefficient, which calculates the percentage of agreement between both methods, indicated that the validity of SNAP was fair to moderate depending on the reference used (the HapMap 3 or 1000 Genomes). In conclusion, we demonstrate that the SNAP outputs for the Cardio-Metabochip are invalid. This study illustrates the importance of systematically assessing the validity of bioinformatics tools in an independent manner. We propose a series of guidelines to improve practices in the fast-moving field of bioinformatics software implementation.

  9. Characterization of microRNA expression profiles in blood and saliva using the Ion Personal Genome Machine(®) System (Ion PGM™ System).

    PubMed

    Wang, Zheng; Zhou, Di; Cao, Yandong; Hu, Zhen; Zhang, Suhua; Bian, Yingnan; Hou, Yiping; Li, Chengtao

    2016-01-01

    MicroRNA (miRNA) expression profiling is gaining interest in the forensic community because their intrinsically short length and tissue-specific expression patterns make miRNAs useful biomarkers for body fluid identification. Measuring the quantity of miRNAs in forensically relevant body fluids is an important step to screen specific miRNAs for body fluid identification. The recent introduction of massively parallel sequencing (MPS) has the potential for screening miRNA biomarkers at the genome-wide level, which allows the detection of both expression patterns and miRNA sequences. In this study, we employed the Ion Personal Genome Machine(®) System (Ion PGM™ System, Thermo Fisher) to characterize the distribution and expression of 2588 human mature miRNAs (miRBase v21) in 5 blood samples and 5 saliva samples. An average of 1,885,000 and 1,356,000 sequence reads were generated in blood and saliva respectively. Based on miRDong, a Perl-based tool developed for semi-automated miRNA distribution designations, and manually ascertained, 6 and 19 miRNAs were identified respectively as potentially blood and saliva-specific biomarkers. This study describes a complete and reliable miRNA workflow solution based on the Ion PGM™ System, starting from efficient RNA extraction, followed by small RNA library construction and sequencing. With this workflow solution and miRDong analysis it will be possible to measure miRNA expression patterns at the genome-wide level in other forensically relevant body fluids.

  10. Bioinformatics approaches to cancer gene discovery.

    PubMed

    Narayanan, Ramaswamy

    2007-01-01

    The Cancer Gene Anatomy Project (CGAP) database of the National Cancer Institute has thousands of known and novel expressed sequence tags (ESTs). These ESTs, derived from diverse normal and tumor cDNA libraries, offer an attractive starting point for cancer gene discovery. Data-mining the CGAP database led to the identification of ESTs that were predicted to be specific to select solid tumors. Two genes from these efforts were taken to proof of concept for diagnostic and therapeutics indications of cancer. Microarray technology was used in conjunction with bioinformatics to understand the mechanism of one of the targets discovered. These efforts provide an example of gene discovery by using bioinformatics approaches. The strengths and weaknesses of this approach are discussed in this review.

  11. Machine learning: an indispensable tool in bioinformatics.

    PubMed

    Inza, Iñaki; Calvo, Borja; Armañanzas, Rubén; Bengoetxea, Endika; Larrañaga, Pedro; Lozano, José A

    2010-01-01

    The increase in the number and complexity of biological databases has raised the need for modern and powerful data analysis tools and techniques. In order to fulfill these requirements, the machine learning discipline has become an everyday tool in bio-laboratories. The use of machine learning techniques has been extended to a wide spectrum of bioinformatics applications. It is broadly used to investigate the underlying mechanisms and interactions between biological molecules in many diseases, and it is an essential tool in any biomarker discovery process. In this chapter, we provide a basic taxonomy of machine learning algorithms, and the characteristics of main data preprocessing, supervised classification, and clustering techniques are shown. Feature selection, classifier evaluation, and two supervised classification topics that have a deep impact on current bioinformatics are presented. We make the interested reader aware of a set of popular web resources, open source software tools, and benchmarking data repositories that are frequently used by the machine learning community. PMID:19957143

  12. A toolbox for developing bioinformatics software

    PubMed Central

    Potrzebowski, Wojciech; Puton, Tomasz; Rother, Magdalena; Wywial, Ewa; Bujnicki, Janusz M.

    2012-01-01

    Creating useful software is a major activity of many scientists, including bioinformaticians. Nevertheless, software development in an academic setting is often unsystematic, which can lead to problems associated with maintenance and long-term availability. Unfortunately, well-documented software development methodology is difficult to adopt, and technical measures that directly improve bioinformatic programming have not been described comprehensively. We have examined 22 software projects and have identified a set of practices for software development in an academic environment. We found them useful to plan a project, support the involvement of experts (e.g. experimentalists), and to promote higher quality and maintainability of the resulting programs. This article describes 12 techniques that facilitate a quick start into software engineering. We describe 3 of the 22 projects in detail and give many examples to illustrate the usage of particular techniques. We expect this toolbox to be useful for many bioinformatics programming projects and to the training of scientific programmers. PMID:21803787

  13. Discovery and Classification of Bioinformatics Web Services

    SciTech Connect

    Rocco, D; Critchlow, T

    2002-09-02

    The transition of the World Wide Web from a paradigm of static Web pages to one of dynamic Web services provides new and exciting opportunities for bioinformatics with respect to data dissemination, transformation, and integration. However, the rapid growth of bioinformatics services, coupled with non-standardized interfaces, diminishes the potential that these Web services offer. To face this challenge, we examine the notion of a Web service class that defines the functionality provided by a collection of interfaces. These descriptions are an integral part of a larger framework that can be used to discover, classify, and wrap Web services automatically. We discuss how this framework can be used in the context of the proliferation of sites offering BLAST sequence alignment services for specialized data sets.

  14. Hydroxysteroid dehydrogenases (HSDs) in bacteria: a bioinformatic perspective.

    PubMed

    Kisiela, Michael; Skarka, Adam; Ebert, Bettina; Maser, Edmund

    2012-03-01

    Steroidal compounds including cholesterol, bile acids and steroid hormones play a central role in various physiological processes such as cell signaling, growth, reproduction, and energy homeostasis. Hydroxysteroid dehydrogenases (HSDs), which belong to the superfamily of short-chain dehydrogenases/reductases (SDR) or aldo-keto reductases (AKR), are important enzymes involved in the steroid hormone metabolism. HSDs function as an enzymatic switch that controls the access of receptor-active steroids to nuclear hormone receptors and thereby mediate a fine-tuning of the steroid response. The aim of this study was the identification of classified functional HSDs and the bioinformatic annotation of these proteins in all completely sequenced bacterial genomes followed by a phylogenetic analysis. For the bioinformatic annotation we constructed specific hidden Markov models in an iterative approach to provide a reliable identification for the specific catalytic groups of HSDs. Here, we show a detailed phylogenetic analysis of 3α-, 7α-, 12α-HSDs and two further functionally related enzymes (3-ketosteroid-Δ(1)-dehydrogenase, 3-ketosteroid-Δ(4)(5α)-dehydrogenase) from the superfamily of SDRs. For some bacteria that have been previously reported to possess a specific HSD activity, we could annotate the corresponding HSD protein. The dominating phyla that were identified to express HSDs were Actinobacteria, Proteobacteria, and Firmicutes. Moreover, some evolutionarily more ancient microorganisms (e.g., Cyanobacteria and Euryarchaeota) were found as well. A large number of HSD-expressing bacteria constitute the normal human gastro-intestinal flora. Another group of bacteria were originally isolated from natural habitats like seawater, soil, marine and permafrost sediments. These bacteria include polycyclic aromatic hydrocarbon-degrading species such as Pseudomonas, Burkholderia and Rhodococcus. In conclusion, HSDs are found in a wide variety of microorganisms including

  15. Bioinformatic Analysis of HIV-1 Entry and Pathogenesis

    PubMed Central

    Aiamkitsumrit, Benjamas; Dampier, Will; Antell, Gregory; Rivera, Nina; Martin-Garcia, Julio; Pirrone, Vanessa; Nonnemacher, Michael R.; Wigdahl, Brian

    2015-01-01

    The evolution of human immunodeficiency virus type 1 (HIV-1) with respect to co-receptor utilization has been shown to be relevant to HIV-1 pathogenesis and disease. The CCR5-utilizing (R5) virus has been shown to be important in the very early stages of transmission and highly prevalent during asymptomatic infection and chronic disease. In addition, the R5 virus has been proposed to be involved in neuroinvasion and central nervous system (CNS) disease. In contrast, the CXCR4-utilizing (X4) virus is more prevalent during the course of disease progression and concurrent with the loss of CD4+ T cells. The dual-tropic virus is able to utilize both co-receptors (CXCR4 and CCR5) and has been thought to represent an intermediate transitional virus that possesses properties of both X4 and R5 viruses that can be encountered at many stages of disease. The use of computational tools and bioinformatic approaches in the prediction of HIV-1 co-receptor usage has been growing in importance with respect to understanding HIV-1 pathogenesis and disease, developing diagnostic tools, and improving the efficacy of therapeutic strategies focused on blocking viral entry. Current strategies have enhanced the sensitivity, specificity, and reproducibility relative to the prediction of co-receptor use; however, these technologies need to be improved with respect to their efficient and accurate use across the HIV-1 subtypes. The most effective approach may center on the combined use of different algorithms involving sequences within and outside of the env-V3 loop. This review focuses on the HIV-1 entry process and on co-receptor utilization, including bioinformatic tools utilized in the prediction of co-receptor usage. It also provides novel preliminary analyses for enabling identification of linkages between amino acids in V3 with other components of the HIV-1 genome and demonstrates that these linkages are different between X4 and R5 viruses. PMID:24862329

  16. VLSI Microsystem for Rapid Bioinformatic Pattern Recognition

    NASA Technical Reports Server (NTRS)

    Fang, Wai-Chi; Lue, Jaw-Chyng

    2009-01-01

    A system comprising very-large-scale integrated (VLSI) circuits is being developed as a means of bioinformatics-oriented analysis and recognition of patterns of fluorescence generated in a microarray in an advanced, highly miniaturized, portable genetic-expression-assay instrument. Such an instrument implements an on-chip combination of polymerase chain reactions and electrochemical transduction for amplification and detection of deoxyribonucleic acid (DNA).

  17. An active registry for bioinformatics web services

    PubMed Central

    Pettifer, S.; Thorne, D.; McDermott, P.; Attwood, T.; Baran, J.; Bryne, J. C.; Hupponen, T.; Mowbray, D.; Vriend, G.

    2009-01-01

    Summary: The EMBRACE Registry is a web portal that collects and monitors web services according to test scripts provided by their administrators. Users are able to search for, rank and annotate services, enabling them to select the most appropriate working service for inclusion in their bioinformatics analysis tasks. Availability and implementation: Web site implemented with PHP, Python, MySQL and Apache, with all major browsers supported. (www.embraceregistry.net) Contact: steve.pettifer@manchester.ac.uk PMID:19460889

  18. Broader incorporation of bioinformatics in education: opportunities and challenges.

    PubMed

    Cummings, Michael P; Temple, Glena G

    2010-11-01

    The major opportunities for broader incorporation of bioinformatics in education can be placed into three general categories: general applicability of bioinformatics in life science and related curricula; inherent fit of bioinformatics for promoting student learning in most biology programs; and the general experience and associated comfort students have with computers and technology. Conversely, the major challenges for broader incorporation of bioinformatics in education can be placed into three general categories: required infrastructure and logistics; instructor knowledge of bioinformatics and continuing education; and the breadth of bioinformatics, and the diversity of students and educational objectives. Broader incorporation of bioinformatics at all education levels requires overcoming the challenges to using transformative computer-requiring learning activities, assisting faculty in collecting assessment data on mastery of student learning outcomes, as well as creating more faculty development opportunities that span diverse skill levels, with an emphasis placed on providing resource materials that are kept up-to-date as the field and tools change.

  19. Chapter 16: text mining for translational bioinformatics.

    PubMed

    Cohen, K Bretonnel; Hunter, Lawrence E

    2013-04-01

    Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.

  20. A library-based bioinformatics services program*

    PubMed Central

    Yarfitz, Stuart; Ketchell, Debra S.

    2000-01-01

    Support for molecular biology researchers has been limited to traditional library resources and services in most academic health sciences libraries. The University of Washington Health Sciences Libraries have been providing specialized services to this user community since 1995. The library recruited a Ph.D. biologist to assess the molecular biological information needs of researchers and design strategies to enhance library resources and services. A survey of laboratory research groups identified areas of greatest need and led to the development of a three-pronged program: consultation, education, and resource development. Outcomes of this program include bioinformatics consultation services, library-based and graduate level courses, networking of sequence analysis tools, and a biological research Web site. Bioinformatics clients are drawn from diverse departments and include clinical researchers in need of tools that are not readily available outside of basic sciences laboratories. Evaluation and usage statistics indicate that researchers, regardless of departmental affiliation or position, require support to access molecular biology and genetics resources. Centralizing such services in the library is a natural synergy of interests and enhances the provision of traditional library resources. Successful implementation of a library-based bioinformatics program requires both subject-specific and library and information technology expertise. PMID:10658962

  1. Adapting bioinformatics curricula for big data

    PubMed Central

    Greene, Anna C.; Giffin, Kristine A.; Greene, Casey S.

    2016-01-01

    Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs. PMID:25829469

  2. Bioinformatics on the Cloud Computing Platform Azure

    PubMed Central

    Shanahan, Hugh P.; Owen, Anne M.; Harrison, Andrew P.

    2014-01-01

    We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development. PMID:25050811

  3. Bioinformatic pipelines in Python with Leaf

    PubMed Central

    2013-01-01

    Background: An incremental, loosely planned development approach is often used in bioinformatic studies when dealing with custom data analysis in a rapidly changing environment. Unfortunately, the lack of a rigorous software structuring can undermine the maintainability, communicability and replicability of the process. To ameliorate this problem we propose the Leaf system, the aim of which is to seamlessly introduce the pipeline formality on top of a dynamical development process with minimum overhead for the programmer, thus providing a simple layer of software structuring. Results: Leaf includes a formal language for the definition of pipelines with code that can be transparently inserted into the user’s Python code. Its syntax is designed to visually highlight dependencies in the pipeline structure it defines. While encouraging the developer to think in terms of bioinformatic pipelines, Leaf supports a number of automated features including data and session persistence, consistency checks between steps of the analysis, processing optimization and publication of the analytic protocol in the form of a hypertext. Conclusions: Leaf offers a powerful balance between plan-driven and change-driven development environments in the design, management and communication of bioinformatic pipelines. Its unique features make it a valuable alternative to other related tools. PMID:23786315
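
    The general idea of declaring steps and their dependencies, then letting a small runner execute them in a valid order, can be sketched with the Python standard library alone. The example below does not use Leaf and does not reproduce its pipeline language; the function `run_pipeline`, the step names and the toy reads are all hypothetical, chosen only to illustrate dependency-ordered execution with cached intermediate results.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(steps, dependencies):
    """Run every step once, after all the steps it depends on.

    steps:        maps a step name to a callable
    dependencies: maps a step name to the names whose results it consumes
    """
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        inputs = [results[dep] for dep in dependencies.get(name, ())]
        results[name] = steps[name](*inputs)  # result is kept for later steps
    return results


# Hypothetical toy analysis: load reads, drop short ones, compute GC content.
steps = {
    "load": lambda: ["ACGT", "AC", "GGTTAA"],
    "filter": lambda reads: [r for r in reads if len(r) >= 4],
    "gc": lambda reads: [sum(base in "GC" for base in r) / len(r) for r in reads],
    "report": lambda reads, gc: dict(zip(reads, gc)),
}
dependencies = {
    "load": (),
    "filter": ("load",),
    "gc": ("filter",),
    "report": ("filter", "gc"),
}

results = run_pipeline(steps, dependencies)
print(results["report"])
```

    Keeping the dependency table separate from the step functions is a design choice made for this sketch: the structure of the analysis can be read at a glance, and a circular dependency is rejected by `TopologicalSorter` before any step runs.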

  4. Bringing Web 2.0 to bioinformatics.

    PubMed

    Zhang, Zhang; Cheung, Kei-Hoi; Townsend, Jeffrey P

    2009-01-01

    Enabling deft data integration from numerous, voluminous and heterogeneous data sources is a major bioinformatic challenge. Several approaches have been proposed to address this challenge, including data warehousing and federated databasing. Yet despite the rise of these approaches, integration of data from multiple sources remains problematic and toilsome. These two approaches follow a user-to-computer communication model for data exchange, and do not facilitate a broader concept of data sharing or collaboration among users. In this report, we discuss the potential of Web 2.0 technologies to transcend this model and enhance bioinformatics research. We propose a Web 2.0-based Scientific Social Community (SSC) model for the implementation of these technologies. By establishing a social, collective and collaborative platform for data creation, sharing and integration, we promote a web services-based pipeline featuring web services for computer-to-computer data exchange as users add value. This pipeline aims to simplify data integration and creation, to realize automatic analysis, and to facilitate reuse and sharing of data. SSC can foster collaboration and harness collective intelligence to create and discover new knowledge. In addition to its research potential, we also describe its potential role as an e-learning platform in education. We discuss lessons from information technology, predict the next generation of Web (Web 3.0), and describe its potential impact on the future of bioinformatics studies.

  5. Chapter 16: Text Mining for Translational Bioinformatics

    PubMed Central

    Cohen, K. Bretonnel; Hunter, Lawrence E.

    2013-01-01

    Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall both into the category of T1 translational research—translating basic science results into new interventions—and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects, and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing. PMID:23633944

  6. Application of Bioinformatics in Chronobiology Research

    PubMed Central

    Lopes, Robson da Silva; Resende, Nathalia Maria; Honorio-França, Adenilda Cristina; França, Eduardo Luzía

    2013-01-01

    Bioinformatics and other well-established sciences, such as molecular biology, genetics, and biochemistry, provide a scientific approach for the analysis of data generated through “omics” projects that may be used in studies of chronobiology. The results of studies that apply these techniques demonstrate how they significantly aided the understanding of chronobiology. However, bioinformatics tools alone cannot eliminate the need for an understanding of the field of research or the data to be considered, nor can such tools replace analysts and researchers. It is often necessary to conduct an evaluation of the results of a data mining effort to determine the degree of reliability. To this end, familiarity with the field of investigation is necessary. It is evident that the knowledge that has been accumulated through chronobiology and the use of tools derived from bioinformatics has contributed to the recognition and understanding of the patterns and biological rhythms found in living organisms. The current work aims to develop new and important applications in the near future through chronobiology research. PMID:24187519

  7. CFGP 2.0: a versatile web-based platform for supporting comparative and evolutionary genomics of fungi and Oomycetes.

    PubMed

    Choi, Jaeyoung; Cheong, Kyeongchae; Jung, Kyongyong; Jeon, Jongbum; Lee, Gir-Won; Kang, Seogchan; Kim, Sangsoo; Lee, Yin-Won; Lee, Yong-Hwan

    2013-01-01

    In 2007, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr/) was publicly opened with 65 genomes corresponding to 58 fungal and Oomycete species. The CFGP provided six bioinformatics tools, including a novel tool entitled BLASTMatrix that enables searching for genes homologous to queries in multiple species simultaneously. CFGP also introduced Favorite, a personalized virtual space for data storage and analysis with these six tools. Since 2007, CFGP has grown to archive 283 genomes corresponding to 152 fungal and Oomycete species as well as 201 genomes that correspond to seven bacteria, 39 plants and 105 animals. In addition, the number of tools in Favorite increased to 27. The Taxonomy Browser of CFGP 2.0 allows users to interactively navigate through a large number of genomes according to their taxonomic positions. The user interface of BLASTMatrix was also improved to facilitate subsequent analyses of retrieved data. A newly developed genome browser, Seoul National University Genome Browser (SNUGB), was integrated into CFGP 2.0 to support graphical presentation of diverse genomic contexts. Based on the standardized genome warehouse of CFGP 2.0, several systematic platforms designed to support studies on selected gene families have been developed. Most of them are connected through Favorite to allow sharing of data across the platforms.

  8. CFGP: a web-based, comparative fungal genomics platform.

    PubMed

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F; Blair, Jaime E; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the 'fill-in-the-form-and-press-SUBMIT' user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI.

  9. Quantum Bio-Informatics II From Quantum Information to Bio-Informatics

    NASA Astrophysics Data System (ADS)

    Accardi, L.; Freudenberg, Wolfgang; Ohya, Masanori

    2009-02-01

    / H. Kamimura -- Massive collection of full-length complementary DNA clones and microarray analyses: keys to rice transcriptome analysis / S. Kikuchi -- Changes of influenza A(H5) viruses by means of entropic chaos degree / K. Sato and M. Ohya -- Basics of genome sequence analysis in bioinformatics - its fundamental ideas and problems / T. Suzuki and S. Miyazaki -- A basic introduction to gene expression studies using microarray expression data analysis / D. Wanke and J. Kilian -- Integrating biological perspectives: a quantum leap for microarray expression analysis / D. Wanke ... [et al.].

  10. Managing Large-Scale Genomic Datasets and Translation into Clinical Practice

    PubMed Central

    2014-01-01

    Summary. Objective: To summarize excellent current research in the field of Bioinformatics and Translational Informatics with application in the health domain. Method: We provide a synopsis of the articles selected for the IMIA Yearbook 2014, from which we attempt to derive a synthetic overview of current and future activities in the field. A first step of selection was performed by querying MEDLINE with a list of MeSH descriptors completed by a list of terms adapted to the section. Each section editor independently evaluated the set of 1,851 articles and 15 articles were retained for peer-review. Results: The selection and evaluation process of this Yearbook’s section on Bioinformatics and Translational Informatics yielded three excellent articles regarding data management and genome medicine. In the first article, the authors present VEST (Variant Effect Scoring Tool) which is a supervised machine learning tool for prioritizing variants found in exome sequencing projects that are more likely involved in human Mendelian diseases. In the second article, the authors show how to infer surnames of male individuals by crossing anonymous publicly available genomic data from the Y chromosome and public genealogy data banks. The third article presents a statistical framework called iCluster+ that can perform pattern discovery in integrated cancer genomic data. This framework was able to determine different tumor subtypes in colon cancer. Conclusions: The current research activities still attest to the continuous convergence of Bioinformatics and Medical Informatics, with a focus this year on large-scale biological, genomic, and Electronic Health Records data. Indeed, there is a need for powerful tools for managing and interpreting complex data, but also a need for user-friendly tools developed for the clinicians in their daily practice. All the recent research and development efforts are contributing to the challenge of impacting clinically the results and even going towards a

  11. Hidden weapons of microbial destruction in plant genomes

    PubMed Central

    Manners, John M

    2007-01-01

    Recent bioinformatic analyses of sequenced plant genomes reveal a previously unrecognized abundance of genes encoding antimicrobial cysteine-rich peptides, representing a formidable and dynamic defense arsenal against plant pests and pathogens. PMID:17903311

  12. Bioinformatics of Cancer ncRNA in High Throughput Sequencing: Present State and Challenges

    PubMed Central

    Jorge, Natasha Andressa Nogueira; Ferreira, Carlos Gil; Passetti, Fabio

    2012-01-01

    Numerous genome sequencing projects have produced an unprecedented amount of data, providing significant information for the discovery of novel non-coding RNAs (ncRNAs). Several ncRNAs have been described to control gene expression and play important roles during cell differentiation and homeostasis. In the last decade, high throughput methods in conjunction with approaches in bioinformatics have been used to identify, classify, and evaluate the expression of hundreds of ncRNAs in normal and pathological states, such as cancer. Patient outcomes have already been associated with differential expression of ncRNAs in normal and tumoral tissues, providing new insights into the development of innovative therapeutic strategies in oncology. In this review, we present and discuss bioinformatics advances in the development of computational approaches to analyze and discover ncRNA data in oncology using high throughput sequencing technologies. PMID:23251139

  13. Role of remote sensing, geographical information system (GIS) and bioinformatics in kala-azar epidemiology.

    PubMed

    Bhunia, Gouri Sankar; Dikhit, Manas Ranjan; Kesari, Shreekant; Sahoo, Ganesh Chandra; Das, Pradeep

    2011-11-01

    Visceral leishmaniasis or kala-azar is a potent parasitic infection causing the death of thousands of people each year. Medicinal compounds currently available for the treatment of kala-azar have serious side effects and decreased efficacy owing to the emergence of resistant strains. The type of immune reaction is also to be considered in patients infected with Leishmania donovani (L. donovani). For complete eradication of this disease, high-level modern research is currently being applied at both the molecular level and the field level. Computational approaches such as remote sensing, geographical information system (GIS) and bioinformatics are key resources for the detection and distribution of vectors, patterns, ecological and environmental factors and genomic and proteomic analysis. Novel approaches like GIS and bioinformatics have been more appropriately utilized in determining the cause of visceral leishmaniasis and in designing strategies for preventing the disease from spreading from one region to another.

  14. Role of remote sensing, geographical information system (GIS) and bioinformatics in kala-azar epidemiology

    PubMed Central

    Bhunia, Gouri Sankar; Dikhit, Manas Ranjan; Kesari, Shreekant; Sahoo, Ganesh Chandra; Das, Pradeep

    2011-01-01

    Visceral leishmaniasis, or kala-azar, is a potent parasitic infection that causes the death of thousands of people each year. Medicinal compounds currently available for the treatment of kala-azar have serious side effects and decreased efficacy owing to the emergence of resistant strains. The type of immune reaction must also be considered in patients infected with Leishmania donovani (L. donovani). To eradicate this disease completely, modern research is currently being applied at both the molecular level and the field level. Computational approaches such as remote sensing, geographical information systems (GIS) and bioinformatics are key resources for detecting vectors and their distribution, patterns, and ecological and environmental factors, and for genomic and proteomic analysis. Novel approaches such as GIS and bioinformatics have been utilized more appropriately in determining the cause of visceral leishmaniasis and in designing strategies for preventing the disease from spreading from one region to another. PMID:23554714

  15. cl-dash: rapid configuration and deployment of Hadoop clusters for bioinformatics research in the cloud

    PubMed Central

    Hodor, Paul; Chawla, Amandeep; Clark, Andrew; Neal, Lauren

    2016-01-01

    Summary: One of the solutions proposed for addressing the challenge of the overwhelming abundance of genomic sequence and other biological data is the use of the Hadoop computing framework. Appropriate tools are needed to set up computational environments that facilitate research of novel bioinformatics methodology using Hadoop. Here, we present cl-dash, a complete starter kit for setting up such an environment. Configuring and deploying new Hadoop clusters can be done in minutes. Use of Amazon Web Services ensures no initial investment and minimal operation costs. Two sample bioinformatics applications help the researcher understand and learn the principles of implementing an algorithm using the MapReduce programming pattern. Availability and implementation: Source code is available at https://bitbucket.org/booz-allen-sci-comp-team/cl-dash.git. Contact: hodor_paul@bah.com PMID:26428290
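
    The MapReduce programming pattern mentioned in this abstract can be illustrated with a minimal in-process sketch. It is not taken from cl-dash and does not use Hadoop. The k-mer counting task and the function names map_kmers, shuffle, reduce_counts and count_kmers are assumptions chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every window of length k in a read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key (here, by k-mer)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    """Run map, shuffle and reduce in sequence within a single process."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical toy reads; real input would be distributed sequence data.
    print(count_kmers(["ACGT", "CGTA"], 2))
```

    On a real cluster the framework would distribute the map and reduce steps across nodes and perform the grouping itself; here all three steps run locally so that the data flow is easy to follow.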

  16. Advances in omics and bioinformatics tools for systems analyses of plant functions.

    PubMed

    Mochida, Keiichi; Shinozaki, Kazuo

    2011-12-01

    Omics and bioinformatics are essential to understanding the molecular systems that underlie various plant functions. Recent game-changing sequencing technologies have revitalized sequencing approaches in genomics and have produced opportunities for various emerging analytical applications. Driven by technological advances, several new omics layers such as the interactome, epigenome and hormonome have emerged. Furthermore, in several plant species, the development of omics resources has progressed to address particular biological properties of individual species. Integration of knowledge from omics-based research is an emerging issue as researchers seek to identify significance, gain biological insights and promote translational research. From these perspectives, we provide this review of the emerging aspects of plant systems research based on omics and bioinformatics analyses together with their associated resources and technological advances.

  17. CucCAP - Developing genomic resources for the cucurbit community

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The U.S. cucurbit community has initiated a USDA-SCRI funded cucurbit genomics project, CucCAP: Leveraging applied genomics to increase disease resistance in cucurbit crops. Our primary objectives are: develop genomic and bioinformatic breeding tool kits for accelerated crop improvement across the...

  18. A web services choreography scenario for interoperating bioinformatics applications

    PubMed Central

    de Knikker, Remko; Guo, Youjun; Li, Jin-long; Kwan, Albert KH; Yip, Kevin Y; Cheung, David W; Cheung, Kei-Hoi

    2004-01-01

    Background: Very often, genome-wide data analysis requires the interoperation of multiple databases and analytic tools. A large number of genome databases and bioinformatics applications are available through the web, but it is difficult to automate interoperation because: 1) the platforms on which the applications run are heterogeneous, 2) their web interface is not machine-friendly, 3) they use a non-standard format for data input and output, 4) they do not exploit standards to define application interface and message exchange, and 5) existing protocols for remote messaging are often not firewall-friendly. To overcome these issues, web services have emerged as a standard XML-based model for message exchange between heterogeneous applications. Web services engines have been developed to manage the configuration and execution of a web services workflow. Results: To demonstrate the benefit of using web services over traditional web interfaces, we compare the two implementations of HAPI, a gene expression analysis utility developed by the University of California San Diego (UCSD) that allows visual characterization of groups or clusters of genes based on the biomedical literature. This utility takes a set of microarray spot IDs as input and outputs a hierarchy of MeSH Keywords that correlates to the input and is grouped by Medical Subject Heading (MeSH) category. While the HTML output is easy for humans to visualize, it is difficult for computer applications to interpret semantically. To facilitate the capability of machine processing, we have created a workflow of three web services that replic